ISO/IEC TR 23888-1
Information technology — Artificial intelligence for multimedia — Part 1: Vision and scenarios
Technologies de l'information — Intelligence artificielle pour le multimédia — Partie 1: Vision et scénarios
General Information
- Status: Not Published
- Current Stage: 6000 - International Standard under publication
- Start Date: 21-Apr-2026
- Completion Date: 25-Apr-2026
Overview
ISO/IEC TR 23888-1: Information technology - Artificial intelligence for multimedia - Part 1: Vision and scenarios provides a foundational perspective on the role of artificial intelligence (AI) and neural network (NN) technologies in multimedia processing and coding. This technical report is the first part of the ISO/IEC 23888 (commonly known as MPEG-AI) suite, addressing industry needs for standardized approaches as AI methods become central in multimedia standards.
The document outlines the current landscape, highlights multimedia scenarios where AI methods are transformative, identifies technical challenges, and sets the direction for future standardization activities. By clarifying terminology and key assumptions, ISO/IEC TR 23888-1 supports industry, researchers, and developers in aligning with best practices for AI-enabled multimedia technologies.
Key Topics
Vision for AI in Multimedia:
AI and NN-based methods are increasingly integrated in content encoding, feature extraction, and media analysis. MPEG-AI acts as an umbrella for AI-focused standardization in multimedia processing.
Interaction Modalities:
- AI as a multimedia coding tool: Use of neural networks for video compression, content representation, and content descriptors.
- Multimedia for AI consumption: Multimedia representations optimized for use by AI systems, including model compression and feature extraction frameworks.
Technical Working Assumptions:
- Interoperability: Emphasis on standardized bitstream specifications and model formats (e.g., ONNX, NNEF) to enable seamless deployment of AI models across platforms.
- Reproducibility: Recommendations for reproducible experimentation, including thorough documentation, use of seed values, data versioning, containerization, and experiment logging.
- Bit-exactness: Discussion on challenges of achieving bit-exact neural network reproducibility across hardware and software environments, and strategies for consistency.
Performance Considerations:
- Metrics like complexity (e.g., multiply-accumulate operations per sample), model size, and efficiency to balance quality and computational demands, crucial for resource-constrained devices.
Applications
AI for multimedia has broad applications in the digital ecosystem, supporting new services and improving existing ones:
- AI-based Video Coding: Leveraging neural networks to enhance compression efficiency and quality for streaming or storage, vital for content delivery networks and mobile devices.
- 3D Graphics and Scene Representation: Enhancing the realism and interactivity of virtual reality, augmented reality, and metaverse environments.
- AI Model Compression: Reducing the size and computational demands of neural network models to enable deployment in edge and IoT devices such as smart cameras and sensors.
- Feature Extraction and Coding: Enabling efficient feature representation for downstream AI tasks such as automated recognition, recommendation, and surveillance systems.
- Distributed AI Processing: Standardizing descriptions of media for distributed, cloud, or federated AI architectures.
These applications are relevant across multiple industry sectors, including entertainment, digital health, automotive, smart cities, and telecommunications.
Related Standards
ISO/IEC TR 23888-1 aligns with and builds upon several international standards and ongoing initiatives:
- ISO/IEC 15938 (MPEG-7): Multimedia content description interface, referenced for AI-based content descriptors.
- ISO/IEC 22989: Key definitions for artificial intelligence, adopted for core terminology.
- ONNX and NNEF: Model exchange formats facilitating interoperability of AI models in multimedia processing.
- ISO/IEC 15938-17: Compression of neural networks for multimedia content description and analysis, specifying an interoperable bitstream for transmitting compressed AI models in multimedia systems.
By providing vision and identifying scenarios, ISO/IEC TR 23888-1 informs future efforts in multimedia AI standards, ensuring robust interoperability, efficiency, and transparency for the rapidly evolving field of artificial intelligence in multimedia.
Frequently Asked Questions
ISO/IEC TR 23888-1 is a draft technical report developed jointly by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). Its full title is "Information technology — Artificial intelligence for multimedia — Part 1: Vision and scenarios".
ISO/IEC TR 23888-1 is classified under the following ICS (International Classification for Standards) categories: 35.040.40 - Coding of audio, video, multimedia and hypermedia information; 35.240.01 - Application of information technology in general. The ICS classification helps identify the subject area and facilitates finding related standards.
Standards Content (Sample)
FINAL DRAFT
Technical Report ISO/IEC DTR 23888-1
ISO/IEC JTC 1/SC 29
Secretariat: JISC
Voting begins on: 2026-02-23
Voting terminates on: 2026-04-20
Information technology — Artificial intelligence for multimedia —
Part 1: Vision and scenarios
Technologies de l'information — Intelligence artificielle pour le multimédia —
Partie 1: Vision et scénarios
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
Reference number: ISO/IEC DTR 23888-1:2026(en)
© ISO/IEC 2026
© ISO/IEC 2026
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
3.1 General AI Concepts and Terminology
3.2 Abbreviated terms
4 Vision on artificial intelligence for multimedia
5 Technical working assumptions and general expectations (challenges)
5.1 Interoperability
5.2 Functional aspects
5.2.1 Reproducibility of experiments
5.2.2 Bit-exact neural network models
5.2.3 Example of experiment and evaluation environment
5.3 Performance aspects
5.3.1 Complexity
5.3.2 Efficiency
6 Technologies and use cases
6.1 General
6.2 AI-based Video Coding
6.2.1 General
6.2.2 Key performance indicators for neural network-based tools
6.2.3 Neural network-based in-loop filters
6.2.4 Neural network-based intra prediction
6.2.5 Neural network-based postfilters
6.3 AI-based 3D Graphics Coding
6.3.1 AI-based 3D dynamic point cloud coding
6.4 AI Model Compression
6.4.1 General
6.4.2 Use cases
6.5 Video Coding for AI-based Downstream Tasks
6.5.1 General
6.5.2 Use cases
6.6 Feature Coding for AI models
6.6.1 General
6.6.2 Use cases
6.7 Description of media for distributed AI processing
6.7.1 General
6.7.2 Use case
Annex A (informative) Non-normative usage of artificial intelligence for multimedia
Bibliography
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical activity.
ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations,
governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of document should be noted. This document was drafted in accordance with the editorial rules of the ISO/
IEC Directives, Part 2 (see www.iso.org/directives or www.iec.ch/members_experts/refdocs).
ISO and IEC draw attention to the possibility that the implementation of this document may involve the
use of (a) patent(s). ISO and IEC take no position concerning the evidence, validity or applicability of any
claimed patent rights in respect thereof. As of the date of publication of this document, ISO and IEC had not
received notice of (a) patent(s) which may be required to implement this document. However, implementers
are cautioned that this may not represent the latest information, which may be obtained from the patent
database available at www.iso.org/patents and https://patents.iec.ch. ISO and IEC shall not be held
responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT) see www.iso.org/iso/foreword.html.
In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
A list of all parts in the ISO/IEC 23888 series can be found on the ISO and IEC websites.
Any feedback or questions on this document should be directed to the user’s national standards
body. A complete listing of these bodies can be found at www.iso.org/members.html and
www.iec.ch/national-committees.
Introduction
This document is part of the ISO/IEC 23888 series (also known as MPEG-AI) on artificial intelligence (AI) for
multimedia.
FINAL DRAFT Technical Report ISO/IEC DTR 23888-1:2026(en)
Information technology — Artificial intelligence for
multimedia —
Part 1:
Vision and scenarios
1 Scope
This document presents the role of artificial intelligence (AI) and neural network (NN) technologies in
multimedia coding and processing activities. It describes the current perspectives on AI for multimedia
and identifies working assumptions and technical challenges expected from working with AI and NN-based
technologies. This document highlights a variety of multimedia coding activities, key scenarios and gaps
that are to be addressed by further standardization efforts.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https:// www .iso .org/ obp
— IEC Electropedia: available at https:// www .electropedia .org/
3.1 General AI Concepts and Terminology
3.1.1
AI component
functional element that constructs an AI system (3.1.4)
[SOURCE: ISO/IEC 22989:2022, 3.1.2]
3.1.2
activation function
function applied to the weighted combination of all inputs of a neuron (3.1.15)
Note 1 to entry: Activation function allows neural networks to learn complicated features in the data. They are
typically non-linear.
[SOURCE: ISO/IEC 22989:2022, 3.4.1]
3.1.3
artificial intelligence
AI
research and development of mechanisms and applications of AI systems (3.1.4)
[SOURCE: ISO/IEC 22989:2022, 3.1.3, modified — Note 1 to entry has been removed.]
3.1.4
artificial intelligence system
AI system
engineered system that generates outputs such as content, forecasts, recommendations or decisions for a
given set of human-defined objectives
[SOURCE: ISO/IEC 22989:2022, 3.1.4, modified — Note 1 to entry has been removed.]
3.1.5
convolutional neural network
CNN
neural network (3.1.14) using convolution (3.1.6) in at least one of its layers
[SOURCE: ISO/IEC 22989:2022, 3.4.2, modified — removed "feed forward"]
3.1.6
convolution
mathematical operation involving a sliding dot product or cross-correlation of the input data.
[SOURCE: ISO/IEC 22989:2022, 3.4.3]
3.1.7
deep learning
deep neural network learning
approach to creating rich hierarchical representations through the training of neural
networks (3.1.14) with many hidden layers.
Note 1 to entry: Deep learning is a subset of ML (3.1.11)
[SOURCE: ISO/IEC 22989:2022, 3.4.4]
3.1.8
downstream task
task that depends on the output of a previous task or process, typically involving the application of a pre-trained model to a new problem
3.1.9
inference
reasoning by which conclusions are derived from known premises
Note 1 to entry: In AI, a premise is either a fact, a rule, a model, a feature or raw data.
Note 2 to entry: The term "inference" refers both to the process and its result.
[SOURCE: ISO/IEC 22989:2022, 3.1.17]
3.1.10
model
physical, mathematical or otherwise logical representation of a system, entity, phenomenon, process or data
[SOURCE: ISO/IEC 22989:2022, 3.1.23]
3.1.11
machine learning
ML
process of optimizing model parameters (3.1.17) through computational techniques, such that the model's
(3.1.10) behaviour reflects the data or experience
[SOURCE: ISO/IEC 22989:2022, 3.3.5]
3.1.12
machine learning algorithm
algorithm to determine parameters (3.1.17) of a machine learning model (3.1.10) from data according to
given criteria
EXAMPLE Consider a simple deblocking filter consisting of a single convolution layer, which creates an output image I_out = I_in ⊗ θ, where I_in is the input image and θ is the parameter matrix of the convolution kernel, which is to be learned from a set of pairs of input and output images.
[SOURCE: ISO/IEC 22989:2022, 3.3.6, modified — Example has been replaced]
3.1.13
machine learning model
mathematical construct that generates an inference (3.1.9) or prediction based on input data or information
EXAMPLE For a simple deblocking filter I_out = I_in ⊗ θ learned from image pairs (I_in, I_out), the model is represented by the set of parameters defined as θ.
Note 1 to entry: A machine learning model results from training based on a machine learning algorithm (3.1.12).
[SOURCE: ISO/IEC 22989:2022, 3.3.7, modified — Example has been replaced]
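The deblocking-filter EXAMPLE above can be made concrete with a short sketch: the model is the kernel θ, and the machine learning algorithm estimates it from (I_in, I_out) image pairs. This is a minimal illustration only, assuming NumPy; the 3x3 kernel size and the least-squares fitting are arbitrary choices, not part of the definitions.

```python
import numpy as np

def fit_kernel(pairs, k=3):
    """Estimate a k x k kernel theta from (I_in, I_out) image pairs by least squares."""
    rows, targets = [], []
    r = k // 2
    for I_in, I_out in pairs:
        H, W = I_in.shape
        for y in range(r, H - r):
            for x in range(r, W - r):
                rows.append(I_in[y - r:y + r + 1, x - r:x + r + 1].ravel())
                targets.append(I_out[y, x])
    theta, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(targets), rcond=None)
    return theta.reshape(k, k)

# Toy usage: one synthetic image pair with an identity "filter" as the target,
# so the recovered kernel is close to a centred delta.
rng = np.random.default_rng(0)
I_in = rng.random((16, 16))
print(fit_kernel([(I_in, I_in.copy())]))
```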
3.1.14
neural network
NN
neural net
artificial neural network
network of one or more layers of neurons (3.1.15) connected by weighted links with
adjustable weights, which takes input data and produces an output.
[SOURCE: ISO/IEC 22989:2022, 3.4.8, modified — Notes have been removed]
3.1.15
neuron
primitive processing element which takes one or more input values and produces an
output value by combining the input values and applying an activation function (3.1.2) on the result
[SOURCE: ISO/IEC 22989:2022, 3.4.9, modified – note 1 to entry has been removed]
3.1.16
multilayered perceptron
neural network consisting of a group of source nodes, one or more hidden layers, and one output layer, and
using a monotonic activation function
Note 1 to entry: Each artificial neuron in a multilayered perceptron is a single-layer perceptron.
Note 2 to entry: Multilayered perceptrons can implement any Boolean function.
[SOURCE: ISO/IEC 2382:2015, modified — Note 3 to entry removed, "feedforward network" replaced by
"neural network"]
3.1.17
parameter
model parameter
internal variable of a model (3.1.10) that affects how it computes its outputs
Note 1 to entry: Examples of parameters include the weights in a neural network and the transition probabilities in a
Markov model.
[SOURCE: ISO/IEC 22989:2022, 3.3.8]
3.2 Abbreviated terms
The following abbreviated terms are used in this document:
AI artificial intelligence
AR augmented reality
AVC advanced video coding
BIM building information modelling
CCTV closed circuit television
DASH dynamic adaptive streaming over HTTP
DNN deep neural networks
FCM feature coding for machines
GOP group of pictures
HEVC high efficiency video coding
HTTP hypertext transfer protocol
ICT information and communication technologies
IoT internet of things
IoV internet of vehicles
IR infra-red
kMAC kilo multiplication-accumulation
LFNST low-frequency non-separable transform
LIDAR laser imaging detection and ranging
MIP matrix-based intra prediction
ML machine learning
mMTC massive machine type communications
MPEG Moving Picture Experts Group
MR mixed reality
NN neural network
NNC neural network coding
NNEF Neural Network Exchange Format
ONNX® Open Neural Network Exchange
PSNR peak signal to noise ratio
QoE quality of experience
RGB red-green-blue
RPR reference picture resampling
SEI supplemental enhancement information
VCM video coding for machines
VR virtual reality
VTM VVC test model
VVC versatile video coding
V2X vehicle to everything
XR extended reality
4 Vision on artificial intelligence for multimedia
ISO/IEC JTC 1/SC 29 started to embrace AI-based methods with the development of the compact video descriptors, ISO/IEC 15938-15:2019,[1] in 2019. Today, many standardisation projects are exploring the interplay of AI and multimedia and are starting to adopt AI or NN-based technologies. As these technologies become a core enabler in emerging standards, MPEG-AI has been established as an umbrella for such standardization activities.
MPEG-AI considers two aspects of interaction between AI and multimedia:
— AI as multimedia coding tool: This includes AI-based multimedia representations and coding. Examples
include techniques for obtaining AI-based content descriptors and AI-based information compression
and coding such as AI-based video compression.
— Multimedia for consumption by AI: Any multimedia representation that can be consumed by an AI system. This covers both AI-based and non-AI-based representations that facilitate the delivery of content to an AI system. Examples include techniques for compressing a neural network. Additionally, it includes guidelines or frameworks for applying existing coding tools to facilitate AI-based content usage. Such guidelines are available in the form of technical reports, e.g. applications of versatile video coding to machine video consumption.[8]
AI-based standards can be based solely on AI methods or use them for one or more tools of a processing chain, combining the complementary strengths of different methods in a hybrid architecture. MPEG-AI standards specify one or more of the following technologies:
— Multimedia representation. AI-based methods are used to encode multimedia content or derived
features and descriptors efficiently. Multimedia content includes traditional representations such as
video, audio, and graphics, but also representations relevant for realising the metaverse, such as implicit
scene representations.
— Analysis and processing. AI-based methods are used for the extraction of features or descriptors, for the
reconstruction and processing (e.g. improvement) of multimedia content, or the selection of parameters
for content distribution.
— Supporting technologies. Technologies for the efficient representation and deployment of neural
network models, and metadata representations for AI-based processes.
The use of AI-based technologies comes with additional computational complexity, so a trade-off between the improvement in a task-specific performance metric (e.g. coding gain) and the computational effort has to be made. MPEG-AI takes these issues into account based on the requirements of the specific tasks. Another
aspect is the dependency on specific frameworks or representations of neural networks, in particular for inference. A discussion of this issue and the assumptions made in MPEG-AI is provided in Clause 5.
The technologies expected to be addressed by MPEG-AI relate to media representation, analysis, synthesis,
and processing. It is recognized that there are ethical concerns related to the introduction of AI and NN-
based methods. The standardisation process fosters transparency and thorough assessment of technologies,
aiming at high robustness and trustworthiness of the resulting technologies.
5 Technical working assumptions and general expectations (challenges)
5.1 Interoperability
Whether an end-to-end or a hybrid codec architecture is designed, interoperability between senders and receivers needs to be achieved. As in traditional coding, the bitstream at the split point (usually after the entropy coder) needs to be well specified.
Additional information for AI coding interoperability needs to be considered. For example:
The deployment of trained neural network models relies on interoperable model formats. While this aspect is addressed by exchange formats such as the Neural Network Exchange Format (NNEF)[9] or the Open Neural Network Exchange (ONNX®)1)[10], support for compressed parameters is very limited in these formats. For applications such as the deployment of neural networks for object detection to smart cameras or for post filtering in video decoders, an interoperable representation of compressed models needs to be specified in order to reduce the needed bandwidth.
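As a concrete illustration of the exchange-format interoperability discussed above, the following minimal sketch exports a toy post-filter model to ONNX so that any ONNX-capable runtime can load it. PyTorch and the onnx package are assumed to be available; the file name and the single-layer architecture are illustrative assumptions, not requirements of this document.

```python
import torch
import torch.nn as nn
import onnx

# Toy single-layer "post filter"; any ONNX-capable runtime can then execute it.
postfilter = nn.Sequential(nn.Conv2d(3, 3, kernel_size=3, padding=1))
dummy_input = torch.randn(1, 3, 64, 64)        # N x C x H x W example input

torch.onnx.export(postfilter, dummy_input, "postfilter.onnx",
                  input_names=["decoded_picture"],
                  output_names=["filtered_picture"])

onnx.checker.check_model(onnx.load("postfilter.onnx"))  # sanity-check the exported graph
```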
Neural network-based post filters are effective components in video decoders, and updated models can be provided to optimise for specific content properties. If the model is transmitted in compressed format, either as part of the video bitstream or out of band, an interoperable bitstream format, such as the one specified in ISO/IEC 15938-17,[2] needs to be used. In addition, appropriate signalling needs to be included in the bitstream to indicate the model format and how it is transmitted.
5.2 Functional aspects
5.2.1 Reproducibility of experiments
The reproducibility of experiments is a challenge for neural network-based AI models due to the training strategies and scale of experiments. To facilitate the reproducibility of technical contributions, some key strategies are considered during the development of the reference software and experimentation, including:
— Detailed documentation: providing comprehensive documentation that includes information on
dependencies, libraries, and specific versions to help others recreate the same environment.
— Seed values for randomization: setting seed values for random number generators ensures that random processes will be reproducible (see the sketch after this list).
— Data versioning: ensures that everyone accesses the exact same dataset.
— Containerization: allows encapsulating the complete environment and guarantees that the code runs consistently across different systems. Examples of such containerization tools are Docker, Apptainer, and the like.
— Hyperparameters: Clearly stating the hyperparameters of a model facilitates replication of experiments.
— Experiment logging: complementing experiments with logs that include parameters, metrics, and results
allows people to better track the development of a solution within a reference software.
— Checkpoints and model weights: saving and sharing the model checkpoints allows others to compare their obtained results in a better-informed fashion.

1) ONNX® is the trademark of a product owned by LF PROJECTS, LLC. This information is given for the convenience of users of this document and does not constitute an endorsement by ISO/IEC of the product named.
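The seeding and experiment-logging practices listed above can be illustrated with a minimal sketch, assuming PyTorch and NumPy are used; the seed value, log file name and logged fields are illustrative assumptions only.

```python
import json
import random
import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Seed every random number generator used in the experiment."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)            # no effect when CUDA is absent
    torch.backends.cudnn.deterministic = True   # prefer deterministic kernels
    torch.backends.cudnn.benchmark = False

def log_experiment(path: str, hyperparameters: dict, metrics: dict) -> None:
    """Append one experiment record (hyperparameters, metrics) to a JSON-lines log."""
    record = {"hyperparameters": hyperparameters,
              "metrics": metrics,
              "torch_version": torch.__version__}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

set_seed(42)
log_experiment("experiments.log", {"lr": 1e-4, "epochs": 100}, {"loss": 0.0})
```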
5.2.2 Bit-exact neural network models
Achieving bit-exact reproducibility in neural networks is a challenge due to various factors, including hardware differences, parallelization, and non-deterministic operations. The standards typically cannot enforce bit-exact reproducibility due to the complications of hardware dependencies. Instead, they can consider approaches for alleviating the concerns by providing informative guidelines or careful considerations during development of the standard and reference software. Such guidelines are not necessarily essential to a software/hardware-specific implementation of the standard and are considered informative annexes, unless proven otherwise. For example, it is common knowledge that lower-complexity representations of neural network weights and activations, such as integer quantization, can help achieve more consistent results. Yet, there exist multiple additional methodologies for achieving more consistent results, e.g. aligning operation orders, using AI-supporting software/hardware infrastructures, or enforcing a proper training strategy, and the performance varies based on the underlying software/hardware platform. Therefore, such methodologies (e.g. quantized models) can be adopted during development, although such specific approaches for obtaining a more consistent result are not necessarily suitable for all software/hardware platforms. The standards pay attention to the limitations that enforcing a specific implementation poses to the rate of adoption and success of a standard. Each standard carefully discusses these aspects within the scope of its purpose.
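As an illustration of the integer quantization mentioned above as one route to more consistent results across platforms, the following is a minimal sketch assuming NumPy; the 8-bit symmetric quantization scheme is an assumed design choice, not a requirement of this document.

```python
import numpy as np

def quantize_weights(w: np.ndarray, bits: int = 8):
    """Map float weights to integers and a scale so that w is approximately q * scale."""
    qmax = 2 ** (bits - 1) - 1                       # e.g. 127 for 8 bits
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(3, 3).astype(np.float32)         # toy convolution kernel
q, s = quantize_weights(w)
w_hat = dequantize(q, s)   # downstream integer arithmetic is easier to keep bit-exact
```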
5.2.3 Example of experiment and evaluation environment
The following is an experiment and evaluation environment setup example to ensure experiment reproducibility and bit-exact evaluation:
— Hardware environment: list CPU and GPU specification, hardware memory, virtual memory.
— Software environment: list operating system kernel, coding environment, software buffer.
— Neural network description: neural network definition and type, version.
— Training description: dataset type, version, training algorithm, number of epochs.
— Bit-exact evaluation: MD5 checksum, use of a dump file for crosscheck (see the sketch after this list).
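A minimal sketch of the checksum-based crosscheck above is given here, assuming the Python standard library; the dump file name is an illustrative assumption.

```python
import hashlib

def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 digest of a dumped output file, computed in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Identical digests obtained on two machines indicate bit-exact outputs.
print(md5_of_file("decoded_output.yuv"))
```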
5.3 Performance aspects
5.3.1 Complexity
In addition to the encoding and decoding time, the following metrics are considered:
— Complexity in terms of multiply-accumulate operations per sample, or kMAC/sample. This number is to be kept low enough to allow decoding in future handheld devices without emptying the battery too quickly. It is also important to keep kMAC/sample low to ensure economically feasible sizes when implemented in silicon (a worked sketch of this metric follows the list).
— Complexity in terms of number of parameters of the model. Since models need to be swapped in and out of memory, keeping the parameter count low is necessary to make a decoder implementable.
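The following worked sketch computes the two metrics above for a single 2D convolution layer; the layer dimensions are illustrative assumptions.

```python
def conv_kmac_per_sample(kernel_h, kernel_w, in_channels, out_channels):
    """Multiply-accumulate operations per output sample of a 2D convolution, in kMAC."""
    return kernel_h * kernel_w * in_channels * out_channels / 1000.0

def conv_parameter_count(kernel_h, kernel_w, in_channels, out_channels, bias=True):
    """Number of parameters of the same convolution layer."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if bias else 0)

# Example: a 3x3 convolution with 64 input and 64 output channels.
print(conv_kmac_per_sample(3, 3, 64, 64))   # 36.864 kMAC per output sample
print(conv_parameter_count(3, 3, 64, 64))   # 36928 parameters
```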
5.3.2 Efficiency
AI-based systems pose challenges for conventional efficiency metrics or necessitate introducing new efficiency metrics. For example, in video coding for machines, conventional metrics of video compression such as BD-rate are not sufficient, and alternatives that can convey both machine task performance and compression rate become relevant. It then becomes necessary that a proper metric is determined in each activity. In determining a proper efficiency metric, some characteristics can be considered, including, but not limited to, relevance, accuracy (correctness in reflecting true performance), consistency, comparability, sensitivity, and robustness.
6 Technologies and use cases
6.1 General
Clause 6 describes the main scenarios that are considered AI-relevant. They are used to derive the working
assumptions for artificial intelligence applications and services listed in Clause 5. The following topics are
described in this clause:
— AI model compression;
— Video coding for AI-based downstream tasks;
— Feature coding for AI models;
— Support of AI-based video processing by VSEI messages;
— AI-based point cloud coding.
6.2 AI-based Video Coding
6.2.1 General
A video codec aims at reducing the bit rate of raw video while retaining a low level of distortion and keeping
complexity (operations per pixel, memory constraints etc.) reasonable. It does so by exploiting temporal and
spatial redundancies (e.g. copying samples from previous pictures or predicting samples from neighbouring
blocks). To some extent existing video codecs also use knowledge about how real video signals typically
behave; for example, deblocking can lower distortion since it is known that real-world raw video seldom includes blocking artifacts. Learning-based methods exploit such knowledge more directly by learning statistical aspects of raw video. The versatile video coding standard VVC includes some AI-based methods, such as secondary transforms (LFNST) and matrix-based intra prediction (MIP), but these remain linear due to complexity concerns. AI-based methods to improve video compression beyond VVC are expected to be part of a future video coding standard, with the focus on (non-linear) neural network-based methods.
A distinction is made between video codecs that are entirely based on neural networks, so-called end-to-
end codecs, and so-called hybrid codecs where some parts are similar to existing video coding standards
and other parts are neural network based. The latter approach is described in this clause.
6.2.2 Key performance indicators for neural network-based tools
In this use case, neural network-based methods replace or enhance existing tools in an otherwise traditional
video codec. To progress the work VVC is used as a base, and neural network-based tools are inserted at
various places to investigate how bit rate is saved while maintaining picture quality. The neural network-
based tools are evaluated on a few key performance indicators:
— BD-rate, or Bjontegaard-Delta rate, is a measure of how much the bit rate is lowered at a constant quality
level as measured in PSNR. For neural network-based methods to be interesting, they are to provide a
sufficient BD-rate reduction compared to traditional methods.
— Complexity in terms of multiply-accumulate operations per sample, or kMAC/sample. This number
is to be kept low enough to allow decoding in future handheld devices without emptying the battery
too quickly. It is also important to keep kMAC/sample low to ensure economically feasible sizes when
implemented in silicon.
— Complexity in terms of number of parameters of the model. Since models need to be swapped in
and out of memory, keeping the parameter count low is necessary to make a decoder implementable.
There are also other performance indicators discussed, but the three above-mentioned ones are deemed the
most important.
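To make the first indicator concrete, the following is a minimal sketch of a Bjontegaard-delta rate computation using the classic cubic-fit-and-integrate approach, assuming NumPy; the rate-distortion points are invented for illustration and do not represent any codec.

```python
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Average bit-rate difference (%) of the test codec against the reference."""
    p_ref = np.polyfit(psnr_ref, np.log10(rates_ref), 3)
    p_test = np.polyfit(psnr_test, np.log10(rates_test), 3)
    lo = max(min(psnr_ref), min(psnr_test))       # overlapping PSNR interval
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (10.0 ** avg_diff - 1.0) * 100.0       # negative values mean rate savings

# Invented rate (kbps) / PSNR (dB) points for two codecs, for illustration only.
print(bd_rate([1000, 2000, 4000, 8000], [34.0, 36.5, 38.8, 40.6],
              [900, 1800, 3600, 7200], [34.1, 36.6, 38.9, 40.7]))
```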
6.2.3 Neural network-based in-loop filters
An in-loop filter is an entity that takes in the decoded picture and processes it to remove artifacts. Crucially,
it is the processed picture that is later used for prediction. This is important, since effort spent on a
certain picture can not only positively affect that picture, but also subsequent pictures that predict from
it. VVC includes three loop filters (deblocking, sample adaptive offset and adaptive loop filter). This can be
complemented by the addition of a non-linear neural network-based loop filter.
6.2.4 Neural network-based intra prediction
One way to take advantage of spatial redundancy is to predict samples from neighbouring, already coded
blocks. In video compression standards such as HEVC, this is done by assuming that patterns in the
surrounding samples (e.g. directly above and/or to the left of the current block) continue in the current block,
perhaps at an angle that is signalled. In VVC, a linear AI-based approach called MIP was introduced, where
learned matrices are multiplied by a vector containing the surrounding samples to obtain the prediction.
This can be complemented by a non-linear neural network as a way to predict samples. Surrounding samples
would be input to the neural network and the output would be a prediction of the desired samples.
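A minimal sketch of the matrix-based prediction idea described above is given below, assuming NumPy; the 4x4 block size, the reference-sample layout and the averaging matrix are illustrative assumptions and do not reproduce the VVC MIP design.

```python
import numpy as np

def predict_block(A, b, refs, block=4):
    """Predict a block x block region from a vector of neighbouring reference samples."""
    pred = A @ refs + b                           # learned affine mapping
    return np.clip(np.rint(pred), 0, 255).reshape(block, block)

refs = np.array([128, 130, 129, 127, 126, 128, 131, 129], dtype=float)  # above + left
A = np.full((16, refs.size), 1.0 / refs.size)     # toy "learned" matrix: plain averaging
b = np.zeros(16)
print(predict_block(A, b, refs))                  # a flat block near the reference mean
```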
6.2.5 Neural network-based postfilters
6.2.5.1 General
In contrast to an in-loop filter, a postfilter only affects the picture right before output to the display and does
not influence the prediction. Hence, a low-capable decoder can turn off the postfilter to save computation
without risking decoder drift, while a high-capable decoder decoding the same bit stream can keep it on.
A neural network-based postfilter aimed at off-line encoded content can be added. A base model would be fine-tuned for each sequence, and sequence-dependent parameters of the neural network would be transmitted to the decoder. A capable decoder can use the transmitted parameters to update a pre-stored base model and filter the sequence. A non-capable decoder can ignore the parameters and would not get increased picture quality, but the video would still decode without drift.
Transferring sequence-dependent information and parameters to a decoder can be done using supplemental enhancement information (SEI) messages. SEI is additional data inserted into the bitstream to carry some extra information with the video content. The SEI can carry technical and additional information which a decoder utilizes or, depending on its capabilities, ignores. SEI messages can be used to convey some parameters related to postfilters.
6.2.5.2 Neural network-based super-resolution networks
In VVC, reference picture resampling (RPR) allows pictures to be resampled at different spatial resolutions
(i.e. changing resolution from the reference picture to the current picture). In VVC, traditional up- and down
sampling filters are used to convert between resolutions. A super-resolution neural network can instead be
used to scale the pictures when RPR is enabled.
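The following is a hedged sketch of replacing a traditional resampling filter with a small neural network when the resolution changes, assuming PyTorch; the sub-pixel (pixel-shuffle) architecture, layer sizes and scale factor are illustrative choices only, not a normative design.

```python
import torch
import torch.nn as nn

class TinySuperResolution(nn.Module):
    """Maps a low-resolution frame to a 2x upscaled frame via sub-pixel convolution."""
    def __init__(self, channels: int = 3, scale: int = 2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, x):
        return self.body(x)

lowres = torch.rand(1, 3, 270, 480)              # toy low-resolution frame
highres = TinySuperResolution()(lowres)          # shape: 1 x 3 x 540 x 960
```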
AI-based video coding involves training neural network-based components as part of a codec. A neural
network can be used as part of visual data processing and coding (e.g. all-intra coding only or in capturing
inter-frame dependencies, loop filters or a post-filter). A codec that utilizes neural networks can follow a
fully AI-based architecture or a hybrid one where an NN-based component complements an already existing
video codec. An advantage of a neural network-based codec is the ability to be updated according to the
demands. In other words, the neural network components can be redeployed or updated in various scenarios.
The neural network compression standard, ISO/IEC 15938-17,[2] is a versatile standard for compressing and deploying a neural network or delivering a compressed weight update. The weight update compression can be handy in scenarios where adjustments to the current codec are needed. ISO/IEC 15938-17[2] can be considered as a means of delivery of neural networks, for example, of post filters as part of the supplemental enhancement information (SEI) message described in 6.2.5.3.
6.2.5.3 Versatile supplemental enhancement information messages for AI
A neural network can be available as a post-filter or an enhancement algorithm for various purposes. To be able to conduct successful operations in conjunction with a video codec, some complementary information is required. ISO/IEC 23002-7,[3] versatile supplemental enhancement information messages for coded video bitstreams, provides support for the delivery of such information. The support includes mechanisms for signalling the availability of a neural network representation, either as an ISO/IEC 15938-17[2] bitstream or as another neural network representation identified by a URI for retrieving the neural network. More importantly, the standard defines the supplemental enhancement information messages for signalling characteristics of the postfilter and for the usage of such mechanism, i.e. when the neural post-filter is activated. The SEI messages include important parameters for the successful operation of a postfilter, including the correct order of samples in input tensors, patch-wise processing configurations, the resampling ratio for spatial resampling, identifiers for specifying an NN postfilter and updates to them relative to a base, the output colour format, and more.
6.3 AI-based 3D Graphics Coding
6.3.1 AI-based 3D dynamic point cloud coding
6.3.1.1 General
3D graphics content is growing significantly with the rapid development of 3D sensors (e.g. 3D scanners,
LiDAR, and vision-based 3D reconstruction technologies). As an outcome of sensor-based data acquisition,
the point cloud, a raw 3D data format defined as a set of points, has become popular. Each point is depicted by its position using a 3D coordinate. As a set, the point cloud represents the geometry of an object or a scene. Optionally, each point can be associated with attributes that are application dependent, for example, RGB colour for immersive applications (such as VR and AR) and reflectance from LiDAR for autonomous navigation and robotics. Point cloud data allows further processing for more complete 3D modelling (e.g. mesh representation). It can also be consumed directly by machine-oriented downstream perception tasks, for example, detection, recognition, etc.
Often point clouds are composed of a sequence of frames representing a dynamic object or a dynamic scene.
This is especially the case for immersive applications (such as VR and AR), and LiDAR for autonomous
navigation and robotics.
AI-based ecosystems have been flourishing in a few application domains, such as autonomous driving, robotics and the metaverse, in the past decade. AI techniques have also recently demonstrated state-of-the-art performance in 3D point cloud compression. However, unlike, for example, processing, enhancement, rendering and perception tasks, there is no wide deployment of AI-based point cloud compression; one reason is the lack of a standard, which prevents interoperability between different vendors. Specifically, there is a need for standardization of AI-based dynamic point cloud compression.
6.3.1.2 Use cases
6.3.1.2.1 Point clouds for immersive applications
Dense point clouds denote representations captured with sufficient density to provide realistic rendering
suitable for human viewing in immersive applications. Environments where point clouds and AI/ML based
technologies can find synergies include XR and gaming.
6.3.1.2.1.1 Extended reality (XR)
XR encompasses all immersive technologies, including virtual reality (VR), augmented reality (AR), mixed
reality (MR), and any other technology that blends the physical and virtual worlds, altering or enhancing
our perception of reality. In this context, point clouds serve as a foundational element for XR experiences by
providing accurate spatial information, enabling realistic interactions between digital and physical worlds,
and enhancing the overall immersive quality of these technologies. They can serve, for instance, as valuable
documentation tools for preserving cultural heritage, offer accessibility to individuals who might not have
the opportunity to visit physical locations, provide educational opportunities and enable highly immersive
exhibits.
6.3.1.2.1.2 Gaming
Gaming refers to the activity or process of playing electronic games, typically through various platforms
such as consoles, computers, mobile devices, or dedicated gaming systems. Graphical content holds
substantial importance in gaming for several reasons, like immersive experience, visual appeal, gameplay
clarity, storytelling, and atmosphere. Incorporating point cloud data into game environments allows for
highly detailed and realistic representations of real-world locations. This level of detail can significantly
enhance the visual fidelity and authenticity of in-game environments for deeper immersive experiences.
6.3.1.2.2 Point clouds for autonomous navigation and robotics
Sparse point clouds represent data captured by spinning LiDAR sensors at larger intervals, resulting in a less
detailed depiction of objects or scenes. This format is well-suited for autonomous navigation and robotics,
encompassing both indoor and outdoor scenarios.
Autonomous navigation refers t
...
ISO/IEC DTR 23888-1:2025(1)
ISO/IEC JTC 1/SC 29/WG 2
Secretariat: JISC
Date: 20252026-01-2527
Information technology – — Artificial intelligence for multimedia
– —
Part 1:
Vision and scenarios
DTRTechnologies de l'information — Intelligence artificielle pour le multimédia —
Partie 1: Vision et scénarios
FDIS stage
Warning for WDs and CDs
ISO #####-#:####(X)
This document is not an ISO International Standard. It is distributed for review and comment. It is subject to
change without notice and may not be referred to as an International Standard.
Recipients of this draft are invited to submit, with their comments, notification of any relevant patent rights of
which they are aware and to provide supporting documentation.
2 © ISO #### – All rights reserved
ISO/IEC CD TR DTR 23888-1:2025(1:(en)
© ISO/IEC 2026
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication
may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying,
or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO
at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: + 41 22 749 01 11
EmailE-mail: copyright@iso.org
Website: www.iso.orgwww.iso.org
Published in Switzerland
iv © ISO 2025 /IEC 2026 – All rights reserved
iv
ISO D TR/IEC DTR 23888-1:2025(1:(en)
Contents
Foreword . vii
Introduction . viii
1 Scope . 1
2 Normative references . 1
3 Terms and definitions . 1
3.1 General AI Concepts and Terminology . 1
3.2 Abbreviated terms . 4
4 Vision on artificial intelligence for multimedia . 6
5 Technical working assumptions and general expectations (challenges). 7
5.1 Interoperability . 7
5.2 Functional aspects . 8
5.3 Performance aspects . 9
6 Technologies and use cases . 9
6.1 General. 9
6.2 AI-based Video Coding . 10
6.3 AI-based 3D Graphics Coding . 12
6.4 AI Model Compression . 15
6.5 Video Coding for AI-based Downstream Tasks . 20
6.6 Feature Coding for AI models . 29
6.7 Description of media for distributed AI processing . 31
Annex A (informative) Non-normative usage of artificial intelligence for multimedia . 33
Bibliography . 34
Foreword . v
Introduction . vi
1 Scope . 1
2 Normative references . 1
3 Terms and definitions . 1
3.1 General AI Concepts and Terminology . 1
3.1.1 AI component . 1
3.1.2 activation function . 1
3.1.3 artificial intelligence . 1
3.1.4 artificial intelligence system . 1
3.1.5 convolutional neural network . 2
3.1.6 convolution . 2
3.1.7 deep learning . 2
3.1.8 downstream task . 2
3.1.9 inference . 2
3.1.10 model . 2
3.1.11 machine learning . 2
3.1.12 machine learning algorithm . 3
3.1.13 machine learning model . 3
3.1.14 neural network . 3
3.1.15 neuron . 3
3.1.16 multilayered perceptron . 3
v
ISO/IEC CD TR DTR 23888-1:2025(1:(en)
3.1.17 parameter . 4
3.2 Abbreviated terms . 4
4 Vision on artificial intelligence for multimedia . 5
5 Technical working assumptions and general expectations (challenges). 6
5.1 Interoperability . 6
5.2 Functional aspects . 6
5.2.1 Reproducibility of experiments . 6
5.2.2 Bit-exact neural network models . 7
5.2.3 Example of experiment and evaluation environment . 7
5.3 Performance aspects . 7
5.3.1 Complexity . 7
5.3.2 Efficiency . 7
6 Technologies and use cases . 8
6.1 General. 8
6.2 AI-based Video Coding . 8
6.2.1 Introduction . 8
6.2.2 Key performance indicators for neural network-based tools . 8
6.2.3 Neural network-based in-loop filters . 9
6.2.4 Neural network-based intra prediction . 9
6.2.5 Neural network-based postfilters . 9
6.3 AI-based 3D Graphics Coding . 10
6.3.1 AI-based 3D dynamic point cloud coding . 10
6.4 AI Model Compression . 12
6.4.1 Introduction . 12
6.4.2 Use cases . 13
6.5 Video Coding for AI-based Downstream Tasks . 17
6.5.1 Introduction . 17
6.5.2 Use cases . 17
6.6 Feature Coding for AI models . 23
6.6.1 Introduction . 23
6.6.2 Use cases . 24
6.7 Description of media for distributed AI processing . 24
6.7.1 Introduction . 24
6.7.2 Use case . 25
Annex A (informative) Non-normative usage of artificial intelligence for multimedia . 26
A.1 VCM and FCM . 26
A.2 Non-normative technology in NNC . 26
A.2.1 Introduction . 26
A.2.2 NNC in video and feature coding scenarios . 26
Bibliography . 27
vi © ISO 2025 /IEC 2026 – All rights reserved
vi
ISO D TR/IEC DTR 23888-1:2025(1:(en)
Foreword
ISO (the International Organization for Standardization) is a and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide federation of national standardsstandardization.
National bodies (that are members of ISO member bodies). The workor IEC participate in the development of
preparing International Standards is normally carried out through ISO technical committees. Each member
body interested in a subject for which a technical committee has been established has the right to be
represented on that committee. Internationalby the respective organization to deal with particular fields of
technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of
ISO documentsdocument should be noted. This document was drafted in accordance with the editorial rules
of the ISO/IEC Directives, Part 2 (see www.iso.org/directives 2 (see www.iso.org/directives or
www.iec.ch/members_experts/refdocs).
Attention is drawnISO and IEC draw attention to the possibility that some of the elementsimplementation of
this document may beinvolve the subjectuse of (a) patent rights. ISO(s). ISO and IEC take no position
concerning the evidence, validity or applicability of any claimed patent rights in respect thereof. As of the date
of publication of this document, ISO and IEC had not received notice of (a) patent(s) which may be required to
implement this document. However, implementers are cautioned that this may not represent the latest
information, which may be obtained from the patent database available at www.iso.org/patents and
https://patents.iec.ch. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
Details of any patent rights identified during the development of the document will be in the Introduction
and/or on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html)
see www.iso.org/iso/foreword.html. In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1 [, Information technology],,
Subcommittee SC 29 [, Coding of audio, picture, multimedia and hypermedia information].
A list of all parts in the ISO 23888 series can be found on the ISO websiteand IEC websites.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.htmlwww.iso.org/members.html and
www.iec.ch/national-committees.
vii
ISO/IEC CD TR DTR 23888-1:2025(1:(en)
Introduction
This document is a draft technical report on the artificial intelligence for multimedia and constitutes the first
part of the ISO/IEC 23888 series (also known as MPEG-AI) suite of standards for theon artificial intelligence
(AI) for multimedia.
The International Organization for Standardization (ISO) draws attention to the fact that it is claimed that
compliance with this document may involve the use of a patent.
ISO takes no position concerning the evidence, validity, and scope of this patent right.
The holder of this patent right has assured ISO that he/she is willing to negotiate licences under reasonable
and non-discriminatory terms and conditions with applicants throughout the world. In this respect, the
statement of the holder of this patent right is registered with ISO. Information may be obtained from the patent
database available at www.iso.org/patents.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights other than those in the patent database. ISO shall not be held responsible for identifying any or all such
patent rights.
viii © ISO 2025 /IEC 2026 – All rights reserved
viii
ISO/IEC DTR 23888-1:(en)
Information technology – — Artificial intelligence for multimedia – —
Part 1:
Vision and scenarios
1 Scope
This document provides the perspective onpresents the role of Artificial Intelligenceartificial intelligence (AI)
and Neural Networkneural network (NN) technologies in multimedia coding and processing activities. This
document fostersIt describes the visioncurrent perspectives on AI for multimedia and identifies the working
assumptions and technical challenges expected from working with AI and NN-based technologies. ItThis
document highlights a variety of multimedia coding activities, the key scenarios and gaps that willare to be
addressed by further standardization efforts.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
Field Code Changed
— IEC Electropedia: available at https://www.electropedia.org/
Field Code Changed
3.1 General AI Concepts and Terminology
3.1.1
AI component
functional element that constructs an AI system (3.1.4)(3.1.4)
[SOURCE: ISO/IEC 22989:2022, 3.1.2]
3.1.13.1.2
activation function
function applied to the weighted combination of all inputs of a neuron (3.1.15)(3.1.15)
Note 1 to entry: Activation function allows neural networks to learn complicated features in the data. They are typically
non-linear.
[SOURCE: ISO/IEC 22989 :2022, 3.4.1]
3.1.23.1.3
artificial intelligence
AI
research and development of mechanisms and applications of AI systems (3.1.4)(3.1.4)
[SOURCE: ISO/IEC 22989:2022, 3.1.3, modified — Note 1 to entry has been removed.]
ISO/IEC DTR 23888-1:(en)
3.1.33.1.4
artificial intelligence system
AI system
engineered system that generates outputs such as content, forecasts, recommendations or decisions for a
given set of human-defined objectives
[SOURCE: ISO/IEC 22989:2022, 3.1.4, modified — Note 1 to entry has been removed.]
3.1.43.1.5
convolutional neural network
CNN
neural network (3.1.14)(3.1.14) using convolution (3.1.6)(3.1.6) in at least one of its layers
[SOURCE: ISO/IEC 22989:2022, 3.4.2, modified — removed "feed forward"]
3.1.53.1.6
convolution
mathematical operation involving a sliding dot product or cross-correlation of the input data.
[SOURCE: ISO/IEC 22989:2022, 3.4.3]
3.1.63.1.7
deep learning
deep neural network learning
approach to creating rich hierarchical representations through the training of neural
networks (3.1.14)(3.1.14) with many hidden layers.
Note 1 to entry: Deep learning is a subset of ML (3.1.11) (3.1.11)
[SOURCE: ISO/IEC 22989:2022, 3.4.4]
3.1.73.1.8
downstream task
The downstream task is a task that depends on the output of a previous task or process. It involves applying a
pre-trained model to a new problem.
3.1.83.1.9
inference
reasoning by which conclusions are derived from known premises
Note 1 to entry: In AI, a premise is either a fact, a rule, a model, a feature or raw data.
Note 2 to entry: The term "inference" refers both to the process and its result.
[SOURCE: ISO/IEC 22989:2022, 3.1.17]
3.1.93.1.10
model
physical, mathematical or otherwise logical representation of a system, entity, phenomenon, process or data
[SOURCE: ISO/IEC 22989:2022, 3.1.23]
3.1.103.1.11
machine learning
ML
process of optimizing model parameters (3.1.17)(3.1.17) through computational techniques, such that the
model's (3.1.10)(3.1.10) behaviour reflects the data or experience
2 © ISO 2025 /IEC 2026 – All rights reserved
ISO/IEC DTR 23888-1:(en)
[SOURCE: ISO/IEC 22989:2022, 3.3.5]
3.1.113.1.12
machine learning algorithm
algorithm to determine parameters (3.1.17)(3.1.17) of a machine learning model (3.1.10)(3.1.10) from data
according to given criteria
EXAMPLE
Consider a simple deblocking filter consisting of a single convolution layer, which creates an output image 𝐼𝐼 =𝐼𝐼 ⊗𝜃𝜃,
𝑜𝑜𝑜𝑜𝑜𝑜 𝑖𝑖𝑖𝑖
where Iin is the input image and theta is the parameter matrix of the convolution kernel, which are to be learned from a
set of pairs of input and output images.
[SOURCE: ISO/IEC 22989:2022, 3.3.6, modified — Example has been replaced]
3.1.123.1.13
machine learning model
mathematical construct that generates an inference (3.1.9)(3.1.9) or prediction based on input data or
information
EXAMPLE
For a simple deblocking filter 𝐼𝐼 =𝐼𝐼 ⊗𝜃𝜃 learned from image pairs (Iin, Iout), where the model is represented by the set
𝑜𝑜𝑜𝑜𝑜𝑜 𝑖𝑖𝑖𝑖
of parameters defined as 𝜃𝜃.
Note 1 to entry: A machine learning model results from training based on a machine learning algorithm (3.1.12).
[SOURCE: ISO/IEC 22989:2022, 3.3.7, modified — Example has been replaced]
3.1.14
neural network
NN
neural net
artificial neural network
network of one or more layers of neurons (3.1.15) connected by weighted links with adjustable weights, which takes input data and produces an output.
[SOURCE: ISO/IEC 22989:2022, 3.4.8, modified — Notes have been removed]
3.1.15
neuron
primitive processing element which takes one or more input values and produces an
output value by combining the input values and applying an activation function (3.1.2) on the result
[SOURCE: ISO/IEC 22989:2022, 3.4.9, modified — Note 1 to entry has been removed]
3.1.16
multilayered perceptron
neural network consisting of a group of source nodes, one or more hidden layers, and one output layer, and
using a monotonic activation function
Note 1 to entry: Each artificial neuron in a multilayered perceptron is a single-layer perceptron.
Note 2 to entry: Multilayered perceptrons can implement any Boolean function.
[SOURCE: ISO/IEC 2382:2015, modified — Note 3 to entry removed, "feedforward network" replaced by "neural network"]
3.1.17
parameter
model parameter
internal variable of a model (3.1.10) that affects how it computes its outputs
Note 1 to entry: Examples of parameters include the weights in a neural network and the transition probabilities in a
Markov model.
[SOURCE: ISO/IEC 22989:2022, 3.3.8]
3.2 Abbreviated terms
The following abbreviated terms are used in this document:
AI artificial intelligence
AR augmented reality
AVC advanced video coding
BIM building information modelling
CCTV closed circuit television
DASH dynamic adaptive streaming over HTTP
DNN deep neural network
FCM feature coding for machines
GOP group of pictures
HEVC high efficiency video coding
HTTP hypertext transfer protocol
ICT information and communication technologies
IoT internet of things
IoV internet of vehicles
IR infra-red
kMAC kilo multiply-accumulate operations
LFNST low-frequency non-separable transform
LIDAR laser imaging detection and ranging
MIP matrix-based intra prediction
ML machine learning
mMTC massive machine type communications
MPEG Moving Picture Experts Group
MR mixed reality
NN neural network
NNC neural network coding
NNEF Neural Network Exchange Format
ONNX® Open Neural Network Exchange Format
PSNR peak signal to noise ratio
QoE quality of experience
RGB red-green-blue
RPR reference picture resampling
SEI supplemental enhancement information
VCM video coding for machines
VR virtual reality
VTM VVC test model
VVC versatile video coding
V2X vehicle to everything
XR extended reality
4 Vision on artificial intelligence for multimedia
ISO/IEC JTC1/SC29 has embraced AI-based methods since the development of the compact video descriptors, ISO/IEC 15938-15:2019 [1], in 2019. Today, many standardisation projects are exploring the interplay of AI and multimedia and are starting to adopt AI or NN-based technologies. As these technologies become a core enabler in emerging standards, MPEG-AI has been established as an umbrella for such standardization activities.
MPEG-AI considers two aspects of interaction between AI and multimedia:
— AI as a multimedia coding tool: This includes AI-based multimedia representations and coding. Examples
include techniques for obtaining AI-based content descriptors and AI-based information compression and
coding such as AI-based video compression.
— Multimedia for consumption by AI: Any multimedia representation that can be consumed by an AI system.
This covers both AI-based and non-AI based representations that facilitate the delivery of content to an AI
system. Examples include techniques for compressing a neural network. Additionally, it includes the
guidelines or frameworks for applying existing coding tools to facilitate AI-based content usage. Such
guidelines are accessible in the form of technical reports, e.g., applications of versatile video coding to machine video consumption [8].
AI-based standards can be based solely on AI methods or use them for one or more tools of a processing chain, combining the complementary strengths of different methods in a hybrid architecture. MPEG-AI standards specify one or more of the following technologies:
— Multimedia representation. AI-based methods are used to encode multimedia content or derived
features and descriptors efficiently. Multimedia content includes traditional representations such as video,
audio, and graphics, but also representations relevant for realising the metaverse, such as implicit scene
representations.
— Analysis and processing. AI-based methods are used for the extraction of features or descriptors, for the reconstruction and processing (e.g., improvement) of multimedia content, or for the selection of parameters for content distribution.
— Supporting technologies. Technologies for the efficient representation and deployment of neural
network models, and metadata representations for AI-based processes.
The use of AI-based technologies comes with additional computational complexity, so that a trade-off between improvement in a task-specific performance metric (e.g., coding gain) and computational effort is to be made. MPEG-AI takes these issues into account based on the requirements of the specific tasks. Another aspect is the dependency on specific frameworks or representations of neural networks, in particular for inference. A discussion of this issue and the assumptions made in MPEG-AI is provided in Clause 5.
The technologies expected to be addressed by MPEG-AI relate to media representation, analysis, synthesis,
and processing. It is recognized that there are ethical concerns related to the introduction of AI and NN-based
methods. The standardisation process fosters transparency and thorough assessment of technologies, aiming
at high robustness and trustworthiness of the resulting technologies.
5 Technical working assumptions and general expectations (challenges)
5.1 Interoperability
Whether an end-to-end or a hybrid codec architecture is designed, interoperability between receiver and sender needs to be achieved. As in traditional coding, the bitstream at the split point (usually after the entropy coder) needs to be well specified.
Additional information for AI coding interoperability needs to be considered. For example:
The deployment of trained neural network models relies on interoperable model formats. While this aspect is addressed by exchange formats such as the Neural Network Exchange Format (NNEF) [9] or the Open Neural Network Exchange Format (ONNX®)1) [10], support for compressed parameters is very limited in these formats. For applications such as the deployment of neural networks for object detection to smart cameras or for post filtering in video decoders, an interoperable representation of compressed models needs to be specified in order to reduce the needed bandwidth.
1) ONNX is the trademark of a product owned by LF PROJECTS, LLC. This information is given for the convenience of users of this document and does not constitute an endorsement by ISO/IEC of the product named.
Neural network-based post filters are effective components in video decoders, and updated models can be provided to optimise for specific content properties. If the model is transmitted in compressed format, either as part of the video bitstream or out of band, an interoperable bitstream format, such as the one specified in ISO/IEC 15938-17 [2], needs to be used. In addition, appropriate signalling needs to be included in the bitstream to indicate the model format and how it is transmitted.
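As an illustration of interoperable model deployment, the following minimal sketch exports a simple post-filter network to the ONNX format so that it can be consumed by different inference engines. It assumes a PyTorch environment with its ONNX exporter; the network architecture and file names are hypothetical.

import torch
import torch.nn as nn

# Hypothetical post filter: a single 3x3 convolution applied to a one-channel picture.
post_filter = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)
post_filter.eval()

# Export the (trained) model to an interoperable exchange format.
dummy_input = torch.randn(1, 1, 256, 256)
torch.onnx.export(
    post_filter, dummy_input, "post_filter.onnx",
    input_names=["decoded_picture"], output_names=["filtered_picture"],
)

Such an exchange format carries uncompressed parameters; transmitting the model in compressed form additionally requires an interoperable compressed-model bitstream such as the one specified in ISO/IEC 15938-17 [2].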
5.2 Functional aspects
5.2.1 Reproducibility of experiments
The reproducibility of experiments is a challenge for neural network-based AI models due to the training strategies and scale of experiments. To facilitate the reproducibility of technical contributions, some key strategies are considered during the development of the reference software and experimentation, including the following (an illustrative example follows the list):
— Detailed documentation: providing comprehensive documentation that includes information on
dependencies, libraries, and specific versions to help others recreate the same environment.
— Seed values for randomization: setting seed values for random number generators. This ensures that
random processes will be reproducible.
— Data versioning: ensures that everyone accesses the exact same dataset.
— Containerization: allows encapsulating the complete environment and guarantees that the code runs consistently across different systems. Examples of such containerization tools are Docker, Apptainer, and the like.
— Hyperparameters: Clearly stating the hyperparameters of a model facilitates replication of experiments.
— Experiment logging: complementing experiments with logs that include parameters, metrics, and results
allows people to better track the development of a solution within a reference software.
— Checkpoints and model weights: saving and sharing model checkpoints allows others to compare their obtained results in a better-informed fashion.
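The following minimal sketch illustrates some of these strategies (seed values for randomization, hyperparameter documentation, and experiment logging). It assumes a PyTorch-based training environment; file names and numeric values are illustrative placeholders only.

import json
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Seed all random number generators used during training."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Prefer deterministic kernels where the backend supports them.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

def log_experiment(path: str, hyperparameters: dict, metrics: dict) -> None:
    """Persist hyperparameters and results alongside the model checkpoint."""
    with open(path, "w") as f:
        json.dump({"hyperparameters": hyperparameters, "metrics": metrics}, f, indent=2)

set_seed(42)
log_experiment(
    "run_001.json",  # hypothetical log file name
    {"learning_rate": 1e-4, "epochs": 300, "dataset_version": "v1.2"},
    {"bd_rate_percent": -4.1},  # placeholder result value
)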
5.2.2 Bit-exact neural network models
Achieving bit-exact reproducibility in neural networks is a challenge due to various factors, including hardware differences, parallelization, and non-deterministic operations. The standards cannot, in general, enforce bit-exact reproducibility due to the complications of hardware dependencies. Instead, they can consider approaches for alleviating these concerns by providing informative guidelines or careful considerations during development of the standard and reference software. Such guidelines are not necessarily essential to software/hardware-specific implementations of the standard and are considered informative annexes, unless proven otherwise. For example, it is common knowledge that lower-complexity representations of neural network weights and activations, such as integer-quantized ones, can help achieve more consistent results. Yet, multiple additional methodologies exist for achieving more consistent results, e.g., by aligning operation orders, AI-supporting software/hardware infrastructures, or enforcing a proper training strategy, and the performance varies based on the underlying software/hardware platform. Therefore, such methodologies (e.g., quantized models) can be adopted during development, although such specific approaches for obtaining a more consistent result may not be suitable for all software/hardware platforms. The standards pay attention to the limitations that enforcing a specific implementation poses to the rate of adoption and success of a standard. Each standard carefully discusses these aspects within the scope of its purpose.
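As an illustration of one such methodology, the following minimal sketch applies symmetric post-training quantization of floating-point weights to 8-bit integers. Integer weights are represented identically across platforms, which can help, though by no means guarantee, consistent inference results; the function names are illustrative only.

import numpy as np

def quantize_weights_int8(weights):
    """Symmetric per-tensor quantization of floating-point weights to int8."""
    w = np.asarray(weights, dtype=np.float32)
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate floating-point weights for inference."""
    return q.astype(np.float32) * scale

q, scale = quantize_weights_int8(np.random.randn(16, 16, 3, 3))
w_hat = dequantize(q, scale)  # approximation used consistently on all platforms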
5.2.3 Example of experiment and evaluation environment
The following is an example of an experiment and evaluation environment setup to ensure experiment reproducibility and bit-exact evaluation:
— Hardware environment: list CPU and GPU specifications, hardware memory, virtual memory.
— Software environment: list operating system kernel, coding environment, software buffers.
— Neural network description: neural network definition and type, version.
— Training description: dataset type, version, training algorithm, number of epochs.
— Bit-exact evaluation: MD5 checksum, use of dump file for crosscheck (an example follows the list).
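As an illustration of the bit-exact evaluation step, the following minimal sketch computes the MD5 checksum of a decoded output dump; two implementations can be considered bit-exact on a sequence if the checksums of their dump files match. The file names are hypothetical.

import hashlib

def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 checksum of a decoded output dump, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Crosscheck between two decoder implementations (hypothetical file names):
# assert md5_of_file("decoder_a.yuv") == md5_of_file("decoder_b.yuv")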
5.3 Performance aspects
5.3.1 Complexity
In addition to the encoding and decoding time, the following metrics are considered (an illustrative calculation follows the list):
— Complexity in terms of multiply-accumulate operations per sample, or kMAC/sample. This number is to be kept low enough to allow decoding in future handheld devices without emptying the battery too quickly. It is also important to keep kMAC/sample low to ensure economically feasible sizes when implemented in silicon.
— Complexity in terms of the number of parameters of the model. Since models need to be swapped in and out of memory, keeping the parameter count low is necessary to make a decoder implementable.
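The following minimal sketch illustrates how both metrics can be estimated for a small convolutional filter, assuming stride-1 convolutions operating at full picture resolution; the layer configuration is hypothetical.

def conv_macs_per_sample(c_in: int, c_out: int, k_h: int, k_w: int) -> int:
    """Multiply-accumulate operations per output sample of one stride-1 conv layer."""
    return c_in * c_out * k_h * k_w

def conv_params(c_in: int, c_out: int, k_h: int, k_w: int, bias: bool = True) -> int:
    """Number of learnable parameters of one conv layer."""
    return c_in * c_out * k_h * k_w + (c_out if bias else 0)

# Hypothetical three-layer filter operating on a one-channel picture.
layers = [(1, 16, 3, 3), (16, 16, 3, 3), (16, 1, 3, 3)]
total_macs = sum(conv_macs_per_sample(*layer) for layer in layers)
total_params = sum(conv_params(*layer) for layer in layers)
print(f"{total_macs / 1000:.2f} kMAC/sample, {total_params} parameters")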
5.3.2 Efficiency
AI-based systems pose challenges for conventional efficiency metrics or necessitate introducing new efficiency metrics. For example, in video coding for machines, conventional metrics of video compression such as BD-rate are not sufficient, and alternatives that can convey machine task performance and compression rate become relevant. It then becomes necessary that a proper metric is determined in each activity. In determining a proper efficiency metric, some characteristics can be considered, including, but not limited to, relevance, accuracy (correctness in reflecting true performance), consistency, comparability, sensitivity, and robustness.
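As an illustration, a video coding for machines evaluation could report operating points that pair compression rate with machine task performance. The following minimal sketch uses hypothetical numbers purely as placeholders.

# Hypothetical operating points: bitrate of the compressed representation
# versus accuracy of the downstream machine task (e.g., object detection mAP).
operating_points = [
    {"bitrate_kbps": 250, "task_accuracy": 0.41},
    {"bitrate_kbps": 500, "task_accuracy": 0.46},
    {"bitrate_kbps": 1000, "task_accuracy": 0.49},
]

def rate_at_target_accuracy(points, target):
    """Smallest bitrate whose task accuracy reaches the target, or None."""
    feasible = [p["bitrate_kbps"] for p in points if p["task_accuracy"] >= target]
    return min(feasible, default=None)

print(rate_at_target_accuracy(operating_points, 0.45))  # -> 500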
6 Technologies and use cases
6.1 General
Clause 6 describes the main scenarios that are considered AI relevant. They are used to derive the working assumptions for artificial intelligence applications and services listed in Clause 5. The following topics are described in this clause:
— AI model compression;
— video coding for AI-based downstream tasks;
— feature coding for AI models;
— support of AI-based video processing by VSEI messages;
— AI-based point cloud coding.
6.2 AI-based video coding
6.2.1 General
A video codec aims at reducing the bit rate of raw video while retaining a low level of distortion and keeping complexity (operations per pixel, memory constraints, etc.) reasonable. It does so by exploiting temporal and spatial redundancies (e.g., copying samples from previous pictures or predicting samples from neighbouring blocks). To some extent, existing video codecs also use knowledge about how real video signals typically behave; for example, deblocking can lower distortion since it is known that real-world raw video seldom includes blocking artifacts. Learning-based methods exploit such knowledge more directly by learning statistical aspects of raw video. The versatile video coding standard (VVC) includes some AI-based methods such as secondary transforms (LFNST) and matrix-based intra prediction (MIP), but these remain linear due to complexity concerns. AI-based methods to improve video compression beyond VVC are expected to be part of a future video coding standard, with the focus on (non-linear) neural network-based methods.
A distinction is made between video codecs that are entirely based on neural networks, so-called end-to-end
codecs, and so-called hybrid codecs where some parts are similar to existing video coding standards and other
parts are neural network based. The latter approach is described in this clause.
6.2.2 Key performance indicators for neural network-based tools
In this use case, neural network-based methods replace or enhance existing tools in an otherwise traditional
video codec. To progress the work, VVC is used as a base, and neural network-based tools are inserted at
various places to investigate how bit rate is saved while maintaining picture quality. The neural network-
based tools are evaluated on a few key performance indicators:
— BD-rate, or Bjøntegaard-Delta rate, is a measure of how much the bit rate is lowered at a constant quality level as measured in PSNR. For neural network-based methods to be interesting, they are to provide a sufficient BD-rate reduction compared to traditional methods (a computation example is given at the end of this subclause).
— Complexity in terms of multiply-accumulate operations per sample, or kMAC/sample. This number is
to be kept low enough to allow decoding in future handheld devices without emptying the battery too
quickly. It is also important to keep kMAC/sample low to ensure economically feasible sizes when
implemented in silicon.
— Complexity in terms of the number of parameters of the model. Since models need to be swapped in and out of memory, keeping the parameter count low is necessary to make a decoder implementable.
There are also other performance indicators discussed, but the three above-mentioned ones are deemed the
most important.
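The following minimal sketch computes the BD-rate with the commonly used Bjøntegaard method: the logarithm of the rate is fitted with a cubic polynomial as a function of PSNR for both the anchor and the test codec, the fits are integrated over the overlapping PSNR range, and the average difference is converted to a percentage. It assumes at least four rate-distortion points per codec; the input values are supplied by the experiment.

import numpy as np

def bd_rate(rates_anchor, psnr_anchor, rates_test, psnr_test) -> float:
    """Average bit-rate difference (%) of the test codec at equal PSNR."""
    log_rate_a = np.log(np.asarray(rates_anchor, dtype=float))
    log_rate_t = np.log(np.asarray(rates_test, dtype=float))
    psnr_a = np.asarray(psnr_anchor, dtype=float)
    psnr_t = np.asarray(psnr_test, dtype=float)

    # Cubic fit of log(rate) over PSNR for each codec.
    poly_a = np.polyfit(psnr_a, log_rate_a, 3)
    poly_t = np.polyfit(psnr_t, log_rate_t, 3)

    # Integrate both fits over the overlapping PSNR interval.
    lo = max(psnr_a.min(), psnr_t.min())
    hi = min(psnr_a.max(), psnr_t.max())
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)

    # Average log-rate difference, expressed as a percentage.
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0

A negative BD-rate indicates that the test codec needs less bit rate than the anchor for the same PSNR.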
6.2.3 Neural network-based in-loop filters
An in-loop filter is an entity that takes in the decoded picture and processes it to remove artifacts. Crucially, it is the processed picture that is later used for prediction. This is important, since effort spent on a certain picture can positively affect not only that picture, but also subsequent pictures that predict from it. VVC includes three loop filters (deblocking, sample adaptive offset and adaptive loop filter). These can be complemented by the addition of a non-linear neural network-based loop filter.
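A minimal sketch of such a non-linear loop filter, assuming a PyTorch environment, is given below. The network predicts a correction that is added to the decoded picture (residual learning); the architecture shown is purely illustrative and not a proposed design.

import torch
import torch.nn as nn

class ResidualLoopFilter(nn.Module):
    """Hypothetical neural network-based loop filter using residual learning."""

    def __init__(self, channels: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
        )

    def forward(self, decoded_picture: torch.Tensor) -> torch.Tensor:
        # Add the predicted correction to the decoded picture.
        return decoded_picture + self.net(decoded_picture)

# The filtered picture would be stored in the decoded picture buffer and used
# as a reference for the prediction of subsequent pictures.
filtered = ResidualLoopFilter()(torch.rand(1, 1, 64, 64))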
6.2.4 Neural network-based intra prediction
On
...







