Information technology — Coding of audio-visual objects — Part 3: Audio

ISO/IEC 14496-3:2009 integrates many different types of audio coding: natural sound with synthetic sound, low bitrate delivery with high-quality delivery and lossless coding, speech with music, complex soundtracks with simple ones, and traditional content with interactive and virtual-reality content. By standardizing individually sophisticated coding tools, as well as a novel, flexible framework for audio synchronization, mixing and downloaded post-production, ISO/IEC 14496-3:2009 creates technology suited to a new, interactive world of digital audio. ISO/IEC 14496-3:2009, unlike previous audio standards created by ISO/IEC and other groups, does not target a single application such as real-time telephony or high-quality audio compression. Rather, it applies to every application requiring the use of advanced sound compression, synthesis, manipulation or playback. ISO/IEC 14496-3:2009 specifies state-of-the-art coding tools in several domains. As these tools are integrated with the other parts of ISO/IEC 14496, new possibilities are enabled for object-based audio coding, interactive presentation, dynamic soundtracks, and other sorts of new media. Since a single set of tools is used to cover the needs of a broad range of applications, interoperability is a natural feature of systems that build on ISO/IEC 14496-3:2009.

Technologies de l'information — Codage des objets audiovisuels — Partie 3: Codage audio

General Information

Status
Withdrawn
Publication Date
25-Aug-2009
Withdrawal Date
25-Aug-2009
Current Stage
9599 - Withdrawal of International Standard
Start Date
12-Dec-2019
Completion Date
19-Apr-2025
Ref Project

Relations

Standard
ISO/IEC 14496-3:2009 - Information technology -- Coding of audio-visual objects
English language
1381 pages

Standards Content (Sample)


INTERNATIONAL ISO/IEC
STANDARD 14496-3
Fourth edition
2009-09-01
Information technology — Coding of
audio-visual objects —
Part 3:
Audio
Technologies de l'information — Codage des objets audiovisuels —
Partie 3: Codage audio
Reference number
©
ISO/IEC 2009
PDF disclaimer
This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but
shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In
downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat
accepts no liability in this area.
Adobe is a trademark of Adobe Systems Incorporated.
Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation
parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In
the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.

©  ISO/IEC 2009
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are members of
ISO or IEC participate in the development of International Standards through technical committees
established by the respective organization to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other international organizations, governmental
and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information
technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards. Draft International
Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as
an International Standard requires approval by at least 75 % of the national bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
ISO/IEC 14496-3 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
This fourth edition cancels and replaces the third edition (ISO/IEC 14496-3:2005), which has been technically
revised. It also incorporates the Amendments ISO/IEC 14496-3:2005/Amd.1:2007,
ISO/IEC 14496-3:2005/Amd.2:2006, ISO/IEC 14496-3:2005/Amd.3:2006, ISO/IEC 14496-3:2005/Amd.5:2007,
ISO/IEC 14496-3:2005/Amd.8, ISO/IEC 14496-3:2005/Amd.9:2008, and the Technical Corrigenda
ISO/IEC 14496-3:2005/Cor.2:2008, ISO/IEC 14496-3:2005/Cor.3:2008, ISO/IEC 14496-3:2005/Cor.4:2008,
ISO/IEC 14496-3:2005/Cor.5:2008, ISO/IEC 14496-3:2005/Amd.2:2006/Cor.1:2006,
ISO/IEC 14496-3:2005/Amd.2:2006/Cor.2:2008, ISO/IEC 14496-3:2005/Amd.2:2006/Cor.3:2008,
ISO/IEC 14496-3:2005/Amd.3:2006/Cor.1:2008.
ISO/IEC 14496 consists of the following parts, under the general title Information technology — Coding of
audio-visual objects:
⎯ Part 1: Systems
⎯ Part 2: Visual
⎯ Part 3: Audio
⎯ Part 4: Conformance testing
⎯ Part 5: Reference software
⎯ Part 6: Delivery Multimedia Integration Framework (DMIF)
⎯ Part 7: Optimized reference software for coding of audio-visual objects [Technical Report]
⎯ Part 8: Carriage of ISO/IEC 14496 contents over IP networks
⎯ Part 9: Reference hardware description [Technical Report]
⎯ Part 10: Advanced Video Coding
⎯ Part 11: Scene description and application engine
⎯ Part 12: ISO base media file format
⎯ Part 13: Intellectual Property Management and Protection (IPMP) extensions
⎯ Part 14: MP4 file format
⎯ Part 15: Advanced Video Coding (AVC) file format
⎯ Part 16: Animation Framework eXtension (AFX)
⎯ Part 17: Streaming text format
⎯ Part 18: Font compression and streaming
⎯ Part 19: Synthesized texture stream
⎯ Part 20: Lightweight Application Scene Representation (LASeR) and Simple Aggregation Format (SAF)
⎯ Part 21: MPEG-J Graphics Framework eXtensions (GFX)
⎯ Part 22: Open Font Format
⎯ Part 23: Symbolic Music Representation
⎯ Part 24: Audio and systems interaction [Technical Report]
⎯ Part 25: 3D Graphics Compression Model
Audio conformance and 3D graphics conformance will form the subjects of the future Parts 26 and 27,
respectively.

0 Introduction
0.1 Overview
This part of ISO/IEC 14496 (MPEG-4 Audio) is a new kind of audio standard that integrates many different types of
audio coding: natural sound with synthetic sound, low bitrate delivery with high-quality delivery, speech with music,
complex soundtracks with simple ones, and traditional content with interactive and virtual-reality content. By
standardizing individually sophisticated coding tools as well as a novel, flexible framework for audio
synchronization, mixing, and downloaded post-production, the developers of the MPEG-4 Audio standard have
created new technology for a new, interactive world of digital audio.
MPEG-4, unlike previous audio standards created by ISO/IEC and other groups, does not target a single
application such as real-time telephony or high-quality audio compression. Rather, MPEG-4 Audio is a standard
that applies to every application requiring the use of advanced sound compression, synthesis, manipulation, or
playback. The subparts that follow specify the state-of-the-art coding tools in several domains; however, MPEG-4
Audio is more than just the sum of its parts. As the tools described here are integrated with the rest of the MPEG-4
standard, exciting new possibilities for object-based audio coding, interactive presentation, dynamic soundtracks,
and other sorts of new media are enabled.
Since a single set of tools is used to cover the needs of a broad range of applications, interoperability is a natural
feature of systems that depend on the MPEG-4 Audio standard. A system that uses a particular coder — for
example a real-time voice communication system making use of the MPEG-4 speech coding toolset — can easily
share data and development tools with other systems, even in different domains, that use the same tool — for
example a voicemail indexing and retrieval system making use of MPEG-4 speech coding.
The remainder of this clause gives a more detailed overview of the capabilities and functioning of MPEG-4 Audio.
First, the concepts that have changed since the MPEG-2 Audio standards are discussed. Then the
MPEG-4 Audio toolset is outlined.
0.2 Concepts of MPEG-4 Audio
As with previous MPEG standards, MPEG-4 does not standardize methods for encoding sound. Thus, content
authors are left to their own decisions as to the best method of creating bitstream payloads. At the present time,
methods to automatically convert natural sound into synthetic or multi-object descriptions are not mature; therefore,
most immediate solutions will involve interactively authoring the content stream in some way. This process is
similar to current schemes for MIDI-based and multi-channel mixdown authoring of soundtracks.
Many concepts in MPEG-4 Audio are different from those in previous MPEG Audio standards. For the benefit of
readers who are familiar with MPEG-1 and MPEG-2, we provide a brief overview here.
0.2.1 Audio storage and transport facilities
In all of the MPEG-4 tools for audio coding, the coding standard ends at the point of constructing access units that
contain the compressed data. The MPEG-4 Systems (ISO/IEC 14496-1) specification describes how to convert
these individually coded access units into elementary streams.
There is no standard transport mechanism for these elementary streams over a channel. This is because the broad
range of applications that can make use of MPEG-4 technology has delivery requirements that are too varied to
easily characterize with a single solution. Rather, what is standardized is an interface (the Delivery Multimedia
Integration Framework, or DMIF, specified in ISO/IEC 14496-6) that describes the capabilities of a transport layer and the
communication between transport, multiplex, and demultiplex functions in encoders and decoders. The use of
DMIF and the MPEG-4 Systems specification allows transmission functions that are much more sophisticated than
are possible with previous MPEG standards.
However, LATM and LOAS were defined to provide a low overhead audio multiplex and transport mechanism for
natural audio applications, which do not require sophisticated object-based coding or other functions provided by
MPEG-4 Systems.
Table 0.1 gives an overview of the multiplex, storage and transmission formats currently available for MPEG-4
Audio within the MPEG-4 framework:
Table 0.1 — Format overview
Category      Format  Functionality defined in MPEG-4           Functionality originally defined in       Description
Multiplex     M4Mux   ISO/IEC 14496-1 (normative)               -                                         MPEG-4 Multiplex scheme
Multiplex     LATM    ISO/IEC 14496-3 (normative)               -                                         Low Overhead Audio Transport Multiplex
Storage       ADIF    ISO/IEC 14496-3 (informative)             ISO/IEC 13818-7 (normative)               Audio Data Interchange Format (AAC only)
Storage       MP4FF   ISO/IEC 14496-12 (normative)              -                                         MPEG-4 File Format
Transmission  ADTS    ISO/IEC 14496-3 (informative)             ISO/IEC 13818-7 (normative, exemplarily)  Audio Data Transport Stream (AAC only)
Transmission  LOAS    ISO/IEC 14496-3 (normative, exemplarily)  -                                         Low Overhead Audio Stream, based on LATM; three versions are available: AudioSyncStream(), EPAudioSyncStream(), AudioPointerStream()
To allow a user on the remote side of a channel to dynamically control a server streaming MPEG-4 content,
MPEG-4 defines backchannel streams that can carry user interaction information.
0.2.2 MPEG-4 Audio supports low-bitrate coding
Previous MPEG Audio standards have focused primarily on transparent (undetectable) or nearly transparent coding
of high-quality audio at whatever bitrate was required to provide it. MPEG-4 provides new and improved tools for
this purpose, but also standardizes (and has tested) tools that can be used for transmitting audio at the low bitrates
suitable for Internet, digital radio, or other bandwidth-limited delivery. The new tools specified in MPEG-4 are the
state-of-the-art tools that support low-bitrate coding of speech and other audio.
0.2.3 MPEG-4 Audio is an object-based coding standard with multiple tools
Previous MPEG Audio standards provided a single toolset, with different configurations of that toolset specified for
use in various applications. MPEG-4 provides several toolsets that have no particular relationship to each other,
each with a different target function. The profiles of MPEG-4 Audio specify which of these tools are used together
for various applications.
Further, in previous MPEG standards, a single (perhaps multi-channel or multi-language) piece of content was
transmitted. In contrast, MPEG-4 supports a much more flexible concept of a soundtrack. Multiple tools may be
used to transmit several audio objects, and when using multiple tools together an audio composition system is
provided to create a single soundtrack from the several audio substreams. User interaction, terminal capability, and
speaker configuration may be used when determining how to produce a single soundtrack from the component
objects. This capability gives MPEG-4 significant advantages in quality and flexibility when compared to previous
audio standards.
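As a rough illustration of the composition idea described above, the following sketch mixes several already-decoded audio objects into one soundtrack, taking a per-object user gain and the terminal's speaker count into account. It is not the composition system defined by MPEG-4; the AudioObject class, the compose function, the linear pan law and all parameter names are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class AudioObject:
    """One decoded audio object: mono samples plus a stereo position (hypothetical model)."""
    name: str
    samples: Sequence[float]
    pan: float = 0.0  # -1.0 = fully left, +1.0 = fully right


def compose(objects: List[AudioObject],
            user_gain: Dict[str, float],
            speakers: int) -> List[List[float]]:
    """Mix several objects into a single soundtrack of `speakers` channels (1 or 2)."""
    if speakers not in (1, 2):
        raise ValueError("this sketch only handles mono or stereo terminals")
    length = max((len(o.samples) for o in objects), default=0)
    out = [[0.0] * length for _ in range(speakers)]
    for obj in objects:
        gain = user_gain.get(obj.name, 1.0)  # user interaction: per-object level
        if speakers == 1:
            weights = [1.0]
        else:
            # Simple linear pan law; an assumption, not taken from the standard.
            weights = [(1.0 - obj.pan) / 2.0, (1.0 + obj.pan) / 2.0]
        for ch, w in enumerate(weights):
            for i, s in enumerate(obj.samples):
                out[ch][i] += gain * w * s
    return out


speech = AudioObject("speech", [1.0, 1.0], pan=-1.0)
music = AudioObject("music", [0.5, 0.5, 0.5], pan=1.0)
stereo = compose([speech, music], {"music": 0.5}, speakers=2)
mono = compose([speech, music], {}, speakers=1)
```

The same two objects yield a different soundtrack depending on the user's gain settings and on whether the terminal is mono or stereo, which is the flexibility the text attributes to object-based coding.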
0.2.4 MPEG-4 Audio provides capabilities for synthetic sound
In natural sound coding, an existing sound is compressed by a server, transmitted and decompressed at the
receiver. This type of coding is the subject of many existing standards for sound compression. In contrast, MPEG-4
standardizes a novel paradigm in which synthetic sound descriptions, including synthetic speech and synthetic
music, are transmitted and then synthesized into sound at the receiver. Such capabilities open up new areas of
very-low-bitrate but still very-high-quality coding.
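The paradigm can be sketched with a toy example: instead of sending samples, the sender transmits a short description, and the receiver renders it into sound. The "score" format below (frequency, duration, amplitude) and the synthesize function are inventions for this sketch and are not the synthetic-audio tools specified in the standard.

```python
import math
from typing import List, Tuple

# (frequency in Hz, duration in seconds, amplitude 0..1): a hypothetical
# "score" format invented for this sketch.
Note = Tuple[float, float, float]


def synthesize(score: List[Note], sample_rate: int = 8000) -> List[float]:
    """Receiver side: render each note of the description as a sine tone."""
    pcm: List[float] = []
    for freq, duration, amp in score:
        n = int(round(duration * sample_rate))
        pcm.extend(amp * math.sin(2.0 * math.pi * freq * t / sample_rate)
                   for t in range(n))
    return pcm


score = [(440.0, 0.5, 0.8), (660.0, 0.5, 0.8)]
pcm = synthesize(score)
description_values = 3 * len(score)  # numbers that had to be transmitted
rendered_samples = len(pcm)          # samples produced at the receiver
```

Here six transmitted numbers expand into one second of audio at the receiver, which hints at why a description-based representation can be very compact while the rendered quality depends only on the synthesizer.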
0.2.5 MPEG-4 Audio provides capabilities for error robustness
Improved error robustness capabilities for all coding tools are provided through the error resilient bitstream payload
syntax. This tool supports advanced channel coding techniques, which can be adapted to the special needs of
given coding tools and a given communications channel. This error resilient bitstream payload syntax is mandatory
for all error resilient object types.
The error protection tool (EP tool) provides unequal error protection (UEP) for MPEG-4 Audio in conjunction with
the error resilient bitstream payload. UEP is an efficient method to improve the error robustness of source coding
schemes. It is used by various speech and audio coding systems operating over error-prone channels such as
mobile telephone networks or Digital Audio Broadcasting (DAB). The bits of the coded signal representation are
first grouped into different classes according to their error sensitivity. Then error protection is individually applied to
the different classes, giving better protection to more sensitive bits.
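The two steps just described (grouping bits by sensitivity, then protecting each class separately) can be sketched as follows. A plain repetition code with majority-vote decoding stands in for the channel codes here; that choice, the class numbering, and the repetition factors in REPEAT are assumptions for illustration only and do not reflect the codes the EP tool actually specifies.

```python
from typing import Dict, List

# Repetition factor per sensitivity class (hypothetical values): class 0 holds
# the most error-sensitive bits and gets the strongest protection.
REPEAT = {0: 5, 1: 3, 2: 1}


def protect(classes: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Apply a repetition code to each class, stronger for sensitive classes."""
    return {c: [b for b in bits for _ in range(REPEAT[c])]
            for c, bits in classes.items()}


def recover(coded: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Majority-vote decode each class independently."""
    out: Dict[int, List[int]] = {}
    for c, bits in coded.items():
        r = REPEAT[c]
        groups = [bits[i:i + r] for i in range(0, len(bits), r)]
        out[c] = [1 if 2 * sum(g) > r else 0 for g in groups]
    return out


# Bits of one coded frame, already grouped by error sensitivity.
frame = {0: [1, 0, 1], 1: [0, 1], 2: [1, 1, 0, 0]}
coded = protect(frame)
coded[0][0] ^= 1  # channel error in the most protected class
coded[0][6] ^= 1  # a second error, in a different code group of class 0
coded[2][3] ^= 1  # channel error in the unprotected class
decoded = recover(coded)
```

In this run the two errors in class 0 are corrected, while the single error in the unprotected class 2 survives decoding. The redundancy is spent where a bit error would hurt most, rather than uniformly across the frame.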
Improved error robustness for AAC is provided by a set of error resilience tools. These tools reduce the perceived
degradation of the
...


This CD-ROM contains the publication ISO/IEC 14496-3:2009 in portable document format (PDF), which can
be viewed using Adobe® Acrobat® Reader.
Adobe and Acrobat are trademarks of Adobe Systems Incorporated.

©  ISO/IEC 2009
All rights reserved. Unless required for installation or otherwise specified, no part of this CD-ROM may be reproduced, stored in a retrieval
system or transmitted in any form or by any means without prior permission from ISO. Requests for permission to reproduce this product
should be addressed to
ISO copyright office • Case postale 56 • CH-1211 Geneva 20 • Switzerland
Internet copyright@iso.org
R
...
