Information technology — Internet of media things — Part 1: Architecture

This document describes the architecture of systems for the internet of media things. It also includes a comprehensive set of use cases that can be deployed on such an architecture.

Technologies de l'information — Internet des objets media — Partie 1: Architecture

General Information

Status
Published
Publication Date
18-Nov-2025
Current Stage
6060 - International Standard published
Start Date
19-Nov-2025
Due Date
18-Jan-2026
Completion Date
19-Nov-2025
Relations

Standard
ISO/IEC 23093-1:2025 - Information technology — Internet of media things — Part 1: Architecture. Released: 19.11.2025.
English language
31 pages

Standards Content (Sample)


International Standard
ISO/IEC 23093-1
Third edition
2025-11

Information technology — Internet of media things —
Part 1:
Architecture

Technologies de l'information — Internet des objets media —
Partie 1: Architecture

© ISO/IEC 2025
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
3.1 Internet of media things terms
3.2 Internet of things terms
4 Architecture
5 Use cases
5.1 General
5.2 Smart spaces: Monitoring and control with network of audio-video cameras
5.2.1 General
5.2.2 Human tracking with multiple network cameras
5.2.3 Dangerous region surveillance system
5.2.4 Intelligent firefighting with IP surveillance cameras
5.2.5 Automatic security alert and title generation system using time, GPS and visual information
5.2.6 Pedestrian-car accident detection in video using prediction result description
5.2.7 Networked digital signs for customized advertisement
5.2.8 Digital signage and second screen use
5.2.9 Self-adaptive quality of experience for multimedia applications
5.2.10 Ultra-wide viewing video composition
5.2.11 Face recognition to evoke sensorial actuations
5.2.12 Automatic video clip generation by detecting event information
5.2.13 Temporal synchronization of multiple videos for creating 360° or multiple view video
5.2.14 Intelligent similar content recommendations using information from IoMT devices
5.2.15 Understand and explain events in video by instance segmentation
5.2.16 Indoor/outdoor acoustic event detection
5.2.17 Safety equipment detection on construction sites
5.3 Smart spaces: Multi-modal guided navigation
5.3.1 General
5.3.2 Blind person assistant system
5.3.3 Elderly people assistance with consecutive vibration haptic devices
5.3.4 Personalized navigation by visual communication
5.3.5 Personalized tourist navigation with natural language functionalities
5.3.6 Smart identifier: Face recognition on smart glasses
5.3.7 Smart advertisement: QR code recognition on smart glasses
5.4 Smart audio/video environments in smart cities
5.4.1 General
5.4.2 Smart factory: Car maintenance assistance A/V system using smart glasses
5.4.3 Smart museum: Augmented visit using smart glasses
5.4.4 Smart house: Enhanced perception modes
5.4.5 Smart house: Control of home appliance devices
5.4.6 Smart car: Head-light adjustment and speed monitoring to provide automatic volume control
5.5 Smart audio/video environments in smart rural areas
5.5.1 General
5.5.2 Crop smart farming
5.5.3 Smart crop growth monitoring
5.5.4 Livestock smart farming
5.6 Smart multi-modal collaborative health
5.6.1 General
5.6.2 Increasing patient autonomy by remote control of left-ventricular assisted devices
5.6.3 Diabetic coma prevention
5.6.4 Enhanced physical activity with smart fabrics networks
5.6.5 Medical assistance with smart glasses
5.6.6 Managing healthcare information for smart glasses
5.6.7 Emergency health event detection with infrared camera
5.6.8 Personalized detection of health danger by multimodal data sensing and processing
5.6.9 Multimodal question answer with blood pressure data
5.6.10 Indoor air quality prediction
5.7 Blockchain usage for IoMT transactions authentication and monetizing
5.7.1 General
5.7.2 Reward function in IoMT people counting by using blockchains
5.7.3 Content authentication with blockchains
5.8 Metaverse usage of IoMT technologies
5.8.1 General
5.8.2 Human pose estimation for avatar animation
5.8.3 Facial landmark detection for human avatar animation
Bibliography
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical activity.
ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations,
governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of document should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives or www.iec.ch/members_experts/refdocs).
ISO and IEC draw attention to the possibility that the implementation of this document may involve the
use of (a) patent(s). ISO and IEC take no position concerning the evidence, validity or applicability of any
claimed patent rights in respect thereof. As of the date of publication of this document, ISO and IEC had
received notice of (a) patent(s) which may be required to implement this document. However, implementers
are cautioned that this may not represent the latest information, which may be obtained from the patent
database available at www.iso.org/patents and https://patents.iec.ch. ISO and IEC shall not be held
responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
This third edition cancels and replaces the second edition (ISO/IEC 23093-1:2022), which has been
technically revised.
The main changes are as follows:
— addition of complementary use cases;
— addition of sequence diagrams and mission state diagrams for the use-case description in order to
enhance the readability of the document.
A list of all parts in the ISO/IEC 23093 series can be found on the ISO and IEC websites.
Any feedback or questions on this document should be directed to the user’s national standards
body. A complete listing of these bodies can be found at www.iso.org/members.html and
www.iec.ch/national-committees.

Introduction
The ISO/IEC 23093 series provides an architecture and specifies application programming interfaces (APIs)
and compressed representation of data flowing between media things.
The APIs for the media things facilitate discovering other media things in the network, connecting and
efficiently exchanging data between media things. The APIs also provide means for supporting transaction
tokens in order to access valuable functionalities, resources, and data from media things.
Information related to media things consists of characteristics and discovery data, setup information from a system designer, raw and processed sensed data, and actuation information. The ISO/IEC 23093 series
specifies data formats of input and output for media sensors, media actuators, media storages, media
analysers, etc. Sensed data from media sensors can be processed by media analysers to produce analysed
data, and the media analysers can be cascaded in order to extract semantic information.
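By way of illustration, the following minimal Python sketch (class and function names are hypothetical and not part of this series) models such a cascade of media analysers refining raw sensed data into semantic information:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class MediaData:
    """Container for data flowing between media things (hypothetical)."""
    payload: object
    description: str

# A media analyser is modelled as a function from MediaData to MediaData,
# so analysers can be cascaded to extract increasingly semantic information.
MAnalyser = Callable[[MediaData], MediaData]

def cascade(analysers: List[MAnalyser]) -> MAnalyser:
    """Compose several analysers into a single processing chain."""
    def run(data: MediaData) -> MediaData:
        for analyser in analysers:
            data = analyser(data)
        return data
    return run

# Example cascade: a stubbed person detector followed by a trajectory extractor.
detect_people = lambda d: MediaData(d.payload, d.description + " -> persons")
extract_paths = lambda d: MediaData(d.payload, d.description + " -> trajectories")

pipeline = cascade([detect_people, extract_paths])
print(pipeline(MediaData(b"frame", "raw video")).description)
```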
This document does not specify how the process of sensing and analysing is carried out but specifies the
interfaces between the media things. This document describes the architecture of systems for the internet
of media things.
International Standard ISO/IEC 23093-1:2025(en)
Information technology — Internet of media things —
Part 1:
Architecture
1 Scope
This document describes the architecture of systems for the internet of media things. It also includes a
comprehensive set of use cases that can be deployed on such an architecture.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1 Internet of media things terms
3.1.1
audio
anything related to sound in terms of receiving, transmitting or reproducing it or of its specific frequency
3.1.2
camera
special form of an image capture device (3.1.6) that senses and captures photo-optical signals
3.1.3
display
visual representation of the output of an electronic device or the portion of an electronic device that shows
this representation, as a screen, lens or reticle
3.1.4
gesture
movement or position of the hand, arm, body, head or face that is expressive of an idea, opinion, emotion, etc.
3.1.5
haptics
input or output device that senses or actuates the body's movements by means of physical contact with the user
3.1.6
image capture device
device which is capable of sensing and capturing acoustic, electrical or photo-optical signals of a physical
entity that can be converted into an image

3.1.7
internet of media things
IoMT
special subset of IoT (3.2.8) whose main functionalities are related to media processing
3.1.8
IoMT device
IoT (3.2.8) device that contains more than one MThing (3.1.11)
3.1.9
IoMT system
MSystem
IoT (3.2.8) system whose main functionality is related to media processing
3.1.10
media
data that can be rendered, including audio, video, text, graphics, images, haptic and tactile information
Note 1 to entry: These data can be timed or non-timed.
3.1.11
media thing
MThing
thing (3.2.18) capable of sensing, acquiring, actuating, or processing media or metadata
3.1.12
media token
virtual token for accessing functionalities, resources and data of media things
3.1.13
microphone
entity capable of capturing and transforming acoustic waves into changes in electric currents or voltage, used in
recording or transmitting sound
3.2 Internet of things terms
3.2.1
actuator
component which conveys digital information to effect a change of some property of a physical entity
3.2.2
component
modular, deployable and replaceable part of a system that encapsulates implementations [1]
Note 1 to entry: A component can expose or use interfaces (local or on a network) to interact with other entities, see
ISO 19104. A component which exposes or uses network interfaces is called an endpoint.
3.2.3
digital entity
any computational or data element of an IT-based system
Note 1 to entry: It may exist as a service based in a data centre or cloud, or a network element or a gateway.
3.2.4
discovery
service to find unknown resources/entities/services based on a rough specification of the desired result
Note 1 to entry: It may be utilized by a human or another service; credentials for authorization are considered when
executing the discovery, see ISO/IEC 30141.

3.2.5
entity
anything (physical or non-physical) having a distinct existence
3.2.6
identifier
information that unambiguously distinguishes one entity (3.2.5) from another one in a given identity context
3.2.7
identity
characteristics determining who or what a person or thing is
3.2.8
internet of things
IoT
infrastructure of interconnected objects, people, systems and information resources together with
intelligent services to allow them to process information of the physical and the virtual world and to react
3.2.9
interface
shared boundary between two functional components, defined by various characteristics pertaining to the
functions, physical interconnections, signal exchanges, and other characteristics, as appropriate
Note 1 to entry: See ISO/IEC 13066-1.
3.2.10
IoT system
system that is composed of functions that provide the system with the capabilities for identification, sensing,
actuation, communication and management, and with applications and services to a user
Note 1 to entry: See Reference [2].
3.2.11
network
entity that connects endpoints, sources to destinations, and may itself act as a value-added element in the
IoT system or services
3.2.12
process
procedure to carry out operations on data
3.2.13
physical entity
thing (3.2.18) that is discrete, identifiable and observable, and that has material existence in the real world
3.2.14
resource
any element of a data processing system needed to perform required operations
Note 1 to entry: See ISO/IEC 2382.
3.2.15
sensor
device that observes and measures a physical property of a natural phenomenon or man-made process and
converts that measurement into a signal
Note 1 to entry: A signal can be electrical, chemical, etc., see ISO/IEC 29182-2.
3.2.16
service
distinct part of the functionality that is provided by an entity through interfaces

3.2.17
storage
capacity of a digital entity to store information subject to recall or the components of a digital entity in
which such information is stored
3.2.18
thing
any entity that can communicate with other entities
3.2.19
user
human or any digital entity that is interested in interacting with a particular physical object
3.2.20
visual
any object perceptible by the sense of sight
4 Architecture
The documents on MPEG IoMT developed by JTC 1/SC 29 (formally known as the ISO/IEC 23093 series) are composed of six parts.
The first four (namely Parts 1, 2, 3 and 4) encompass the definition of the architectural framework, the
specification of application programming interfaces (APIs) and compressed representation of data flowing
among media things, as well as reference software resources. The APIs for the media things facilitate
discovering other media things in the network, connecting, and efficiently exchanging data between media
things. The APIs also provide means for supporting transaction tokens to access valuable functionalities,
resources, and data from media things. This way, the media things can be represented in industrial
applications as abstract entities, providing their functionalities in a standard way.
Additionally, in Part 5, the MPEG IoMT series deals with a further industrial need, namely the possibility of defining autonomous services over media things. In this respect, the mission to be performed by various media things that collaboratively achieve a high-level task is described in a standard way that can also be exchanged with external processing resources (such as blockchains).
Finally, Part 6 provides standard representations for the input/output data of the AI tools powering IoMT analysers.
The global IoMT architecture is presented in Figure 1, which identifies a set of interfaces, protocols and associated media-related information representations related to the following (a non-normative sketch follows the list):
— user commands (setup information) between a system manager and an MThing, with reference to interface 1;
— user commands (setup information) forwarded by an MThing to another MThing, possibly in a modified form (e.g. a subset of interface 1), with reference to interface 1';
— sensed data (raw or processed data) and actuation information, with reference to interface 2;
— wrapped interface 2 (e.g. for transmission), with reference to interface 2';
— MThing characteristics and discovery, with reference to interface 3;
— MThing mission commands and control information, with reference to interfaces 4 and 5;
— MThing media data formats and API for distributed AI processing, with reference to interface 6.
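As a reading aid only, the following Python sketch (all names are hypothetical; the normative APIs are specified in the other parts of the series) suggests how the first numbered interfaces could map onto the operations of an abstract media thing:

```python
from abc import ABC, abstractmethod

class MThing(ABC):
    """Abstract media thing exposing the numbered IoMT interfaces (sketch)."""

    @abstractmethod
    def apply_setup(self, setup: dict) -> None:
        """Interface 1/1': receive user commands (setup information),
        possibly forwarded in modified form by another MThing."""

    @abstractmethod
    def exchange_media(self, data: bytes) -> bytes:
        """Interface 2/2': exchange sensed data or actuation information,
        optionally wrapped (e.g. for transmission)."""

    @abstractmethod
    def describe(self) -> dict:
        """Interface 3: expose characteristics for discovery."""

class LoggingCamera(MThing):
    """Toy implementation, present only to make the sketch runnable."""
    def apply_setup(self, setup: dict) -> None:
        print("setup received:", setup)
    def exchange_media(self, data: bytes) -> bytes:
        return data  # echo the sensed data back to the caller
    def describe(self) -> dict:
        return {"type": "camera", "resolution": "1080p"}

cam = LoggingCamera()
cam.apply_setup({"frame_rate": 25})
print(cam.describe())
```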

Figure 1 — IoMT architecture
5 Use cases
5.1 General
MPEG identified 43 use cases for IoMT; they are structured in the following seven main categories:
a) Smart spaces: Monitoring and control with network of audio-video cameras (see 5.2):
— human tracking with multiple network cameras;
— dangerous region surveillance system;
— intelligent firefighting with IP surveillance cameras;
— automatic security alert and title generation system using time, GPS and visual information;
— pedestrian-car accident detection in video using prediction result description;
— networked digital signs for customized advertisement;
— digital signage and second screen use;
— self-adaptive quality of experience for multimedia applications;
— ultra-wide viewing video composition;
— face recognition to evoke sensorial actuations;
— automatic video clip generation by detecting event information;
— temporal synchronization of multiple videos for creating 360° or multiple view video;
— intelligent similar content recommendations using information from IoMT devices;
— understand and explain events in video by instance segmentation;
— indoor/outdoor acoustic event detection;
— safety equipment detection on construction sites.

b) Smart spaces: Multi-modal guided navigation (see 5.3):
— blind person assistant system;
— elderly people assistance with consecutive vibration haptic devices;
— personalized navigation by visual communication;
— personalized tourist navigation with natural language functionalities;
— smart identifier: face recognition on smart glasses;
— smart advertisement: QR code recognition on smart glasses.
c) Smart audio/video environments in smart cities (see 5.4):
— smart factory: car maintenance assistance A/V system using smart glasses;
— smart museum: augmented visit using smart glasses;
— smart house: enhanced perception modes (light control, vibrating subtitles, olfaction media content consumption);
— smart house: control of home appliance devices;
— smart car: head-light adjustment and speed monitoring to provide automatic volume control.
d) Smart audio/video environments in smart rural areas (see 5.5):
— crop smart farming;
— smart crop growth monitoring;
— livestock smart farming.
e) Smart multi-modal collaborative health (see 5.6):
— increasing patient autonomy by remote control of left-ventricular assisted devices;
— diabetic coma prevention;
— enhanced physical activity with smart fabrics networks;
— medical assistance with smart glasses;
— managing healthcare information for smart glasses;
— emergency health event detection with infrared camera;
— personalized detection of health danger by multimodal data sensing and processing;
— multimodal question answer with blood pressure data;
— indoor air quality prediction.
f) Blockchain usage for IoMT transactions authentication and monetizing (see 5.7):
— reward function in IoMT people counting by using blockchains;
— content authentication with blockchains.
g) Metaverse usage of IoMT technologies (see 5.8):
— human pose estimation for avatar animation;
— facial landmark detection for human avatar animation.
The specific way each use case integrates with and leverages the proposed architecture is presented in
Table 1.
Table 1 — Mapping between the use cases and the 6 interfaces defined by the IoMT architecture

Use case (1) (2) (3) (4) (5) (6)
Smart spaces: Monitoring and control with network of audio-video cameras
Human tracking with multiple network cameras X X X X X X
Dangerous region surveillance system X X X o o X
Intelligent firefighting with IP surveillance cameras X X X o o X
Automatic security alert and title generation system using time, GPS and visual information X X X o o X
Pedestrian-car accident detection in video using prediction result description X X X o o X
Networked digital signs for customized advertisement X X X X X X
Digital signage and second screen use X X X o o X
Self-adaptive quality of experience for multimedia applications X X X – – –
Ultra-wide viewing video composition X X X o o X
Face recognition to evoke sensorial actuations X X X o o X
Automatic video clip generation by detecting event information X X X o o X
Temporal synchronization of multiple videos for creating 360° or multiple view video X X X o o X
Intelligent similar content recommendations using information from IoMT devices X X X o o X
Understand and explain events in video by instance segmentation X X X o o X
Indoor/outdoor acoustic event detection X X X o o X
Safety equipment detection on construction sites X X X o o X
Smart spaces: Multi-modal guided navigation
Blind person assistant system X X X o o X
Elderly people assistance with consecutive vibration haptic devices X X X o o X
Personalized navigation by visual communication X X X o o X
Personalized tourist navigation with natural language functionalities X X X o o X
Smart identifier: Face recognition on smart glasses X X X o o X
Smart advertisement: QR code recognition on smart glasses X X X o o X
Smart audio/video environments in smart cities
Smart factory: Car maintenance assistance A/V system using smart glasses X X X o o X
Smart museum: Augmented visit using smart glasses X X X o o X
Smart house: Enhanced perception modes X X X – – –
Smart house: Control of home appliance devices X X X o o X
Smart car: Head-light adjustment and speed monitoring X X X o o X
Smart audio/video environments in smart rural areas
Crop smart farming X X X o o X
Smart crop growth monitoring X X X X X X
Livestock smart farming X X X X X X
Smart multi-modal collaborative health
Increasing patient autonomy by remote control of left-ventricular assisted devices X X X – – –
Diabetic coma prevention by monitoring networks of in-body/near body sensors X X X – – –
Enhanced physical activity with smart fabrics networks X X X – – –
Medical assistance with smart glasses X X X o o X
Managing healthcare information for smart glasses X X X o o X
Emergency health event detection with infrared camera X X X o o X
Personalized detection of health danger X X X X X X
Indoor air quality prediction X X X o o X
Blockchain usage for IoMT transactions authentication and monetizing
Reward function in IoMT people counting by using blockchains X X X o o X
Content authentication with blockchains X X X o o X
Metaverse usage of IoMT technologies
Human pose estimation for avatar animation X X X X X X
Facial landmark detection for human avatar animation X X X X X X
Key:
X the interface is required for the use case implementation
o the usage of the corresponding interface is optional
– the interface is not used in the use case
5.2 Smart spaces: Monitoring and control with network of audio-video cameras
5.2.1 General
The large variety of sensors, actuators, displays and computational elements acting in our day-to-day professional and private spaces in order to provide us with better and more easily accessible services leads to 16 use cases of interest for IoMT, mainly related to the processing of video information.
5.2.2 Human tracking with multiple network cameras
As urban growth is today accompanied by an increase in the crime rate (e.g. theft, vandalism), many
local authorities consider surveillance systems as a possible tool to fight this phenomenon. A city video
surveillance system is an IoMT system that includes a set of IP surveillance cameras, a storage unit and a
human tracker unit.
A particular IP surveillance camera captures audio-video data and sends them to both the storage and the
human tracker unit. When the human tracker detects a person, it traces the person and extracts the moving
trajectory.
If the person gets out of the visual scope of the first IP camera but stays in the area protected by the city
video surveillance system, another IP camera from this system can take control and keep capturing A/V
data of the corresponding person.
If the person gets out of the protected area, for example, the person enters a commercial centre, then the city
system searches whether this commercial centre is also equipped with a video surveillance system. If this
is the case, the city video surveillance system sets up a communication with the commercial centre video
surveillance system in order to allow another IP camera from the commercial centre video surveillance
centre to keep capturing A/V data of the corresponding person.
In both cases, the specific descriptors (e.g. moving trajectory information, appearance information, media
locations of detected moments) can be extracted and sent to the storage.
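The handover logic described above can be sketched in a few lines of Python (system and camera names, and the tracking stub, are hypothetical):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Camera:
    name: str
    def sees(self, person_id: str) -> bool:
        # Stub: a real human tracker would test whether the person is in view.
        return self.name == "cam-2" and person_id == "person-42"

@dataclass
class SurveillanceSystem:
    name: str
    cameras: List[Camera] = field(default_factory=list)
    def find_camera(self, person_id: str) -> Optional[Camera]:
        """Return a camera of this system that currently sees the person."""
        return next((c for c in self.cameras if c.sees(person_id)), None)

def hand_over(person_id: str, systems: List[SurveillanceSystem]) -> Optional[Camera]:
    """Try the city system first, then partner systems (e.g. a commercial centre)."""
    for system in systems:
        camera = system.find_camera(person_id)
        if camera is not None:
            print(f"{system.name}/{camera.name} keeps capturing A/V data of {person_id}")
            return camera
    return None  # the person has left all protected areas

city = SurveillanceSystem("city", [Camera("cam-1"), Camera("cam-2")])
mall = SurveillanceSystem("mall", [Camera("cam-A")])
hand_over("person-42", [city, mall])
```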
5.2.3 Dangerous region surveillance system
IoMT can serve as a basis for developing intelligent alerting services providing information or alerts, or
both, when a person approaches danger zones, for accident prevention. For instance, Figure 2 illustrates the
case of a home (private) environment where a child plays. Heterogeneous IoMT data (video, depth, audio,
temperature) are analysed to automatically generate an alert if the child approaches the dangerous area
around a hot oven.
Figure 2 — Example use-case of dangerous area surveillance system operating in a private (home)
environment
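One possible fusion rule for the heterogeneous data mentioned above is to raise an alert only when several modalities agree that the child is close to the hot oven. The following sketch is illustrative only; the thresholds are assumptions, not values from this document:

```python
def danger_alert(distance_m: float, oven_temp_c: float, audio_event: str) -> bool:
    """Fuse video/depth distance, temperature and audio cues (illustrative only)."""
    too_close = distance_m < 1.0        # from video/depth analysis
    oven_hot = oven_temp_c > 60.0       # from a temperature sensor
    distress = audio_event == "crying"  # from an audio event classifier
    return (too_close and oven_hot) or (too_close and distress)

print(danger_alert(distance_m=0.8, oven_temp_c=190.0, audio_event="none"))  # True
```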
5.2.4 Intelligent firefighting with IP surveillance cameras
Figure 3 illustrates an example use-case of intelligent firefighting with IP surveillance cameras. In this case,
the fire station and the security manager can rapidly receive the fire/smoke detection alert, thereby averting
a potential fire hazard. Unlike conventional security systems, the outdoor scene captured by intelligent IP
surveillance cameras is immediately analysed and a fire/smoke alert is automatically sent to the fire station based on the analysis results of the captured scene.
Figure 3 — Example use-case of intelligent firefighting
5.2.5 Automatic security alert and title generation system using time, GPS and visual information
In the sustainable smart city of Seoul, IoMT cameras (smart CCTV) are deployed around the city. These
cameras are continuously capturing video (24 hours/7 days). When unusual events such as a violent scene,
crowd scene, theft scene or busking scene occur, the title generator (event description generator) generates
a security alert for immediate intervention. Additionally, a title for the video clip with time and place

information is also generated in real-time. The generated title is stored with the video clip in MStorage.
As an example scenario, consider a CCTV capturing videos (visual data), with time and GPS information.
The title generator analyses the video stream, selects a keyframe and combines time, GPS and keyframe to
generate a formatted title. The captured video with the generated title is sent to storage.
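The combining step can be sketched as follows; the title format below is an illustrative assumption, not specified by this document:

```python
from datetime import datetime

def generate_title(event: str, timestamp: datetime, lat: float, lon: float) -> str:
    """Combine the event label, time and GPS position into a clip title."""
    return f"{event} at ({lat:.4f}, {lon:.4f}) on {timestamp:%Y-%m-%d %H:%M:%S}"

title = generate_title("violent scene", datetime(2025, 11, 18, 14, 30),
                       37.5665, 126.9780)
print(title)  # stored in MStorage together with the video clip
```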
5.2.6 Pedestrian-car accident detection in video using prediction result description
A smart camera deployed in a city captures the accident scene and performs object detection, as illustrated
in Figure 4. The object detection network generates foreground-only images and prediction results
representing only people and cars from the input video. Prediction results are compressed with the IoMT
descriptor encoder, and foreground-only images are compressed with the video encoder and transmitted to
the auxiliary process stage. The received prediction results and foreground-only images are decoded and
fed to the corresponding artificial intelligence network at the auxiliary process stage. The pose estimation
network, object tracking network, and action recognition network generate inference results, compress
them with an IoMT descriptor encoder, and send them to the emergency detector. The received inference
results are decoded by the IoMT descriptor decoder and input to the emergency detector. From the inference
results, a decision is made on whether the situation is an emergency or not. The video can optionally be sent to the emergency detector.
Figure 4 — Synopsis of the pedestrian-car accident detection use case
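The data flow of Figure 4 can be summarized with a minimal pipeline sketch; all networks are stubbed and the codec function is a placeholder for the IoMT descriptor encoder/decoder:

```python
def object_detection(frame):
    """Stub: produce a foreground-only image and prediction results."""
    return {"foreground": frame, "predictions": ["person", "car"]}

def descriptor_codec(x):
    """Placeholder for the IoMT descriptor encoder/decoder (identity here)."""
    return x

def auxiliary_stage(foreground, predictions):
    """Stub for the pose estimation, object tracking and action recognition networks."""
    return {"pose": "falling", "track": "crossing", "action": "collision"}

def emergency_detector(inference):
    """Decide whether the situation is an emergency."""
    return inference["action"] == "collision"

frame = b"raw-frame"
detected = object_detection(frame)
transmitted = descriptor_codec(detected["predictions"])   # compressed and sent
inference = auxiliary_stage(detected["foreground"], descriptor_codec(transmitted))
print("emergency" if emergency_detector(descriptor_codec(inference)) else "normal")
```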
5.2.7 Networked digital signs for customized advertisement
A camera can be either attached to or embedded in a digital screen displaying advertising content, so as to
be able to capture A/V data and send them to both a storage unit and a gaze tracking/ROI analysing unit.
When the gaze tracking/ROI analyser detects a person in front of the corresponding digital sign, it starts to
trace the eye position, calculates the corresponding region of interest on the currently played advertisement,
and deduces the person's current interest (e.g. goods) in the advertisement. When the person moves to another digital sign, that new sign starts playing a relevant advertisement according to the person's estimated interest data.
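A sketch of this interest handover between signs follows (the data structures and identifiers are hypothetical):

```python
interest_db: dict = {}  # person identifier -> inferred interest

def on_gaze_detected(person_id: str, roi_item: str) -> None:
    """The gaze tracking/ROI analyser deduced which advertised item is watched."""
    interest_db[person_id] = roi_item

def on_person_arrives(person_id: str, sign_name: str) -> str:
    """A newly reached digital sign selects content matching the stored interest."""
    item = interest_db.get(person_id, "default campaign")
    return f"{sign_name} now plays an advertisement for: {item}"

on_gaze_detected("p-7", "running shoes")
print(on_person_arrives("p-7", "sign-2"))
```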
5.2.8 Digital signage and second screen use
This use case addresses pedestrians who want to get additional information (e.g. product information,
characters, places) of content displayed on digital signs with their mobile phones (i.e. second screens), as
illustrated in Figure 5.
Figure 5 — Digital signage and second screen use case
5.2.9 Self-adaptive quality of experience for multimedia applications
The self-adaptive multimedia application is an application running on a wearable device with a middleware providing optimal quality of service (QoS) performance for each application, according to the static/dynamic status of the application and/or system resources.
The user initially starts the self-adaptive multimedia application and updates the initial setup to guarantee the application's performance quality on the wearable device. The self-adaptive application needs static/dynamic status information exchanged between the wearable device and the processing unit. The application then runs normally on the wearable device until a status change/update event is generated. Such an event occurs when a decrease in the performance level is detected, at which point a status information request is sent to the processing unit.
The processing unit can support heterogeneous types of wearable devices and includes a static/dynamic system manager to optimize computing performance. The processing unit performs resource management optimally, based on the performance requirements of the self-adaptive application.
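The status exchange can be sketched as a simple event loop; all names and thresholds below are assumptions made for illustration:

```python
def measure_performance() -> float:
    """Stub for the wearable's own performance measurement (0.0 to 1.0)."""
    return 0.4

def request_status() -> dict:
    """Stub for the status information request sent to the processing unit."""
    return {"cpu_load": 0.9}

def adapt(config: dict, status: dict) -> dict:
    """Lower the application's quality settings when resources are scarce."""
    if status["cpu_load"] > 0.8:
        config["resolution"] = "720p"
    return config

config = {"resolution": "1080p"}
if measure_performance() < 0.5:        # performance decrease detected
    config = adapt(config, request_status())
print(config)                          # {'resolution': '720p'}
```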
5.2.10 Ultra-wide viewing video composition
Ultra-wide viewing video composition is made possible by videos captured from multiple cameras equipped with multiple sensors (time, accelerometer, gyroscope, GPS and compass), along with a video composer, storage and display devices as MThings. However, this is known to be a complex task requiring intensive computational power. The selection and configuration of the individual MThings needed to achieve this collaboratively is based on their availability and on awareness of their hardware resources. A potential list of such resources includes, but is not restricted to: CPU (usage, percentage), GPU (available memory, megabytes), RAM (free memory, megabytes), storage (available capacity, megabytes) and battery (remaining volume, percentage). For example, when there are multiple MThings available as shown in Figure 6 and Figure 7, selecting MAnalyser 1 and MSensor 1, which are less occupied while offering reasonable performance, is a reasonable choice as this task is GPU intensive.
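Under these assumptions, MThing selection for a GPU-intensive task can be sketched as picking the candidate with the most free GPU memory; field names and values below are illustrative, not specified by this document:

```python
manalysers = [
    {"name": "MAnalyser 1", "cpu_usage_pct": 20, "gpu_free_mb": 4096,
     "ram_free_mb": 8192, "storage_free_mb": 20000, "battery_pct": 80},
    {"name": "MAnalyser 2", "cpu_usage_pct": 75, "gpu_free_mb": 512,
     "ram_free_mb": 2048, "storage_free_mb": 5000, "battery_pct": 30},
    {"name": "MAnalyser 3", "cpu_usage_pct": 50, "gpu_free_mb": 1024,
     "ram_free_mb": 4096, "storage_free_mb": 10000, "battery_pct": 55},
]

def pick_for_gpu_task(candidates: list) -> dict:
    """Prefer the largest free GPU memory; break ties on lowest CPU usage."""
    return max(candidates, key=lambda c: (c["gpu_free_mb"], -c["cpu_usage_pct"]))

print(pick_for_gpu_task(manalysers)["name"])  # MAnalyser 1
```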

Figure 6 — Examples of hardware information of three MAnalysers
Figure 7 — Examples of hardware information of three MSensors
5.2.11 Face recognition to evoke sensorial actuations
An IP surveillance camera captures audio-video data and sends them to both a storage unit and a face
recognizer unit. When the face recognizer detects and recognizes the face of a pre-registered person, it
activates a scent generator to spray some specific scent. The specific descriptors (e.g. detected face locations,
face descriptors, media locations of detected moments) can be alternatively extracted and sent to a storage
unit. In this use case, the scent generator can be replaced by any type of actuator (e.g. light bulbs, displays,
music players).
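The trigger logic reduces to a registry lookup followed by an actuation call; the registry and actuator interface below are hypothetical:

```python
registered_scents = {"alice": "lavender", "bob": "citrus"}  # person -> scent

class ScentGenerator:
    def spray(self, scent: str) -> None:
        print(f"spraying {scent}")

def on_face_recognized(person: str, actuator: ScentGenerator) -> None:
    """Actuate only for pre-registered persons; others are ignored here."""
    scent = registered_scents.get(person)
    if scent is not None:
        actuator.spray(scent)

on_face_recognized("alice", ScentGenerator())  # prints: spraying lavender
```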
5.2.12 Automatic video clip generation by detecting event information
This use case describes automatic video clip generation by detecting event information from audio/video
streaming feed from a video camera. Families and friends often hold events such as birthday parties,
wedding anniversaries or pyjama parties. By using surveillance cameras, these events can be detected and
pictures or videos taken at the event can be used to make a time-lapse video.
5.2.13 Temporal synchronization of multiple videos for creating 360° or multiple view video
A new video can be created by using videos captured by multiple cameras. Each camera has its own local clock and various sensors, and can record the shooting time based on its local clock. As each camera has a different timeline, some errors are likely to occur when creating a new video (e.g. stitching a 360° video) using time information from two different devices.
two different devices, some errors are likely to occur. The time-offset information between individual videos
can be cancelled by performing temporal synchronization using visual or audio information, or both, with
sensor data, thus obtaining a natural-looking video.
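The time offset between two recordings can be estimated, for example, by cross-correlating their audio tracks. The following sketch uses NumPy and assumes equal sample rates and overlapping content; it is one possible technique, not a method specified by this document:

```python
import numpy as np

def estimate_offset(audio_a: np.ndarray, audio_b: np.ndarray,
                    sample_rate: int) -> float:
    """Return, in seconds, how much audio_a is delayed relative to audio_b
    (negative if audio_a's events occur earlier)."""
    corr = np.correlate(audio_a, audio_b, mode="full")
    lag = int(np.argmax(corr)) - (len(audio_b) - 1)
    return lag / sample_rate

rate = 8000
burst = np.sin(2 * np.pi * 440 * np.arange(int(0.05 * rate)) / rate)  # 50 ms tone

def place(event: np.ndarray, at_s: float, total_s: float = 1.0) -> np.ndarray:
    track = np.zeros(int(total_s * rate))
    start = int(at_s * rate)
    track[start:start + len(event)] = event
    return track

audio_a = place(burst, 0.10)   # event at 0.10 s
audio_b = place(burst, 0.30)   # same event at 0.30 s
print(estimate_offset(audio_a, audio_b, rate))  # about -0.2: audio_a leads by 0.2 s
```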

Moreover, if individual videos are transmitted through the network, people can watch the videos taken
from various viewpoints of the event. This means that one person can watch just one video whilst another watches multiple videos at the same time, and someone else can switch between videos.
5.2.14 Intelligent similar content recommendations using information from IoMT devices
Curre
...
