Binary floating-point arithmetic for microprocessor systems

Defines ways for new microprocessor systems to perform binary floating-point arithmetic in software, in hardware, or in any combination of hardware and software.

Binäre Gleitpunkt-Arithmetik für Mikroprozessor-Systeme

Arithmétique binaire en virgule flottante pour systèmes à microprocesseurs


Binarna aritmetika s plavajočo vejico za mikroprocesorske sisteme (IEC 60559:1989)

General Information

Status
Published
Publication Date
31-Jul-1997
Current Stage
6060 - National Implementation/Publication (Adopted Project)
Start Date
01-Aug-1997
Due Date
01-Aug-1997
Completion Date
01-Aug-1997

Buy Standard

Standardization document
HD 592 S1:1997
English language
25 pages

Standards Content (Sample)


SLOVENSKI STANDARD
01-avgust-1997
Binarna aritmetika s plavajočo vejico za mikroprocesorske sisteme (IEC 60559:1989)

Binary floating-point arithmetic for microprocessor systems
Binäre Gleitpunkt-Arithmetik für Mikroprozessor-Systeme
Arithmétique binaire en virgule flottante pour systèmes à microprocesseurs
This Slovenian standard is identical to: HD 592 S1:1991
ICS:
35.160 Microprocessor systems
2003-01. Slovenski inštitut za standardizacijo (Slovenian Institute for Standardization). Reproduction of this standard in whole or in part is not permitted.

IEC
INTERNATIONAL STANDARD
Second edition
1989-01
Binary floating-point arithmetic
for microprocessor systems
© IEC 1989. Copyright - all rights reserved
No part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from the publisher.
International Electrotechnical Commission, 3, rue de Varembé, Geneva, Switzerland
Telefax: +41 22 919 0300 e-mail: inmail@iec.ch IEC web site: http://www.iec.ch
PRICE CODE S

559©IEC - 3 -
CONTENTS
FOREWORD
PREFACE
Clause
1. Scope
1.1 Implementation objectives
1.2 Inclusions
1.3 Exclusions
2. Definitions
3. Formats
3.1 Sets of values
3.2 Basic formats
3.3 Extended formats
3.4 Combinations of formats
4. Rounding
4.1 Round to nearest
4.2 Directed roundings
4.3 Rounding precision
5. Operations
5.1 Arithmetic
5.2 Square root
5.3 Floating-point format conversions
5.4 Conversions between floating-point and integer
5.5 Round floating-point number to integral value
5.6 Binary ↔ decimal conversion
5.7 Comparison
6. Infinity, NaNs and signed zero
6.1 Infinity arithmetic
6.2 Operations with NaNs
6.3 The sign bit
7. Exceptions
7.1 Invalid operations
7.2 Division by zero
7.3 Overflow
7.4 Underflow
7.5 Inexact
8. Traps
8.1 Trap handler
8.2 Precedence
APPENDIX A - Recommended functions and predicates

INTERNATIONAL ELECTROTECHNICAL COMMISSION
BINARY FLOATING -POINT ARITHMETIC
FOR MICROPROCESSOR SYSTEMS
FOREWORD
1) The formal decisions or agreements of the IEC on technical matters,
prepared by Technical Committees on which all the National Committees
having a special interest therein are represented, express, as nearly
as possible, an international consensus of opinion on the subjects
dealt with.
2) They have the form of recommendations for international use and they
are accepted by the National Committees in that sense.
3) In order to promote international unification, the IEC expresses the
wish that all National Committees should adopt the text of the IEC
recommendation for their national rules in so far as national
conditions will permit. Any divergence between the IEC recommendation
and the corresponding national rules should, as far as possible, be
clearly indicated in the latter.
PREFACE
This standard has been prepared by Sub-Committee 47B: Microprocessor
systems, of IEC Technical Committee No. 47: Semiconductor devices. (This
Sub-Committee has been taken over by ISO/IEC JTC 1.)
This second edition of IEC Publication 559 replaces the first edition
issued in 1982.
The text of this standard is based on the following documents:
Six Months' Rule    Report on Voting
47B(CO)19           47B(CO)26
Full information on the voting for the approval of this standard can be
found in the Voting Report indicated in the above table.

1. Scope
1.1 Implementation objectives
It is intended that an implementation of a floating-point system
conforming to this standard can be realized entirely in software,
entirely in hardware, or in any combination of software and hardware.
It is the environment that the programmer or user of the system sees
that conforms or fails to conform to this standard. Hardware
components that require software support to conform shall not be said
to conform apart from such software.
1.2 Inclusions
This standard specifies:
1) basic and extended floating-point number formats;
2) add, subtract, multiply, divide, square root, remainder and
compare operations;
3) conversions between integer and floating-point numbers;
4) conversions between different floating-point formats;
5) conversions between basic format floating-point numbers and
decimal strings, and
6) floating-point exceptions and their handling, including
non-numbers (NaNs).
1.3 Exclusions
This standard does not specify:
1) formats of decimal strings and integers;
2) interpretation of the sign and significand fields of NaNs, or
3) binary ↔ decimal conversions to and from extended formats.
2. Definitions
Biased exponent
The sum of the exponent and a constant (bias) chosen to make the
biased exponent's range non-negative.

Binary floating-point number
A bit-string characterized by three components: a sign, a signed
exponent, and a significand. Its numerical value, if any, is the signed
product of its significand and two raised to the power of its
exponent. In this standard a bit-string is not always distinguished
from a number it may represent.
Denormalized number
A nonzero floating-point number, the exponent of which has a
reserved value, usually the format's minimum, and the explicit or
implicit leading significand bit of which is zero.
Destination
The location for the result of a binary or unary operation. The
destination may be either explicitly designated by the user or implicitly
supplied by the system (e.g. intermediate results in sub-expressions
or arguments for procedures). Some languages place the results of
intermediate calculations in destinations beyond the user's control.
Nonetheless, this standard defines the result of an operation in terms
of that destination's format as well as the operands' values.
Exponent
The component of a binary floating-point number that normally
signifies the integer power to which two is raised in determining the
value of the represented number. Occasionally the exponent is called
the signed or unbiased exponent.
Fraction
The field of the significand that lies to the right of its implied
binary point.
Mode
A variable that a user may set, sense, save and restore, to control
the execution of subsequent arithmetic operations. The default mode is
the mode that a program can assume to be in effect unless an
explicitly contrary statement is included either in the program or in its
specification.
The following modes shall be implemented:
1) rounding, to control the direction of rounding errors, and, in
certain implementations,
2) rounding precision, to shorten the precision of results.
The implementor may, at his option, implement the following modes:
3) traps disabled/enabled, to handle exceptions.

NaN
Not a number; a symbolic entity encoded in floating-point format.
There are two types of NaNs (see 6.2). Signalling NaNs signal the
invalid operation exception (see 7.1) whenever they appear as
operands. Quiet NaNs propagate through almost every arithmetic
operation without signalling exceptions.
Result
The bit-string (usually representing a number) that is delivered to
the destination.
Significand
The component of a binary floating-point number which consists of
an explicit or implicit leading bit to the left of its implied binary point
and a fraction field to the right.
Shall
The word "shall" signifies that which is obligatory in any conforming
implementation.
Should
The word "should" signifies that which is strongly recommended as
being in keeping with the intent of the standard, although
architectural or other constraints beyond the scope of this standard
may, on occasion, render the recommendations impractical.
Status flag
A variable that may take two states, set and clear. A user may clear
a flag, copy it, or restore it to a previous state. When set, a status
flag may contain additional system-dependent information, possibly
inaccessible to some users. The operations of this standard may, as a
side-effect, set some of the following flags: inexact result, underflow,
overflow, divide by zero and invalid operation.
User
Any person, hardware, or program not itself specified by this
standard, having access to and controlling those operations of the
programming environment specified in this standard.
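The quiet NaN behaviour described in the NaN definition above can be observed from most programming environments. The following is a minimal sketch in Python, assuming that `float("nan")` yields a quiet NaN on the host platform (the usual case, but not something this standard guarantees about Python). The variable names are illustrative only.

```python
import math

# Assumption: on the host platform this produces a quiet NaN.
quiet_nan = float("nan")

# A quiet NaN propagates through arithmetic without raising an exception.
result = (quiet_nan + 1.0) * 2.0
print(math.isnan(result))      # True

# A NaN compares unequal to every value, including itself.
print(quiet_nan == quiet_nan)  # False
```

Signalling NaNs are not shown because Python offers no portable way to create or trap them for binary floating-point values.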
3. Formats
This standard defines four floating-point formats in two groups,
basic and extended, each having two widths, single and double. The
standard levels of implementation are distinguished by the combinations
of formats supported.
3.1 Sets of values
This sub-clause concerns only the numerical values representable
within a format, not the encodings which are the subject of the
following sub-clauses. The only values representable in a chosen
format are those specified via the following three integer parameters:
$p$ = number of significant bits (precision),
$E_{max}$ = maximum exponent, and
$E_{min}$ = minimum exponent.
Each format's parameters are displayed in Table 1. Within each
format just the following entities shall be provided:
Numbers of the form $(-1)^s \, 2^E \, (b_0 . b_1 b_2 \ldots b_{p-1})$,
where:
$s$ is 0 or 1;
$E$ is any integer between $E_{min}$ and $E_{max}$ inclusive, and
each $b_i$ is 0 or 1.
Two infinities, $+\infty$ and $-\infty$;
at least one signalling NaN, and
at least one quiet NaN.
Table 1 - Summary of format parameters

                                          Format
Parameter                Single   Single Extended   Double    Double Extended
p                        24       ≥ 32              53        ≥ 64
E_max                    +127     ≥ +1 023          +1 023    ≥ +16 383
E_min                    -126     ≤ -1 022          -1 022    ≤ -16 382
Exponent bias            +127     Unspecified       +1 023    Unspecified
Exponent width (bits)    8        ≥ 11              11        ≥ 15
Format width (bits)      32       ≥ 43              64        ≥ 79
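The basic-format columns of Table 1 are related to one another, and the relations can be checked mechanically. The sketch below assumes the field layout given in 3.2 (one sign bit, an exponent field, and a fraction of p - 1 bits) and the two reserved exponent values described there. The dictionary and function names are hypothetical and not part of the standard.

```python
# Basic-format rows of Table 1 (names of the keys are illustrative).
BASIC_FORMATS = {
    "single": dict(p=24, e_max=127, e_min=-126, bias=127, exp_width=8, width=32),
    "double": dict(p=53, e_max=1023, e_min=-1022, bias=1023, exp_width=11, width=64),
}

def check_basic_format(fmt):
    """Return True if one Table 1 row is internally consistent."""
    # Sign bit + exponent field + fraction field (p - 1 bits, leading bit implicit).
    fits = 1 + fmt["exp_width"] + (fmt["p"] - 1) == fmt["width"]
    # Biased exponent 1 encodes E_min; biased exponent 0 is reserved.
    lowest = 1 - fmt["bias"] == fmt["e_min"]
    # Biased exponent 2**w - 2 encodes E_max; 2**w - 1 is reserved.
    highest = (2 ** fmt["exp_width"] - 2) - fmt["bias"] == fmt["e_max"]
    return fits and lowest and highest

for name, fmt in BASIC_FORMATS.items():
    print(name, check_basic_format(fmt))
```

The extended formats are omitted because Table 1 gives only bounds for them and leaves their exponent bias unspecified.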
The foregoing description enumerates some values redundantly, for
example:
$2^0 (1.0) = 2^1 (0.1) = 2^2 (0.01) = \ldots$
However, the encodings of such nonzero values may be redundant
only in extended formats (see 3.3). The nonzero values of the form
$\pm 2^{E_{min}} (0 . b_1 b_2 \ldots b_{p-1})$ are called denormalized. Reserved exponents
may be used to encode NaNs, $\pm\infty$, $\pm 0$, and denormalized numbers. For
any variable that has the value zero, the sign bit $s$ provides an extra
bit of information. Although all formats have distinct representations
for $+0$ and $-0$, the signs are significant in some circumstances, like
division by zero, and not in others. In this standard 0 and $\infty$ are
written without a sign when the sign does not matter.
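The extra bit of information carried by a zero can be made visible with a short experiment. This is a sketch in Python, relying on the `struct` module's standard `d` format (IEEE 754 binary64, big-endian here) to expose the encoding. Python raises `ZeroDivisionError` for float division by zero instead of returning an infinity, so the division case mentioned above is not demonstrated.

```python
import math
import struct

pos_zero, neg_zero = 0.0, -0.0

# The two zeros compare equal ...
print(pos_zero == neg_zero)               # True

# ... but their encodings differ in the sign bit.
print(struct.pack(">d", pos_zero).hex())  # 0000000000000000
print(struct.pack(">d", neg_zero).hex())  # 8000000000000000

# The sign can also be recovered without inspecting the bits.
print(math.copysign(1.0, neg_zero))       # -1.0
```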
3.2 Basic formats
Numbers in the single and double formats are composed of three
fields:
a 1-bit sign s,
a biased exponent e = E + bias, and
a fraction $f = . b_1 b_2 \ldots b_{p-1}$.
The range of the unbiased exponent $E$ shall include every integer
between two values $E_{min}$ and $E_{max}$ inclusive, and also two other
reserved values: $E_{min} - 1$ to encode $\pm 0$ and denormalized numbers, and
$E_{max} + 1$ to encode $\pm\infty$ and NaNs. The foregoing parameters appear in
Table 1. Each nonzero numerical value has just one encoding. The
fields are interpreted as follows:
3.2.1 Single
A 32-bit single format number X is divided as shown in Figure 1.
The value v of X is inferred from its constituent fields thus:
1) If e = 255 and f ≠ 0, then v is a NaN regardless of s
2) If e = 255 and f = 0, then $v = (-1)^s \, \infty$
3) If 0 < e < 255, then $v = (-1)^s \, 2^{e-127} \, (1.f)$
4) If e = 0 and f ≠ 0, then $v = (-1)^s \, 2^{-126} \, (0.f)$ (denormalized
numbers)
5) If e = 0 and f = 0, then $v = (-1)^s \, 0$ (zero)

  1       8                23             ... widths
+---+-----------+-----------------------+
| s |     e     |           f           |
+---+-----------+-----------------------+
     msb     lsb msb                 lsb  ... order

"msb" means "most significant bit"
"lsb" means "least significant bit"
Figure 1 - Single format
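The five rules for the single format translate almost directly into code. The following is an illustrative sketch in Python, not a reference implementation: `decode_single` and `single_bits` are hypothetical helper names, and the result is returned as a Python float (a double), which can represent every single-format value exactly.

```python
import struct

def decode_single(bits):
    """Interpret a 32-bit pattern using the five rules of 3.2.1."""
    s = bits >> 31            # 1-bit sign
    e = (bits >> 23) & 0xFF   # 8-bit biased exponent
    f = bits & 0x7FFFFF       # 23-bit fraction field, as an integer
    sign = -1.0 if s else 1.0
    if e == 255:
        # Rule 1 (NaN) when f is nonzero, rule 2 (infinity) otherwise.
        return float("nan") if f else sign * float("inf")
    if e == 0:
        # Rules 4 and 5: denormalized numbers and signed zero, 0.f * 2**-126.
        return sign * (f / 2**23) * 2.0**-126
    # Rule 3: normalized numbers, 1.f * 2**(e - 127).
    return sign * (1 + f / 2**23) * 2.0**(e - 127)

def single_bits(x):
    """Bit pattern of x after rounding to single format (demo helper)."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

print(decode_single(0x3F800000))               # 1.0
print(decode_single(0xC0000000))               # -2.0
print(decode_single(0x00000001) == 2.0**-149)  # True
print(decode_single(single_bits(0.15625)))     # 0.15625
```

The last line round-trips a value through the platform's own single-format encoder, which gives a quick consistency check between the rules and `struct`.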
3.2.2 Double
A 64-bit double format number X is divided as shown in Figure 2.
The value v of X is inferred from its constituent fields thus:
1) If e = 2 047 and f ≠ 0, then v is a NaN regardless of s
2) If e = 2 047 and f = 0, then $v = (-1)^s \, \infty$
3) If 0 < e < 2 047, then $v = (-1)^s \, 2^{e-1023} \, (1.f)$
4) If e = 0 and f ≠ 0, then $v = (-1)^s \, 2^{-1022} \, (0.f)$ (denormalized
numbers)
5) If e = 0 and f = 0, then $v = (-1)^s \, 0$ (zero)

  1       11               52             ... widths
+---+-----------+-----------------------+
| s |     e     |           f           |
+---+-----------+-----------------------+
     msb     lsb msb                 lsb  ... order

Figure 2 - Double format
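The boundary values of the double format follow from the rules above by choosing extreme field values. The sketch below assembles them from their fields in Python; `double_from_fields` is a hypothetical helper, and the comparison with `sys.float_info` assumes that the host's Python float is an IEEE-style 64-bit double, which is common but platform dependent.

```python
import struct
import sys

def double_from_fields(s, e, f):
    """Assemble a double from sign s, biased exponent e and 52-bit fraction f."""
    bits = (s << 63) | (e << 52) | f
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

# Largest finite value: e = 2046, all fraction bits set, (2 - 2**-52) * 2**1023.
largest_finite = double_from_fields(0, 2046, 2**52 - 1)
# Smallest normalized value: e = 1, f = 0, that is 2**-1022.
smallest_normal = double_from_fields(0, 1, 0)
# Smallest denormalized value: e = 0, f = 1, that is 2**-1074.
smallest_denormal = double_from_fields(0, 0, 1)

print(largest_finite == sys.float_info.max)   # True on IEEE-double platforms
print(smallest_normal == sys.float_info.min)  # True on IEEE-double platforms
```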
3.3 Extended formats
The single extended and double extended formats encode in an
implementation-dependent way the sets of values in 3.1 subject to the
constraints of Table 1. This standard allows an implementation to
encode some values redundantly, provided that redundancy is
transparent to the user in the following sense: an implementation shall
either encode every nonzero value uniquely or not distinguish
redundant encodings of nonzero values. An implementation may also
reserve some bit strings for purposes beyond the scope of this
standard; when such a reserved bit string occurs as an operand the
result is not specified by this standard.
An implementation of this standard is not required to provide (and
the user should not assume) that single extended formats have greater
range than double extended formats.
3.4 Combinations of formats
All implementations conforming to this standard shall support the
single format. Implementations should support the extended format
corresponding to the widest basic format supported, and need not
support any other extended format.*
* Only if upward compatibility and speed are important issues should a
system supporting the double extended format also support the single
extended format.
...
