Annex D (Normative)
D.1 Scope
This annex specifies only the computational operations associated with
mathematical functions not defined elsewhere in this document, such as
elementary or other analytical functions.
Such a mathematical function shall be unambiguously defined on a
sub-domain of the affinely extended real numbers, hereafter the domain,
with values in the affinely extended real numbers. Given such a
mathematical function, this annex specifies the behavior of the
corresponding computational function, hereafter termed an
operation. The mathematical functions themselves are language-defined.
[ Rationale:
(i) Given a proper mathematical definition of a function on the reals,
which would be the responsibility of a language, Annex D should be a
manual for the implementation of such a properly defined function.
We need to make a distinction between the pure mathematical function
and its floating-point implementation. Some of the issues we face,
e.g. 0^0 or 1^infty, are not floating-point issues; they are
mathematical ones. As such, they should be language-defined.
Other issues, e.g. those related to signed zeros,
are floating-point issues. The purpose of this standard is to resolve
floating-point issues, not to define more general mathematics.
As an illustration, there seems to be a consensus that most
issues around power are resolved by considering three distinct
mathematical power functions, which are algebraicPower(x,n), root(x,n)
and analyticPower(x,y), where x and y are reals and n is an
integer.
(ii) The mathematical functions that we consider are defined on a
subset of the affinely extended real numbers, equipped with its
conventional topology.
We assume that the following definition, or an
equivalent one, appears elsewhere in the standard: "Infinity
arithmetic shall be defined as the affine extension of real
numbers, with two infinite values such that -infty < (every
finite number) < +infty.".
In most cases, there is no need for this affine extension in
the definition of the mathematical functions: infinities appear
naturally as limits for functions defined on the
reals. However, in the case when such limits, in the broad
mathematical sense, do not exist (e.g. 1^infty or
sinPi(infty)), this formulation allows a language to define the
value of the function at this "point" of the affinely extended
reals.
(iii) Rationale for "not defined elsewhere": The floating-point
behavior defined by Annex D is intended to be self-consistent
and unambiguous. The rules it provides could be used to define
the behavior of addition(), multiplication(), division() and
squareRoot(), which are indeed well-established mathematical
functions. However, the resulting behavior would differ slightly
from that specified in the main body of this standard for corner
cases such as squareRoot(-0), and there are good reasons for keeping
the existing behavior in such cases.
]
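Item (i) of the rationale above names algebraicPower(x,n) as one of three power functions. D.4 defines it by repeated multiplication for integer n. The following Python sketch illustrates what a conforming binary64 operation for it would compute. The name algebraic_power is hypothetical. The sketch obtains the exact value x^n with fractions.Fraction and then rounds it once to nearest. It assumes roundTiesToEven and does not model exception flags. It is only practical for moderate |n|, because the exact value can have very many digits.

```python
import math
from fractions import Fraction


def algebraic_power(x: float, n: int) -> float:
    """Hypothetical sketch of algebraicPower(x, n) for binary64, roundTiesToEven.

    The exact value x**n is computed with rational arithmetic and rounded only
    once, so the result is correctly rounded. Exception flags are not modelled.
    """
    if math.isnan(x):
        return math.nan                  # quiet NaN in, quiet NaN out
    if n == 0:
        return 1.0                       # the n = 0 case holds for every x
    # Sign of x**n, also used for zero and infinite results.
    sign = math.copysign(1.0, x) if n % 2 == 1 else 1.0
    if x == 0.0:
        # One-sided limit as x -> +/-0: 0 for n >= 1, infinity for n <= -1.
        return sign * (0.0 if n > 0 else math.inf)
    if math.isinf(x):
        return sign * (math.inf if n > 0 else 0.0)
    exact = Fraction(x) ** n             # exact rational value, no rounding yet
    try:
        result = float(exact)            # one correct rounding to nearest
    except OverflowError:
        return sign * math.inf           # overflow rounds to infinity
    # Underflow to zero keeps the sign of the exact result.
    return math.copysign(result, sign) if result == 0.0 else result
```

Evaluating x * x * ... * x in floating point would round after every multiplication, so that result is not correctly rounded in general. Here, zero and infinite arguments are resolved from one-sided limits and the sign-of-zero rules of D.2. For example, algebraic_power(-0.0, 3) returns -0.0 and algebraic_power(-0.0, -3) returns -inf.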
D.2 Conforming operation
An operation shall return the result of the mathematical function
correctly rounded according to the prevailing rounding direction for
all operands in its domain, with one exception: if the correctly
rounded value does not belong to the image of the domain under the
mathematical function, and the prevailing rounding mode is
roundTiesToEven or roundTiesToAway, the operation should return the
floating-point number that belongs to this image and is closest
to the value of the mathematical function.
[Rationale:
(i) Specifying correct rounding for operations yields bit-exact
results that permit high portability.
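(ii) In the round-to-nearest modes, however, range preservation is
considered the more important mathematical property. It prevents
wide divergence between the behavior of the operation and that of
the mathematical function. A typical example is the arctangent: the
mathematical functions tan and atan satisfy
tan(atan(x)) = x
and the rule above keeps FPatan(x) strictly between -pi/2 and pi/2,
so that
FPtan(FPatan(x)) = x + FP error
where the error may be large near the poles of tan, but
FPtan(FPatan(x)) at least has the sign of x. Without the rule, in
binary32, where the number nearest to pi/2 exceeds pi/2, FPatan of a
large positive x would be rounded beyond pi/2, and FPtan would then
return a large negative value.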
(iii) The rule applied in this special case still leads to an
unambiguous result (except maybe for pathological mathematical
functions), hence portability.
(iv) This special case rule is not used for directed rounding
modes. It is assumed that directed rounding modes are mostly used
for implementing interval arithmetic, and correct interval
arithmetic would be more difficult to implement with this special
case rule.
]
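The arctangent example can be reproduced in binary32. The Python sketch below emulates binary32 by rounding binary64 values through the struct module. It compares plain round-to-nearest of atan with the range-preserving rule of D.2. The helper names are hypothetical. The sketch also assumes that math.atan followed by a second rounding to binary32 gives the correctly rounded binary32 value for the inputs used; this holds for the large arguments tested here but is not true of double rounding in general.

```python
import math
import struct

# All helpers below are hypothetical, written for this illustration only.


def to_binary32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value (ties to even)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def next_down_binary32(x: float) -> float:
    """Largest binary32 value below x, for a positive finite binary32 x."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits - 1))[0]


def atan32_nearest(x: float) -> float:
    """Plain round-to-nearest of atan in binary32 (via math.atan)."""
    return to_binary32(math.atan(x))


def atan32_range_preserving(x: float) -> float:
    """Round to nearest, then pull the result back into ]-pi/2, pi/2[ if needed."""
    r = atan32_nearest(x)
    # No binary32 number lies between the binary64 value math.pi / 2 and the
    # true pi/2, so comparing against math.pi / 2 decides membership exactly.
    if r > math.pi / 2:
        r = next_down_binary32(r)
    elif r < -math.pi / 2:
        r = -next_down_binary32(-r)
    return r
```

For x = 1e30, atan32_nearest returns the binary32 number just above pi/2, and math.tan of it is a large negative value. atan32_range_preserving returns the number just below pi/2, whose tangent is positive but only about 1.3e7. The rule preserves the range and the sign, not the accuracy of the round trip.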
An operation associated with a mathematical function shall signal all
appropriate exceptions except for inexact. An operation need not
signal inexact when the result is inexact. However, if the
prevailing rounding mode is one of the directed rounding modes, the
operation shall not signal inexact for exact results.
[Rationale:
Deciding inexact may be costly in some cases, and since the mathematical
functions defined herein take transcendental values at almost all of
their arguments, signalling inexact is not of much use. However, not
signalling inexact on exact results comes for free in the directed
rounding modes if correct rounding is to be achieved anyway.
]
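The claim that exactness comes for free under directed rounding can be seen in a sketch of roundTowardNegative. To pick the correct neighbor, the implementation must compare a candidate against the exact value, and the same kind of comparison tells whether the result is exact. The sketch below assumes the exact value is available as a fractions.Fraction within the finite binary64 range. A real implementation would instead compare against a sufficiently accurate approximation. The function name is hypothetical, and math.nextafter requires Python 3.9 or later.

```python
import math
from fractions import Fraction


def round_toward_negative(q: Fraction) -> tuple[float, bool]:
    """Round the exact rational q toward -infinity and report inexactness.

    Hypothetical helper; q is assumed to lie within the finite binary64 range.
    """
    f = float(q)                          # correctly rounded to nearest (ties to even)
    if Fraction(f) > q:                   # nearest neighbor lies above q:
        f = math.nextafter(f, -math.inf)  # take the neighbor below instead
    inexact = Fraction(f) != q            # same kind of comparison: flag is free
    return f, inexact
```

For example, round_toward_negative(Fraction(1, 4)) returns (0.25, False). For Fraction(1, 3), it returns the largest binary64 number below 1/3 together with True.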
Operations shall signal division by zero as per 8.3, overflow as
per 8.4 and underflow as per 8.5. They shall signal invalid as
per 8.2 when the input is invalid, as defined below.
For all operations, signalling NaN operands shall signal the invalid
exception. An operation shall return a quiet NaN if one of its
arguments is a quiet NaN.
For an n-ary mathematical function f and floating-point numbers
X1,...,Xn, if f(X1,...,Xn) is defined as some real value y, then the
operation associated with f(X1,...,Xn) shall return y rounded according
to the prevailing rounding mode (if y is zero, the sign shall be chosen
according to the rules below).
[Rationale:
When the function has a real image for some input, the choice is
straightforward. Limit arguments must be used only if no real value
can be obtained just by "evaluating" the definition of the
mathematical function.
]
Otherwise, consider the limit (in the affinely extended real numbers)
of f(x1,...,xn) as (x1,...,xn) goes to (X1,...,Xn). When some of the
Xi's are +0 or -0, this limit shall be considered using only positive
(resp. negative) values of the corresponding xi's. In this case, if
this one-sided limit does not exist but the limits obtained with the
other signs of the zero arguments exist and are all equal, the missing
limit should be considered to exist and to have that common value.
If no limit of f(x1,...,xn) exists, (X1,...,Xn) shall be considered as
an invalid operand. Note that this is the case when (X1,...,Xn) does
not belong to the closure of the domain of the function.
If this limit exists and is a real number c, the operation associated
with f(X1,...,Xn) shall return c rounded according to the prevailing
rounding mode, with the same exceptional rule as above if the rounded
value does not belong to the image of the domain of the function. If
this limit is one of +infinity or -infinity, the operation shall
return this infinity.
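The limit rule for signed-zero arguments can be sketched for functions of one argument whose one-sided limits at 0 have been worked out by hand. The table and the function name below are hypothetical and were chosen only for illustration. The table covers log (domain [0,infty]), the reciprocal 1/x, and x -> sin(1/x), which has no limit at 0.

```python
import math

# Hypothetical table of one-sided limits at 0, worked out by hand:
# (limit as x -> 0 through positive values, limit through negative values).
# None means that the one-sided limit does not exist.
LIMITS_AT_ZERO = {
    "log": (-math.inf, None),              # log is undefined for x < 0
    "reciprocal": (math.inf, -math.inf),   # x -> 1/x
    "sin_of_reciprocal": (None, None),     # x -> sin(1/x) oscillates
}


def value_at_signed_zero(name: str, x: float) -> float:
    """Apply the limit rule of D.2 to f(+0) or f(-0) for a function listed above."""
    plus, minus = LIMITS_AT_ZERO[name]
    same, other = (minus, plus) if math.copysign(1.0, x) < 0 else (plus, minus)
    if same is not None:
        return same      # limit taken on the side given by the sign of 0
    if other is not None:
        return other     # missing one-sided limit taken equal to the other one
    return math.nan      # no limit at all: invalid operand
```

For instance, log(-0) yields -infinity. The limit through negative values does not exist, so the limit through positive values is used.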
[Rationale:
In cases where the definition of the mathematical function does not
allow one to determine a value, a limit argument must be used.
We consider limits in the mathematical sense: exhibiting one specific
sequence that converges to a value is not sufficient. Indeed, using
sequences converging to (1, infty), pow(x,y) can be made to converge
toward any value c in [0, infty] (some of which are useful in some
contexts). Note that the same holds for multiplication(0, infty). The
previous standard defines this as invalid (NaN), and our choice is
consistent with that.
Interpreting the sign of 0 as an indication of the half-space from
which the 0 has been obtained is commonly accepted.
It may, however, lead to issues related to the distinction between a
true 0 and an underflowed value flushed to 0. The following rules
address these issues.
]
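A quick numerical check shows that sequences tending to (1, infty) can make pow converge to different values. With x_k = 1 + c/k and y_k = k, x_k^y_k tends to exp(c) as k grows. The sketch below evaluates this as exp(k * log1p(c/k)) to avoid the rounding error of forming 1 + c/k. The function name is hypothetical.

```python
import math


def power_along_sequence(c: float, k: int) -> float:
    """Evaluate x_k ** y_k with x_k = 1 + c/k and y_k = k.

    As k grows, (x_k, y_k) tends to (1, +infinity) while x_k ** y_k tends
    to exp(c), so different choices of c give different limits.
    """
    return math.exp(k * math.log1p(c / k))
```

With k = 10**6, choosing c = -1, 0.5 or 2 gives values near 0.37, 1.65 and 7.39. Letting c itself tend to -infty or +infty gives 0 or +infty. No single value can therefore be assigned to 1^infty by a limit argument.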
If the value to be returned by the operation is 0, the following rules
shall apply for the sign of this 0. Let sgn be the mathematical
function defined as follows:
$$
\operatorname{sgn}(x) =
\begin{cases}
 1 & \text{if } x > 0,\\
-1 & \text{if } x < 0,\\
 0 & \text{if } x = 0.
\end{cases}
$$
Consider the limit of sgn(f(x1,...,xn)) as (x1,...,xn) goes to
(X1,...,Xn). When some of the Xi's are +0 or -0, the limit of
sgn(f(x1,...,xn)) shall be considered using only positive (resp.
negative) values of the corresponding xi's.
Let S be the limit set (the set of the cluster points) of
sgn(f(x1,...,xn)) when (x1,...,xn) goes to (X1,...,Xn).
* If S = { 1 } or S = { 1, 0 }: the operation associated with f shall
return +0.
* If S = { -1 } or S = { -1, 0 }: the operation associated with f shall
return -0.
* If S = { 1, -1 } or S = { 1, -1, 0 }: the operation associated with f
should return -0 if the prevailing rounding mode is roundTowardNegative,
and +0 otherwise.
* If S = { 0 }: the operation associated with f should return +0.
* If S is empty: the operation associated with f should return 0
with a sign obtained using the rules for the sign of zero without
taking into account the sign of the Xi's that are equal to zero.
[Rationale:
If the value of the function is zero, the sign of this 0 is best
determined by considering the continuous extension of the sign
function of the mathematical function.
We use "shall" when mathematics dictates the rule without ambiguity
(the first two items). In contrast, we use "should" when some arbitrary
choice must be made, with some preference for +0.
As before, the signs of 0 in input determine the half-spaces used to
obtain the limit. However, if the function is defined at 0 and
discontinuous at 0, limit arguments become useless.
In the case where the sign of 0 depends on the prevailing rounding
mode, our definition is consistent with the definition of the sign of
x - x.
]
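The case analysis on S translates directly into a lookup. The sketch below assumes that S has already been determined by analysis of f; it does not compute limits itself. The function name and the string used for the rounding direction are hypothetical.

```python
import math


def zero_with_sign(S, rounding: str = "roundTiesToEven") -> float:
    """Return the signed zero selected by the limit set S of sgn(f(x1, ..., xn)).

    S is a set drawn from {1, -1, 0}; rounding names the prevailing rounding
    direction. An empty S is not handled: the rule then ignores the signs of
    zero arguments, which needs knowledge of f itself.
    """
    S = frozenset(S)
    if S in ({1}, {1, 0}):
        return 0.0
    if S in ({-1}, {-1, 0}):
        return -0.0
    if S in ({1, -1}, {1, -1, 0}):
        return -0.0 if rounding == "roundTowardNegative" else 0.0
    if S == {0}:
        return 0.0
    raise ValueError("empty limit set: recompute S ignoring the signs of zero arguments")
```

For example, take f(x, y) = x - y at (1, 1). Nearby values of x - y are positive, negative or zero, so S = {1, -1, 0}. This gives +0 except under roundTowardNegative, which matches the sign of x - x.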
Note: According to these rules, when f is an odd function, the
operation associated with f may return +0 on some x and also +0 on -x,
even in symmetrical rounding directions. Application programmers
should be aware of this.
[Remark:
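sinPi, according to these rules, returns +0 on all nonzero integers
(or -0 when the prevailing rounding mode is roundTowardNegative): this
is the case S = { 1, -1, 0 }. For sinPi(+0) and sinPi(-0), S is { 1 }
and { -1 } respectively, so sinPi(+0) = +0 and sinPi(-0) = -0.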
The "should" is there to allow implementations to favor sign symmetry
for such functions.
]
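A sketch of sinPi for binary64 shows where these sign rules enter. Only the treatment of integer arguments follows the rules above. Other arguments use math.sin(math.pi * r) after an exact reduction modulo 2. That computation is not correctly rounded and serves only as a placeholder. The function name and the rounding-direction string are hypothetical.

```python
import math


def sinpi(x: float, rounding: str = "roundTiesToEven") -> float:
    """Hypothetical sketch of sinPi(x) = sin(pi * x) for binary64.

    Only the sign-of-zero handling follows the rules above; the non-integer
    path uses math.sin and is NOT correctly rounded.
    """
    if math.isnan(x) or math.isinf(x):
        return math.nan                  # sinPi(+/-infty): no limit, invalid
    if x == math.floor(x):               # finite integer argument
        if x == 0.0:
            return x                     # S = {1} for +0 and S = {-1} for -0
        # Nonzero integer: S = {1, -1, 0}
        return -0.0 if rounding == "roundTowardNegative" else 0.0
    r = math.fmod(x, 2.0)                # exact reduction, same sign as x
    return math.sin(math.pi * r)         # placeholder, not correctly rounded
```

A sign-symmetric variant, which the "should" above permits, would instead return math.copysign(0.0, x) for every integer x.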
D.3 Conformance
Any operation that follows the specifications in D.2 of this annex
shall be said to conform. Any operation that does not shall be said
not to conform.
Such conformance is attained for each operation
individually: an implementation might conform for some operations only
(or none). Any operation may be implemented in a manner that does not
conform without affecting the conformance of any other operation or of
the implementation as a whole. Also, for a given mathematical function,
an implementation might include an operation that conforms as well as
another that does not, and allow the user to choose between them.
Implementations should provide operations that return correctly
rounded results for as many of these mathematical functions as
efficient implementation permits, for all supported floating-point formats.
Languages should define which operations are required or recommended
to conform. When a language does not specify an operation as
conforming to this annex, each implementation should document the
domain, exceptional cases, and worst-case accuracies achieved, and
indicate whether the accuracies are proven or measured for a subset of
inputs. Operations implemented as specified in D.2 have the property
that they preserve monotonicity. Operations implemented so as not to
conform to D.2 should still preserve monotonicity.
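When monotonicity is measured rather than proven, one simple check is to walk through consecutive floating-point numbers and compare successive results. The sketch below does this for a non-decreasing function; a decreasing function would need the comparison reversed. The function name is hypothetical, and math.nextafter needs Python 3.9 or later.

```python
import math


def is_monotone_nondecreasing(f, start: float, count: int) -> bool:
    """Check f(x) <= f(next(x)) on count consecutive binary64 numbers from start.

    This only measures monotonicity on a sample; it proves nothing about
    other inputs.
    """
    x = start
    fx = f(x)
    for _ in range(count):
        y = math.nextafter(x, math.inf)  # next binary64 number above x
        fy = f(y)
        if fy < fx:
            return False
        x, fx = y, fy
    return True
```

For example, math.sqrt is correctly rounded and therefore passes. Such a check covers only the sampled inputs. Like measured accuracies, its results say nothing about other arguments.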
Note that the rules given in D.2 apply to usual mathematical
functions, including those listed in D.4.
If, for some mathematical function and some arguments, these rules
are not applicable, the operation shall be said not to conform and
implementations should document the behavior of the operation in such
cases.
[Rationale:
There exist particular (pathological) mathematical functions for which
applying the rules in D.2 will give different results in different
mathematical contexts (topology, axiomatic choices, etc.). Defining
such a mathematical context is outside the scope of this standard.
Most such functions are ad hoc constructions and are not expected to
be common functions used in engineering and other floating-point
applications. In particular, their domain of definition is often not
well-defined. An example is the function x -> x * sqrt(sin(1/x)), which,
in every neighborhood of 0, is defined on infinitely many intervals and
undefined on infinitely many others.
These are the only cases in which the above rules do not specify an
unambiguous operation. Such an operation is to be considered outside
the scope of Annex D of IEEE 754. Implementations may conform to
other standards in this case.
]
D.4 Informative list of mathematical functions
In the following, Z = {...,-3,-2,-1,0,1,2,3,...} denotes the set of integers.
The domains of definition are given in the affinely extended real numbers.
- log x [0,infty]
- log2 x [0,infty]
- log10 x [0,infty]
- log1p x [-1,infty]
- log2p1 x [-1,infty]
- log10p1 x [-1,infty]
- exp x [-infty,infty]
- exp2 x [-infty,infty]
- exp10 x [-infty,infty]
- expm1 x [-infty,infty]
- exp2m1 x [-infty,infty]
- exp10m1 x [-infty,infty]
- sin x ]-infty,infty[
- cos x ]-infty,infty[
- tan x ]-infty,infty[ \ { Pi/2 + k*Pi, k in Z}
- sinpi x ]-infty,infty[
- cospi x ]-infty,infty[
- tanpi x ]-infty,infty[ \ { 1/2 + k, k in Z}
- sinh x [-infty,infty]
- cosh x [-infty,infty]
- tanh x [-infty,infty]
- asin x [-1,1]
- acos x [-1,1]
- atan x [-infty,infty]
- asinpi x [-1,1]
- acospi x [-1,1]
- atanpi x [-infty,infty]
- argsinh x [-infty,infty]
- argcosh x [0,infty]
- argtanh x [-1,1]
- atan2(x,y)
[-infty,infty] * [-infty,infty] \ {(0,0), (0,infty), (+/-infty,+/-infty)}
returns the angle in radians in [0, 2*Pi[ such that the half-line
from the origin forming this angle with the positive x axis passes
through (x,y)
- hypot(x,y) = squareRoot(x^2 + y^2) [-infty,infty] * [-infty,infty]
- analyticPower(x,y) = exp(y * log(x))
[0,infty] * [-infty,infty] \ {(0,0), (infty,0), (1,+/-infty)}
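- algebraicPower(x,n) [-infty,infty] * Z
$$
\mathrm{algebraicPower}(x,n) =
\begin{cases}
x \cdot x \cdots x \ (n \text{ times}) & \text{if } n \ge 1,\\
1/\mathrm{algebraicPower}(x,-n) & \text{if } n \le -1,\\
1 & \text{if } n = 0,\\
\text{undefined} & \text{otherwise.}
\end{cases}
$$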
[Comment: the second argument n of algebraicPower is an integer. The
operation associated with this function might consider its second
argument as a language-defined integer.
]
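- root(x,n) [0,infty] * (Z \ {0}) U [-infty,0[ * (Z \ 2Z)
$$
\mathrm{root}(x,n) =
\begin{cases}
\text{principal } n\text{-th root of } x & \text{if } n \ge 1 \text{ and the root exists},\\
1/\mathrm{root}(x,-n) & \text{if } n \le -1,\\
\text{undefined} & \text{otherwise.}
\end{cases}
$$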
[Comment: the second argument n of root is an integer. The operation
associated with this function might consider its second argument as
a language-defined integer.
]
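- conventionalPow(x,y) ]-infty,0[ * Z U [0,infty] * [-infty,infty] \ {(1,+/-infty)}
$$
\mathrm{conventionalPow}(x,y) =
\begin{cases}
\mathrm{algebraicPower}(x,y) & \text{if } y \text{ is an integer},\\
\mathrm{analyticPower}(x,y) & \text{if } y \text{ is not an integer.}
\end{cases}
$$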
[Comment: this function is there because many existing programs rely
on the existence of such an operation. New programs are expected to
use the better analyticPower and algebraicPower. conventionalPow is
NOT compatible with the C99 pow on 1^infty, NaN^0 and 1^NaN. It is not
expected that many programs rely on these C99 properties. The
function may also be considered as purely language-defined and
therefore might be withdrawn from the list.
]
- erf x [-infty,infty]
- erfc x [-infty,infty]
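The special cases of analyticPower follow from its domain as listed above and from the limit rules of D.2. The sketch below encodes them for binary64. The function name is hypothetical. Ordinary arguments are delegated to math.pow, which is not guaranteed to be correctly rounded. The sketch illustrates the case analysis; it is not a conforming operation.

```python
import math


def analytic_power(x: float, y: float) -> float:
    """Hypothetical sketch of analyticPower(x, y) = exp(y * log(x)) for binary64.

    Only the special cases follow the D.2 rules; ordinary finite arguments are
    delegated to math.pow, which is not guaranteed to be correctly rounded.
    """
    if math.isnan(x) or math.isnan(y):
        return math.nan                   # quiet NaN propagates
    if x < 0.0:                           # -0.0 < 0.0 is False, so -0 is kept
        return math.nan                   # outside the closure of the domain
    if x == 0.0 or math.isinf(x):
        # For -0 the limit through negative x does not exist, so the one
        # through positive x is used; its zero result is +0.
        if y == 0.0:
            return math.nan               # (0, 0) and (infty, 0): no limit
        tends_to_zero = (x == 0.0) == (y > 0.0)
        return 0.0 if tends_to_zero else math.inf
    if math.isinf(y):
        if x == 1.0:
            return math.nan               # (1, +/-infty): no limit
        return math.inf if (x > 1.0) == (y > 0.0) else 0.0
    try:
        return math.pow(x, y)             # finite x > 0, finite y
    except OverflowError:
        return math.inf                   # math.pow raises instead of returning inf
```

By contrast, Python's math.pow follows C99 and returns 1.0 for math.pow(1.0, math.inf), math.pow(1.0, math.nan) and math.pow(math.nan, 0.0). These are exactly the cases where, as the comment above notes, conventionalPow departs from C99. Also, analytic_power(-2.0, 2.0) is invalid, whereas algebraicPower(-2,2) and conventionalPow(-2,2) both equal 4.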