Annex D (Normative)

D.1 Scope

This annex specifies only the computational operations associated with mathematical functions not defined elsewhere in this document, such as elementary or other analytic functions. Such a mathematical function shall be unambiguously defined on a subdomain of the affinely extended real numbers, hereafter the domain, with values in the affinely extended real numbers. Given such a mathematical function, this annex specifies the behavior of the corresponding computational function, hereafter termed an operation. Mathematical functions are language-defined.

[Rationale:
(i) Given a proper mathematical definition of a function on the reals, which would be the responsibility of a language, Annex D should be a manual for the implementation of such a properly defined function. We need to distinguish between the pure mathematical function and its floating-point implementation. Some of the issues we face, e.g. 0^0 or 1^infty, are not floating-point issues; they are mathematical ones. As such, they should be language-defined. Other issues, e.g. those related to signed zeros, are floating-point issues. The purpose of this standard is to resolve floating-point issues, not to define more general mathematics. As an illustration, there seems to be a consensus that most issues around power are resolved by considering three distinct mathematical power functions: algebraicPower(x,n), root(x,n) and analyticPower(x,y), where x and y are reals and n is an integer.
(ii) The mathematical functions that we consider are defined on a subset of the affinely extended real numbers with its conventional topology. We assume that the following definition, or an equivalent one, appears elsewhere in the standard: "Infinity arithmetic shall be defined as the affine extension of real numbers, with two infinite values such that -infty < (every finite number) < +infty."
In most cases, there is no need for this affine extension in the definition of the mathematical functions: infinities appear naturally as limits of functions defined on the reals. However, when such limits, in the broad mathematical sense, do not exist (e.g. 1^infty or sinPi(infty)), this formulation allows a language to define the value of the function at this "point" of the affinely extended reals.
(iii) Rationale for "not defined elsewhere": The floating-point behavior defined by Annex D is intended to be self-consistent and unambiguous. The rules it provides could be used to define the behavior of addition(), multiplication(), division() and squareRoot(), which are indeed well-established mathematical functions. However, the resulting behavior would differ slightly from that of the existing standard operations in corner cases such as squareRoot(-0), and there are good reasons for keeping those cases. ]

D.2 Conforming operation

An operation shall return the result of the mathematical function correctly rounded according to the prevailing rounding direction for all operands in its domain, with one exception: if the correctly rounded value does not belong to the image of the domain under the mathematical function, and the prevailing rounding direction is roundTiesToEven or roundTiesToAway, the operation should return the floating-point number that belongs to the image of the domain and is closest to the value of the mathematical function.

[Rationale:
(i) Specifying correct rounding for operations yields bit-exact results, which permit high portability.
(ii) Range preservation, however, is considered a more important mathematical property to preserve in the round-to-nearest modes. It prevents wide divergence between the behavior of the operation and that of the mathematical function.
A typical example is the arctangent: a property of the mathematical functions tan and atan is tan(atan(x)) = x, and the given rule allows this property to carry over as FPtan(FPatan(x)) = x + a small floating-point error.
(iii) The rule applied in this special case still leads to an unambiguous result (except perhaps for pathological mathematical functions), hence portability.
(iv) This special-case rule is not used for the directed rounding modes. It is assumed that directed rounding modes are mostly used for implementing interval arithmetic, and correct interval arithmetic would be more difficult to implement with this special-case rule. ]

An operation associated with a mathematical function shall signal all appropriate exceptions except inexact. An operation is allowed not to signal inexact when the result is inexact. However, if the prevailing rounding mode is one of the directed rounding modes, the operation shall not signal inexact for exact results.

[Rationale: Deciding inexact may be costly in some cases, and since the mathematical functions defined herein take transcendental values at almost all their arguments, signalling inexact is not of much use. However, not signalling inexact on exact results comes for free in the directed rounding modes if correct rounding is to be achieved anyway. ]

Operations shall signal division by zero as per 8.3, overflow as per 8.4 and underflow as per 8.5. They shall signal invalid as per 8.2 when the input is invalid, as defined below. For all operations, signalling NaN operands shall signal the invalid exception. An operation shall return a quiet NaN if one of its arguments is a quiet NaN.

For an n-ary mathematical function f and floating-point numbers X1,...,Xn, if f(X1,...,Xn) is defined as some real value y, then the operation associated with f, applied to (X1,...,Xn), shall return y rounded according to the prevailing rounding mode (if y is zero, its sign shall be chosen according to the rules below).
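The range-preservation exception can be illustrated on the atan example. The Python sketch below is illustrative only and rests on several assumptions: binary32 is the target format (in binary64 the nearest double to pi/2 happens to lie below pi/2, so the problem does not show up there), the rounding direction is roundTiesToEven, and math.atan evaluated in binary64 and then rounded to binary32 stands in for a correctly rounded binary32 atan (such double rounding can differ from correct rounding in rare cases). The names PI_OVER_2, to_f32, f32_toward_zero and atan_f32 are hypothetical.

```python
import math
import struct
from decimal import Decimal, getcontext

getcontext().prec = 60
# pi to 36 significant digits; halving it is exact at this precision.
PI_OVER_2 = Decimal("3.14159265358979323846264338327950288") / 2


def to_f32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_toward_zero(x: float) -> float:
    """Next binary32 value toward zero; x must be a nonzero finite binary32 value."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Sign-magnitude encoding: decrementing the bit pattern shrinks |x|.
    return struct.unpack("<f", struct.pack("<I", bits - 1))[0]


def atan_f32(x: float) -> float:
    """Hypothetical binary32 atan applying the D.2 range-preservation exception.

    Assumes roundTiesToEven; binary64 math.atan followed by rounding to
    binary32 stands in for a correctly rounded binary32 atan.
    """
    r = to_f32(math.atan(x))
    if math.isnan(r):
        return r                        # quiet NaN propagates
    # The image of atan over [-infty, +infty] is [-pi/2, pi/2]. If rounding
    # left that interval, step to the closest binary32 value inside it.
    while Decimal(abs(r)) > PI_OVER_2:
        r = f32_toward_zero(r)
    return r


if __name__ == "__main__":
    plain = to_f32(math.atan(1e30))     # nearest binary32 value, above pi/2
    kept = atan_f32(1e30)               # next binary32 value below it
    print(plain, math.tan(plain) < 0)   # tangent has the wrong sign
    print(kept, math.tan(kept) > 0)     # tangent keeps the sign of x
```

Under these assumptions, the nearest binary32 value to atan(1e30) is 1.5707963705..., which exceeds pi/2, so its tangent is negative; atan_f32 returns the next binary32 value below it instead, and the tangent stays positive, like x.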
[Rationale: When the function has a real value for some input, the choice is straightforward. Limit arguments must be used only if no real value can be obtained just by "evaluating" the definition of the mathematical function. ]

Otherwise, consider the limit (in the affinely extended real numbers) of f(x1,...,xn) as (x1,...,xn) goes to (X1,...,Xn). When some of the Xi's are +0 (resp. -0), this limit shall be considered using only positive (resp. negative) values of the corresponding xi's. In this case, if the limit does not exist but the limits for the other signs of 0 exist and are equal, the non-existing limit should also be considered as existing and equal to them. If no limit of f(x1,...,xn) exists, (X1,...,Xn) shall be considered an invalid operand. Note that this is the case when (X1,...,Xn) does not belong to the closure of the domain of the function. If this limit exists and is a real number c, the operation associated with f shall return c rounded according to the prevailing rounding mode, with the same exception as above if the rounded value does not belong to the image of the domain of the function. If this limit is +infinity or -infinity, the operation shall return this infinity.

[Rationale: In cases where the definition of the mathematical function does not allow one to determine a value, a limit argument must be used. We consider limits in the mathematical sense: exhibiting one specific sequence that converges to a value is not sufficient. Indeed, using sequences converging to (1, infty), pow(x,y) can be made to converge toward any value c in [0, +infty] (some of which are useful in some contexts). Note that the same holds for multiplication(0, infty). The previous standard defines multiplication(0, infty) as NaN, and our choice is consistent with that. Interpreting the sign of 0 as an indication of the half-space in which the 0 was obtained is commonly accepted. It may lead to issues related to the distinction between a true 0 and an underflowed value flushed to 0.
The following rules manage these issues. ]

If the value to be returned by the operation is 0, the following rules shall apply for the sign of this 0. Let sgn be the mathematical function defined as follows:

$$
\operatorname{sgn}(x) =
\begin{cases}
1 & \text{if } x > 0 \\
-1 & \text{if } x < 0 \\
0 & \text{if } x = 0
\end{cases}
$$

Consider the limit of sgn(f(x1,...,xn)) as (x1,...,xn) goes to (X1,...,Xn). When some of the Xi's are +0 (resp. -0), the limit of sgn(f(x1,...,xn)) shall be considered using only positive (resp. negative) values of the corresponding xi's. Let S be the limit set (the set of cluster points) of sgn(f(x1,...,xn)) as (x1,...,xn) goes to (X1,...,Xn).

* If S = { 1 } or S = { 1, 0 }: the operation associated with f shall return +0.
* If S = { -1 } or S = { -1, 0 }: the operation associated with f shall return -0.
* If S = { 1, -1 } or S = { 1, -1, 0 }: the operation associated with f should return -0 if the prevailing rounding mode is roundTowardNegative, and +0 otherwise.
* If S = { 0 }: the operation associated with f should return +0.
* If S is empty: the operation associated with f should return 0 with a sign obtained by applying these rules for the sign of zero without taking into account the signs of the Xi's that are equal to zero.

[Rationale: If the value of the function is zero, the sign of this 0 is best determined by considering the continuous extension of the sign of the mathematical function. We use "shall" when mathematics dictates the rule without ambiguity (first two items). In contrast, we use "should" when some arbitrary choice must be made, with some preference for +0. As before, the signs of 0 in input determine the half-spaces used to obtain the limit. However, if the function is defined at 0 and discontinuous at 0, limit arguments become useless. In the case where the sign of 0 depends on the prevailing rounding mode, our definition is consistent with the definition of the sign of x - x.
]

Note: According to these rules, when f is an odd function, the operation associated with f may return +0 on some x and also +0 on -x, even in symmetric rounding directions. Application programmers should be aware of this.

[Remark: According to these rules, sinPi returns +0 on all nonzero integers in every rounding direction except roundTowardNegative: this is the case S = { 1, -1, 0 }. (On +0 and -0, S is { 1 } and { -1 } respectively, so sinPi returns +0 and -0.) The "should" is there to allow implementations to favor sign symmetry for such functions. ]

D.3 Conformance

Any operation that follows the specifications in D.2 of this annex shall be said to conform. Any operation that does not shall be said not to conform. Conformance is attained for each operation individually: an implementation might conform for some operations only (or for none). Any operation may be implemented in a manner that does not conform without affecting the conformance of any other operation or of the implementation as a whole. Also, for a given mathematical function, an implementation might include an operation that conforms as well as another that does not, and allow the user to choose between them.

Implementations should provide operations that return correctly rounded results for as many of these mathematical functions as efficient implementation permits, for all supported floating-point formats. Languages should define which operations are required or recommended to conform. When a language does not specify an operation as conforming to this annex, each implementation should document the domain, exceptional cases, and worst-case accuracies achieved, and indicate whether the accuracies are proven or measured for a subset of inputs.

Operations implemented as specified in D.2 have the property that they preserve monotonicity. Operations implemented so as not to conform to D.2 should still preserve monotonicity. Note that the rules given in D.2 apply to usual mathematical functions, including those listed in D.4.
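As an illustration of how the D.2 rules apply to a usual function such as sinPi, the choice of the sign of a zero result can be written as a small decision procedure over the limit set S. The Python sketch below is hypothetical: the name signed_zero and the string constant for the rounding direction are assumptions, S is taken as an input (computing it is a mathematical step left to the caller), and the empty-S case returns None because the rules must then be re-applied ignoring the signs of the zero operands.

```python
import math
from typing import FrozenSet, Optional

ROUND_TOWARD_NEGATIVE = "roundTowardNegative"


def signed_zero(S: FrozenSet[int], rounding: str = "roundTiesToEven") -> Optional[float]:
    """Return +0.0 or -0.0 for a zero result, given the limit set S of sgn(f).

    S holds the cluster points of sgn(f(x1,...,xn)) as the arguments approach
    the operands (one-sided for zero operands). Returns None when S is empty:
    the caller must then re-apply the rules ignoring the signs of zero operands.
    """
    if not S <= {-1, 0, 1}:
        raise ValueError(f"S must be a subset of {{-1, 0, 1}}, got {set(S)}")
    if not S:
        return None
    if S in ({1}, {1, 0}):
        return 0.0                      # "shall" return +0
    if S in ({-1}, {-1, 0}):
        return -0.0                     # "shall" return -0
    if S in ({1, -1}, {1, -1, 0}):
        # "should": same convention as the sign of x - x
        return -0.0 if rounding == ROUND_TOWARD_NEGATIVE else 0.0
    return 0.0                          # S == {0}: "should" return +0


if __name__ == "__main__":
    print(signed_zero(frozenset({1})))          # sinPi(+0): 0.0
    print(signed_zero(frozenset({-1})))         # sinPi(-0): -0.0
    print(signed_zero(frozenset({1, -1, 0})))   # sinPi(3): 0.0
    print(signed_zero(frozenset({1, -1, 0}), ROUND_TOWARD_NEGATIVE))  # -0.0
```

Only the third branch looks at the rounding direction, which mirrors the split between the "shall" and "should" items.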
If, for some mathematical function and some arguments, these rules are not applicable, the operation shall be said not to conform, and implementations should document the behavior of the operation in such cases.

[Rationale: There exist particular (pathological) mathematical functions for which applying the rules in D.2 will give different results in different mathematical contexts (topology, axiomatic choices, etc.). Defining such a mathematical context is outside the scope of this standard. Most such functions are ad hoc constructions and are not expected to be common functions used in engineering and other floating-point applications. In particular, their domain of definition is often not well-defined. An example is the function x -> x * sqrt(sin(1/x)), which, in every neighborhood of 0, is defined on infinitely many intervals and undefined on infinitely many others. These are the only cases in which the above rules do not specify an unambiguous operation. Such an operation is to be considered outside the scope of Annex D of IEEE 754. Implementations may conform to other standards in this case. ]

D.4 Informative list of mathematical functions

In the following, Z = {...,-3,-2,-1,0,1,2,3,...}. The definition domains are given in the affinely extended real numbers.
- log x [0,infty]
- log2 x [0,infty]
- log10 x [0,infty]
- log1p x [-1,infty]
- log2p1 x [-1,infty]
- log10p1 x [-1,infty]
- exp x [-infty,infty]
- exp2 x [-infty,infty]
- exp10 x [-infty,infty]
- expm1 x [-infty,infty]
- exp2m1 x [-infty,infty]
- exp10m1 x [-infty,infty]
- sin x ]-infty,infty[
- cos x ]-infty,infty[
- tan x ]-infty,infty[ \ { Pi/2 + k*Pi, k in Z }
- sinpi x ]-infty,infty[
- cospi x ]-infty,infty[
- tanpi x ]-infty,infty[ \ { 1/2 + k, k in Z }
- sinh x [-infty,infty]
- cosh x [-infty,infty]
- tanh x [-infty,infty]
- asin x [-1,1]
- acos x [-1,1]
- atan x [-infty,infty]
- asinpi x [-1,1]
- acospi x [-1,1]
- atanpi x [-infty,infty]
- argsinh x [-infty,infty]
- argcosh x [1,infty]
- argtanh x [-1,1]
- atan2(x,y) [-infty,infty] * [-infty,infty] \ {(0,0), (0,infty), (+/-infty,+/-infty)}
  returns the angle in radians in [0, 2*Pi[ such that a half-line from the origin forming this angle with the x axis passes through (x,y)
- hypot(x,y) = squareRoot(x^2 + y^2) [-infty,infty] * [-infty,infty]
- analyticPower(x,y) = exp(y * log(x)) [0,infty] * [-infty,infty] \ {(0,0), (infty,0), (1,+/-infty)}
- algebraicPower(x,n) [-infty,infty] * Z, where
$$
\mathrm{algebraicPower}(x,n) =
\begin{cases}
\underbrace{x \cdot x \cdots x}_{n \text{ times}} & \text{if } n \ge 1 \\
1/\mathrm{algebraicPower}(x,-n) & \text{if } n \le -1 \\
1 & \text{if } n = 0 \\
\text{undefined} & \text{otherwise}
\end{cases}
$$
  [Comment: the second argument n of algebraicPower is an integer. The operation associated with this function might take its second argument as a language-defined integer. ]
- root(x,n) [0,infty] * (Z \ {0}) U [-infty,0[ * (Z \ 2Z), where
$$
\mathrm{root}(x,n) =
\begin{cases}
\text{principal } n\text{-th root of } x & \text{if } n \ge 1 \text{ and the root exists} \\
1/\mathrm{root}(x,-n) & \text{if } n \le -1 \\
\text{undefined} & \text{otherwise}
\end{cases}
$$
  [Comment: the second argument n of root is an integer. The operation associated with this function might take its second argument as a language-defined integer. ]
- conventionalPow(x,y) ]-infty,0[ * Z U [0,infty] * [-infty,infty] \ {(1,+/-infty)}, where
$$
\mathrm{conventionalPow}(x,y) =
\begin{cases}
\mathrm{algebraicPower}(x,y) & \text{if } y \text{ is an integer} \\
\mathrm{analyticPower}(x,y) & \text{if } y \text{ is not an integer}
\end{cases}
$$
  [Comment: this function is included because many existing programs rely on the existence of such an operation. New programs are expected to use the better analyticPower and algebraicPower. conventionalPow is NOT compatible with the C99 pow on 1^infty, NaN^0 and 1^NaN. It is not expected that many programs rely on these C99 properties. The function may also be considered purely language-defined and might therefore be withdrawn from the list. ]
- erf x [-infty,infty]
- erfc x [-infty,infty]
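The definition of algebraicPower above lends itself to a simple way of obtaining a correctly rounded binary64 result under roundTiesToEven, shown here as a sketch rather than a prescribed method: form x^n exactly as a rational number, then round once. It assumes Python's fractions.Fraction, which holds x**n exactly for a finite float x and an integer n, and float() of a Fraction, which rounds once to nearest. The name algebraic_power is hypothetical, exceptions (overflow, underflow, division by zero) are not signalled, and the cost grows with |n|. Zero and infinite operands follow the limit and sign-of-zero rules of D.2.

```python
import math
from fractions import Fraction


def algebraic_power(x: float, n: int) -> float:
    """Sketch of algebraicPower(x, n) for binary64 under roundTiesToEven.

    Finite nonzero x: x**n is formed exactly as a Fraction and rounded once,
    so the result is correctly rounded. Zero and infinite x follow the limit
    and sign-of-zero rules. Exceptions are not signalled; cost grows with |n|.
    """
    if math.isnan(x):
        return x                            # quiet NaN propagates
    if n == 0:
        return 1.0                          # x^0 = 1 by definition
    # The exact result is negative iff x is negative (including -0) and n is odd.
    sign = -1.0 if math.copysign(1.0, x) < 0 and n % 2 == 1 else 1.0
    if x == 0 or math.isinf(x):
        # Limits: 0^n and infty^-n tend to 0; 0^-n and infty^n tend to infinity.
        tends_to_infinity = (x == 0) == (n < 0)
        return math.copysign(math.inf if tends_to_infinity else 0.0, sign)
    exact = Fraction(x) ** n                # exact rational value of x^n
    try:
        y = float(exact)                    # one rounding, to nearest even
    except OverflowError:
        y = math.inf                        # overflow rounds to infinity
    return math.copysign(y, sign)           # also signs an underflowed zero


if __name__ == "__main__":
    print(algebraic_power(-2.0, 3))     # -8.0
    print(algebraic_power(-0.0, 3))     # -0.0
    print(algebraic_power(-0.0, -3))    # -inf
    print(algebraic_power(10.0, 400))   # inf
```

For example, algebraic_power(-0.0, 3) returns -0.0 and algebraic_power(-0.0, -3) returns -inf, as the one-sided limit of x^n for negative x and odd n requires.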