A Machine Learning and Data Science Blog: Floating point arithmetic

Floating point subtraction:
How does it work? Here are the steps involved in floating point subtraction.
Assume both operands are in IEEE 754 floating point format, and we are computing the floating point subtraction A - B.
Single precision floating point representation: A = (-1)^S x 2^Ae x Am
S = sign bit,
Ae = exponent of operand A, i.e. E – 127, where E is the stored (biased) exponent and 127 is the single precision bias,
Am = mantissa (significand), including the hidden leading 1. Similarly for operand B (S, Be, Bm).
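
To make the representation concrete, here is a minimal Python sketch that splits a number into the three fields above. The function name decode_float32 is made up for illustration, and the sketch assumes a normalized value (zero, subnormals, infinity and NaN are rejected rather than handled).

```python
import struct

def decode_float32(x):
    """Round x to single precision and return (S, Ae, Am)."""
    # Reinterpret the 4 bytes of the float as a 32-bit unsigned integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # S: 1 bit
    biased = (bits >> 23) & 0xFF      # E: 8 bits, stored with bias 127
    fraction = bits & 0x7FFFFF        # 23 stored fraction bits
    if biased == 0 or biased == 0xFF:
        raise ValueError("only normalized numbers are handled in this sketch")
    exponent = biased - 127           # Ae = E - 127
    mantissa = 1 + fraction / 2**23   # Am = 1.f, hidden bit restored
    return sign, exponent, mantissa

print(decode_float32(-6.5))   # -6.5 = (-1)^1 x 2^2 x 1.625
```

Multiplying the parts back together, (-1)**S * 2**Ae * Am, should reproduce the single precision value of the input.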

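Therefore, for two positive operands, A - B = Am x 2^Ae - Bm x 2^Be. When the sign bits differ, the magnitudes are added instead and the result takes the sign of A.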

Step-by-step procedure for floating point subtraction:
1. Align the binary points of A and B.
a. Compare Ae and Be. Take the larger as the tentative result exponent and compute the exponent difference |Ae – Be|.
2. If Ae > Be, right shift Bm by that many positions to form Bm x 2^(Be – Ae). If Be > Ae, right shift Am by that many positions to form Am x 2^(Ae – Be).
3. Compute the difference of the aligned mantissas, i.e.
Am - Bm x 2^(Be – Ae) (if Ae > Be) or Am x 2^(Ae – Be) - Bm (if Be > Ae). If the difference is negative, keep its magnitude and set the sign of the result to negative.
4. If the result needs normalization, perform the following steps:
a. If the result looks like (0.001001…), left shift it and decrease the exponent by 1 for each position shifted.
b. If the result looks like (101.01001…), right shift it and increase the exponent by 1 for each position shifted.
Continue (a) or (b) until the result has the form 1.xxx…, i.e. the bit just left of the binary point (the hidden bit in the IEEE 754 standard) is 1 and no higher bits are set.
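5. Check the result exponent.
a. If it is larger than the maximum allowed exponent, return exponent overflow.
b. If it is smaller than the minimum allowed exponent, return exponent underflow.
6. If the mantissa is equal to zero, set the exponent to zero. A zero mantissa can never be normalized, so in practice this case is detected before step 4.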
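
The procedure above can be sketched in Python on integer mantissas. This is a simplified illustration, not a full IEEE 754 implementation: the helper names (unpack, pack, fp_sub) are made up for this example, only normalized operands are accepted, and bits shifted out during alignment are simply dropped (no guard bits or rounding), so the last bit can differ from what real hardware returns.

```python
import struct

def unpack(x):
    """Return (sign, biased exponent, 24-bit mantissa with hidden bit)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    biased = (bits >> 23) & 0xFF
    if biased == 0 or biased == 0xFF:
        raise ValueError("only normalized operands are handled in this sketch")
    return bits >> 31, biased, (bits & 0x7FFFFF) | (1 << 23)

def pack(sign, biased, mant):
    """Assemble the fields back into a Python float (hidden bit is dropped)."""
    bits = (sign << 31) | (biased << 23) | (mant & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fp_sub(a, b):
    sa, ea, ma = unpack(a)
    sb, eb, mb = unpack(b)
    # Steps 1-2: right shift the mantissa that has the smaller exponent.
    if ea >= eb:
        mb >>= ea - eb
        e = ea
    else:
        ma >>= eb - ea
        e = eb
    # Step 3: signed difference of the aligned mantissas.
    diff = (-ma if sa else ma) - (-mb if sb else mb)
    # Step 6 (checked early): a zero mantissa gives a zero result.
    if diff == 0:
        return 0.0
    sign = 1 if diff < 0 else 0
    m = abs(diff)
    # Step 4: normalize until the hidden bit (bit 23) is the top set bit.
    while m >= 1 << 24:      # looks like 10.xxx: shift right, exponent up
        m >>= 1
        e += 1
    while m < 1 << 23:       # looks like 0.0xx: shift left, exponent down
        m <<= 1
        e -= 1
    # Step 5: check the biased exponent range for normalized numbers.
    if e >= 255:
        raise OverflowError("exponent overflow")
    if e <= 0:
        raise ArithmeticError("exponent underflow")
    return pack(sign, e, m)

print(fp_sub(6.5, 2.25))   # 4.25
print(fp_sub(1.0, 1.5))    # -0.5
```

The zero check is done before normalization because the left shift loop would never terminate on a zero mantissa.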

Floating point multiplication:
How does it work? Here are the steps involved in floating point multiplication.
Assume both operands are in IEEE 754 floating point format, and we are computing the floating point product of A and B.

Single precision floating point representation: A = (-1)^AS x 2^Ae x Am
Step-by-step procedure for floating point multiplication:
1. If either operand is equal to zero, return zero as the result.
2. Compute the sign: AS XOR BS
3. Multiply the mantissas: Am x Bm, and round the product to the allowed number of mantissa bits.
4. Compute the exponent of the result.
a. Result exponent = biased exponent A + biased exponent B – bias;
5. Normalize the result: if the mantissa product is 2 or more, right shift the mantissa by one position and increment the result exponent.
6. Check the result exponent:
a. If it is larger than the maximum allowed exponent, return overflow.
b. If it is smaller than the minimum allowed exponent, return underflow.
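
As a rough illustration of the multiplication steps, here is a Python sketch that works on the raw single precision fields. The names fields and fp_mul are invented for this example. It assumes normalized, non-zero operands apart from the zero check in step 1, and it truncates the product instead of rounding it, so the last bit may differ from a real IEEE 754 multiplier.

```python
import struct

BIAS = 127

def fields(x):
    """Return (sign, biased exponent, 24-bit mantissa with hidden bit)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    biased = (bits >> 23) & 0xFF
    if biased == 0 or biased == 0xFF:
        raise ValueError("only normalized operands are handled in this sketch")
    return bits >> 31, biased, (bits & 0x7FFFFF) | (1 << 23)

def fp_mul(a, b):
    # Step 1: a zero operand gives a zero result (sign of zero is ignored here).
    if a == 0 or b == 0:
        return 0.0
    sa, ea, ma = fields(a)
    sb, eb, mb = fields(b)
    # Step 2: sign of the result.
    sign = sa ^ sb
    # Step 3: multiply the 24-bit mantissas; the product has up to 48 bits.
    product = ma * mb
    # Step 4: both stored exponents carry the bias, so subtract it once.
    e = ea + eb - BIAS
    # Step 5: the product of two values in [1, 2) lies in [1, 4);
    # if it is 2 or more, shift right once and increment the exponent.
    if product >= 1 << 47:
        product >>= 1
        e += 1
    mant = product >> 23     # keep the top 24 bits (truncation, not rounding)
    # Step 6: check the biased exponent range for normalized numbers.
    if e >= 255:
        raise OverflowError("exponent overflow")
    if e <= 0:
        raise ArithmeticError("exponent underflow")
    bits = (sign << 31) | (e << 23) | (mant & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(fp_mul(1.5, 2.5))    # 3.75
print(fp_mul(-3.0, 3.0))   # -9.0
```

Keeping the full 48-bit product until after normalization means no bits are lost before the final truncation, which is where a real implementation would apply its rounding mode.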
