Representation of floating point numbers:

In the IEEE single-precision representation of a real number, one bit is used to represent the sign; it is set to 0 for a positive number and 1 for a negative one. A representation of the exponent is stored in the next 8 bits, and the remaining twenty-three bits are occupied by a representation of the mantissa of the number.
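The 1 + 8 + 23 bit layout described above can be inspected directly. The following is a minimal sketch in Python using the standard struct module; the helper name float_to_fields is made up for this illustration and is not part of any library.

```python
import struct

def float_to_fields(x):
    """Return the (sign, exponent, mantissa) bit strings of x stored as a 32-bit float.

    float_to_fields is a hypothetical helper name used only for this sketch.
    """
    # Pack x as a big-endian 32-bit float, then reinterpret the same 4 bytes
    # as an unsigned 32-bit integer so the raw bits can be formatted.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    pattern = f"{bits:032b}"
    # 1 sign bit, 8 exponent bits, 23 mantissa bits
    return pattern[0], pattern[1:9], pattern[9:]

sign, exponent, mantissa = float_to_fields(-1.0)
print(sign, exponent, mantissa)
```

For -1.0 the sign bit is 1, the exponent field holds 127 (01111111) and the mantissa field is all zeros.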

Here are some examples:

How to represent real numbers in floating point format:

Examples:

1. Representing 23/4 as a single precision floating point number.

=> 23/4 = 5.75

Converting the above real number to binary form

=> 101.11 (5 in binary is 101, and .75 = 2^-1 + 2^-2, which is .11 in binary)
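The conversion to binary can be reproduced with the usual "multiply the fraction by two and take the integer part" method. This is only a sketch: the function name to_binary and the max_frac_bits limit are assumptions chosen for this example.

```python
from fractions import Fraction

def to_binary(value, max_frac_bits=23):
    """Binary expansion of a non-negative rational number, e.g. '101.11'.

    to_binary is a hypothetical helper; max_frac_bits caps the number of
    digits produced after the binary point.
    """
    value = Fraction(value)
    whole = value.numerator // value.denominator   # integer part
    frac = value - whole                           # fractional part, 0 <= frac < 1
    digits = []
    while frac and len(digits) < max_frac_bits:
        frac *= 2
        bit = frac.numerator // frac.denominator   # 1 if the doubled fraction reached 1
        digits.append(str(bit))
        frac -= bit
    int_part = bin(whole)[2:]
    return int_part + ("." + "".join(digits) if digits else "")

print(to_binary(Fraction(23, 4)))  # 101.11
```

Exact fractions are used instead of floats so that the digits are not affected by rounding in the computation itself.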

Representing the above binary number in SP floating point format (32 bits)

[(-1)^S x 2^(E – 127) x 1.M]

=> 1.0111 x 2^2, relating this to the equation given above

(The numeric ‘1’ before the binary point is called the hidden bit, as it is implied by the representation and is not stored.)

Sign S = 0; No. of bits used to represent sign = 1

Exponent (E – 127) = 2 i.e. E = 129; No. of bits used to represent exponent = 8

Mantissa M = 0111000…. ; No. of bits used to represent Mantissa = 23

Finally, 5.75 in SP floating point representation is as shown below:

0|10000001|01110000000000000000000
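The steps of this example (normalise to 1.M x 2^e, add the bias 127, keep 23 mantissa bits) can be sketched in Python as follows. The function encode_single is a hypothetical name, and the sketch assumes a non-zero number in the normal range whose mantissa fits exactly in 23 bits, so it does no rounding.

```python
import struct

def encode_single(x):
    """Build the S|E|M bit pattern by hand.

    Hypothetical helper for illustration; assumes x is non-zero, in the
    normal range, and exactly representable (no rounding is performed).
    """
    sign = 0 if x > 0 else 1
    m = abs(x)
    e = 0
    while m >= 2:          # shift the binary point left until m is 1.xxx
        m /= 2
        e += 1
    while m < 1:           # or right, for numbers smaller than 1
        m *= 2
        e -= 1
    biased = e + 127                    # stored exponent E
    frac = int((m - 1) * 2**23)         # 23 bits after the hidden 1
    return f"{sign}|{biased:08b}|{frac:023b}"

print(encode_single(5.75))  # 0|10000001|01110000000000000000000

# Cross-check against the bits Python's struct module produces
(bits,) = struct.unpack(">I", struct.pack(">f", 5.75))
print(f"{bits:032b}")
```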

Note: What if the fraction part of a real number cannot be expressed as a sum of powers of two (as .75 = 1/2 + 1/4 could be in the above example)? For example, 7/5 is exactly 1.4, but .4 cannot be expressed as a finite sum of powers of two, so 7/5 has an infinite binary expansion: 1.011001100110011001100…

In a single precision representation, the expansion is rounded off at the twenty-third digit after the binary point.
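The effect of this rounding can be observed by storing 1.4 as a 32-bit float and reading it back. A small sketch using Python's standard struct module (variable names are chosen for this example only):

```python
import struct

# Store 1.4 as a 32-bit float and look at the raw bits
raw = struct.pack(">f", 1.4)
(bits,) = struct.unpack(">I", raw)
mantissa = f"{bits:032b}"[9:]      # the last 23 bits
print(mantissa)                    # repeating 0110 pattern, cut off after 23 bits

# Read the stored value back: it is close to 1.4 but not equal to it
(stored,) = struct.unpack(">f", raw)
print(stored)
print(stored == 1.4)               # False
```

The stored value differs from 1.4 by less than 2^-23, which is the weight of the last mantissa bit for a number between 1 and 2.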

2. Extracting the real number from an SP floating point number representation

11000100001001100000000000000000

1|10001000|01001100000000000000000

S|-----E----|-------------M---------------|

Sign = 1, i.e. (-1)^1 = -1, a negative number

Exponent field (10001000) = 136 = 127 + e, i.e. exponent e = 9;

Mantissa = 1.01001100000000000000000

i.e. Mantissa = (one, plus no halves, plus one quarter (1/4), plus no eighths, plus no sixteenths, plus one thirty-second, plus one sixty-fourth, … all remaining bits zero)

=> (1 + 1/4 + 1/32 + 1/64) = X

=> (64 + 16 + 2 + 1) = X x 64;

=> X = 83/64;

So the complete number = -(83/64) x 2^9 = -664.00;
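The same decoding can be sketched in Python by applying (-1)^S x 2^(E – 127) x 1.M to the three fields worked out above (sign 1, exponent 10001000, mantissa 01001100000000000000000). The name decode_single is hypothetical, and the sketch assumes a normal number, i.e. an exponent field that is neither all zeros nor all ones.

```python
import struct

def decode_single(sign_bit, exponent_bits, mantissa_bits):
    """Apply (-1)^S x 2^(E - 127) x 1.M to the three bit fields.

    Hypothetical helper; only valid for normal numbers (E not 0 and not 255).
    """
    s = int(sign_bit)
    e = int(exponent_bits, 2)                 # stored (biased) exponent E
    m = 1 + int(mantissa_bits, 2) / 2**23     # hidden 1 plus the 23-bit fraction
    return (-1) ** s * 2.0 ** (e - 127) * m

fields = ("1", "10001000", "01001100000000000000000")
value = decode_single(*fields)
print(value)  # -664.0

# Cross-check: let struct interpret the same 32 bits as a float
(check,) = struct.unpack(">f", struct.pack(">I", int("".join(fields), 2)))
print(check)  # -664.0
```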

