Representation of floating point numbers:
In the IEEE single-precision representation of a real number, one bit is used to represent the sign; it is set to 0 for a positive number and 1 for a negative one. A representation of the exponent is stored in the next 8 bits, and the remaining twenty-three bits are occupied by a representation of the mantissa of the number.
How to represent real numbers in floating point format:
Examples:
1. Representing 23/4 as a single-precision floating point number.
=> 23/4 = 5.75
Converting the above real number to binary form
=> 101.11 (5 in binary is 101; .75 = 2^-1 + 2^-2, which is .11 in binary)
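The conversion above can be sketched in code. This is only an illustration, assuming the usual repeated-doubling method for the fractional part; the function name to_binary and the max_frac_bits limit are made up for this example.

```python
def to_binary(x, max_frac_bits=23):
    """Binary string of a non-negative real number, e.g. 5.75 -> '101.11'."""
    whole = int(x)
    frac = x - whole
    digits = []
    # Doubling the fraction shifts the next binary digit in front of the point.
    while frac > 0 and len(digits) < max_frac_bits:
        frac *= 2
        bit = int(frac)
        digits.append(str(bit))
        frac -= bit
    return format(whole, "b") + ("." + "".join(digits) if digits else "")

print(to_binary(23 / 4))  # 101.11
```

The loop stops either when the fraction becomes zero (as it does for .75) or when the digit limit is reached.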
Representing the above binary number in SP floating point format (32 bits):
[(-1)^S x 2^(E - 127) x 1.M]
=> 1.0111 x 2^2, relating this to the equation given above
(The numeric '1' before the binary point is called the hidden bit, as it is implied by the representation and is not stored.)
Sign S = 0; No. of bits used to represent sign = 1
Exponent (E - 127) = 2 i.e. E = 129 (10000001 in binary); No. of bits used to represent exponent = 8
Mantissa M = 0111000...; No. of bits used to represent mantissa = 23
Finally, 5.75 in SP floating point representation is as shown below:
0|10000001|01110000000000000000000
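One way to check this bit pattern is to let Python's standard struct module pack the value as a 32-bit float and print the three fields. This is a small sketch; the helper name encode_single is made up for this example.

```python
import struct

def encode_single(x):
    """Return the single-precision bit pattern of x as 'S|EEEEEEEE|MMM...'."""
    # Pack as a big-endian 32-bit float, then reinterpret the same bytes
    # as an unsigned 32-bit integer to get at the raw bits.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = format(bits, "032b")
    return f"{s[0]}|{s[1:9]}|{s[9:]}"

print(encode_single(5.75))  # 0|10000001|01110000000000000000000
```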
Note: What if the fractional part of a real number cannot be expressed as a finite sum of powers of two (as .75 = 1/2 + 1/4 could in the above example)? For example, 7/5 is exactly 1.4, but .4 cannot be expressed as a finite sum of powers of two, so 7/5 has an infinite binary expansion 1.011001100110011001100...
In a single-precision representation, the expansion is rounded off at the twenty-third digit after the binary point.
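The effect of this rounding can be seen with a short sketch. It assumes Python's struct module and a made-up helper name, stored_single; the helper returns the 23 stored mantissa bits and the value that is actually held once 1.4 is squeezed into 32 bits.

```python
import struct

def stored_single(x):
    """Return (23-bit mantissa string, value actually stored) for x as a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    mantissa = format(bits & 0x7FFFFF, "023b")
    # Unpacking the same 32 bits gives back the rounded value, not the original.
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return mantissa, value

mantissa, value = stored_single(7 / 5)
print(mantissa)      # 01100110011001100110011
print(value == 1.4)  # False
```

The stored value is close to 1.4 but not equal to it, which is why comparing floating point numbers for exact equality is usually a bad idea.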
2. Extracting the real number from an SP floating point representation
11000100001001100000000000000000
1|10001000|01001100000000000000000
S|----E---|-----------M-----------|
Sign = 1 i.e. (-1)^1 = -1, a negative number
Exponent (10001000) = 127 + e, 136 = 127 + e i.e. exponent e = 9
Mantissa = 1.01001100000000000000000
i.e. Mantissa = (one, plus no halves, plus one quarter (1/4), plus no eighths, plus no sixteenths, plus one thirty-second, plus one sixty-fourth, ... all remaining bits zero)
=> (1 + 1/4 + 1/32 + 1/64) = X
=> (64 + 16 + 2 + 1) = X x 64
=> X = 83/64
So the complete number = -(83/64) x 2^9 = -664.00
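The same decoding can be written as a few lines of code. This is a minimal sketch that handles normal numbers only (it ignores zero, denormals, infinities and NaN), and the name decode_single is made up for this example. Exact fractions are used so that the intermediate value 83/64 is visible.

```python
from fractions import Fraction

def decode_single(sign_bit, exponent_bits, mantissa_bits):
    """Apply (-1)^S x 2^(E - 127) x 1.M to the three bit fields (normal numbers only)."""
    sign = -1 if sign_bit == "1" else 1
    e = int(exponent_bits, 2) - 127             # remove the bias of 127
    mantissa_bits = mantissa_bits.ljust(23, "0")  # pad trailing zeros to 23 bits
    significand = 1 + Fraction(int(mantissa_bits, 2), 2 ** 23)  # hidden bit + M
    return sign * significand * Fraction(2) ** e

print(decode_single("1", "10001000", "010011"))  # -664
```

Here the significand works out to 83/64 and the exponent to 9, matching the hand calculation.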
Introduction to Floating point representation IEEE 754