Representation of floating point numbers:
In the IEEE single-precision representation of a real number, one bit is used to represent the sign: it is set to 0 for a positive number and 1 for a negative one. A representation of the exponent is stored in the next 8 bits, and the remaining 23 bits are occupied by a representation of the mantissa of the number (32 bits in total).
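To see this 1 + 8 + 23 bit layout on a real machine, here is a small Python sketch. It is only an illustration: `struct.pack('>f', x)` from the standard library rounds x to single precision and returns its 4 bytes, which are then reread as one unsigned 32-bit integer and split with shifts and masks. The helper name `split_fields` is hypothetical, chosen for this example.

```python
import struct


def split_fields(x):
    """Return (sign, biased exponent, mantissa) of x stored as an IEEE single."""
    # Pack x as a big-endian 32-bit float, then reread the same bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31               # bit 31: 1 bit
    exponent = (bits >> 23) & 0xFF  # bits 30..23: 8 bits
    mantissa = bits & 0x7FFFFF      # bits 22..0: 23 bits
    return sign, exponent, mantissa


print(split_fields(5.75))  # (0, 129, 3670016)
print(split_fields(-2.0))  # (1, 128, 0)
```

Note that Python's own float is double precision; the `'>f'` format is what forces the conversion to the 32-bit format discussed here.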
How to represent real numbers in floating point format:
Examples:
1. Representing 23/4 as a single-precision floating point number
=> 23/4 = 5.75
Converting the above real number to binary form:
=> 101.11 (5 in binary is 101, and .75 = 2^-1 + 2^-2 = .11 in binary)
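The conversion to binary can be sketched in Python: the integer part comes from `bin()`, and the fractional part from repeated doubling (each doubling moves the next binary digit to the left of the point, where it can be read off). `to_binary` and its `max_frac_bits` cut-off are hypothetical names for this illustration; `fractions.Fraction` keeps the arithmetic exact. Unlike the hardware, this sketch simply truncates after `max_frac_bits` digits instead of rounding.

```python
from fractions import Fraction


def to_binary(value, max_frac_bits=32):
    """Binary expansion of a non-negative rational, cut off after max_frac_bits."""
    value = Fraction(value)
    integer_part = value.numerator // value.denominator
    frac = value - integer_part
    digits = []
    # Repeated doubling: each step moves the next binary digit to the left of the point.
    while frac != 0 and len(digits) < max_frac_bits:
        frac *= 2
        bit = 1 if frac >= 1 else 0
        digits.append(str(bit))
        frac -= bit
    result = bin(integer_part)[2:]
    return result + "." + "".join(digits) if digits else result


print(to_binary(Fraction(23, 4)))     # 101.11
print(to_binary(Fraction(7, 5), 12))  # 1.011001100110
```

The second call shows a fraction whose expansion never terminates; only the cut-off stops the loop.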
Representing the above binary number in SP (single precision, 32-bit) floating point format:
Value = (-1)^S x 2^(E - 127) x 1.M
=> 101.11 = 1.0111 x 2^2; relating this to the equation above:
(The leading '1' before the binary point is called the hidden bit: for a normalized number it is always 1, so it is implied and not stored in the representation.)
Sign S = 0; No. of bits used to represent the sign = 1
Exponent (E - 127) = 2, i.e. E = 129 = 10000001 in binary; No. of bits used to represent the exponent = 8
Mantissa M = 0111000... (the bits after the hidden bit, padded with zeros); No. of bits used to represent the mantissa = 23
Finally, 5.75 in SP floating point representation is:
0|10000001|01110000000000000000000 (0x40B80000 in hexadecimal)
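The same encoding can be checked in Python. This sketch (with hypothetical helper names `encode_fields` and `show`) shifts S, E and M into their bit positions by hand and compares the result with the bits that the standard `struct` module produces for 5.75.

```python
import struct


def encode_fields(sign, biased_exp, mantissa):
    """Pack S, E and M into one 32-bit integer (hypothetical helper)."""
    # S goes to bit 31, E to bits 30..23, M to bits 22..0.
    return (sign << 31) | (biased_exp << 23) | mantissa


def show(bits):
    """Format a 32-bit pattern as S|E|M."""
    s = format(bits, "032b")
    return f"{s[0]}|{s[1:9]}|{s[9:]}"


manual = encode_fields(0, 129, 0b0111 << 19)  # S = 0, E = 129, M = 0111 then zeros
(native,) = struct.unpack(">I", struct.pack(">f", 5.75))

print(show(manual))                   # 0|10000001|01110000000000000000000
print(hex(manual), manual == native)  # 0x40b80000 True
```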
Note: What if the fractional part of a real number cannot be expressed as a finite sum of powers of two (in the above example, .75 = 1/2 + 1/4)? For example, 7/5 is exactly 1.4 in decimal, but .4 cannot be written as a finite sum of powers of two, so 7/5 has an infinite (repeating) binary expansion: 1.0110011001100110011...
In single precision, this expansion is rounded at the twenty-third bit after the binary point (IEEE 754's default rounding mode is round to nearest, ties to even), so the stored value is only an approximation of 1.4.
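The effect of this rounding can be observed with a short Python sketch that uses only the standard `struct` and `fractions` modules: 1.4 is packed into 4 bytes as a single, then read back both as raw bits and as a number. `Fraction(stored)` gives the exact value that was kept.

```python
import struct
from fractions import Fraction

# Round 1.4 to single precision, then read back the raw bits and the stored value.
packed = struct.pack(">f", 1.4)
(bits,) = struct.unpack(">I", packed)
(stored,) = struct.unpack(">f", packed)

print(hex(bits))  # 0x3fb33333
mantissa_bits = format(bits & 0x7FFFFF, "023b")
print(mantissa_bits)  # 01100110011001100110011
# The stored value is exactly 1 + M / 2^23, which is slightly below 1.4.
print(Fraction(stored))  # 11744051/8388608
print(stored < 1.4)      # True
```

The 23 stored mantissa bits are the repeating 0110 pattern cut off after 23 digits; the next bit is 0, so round to nearest rounds down here.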
2. Extracting the real number from an SP floating point representation
11000100001001100000000000000000 (0xC4260000 in hexadecimal)
1|10001000|01001100000000000000000
S|----E---|----------M------------
Sign = 1, i.e. (-1)^1 = -1: a negative number
Exponent E = 10001000 (binary) = 136 = 127 + e, i.e. the actual exponent e = 9
Mantissa (including the hidden bit) = 1.01001100000000000000000
i.e. Mantissa = 1 (hidden bit) + 0/2 + 1/4 + 0/8 + 0/16 + 1/32 + 1/64 + 0 (all remaining bits are zero)
=> X = 1 + 1/4 + 1/32 + 1/64
=> X x 64 = 64 + 16 + 2 + 1 = 83
=> X = 83/64
So the complete number = -(83/64) x 2^9 = -664.00
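The decoding steps above can be mirrored in Python as a sketch. The variable names `s`, `E`, `M`, `e` and `X` follow the worked example; exact `Fraction` arithmetic reproduces 83/64, and the standard `struct` module is used only as a cross-check on the same 32 bits.

```python
import struct
from fractions import Fraction

# S | E | M fields of the example, mantissa padded with zeros to 23 bits.
pattern = "1" + "10001000" + "0100110".ljust(23, "0")

s = int(pattern[0], 2)      # sign bit -> 1
E = int(pattern[1:9], 2)    # biased exponent 10001000 -> 136
M = int(pattern[9:], 2)     # 23-bit mantissa field
e = E - 127                 # actual exponent -> 9
X = 1 + Fraction(M, 2**23)  # hidden bit plus fraction -> 83/64
value = (-1) ** s * X * 2**e

print(E, e, X, value)  # 136 9 83/64 -664

# Cross-check: let the machine decode the same 32 bits as an IEEE single.
(native,) = struct.unpack(">f", int(pattern, 2).to_bytes(4, "big"))
print(native)  # -664.0
```

This covers normalized numbers only; an exponent field of all zeros or all ones has a different meaning in IEEE 754 and would need separate handling.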