Format for representing fixed point numbers

Fixed point numbers and number notation:

In fixed-point numbers, an imaginary binary point determines how the bits are interpreted: the digits to the left of the binary point represent the integer part, and the digits to the right represent the fraction. This post describes the format for representing fixed-point numbers that we will use from here on.

Q notation (QF):

In this notation, the number of fractional bits names the format. One bit, always the MSB, is reserved for the sign. The format name alone does not convey any information about the word length.
The notation QF means a number with F bits dedicated to the fractional part. If W is the word length of the processor, then a QF number has F fractional bits, W - (F+1) integer bits and 1 sign bit:
Number of integer bits = W - (F+1)

For example, with a 16-bit word (W = 16):

Q15 means 15 fractional bits, one sign bit and zero bits for the integer part. Ex: 0.435
Q14 means 14 fractional bits, one sign bit and 1 bit for the integer part. Ex: 1.435
Q1 means 1 fractional bit, one sign bit and 14 bits for the integer part. Ex: 16383.435 (stored as 16383.5, since one fractional bit gives a resolution of 0.5)
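The three examples can be tried out with a small sketch. It assumes a 16-bit signed word, round-to-nearest and saturation at the ends of the range; the helper names to_fixed and to_float are made up for illustration and are not part of any library.

```python
def to_fixed(value, frac_bits, word_length=16):
    """Quantize a float to a signed fixed-point integer in QF format.

    Assumptions: round-to-nearest, and saturation when the scaled value
    does not fit in word_length bits.
    """
    scaled = round(value * (1 << frac_bits))
    lo = -(1 << (word_length - 1))
    hi = (1 << (word_length - 1)) - 1
    return max(lo, min(hi, scaled))


def to_float(raw, frac_bits):
    """Interpret a raw fixed-point integer as a real value."""
    return raw / (1 << frac_bits)


q15 = to_fixed(0.435, 15)      # Q15: no integer bits
q14 = to_fixed(1.435, 14)      # Q14: one integer bit
q1 = to_fixed(16383.435, 1)    # Q1: fourteen integer bits

print(q15, to_float(q15, 15))
print(q14, to_float(q14, 14))
print(q1, to_float(q1, 1))     # the Q1 value comes back as 16383.5
```

The same raw 16-bit integer means different things depending on F, which is why the Q-format has to be tracked alongside the data.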

How to select a Q-point format to represent a float value in fixed-point format

For example, consider the float value 12.435.

Steps involved in selecting a Q-format for this value:

1. Calculate the number of bits needed to represent the integer part (QI) of the float value: 12 needs 4 bits.
2. Calculate the word length needed to represent the float value in fixed point:
a. 1 bit for the sign (S).
b. 4 bits for the integer part (QI).
c. QF = 10, from the equation QF = ceiling(log2(1/ε)) with ε = 0.001 for this example.
d. WL = QI + QF + S, therefore WL = (4 + 10 + 1) = 15 bits to guarantee both range and resolution.
3. So the Q-format needed to represent the float value above is Q10.
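The steps above can be sketched in a few lines. This is a minimal illustration, assuming a signed value and a target resolution of ε = 0.001 (three decimal places, which is what yields QF = 10); the function name select_q_format is hypothetical.

```python
import math


def select_q_format(value, epsilon, signed=True):
    """Return (QI, QF, WL) needed for a value at resolution epsilon.

    QI is the bit count of the integer part, QF = ceiling(log2(1/epsilon)),
    and WL = QI + QF + S with one sign bit when signed.
    """
    int_bits = int(abs(value)).bit_length()
    frac_bits = math.ceil(math.log2(1 / epsilon))
    sign_bits = 1 if signed else 0
    return int_bits, frac_bits, int_bits + frac_bits + sign_bits


qi, qf, wl = select_q_format(12.435, 0.001)
print(f"QI={qi}, QF={qf}, WL={wl} -> Q{qf}")
```

A 15-bit word is rarely available in practice, so the result would normally be rounded up to the next supported width, such as 16 bits.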

Simple Procedure to identify Q-format for given float point value

Assume that the word length chosen by the programmer to represent a float value in fixed-point format is WL = 16. (The programmer should choose this based on the maximum range of the input float values.)

Note: Float-to-fixed conversion comes at the cost of some precision loss.

Consider above example:

Float value = 12.435

12 (decimal) -> 1100 (binary) -> 4 bits are required to represent the integer part.
Sign -> 1 bit for the sign

As we have seen before, the number of fractional bits available to the float value gives the Q-point format in which it is represented in fixed point.

Q-format for the above float value: QF = 16 - (4 + 1) = 11

So the above float value can be represented in Q11 format.
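As a rough sketch of this procedure, the snippet below derives QF from a fixed word length and then quantizes 12.435 to show the precision loss mentioned in the note. Round-to-nearest is an assumption here, and the variable names are illustrative only.

```python
WORD_LENGTH = 16   # chosen by the programmer
SIGN_BITS = 1

value = 12.435
int_bits = int(abs(value)).bit_length()            # 12 -> 1100 -> 4 bits
frac_bits = WORD_LENGTH - (int_bits + SIGN_BITS)   # 16 - (4 + 1) = 11

raw = round(value * (1 << frac_bits))   # integer stored in the 16-bit word
approx = raw / (1 << frac_bits)         # value recovered from the Q11 word
error = abs(approx - value)             # precision lost in the conversion

print(f"Q{frac_bits}: raw={raw}, approx={approx}, error={error:.8f}")
```

With round-to-nearest, the error stays within half a step, that is 2^-(QF+1).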

Fixed Point Representation: Q-formats

Q-Formats


Fixed-point number representations are described by a Q-point format (much as floating-point representations are described by the IEEE 754 standard).

Let us start with the basic notation of a Q-point for a fixed point number.

The convention is as follows:

Q[QI].[QF]

QI -> Integer bits

QF -> Fraction bits

So the sum of QI and QF gives the total number of bits needed to represent a number in fixed-point format. In this convention, the sign bit of a signed number is counted among the QI bits.

QI + QF = word length, and this word length corresponds to the variable widths supported on various processors. Typical word lengths are 8, 16 and 32 bits.


For example, a Q2.6 number is an 8-bit number with 2 integer bits and 6 fraction bits.


Fixed point range – Integer portion

The range of a floating point variable in an algorithm sets the number of bits (QI) required to represent the integer portion of the number.


For unsigned numbers, this relationship is defined by:

0 ≤ α < 2^QI

where α is the floating-point variable.

(We will see a few examples of this in coming sessions.)

If the floating-point number is a signed value:

-2^(QI-1) ≤ α < 2^(QI-1)

where α is the floating-point variable.
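The range of a Q[QI].[QF] format can be computed directly. The sketch below assumes two's-complement storage with the sign bit counted among the QI bits, so the largest representable value falls one step (2^-QF) short of the power-of-two bound; q_range and fits are hypothetical helper names.

```python
def q_range(int_bits, frac_bits, signed=True):
    """Return (min, max) representable values of a Q[int_bits].[frac_bits] number."""
    step = 2.0 ** -frac_bits
    if signed:
        lo = -(2.0 ** (int_bits - 1))
        hi = 2.0 ** (int_bits - 1) - step
    else:
        lo = 0.0
        hi = 2.0 ** int_bits - step
    return lo, hi


def fits(value, int_bits, frac_bits, signed=True):
    """Check whether value lies inside the representable range."""
    lo, hi = q_range(int_bits, frac_bits, signed)
    return lo <= value <= hi


print(q_range(2, 6, signed=True))    # signed Q2.6
print(q_range(2, 6, signed=False))   # unsigned Q2.6
print(fits(1.5, 2, 6), fits(2.0, 2, 6))
```

If a variable does not fit, QI has to grow, which for a fixed word length means giving up fraction bits.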


Resolution of a fixed point number – Fractional portion:

The resolution of a fixed point number is set by the remaining fraction bits (QF) for a given word length (WL) for the variable. For a given word length and dynamic range of a variable the resolution is limited. If higher resolution is needed for a given range then the WL of the variable must be increased to provide this resolution.


The resolution ε of a fixed-point number is:

ε = 1 / 2^QF


Therefore, the number of fractional bits (QF) required for a particular resolution is given by:

QF = log2(1/ε)

However, since QF must be an integer, the result of the logarithm is rounded up:

QF = ceiling(log2(1/ε))
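A quick way to see the effect of the ceiling is to tabulate a few resolutions. The sketch below is only an illustration, with frac_bits_for as a made-up name; it prints the required QF and the resolution 2^-QF actually obtained, which is always at least as fine as the one requested.

```python
import math


def frac_bits_for(epsilon):
    """Smallest integer QF such that 2**-QF <= epsilon."""
    return math.ceil(math.log2(1 / epsilon))


for eps in (0.1, 0.01, 0.001):
    qf = frac_bits_for(eps)
    achieved = 2.0 ** -qf
    print(f"requested {eps}: QF={qf}, achieved resolution {achieved}")
```

Because decimal resolutions are not powers of two, the achieved resolution is usually somewhat finer than requested.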


We will discuss the math of fixed-point numbers in more detail in coming posts.

Thanks, and happy reading.

