I have created a new post to replace this one. Click here to see it.
Fixed point addition and subtraction are straightforward. Additions and subtractions are performed using integer operations. For example, if two 16 bit Q15 format numbers are added, the result is a Q15 number. But what about fixed point multiplication? What happens if two Q15 numbers are multiplied?
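As a quick illustration of the addition case, here is a minimal Python sketch. The values 0.5 and 0.25 and the variable names are chosen only for illustration, and overflow of the 16 bit result is not handled.

```python
# Q15 values are stored as integers scaled by 2**15.
a_int = 0x4000  # 0.5 in Q15
b_int = 0x2000  # 0.25 in Q15

# Adding two Q15 numbers is a plain integer addition; the sum is still Q15.
sum_int = a_int + b_int

print(hex(sum_int))       # 0x6000
print(sum_int / 2**15)    # 0.75
```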
Let’s try an example. Take 0.5 multiplied by 0.25. In Q15 the number 0.5 is represented (in hexadecimal) as 0x8000 times 0.5 or 0x4000. Similarly, 0.25 is 0x2000. When we multiply these together, the product is 0x08000000. Obviously the result is not a Q15 number since the number of bits required is more than 16. The expected product, 0.125, is 0x1000 in Q15.
To see what is going on, define the following two Q15 numbers a and b:
$$a = \frac{A}{2^{15}}, \qquad b = \frac{B}{2^{15}}$$
where $A$ and $B$ are the integer representations of our numbers (0x4000 and 0x2000 in our example). The product of a and b is:
$$ab = \frac{A}{2^{15}} \cdot \frac{B}{2^{15}} = \frac{AB}{2^{30}}$$
From the above, it can be seen that the product is a Q30 number. Going back to our example, 0x4000 times 0x2000 is 0x08000000, which is 0.125 times $2^{30}$.
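The same example can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are my own.

```python
a_int = 0x4000  # 0.5 in Q15
b_int = 0x2000  # 0.25 in Q15

# An ordinary integer multiply of two Q15 integers gives a Q30 integer.
product = a_int * b_int

print(hex(product))       # 0x8000000
print(product / 2**30)    # 0.125 when read as Q30
print(product / 2**15)    # 4096.0 when (wrongly) read as Q15
```

Reading the raw product as Q15 gives a nonsense value, which is why the Q format of the result has to be tracked.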
A general rule is that when multiplying a Qm format number by a Qn format number, the product will be a Q(m+n) number. The number of bits required to represent the product is at least (n+m) for unsigned multiplication and (n+m+1) for signed (twos complement) multiplication.
For the more general case of a Qa.b number times a Qc.d number, the product is Q(a+c).(b+d). The number of bits needed for the result is (a + b + c + d + 1) for signed numbers (and one less for unsigned numbers).
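The rule can be written as a small helper. The function name `product_format` and the example operands are hypothetical, chosen only to illustrate the bookkeeping described above.

```python
def product_format(a, b, c, d, signed=True):
    """Format of a Qa.b number times a Qc.d number, per the rule above.

    Returns (integer bits, fractional bits, total bits needed).
    """
    int_bits = a + c
    frac_bits = b + d
    total_bits = int_bits + frac_bits + (1 if signed else 0)
    return int_bits, frac_bits, total_bits

# Q3.12 times Q0.15 (plain Q15) gives Q3.27, needing 31 bits when signed.
print(product_format(3, 12, 0, 15))   # (3, 27, 31)

# Numeric check: 1.5 in Q3.12 times 0.5 in Q15.
x = 0x1800            # 1.5 * 2**12
y = 0x4000            # 0.5 * 2**15
print((x * y) / 2**27)   # 0.75
```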
Consider the example of a Q16 unsigned multiplication between the two largest unsigned numbers that can be represented. The largest Q16 number is 65535/65536 = 0.9999847412109375. The product is 0xffff times 0xffff or 0xfffe0001. The result is a Q32 number requiring at least 32 bits. If we divide by $2^{32}$ then we get 0.99996948265470564365386962890625, the expected result.
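This example is easy to verify with exact arithmetic. The sketch below uses Python's `fractions.Fraction` so that no floating point rounding gets in the way; the variable names are my own.

```python
from fractions import Fraction

largest_q16 = 0xFFFF                      # 65535/65536 as unsigned Q16
product = largest_q16 * largest_q16       # unsigned Q32 result

print(hex(product))            # 0xfffe0001
print(product.bit_length())    # 32

# The Q32 product, read exactly, equals the square of 65535/65536.
exact = Fraction(product, 2**32)
print(exact == Fraction(65535, 65536) ** 2)   # True
```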
There are a number of things that are done with the product of a multiplication, depending on the application. Some of the commonly seen options are:
1. Convert the product to a different Q format.
2. Use the product in the resulting Q format.
3. Add the product to a running sum in an accumulator register.
4. Convert the product to a different Q format, then add to a running sum.
Let’s look at some of these options for the case of signed multiplication using Q15 format numbers. For case 1, assume we want to multiply two Q15 numbers and get a Q15 result. The required operation is to take the Q30 product, and shift it right by 15 bits. The result can then be stored in 16 bits. There is also the option of rounding the product before shifting out the lower 15 bits (I may discuss rounding in a future post). Some CPU architectures are better set up to shift the product left by 1, and then store the upper 16 bits. This is almost exactly the same as shifting right by 15 bits and keeping the lower 16 bits.
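Here is a minimal sketch of the Q15 times Q15 to Q15 case in Python. The function names are hypothetical, and the rounding shown (adding half an LSB before the shift) is just one simple choice, not necessarily what any particular DSP does. The overflow case of -1 times -1 is not handled.

```python
def q15_mul(x, y, rounding=False):
    """Multiply two Q15 integers and return a Q15 integer."""
    product = x * y              # Q30
    if rounding:
        product += 1 << 14       # half of the Q15 LSB, expressed in Q30
    return product >> 15         # arithmetic shift back to Q15

def q15_mul_high(x, y):
    """Same idea, done as shift left by 1 then keep the upper 16 bits."""
    return ((x * y) << 1) >> 16

print(hex(q15_mul(0x4000, 0x2000)))        # 0x1000, i.e. 0.125
print(hex(q15_mul_high(0x4000, 0x2000)))   # 0x1000
print(q15_mul(1, 0x4000))                  # 0 (truncated)
print(q15_mul(1, 0x4000, rounding=True))   # 1 (rounded up)
```

Python integers are unbounded, so this sketch does not model register widths; on real hardware the two approaches operate on fixed-width product registers.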
Multiply-accumulate (MAC) operations are used a lot in many DSP algorithms. Many processors have one or more dedicated accumulator registers for this purpose (often with 32 or 40 bits). For the case of Q15 multiplies, each Q30 product can be summed to the accumulator.
I have seen a lot of code that shifts each product left by 1 when performing the MAC operations. Some DSP chips can do the left shift in hardware using a special mode of the ALU. In this case, the value in the accumulator is in Q31 format. Although very common, this method has a greater chance of overflow problems since each product is effectively two times bigger. I think this method became popular because certain older DSP chip architectures required the storing of the high 16 bits of the accumulator or product register, rather than having a single cycle instruction allowing a shift by 15 bits.
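To make the overflow point concrete, here is a small Python sketch comparing the two accumulation styles. It assumes a 32 bit accumulator for the range check (many DSPs have 40 bits, as noted above), and the function names are my own.

```python
def mac_q30(xs, ys):
    """Sum of Q15 x Q15 products, accumulated in Q30."""
    acc = 0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc

def mac_q31(xs, ys):
    """Same sum with each product shifted left by 1, accumulated in Q31."""
    acc = 0
    for x, y in zip(xs, ys):
        acc += (x * y) << 1
    return acc

def fits_int32(v):
    """True if v fits in a signed 32 bit register."""
    return -2**31 <= v <= 2**31 - 1

xs = [0x7FFF, 0x7FFF]   # two taps of the largest positive Q15 value
ys = [0x7FFF, 0x7FFF]

print(fits_int32(mac_q30(xs, ys)))   # True
print(fits_int32(mac_q31(xs, ys)))   # False: the Q31 sum overflows 32 bits
```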
In summary, because multiplication operations are often a chief component of signal processing implementations, it is important to understand how they work. This is especially true for fixed point operations, where one must know the effect of multiplication on the format of the numbers themselves.
Fixed point representation is a method of storing numbers in binary format. It is widely used in DSP products for telecommunications. One reason to use fixed point format (rather than floating point) is for cost savings in the digital signal processing chips used for implementing a system. Another reason is to have greater precision than floating point for a given number of bits per number represented.
So what is fixed point? Fixed point refers to a method of representing numbers with a fractional part on an ALU that only handles integer operations. Companies will often market their DSP processors as either “fixed point” or “floating point” models. Fractional digits of a number are handled with integers by using an assumed scaling factor, such as $2^{-15}$. This is commonly called Q15 notation.
To understand how Q15 numbers are handled, it helps to understand the twos complement representation of integers. I won’t explain it here, but there are plenty of explanations on the web (on Wikipedia for example). Now assume we have a 16 bit twos complement integer a as follows, where s is a sign bit and $d_n$ is the digit at bit n:
$$a = s\,d_{14}\,d_{13} \ldots d_1\,d_0$$
For a positive integer a, the value of the number a is:
$$a = \sum_{n=0}^{14} d_n 2^n$$
For a Q15 number, the weightings of each bit change. Let $b = a / 2^{15}$. Then the value of a positive Q15 number b is:
$$b = \sum_{n=0}^{14} d_n 2^{n-15}$$
The range of numbers that can be handled is from -1 to 0.999969482421875, corresponding to the integers -32768 and 32767, which are the smallest and largest integers that can be represented in 16 bit twos complement.
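A pair of conversion helpers makes the scaling explicit. This is a sketch only; the function names are hypothetical, and rounding to the nearest integer with Python's built-in `round` is my own choice.

```python
Q15_SCALE = 1 << 15   # 32768

def float_to_q15(x):
    """Convert a real value to a Q15 integer, rejecting out-of-range input."""
    n = int(round(x * Q15_SCALE))
    if not -32768 <= n <= 32767:
        raise ValueError("value does not fit in Q15")
    return n

def q15_to_float(n):
    """Convert a Q15 integer back to a real value."""
    return n / Q15_SCALE

print(float_to_q15(0.5))       # 16384 (0x4000)
print(q15_to_float(32767))     # 0.999969482421875
print(q15_to_float(-32768))    # -1.0
```

Note that `float_to_q15(1.0)` raises an error, since +1 is outside the Q15 range.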
Negative numbers are found by taking the twos complement of a given positive value, in the same manner as for integers. One way to do this is to flip all the bits (change all the ones to zeros, change all the zeros to ones) and then add one. For example, assume we have the Q15 number 0.5, which is 0x4000 in hexadecimal. Flipping the bits gives 0xBFFF, and adding one produces 0xC000. So a Q15 value of -0.5 is stored as 0xC000. One case that does not work is if one tries to negate a Q15 value of -1, since +1 is not possible in Q15. If you try flipping the bits and adding one, you will end up with a result of -1 instead of +1. This is an overflow case, and some DSP chips will produce the largest positive Q15 number (32767/32768) if -1 is negated.
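The flip-and-add-one procedure, including the -1 corner case, can be sketched in Python by masking to 16 bits. The function names are my own, and the saturating version simply mirrors the behaviour described above for some DSP chips.

```python
def negate16(word):
    """Twos complement negate of a 16 bit word: flip the bits, add one, wrap."""
    return ((word ^ 0xFFFF) + 1) & 0xFFFF

def negate16_sat(word):
    """Negate with saturation: -1 (0x8000) becomes the largest positive value."""
    if word == 0x8000:
        return 0x7FFF
    return negate16(word)

print(hex(negate16(0x4000)))       # 0xc000, i.e. -0.5
print(hex(negate16(0x8000)))       # 0x8000, still -1 (overflow)
print(hex(negate16_sat(0x8000)))   # 0x7fff, i.e. 32767/32768
```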
Q15 isn’t the only possible representation of course. It is possible to use different scaling factors and different bit widths. For example, Q31 is common for 32 bit number widths. When using unsigned operations, it is possible to store a Q16 number in a 16 bit word. In this case we have:
$$b = \sum_{n=0}^{15} d_n 2^{n-16}$$
where $d_n$ is the digit at bit n.
It is also possible to mix integer and fractional parts in one word. For example, we could have a number with 3 integer bits and 12 fractional bits stored in a 16 bit word. Let’s call this format Q3.12. The range of numbers in Q3.12 using 16 bit twos complement is $-2^{3}$ to $2^{3} - 2^{-12}$, or -8 to 7.999755859375.
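The range of any signed format of this kind follows from the smallest and largest twos complement integers divided by the scaling factor. A short sketch, with a hypothetical helper name:

```python
def q_range(int_bits, frac_bits):
    """Range of a signed Q(int_bits).(frac_bits) number (plus one sign bit)."""
    scale = 1 << frac_bits
    smallest = -(1 << (int_bits + frac_bits))
    largest = (1 << (int_bits + frac_bits)) - 1
    return smallest / scale, largest / scale

print(q_range(3, 12))   # (-8.0, 7.999755859375)
print(q_range(0, 15))   # (-1.0, 0.999969482421875)
```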
And finally, there is no rule that says a Q15 number can’t be stored in a 32 bit register. In this case, there would be 17 bits used for the sign part of the number (for twos complement).
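To see the 17 sign bits, here is a sketch that sign extends a 16 bit Q15 word into a 32 bit register value. The function name is my own, and registers are modelled as unsigned bit patterns.

```python
def sign_extend_16_to_32(word16):
    """Copy the sign bit (bit 15) of a 16 bit word into bits 16..31."""
    if word16 & 0x8000:
        return word16 | 0xFFFF0000
    return word16

neg_half = sign_extend_16_to_32(0xC000)   # -0.5 in Q15
pos_half = sign_extend_16_to_32(0x4000)   # +0.5 in Q15

print(hex(neg_half))         # 0xffffc000
print(bin(neg_half >> 15))   # 0b11111111111111111: the top 17 bits are all ones
print(bin(pos_half >> 15))   # 0b0: the top 17 bits are all zeros
```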
So in conclusion, there are a large variety of number formats that can be handled when using a processor or hardware with integer-based arithmetic. Fixed point representation relies on an assumed scaling factor and a given register width. These two variables determine where the decimal point is. Fixed point arithmetic was taught to me by my first boss during the first few days at my first full time job. As my boss used to say “real men know where to find the decimal point.” The difficulty in finding the decimal point will become more clear when I cover fixed point multiplication.
Hello and welcome to my blog. My goal for this blog is to provide information and short tutorials on the practical application of digital signal processing (DSP). I hope it will be useful to students, hobbyists, and others as well. My intention is to cover application of DSP, not to cover theory (which is already available in well established textbooks).
My background is in computer telephony and voice over IP. I spent many years implementing speech compression codecs, echo cancellers, voiceband modems and tone processing algorithms. These were run on a variety of platforms that handled between one and hundreds of channels of telephone calls.
Some topics I plan to write about include fixed point representation, overflow handling, frequency estimation, tone generation, power measurement, multi-rate filtering and adaptive filtering.