## Posted tagged ‘Q31’

### Fixed Point Division of two Q15 Numbers

September 20, 2010

Fixed point division is quite a bit more complicated than fixed point multiplication, and usually takes a lot more cycles than performing a multiplication. When dividing by a known value (a constant), it is usually better to multiply by the reciprocal than to do a division. And when dividing a fixed point number by an integer that is a power of two, a right shift can be used to implement a division. For example, to divide by 16, just shift your dividend right by 4 bits.

But there are cases where it is necessary to do a division by a calculated value. The easiest way to picture how the division should proceed is to think of the inverse of multiplying two Q15 numbers. The multiplication of two Q15 numbers produces a Q30 product. It then makes sense that a Q30 number divided by a Q15 number produces a Q15 result.

Let $a = \hat{a}/{2}^{30}, b = \hat{b}/{2}^{15}$

then $a/b = \frac{\hat{a}/{2}^{30}}{\hat{b}/{2}^{15}}=\left(\hat{a}/\hat{b} \right)/{2}^{15}$

So one procedure for finding a/b has the following steps:
1.convert the dividend a from Q15 to Q30 by shifting left by 15
2.divide the Q30 format number a by the Q15 format number b to get result in Q15

Let’s try an example. Let a = 0.03125 and b = 0.25, then c = a/b = 0.125. The Q15 numbers as hexadecimal integers will be a = 0x0400 and b = 0x2000. In step 1, a becomes 0x02000000 in Q30. In step 2, divide 0x02000000 by 0x2000 to get c = 0x1000 which is 4096 in decimal. As a check, find 4096/32768 = 0.125, the expected result.

In C language code, fixed point Q15 division can be coded as follows:

int16 a;
int16 b;
int16 c;

if ( abs(b) > abs(a) ) {
c = (int16)(((int32)a << 15) / ((int32)b));
}


The casting is very ugly, but this works. Note that I have restricted the result of the division to be less than one. Removing the restriction that the magnitude of the divisor is larger than that of the dividend has an effect on the number of bits required for the result. To see this, try dividing the largest positive Q15 number by the smallest positive Q15 number, which results in a large number with 15 digits in front of the fractional point:

(0x7FFF/0x8000) / (1/0x8000) = (0x7FFF * 0x8000 ) / 0x8000 = 0x3FFF8000 / 0x8000

The result (0x3FFF8000) requires 30 bits, and it will have 15 bits to the left of the fractional point and 15 bits to the right. That is, the most significant bit has a weighting of ${2}^{14}$ and the least significant bit has a weight of ${2}^{-15}$. In my work, I have almost always used Q15 division where the magnitude of the divisor is smaller than that of the dividend.

Along with looking ugly, the C code above for division is often inefficient. The C compiler will likely implement this as a division between two 32 bit numbers. When implementing division on a fixed point DSP chip, I have usually used assembly language coding and made use of a special purpose division instruction.

For example, the Texas Instruments TMS320C55x processor has the “subc” instruction or “conditional subtract.” To perform the type of division I have just described do the following:

1.make the dividend and divisor both positive and note original sign of each
2.load the dividend shifted left by 15 into an accumulator register
3.execute the conditional subtract of the divisor 16 times
4.store the result (in the lower 16 bits of the accumulator )
5.determine the correct sign for the result, and negate it if necessary

Note that a short cut is to load the dividend shifted left by 16 in the first step, and then execute the subc instruction 15 times. This works because it is known that the result will be positive.

Fixed point division is not difficult, but it can take a lot of cycles, and one needs to recognize the need to consider the range of the resulting output.

The C code and output below shows a couple of examples from this tutorial.

Example code:

#include <stdio.h>

typedef short int16;
typedef int int32;

int main( void )
{
int16 a;
int16 b;
int16 c;
int32 d;

// example 1: magnitude of divisor is greater than magnitude of dividend
printf("example 1: magnitude of divisor is greater than magnitude of dividend\n");

a = 0x0400;
b = 0x2000;

if ( abs(b) > abs(a) ) {
c = (int16)(((int32)a << 15) / ((int32)b));
} else {
printf("division error\n");
}

printf("a = %d, b = %d, c = %d\n",a,b,c);
printf("a = 0x%x, b = 0x%x, c = 0x%x\n",a,b,c);

// example 2: no restrictions on divisor other than not 0
printf("\nexample 2: no restrictions on divisor other than not 0\n");

a = 0x7fff;
b = 0x0001;

if ( b != 0 ) {
d = ((int32)a << 15) / ((int32)b);
} else {
printf("division by zero error\n");
}

printf("a = %d, b = %d, d = %d\n",a,b,d);
printf("a = 0x%x, b = 0x%x, d = 0x%x\n",a,b,d);

return 0;
}


Code output:

example 1: magnitude of divisor is greater than magnitude of dividend
a = 1024, b = 8192, c = 4096
a = 0x400, b = 0x2000, c = 0x1000

example 2: no restrictions on divisor other than not 0
a = 32767, b = 1, d = 1073709056
a = 0x7fff, b = 0x1, d = 0x3fff8000

### Rounding in Fixed Point Number Conversions

August 19, 2009

When converting from one fixed point representation to another, there is often a right shift operation to eliminate bits. (Or higher order bits are just stored without keeping the lower order bits.) This occurs when converting from a Q31 to a Q15 format number for example, since 16 bits need to be eliminated. Before throwing away the unused bits, sometimes it is desirable to perform a rounding operation first. This can improve the accuracy of results, and can prevent the introduction of a bias during conversion of a signal. Rounding is also an important operation when generating fixed point filter coefficients from floating point values, but that is not the subject of this post.

To illustrate rounding, I will use an example where six different signed Q7.8 numbers are converted to a signed Q15.0 number (a regular 16 bit integer). I will illustrate truncation (throwing away the least significant eight bits) and rounding. Recall that a Q7.8 number has seven integer bits and eight fractional bits. For the example, the six numbers will be 1.25, 1.5, 1.75, -1.25, -1.5 and -1.75.

The first thing to determine is how these numbers will be represented in a 16 bit integer register. Multiplying each by 256 (which is two to the power eight) gives the following result (in hexadecimal):

1.25 = 0x0140

1.5 = 0x0180

1.75 = 0x01C0

-1.25 = 0xFEC0

-1.5 = 0xFE80

-1.75 = 0xFE40

Now if the numbers are truncated, the result is found by shifting right by eight. Here are the results:

truncate(1.25) = 0x0001 = 1

truncate(1.5) = 0x0001 = 1

truncate(1.75) = 0x0001 = 1

truncate(-1.25) = 0xFFFE = -2

truncate(-1.5) = 0xFFFE = -2

truncate(-1.75) = 0xFFFE = -2

For the positive numbers, the result of truncation is that the fractional part is discarded. The negative number results are more interesting. The result is that the fractional part is lost, and the integer part has been reduced by one. If a series of these numbers had a mean of zero before truncation, then the series would have a mean of less than zero after truncation. Rounding is used to avoid this problem of introduced bias and to make results more accurate.

Truncation is not really the correct term for the example above. More accurately, a “floor” operation is being executed. A floor operation returns the greatest integer that is not greater than the operand.

In a common method of rounding, a binary one is added to the most significant bit of the bits that are to be thrown away. And then a truncation is performed. In the current example, we would add 0.5, represented as 128 decimal or 0x0080 in our 16 bit integer word. So the results in our example are as follows:

round(1.25) = (0x0140 + 0x80) >> 8 = 0x0001 = 1

round(1.5) = (0x0180 + 0x80) >> 8 = 0x0002 = 2

round(1.75) = (0x01C0 + 0x80) >> 8 = 0x0002 = 2

round(-1.25) = (0xFEC0 + 0x80) >> 8 = 0xFFFF = -1

round(-1.5) = (0xFE80+ 0x80) >> 8 = 0xFFFF = -1

round(-1.75) = (0xFE40 + 0x80) >> 8 = 0xFFFE = -2

These results are less problematic than using simple truncation, but there is still a bias due to the non-symmetry of the 1.5 and -1.5 cases. The amount of bias depends on the data set. Even if a set of data to be converted contained only positive values, there is still a bias introduced, because all of the values that end in exactly .5 are rounded to the next highest integer. One way to eliminate this bias is to round even and odd values differently (even and odd to the left of the rounding bit position).

For the more common conversion of Q31 to Q15 numbers, the rounding constant is one shifted left by fifteen, or 32768 decimal, or 0x8000 hexadecimal.

Some of the Texas Instrument DSPs have rounding instructions that can be performed on the accumulator register prior to saving a result to memory. For example, the TMS320C55x processor includes the ROUND instruction (full name is “round accumulator content”). The instruction has two different modes. The “biased” mode adds 0x8000 to the 40 bit accumulator register. The “unbiased” mode conditionally adds 0x8000 based on the value of the least significant 17 bits. It is designed to address the bias problems I described above. Wikipedia has a good discussion of rounding and bias errors (http://en.wikipedia.org/wiki/Rounding). The TMS320C55x is using the “round half to even” method of rounding for the unbiased mode, and “round half up” for the biased mode.

Although it seems simple on the surface, rounding in fixed point conversions has some important effects on the bias of resulting computations.