Thanks for taking a look at my blog. I didn’t discuss division by zero in my post, but the example code does do a check for it. Whenever I have done fixed point division in my own work, I’ve usually had a check that the magnitude of the divisor is larger than the dividend, which automatically excludes division by zero.

I don’t see any problem with defining division by zero as having a particular value, as in your work. I think in any case, computer code written in a high level language would need to explicitly check the divisor, since the arithmetic logic unit will not handle division by zero in a well defined way. Values close to zero for the divisor are also problematic, but can be handled for fixed point division if there are enough bits in the result. I show an example in my post above.
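As a rough illustration of the check described above, here is a minimal sketch of a Q15 fractional division. The function name `div_q15` and the choice of Q15 are assumptions made for this example; it is not the code from the post. Requiring the divisor's magnitude to exceed the dividend's keeps the quotient below 1.0 and rejects a zero divisor in the same test.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: divide two Q15 values, giving a Q15 quotient.
   Returns false (and leaves *out untouched) unless |den| > |num|,
   which also rules out den == 0. */
bool div_q15(int16_t num, int16_t den, int16_t *out)
{
    int32_t n = num;
    int32_t d = den;
    int32_t abs_n = (n < 0) ? -n : n;
    int32_t abs_d = (d < 0) ? -d : d;

    if (abs_d <= abs_n) {
        return false;               /* quotient would be >= 1.0, or den is 0 */
    }

    /* Scale the dividend by 2^15 so the integer quotient is in Q15.
       Multiplying (rather than left-shifting) stays defined for negatives. */
    *out = (int16_t)((n * 32768) / d);
    return true;
}
```

For example, 0.25 / 0.5 is `div_q15(8192, 16384, &q)`, which leaves 16384 (0.5 in Q15) in `q`.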

Cheers,

Shawn

Division by Zero z/0 = 0 in Euclidean Spaces

Hiroshi Michiwaki, Hiroshi Okumura and Saburou Saitoh

International Journal of Mathematics and Computation, Vol. 28 (2017), Issue 1, 1-16.

http://www.scirp.org/journal/alamt http://dx.doi.org/10.4236/alamt.2016.62007

http://www.ijapm.org/show-63-504-1.html

http://www.diogenes.bg/ijam/contents/2014-27-2/9/9.pdf

http://okmr.yamatoblog.net/division%20by%20zero/announcement%20326-%20the%20divi

http://okmr.yamatoblog.net/

Relations of 0 and infinity

Hiroshi Okumura, Saburou Saitoh and Tsutomu Matsuura:

http://www.e-jikei.org/…/Camera%20ready%20manuscript_JTSS_A…

https://sites.google.com/site/sandrapinelas/icddea-2017

Thanks for checking out my blog.

If you multiply S8.7 and S9.6 numbers, the integer part alone will require 17 bits (8 + 9) and the fractional part 13 bits (7 + 6). So with a 16-bit integer to store the result, you would be short one bit, plus you would need more bits for the sign. If it is okay to lose the upper 10 integer bits, then you could multiply to get Q17.13 (stored in 32 bits), shift right by 5 bits to leave 8 fractional bits, and then keep the lower 16 bits. Two things to keep in mind: #1 The right shift should maintain the sign of the number by shifting in ones if the number is negative. #2 If the upper 17 bits (of the 32-bit shifted number) are not all zeros or all ones, then overflow has occurred and the result will not fit in 16 bits.
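The steps above can be sketched in C roughly as follows. The function name `mul_s8_7_by_s9_6` and the `overflow` flag are made up for this example. The sign-preserving shift is written out by hand because right-shifting a negative signed value is implementation-defined in C.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: multiply an S8.7 value by an S9.6 value and
   return the product in S7.8. *overflow is set when the shifted
   product does not fit in 16 bits. */
int16_t mul_s8_7_by_s9_6(int16_t a, int16_t b, bool *overflow)
{
    /* 7 + 6 = 13 fractional bits in the 32-bit product. */
    int32_t product = (int32_t)a * (int32_t)b;

    /* Shift right by 5 to leave 8 fractional bits, keeping the sign
       (equivalent to shifting in ones for negative values). */
    int32_t shifted = (product >= 0) ? (product >> 5)
                                     : ~((~product) >> 5);

    /* Same test as "upper 17 bits are all zeros or all ones". */
    *overflow = (shifted < INT16_MIN) || (shifted > INT16_MAX);

    /* Keep the lower 16 bits. When *overflow is set the returned value
       is not meaningful (the narrowing is implementation-defined). */
    return (int16_t)shifted;
}
```

For example, 1.5 in S8.7 is 192 and 2.25 in S9.6 is 144; the function returns 864, which is 3.375 in S7.8 (864 / 256).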

I hope you understand what I mean. Let me know if you don’t.

Cheers,

Shawn

Thanks for your explanation on fixed point multiplication.

I have a question:

Say I have two 16-bit numbers in S8.7 and S9.6 formats, and I need to store the result in 16 bits as well (say, S7.8). How would the multiplication take place in this case? The output format has only 7 bits for the integer part and more fraction bits than either input format.

Correction:

0.875 + 0.5 = 1.375

Example with Q3’s:

0111 + 0100 = 1011 = -0.625

Again with Q1.3:

00111 + 00100 = 01011 = +1.011 = 1.375

Still, I find the first text passage in this article very misleading, because you also have to take a possible overflow into account, no?

I also want to say that I like your tutorial page in general and appreciate your efforts.

So you need a Q1.3 number to represent the sum, which requires at least 5 bits for signed numbers, but could be done with 4 bits using unsigned integer math. The same is true with Q.15 numbers. If the sum is greater than 32767/32768 then you need more than 16 bits to store the result, if using signed integers. The result would not be Q.16, but Q1.15. The least significant bit still has the weight 2^(-15).
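Here is a small sketch of the same idea in C, using hypothetical helper names. The sum is formed in a 32-bit integer, where it keeps 15 fractional bits, and is then checked against the 16-bit range.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: add two Q.15 values in a wider type.
   The result still has 15 fractional bits, so the LSB weight
   stays 2^(-15); only the integer range has grown. */
int32_t add_q15_wide(int16_t a, int16_t b)
{
    return (int32_t)a + (int32_t)b;
}

/* True when the wide sum can be stored back in a signed 16-bit Q.15. */
bool fits_in_q15(int32_t sum)
{
    return sum >= INT16_MIN && sum <= INT16_MAX;
}
```

With 0.875 (28672) and 0.5 (16384) the wide sum is 45056, which is 1.375 when divided by 32768, and `fits_in_q15` reports that it does not fit in 16 signed bits.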

I hope that clarifies things a bit.

Cheers,

Shawn

How can this be correct? The result would be a Q16-number. Both addends need to be extended by their MSB to avoid information loss.

Example with Q3’s:

7 + 4 =

0111 + 0100 = 1011 = {-5}_10

Again:

0111 + 0100

00111 + 00100 = 01011 = {11}_10

The value 1 is actually a very small number in Q15 format, so you will end up getting 0 when you multiply by it, unless your coefficient is sufficiently large. Don’t forget that the samples and the coefficients are both in Q15 format.

Try the following set of numbers as input: {32767,0,0,0,0,0,0,0,0,0,0} This is approximately 1 (in Q15) followed by 10 zeros. The output should have the sequence {-39, -129, -127, 68, 218} in it.
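To make the test concrete, here is a minimal stand-alone sketch of a Q15 FIR computation. It is not the code from the post: the function name `fir_q15` is made up, samples before the start of the input are treated as zero, and it assumes a rounding constant of 1 << 14 is added before the shift by 15.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sketch: y[n] = sum over k of h[k] * x[n - k], all in Q15.
   Samples before x[0] are taken as zero. */
void fir_q15(const int16_t *h, size_t taps,
             const int16_t *x, int16_t *y, size_t len)
{
    for (size_t n = 0; n < len; n++) {
        /* Assumed rounding constant: half an LSB of the Q15 result. */
        int64_t acc = 1 << 14;

        for (size_t k = 0; k < taps && k <= n; k++) {
            acc += (int64_t)h[k] * x[n - k];   /* Q15 * Q15 = Q30 */
        }

        /* Sign-preserving shift right by 15 (Q30 back to Q15). */
        int64_t q = (acc >= 0) ? (acc >> 15) : ~((~acc) >> 15);

        /* Saturate to the 16-bit range. */
        if (q > INT16_MAX) q = INT16_MAX;
        if (q < INT16_MIN) q = INT16_MIN;
        y[n] = (int16_t)q;
    }
}
```

Under those assumptions, the impulse input reproduces the five coefficients as the first five outputs, while an input sample of 1 multiplied by -39 rounds to 0.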

I hope that helps.

Cheers,

Shawn

If I wanted to test out your code using a test vector, what should I use as inputs (as I don’t have a pcm file)? I have tried {1,2,3,4,5,6,7,8,9,10,11} as my input data of size 11 with the following filter coefficients (in Q15 format): {-39, -129, -127, 68, 218}, with MAX_INPUT_LEN and MAX_FLT_LEN set to 11 and 5 accordingly. I’m unsure how to interpret the results as well.

For example, in the first iteration of the outer for loop, it will multiply -39 with 1 (which would be -0.0011983 * 1). Shouldn’t the output, in Q15 format, be -39, which corresponds to -0.0011902 (-39/(2^15))? However, in your implementation, after the right shift by 15, the result is 0.

Thanks
