Tone Detection Using the Teager Kaiser Energy Operator (Part 2)

In this section I show how to use the Teager Kaiser Energy Operator (TKEO) to detect a particular frequency in the presence of noise. To reduce the effect of noise, it is helpful to apply a bandpass filter to the input before applying the TKEO. This step reduces the noise power for wideband noise sources, and the tone detection is more robust. Example C code is used to demonstrate a TKEO tone detector. The C code has been written to be somewhat efficient in terms of cycle usage. For example, it avoids the inverse sin that is normally used when applying the TKEO. The code uses floating point calculations and uses the DESA-1 algorithm.

The C code is broken up into three different files: tone_detect.h, tone_detect.c and test.c. The test program reads an input audio file, and for every block of samples, prints out whether a 2100 Hz tone has been detected. If the VERBOSE flag is defined, some other variables are printed as the program progresses. For the input file, use a headerless 16 bit PCM file sampled at 8000 Hz.

First let’s look at the test program, shown in the listing for test.c below:

/*
 * test.c
 *
 * Tone detection test program.
 *
 * Copyright 2011 Shawn Stevenson
 */

#include <stdio.h>
#include <stdint.h>
#include "tone_detect.h"

// convert 16 bit ints to doubles
void intToFloat( int16_t *input, double *output, int length )
{
    int i;

    for ( i = 0; i < length; i++ ) {
        output[i] = (double)input[i];
    }
}

// bandpass filter centred at 2100 Hz
#define FILTER_LEN  72
double bpCoeffs2100[ FILTER_LEN ] =
{
 0.000855037106816965, -0.007105100340628934, -0.00325542536061447,
 0.008648819562695398, 0.0002908507166964641, -0.008082866754749875,
 -2.832955591514462e-005, 0.01160845564698974, -0.002386174087604817,
 -0.009877209424556482, 0.003588542116928629, 0.009544822251205032,
 -0.004157804755432491, -0.005008754900179291, 0.002669720242375797,
 0.001327235424452726, 0.001838879719581862, 0.00440640778716743,
 -0.009195364108012829, -0.007705758791592889, 0.02023145758312881,
 0.01004379125906441, -0.03306027244792958, -0.0077541079049243,
 0.04764593013463465, 0.002396470906295246, -0.06083762082629814,
 0.008149286035965746, 0.07218880251258668, -0.02120115257580074,
 -0.07853592131739524, 0.03714527822488416, 0.08025956829330519,
 -0.05228681035888545, -0.0754630691423418, 0.06623565266616768,
 0.06623565266616768, -0.0754630691423418, -0.05228681035888545,
 0.08025956829330519, 0.03714527822488416, -0.07853592131739524,
 -0.02120115257580074, 0.07218880251258668, 0.008149286035965746,
 -0.06083762082629814, 0.002396470906295246, 0.04764593013463465,
 -0.0077541079049243, -0.03306027244792958, 0.01004379125906441,
 0.02023145758312881, -0.007705758791592889, -0.009195364108012829,
 0.00440640778716743, 0.001838879719581862, 0.001327235424452726,
 0.002669720242375797, -0.005008754900179291, -0.004157804755432491,
 0.009544822251205032, 0.003588542116928629, -0.009877209424556482,
 -0.002386174087604817, 0.01160845564698974, -2.832955591514462e-005,
 -0.008082866754749875, 0.0002908507166964641, 0.008648819562695398,
 -0.00325542536061447, -0.007105100340628934, 0.000855037106816965
};

int main( int argc, char *argv[] )
{
    int16_t intData[BLOCK];
    double inputData[BLOCK];
    int numWords;
    int sampleCount;
    char *inFileName;
    FILE *inFile;

    inFileName = NULL;

    if ( argc == 2 )
    {
        inFileName = argv[1];
        printf("input file name = %s\n",inFileName);
    }
    else
    {
        printf("Incorrect arguments, usage: tone_detect filename\n");
        return(1);
    }

    inFile = fopen(inFileName,"rb");
    if ( !inFile )
    {
        printf("Exiting.Cannot open input file %s\n",inFileName);
        fclose( inFile );
        return(1);
    }

    // initialization - frequecy thresholds at +/- 2% of 2100 Hz
    static toneDetClass toneDetector;
    toneDetectInit(FILTER_LEN, bpCoeffs2100, 2058.0, 2142.0, &toneDetector);

    // start counting frames
    sampleCount = 0;

    numWords = fread(intData, sizeof(int16_t), BLOCK, inFile );

    // until end of file
    while( numWords == BLOCK )
    {
        intToFloat( intData, inputData, numWords );
        printf("sample = %d ",sampleCount);

        // try detecting the tone
        int detection = toneDetect(inputData, &toneDetector);
        printf("Tone Detection result is %d\n\n",detection);

        sampleCount += BLOCK;
        numWords = fread(intData, sizeof(int16_t), BLOCK, inFile );
    }

    printf("\nFinished. sampleCount = %d\n",sampleCount);

    fclose( inFile );
    return 0;
}

The first function in test.c converts integers to double precision floating point. This is used to convert the integer samples in the input file to floating point.

Next comes a set of coefficients for a bandpass filter centred at 2100 Hz. When designing your own detector, note that the bandpass does not need to have a particularly sharp transition band. The bandpass filter need not be really narrow unless the tone detection has to work in a very noisy environment.

Finally comes the main function, which first reads the name of the input file and opens it. One of the first steps is to initialize the tone detector, which is done with a call to the toneDetectInit function. The arguments to this function are the filter length of the bandpass filter, the address of the coefficient table, the low frequency threshold, the high frequency threshold, and a pointer to a tone detector structure of type toneDetClass. In the example here, the low and high frequency thresholds are set to minus and plus two percent of the frequency to detect (which is 2100 Hz).

The main loop of the main function reads the samples in the input file one block at a time, and applies the tone detection as the data comes in. The block of integer samples is converted to doubles, and then passed to the toneDetect function. This function takes two arguments, a pointer to the input data, and a pointer to a tone detector structure (the one initialized earlier). It returns an integer value: 1 when a tone is detected and 0 otherwise.

Now onto more details about how the tone detection is implemented. Below is the listing for tone_detect.h, which contains definitions for a TKEO tone detector and an FIR filter.

/*
 * tone_detect.h
 *
 * Tone detection definitions.
 *
 * Copyright 2011 Shawn Stevenson
 */

#ifndef TONE_DETECT_H
#define TONE_DETECT_H

#include <stdint.h>

#define BLOCK   40      // samples processed each invocation
#define FS      8000    // sampling frequency

#define VARIANCE_THRESHOLD  4.0     // roughly (200Hz)^2
#define DETECT_TIME_THRESH  10      // 50 ms

// DESA-1 Detector Class
typedef struct
{
    double diff0;  // delayed differences
    double diff1;
    double diff2;
    double diff3;
    double x1;     // delayed inputs
    double x2;
    double x3;

} desaClass;

// FIR filter definitions

// maximum number of input samples
#define MAX_INPUT_LEN   BLOCK
// maximum length of filter than can be handled
#define MAX_FLT_LEN     150
// buffer to hold all of the input samples
#define BUFFER_LEN      (MAX_FLT_LEN - 1 + MAX_INPUT_LEN)

// FIR Filter Class
typedef struct
{
    int     filterLength;   // number of filter coefficients
    double *coeffs;         // bandpass filter coefficients
    double  inSamp[BUFFER_LEN]; // input sample history

} FIRClass;

// Tone Detector Class
typedef struct
{
    int         detectTime; // detection time in blocks
    double      lowThresh;  // lower frequency threshold
    double      highThresh; // higher frequency threshold
    desaClass   desa;       // DESA-1 detector
    FIRClass    bp;         // bandpass filter

} toneDetClass;

int toneDetect(double *input, toneDetClass *o);
void toneDetectInit(int filterLength, double *coeffs,
    double low, double high, toneDetClass *toneDet);

#endif

First come a number of constants. BLOCK is the block size, or number of samples processed in each invocation of the detector. It is set to 40, which corresponds to 5 ms at an 8000 Hz sampling rate. Next is FS the sampling frequency, which is 8000 Hz. VARIANCE_THRESHOLD is used to reduce false detections due to spurious inputs, and is helpful for reducing false detections on speech. The tone detector averages frequency estimates, and by ensuring that the variance of the estimates is sufficiently low, the detector is more likely to trigger on tones only. Finally comes DETECT_TIME_THRESH which is used to set the number of consecutive blocks that must contain the target tone before a positive result is output. It is set to 10, meaning ten blocks of 40 samples, or 50 ms.

Next comes the definition of the DESA detector class (or structure). The desaClass structure contains the state variables necessary for calculating the frequency estimates for the DESA-1 algorithm. Following this class are definitions for the bandpass filter (and a FIR filter class). The final structure definition is for the toneDetClass class. This structure has all the elements needed for the tone detector, including the FIR filter and the DESA detector.

And finally come the function prototypes for the tone detector and its initialization function.

Now for the main part of the program containing the detection code. Below is the listing for tone_detect.c.

/*
 * tone_detect.c
 *
 * Tone detection using the DESA-1 algorithm.
 *
 * Copyright 2011 Shawn Stevenson
 */

#include <stdio.h>
#include <stdint.h>
#include <memory.h>
#include <math.h>
#include "tone_detect.h"

#define VERBOSE // print debugging information

static void firInit(int filterLength, double *coeffs,
    FIRClass *o);
static void firFilt(double *input, double *output, int len,
    FIRClass *o );

/***********************************************************
 * Purpose: produce frequency estimate with DESA-1 algorithm
 *
 * Parameters:
 *      input       pointer to input samples
 *      mean        [out] the mean frequency measurement
 *      variance    [out] the variance of the measurements
 *      o           [in,out] the desa detector object
 **********************************************************/
static void desa1( double *input, double *mean,
    double *variance, desaClass *o)
{
    double num; // numerator
    double den; // denominator
    double freq[BLOCK]; // frequency estimates
    int i;

    // calculate the frequency estimate for each sample
    for ( i = 0; i < BLOCK; i++ )
    {
        o->diff0 = input[i] - o->x1;
        num = o->diff2 * o->diff2 - o->diff1 * o->diff3
            + o->diff1 * o->diff1 - o->diff0 * o->diff2;
        den = o->x2 * o->x2 - o->x1 * o->x3;
        freq[i] = num/den;
        // handle errors - division by zero
        if (isnan(freq[i]))
        {
            freq[i] = 0;
        }

        o->diff3 = o->diff2;
        o->diff2 = o->diff1;
        o->diff1 = o->diff0;
        o->x3 = o->x2;
        o->x2 = o->x1;
        o->x1 = input[i];
    }

    // calculate average frequency (mean times block size)
    *mean = 0.0;
    for ( i = 0; i < BLOCK; i++ )
    {
        *mean += freq[i];
    }

    // calculate the variance in the frequency estimates
    // (variance times the square of the block size)
    *variance = 0.0;
    for ( i = 0; i < BLOCK; i++ )
    {
        *variance += freq[i] * freq[i];
    }
    *variance *= (double)BLOCK;
    *variance -= (*mean * *mean);
}

/***********************************************************
* Purpose: initialize state variables for DESA-1 algorithm
*
* Parameters:
*      desa        [out] the desa detector object
**********************************************************/
static void desa1Init(desaClass *desa)
{
    memset(desa, 0, sizeof(desa));
}

/***********************************************************
* Purpose: detect a tone
*
* Parameters:
*      input       pointer to input samples
*      o           [in,out] the tone detector object
*
* Return value: 1 when tone is detected, 0 otherwise
**********************************************************/
int toneDetect(double *input, toneDetClass *o)
{
    static double bpOutput[BLOCK]; // bandpassed samples
    int result = 0;

    // bandpass filter the input
    firFilt(input, bpOutput, BLOCK, &o->bp);

    // apply DESA-1 to get frequency estimate
    double mean;
    double variance;
    desa1( bpOutput, &mean, &variance, &o->desa );
#ifdef VERBOSE
    printf("mean = %f var = %f detectTime = %d ",
        mean, variance, o->detectTime);
#endif

    // check estimate against frequency thresholds
    // and variance threshold
    if ((mean > o->lowThresh)  &&
        (mean < o->highThresh) &&
        (variance < VARIANCE_THRESHOLD))
    {
        // increment detection time and check for enough
        // consectutive detections
        o->detectTime += 1;
        if (o->detectTime >= DETECT_TIME_THRESH)
        {
            result = 1;
            o->detectTime = DETECT_TIME_THRESH;
        }
    }
    else
    {
        // no detection
        o->detectTime = 0;
        result = 0;
    }

    return result;
}

/***********************************************************
 * Purpose: Initialize the tone detector object
 *
 * Parameters:
 *      filterLength    number of bandpass filter coeffs
 *      coeffs          [in] pointer to filter coefficients
 *      low             low frequency threshold (Hz)
 *      high            high frequency threshold (Hz)
 *      toneDet         [out] tone detector object
 **********************************************************/
void toneDetectInit(int filterLength, double *coeffs,
    double low, double high, toneDetClass *toneDet)
{
    toneDet->detectTime = 0;
    toneDet->lowThresh = sin(M_PI*low/FS);
    toneDet->lowThresh *= (BLOCK * 8 * sin(M_PI*low/FS));
    toneDet->highThresh = sin(M_PI*high/FS);
    toneDet->highThresh *= (BLOCK * 8 * sin(M_PI*high/FS));
#ifdef VERBOSE
    printf("low frequency threshold %f\n",toneDet->lowThresh);
    printf("high frequency threshold %f\n",toneDet->highThresh);
    printf("variance threshold %f\n\n",VARIANCE_THRESHOLD);
#endif
    desa1Init(&toneDet->desa);
    firInit(filterLength, coeffs, &toneDet->bp);
}

/***********************************************************
 * Purpose: Initialize the FIR filter object
 *
 * Parameters:
 *      filterLength    number of taps in the filter
 *      coeffs          [in] pointer to filter coefficients
 *      o               [out] FIR filter object
 **********************************************************/
static void firInit(int filterLength, double *coeffs,
    FIRClass *o)
{
    o->filterLength = filterLength;
    o->coeffs = coeffs;
    memset( o->inSamp, 0, sizeof( o->inSamp ) );
}

/***********************************************************
 * Purpose: Apply FIR filter to a series of input samples
 *
 * Parameters:
 *      input           [in] pointer to input sample buffer
 *      output          [out] pointer to output sample buffer
 *      len             number of input samples to process
 *      o               [out] FIR filter object
 **********************************************************/
static void firFilt(double *input, double *output, int len,
    FIRClass *o )
{
    double acc;     // accumulator for MACs
    double *coeffp; // pointer to coefficients
    double *inputp; // pointer to input samples
    int n;
    int k;

    // put the new samples at the high end of the buffer
    memcpy( &o->inSamp[o->filterLength - 1], input,
        len * sizeof(double) );

    // apply the filter to each input sample
    for ( n = 0; n < len; n++ )
    {
        // calculate output n
        coeffp = o->coeffs;
        inputp = &o->inSamp[o->filterLength - 1 + n];
        acc = 0;
        for ( k = 0; k < o->filterLength; k++ )
        {
            acc += (*coeffp++) * (*inputp--);
        }
        output[n] = acc;
    }
    // shift input samples back in time for next time
    memmove( &o->inSamp[0], &o->inSamp[len],
        (o->filterLength - 1) * sizeof(double) );

}

Near the top is a #define for printing extra debugging information. Comment it out if you want to reduce the amount of printing when the program is run. The first function in the file is desa1 which implements the DESA-1 algorithm. The arguments to the function are a pointer to a block of input samples, a pointer to the mean of the frequency estimates (an output), a pointer to the variance of the frequency estimates (another output), and a pointer to detector object which stores the state of the algorithm.

The code starts out by applying the DESA-1 algorithm to the input samples to produce a block of frequency estimates freq[i]. Notice that the calculation leaves out a number of multiplications, a division, a square root, and the inverse sine. I did this to speed up the computations. Since the tone detector is going to look for a particular frequency, there is no need to follow the calculations through to get the answer in Hz. Instead, the thresholds for comparison will be presented in the same units as the estimates calculated.

Next the code calculates the average of the frequency estimates. The average smooths out short term variations in the frequency estimates. Again, a division is avoided by calculating the mean times the block size, instead of the actual mean. Also note that since the estimates are not in Hz (missing square root, and inverse sine) I am calculating the mean of a quantity that is non-linear as a function of the actual frequencies present. This is not a big problem since the values are nearly linear over a small range of values close to the frequency to be detected.

Finally, the desa1 function calculates the variance of the block of frequency estimates. The variance is used to ensure that the estimates are close to the frequency of interest. The mean on its own is not sufficient since it’s possible to have a block of samples with large deviations from the target, but with an average that is right on target. Again I have cut down on computations, so that the quantity calculated is actually the variance times the square of the block size.

Next comes the desa1Init function, which does nothing else but initialize the state of the DESA-1 detector.

The toneDetect function is a higher level tone detection function, which applies the bandpass filter to the input, applies the DESA-1 algorithm to the bandpassed samples, and then makes a detection decision based on the results. As arguments, it takes a pointer to the input samples, and a pointer to a tone detector object. It returns a value of 1 if the tone of interest has been detected, and 0 otherwise.

First the function applies a bandpass filter to the input samples, producing a block of bandpassed samples in the bpOutput array. Then DESA-1 is applied to the bandpassed samples to produce the mean and variance of the frequency estimates. Next, the mean and variance are compared to thresholds. If the mean is between the low and high thresholds, and the variance is below the variance threshold, then the detection time is incremented. The detection time is compared against a threshold to see if a minimum number of consecutive blocks have been detected. Once the detection time threshold has been met, the function will return a 1 (meaning that a tone is present). If the mean and variance don’t fall within the thresholds, then the detection time is cleared to zero, and the function returns a 0 result.

Next comes the toneDetectInit function. It is an important function because it initializes the thresholds used in the toneDetect function. The calling function passes in the low and high frequency thresholds in Hertz. In the sample program, the low threshold argument passed in is 2100 Hz minus two percent, and the high threshold is 2100 Hz plus two percent. The toneDetectInit function also initializes the bandpass filter.

The remaining functions in tone_detect.c implement the bandpass filter, and will not be discussed further. See this page for a description on how the filter works.

The code has been tested with several different frequencies with and without noise. Note that the performance degrades for low frequencies and for frequencies approaching 4000 Hz. This limitation is a property of the DESA algorithm, and a work-around is to resample the input to a higher or lower frequency (which also requires a change to the sampling frequency FS in tone_detect.h and re-designing the filter coefficients).

I also tried enabling and disabling the bandpass filter to see the effect on performance (to bypass the filter, pass input to the desa1 function instead of bpOutput in tone_detect.c). With additive white Gaussian noise, I saw an improvement of about 26 dB when the bandpass filter was used (for the parameters used in the example test.c). So there is in fact a large improvement in performance.

So there you have it, a tone detector using the Teager Kaiser Energy Operator that works in the presence of noise, has precise control of the frequency range, and has immunity to false detection due to speech.

4 Comments on “Tone Detection Using the Teager Kaiser Energy Operator (Part 2)”


  1. […] using the Teager Kaiser Energy Operator.  I have finally written something up for part two here.  Life with a young family has been quite busy for me.  If there is enough interest in the TKEO […]

  2. umair Says:

    I don’t understand is that why do you initialize the the filter object. One more thing can we do it as well with scratch memory and intermediate memory? Memset is just going to assign the memory. The main question I want to ask is what if we assume that out filter does have the memory before it begins the filtering operation. Just like we slide our filter through the input signal but we start from the extreme end.

    • Shawn Says:

      The filter memory (for the FIR filter) is initialized so that it doesn’t just have random values in it when the filter produces its first output. It is common to set the memory to all zeroes, but there are cases when other types of initialization are appropriate. Do you have a signal that continues for some time? Or is it short in duration?

      Assuming the filter has some memory before we start means you know something about what samples are there before you start. If you initialize the filter memory with the inputs, then you shouldn’t be starting with the first sample when you start the filtering. Is there a reason you want to do this? For example, I have done filtering requiring repeated initialization with inputs when filtering a spectrogram over frequencies instead of time.

  3. umair Says:

    I understand you. But I want to do filtering something like this:
    Suppose I have a signal of input length 24000 and I divide it into 6 blocks each of length 4000. Now suppose my signal is only on the positive x-axis and the number of taps I am using for the filter is 10(length of filter). Now, I move my filter from the left side inside the signal. When half of my filter is inside the block half is outside. When I am finished with the 1st block I will go to the next block, but for the next block half of my filter is inside the 2nd block but half inside the 1st block. Some idea like this. This is becoming difficult to program. Is there any way I can draw you a sketch of my explanation? I know its difficult to realize. About memory.. I set the two memories for the filters one is scratch memory and another is intermediate memory. The purpose of the scratch memory is to avoid stack overflow and the intermediate memory is used to store the output of the filter.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: