Some speech recognizers and machine learning algorithms need to quickly calculate the quantity:

when given only and . The straightforward algorithm first uses the exponential function to convert the log values to the linear domain representation and , then performs the sum, and finally uses the log function to convert back to the log domain:

These conversions between log and linear domains are slow and problems arise when is too large or small for a machine’s floating point representation.

Luckily, there is a clever approximation method that allows quick, accurate calculation with a relatively small precomputed table.

## Derivation

The table-based approximation can be derived by deciding to first look for a function such that:

Let’s consider the case where , without loss of generality. It looks like this:

Via algebra and some logarithm identities, solve for in terms of and :

The function can be pre-computed and stored in a table . A table is particularly apt because it is:

- Easily indexed as a function of , since this is a non-negative number
- Reusable across the entire range of , since it depends only on the relative difference between and
- Compact, since the contribution is small when

The table is pre-calculated as:

and used at run-time as:

Where a simple and quick-to-compute discretization function has been defined to round to table indexes. Let’s call the parameters sampling frequency (), rounding threshold (), and table size (). These affect the approximation accuracy and are described in the next section. For now, the figure below shows the original function and the table approximation when and :

## Approximation Error

Approximation error is affected by the table length and the discretization function. These are parameterized by the quantities:

- – table length
- – rounding threshold
- – “sampling frequency”

Since this blog post is stunningly long already, won’t be analyzed in detail. Predictably, increasing the sampling frequency decreases the approximation error at the cost of storing more samples in the table. Since the largest error occurs at the start of the table, increasing is the best way to lower the maximum error without changing the discretization function.

### Table length and

Longer tables are theoretically better than shorter tables. However, long tables can hurt cache performance and, after a certain point, the table contribution is so small that ignoring it barely affects error. Limiting the size of the table is a smart speed/precision trade-off.

It is possible to provide an upper-bound on the largest *useful* table when the answer to is stored in a limited precision representation. Consider that the smallest value that can be represented in single-precision floating point is . Be generous and allow all intermediate calculations to use arbitrary precision, and allow rounding the result before storing it in single-precision as the final answer. If the table element used contributes less than half of the smallest single-precision-representable value, this table element can never affect the stored result:

And solving for gives:

Which is the surprisingly small when .

### Rounding to an index and

The table index is proportional to . The parameter determines the discretization of this quantity, so that it can be mapped to an integer and used as a table index. When , the discretization operation is truncation. When , the discretization operation is standard rounding.

If fast evaluation is the one and only goal, use truncation by setting . Otherwise, consider the error due to the table approximation:

Which, for , looks like:

An looking at the error at the start of the table at particular values of gives:

One choice of would be the value that minimizes the total table error:

This integral is difficult to evaluate and minimize, in part due to the floor function and the discontinuities it introduces. Frequency and table length both affect the optimal value. The figures below illustrate the behavior of this equation. In an appendix at the end of this post, near-optimal values of are provided for different settings of .

The estimated optimal value of as varies:

## Log Base trick

Changing the logarithm base has the same effect on error as changing the sampling frequency . If the value of the logarithm base is not important in this part of the application, save a multiplication by setting and choosing the log base that achieves the same approximation error.

An attempt at an intuitive, practical explanation is that increasing maps the set of values to a larger set of integers. Changing the log base has the same effect, since:

That is equivalent to . Solving for the log base gives for .

## Conclusion: Inspiration and Resources

I orginally found this table-based approximation method in the Sphinx 4 source code. It is useful in several places, one of which is in calculating the likelihood of a frame of audio data given a Gaussian mixture model (GMM). This calculation lies in what is effectively the inner-most loop of the speech recognizer. It runs tens of thousands of times each second and speeding it up is a worthwhile optimization.

The source code of LogMath.java and its javadoc documents this beautiful little algorithm well. It would have never caught my attention without that careful documentation. From what I remember, the documentation describes a derivation, some types of log-base tricks, and a similar idea behind how the table length can be bounded. This post builds on those explanations, with a slightly better-framed derivation, an analysis of that is completely new, and pretty pictures.

This post is also much, much longer. I kept finding new and interesting tidbits as this became my hobby for the month. I couldn’t bring myself to leave some of the gems unindexed. And there are still interesting questions left to be answered. Some of these depend on the details of different applications but there are also general questions like: is it ever worth making a 2 dimensional table to approximate the calculation ?

## Appendix: Table of Good Values for

Good values of as varies, on the basis of minimizing error over the first 100 entries in the table:

lower-bound on error | ||
---|---|---|

1 | 0.588644 | 0.169006 |

2 | 0.54489 | 0.0861034 |

3 | 0.529998 | 0.057602 |

4 | 0.522519 | 0.043254 |

5 | 0.518022 | 0.0346227 |

6 | 0.515018 | 0.0288611 |

7 | 0.512875 | 0.0247426 |

8 | 0.511267 | 0.0216523 |

9 | 0.510017 | 0.0192478 |

10 | 0.509073* | 0.0173243* |

15 | 0.506008 | – |

20 | 0.504494 | – |

25 | 0.50357 | – |

30 | 0.502952 | – |

35 | 0.5025 | – |

40 | 0.502159 | – |

45 | 0.501895 | – |

50 | 0.501684 | – |

100 | 0.5009* | 0.00173275* |

200 | 0.500353 | – |

300 | 0.500245 | – |

400 | 0.50018 | – |

500 | 0.500143 | – |

600 | 0.500115 | – |

700 | 0.500081 | – |

800 | 0.500091 | – |

900 | 0.500019 | – |

1000 | 0.500062* | 0.0000950386* |

The numbers with asterisks were calculated over the first 1000 table entries. This gives a better estimate of the best value of for a table of this length or longer. With more entries, the optimal value of inches up, but not greatly. However, the error lower-bound is significantly affected when more samples are used, so only a few of the sufficiently-sampled lower-bounds were shown.

Tags: approximation, log, logarithm, math, optimization

## Leave a Reply