Some speech recognizers and machine learning algorithms need to quickly calculate the quantity:

$$\log(x + y)$$

when given only $\log(x)$ and $\log(y)$. The straightforward algorithm first uses the exponential function to convert the log values to the linear domain representation $x$ and $y$, then performs the sum, and finally uses the log function to convert back to the log domain:

$$\log(x + y) = \log\!\left(e^{\log(x)} + e^{\log(y)}\right)$$
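A minimal sketch of the straightforward approach in Python (the function name is mine, not from the original post):

```python
import math

def log_add_naive(log_x, log_y):
    """Straightforward log-domain addition: exponentiate, add, take the log."""
    return math.log(math.exp(log_x) + math.exp(log_y))

print(log_add_naive(math.log(2.0), math.log(3.0)))  # 1.6094..., i.e. log(5)
```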
These conversions between log and linear domains are slow, and problems arise when $x$ or $y$ is too large or too small for a machine's floating point representation.
Luckily, there is a clever approximation method that allows quick, accurate calculation with a relatively small precomputed table.
Derivation
The table-based approximation can be derived by deciding to first look for a function $f$ such that:

$$\log(x + y) = \log(x) + f\big(\log(x), \log(y)\big)$$

Let's consider the case where $x \ge y$, without loss of generality. It looks like this:

On the left, $\log(x+y)$ and $\log(x)$ are plotted in 3D for a range of $x$ and $y$. The slice where $y = 1$ is shown in the 2D plot on the right. The difference between the two curves, shaded on the right, is the contribution from $y$.
Via algebra and some logarithm identities, solve for $f$ in terms of $\log(x)$ and $\log(y)$:

$$f\big(\log(x), \log(y)\big) = \log(x+y) - \log(x) = \log\!\left(\frac{x+y}{x}\right) = \log\!\left(1 + \frac{y}{x}\right) = \log\!\left(1 + e^{\log(y) - \log(x)}\right)$$
The function $f$ can be pre-computed and stored in a table $T$. A table is particularly apt because it is:
- Easily indexed as a function of $\log(x) - \log(y)$, since this is a non-negative number
- Reusable across the entire range of $x$ and $y$, since it depends only on the relative difference between $\log(x)$ and $\log(y)$
- Compact, since the contribution is small when $\log(x) - \log(y)$ is large
The table is pre-calculated as:

$$T[i] = \log\!\left(1 + e^{-i/\omega}\right), \qquad i = 0, 1, \ldots, N - 1$$

and used at run-time as:

$$\log(x + y) \approx \log(x) + T\big[\,\lfloor \omega\,(\log(x) - \log(y)) + \phi \rfloor\,\big]$$

Where a simple and quick-to-compute discretization function, $\lfloor \omega z + \phi \rfloor$, has been defined to round $\log(x) - \log(y)$ to table indexes. Let's call the parameters sampling frequency ($\omega$), rounding threshold ($\phi$), and table size ($N$). These affect the approximation accuracy and are described in the next section. For now, the figure below shows the original function and the table approximation for one setting of $\omega$ and $\phi$:
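A compact sketch of the pre-computation and the run-time lookup (Python; names and parameter values are mine, and the Sphinx implementation differs in detail):

```python
import math

def build_table(omega, size):
    """Pre-compute T[i] = log(1 + exp(-i / omega)) for i = 0 .. size - 1."""
    return [math.log(1.0 + math.exp(-i / omega)) for i in range(size)]

def log_add(log_x, log_y, table, omega, phi=0.5):
    """Approximate log(x + y) from log(x) and log(y) via the table."""
    if log_x < log_y:                       # ensure log(x) >= log(y), i.e. x >= y
        log_x, log_y = log_y, log_x
    i = int(omega * (log_x - log_y) + phi)  # discretize the non-negative difference
    if i >= len(table):                     # y's contribution is negligible
        return log_x
    return log_x + table[i]

T = build_table(omega=1000.0, size=20000)
print(log_add(math.log(2.0), math.log(3.0), T, omega=1000.0))  # ~log(5) = 1.6094...
```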
![log(x+y) and log(x) + T[i] shown in 3D and in 2D along the slice y=1](https://pwnetics.files.wordpress.com/2013/02/logxy-approximatelogxy.png?w=450&h=178)
On the left, $\log(x+y)$ and $\log(x) + T[i]$ are plotted in 3D for a range of $x$ and $y$. The slice where $y = 1$ is shown in the 2D plot on the right. Note the discontinuities introduced by the floor function in the approximation.
Approximation Error
Approximation error is affected by the table length and the discretization function. These are parameterized by the quantities:
- $N$ – table length
- $\phi$ – rounding threshold
- $\omega$ – “sampling frequency”
Since this blog post is stunningly long already, $\omega$ won’t be analyzed in detail. Predictably, increasing the sampling frequency decreases the approximation error at the cost of storing more samples in the table. Since the largest error occurs at the start of the table, increasing $\omega$ is the best way to lower the maximum error without changing the discretization function.
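A quick numerical illustration of that trade-off (a rough sketch; the grid and the chosen values of $\omega$ are mine):

```python
import math

def max_table_error(omega, phi=0.5, z_max=10.0, points=100000):
    """Largest |log(1 + e^-z) - T[floor(omega*z + phi)]| over a dense grid of z."""
    table = [math.log(1.0 + math.exp(-i / omega)) for i in range(int(omega * z_max) + 2)]
    worst = 0.0
    for k in range(points):
        z = z_max * k / points
        worst = max(worst, abs(math.log(1.0 + math.exp(-z)) - table[int(omega * z + phi)]))
    return worst

for omega in (1, 10, 100):
    print(omega, max_table_error(omega))   # max error shrinks as omega grows
```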
Table length and $N$
Longer tables are theoretically better than shorter tables. However, long tables can hurt cache performance and, after a certain point, the table contribution is so small that ignoring it barely affects error. Limiting the size of the table is a smart speed/precision trade-off.
It is possible to provide an upper-bound on the largest useful table when the answer to $\log(x + y)$ is stored in a limited precision representation. Consider that the smallest positive value that can be represented in single-precision floating point is roughly $1.4 \times 10^{-45}$. Be generous and allow all intermediate calculations to use arbitrary precision, and allow rounding the result before storing it in single-precision as the final answer. If the table element used contributes less than half of the smallest single-precision-representable value, this table element can never affect the stored result:

$$T[i] = \log\!\left(1 + e^{-i/\omega}\right) \approx e^{-i/\omega} < \frac{1.4 \times 10^{-45}}{2}$$

And solving for $i$ gives:

$$i > \omega \, \ln\!\left(\frac{2}{1.4 \times 10^{-45}}\right) \approx 104\,\omega$$

Which is the surprisingly small $N \approx 104$ when $\omega = 1$ and natural logarithms are used.
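A sketch of that bound computed directly (assuming natural logs, and using the single-precision denormal minimum as the cutoff, per the argument above):

```python
import math

EPS_MIN = 1.4e-45   # smallest positive single-precision (denormal) value

def largest_useful_index(omega):
    """Smallest i beyond which T[i] = log(1 + e^(-i/omega)) drops below EPS_MIN / 2."""
    # For tiny u, log(1 + u) ~= u, so solve e^(-i/omega) = EPS_MIN / 2 for i.
    return math.ceil(omega * math.log(2.0 / EPS_MIN))

print(largest_useful_index(1))     # ~104
print(largest_useful_index(1000))  # ~103972
```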
Rounding to an index and $\phi$
The table index is proportional to $\log(x) - \log(y)$. The parameter $\phi$ determines the discretization of this quantity, so that it can be mapped to an integer and used as a table index. When $\phi = 0$, the discretization operation is truncation. When $\phi = 0.5$, the discretization operation is standard rounding.
If fast evaluation is the one and only goal, use truncation by setting $\phi = 0$. Otherwise, consider the error due to the table approximation:

$$E(z) = \Big|\, \log\!\left(1 + e^{-z}\right) - T\big[\lfloor \omega z + \phi \rfloor\big] \,\Big|, \qquad z = \log(x) - \log(y)$$

Which, for a fixed $\omega$, looks like:

Two views of the table approximation error as $\phi$ and $z$ vary. Error is greatest at small indexes. Error is predictably zero at the table indexes because these are the recorded samples from the underlying function. Note that, besides looking neat, we can see some values of $\phi$ are obviously better than others.
And looking at the error at the start of the table at particular values of $\phi$ gives:

Table approximation error for $\phi = 0$ (truncation) and $\phi = 0.5$ (rounding). Note the approximation error is zero at the table indexes and that truncation introduces worse error than rounding.
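For example, a quick check of the worst error over the first table cell (my own harness; the exact numbers depend on $\omega$):

```python
import math

def start_of_table_error(omega, phi, points=10000):
    """Worst-case error over the first table cell, where the function is steepest."""
    table = [math.log(1.0 + math.exp(-i / omega)) for i in range(3)]
    worst = 0.0
    for k in range(points):
        z = (k / points) / omega                 # z spans [0, 1/omega)
        worst = max(worst, abs(math.log(1.0 + math.exp(-z)) - table[int(omega * z + phi)]))
    return worst

print(start_of_table_error(omega=1, phi=0.0))  # truncation: ~0.38
print(start_of_table_error(omega=1, phi=0.5))  # rounding:   ~0.22
```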
One choice of $\phi$ would be the value that minimizes the total table error:

$$\phi^{*} = \arg\min_{\phi} \int_{0}^{N/\omega} \Big|\, \log\!\left(1 + e^{-z}\right) - T\big[\lfloor \omega z + \phi \rfloor\big] \,\Big| \, dz$$

This integral is difficult to evaluate and minimize, in part due to the floor function and the discontinuities it introduces. Frequency and table length both affect the optimal value. The figures below illustrate the behavior of this equation. In an appendix at the end of this post, near-optimal values of $\phi$ are provided for different settings of $\omega$.

Discretization error for different $\phi$ and $\omega$. The piecewise horizontal function is defined by the table approximation; its values are the sampled values from the continuous function. At low frequencies, the continuous function is curvy enough that error is minimized when $\phi$ is set a tad above the value that would give standard rounding. Here, the chosen $\phi$ is very close to the optimal value. However, as the sampling frequency increases, the continuous function is well-approximated locally by straight lines. So the optimal value of $\phi$ approaches $0.5$, which is standard rounding.
The estimated optimal value of $\phi$ as $\omega$ varies:

Estimated optimal $\phi$ as $\omega$ varies. See the table in the appendix for the source data. Note that as $\omega$ becomes large, traditional rounding gives lower errors.
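The appendix values could be reproduced with something along these lines (a brute-force grid search standing in for the integral; the grid resolutions are my choice):

```python
import math

def total_error(omega, phi, entries=100, samples_per_cell=200):
    """Approximate the error integral over the first `entries` table cells."""
    table = [math.log(1.0 + math.exp(-i / omega)) for i in range(entries + 1)]
    total, dz, z = 0.0, 1.0 / (omega * samples_per_cell), 0.0
    while z < entries / omega:
        total += abs(math.log(1.0 + math.exp(-z)) - table[int(omega * z + phi)]) * dz
        z += dz
    return total

def best_phi(omega, grid=200):
    """Grid search for the phi in [0, 1) with the smallest total error."""
    return min((k / grid for k in range(grid)), key=lambda phi: total_error(omega, phi))

print(best_phi(1))   # roughly 0.59, cf. the appendix
print(best_phi(10))  # roughly 0.51
```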
Log Base trick
Changing the logarithm base has the same effect on error as changing the sampling frequency $\omega$. If the value of the logarithm base is not important in this part of the application, save a multiplication by setting $\omega = 1$ and choosing the log base that achieves the same approximation error.
An attempt at an intuitive, practical explanation is that increasing $\omega$ maps the set of values $\log_b(x) - \log_b(y)$ to a larger set of integers. Changing the log base has the same effect, since:

$$\omega \log_b(x) = \omega \, \frac{\ln(x)}{\ln(b)} = \frac{\ln(x)}{\ln\!\left(b^{1/\omega}\right)}$$

That is equivalent to $\log_{b'}(x)$. Solving for the log base gives $b' = b^{1/\omega}$ for the equivalent base $b'$.
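A tiny numeric sanity check of that equivalence (the particular $\omega$, base, and operands are arbitrary):

```python
import math

omega, b = 1000.0, math.e            # original sampling frequency and log base
b_prime = b ** (1.0 / omega)         # equivalent base with omega = 1

x, y = 7.5, 0.002
scaled  = omega * (math.log(x, b) - math.log(y, b))
rebased = math.log(x, b_prime) - math.log(y, b_prime)
print(scaled, rebased)               # identical up to floating-point error
```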
Conclusion: Inspiration and Resources
I originally found this table-based approximation method in the Sphinx 4 source code. It is useful in several places, one of which is in calculating the likelihood of a frame of audio data given a Gaussian mixture model (GMM). This calculation lies in what is effectively the inner-most loop of the speech recognizer. It runs tens of thousands of times each second and speeding it up is a worthwhile optimization.
The source code of LogMath.java and its javadoc document this beautiful little algorithm well. It would have never caught my attention without that careful documentation. From what I remember, the documentation describes a derivation, some types of log-base tricks, and a similar idea behind how the table length can be bounded. This post builds on those explanations, with a slightly better-framed derivation, an analysis of $\phi$ that is completely new, and pretty pictures.
This post is also much, much longer. I kept finding new and interesting tidbits as this became my hobby for the month. I couldn’t bring myself to leave some of the gems unindexed. And there are still interesting questions left to be answered. Some of these depend on the details of different applications, but there are also general questions like: is it ever worth making a 2-dimensional table to approximate the calculation?
Appendix: Table of Good Values for $\phi$
Good values of $\phi$ as $\omega$ varies, on the basis of minimizing error over the first 100 entries in the table:
$\omega$ | $\phi$ | lower-bound on error |
---|---|---|
1 | 0.588644 | 0.169006 |
2 | 0.54489 | 0.0861034 |
3 | 0.529998 | 0.057602 |
4 | 0.522519 | 0.043254 |
5 | 0.518022 | 0.0346227 |
6 | 0.515018 | 0.0288611 |
7 | 0.512875 | 0.0247426 |
8 | 0.511267 | 0.0216523 |
9 | 0.510017 | 0.0192478 |
10 | 0.509073* | 0.0173243* |
15 | 0.506008 | – |
20 | 0.504494 | – |
25 | 0.50357 | – |
30 | 0.502952 | – |
35 | 0.5025 | – |
40 | 0.502159 | – |
45 | 0.501895 | – |
50 | 0.501684 | – |
100 | 0.5009* | 0.00173275* |
200 | 0.500353 | – |
300 | 0.500245 | – |
400 | 0.50018 | – |
500 | 0.500143 | – |
600 | 0.500115 | – |
700 | 0.500081 | – |
800 | 0.500091 | – |
900 | 0.500019 | – |
1000 | 0.500062* | 0.0000950386* |
The numbers with asterisks were calculated over the first 1000 table entries. This gives a better estimate of the best value of $\phi$ for a table of this length or longer. With more entries, the optimal value of $\phi$ inches up, but not greatly. However, the error lower-bound is significantly affected when more samples are used, so only a few of the sufficiently-sampled lower-bounds were shown.