Rewrite complex log1p to improve precision. #26798
Conversation
I seem to recall some literature about computing log1p; there were some tricks, but that was some time ago and I don't know if they would still apply.
It would be nice if we could push this problem off to the math libraries, but I guess that isn't to be...

Overall, it seems to me like we should probably go ahead here. @WarrenWeckesser I wonder if you can ask someone from SciPy to have a look at the code as well? It looks fine, but I didn't try to follow the math fully and would love a bit more `mpmath` tries/scans to be on the safe side (although I am sure you already did that).

I did make a few comments; the biggest one is that I don't trust that `NO_OPT` stuff, and I think `volatile` may do the trick.
```c
 * when the input is close to the unit circle centered at -1+0j.
 */
NPY_NO_EXPORT void
clog1p@c@(@np_cplx_type@ *x, @np_cplx_type@ *r)
```
Would have a mild preference for giving this a different name. `clog1p` isn't clear that it is only relevant in parts of the domain.

Might be nice to avoid the `.src` style templating. I think that would mean making this a static (inline) templated function and then creating three explicit stubs that call it below. Since each stub has a single line, I would hope it isn't significantly longer.
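A minimal sketch of that restructuring; the names and the placeholder body are illustrative, not the PR's actual code:

```cpp
#include <complex>
#include <iostream>

// Shared logic in one static templated function instead of a .src
// template expanded per type.  (Placeholder body: the real algorithm
// switches to the high-precision path near |z + 1| == 1.)
template <typename T>
static inline std::complex<T>
log1p_complex(std::complex<T> z)
{
    return std::log(std::complex<T>(1) + z);
}

// Three explicit one-line stubs replace the @c@ expansion, so the file
// should not end up significantly longer.
std::complex<float>       clog1pf(std::complex<float> z)       { return log1p_complex(z); }
std::complex<double>      clog1p (std::complex<double> z)      { return log1p_complex(z); }
std::complex<long double> clog1pl(std::complex<long double> z) { return log1p_complex(z); }

int main()
{
    std::cout << clog1p({1e-3, 1e-3}) << "\n";  // exercises the double stub
}
```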
```c
@@ -0,0 +1,260 @@
#ifndef LOG1P_COMPLEX_H
```
This is C++ only, I think, so let's rename it to `.hpp`?

Also please adjust the include guard to be the google-style full file path (see other files).
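For example, assuming the header ends up at `numpy/_core/src/umath/log1p_complex.hpp` (the exact path here is a guess):

```cpp
#ifndef NUMPY_CORE_SRC_UMATH_LOG1P_COMPLEX_HPP_
#define NUMPY_CORE_SRC_UMATH_LOG1P_COMPLEX_HPP_

// ... C++-only declarations ...

#endif  // NUMPY_CORE_SRC_UMATH_LOG1P_COMPLEX_HPP_
```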
```c
CPP_WRAP2(atan2)
CPP_WRAP2(hypot)

}
```
I guess we don't have a nicer C++ pattern/common spot for these yet, so this looks fine.
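For context, one plausible shape for such a wrapper macro: it turns a C `name##f`/`name`/`name##l` family into a single overloaded C++ name that templated code can call for any floating-point type. This is a guess at the pattern, not the PR's actual macro; the sketch uses the `<math.h>` functions so it compiles standalone, where the PR presumably wraps NumPy's `npy_*` equivalents.

```cpp
#include <math.h>

// Guess at the pattern: one overload per floating-point type, each
// dispatching to the matching C library function.
#define CPP_WRAP2(name)                                       \
    inline float name(float x, float y)                       \
        { return ::name##f(x, y); }                           \
    inline double name(double x, double y)                    \
        { return ::name(x, y); }                              \
    inline long double name(long double x, long double y)     \
        { return ::name##l(x, y); }

namespace npy_wrap {
CPP_WRAP2(atan2)
CPP_WRAP2(hypot)
}

// Templated code can now call npy_wrap::atan2(y, x) for any
// floating-point type T.
```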
```c
// __i386 are defined also turns off the use of the floating-point
// unit '387'. It is important that when the type is, for example,
// `double`, these functions compute their results with 64 bit
// precision, and not with 80 bit extended precision.
```
This makes me worried that this may break e.g. on MSVC (depending on optimization levels)!? Are there tests that would fail if it does?

I suspect you should rather use something like (may require 2 volatiles):

```c
volatile T tmp = t - x;
t = t - tmp;
```

I suppose that may fail with `fast-math` enabled. If that failure is "graceful", as in: you get bad precision, I would ignore it. If you compile NumPy with fast-math you should get test failures.
I was trying to find a failure mode for either (also the SSE forcing), but beyond the Intel compiler effectively defaulting to fast-math, I guess, I couldn't.

The `volatile` always adds some store/load instructions, which maybe avoids the use of the 80-bit FPU registers as well (but maybe is bad for speed?).
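For what it's worth, the trick being suggested amounts to the classic Fast2Sum error-free transformation, with `volatile` forcing intermediates out to memory so that neither algebraic simplification nor 80-bit x87 registers can interfere. A sketch under those assumptions, not the PR's code:

```cpp
#include <cstdio>

// Fast2Sum: for |a| >= |b|, hi + lo == a + b exactly.  The volatile
// stores keep the compiler from simplifying (s - a) to b and from
// carrying extra precision in x87 registers.  (fast-math may still
// break this, matching the caveat above.)
template <typename T>
static void fast_two_sum(T a, T b, T &hi, T &lo)
{
    volatile T s = a + b;   // rounded sum, forced through memory
    volatile T t = s - a;   // the part of b that survived the rounding
    hi = s;
    lo = b - t;             // the rounding error of a + b
}

int main()
{
    double hi, lo;
    fast_two_sum(1.0, 1e-18, hi, lo);
    std::printf("hi = %g, lo = %g\n", hi, lo);  // hi = 1, lo = 1e-18
}
```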
```python
# print(w.imag)
#
@pytest.mark.skipif(np.finfo(np.longdouble).machep != -112,
                    reason='reference values are for IEEE float128')
```
Happy to keep the test; I am a bit surprised, since if I understand right, this test is for true quad precision. I am not sure that we have CI that runs with true quad precision.
```c
@@ -30,6 +30,7 @@
#include "string_ufuncs.h"
#include "stringdtype_ufuncs.h"
#include "special_integer_comparisons.h"
#include "clog1p_wrappers.h"
```
Maybe nicer to include this in the `funcs.inc.src` code, since it is used here only indirectly? OTOH, I guess a few other includes should maybe also be moved by that logic.
Closes gh-22609

TL;DR: To implement `log1p(z)` for complex input `z = x + y*i`, use `log(1+z)` where that calculation doesn't lose precision. Otherwise, compute the real part of `log1p(z)` as `0.5*log1p(x**2 + 2*x + y**2)`, and use variables with doubled precision to avoid loss of precision in the expression `x**2 + 2*x + y**2`.
The formula for the complex logarithm of $z = x + iy$ is $\log z = \log |z| + i\arg z$, where $|z| = \left(x^2 + y^2\right)^{\frac{1}{2}}$, so for $\textrm{log1p}(z)$ we have

$$\textrm{log1p}(z) = \log(1 + z) = \frac{1}{2}\log\left((x+1)^2 + y^2\right) + i\,\textrm{atan2}(y, x+1)$$

I'll call this the naive formula.

The computation of $\textrm{atan2}(y, x+1)$ is stable, so we do not have to do anything special to compute it accurately. The computation of the real part, $\frac{1}{2}\log((x+1)^2 + y^2)$, can lose precision in some regions of the complex plane. The example from gh-22609 is `z = 1e-18 + 1e-18j`, where `log(1+z)` returns `1e-18j` but the correct value is `1e-18 + 1e-18j`. (Reference values are computed with mpmath.) The reason for the incorrect result is clear from the expression $\log((x+1)^2 + y^2) = \log(1 + 2x + x^2 + y^2)$. When $x$ and $y$ are sufficiently small, the terms $2x$, $x^2$ and $y^2$ are smaller than the machine $\varepsilon$, so the result is $\log(1) = 0$. So clearly there is a problem with the naive formula when $x$ and $y$ are small. We should use $\textrm{log1p}(2x + x^2 + y^2)$, but that expression turns out to still have a problem with precision loss.

Another example is `z = (-0.2892646601647662 + 0.7034589443707598j)`, where the correct value of `log1p(z)` is `-3.951471916126033e-07 + 0.7802529500871193j`, but the naive formula `log(1 + z)` (on a platform compiled with gcc 11.4.0) gives `-3.9514719157314957e-07+0.7802529500871193j`. The relative error of the real part is about `1e-10`, and a similar relative error is observed in NumPy built on a Mac with Apple clang 14.0.3. The $x$ and $y$ values are not particularly small, but there is still loss of precision. Notice that the real part of the result is small; that occurs when $(x+1)^2 + y^2 = 1 + 2x + x^2 + y^2 \approx 1$. The Taylor expansion of the real log function $\log(u)$ at $u=1$ is $(u - 1) - (u - 1)^2/2 + \ldots$, so it is not surprising that there can be precision issues near 1. Moreover, for $1 + 2x + x^2 + y^2$ to be close to $1$, $2x + x^2 + y^2$ must be small. And that implies that there must be cancellation between $2x$ and $x^2 + y^2$ (so the use of $\textrm{log1p}(2x + x^2 + y^2)$ doesn't eliminate the precision problem). The region in the complex plane where this happens is near the unit circle centered at $z_0 = -1 + 0j$; that is, where $|z + 1| = 1$.

The severity of this loss of precision seems to depend on the details of the implementation of the complex log function in the standard library. On my Linux system, where I used gcc 11.4.0 to build NumPy, there is loss of precision near the unit circle centered at -1+0j only in the arc corresponding to x > -0.5. On a Mac where I built NumPy with Apple clang version 14.0.3, there is loss of precision near the entire circle. In this PR, the loss of precision is handled by implementing a higher precision version of the calculation that is only used in a region near that unit circle.
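Both examples above can be reproduced directly; the following is a small standalone sketch of the naive formula, checked against the mpmath reference values quoted in this description (exact digits of the naive results vary by platform and libm):

```cpp
#include <complex>
#include <cstdio>

int main()
{
    // Tiny input: 1.0 + 1e-18 rounds to 1.0, so the 2x term is lost
    // before log is even called and the naive real part is ~0.
    std::complex<double> z1(1e-18, 1e-18);
    std::printf("naive:     %.17g\n", std::log(1.0 + z1).real());
    std::printf("reference: %.17g\n", 1e-18);  // mpmath value quoted above

    // Near the unit circle |z + 1| == 1: 2x cancels against x^2 + y^2,
    // so the naive real part loses several digits.
    std::complex<double> z2(-0.2892646601647662, 0.7034589443707598);
    std::printf("naive:     %.17g\n", std::log(1.0 + z2).real());
    std::printf("reference: %.17g\n", -3.951471916126033e-07);  // mpmath
}
```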
The high precision calculation is implemented in C++ in `log1p_complex.h`. The essential idea is simple: compute the real part of the log with the formula $\textrm{log1p}(x^2 + 2x + y^2)$, and compute $x^2 + 2x + y^2$ with higher precision than provided by the input types. Single precision `float` is upgraded to `double`, and `double` is upgraded to `long double` if `long double` has at least 106 bits of precision. Lower precision `double` types, and the `long double` type, are upgraded by using a doubled type (usually called double-double, but in the case of `long double`, the doubled type could be called double-long double). Only the minimal set of functions needed to compute $x^2 + 2x + y^2$ with doubled variables is included.
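The doubled-variable idea can be illustrated in a few lines. This is a sketch of the general technique (error-free transformations), not the actual contents of `log1p_complex.h`:

```cpp
#include <cmath>
#include <cstdio>

// A value represented as the unevaluated sum hi + lo of two doubles.
struct dd { double hi, lo; };

// Exact product via FMA: hi + lo == a * b exactly.
static dd two_prod(double a, double b)
{
    double hi = a * b;
    return {hi, std::fma(a, b, -hi)};
}

// Knuth's TwoSum: hi + lo == a + b exactly, no ordering requirement.
static dd two_sum(double a, double b)
{
    double s = a + b;
    double v = s - a;
    double e = (a - (s - v)) + (b - v);
    return {s, e};
}

// x^2 + 2x + y^2, folding the low-order parts back in after the
// cancellation between 2x and x^2 + y^2.
static double x2_plus_2x_plus_y2(double x, double y)
{
    dd x2 = two_prod(x, x);
    dd y2 = two_prod(y, y);
    dd s  = two_sum(x2.hi, y2.hi);   // x^2 + y^2, exactly split
    dd t  = two_sum(s.hi, 2.0 * x);  // the cancellation happens here
    return t.hi + (t.lo + s.lo + x2.lo + y2.lo);
}

int main()
{
    // Point near the unit circle centered at -1+0j; the naive
    // expression loses several digits to cancellation.
    double x = -0.2892646601647662, y = 0.7034589443707598;
    std::printf("naive:   %.17g\n", x * x + 2.0 * x + y * y);
    std::printf("doubled: %.17g\n", x2_plus_2x_plus_y2(x, y));
}
```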
The files `clog1p_wrappers.cpp.src` and `clog1p_wrappers.h` provide the shims for calling C++ code from C in `umathmodule.c`. The functions `nc_log1p@c@(...)` in `funcs.inc.src` check whether the input is close enough to the unit circle centered at -1+0j to use the high precision calculation; otherwise they simply call `nc_log@c@(...)`
with input `1+z`.

Tests are included for `np.complex64`, `np.complex128`, and the various flavors of platform-dependent `np.clongdouble`.

Because the new code uses `log(1+z)` for points away from the circle, it benefits from whatever behavior is included in the standard library's implementation (or NumPy's implementation if the standard library's version is blocklisted). In particular, this PR fixes an obscure bug in the main branch that occurs when the input $z = x + iy$ is so large that $\textrm{hypot}(x, y)$ is larger than the input type's maximum value. With the main development branch, we got an overflow and the real part of the result is `inf`; with the change in this pull request, we get the correct result (at least on the platform where I ran this code).
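A standalone sketch of that failure mode, with an input chosen here (illustratively) so that $\textrm{hypot}(x, y)$ exceeds the largest double:

```cpp
#include <cmath>
#include <cstdio>

int main()
{
    // |1 + z| ~ 2.1e308 > DBL_MAX, so computing hypot directly overflows
    // and the real part of the naive log1p comes out as inf.
    double x = 1.5e308, y = 1.5e308;
    std::printf("naive:  %g\n", std::log(std::hypot(x + 1.0, y)));

    // Rescaling before the log avoids the overflow:
    //   log|1 + z| = log(m) + log(hypot((x + 1)/m, y/m)),
    //   with m = max(|x + 1|, |y|)
    double m = std::fmax(std::fabs(x + 1.0), std::fabs(y));
    double scaled = std::log(m) + std::log(std::hypot((x + 1.0) / m, y / m));
    std::printf("scaled: %.17g\n", scaled);  // ~709.948, perfectly finite
}
```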
Performance changes
The new version is slower, but that is not because of the use of the doubled types. It uses the standard library to compute `log(1+z)` whenever the high precision calculation is not needed, and that function (at least on my Linux box) is slower than the implementation of complex `log1p` in main. But the implementation in the main branch doesn't have any additional checks that might be in the standard library (see the "obscure bug" mentioned above).

Timings were taken for computing `log1p(z)` on samples from a normal distribution around 0+0j with standard deviation 2, first with the main branch and then with this pull request.
The especially poor performance with `np.complex64` also occurs with `np.log(z)`, so it has nothing to do with this pull request, other than this change relying on `log(z)` for the "safe" case (see #26807). The slowdown for `np.complex128` and `np.complex256` is the result of deferring to `log(1+z)` most of the time; the performance of `log1p(z)` is basically the same as `log(1+z)`.