Rewrite complex log1p to improve precision. #26798


Open — wants to merge 1 commit into base: main
Conversation

@WarrenWeckesser (Member) commented on Jun 26, 2024:

Closes gh-22609

TL;DR: To implement log1p(z) for complex input z = x + y*i, use log(1+z) where that calculation doesn't lose precision. Otherwise, compute the real part of log1p(z) as 0.5*log1p(x**2 + 2*x + y**2), and use variables with doubled precision to avoid loss of precision in the expression x**2 + 2*x + y**2.


The formula for the complex logarithm of $z = x + iy$ is $\log z = \log |z| + i\arg z$, where $|z| = \left(x^2 + y^2\right)^{\frac{1}{2}}$, so for $\textrm{log1p}(z)$ we have

$$ \begin{split} \textrm{log1p}(z) & = \log(1 + z) \\ & = \frac{1}{2}\log((x+1)^2 + y^2) + \textrm{atan2}(y, x+1)i \\ & = \log(\textrm{hypot}(x+1, y)) + \textrm{atan2}(y, x+1)i \end{split} $$

I'll call this the naive formula.
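
For concreteness, here is a direct Python transcription of the naive formula (an illustrative sketch, not code from this PR):

```python
import math

def log1p_naive(z):
    # Naive complex log1p: real part from hypot, imaginary part from atan2.
    x, y = z.real, z.imag
    return complex(math.log(math.hypot(x + 1.0, y)),
                   math.atan2(y, x + 1.0))
```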

The computation of $\textrm{atan2}(y, x+1)$ is stable, so we do not have to do anything special to compute it accurately. The computation of the real part, $\frac{1}{2}\log((x+1)^2 + y^2)$, can lose precision in some regions of the complex plane. The example from gh-22609 is z = 1e-18 + 1e-18j, where $\log(1+z)$ returns 1e-18j but the correct value is 1e-18 + 1e-18j. (Reference values are computed with mpmath.) The reason for the incorrect result is clear from the expression $\log((x+1)^2 + y^2) = \log(1 + 2x + x^2 + y^2)$. When $x$ and $y$ are sufficiently small, the terms $2x$, $x^2$ and $y^2$ are smaller than the machine $\varepsilon$, so the computed argument of the logarithm rounds to 1 and the result is $\log(1) = 0$. So clearly there is a problem with the naive formula when $x$ and $y$ are small. We should use $\textrm{log1p}(2x + x^2 + y^2)$, but that expression turns out to still have a problem with precision loss.
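
The failure is easy to reproduce (a hedged check: the exact output of the naive formula depends on the platform's complex log, and mpmath is assumed to be available):

```python
import numpy as np
import mpmath

z = np.complex128(1e-18 + 1e-18j)
print(np.log(1 + z))      # naive formula: real part typically lost, ~ 1e-18j

mpmath.mp.prec = 200      # high-precision reference value
ref = mpmath.log(1 + mpmath.mpc(1e-18, 1e-18))
print(complex(ref))       # ~ (1e-18 + 1e-18j)
```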

Another example is z = (-0.2892646601647662 + 0.7034589443707598j), where the correct value of log1p(z) is -3.951471916126033e-07 + 0.7802529500871193j, but the naive formula log(1 + z) (on a platform where NumPy was compiled with gcc 11.4.0) gives -3.9514719157314957e-07+0.7802529500871193j. The relative error of the real part is about 1e-10, and a similar relative error is observed in NumPy built on a Mac with Apple clang 14.0.3. The $x$ and $y$ values are not particularly small, but there is still loss of precision. Notice that the real part of the result is small; that occurs when $(x+1)^2 + y^2 = 1 + 2x + x^2 + y^2 \approx 1$. The Taylor expansion of the real log function $\log(u)$ at $u=1$ is $(u - 1) - (u - 1)^2/2 + \ldots$, so it is not surprising that there can be precision issues near 1. Moreover, for $1 + 2x + x^2 + y^2$ to be close to $1$, $2x + x^2 + y^2$ must be small, which implies that there must be cancellation between $2x$ and $x^2 + y^2$ (so using $\textrm{log1p}(2x + x^2 + y^2)$ doesn't eliminate the precision problem). The region in the complex plane where this happens is near the unit circle centered at $z_0 = -1 + 0j$; that is, where $|z + 1| = 1$.
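
The cancellation can be exhibited directly by comparing the double-precision value of $2x + x^2 + y^2$ with an exact evaluation (a small illustration; Fraction arithmetic on floats is exact, so it serves as a reliable reference):

```python
from fractions import Fraction

x = -0.2892646601647662
y = 0.7034589443707598

naive = 2*x + x*x + y*y                                        # double precision
exact = float(2*Fraction(x) + Fraction(x)**2 + Fraction(y)**2) # exact rationals
print(naive, exact)   # the terms are O(1) but the sum is ~ -7.9e-7,
                      # so roughly six significant digits cancel
```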

The severity of this loss of precision seems to depend on the details of the implementation of the complex log function in the standard library. On my Linux system, where I used gcc 11.4.0 to build NumPy, there is loss of precision near the unit circle centered at -1+0j only in the arc corresponding to x > -0.5. On a Mac where I built NumPy with Apple clang version 14.0.3, there is loss of precision near the entire circle. In this PR, the loss of precision is handled by implementing a higher precision version of the calculation that is used only in a region near that unit circle.

The high precision calculation is implemented in C++ in log1p_complex.h. The essential idea is simple: compute the real part of the log with the formula $\frac{1}{2}\textrm{log1p}(x^2 + 2x + y^2)$, and compute $x^2 + 2x + y^2$ with higher precision than provided by the input types. Single precision float is upgraded to double, and double is upgraded to long double if long double has at least 106 bits of precision. When long double does not provide that much precision, double, and in all cases long double, are upgraded by using a doubled type (usually called double-double; in the case of long double, the doubled type could be called doubled-long double). Only the minimal set of functions needed to compute $x^2 + 2x + y^2$ with doubled variables is included.
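
The doubled arithmetic is built from the classic error-free transformations (Knuth's two-sum and Dekker's two-product). The following pure-Python sketch shows the idea for $x^2 + 2x + y^2$; it is illustrative only, not the PR's C++ code. (Python floats are IEEE doubles with correctly rounded operations, which is exactly the property that 80-bit x87 registers can break in C — the subject of the NO_OPT/volatile discussion in the review below.)

```python
import math

def two_sum(a, b):
    # Knuth's error-free addition: a + b == s + e exactly.
    s = a + b
    t = s - a
    e = (a - (s - t)) + (b - t)
    return s, e

def split(a):
    # Dekker's splitting of a double into two 26-bit halves.
    c = 134217729.0 * a            # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi

def two_prod(a, b):
    # Dekker's error-free multiplication: a * b == p + e exactly.
    p = a * b
    ahi, alo = split(a)
    bhi, blo = split(b)
    e = ((ahi*bhi - p) + ahi*blo + alo*bhi) + alo*blo
    return p, e

def x2_plus_2x_plus_y2(x, y):
    # x**2 + 2*x + y**2 with the rounding errors carried along and
    # folded back in at the end (2*x is exact in binary arithmetic).
    xx, exx = two_prod(x, x)
    yy, eyy = two_prod(y, y)
    s, es = two_sum(xx, yy)
    s, e2 = two_sum(s, 2.0*x)
    return s + (exx + eyy + es + e2)

# Real part of log1p(z) for the example near the circle |z + 1| == 1:
x, y = -0.2892646601647662, 0.7034589443707598
print(0.5*math.log1p(x2_plus_2x_plus_y2(x, y)))   # ~ -3.951471916126033e-07
```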

The files clog1p_wrappers.cpp.src and clog1p_wrappers.h provide the shims for calling C++ code from C in umathmodule.c. The functions nc_log1p@c@(...) in funcs.inc.src check whether the input is close enough to the unit circle centered at -1+0j to need the high precision calculation; otherwise they simply call nc_log@c@(...) with input 1+z.
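
In Python terms, the dispatch looks roughly like the sketch below. The 0.1 band around the circle and the use of exact rational arithmetic for the high-precision branch are my own illustrative choices; the PR uses its own closeness test and the doubled types described above.

```python
import cmath
import math
from fractions import Fraction

def log1p_complex(z):
    x, y = z.real, z.imag
    if abs(math.hypot(x + 1.0, y) - 1.0) > 0.1:
        # "Safe" region: defer to the standard complex log.
        return cmath.log(1.0 + z)
    # Near |z + 1| == 1: evaluate x**2 + 2*x + y**2 in high precision
    # (exact rationals here; double-double in the PR), then log1p.
    s = float(2*Fraction(x) + Fraction(x)**2 + Fraction(y)**2)
    return complex(0.5*math.log1p(s), math.atan2(y, x + 1.0))
```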

Tests are included for np.complex64, np.complex128, and the various flavors of platform-dependent np.clongdouble.

Because the new code uses log(1+z) for points away from the circle, it benefits from whatever behavior is included in the standard library's implementation (or NumPy's implementation if the standard library's version is blocklisted). In particular, this PR fixes an obscure bug in the main branch that occurs when the input $z = x + iy$ is so large that $\textrm{hypot}(x, y)$ is larger than the input type's maximum value. For example, with the main development branch:

```
In [9]: m = 0.95*np.finfo(np.float64).max

In [10]: z = m + m*1j

In [11]: z
Out[11]: np.complex128(1.7078084781191998e+308+1.7078084781191998e+308j)

In [12]: np.log1p(z)
<ipython-input-12-b175d6a32d41>:1: RuntimeWarning: overflow encountered in log1p
  np.log1p(z)
Out[12]: np.complex128(inf+0.7853981633974483j)
```

We get an overflow warning, and the real part of the result is inf.

With the change in this pull request, we get the correct result (at least on the platform where I ran this code):

```
In [2]: m = 0.95*np.finfo(np.float64).max

In [3]: z = m + m*1j

In [4]: z
Out[4]: np.complex128(1.7078084781191998e+308+1.7078084781191998e+308j)

In [5]: np.log1p(z)
Out[5]: np.complex128(710.0779931892764+0.7853981633974483j)
```

Performance changes

The new version is slower, but that is not because of the use of the doubled types. It uses the standard library's complex log to compute log(1+z) whenever the high precision calculation is not needed, and that function (at least on my Linux box) is slower than the implementation of complex log1p in main. But the implementation in the main branch doesn't have any additional checks that might be in the standard library (see the "obscure bug" mentioned above).

Here are timings for computing log1p(z) for samples taken from a normal distribution around 0+0j with standard deviation 2. First, from the main branch:

```
np.__version__ = '2.1.0.dev0+git20240627.2f3da0d'
expression is "log1p(z)"
         n complex64 (ms)    complex128 (ms)   complex256 (ms)  
         5 0.000427          0.000456          0.000857         
        10 0.000512          0.000565          0.00136          
       100 0.002             0.00252           0.0108           
      1000 0.0172            0.022             0.106            
     10000 0.271             0.316             1.14             
    100000 2.85              3.27              11.9             
   1000000 28.6              32.8              120   
```

The timing for this pull request:

```
np.__version__ = '2.1.0.dev0+git20240626.c83a187'
expression is "log1p(z)"
         n complex64 (ms)    complex128 (ms)   complex256 (ms)  
         5 0.00058           0.000476          0.000858         
        10 0.000802          0.000621          0.00138          
       100 0.00524           0.00427           0.0135           
      1000 0.0492            0.0382            0.128            
     10000 0.624             0.475             1.38             
    100000 6.34              4.79              13.9             
   1000000 63.6              48                139   
```

The especially poor performance with np.complex64 also occurs with np.log(z), so it has nothing to do with this pull request, other than this change relying on log(z) for the "safe" case (see #26807).

The slowdown for np.complex128 and np.complex256 is the result of deferring to log(1+z) most of the time. The performance of log1p(z) is basically the same as log(1+z).

@WarrenWeckesser marked this pull request as draft on June 26, 2024 01:13
@WarrenWeckesser marked this pull request as ready for review on June 27, 2024 22:08
@WarrenWeckesser added labels on Jun 27, 2024: 00 - Bug, component: numpy.ufunc, C++ (Related to introduction and use of C++ in the NumPy code base)
@charris (Member) commented on Jul 6, 2024:

ISTR some literature about computing log1p; there were some tricks, but that was some time ago and I don't know if it would still apply.

@seberg (Member) left a review:

It would be nice if we could push this problem off to the math libraries, but I guess that isn't to be...
Overall, it seems to me like we should probably go ahead here.

@WarrenWeckesser I wonder if you can ask someone from SciPy to have a look at the code as well?
It looks fine, but I didn't try to follow the math fully and would love a bit more mpmath trials/scans to be on the safe side (although I am sure you already did that).

I did make a few comments; the biggest one is that I don't trust that NO_OPT stuff, and I think volatile may do the trick.

```c
 * when the input is close to the unit circle centered at -1+0j.
 */
NPY_NO_EXPORT void
clog1p@c@(@np_cplx_type@ *x, @np_cplx_type@ *r)
```

Would have a mild preference to giving this a different name. clog1p isn't clear that it is only relevant in parts of the domain.

Might be nice to avoid the .src style templating; I think that would mean making this a static (inline) templated function and then creating three explicit stubs that call it below.
Since each stub has a single line, I would hope it isn't significantly longer.

```
@@ -0,0 +1,260 @@
#ifndef LOG1P_COMPLEX_H
```

This is C++ only, I think. So let's rename it to .hpp?

Also please adjust the include guard to be the google-style full file path (see other files).

```c
CPP_WRAP2(atan2)
CPP_WRAP2(hypot)

}
```

I guess we don't have a nicer C++ pattern/common spot for these yet, so this looks fine.

```c
// __i386 are defined also turns off the use of the floating-point
// unit '387'. It is important that when the type is, for example,
// `double`, these functions compute their results with 64 bit
// precision, and not with 80 bit extended precision.
```

@seberg commented on Jul 25, 2024:

This makes me worried that this may break e.g. on MSVC (depending on optimization levels)!? Are there tests that would fail if it does?

I suspect you should rather use something like (may require 2 volatiles):

```c
volatile T tmp = t - x;
t = t - tmp;
```

I suppose that may fail with fast-math enabled. If that failure is "graceful", as in: you get bad precision, I would ignore it. If you compile NumPy with fast-math you should get test failures.

I was trying to find a failure mode for either (also the SSE forcing), but beyond the Intel compiler effectively defaulting to fast-math, I guess, I couldn't.
The volatile always adds some store/load instructions, which maybe avoids the use of the 80-bit FPU registers as well (but maybe bad for speed?).

```python
# print(w.imag)
#
@pytest.mark.skipif(np.finfo(np.longdouble).machep != -112,
                    reason='reference values are for IEEE float128')
```

Happy to keep the test; I am a bit surprised, since if I understand right, this test is for true quad precision. I am not sure that we have CI that runs with true quad precision.

```diff
@@ -30,6 +30,7 @@
 #include "string_ufuncs.h"
 #include "stringdtype_ufuncs.h"
 #include "special_integer_comparisons.h"
+#include "clog1p_wrappers.h"
```

Maybe nicer to include this in the funcs.inc.src code, since it is used here only indirectly?
OTOH, I guess it may be that a few other includes should maybe also be moved by that logic.

Successfully merging this pull request may close: BUG: Inaccurate log1p for small complex input (gh-22609).