Discussion options

I am in the late stages of developing some code for calculating sun and moon rise and set times and moon phases. There is much floating point maths going on, so I set about benchmarking it on various platforms. To my surprise the computed results varied. There were expected small variations between 64-bit and 32-bit platforms: around 20s on today's sunrise time. But the RP2 had a more significant error of 5 minutes on all the calculated values.

The 32-bit platforms were in extremely close mutual agreement, whether FP is implemented in hardware or software. RP2 was the only outlier.

Is this a known issue?


Replies: 21 comments · 49 replies

Comment options

When I was testing the ESP32 vs the RP2040 for floating-point Kalman filter calculations, I found some information on the Raspberry Pi forum indicating that overall speed is poor. The RP2040 (Cortex-M0+) has no FPU, and according to the SDK docs its software routines should meet the IEEE 754 standard. But in my tests the precision was not as good as with an FPU and errors grew fast (depending on the type of operation) - the RP2040 floating-point arithmetic routines are based on the interpolators.

5 replies
@peterhinch
Comment options

peterhinch Nov 27, 2023
Collaborator Author

Interesting - thanks for that. I wonder if there is a benchmark for precision?

Re speed, I have the following results, all on standard clocks:

  • PYBD-SF6W 78μs (hardware FP - 64 bit)
  • PYBD-SF2W 109μs (hardware FP - 32 bit)
  • ESP32-S3 166μs (hardware FP - 32 bit) (I think)
  • RP2040 394μs (software FP - 32 bit)
  • Unix build on my laptop - for a laugh - 7μs (!)
@GitHubsSilverBullet
Comment options

There's indeed a significant problem.

#include <stdio.h>

void test(void) {
    float value = 1;
    int n = 1;
    while (n <= 27) {
        printf("%3d %33.30f\n", n, value);
        value += 1.0 / (1 << n);
        n++;
    }
}

int main(void) {
    test();
    return 0;
}

produces:

  1  1.000000000000000000000000000000
  2  1.500000000000000000000000000000
  3  1.750000000000000000000000000000
  4  1.875000000000000000000000000000
  5  1.937500000000000000000000000000
  6  1.968750000000000000000000000000
  7  1.984375000000000000000000000000
  8  1.992187500000000000000000000000
  9  1.996093750000000000000000000000
 10  1.998046875000000000000000000000
 11  1.999023437500000000000000000000
 12  1.999511718750000000000000000000
 13  1.999755859375000000000000000000
 14  1.999877929687500000000000000000
 15  1.999938964843750000000000000000
 16  1.999969482421875000000000000000
 17  1.999984741210937500000000000000
 18  1.999992370605468750000000000000
 19  1.999996185302734375000000000000
 20  1.999998092651367187500000000000
 21  1.999999046325683593750000000000
 22  1.999999523162841796875000000000
 23  1.999999761581420898437500000000
 24  1.999999880790710449218750000000
 25  2.000000000000000000000000000000
 26  2.000000000000000000000000000000
 27  2.000000000000000000000000000000

whereas

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from array import array

def test():
    value = array('f', (1,))
    n = 1
    while n <= 27:
        print(f'{n:3d} {value[0]:33.30f}')
        value[0] += 1.0 / (1<<n)
        n += 1

test()

produces:


  1   1.00000000000000000000000000000
  2   1.50000000000000000000000000000
  3   1.75000000000000000000000000000
  4   1.87500000000000000000000000000
  5   1.93750000000000000000000000000
  6   1.96875000000000000000000000000
  7   1.98437500000000000000000000000
  8   1.99218750000000000000000000000
  9   1.99609375000000000000000000000
 10   1.99804687500000000000000000000
 11   1.99902343750000000000000000000
 12   1.99951171875000000000000000000
 13   1.99975585937500000000000000000
 14   1.99987792968750000000000000000
 15   1.99993896484375000000000000000
 16   1.99996948242187500000000000000
 17   1.99998474121093750000000000000
 18   1.99999237060546875000000000000
 19   1.99999618530273437500000000000
 20   1.99999809265136718750000000000
 21   1.99999904632568359375000000000
 22   1.99999952316284179687500000000
 23   1.99999980926513671875000000000
 24   1.99999990463256835937500000000
 25   2.00000000000000000000000000000
 26   2.00000000000000000000000000000
 27   2.00000000000000000000000000000

@mendenm
Comment options

I would worry a little about this accuracy comparison, because the problem could be in the output formatter, too. You probably need to compare computed values to manually-constructed, exact values. These can be generated by assembling a mantissa and exponent with ldexp.

Note that in normal Python, the output formatting is almost certainly being done with IEEE 754 doubles, upconverted from the 32-bit floats in your array.

Note that the values you expect here should be readable by directly printing the floats as hex (get them as mem32 objects). With the suppressed 1 of IEEE 754 formats, you should just see a bit walking across the hex representation, with all other bits zero.
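As a sketch of the kind of comparison being suggested (my own, runnable on a PC with CPython; on a board, struct could be swapped for the mem32 approach used later in this thread):

# Minimal sketch (mine, not from the thread): build the exact expected values
# 1 + 2**-1 + ... + 2**-n with ldexp, store them as binary32, and watch one
# more mantissa bit appear at each step, until the 23-bit mantissa is
# exhausted at n = 24 and the value rounds up to 2.0.
import struct
from math import ldexp

value = 1.0
for n in range(1, 25):
    value += ldexp(1.0, -n)   # exact in double precision
    bits = struct.unpack('<I', struct.pack('<f', value))[0]
    print(f"{n:3d} {value:.15f} mantissa={bits & 0x7FFFFF:06x}")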

@2dof
Comment options

@peterhinch I forgot to mention that for any maths calculations where accuracy is needed but the platform has limitations, I always look for information from the Jet Propulsion Laboratory and in articles/studies about maths calculation and implementation on early microprocessors (most of the code is in Fortran, but all steps and algorithms are explained, and usually the goal is eliminating calculation errors).

For example: Implementation of the Sun Position Calculation,

@mendenm
Comment options

Wow, I looked at the paper you cited. It is based on computations using an AM9511 FPU, which is the first FPU I did a lot of work with (circa 1978). I had it interfaced to my Apple II computer, and worked in a language (which I helped write) called First, a compiled Forth-like language. The Am9511 had some odd foibles, including not using the suppressed '1' high bit found in PDP-11 and modern IEEE 754 floating point formats.

Comment options

Two related discussions are here and here.
TLDR: @jimmo stated that

  1. MP uses the internal FP routines of the RP2 ROM.
  2. MP uses FP representation D, which has the full 32 bits, in contrast to CircuitPython, which is twice as fast with FP calculations but has only 30 bits per FP number with FP representation C.

So the problem should be in 1, the internal FP routines.

I wonder if there is a benchmark for precision?

Good question.
@peterhinch can you extract a piece of code that exhibits the differing results wrt. precision?

I ran your Python version of the precision test, @GitHubsSilverBullet, here on a Blackpill (STM32F411) board and it shows exactly the same results. I would interpret that as agreeing with the above: the basic FP calculations involved (addition, division by powers of 2) are the same as on the Pico, and the FP representation is the same too.

So, looking for the cause of the problem: Could the trigonometric computations differ?

I seem to remember that for the FP maths of the Pico a special library was acquired that is extraordinarily fast. It is not the usual math lib of the GNU C compiler. I could not find this reference just now, though.

1 reply
@jimmo
Comment options

@peterhinch FYI raspberrypi/pico-sdk#1255

Comment options

I ran the Python test code of @GitHubsSilverBullet on various MicroPython boards. On boards with 32-bit floats, whether hardware or software FP and including the RP2, the results were those of the second result list; on boards with 64-bit floats (double), it returned the first result list.
So maybe the observation @GitHubsSilverBullet made is not related to the initial post of @peterhinch.

0 replies
Comment options

So here is the accuracy test I suggested, to make sure basic addition was accurate:

import array
import uctypes
from machine import mem32

a = array.array('f', 1 for _ in range(32))

abase = uctypes.addressof(a)

for i in range(1,26):
    a[i] = a[i-1] + 1 / (1 << i)
    ai = mem32[abase + 4*i]
    print(i, f"{a[i]:20.15f} {ai-0x3f800000:08x}")

with result

>>> 
1    1.500000000000000 00400000
2    1.750000000000000 00600000
3    1.875000000000000 00700000
4    1.937500000000000 00780000
5    1.968750000000000 007c0000
6    1.984375000000000 007e0000
7    1.992187500000000 007f0000
8    1.996093750000000 007f8000
9    1.998046875000000 007fc000
10    1.999023437500000 007fe000
11    1.999511718750000 007ff000
12    1.999755859375000 007ff800
13    1.999877929687500 007ffc00
14    1.999938964843750 007ffe00
15    1.999969482421875 007fff00
16    1.999984741210938 007fff80
17    1.999992370605469 007fffc0
18    1.999996185302734 007fffe0
19    1.999998092651367 007ffff0
20    1.999999046325684 007ffff8
21    1.999999523162842 007ffffc
22    1.999999809265137 007ffffe
23    1.999999904632568 007fffff
24    2.000000000000000 00800000
25    2.000000000000000 00800000
>>> 


which demonstrates that the internal maths is working fine for this test case, but the output formatter is, indeed, limited by its own 32-bit precision

0 replies
Comment options

peterhinch
Nov 28, 2023
Collaborator Author

The problem is not with the output formatter: my application's output consists of integers, datetimes expressed in seconds.

It can be reproduced by importing this module. You only need to look at the sunrise time of day 0. This is consistent within ~20s on all platforms except RP2, where the error is 5 minutes.

The code has one function, get_mjd, which is platform dependent because of the different epochs. Its purpose is to produce a platform-independent integer datetime, and I have checked that RP2 produces the same value as other platforms. The rest is just maths.

@jimmo I'm pretty confident that the trig functions are called with reasonable argument values (smaller than +-4π radians).
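For context, a minimal sketch of what such an epoch-normalising helper could look like (hypothetical names, not the module's actual get_mjd): MicroPython's bare-metal ports use a 2000-01-01 epoch for time.time() (MJD 51544), while the Unix port uses 1970-01-01 (MJD 40587).

# Hypothetical sketch, not the module's get_mjd(): normalise time.time() to an
# integer Modified Julian Date regardless of the platform epoch.
import sys
import time

# MJD of the platform epoch: 1970-01-01 on the Unix port, 2000-01-01 elsewhere.
EPOCH_MJD = 40587 if sys.platform in ("linux", "darwin") else 51544

def get_mjd(day_offset=0):
    return EPOCH_MJD + int(time.time() // 86400) + day_offset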

2 replies
@mendenm
Comment options

@peterhinch yes, I knew your problem with the actual calculation wasn't the output formatter; it was just the accuracy test that was not meaningful because of that. I strongly suspect your problem is in the range reduction of the trig functions, which the pico sdk warns is not done to the usual accuracy. You may have to find an almanac calculation based on a different epoch, so that there aren't too many cycles of trig to wrap around anywhere, or at least so that any range reduction explicitly built in to the calculation doesn't have too much loss of accuracy. Single precision really doesn't work too well for calendar stuff.

@peterhinch
Comment options

peterhinch Nov 28, 2023
Collaborator Author

Single precision really doesn't work too well for calendar stuff.

As I said above, the difference between SP and DP is about 20s. Every SP platform I tested returned identical results (within my resolution of 1s) - albeit out by 20s relative to DP. This whether FP is done in hardware or software. The one exception was RP2.

Comment options

@peterhinch I went and looked at the module you posted. The 'quad' function is strongly subject to roundoff error. You should use the method in 'Numerical Recipes', in which you always compute the root that is larger in absolute magnitude (if b is negative, use -b + sqrt(disc), and if b is positive, use -b - sqrt(disc)), and then compute the other root as (c/a) divided by the first root (since c/a is the product of the roots). This is not susceptible to roundoff error.

Here is a modified version of 'quad()' which is formally stable:

def quad(ym, yz, yp):
    # See Astronomy on the PC pp. 48-49.
    # Finds the parabola through the three points (-1, ym), (0, yz), (1, yp)
    # and returns the values of x where the parabola crosses zero
    # (roots of the quadratic), plus the number of roots (0, 1 or 2)
    # within the interval [-1, 1]. Requires sqrt and fabs from math.
    nz = 0
    z1 = z2 = 0.0  # defined even when there are no real roots
    a = 0.5 * (ym + yp) - yz
    b = 0.5 * (yp - ym)
    c = yz
    xe = -b / (2 * a)
    ye = (a * xe + b) * xe + c
    dis = b * b - 4.0 * a * c  # discriminant of y = a*x^2 + b*x + c
    if dis > 0:  # parabola has roots
        if b < 0:
            z2 = (-b + sqrt(dis)) / (2 * a)  # z2 is the root larger in magnitude
        else:
            z2 = (-b - sqrt(dis)) / (2 * a)
        z1 = (c / a) / z2  # z1 is always closer to zero
        if fabs(z1) <= 1.0:
            nz += 1
        if fabs(z2) <= 1.0:
            nz += 1
        if z1 < -1.0:
            z1 = z2
    return nz, z1, z2, ye

I checked, though, that this isn't actually your problem. This gives exactly the same answers as your quad, so apparently it is never dealing with roots that result from subtracting nearly-equal numbers.
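As a standalone illustration of why the 'Numerical Recipes' form matters (a contrived example of mine in double precision, unrelated to the module's data):

from math import sqrt

# Contrived quadratic with roots ~1e8 and ~1e-8: the naive formula subtracts
# two nearly equal numbers when computing the small root.
a, b, c = 1.0, -1e8, 1.0
disc = sqrt(b * b - 4.0 * a * c)

naive_small = (-b - disc) / (2.0 * a)   # catastrophic cancellation
big = (-b + disc) / (2.0 * a)           # large-magnitude root, safe since b < 0
stable_small = (c / a) / big            # other root via the product of the roots

print(naive_small)   # about 7.45e-09 -- wrong by roughly 25%
print(stable_small)  # 1e-08 -- correct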

1 reply
@peterhinch
Comment options

peterhinch Nov 28, 2023
Collaborator Author

Thanks for that - I will adapt the code. In the vast majority of cases there is only one root and the numbers differ substantially. The tricky case is where the Sun or Moon is grazing the horizon (in arctic latitudes at noon or midnight). In the two hour interval two roots can occur and the numbers can be quite similar.

Comment options

Data from this paper, showing the errors of IEEE 754 single-precision (binary32) computations in units of the least significant bit, reveal that precision quite often is not optimal. (It is much worse for more special functions, by the way.)
Optimal in this case would be 0.5, meaning that results are always rounded to the nearest representable binary value.

library GNU libc   IML     AMD   Newlib OpenLibm   Musl   Apple    LLVM   MSVC   FreeBSD   CUDA   ROCm
version  2.38    2023.2.0  4.1   4.3.0   0.8.1    1.2.4   13.4.1  16.0.6   2022    13.2   12.2.1  5.6.0
-------------------------------------------------------------------------------------------------------
sin     0.561    0.546   0.530   1.37    0.501    0.501   0.846   0.500   0.530    0.501   1.50   1.61
cos     0.561    0.548   0.729   2.91    0.501    0.501   0.862   0.500   0.530    0.501   1.52   1.61
tan     1.48     0.520   0.509   3.48    0.800    0.800   0.746   0.500   0.502    0.800   3.10   2.33
asin    0.898    0.528   0.781   0.926   0.743    0.743   0.634   0.500   0.861    0.743   1.36   2.54
acos    0.899    0.528   0.897   0.899   0.918    0.918   0.634   0.500   0.669    0.918   1.34   1.47
atan    0.853    0.541   0.501   0.853   0.853    0.853   0.722   0.500   0.501    0.853   1.21   2.10
atan2   1.52     0.550   0.584   1.52    1.55     1.55    0.722           0.584    1.55    2.18   2.01
sinh    1.89     0.538   0.500   2.51    1.83     1.83    0.601   0.500   0.501    1.83    2.94   0.922
cosh    1.89     0.506   1.03    2.51    1.36     1.03    0.589   0.500   0.500    1.36    2.34   0.567
tanh    2.19     0.514   1.56    2.19    2.19     2.19    0.817   0.500   1.27     2.19    1.82   1.41
asinh   1.78     0.527   0.518   1.78    1.78     1.78    0.515           1.99     1.78    1.78   0.573
acosh   2.01     0.501   0.504   2.01    2.01     2.01    0.502           2.89     2.01    2.18   0.564
atanh   1.73     0.507   0.547   1.73    1.73     1.73    0.511   0.500   2.35     1.73    3.16   0.574

I think that a combination of mpmath on the PC (for computing exact reference values of sufficiently high resolution) together with belay (for transparently transferring the results of the same computations on the RPi Pico board to the PC) might allow such data to be reproduced with relatively little effort.
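A rough sketch of the PC side of such a measurement (my own, untested on hardware, assuming the board's result arrives as an ordinary Python float, e.g. via belay):

# Sketch (mine): express the error of a binary32 sin() result, `got`, in ULPs
# against a high-precision mpmath reference.
import math
import struct
from mpmath import mp, mpf, sin as mp_sin

mp.prec = 60  # plenty of guard bits for a binary32 comparison

def ulp_error_sin(x, got):
    ref = mp_sin(mpf(x))
    bits = struct.unpack('<I', struct.pack('<f', got))[0]
    exp = ((bits >> 23) & 0xFF) - 127   # unbiased exponent of `got`
    ulp = mpf(2) ** (exp - 23)          # size of one ULP at that magnitude
    return float(abs(mpf(got) - ref) / ulp)

# Example: a correctly rounded result should come out at about 0.5 or less.
f32 = struct.unpack('<f', struct.pack('<f', math.sin(1.0)))[0]
print(ulp_error_sin(1.0, f32))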

0 replies
Comment options

1 reply
@peterhinch
Comment options

peterhinch Nov 28, 2023
Collaborator Author

Thanks for that, I'll study it in detail tomorrow. That addresses one issue, improving the code.

However the second issue is the observation that RP2 seems to have lower precision than other 32-bit platforms.

Comment options

peterhinch
Nov 28, 2023
Collaborator Author

I have reduced the code to a much simpler script which produces a single floating point value. It eliminates arguably contentious issues like the interpolator, epoch and time dependence.

Feel free to cut and paste this into other platforms:

from math import sin, cos, sqrt, atan, radians

LAT = 53.29756504536339  # Local defaults
LONG = -2.102811634540558

def frac(x):
    return x % 1

def minisun(t):
    p2 = 6.283185307
    coseps = 0.91748
    sineps = 0.39778

    M = p2 * frac(0.993133 + 99.997361 * t)
    DL = 6893.0 * sin(M) + 72.0 * sin(2 * M)
    L = p2 * frac(0.7859453 + M / p2 + (6191.2 * t + DL) / 1296000)
    SL = sin(L)
    X = cos(L)
    Y = coseps * SL
    Z = sineps * SL
    RHO = sqrt(1 - Z * Z)
    dec = (360.0 / p2) * atan(Z / RHO)
    ra = (48.0 / p2) * atan(Y / (X + RHO))
    if ra < 0:
        ra += 24
    return dec, ra


class RiSet:
    def __init__(self, lat=LAT, long=LONG):  # Local defaults
        self.sglat = sin(radians(lat))
        self.cglat = cos(radians(lat))
        self.long = long
        self.mjd = 60276  # MJD 28 Nov 2023
        print("Value",  self.rise())

    def lmst(self, mjd):
        d = mjd - 51544.5
        t = d / 36525.0
        lst = 280.46061837 + 360.98564736629 * d + 0.000387933 * t * t - t * t * t / 38710000
        return (lst % 360) / 15.0 + self.long / 15

    def sin_alt(self, hour, func):
        mjd = self.mjd + hour / 24.0
        t = (mjd - 51544.5) / 36525.0
        dec, ra = func(t)
        tau = 15.0 * (self.lmst(mjd) - ra)
        salt = self.sglat * sin(radians(dec)) + self.cglat * cos(radians(dec)) * cos(radians(tau))
        return salt

    def rise(self):
        sinho = sin(radians(-0.833))
        yp = self.sin_alt(9, minisun) - sinho
        return yp

r = RiSet()

Results on everything I could lay my hands on:

64-bit FP

Unix 0.1236665384827108 (presumed accurate)
PYBD-SF6W 0.1236665384827108 (0.0%)

32-bit FP

Pyboard 1.1 0.1252849 (+1.31%)
PYBD-SF2W 0.1252849 (+1.31%)
ESP32-S3 0.1252849 (+1.31%)
ESP32 0.1252849 (+1.31%)

Outliers

ESP8266 0.116815 (-5.5%)
RP2 0.1319534 (+6.7%)

Interesting that ESP8266 is similarly challenged.

0 replies
Comment options

I could add:
SAMD21 and SAMD51: 0.1252849
W600: 0.1252849
nrf52840: 0.1252849
i.MX RT 1011 (32 bit FP): 0.1252849
i.MX RT1062 (64 bit FP): 0.1236665384827108

0 replies
Comment options

I guessed that the minisun() function was the culprit, but this is not so.

Changing this function to print out deviations:

    def sin_alt(self, hour, func):
        mjd = self.mjd + hour / 24.0
        t = (mjd - 51544.5) / 36525.0
        dec, ra = func(t)
        dec_prec = -21.28489097872491
        ra_prec = 16.264695874889682
        print('dec, ra:', dec, ra, 'deviations (%):', 100*(dec-dec_prec)/dec_prec, 100*(ra-ra_prec)/ra_prec)
        tau = 15.0 * (self.lmst(mjd) - ra)
        salt = self.sglat * sin(radians(dec)) + self.cglat * cos(radians(dec)) * cos(radians(tau))
        tau_prec = -44.06301269320757
        salt_prec = 0.10912845798021381
        print('tau, salt:', tau, salt, 'deviations (%):', 100*(tau-tau_prec)/tau_prec, 100*(salt-salt_prec)/salt_prec)
        return salt

where the _prec values are taken from 64-bit Python, shows that the big errors are only introduced with the tau and salt computations.

dec, ra: -21.28487 16.26469 deviations (%): -0.0001075325 -7.036152e-05
tau, salt: -42.8231 0.1174152 deviations (%): -2.813956 7.593551
Value 0.1319534

on the Pico

1 reply
@rkompass
Comment options

Following the same approach, it turned out that the lmst() function introduces the deviations:

    def lmst(self, mjd):
        d = mjd - 51544.5
        t = d / 36525.0
        d_prec = 8731.875
        t_prec = 0.23906570841889116
        d = d_prec
        t = t_prec
        print('d, t:', d, t, 'deviations (%):', 100*(d-d_prec)/d_prec, 100*(t-t_prec)/t_prec)
#        lst = 280.46061837 + 360.98564736629 * d + 0.000387933 * t * t - t * t * t / 38710000
        lst = 280.46061837 + 360.98564736629 * d + 0.000387933 * t * t - 2.5833118057349522e-08* t * t * t
        lst_prec = 3152362.0102370647
#        lst = lst_prec
        res = (lst % 360) / 15.0 + self.long / 15
        res_pico = 13.40981
        lst_prec = 3152362.0102370647
        res_prec = 13.327161695342511
        print('lst, res:', lst, res, 'deviations (%):', 100*(lst-lst_prec)/lst_prec, 100*(res-res_prec)/res_prec)
#        return res_pico
        return res

With output on Pico:
lst, res: 3152363.0 13.40981 deviations (%): 2.379168e-05 0.6201418

0.6 % error, whereas it is only 0.12 % on the stm Blackpill.

Setting starting values d and t to common values we find that the following two lines introduce the deviations between platforms:

d = 8731.875
t = 0.23906571
lst = 280.46061837 + 360.98564736629 * d + 0.000387933 * t * t - 2.5833118057349522e-08* t * t * t
res = (lst % 360) / 15.0 + -2.102811634540558 / 15
res_prec = 13.327161695342511
print('res:', res, 'deviation (%):', 100*(res-res_prec)/res_prec)

Blackpill (stm): res: 13.34315 deviation (%): 0.1199252
Pico: res: 13.40981 deviation (%): 0.6201418
Further processing increases the error about 10-fold, but this is not platform dependent any more. If I feed the wrong res value into the further Python (64-bit double) computation it yields the same wrong value.

Comment options

So, I think I have at least cornered the numerical problem. In the lmst function,

        lst = 280.46061837 + 360.98564736629 * d + 0.000387933 * t * t - t * t * t / 38710000

is very subject to roundoff. For very tiny differences in the input, it gives wildly different (a few degrees) results, which of course add up to minutes of time error. I will see if I can figure out a way to make it numerically much more stable in single precision.
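To put a rough number on that roundoff (my own back-of-envelope, assuming binary32 arithmetic):

# lst is about 3.15e6 here, which lies between 2**21 and 2**22, so one
# binary32 ULP at that magnitude is 2**(21-23) = 0.25 degrees. A few ULPs of
# accumulated error are already a sizeable fraction of a degree, and the
# subsequent (lst % 360) keeps that absolute error while shrinking the value,
# so the relative error of the result explodes.
import math

lst = 3152362.0102370647
exp = math.floor(math.log2(lst))   # 21
ulp = 2.0 ** (exp - 23)            # 0.25
print(exp, ulp)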

1 reply
@rkompass
Comment options

Same finding here :-)
Just noted that a (3152363.0 % 360) computation follows, which dramatically decreases precision.

Comment options

@peterhinch @rkompass I rewrote two routines to eliminate roundoff error and remove a large modular reduction. Try pasting these into the module (I've slightly cleaned it up since my first post):

    def lmstt(self, t):
        # Takes the mjd and the longitude (west negative) and then returns
        # the local sidereal time in hours. I'm using Meeus formula 11.4
        # instead of messing about with UT0 and so on.
        # Modified to use the pre-computed 't' value from sin_alt.
        d = t * 36525
        df = frac(d)
        c1 = 360
        c2 = 0.98564736629
        dsum = c1 * df + c2 * d  # no large integer * 360 here
        lst = 280.46061837 + dsum + 0.000387933 * t * t - t * t * t / 38710000
        return (lst % 360) / 15.0 + self.long / 15

    def sin_alt(self, hour, func):
        # Returns the sine of the altitude of the object (moon or sun)
        # at an hour relative to the current date (mjd)
        mjd1 = (self.mjd - 51544.5) + hour / 24.0
        t1 = mjd1 / 36525.0
        #print(f"sin_alt mjd0={mjd0:.7f} t0={t0:.9f} mjd1={mjd1:.7f} t1={t1:.9f}")
        dec, ra = func(t1)
        # hour angle of object: one hour = 15 degrees. Note lmstt() includes the longitude
        tau = 15.0 * (self.lmstt(t1) - ra)
        # sin(alt) of object using the conversion formulas
        salt = self.sglat * sin(radians(dec)) + self.cglat * cos(radians(dec)) * cos(radians(tau))
    
        return salt
0 replies
Comment options

The RP2 Pico transfers floats wrongly into the internal representation:

import array
import uctypes
from machine import mem32

def printfp32(s, x):
    a = array.array('f', (0,))
    a[0] = x
    a0 = mem32[uctypes.addressof(a)]
    sign = '-' if a0 >> 31 else ''
    exp = ((a0>>23) & 0xff) - 150
    frac = (0x800000 | (a0 & 0x7fffff))
    # print(f"{a[0]:12.6f} {a0:032b}")
    print(f'{s}   {x}: {sign} {frac} * 2**{exp}')
    

y = 360.98564736629
printfp32('y', y)

gives
y 360.9857: 11828778 * 2**-15 on stm32 (Blackpill) <--- correct, closest value
but
y 360.9858: 11828782 * 2**-15 on RP2 Pico. <- incorrect

This is a deviation of 4 least-significant-bit units. Given the many constants in the code, such errors may add up.

Edit:
Same with ESP 8266:

y 360.986: 11828776 * 2**-15
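For anyone without an STM board to hand, the correctly rounded binary32 value can be cross-checked on a PC (a small sketch of mine using the struct module):

# Round 360.98564736629 to binary32 on a PC and recover the integer
# significand and exponent, to see which board agrees with correct rounding.
import struct

bits = struct.unpack('<I', struct.pack('<f', 360.98564736629))[0]
exp = ((bits >> 23) & 0xFF) - 150
sig = 0x800000 | (bits & 0x7FFFFF)
print(sig, exp)   # 11828778 -15, matching the stm32 (Blackpill) result above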

3 replies
@mendenm
Comment options

I was suspicious that this could be a contribution, too. 4 LSB is a pretty big error.

@peterhinch
Comment options

peterhinch Nov 29, 2023
Collaborator Author

I think this warrants an issue.

@rkompass
Comment options

I'll raise an issue. The origins of the deviation of the ESP8266 and RP2 from the STM boards are obscure to me.
I didn't find a str2float function in the RP2 SDK.

Comment options

peterhinch
Nov 29, 2023
Collaborator Author

@mendenm @rkompass I'm deeply impressed :) RP2 now produces 0.1236907, within 0.02% of the 64 bit result. PYBD is now within 0.003%.

I had been led to believe that the source of the algorithms, "Astronomy on the Personal Computer" by Montenbruck and Pfleger, was authoritative. It's certainly full of maths, sunrise and set being the nursery slopes. Clearly there is room for improvement in the algorithms (including quad()).

4 replies
@mendenm
Comment options

@peterhinch It may well have worked much better when it was written, probably close to mjd 51544.5, since the 't' parameter would have been very close to zero at that point. That would eliminate the roundoff error in the modular reduction. Those equations should probably be updated to a more recent epoch.

@rkompass
Comment options

I changed in get_mjd():
mjepoch = -10957.5 # 40587 - 51544.5 # Modified Julian date of C epoch (1 Jan 70)

and in sin_alt() accordingly:
mjd1 = self.mjd + hour / 24.0

but that does not improve things any further. Still, I think it's a reasonable change.

@peterhinch
Comment options

peterhinch Nov 29, 2023
Collaborator Author

I'll implement those.

I've incorporated your improvements to date in the actual code. The RP2040 and other 32-bit platforms produce rise and set times within 30s of the 64-bit results. This is (in my opinion) good enough for any likely microcontroller application. There is always the option of using a soft or hard 64-bit FPU.

@peterhinch
Comment options

peterhinch Nov 29, 2023
Collaborator Author

On further thought I'm unsure about the change to MJD. The modified Julian date is a known concept with wider significance: changing its value could cause confusion if anyone were to extend the module.

Comment options

If it turns out to be an RP2040 FP library issue (which it's increasingly looking like it may not be), the library used is Mark Owen's Qfplib-M0. Mark is extremely approachable.

5 replies
@rkompass
Comment options

Thank you Steward. That was the library I remembered but couldn't find just now. Still, I see the str2float mentioned there (as an optional external add-on), not included in the SDK.

@dpgeorge
Comment options

The issue seems to be the pico-sdk's implementation of powf... it doesn't handle 10**-11 correctly.

raspberrypi/pico-sdk#1566

@mendenm
Comment options

@dpgeorge Should the string-to-float conversion be modified to use an ipow routine (which is only a couple lines long, if it needs to be written out) instead of pow/powf? I suspect it would be both faster and more accurate.
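For reference, the kind of ipow routine being alluded to is just exponentiation by squaring; a tiny illustrative sketch (mine, in Python rather than the SDK's C, purely to show the idea):

# Illustrative only: integer powers by repeated squaring, so 10**-11 is
# computed as 1 / 10**11 without going through a general pow()/powf().
def ipow(base, n):
    e = -n if n < 0 else n
    r = 1.0
    while e:
        if e & 1:
            r *= base
        base *= base
        e >>= 1
    return 1.0 / r if n < 0 else r

print(ipow(10.0, -11))   # 1e-11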

@dpgeorge
Comment options

No, I don't think that's necessary. The point is that powf is broken and needs to be fixed (and is now fixed in MicroPython's rp2 port), regardless of whether it's used for string-to-float conversion.

And because we try to optimise for size instead of speed (for non-critical things), it's best to reuse powf instead of adding a new ipow.

@mendenm
Comment options

@dpgeorge OK, no argument on the appropriateness of fixing powf. ipow can be really tiny, but FP conversion speed probably isn't critical enough to be worth much space.

Comment options

0 replies
Comment options

I had the feeling that the computations were still unnecessarily involved:

from math import pi, sin, cos, sqrt, atan, radians

for i in range(20):
    z = i/20
    p2 = 2* pi  # 6.283185307
    rho = sqrt(1 - z*z)
    dec = (360.0 / p2) * atan(z / rho)
    sr = sin(radians(dec))
    print('z, sin(radians(dec))', z, sr)

for i in range(20):
    z = i/20
    p2 = 2* pi  # 6.283185307
    rho = sqrt(1 - z*z)
    dec = (360.0 / p2) * atan(z / rho)
    sr = cos(radians(dec))
    print('rho, cos(radians(dec))', rho, sr)

demonstrate that the atan(z / rho) is not necessary at all, because its result is only fed back into sin() and cos() (after a conversion from radians to degrees and back again).
By the identities
sin(atan(x)) = x/sqrt(1+x^2)
and cos(atan(x)) = 1/sqrt(1+x^2),
and inserting sqrt(1-z^2) for rho, everything cancels out and in the end we have the equalities that are shown in the code: sin(radians(dec)) equals z and cos(radians(dec)) equals rho.

So dec (declination) is interesting in itself, but not needed for the computations?
Same for ra (right ascension)?
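A sketch of what that simplification could look like (my own rearrangement of the minisun() posted earlier, not code from the module): return sin(dec) and cos(dec) directly and drop the atan()/radians() round trip in sin_alt().

from math import sin, cos, sqrt, atan

def frac(x):
    return x % 1

def minisun_sc(t):
    # Same series as minisun() above, but returning sin(dec) and cos(dec)
    # instead of dec in degrees.
    p2 = 6.283185307
    coseps = 0.91748
    sineps = 0.39778
    M = p2 * frac(0.993133 + 99.997361 * t)
    DL = 6893.0 * sin(M) + 72.0 * sin(2 * M)
    L = p2 * frac(0.7859453 + M / p2 + (6191.2 * t + DL) / 1296000)
    SL = sin(L)
    X = cos(L)
    Y = coseps * SL
    Z = sineps * SL
    RHO = sqrt(1 - Z * Z)
    ra = (48.0 / p2) * atan(Y / (X + RHO))
    if ra < 0:
        ra += 24
    return Z, RHO, ra   # sin(dec), cos(dec), right ascension in hours

# In sin_alt() the altitude would then become, with tau as before:
#   salt = self.sglat * sindec + self.cglat * cosdec * cos(radians(tau))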

24 replies
@peterhinch
Comment options

peterhinch Dec 7, 2023
Collaborator Author

You've lost me there for two reasons. Firstly the testing that revealed the error was done with lto==0. Secondly the round() function with default second arg yields an integer (number of decimal places is zero):

>>> round(1.5*3600)
5400
>>> round(1.75)
2
>>> round(1.33)
1
@rkompass
Comment options

It could be that the simplification df = frac(0.5 + h/24) in lstt() is no longer valid. I did it to introduce the un-reduced h/24 with full precision (role 2) into df. The 0.5 is the value that always occurred in the tests for frac(self.mjd - 51544.5). Could it be that it takes different values with is_up()? Can these be re-introduced without making the error worse again?

@mendenm
Comment options

@peterhinch No, round() with no second argument returns a float, whose fractional part is zero! That is not an int. And then, when that float (even zero) is added to the time.time(), the whole thing gets upconverted to float. Try: type(round(1.5))

@robert-hh
Comment options

      MicroPython v1.22.0-preview.213.g14dc0f2d2 on 2023-12-05; Raspberry Pi Pico with RP2040
      Type "help()" for more information.
      >>> type(round(1.5))
      <class 'int'>
      >>> type(round(1.5,0))
      <class 'float'>
@mendenm
Comment options

@robert-hh @peterhinch OK, Robert, you're right. I am embarrassed. I thought round(1.5, 0) would return the same as round(1.5), and just did a shortcut typing it. CPython does the same thing (a real int if there is no second argument), which I had never noticed.

Comment options

The esp8266 is off because it uses truncated 30-bit floating point numbers, via MICROPY_OBJ_REPR_C.

For rp2... I'm not sure, will need investigation.

0 replies
Comment options

What a great discussion! Thank you all! ☺

1 reply
@rkompass
Comment options

Nice that you appreciate it.

Comment options

I wrote a string-to-float function in Python. This is to exclude errors from wrongly converted constants on the RP2 Pico.
It is a combination and simplification of the routines used by mpmath.
Here is the code, together with simple tests.
The constant 360.98564736629 is converted correctly on the Pico by this.

import array
import uctypes
from machine import mem32
import math

def bit_length(x): # Position of highest non-zero bit 
    r = 0
    while x > 0:
        x >>= 1
        r += 1
    return r

def str2f(s):
    s = s.lower().rstrip('l').replace('_', '')
    parts = s.split('e')
    if len(parts) == 1:
        ma = s
        e10 = 0
    else: # == 2
        ma = parts[0]
        e10 = int(parts[1])
    parts = ma.split('.')  # split mantissa
    if len(parts) == 2:
        a, b = parts[0], parts[1].rstrip('0') # before and after decimal point
        if a == '':
            a = '0'
        ma = a + b
        e10 -= len(b)
    m = int(ma.replace(' ', '').replace('+', ''))
    e2 = 0             # exponent of mantissa
    if m != 0:
        while not m & 1:
            m >>= 1
            e2 += 1
    if m == 0 or e10 >= 0:
        return math.ldexp(m*5**e10, e2+e10) # in every power of 10 is an equal power of 2
    else: 
        si = 0
        if m < 0:   # make m positive, si is sign
            m = -m
            si = 1
        mq = 5**-e10    # 10**-e10 = 5**-e10 * 2**-e10 , the latter change e2q:
        e2q = -e10
        extra = max(5, 24+5 - bit_length(m) + bit_length(mq))
        quo, rem = divmod(m<<extra, mq)
        if rem:
            quo = (quo<<1) + 1 # kind of rounding in extra last bit
            extra += 1
        exp = e2-e2q-extra
        return math.ldexp(-quo if si else quo, exp)

def printfp32(x, s=None):
    a = array.array('f', (0,))
    a[0] = x
    a0 = mem32[uctypes.addressof(a)]
    sign = '-' if a0 >> 31 else ''
    exp = ((a0>>23) & 0xff) - 150
    m = (0x800000 | (a0 & 0x7fffff))
#    print(f"{s if s else ''}  x: {x}  =  {(a0>>31)&1:01b} {((a0>>23) & 0xff):08b} {(a0 & 0x7fffff):023b}")
    print(f"{s if s else ''}  x: {x}  =  {sign}{m} * 2**{exp}")

s = '-5e2'
x = float(s)
printfp32(x, s)
y = str2f(s)
printfp32(y, s)

print()

s = '22.417e-030'
x = float(s)
printfp32(x, s)
y = str2f(s)
printfp32(y, s)

print()

s = '360.98564736629'
x = float(s)
printfp32(x, s)
y = str2f(s)
printfp32(y, s)

print()

s = '84324567992100453'
x = float(s)
printfp32(x, s)
y = str2f(s)
printfp32(y, s)

If you confirm that it works well, should I post it to the issue too, as a workaround?

0 replies