STY: scientific notation or PEP 515 underscores in large numeric literals#19597
STY: scientific notation or PEP 515 underscores in large numeric literals#19597Pierre-Sassoulas wants to merge 7 commits intoastropy:mainastropy/astropy:mainfrom Pierre-Sassoulas:scientific-notationPierre-Sassoulas/astropy:scientific-notationCopy head branch name to clipboard
Conversation
|
Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.
|
These would change the types from int to float, it really doesn't seem safe to do it as a shotgun refactor. PEP 515 underscores on the other hand are both safe and indeed readability, so I'm inclined to merge as is. |
27ad8e3 to
2759f99
Compare
|
I spoke too fast, I didn't realize many of these are floats already. Feel free to rewrite them in scientific notation. Just make sure to preserve exact values. Thanks ! |
| "d": "347.222d", | ||
| "hr": "8333.336hr", | ||
| "min": "500000.167min", | ||
| "s": "30000010.0s", |
There was a problem hiding this comment.
This should be reverted
There was a problem hiding this comment.
Indeed, I used sed and it modified a string 😅 .
| eta0 = -19.9 / 3_600_000.0 | ||
| xi0 = 9.1 / 3_600_000.0 | ||
| da0 = -22.9 / 3_600_000.0 |
There was a problem hiding this comment.
nit: this value could be stored in a variable and use scientific noation
There was a problem hiding this comment.
The comment in the code is an LLM bluff, I know very little about this 4bdea27
| T = (equinox.jd - jd1950) / 36525.0 | ||
| # Eccentricity of the Earth's orbit | ||
| ek = math.radians(k) * ((-0.000000126 * T - 0.00004193) * T + 0.01673011) | ||
| ek = math.radians(k) * ((-0.000_000_126 * T - 0.00004193) * T + 0.01673011) |
There was a problem hiding this comment.
there are 2 other literals to update on the same line and they could all use scientific notation
There was a problem hiding this comment.
I was going slow on scientific notation with the assumption that is was "safer" to add PEP515 underscore (less changes), I'm going to be more daring. Thank you for the review, I'm here for the nits ; to get a feeling if the suggestions from the pylint check are good or not !
There was a problem hiding this comment.
I changed the approach in 62a64d2, I selected the "big win" first.
Replace the bare 3600000.0 conversion factor in _icrs_to_fk5_matrix() with a module-level _MAS_PER_DEG named constant in scientific notation. The value (3,600 arcsec/deg * 1,000 mas/arcsec) is documented in a short side comment. Same float, more self-documenting at the call sites. Per review suggestion from neutrinoceros on PR 19597.
…chars Apply pylint's scientific-notation suggestion to literals where the scientific form is at least 4 characters shorter than the underscore alternative. Mostly tiny CODATA / IAU uncertainties (e.g. 0.00000000051e-27 -> 5.1e-37, saving 10 chars) plus a handful of round-magnitude values where scientific is the obvious win: 4_000_000.0 -> 4e6, 3_396_190.0 -> 3.39619e6, 6378137.0 -> 6.378137e6, 0.000000443 -> 4.43e-7, etc. For values where scientific saves 0-3 chars vs underscore (Julian Dates, speed of light, Rydberg constant, day-numbered epochs), the underscore form preserves more visual structure and will be applied later instead. ruff-format collapsed several CODATA(...) calls onto single lines once the literals were short enough to fit; that's most of the line-count delta. All literals parse to identical floats.
Apply pylint bad-number-notation's "unconventional zero literal" suggestion in two test files: trailing-zero forms (0.00, 0.000) are rewritten as 0.0. Identical floats; only the textual form changes.
fa85920 to
89e54d2
Compare
|
Thanks a lot for the review ! Quick follow-up, I pushed what I think is consensual (big win in length, what you asked for directly, and removal of strange 0s). I have some question for a follow-up PR. (Or more commit in this PR seeing how many persons get pinged when you touch a lot of files.) Where should the "needs scientific notation" threshold sit? Pylint defaults to 1.0e6. At that level JDs (~2.4e6) and planet radii (~6e6) get flagged. You asked for more scientific notation in #19597 (comment), but it seems it's at ~= 10^5. Happy to lower the threshold. (But this would catch 3600, 86400, etc. which is why it's higher by default). Would engineering notation make more sense than scientific notation, sometime or all the time ? (299.792458e6 rather than 2.99792458e8). I have a lot of scientific notation changes locally, but I figured I'd better ask to be sure. Should we check integers ? This PR is floats only; since you can't write an int in scientific notation, the only fix would be PEP 515 underscores (1_000_000). There's a fair number of 1000000-style ints around (test counts, FITS constants…). There's some precision limitation on some floats with much higher number of significant numbers (16+) than what python can handle. Mostly ERFA reference fixtures. If we want to keep the precision we should switch to decimal.Decimal("...") (changes type, awkward through np.array/Quantity), keep it as is (but it's misleading about the actual precision), or accept the small drift by rewriting and truncating the unhandled precision (3.14159265358979323846 => 3.141592653589793). There's also underflow to zero (1.0e-324) or overflow to math.inf. Is it okay to switch to 0.0 or math.inf in this case ? What would you favor ? For example: 0.17293718391166087785 has 20 significant digits, more than float can represent exactly, and it could be written as '0.17293718391166088' or 'decimal.Decimal("0.17293718391166087785")' (or left alone). astropy/astropy/io/ascii/tests/test_c_reader.py Line 1193 in 12b5c56 '1.0e-324' underflows to zero could be written 0.0 directly. (The test itself uses string to represent float to pass them to C for precisely this reason). I checked that the number are the same using this script: |
I don't have a well defined personal threshold but there's a well established precedent you could build from: the >>> print(f"{100_000:g}")
100000
>>> print(f"{1_000_000:g}")
1e+06 |
I thought that was the starting point TBH, and yes I think we should be using PEP 515 for integers whose absolute value exceed
this seems like the way to go to me.
hum, that's odd. Now I wonder why it wasn't written as As you noticed, touching every part of astropy at once indeed pings a lot of people. We usually do these piecemeal, one namespace per PR, though maybe another way to go about it is to start your PRs as draft and only undraft them once I've actually completed the review cycle, so most subcribers would only get pinged once the PR is ready (and, most likely, merged). |
I'm not an expert myself but think there's officially a forced 53 bit precision since 2022 (python3.11) python/cpython#31790 / https://bugs.python.org/issue46917 ( |
Following pylint's bad-number-notation (C0329) check with the same $\geq$ 1e6 threshold as Python's 'g' format specifier, normalize literals such as 2456293.25 to 2.45629325e6 and 299792458.0 to 2.99792458e8. Mantissas are kept at the shortest length that round-trips to the same float64 value, so no numerical drift.
|
Excellent, we actually don't support older versions. |
Following pylint's bad-float-precision (C0330) check, normalize float literals carrying more digits than IEEE 754 binary64 can represent (e.g. ERFA reference fixtures with 17-19 significant digits) to their shortest round-tripping repr — Python's default for repr(float). The parsed float64 value is unchanged on every architecture astropy runs on, so test results are bit-identical.
Following pylint's bad-number-notation check with --suggest-int-underscore=y, add visual digit grouping to integer literals $\geq$ 1e6: groups of three for decimal (e.g. 2_446_354), groups of four for hex (e.g. 0xFFFF_FFFF). PEP 515 underscores are purely cosmetic — int values are byte-identical.
|
|
||
|
|
||
| OMEGA_EARTH = (1.002_737_811_911_354_48 * u.cycle / u.day).to( | ||
| OMEGA_EARTH = (1.0027378119113546 * u.cycle / u.day).to( |
There was a problem hiding this comment.
Need to fix this in pylint
There was a problem hiding this comment.
| OMEGA_EARTH = (1.0027378119113546 * u.cycle / u.day).to( | |
| OMEGA_EARTH = (1.002_737_811_911_354_6 * u.cycle / u.day).to( |
| @@ -460,7 +462,9 @@ def test_gcrs_hadec(): | ||
| gcrs = GCRS(usph, obstime="J2000") # broadcast with times below | ||
|
|
||
| # check array times sure N-d arrays work | ||
| times = Time(np.linspace(2456293.25, 2456657.25, 51) * u.day, format="jd")[:, None] | ||
| times = Time(np.linspace(2.45629325e6, 2.45665725e6, 51) * u.day, format="jd")[ | ||
| :, None | ||
| ] |
There was a problem hiding this comment.
The formatting change makes me want to revert this.
| values = np.array( | ||
| [1.01e200, 3.14e307, 1.777e308, -np.inf, 0.0, 4.94e-324, 1.024e308] | ||
| ) | ||
| values = np.array([1.01e200, 3.14e307, 1.777e308, -np.inf, 0.0, 5e-324, 1.024e308]) |
There was a problem hiding this comment.
5e-324 is the smallest possible absolute value representable with IEEE754 float 64, so maybe it should be 0.0 too here, but we're already checking zero before.
|
Overall I think this PR needs to be done with more nuance and respecting the perspectives of the original authors. In particular, there are many places like above where I find the original easier to read and I'm not keen on nearly all the changes in the |
PEP 515 wasn't an option until Python 2.7 and 3.5 were both dropped (astropy 4.2, November 2020), and even then, it has probably be underused for lack of awareness. Scientific notation on the other hand was always available, and our average contributor is very likely to know about it, of course. |
|
Some commits here are more consensual :
Those commits are opinionated or not very refined, I can drop them or open other PR to discuss them in particular (added them here to reduce noise):
|
|
I'm happy with the underscores, but I'm 👎 on the blanket changes to scientific notation. |
|
With regards to 62a64d2, I'm not sure that is uncontroversial. The reason it was written in the original verbose way is so that the uncertainty has the same exponent as the value. That way the reader can get an immediate visual idea of the relative precision of the constant. I'm not a maintainer of constants, but IMO the original is preferable. |
|
tl;dr -- here are my drive-by comments:
Thanks, all! |
Description
This pull request is to address readability of number larger than 10e6 or smaller than 10e-6 using mostly PEP 515 underscore separators. Scientific or engineering notation wouldn't make sense in most case I think, but I'm also happy to take feedback or close this if the change isn't wanted.
The motivation is the new bad-number-notation checker in pylint (pylint-dev/pylint#10425), We added astropy to our primer especially because it has a lot of floats, so I'd really like to have your feedback about (i.e is this change wanted or obnoxious) :) The underlying pylint check itself is still in review and will probably be in pylint 4.1.0.
Fixes #