Modified 2 months ago
Viewed 127 times
4

Arch Linux, kernel 6.15.7-zen1-1-zen:

$ awk -V
GNU Awk 5.3.2, API 4.0, PMA Avon 8-g1, (GNU MPFR 4.2.2, GNU MP 6.3.0)

Start with y.csv:

4 2016201820192020
5 20162018201920202023
5 20162018201920202024
5 00000000000000002024

Then try several variants of printf:

$ awk '{print $1,sprintf("%020d",$2)}' y.csv
4 00002016201820192020
5 20162018201920200704
5 20162018201920200704
5 00000000000000002024
$ awk '{$2=sprintf("%020d",$2);print $1,$2}' y.csv
4 00002016201820192020
5 20162018201920200704
5 20162018201920200704
5 00000000000000002024
$ awk '{printf("%020d\n",$2)}' y.csv
00002016201820192020
20162018201920200704
20162018201920200704
00000000000000002024
$ awk '{printf("%020.0f\n",$2)}' y.csv
00002016201820192020
20162018201920200704
20162018201920200704
00000000000000002024

What's going on? The last 4 digits of the 2nd & 3rd lines are always changed, seemingly randomly, to 0704!

3
  • 5
    Don't interpret them as numbers, e.g. awk '{print $1, sprintf("%20s",$2)}' y.csv
    – pmf, 2025-07-26 20:21:19 +00:00
  • 3
    Your version of gawk has been compiled with the MPFR and MP libraries, so you should be able to use the -M (or --bignum) flag to ensure you get the desired output; see the gnu.org link mentioned in KamilCuk's answer for additional details on support for this feature.
    – markp-fuso, 2025-07-27 02:15:29 +00:00
  • Aside: y.csv doesn't contain CSV so you should rename it.
    – Ed Morton, 2025-08-11 12:39:22 +00:00

2 Answers

7

What's going on?

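awk stores every number as an IEEE 754 double, and a 20-digit value is too large to be held exactly in one.

A double has a 53-bit significand, so it cannot represent every integer above 2^53 (9007199254740992); a larger value is rounded to the closest representable one. Near 2 × 10^19 the representable values are 4096 apart, which is why inputs ending in 2023 and 2024 both come out ending in 0704.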

See https://www.gnu.org/software/gawk/manual/gawk.html#Other-Stuff-to-Know for background, and consider gawk's -M option, which enables arbitrary-precision arithmetic. The exact double stored for 20162018201920202024 is shown at https://www.binaryconvert.com/result_double.html?decimal=050048049054050048049056050048049057050048050048050048050052 .
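
The rounding can be reproduced without any input file. The following is a minimal sketch, assuming an awk that keeps numbers as doubles (gawk without -M does); `nearest_double` is a hypothetical helper name, not something from the question or the gawk manual. It prints the value awk actually holds after converting a decimal string to a number.

```shell
# Hypothetical helper: print the double awk stores for a decimal string.
nearest_double() {
    # n + 0 forces a numeric conversion; %.0f prints the stored double in full
    awk -v n="$1" 'BEGIN { printf "%.0f\n", n + 0 }'
}

nearest_double 2016201820192020        # below 2^53, so it is stored exactly
nearest_double 20162018201920202023    # above 2^64, rounded to a multiple of 4096
nearest_double 20162018201920202024    # lands on the same double as the line above
```

With the asker's gawk, the last two calls should both print 20162018201920200704, matching the output in the question.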


1

Bigint is overkill for a formatting issue:

echo '4 2016201820192020
5 20162018201920202023
5 20162018201920202024
5 00000000000000002024' |
awk '$2 = sprintf("%.*d%s", (_ = 20 - length($2)) * (_ >= 1), 0, $2)'

4 00002016201820192020
5 20162018201920202023
5 20162018201920202024
5 00000000000000002024

The extra filter (_ >= 1) guards against the extremely unlikely event that $2 arrives longer than 20 characters. Without the guard clause, the precision would be negative and an extra 0 would get prepended for no reason.
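
For readers who find the one-liner dense, here is a more verbose sketch of the same idea: treat $2 purely as text and prepend zeros until it is 20 characters long. `pad_field2` is a hypothetical name, and the loop is a swapped-in technique rather than the sprintf trick used above; it should behave the same for these inputs.

```shell
# Hypothetical helper: left-pad field 2 with zeros to 20 characters, as text.
pad_field2() {
    awk '{
        n = 20 - length($2)             # zeros still needed; <= 0 means none
        while (n-- > 0) $2 = "0" $2     # string concatenation, no numeric conversion
        print
    }'
}

printf '%s\n' '4 2016201820192020' '5 20162018201920202023' | pad_field2
```

Because $2 is never used in a numeric context, the digits are never passed through a double, so values of 20 or more digits are left untouched.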
