Use -fspecialize-aggressively to improve performance by 30% on ACME build #4584

Merged
purefunctor merged 2 commits into purescript:master from seastian:master
Oct 18, 2025

Contributor

@seastian seastian commented Oct 4, 2025

Description of the change

Use -fspecialize-aggressively to improve performance by 30% on the ACME build.

Stats for ACME build before change

'purs' 'compile' '--source-globs-file' 'sources.txt' +RTS '-N' '-sstats.txt' 
 448,761,996,296 bytes allocated in the heap
  86,000,620,168 bytes copied during GC
   1,389,766,704 bytes maximum residency (39 sample(s))
      15,732,600 bytes maximum slop
            3891 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     20264 colls, 20264 par   88.627s  26.122s     0.0013s    0.0084s
  Gen  1        39 colls,    38 par   29.499s   3.859s     0.0990s    0.1906s

  Parallel GC work balance: 66.99% (serial 0%, perfect 100%)

  TASKS: 42 (1 bound, 41 peak workers (41 total), using -N10)

  SPARKS: 4556 (4556 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.004s  (  0.004s elapsed)
  MUT     time  171.470s  ( 22.159s elapsed)
  GC      time  118.125s  ( 29.982s elapsed)
  EXIT    time    0.065s  (  0.010s elapsed)
  Total   time  289.665s  ( 52.155s elapsed)

  Alloc rate    2,617,143,209 bytes per MUT second

  Productivity  59.2% of total user, 42.5% of total elapsed

Stats for ACME build after change

'purs' 'compile' '--source-globs-file' 'sources.txt' +RTS '-N' '-sstats.txt' 
 244,192,893,800 bytes allocated in the heap
  75,082,410,768 bytes copied during GC
   1,409,996,352 bytes maximum residency (34 sample(s))
      16,230,848 bytes maximum slop
            3888 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     13248 colls, 13248 par   56.774s  19.087s     0.0014s    0.0089s
  Gen  1        34 colls,    33 par   23.722s   3.168s     0.0932s    0.1958s

  Parallel GC work balance: 63.18% (serial 0%, perfect 100%)

  TASKS: 42 (1 bound, 41 peak workers (41 total), using -N10)

  SPARKS: 4556 (4556 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.005s  (  0.004s elapsed)
  MUT     time  104.922s  ( 14.618s elapsed)
  GC      time   80.497s  ( 22.255s elapsed)
  EXIT    time    0.070s  (  0.010s elapsed)
  Total   time  185.493s  ( 36.887s elapsed)

  Alloc rate    2,327,371,167 bytes per MUT second

  Productivity  56.6% of total user, 39.6% of total elapsed
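As a rough cross-check of the headline figure, the relative reductions can be recomputed from the two reports above. This is only a sketch: the numbers are copied from the stats, and the helper name `pct_reduction` is made up for illustration. Elapsed time comes out at roughly 29% lower, in line with the ~30% claim, while heap allocation drops by considerably more.

```python
def pct_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100.0


# (before, after) pairs copied from the two RTS -s reports above.
stats = {
    "elapsed seconds": (52.155, 36.887),
    "CPU seconds": (289.665, 185.493),
    "bytes allocated": (448_761_996_296, 244_192_893_800),
    "bytes copied during GC": (86_000_620_168, 75_082_410_768),
}

reductions = {name: pct_reduction(b, a) for name, (b, a) in stats.items()}

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% lower")
```

These are single-run figures, so they indicate the size of the effect rather than a precise benchmark.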

Binary size went from 110MB to 130MB. I think this is fine and worth the speed improvements.

Compiling purs itself takes longer, but during development we can always use stack build --fast.

Repo with acme build: https://github.com/seastian/purs-acme

Yes, we spend a lot of time in GC 🫣. Much of that can be reduced with different RTS options (+RTS -A256m -n16m -RTS), at the cost of more RAM.

What do you all think?


Checklist:

  • Added a file to CHANGELOG.d for this PR (see CHANGELOG.d/README.md)
  • Added myself to CONTRIBUTORS.md (if this is my first contribution)
  • Linked any existing issues or proposals that this pull request should close
  • Updated or added relevant documentation
  • Added a test for the contribution (if applicable)

Member

@f-f f-f left a comment

Very happy to merge this patch 😄

Can anyone else have a look?

Member

@garyb garyb left a comment

> Binary size went from 110MB to 130MB. I think this is fine and worth the speed improvements.

Seems like a reasonable tradeoff to me 👍

@MonoidMusician
Contributor

Wow, that's a great improvement! I can reproduce the 30% speedup for ACME, and on our work codebase it is more like 40%, with both single-threaded and multi-threaded builds.

Could you also add it to purescript.cabal for cabal builds? Just to be consistent.

diff --git a/purescript.cabal b/purescript.cabal
index 7601ec39..70e7dabb 100644
--- a/purescript.cabal
+++ b/purescript.cabal
@@ -402,7 +402,7 @@ executable purs
   import: defaults
   hs-source-dirs: app
   main-is: Main.hs
-  ghc-options: -fno-warn-unused-do-bind -threaded -rtsopts -with-rtsopts=-N -Wno-unused-packages
+  ghc-options: -fno-warn-unused-do-bind -threaded -rtsopts -with-rtsopts=-N -Wno-unused-packages -fspecialize-aggressively -fexpose-all-unfoldings
   build-depends:
     prettyprinter >=1.7.1 && <1.8,
     prettyprinter-ansi-terminal >=1.1.3 && <1.2,

@seastian
Contributor Author

@MonoidMusician I am not sure where that should go: purescript.cabal or cabal.project?

If it goes to purescript.cabal then it needs to be added to both library and executable and can be dropped from stack.yaml.

The benefit of adding it to cabal.project is that it mimics what stack is doing, and people who use purescript as a library will not be forced to compile it with these optimizations on (rare?). I just checked, and the text package includes an -O2 flag in its text.cabal, so I guess it is fine to have optimization options in cabal files.

What do you think?

@MonoidMusician
Contributor

> If it goes to purescript.cabal then it needs to be added to both library and executable and can be dropped from stack.yaml.

I'm not sure how that follows. Is there a problem with adding it to just executable like I did? I am not a cabal/stack expert but it did seem to build me a binary with optimizations. I guess any solution that adds it to purs-the-binary and does not require it on purescript-the-library is fine with me.

@purefunctor
Member

Both would work for the release-distributed builds, but for Hackage-based installs it would need to be in purescript.cabal since cabal.project is not included in the cabal sdist by default.

@purefunctor
Member

purefunctor commented Oct 14, 2025

This one is quite subtle, actually: the inlining/specialisation would only apply to the executable's modules. I think adding it to cabal.project would be best, since it optimises both the purescript and purs targets for our release builds. This will not trickle downstream to library consumers, but they can always add ghc-options through their own cabal.project or stack.yaml.

@seastian
Contributor Author

Okay, added to cabal.project! This is ready to go in. It would be really nice to make a release, as we want to use this on our CI 😊
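For readers following along, a cabal.project entry of the kind discussed above might look roughly like the fragment below. This is a sketch under assumptions, not the committed change: it assumes a `package purescript` stanza and reuses the two flags from the purescript.cabal diff earlier in the thread.

```cabal
-- Hypothetical cabal.project fragment; the merged commit may differ.
-- Package-level ghc-options apply to every component of the package,
-- so both the library and the purs executable are built with them.
package purescript
  ghc-options: -fspecialize-aggressively -fexpose-all-unfoldings
```

Because cabal.project is not part of the sdist by default, a setting like this affects builds from the repository but not Hackage-based installs, as noted earlier in the conversation.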

@purefunctor purefunctor merged commit 8ac0fb2 into purescript:master Oct 18, 2025
7 checks passed
@finnhodgkin finnhodgkin mentioned this pull request Nov 4, 2025
5 tasks
noisyscanner pushed a commit to OxfordAbstracts/purescript that referenced this pull request Dec 1, 2025
…uild (purescript#4584)

* Use -fspecialize-aggressively to improve performance

* add fspecialize to cabal project