Type-1 font subsetting #20716

jkseppan · Jul 22, 2021

PR Summary

Type-1 subsetting

This reduces pdf file sizes when usetex is active, at the cost of
some complexity in the code. We implement a charstring bytecode
interpreter to keep track of subroutine calls in font programs.

Recommend merging to main to give people time to test this, not to
a 3.10 point release.

Give dviread.DviFont a fake filename attribute and a get_fontmap
method for character tracking.

Add type hints to the code this touches.

Closes #127.

PR Checklist

Has pytest style unit tests (and pytest passes).
Is Flake 8 compliant (run flake8 on changed files to check).
New features are documented, with examples if plot related.
Documentation is sphinx and numpydoc compliant (the docs should build without error).
Conforms to Matplotlib style conventions (install flake8-docstrings and run flake8 --docstring-convention=all).
New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).

anntzer · Oct 5, 2021

Is this ready for review, now that #20715 has been merged?

jkseppan · May 4, 2025

This is failing on Ubuntu 22.04 and Windows but passing on 24.04 and Mac. Here's one failing image (test_usetex_pdf.png, so converted from pdf to png on the test system).

This looks like the font is entirely broken. The expected image similarly converted looks like this:

Strangely enough, the generated pdf file looks fine on my Mac.

jkseppan · May 4, 2025

I can repeat the error running on an Ubuntu 22.04 docker image:

GPL Ghostscript 9.55.0 (2021-09-27)
Copyright (C) 2021 Artifex Software, Inc.  All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 1.
Page 1
   **** Error: can't process embedded font stream,
        attempting to load the font using its name.
               Output may be incorrect.
Querying operating system for font files...
Substituting font Helvetica for MPLAAD+CMR17.
Loading NimbusSans-Regular font from /usr/share/ghostscript/9.55.0/Resource/Font/NimbusSans-Regular... 4467044 2957798 6795968 5451925 4 done.
   **** Error: can't process embedded font stream,
        attempting to load the font using its name.
               Output may be incorrect.
Substituting font Helvetica for MPLAAC+CMR12.
   **** Error: can't process embedded font stream,
        attempting to load the font using its name.
               Output may be incorrect.
Substituting font Helvetica for MPLAAA+CMEX10.
   **** Error: can't process embedded font stream,
        attempting to load the font using its name.
               Output may be incorrect.
Substituting font Helvetica-Oblique for MPLAAB+CMMI12.
Loading NimbusSans-Italic font from /usr/share/ghostscript/9.55.0/Resource/Font/NimbusSans-Italic... 4534308 3175939 7138400 5761600 4 done.

The subsetting is wrong in some way that breaks Ghostscript 9.55 but not the viewer in macOS or the newer Ghostscript in Ubuntu 24.04. (Ghostscript 9.56 has a completely rewritten PDF interpreter.)

jkseppan · May 4, 2025

I hope I found the culprit... I was writing an extra delimiter between the Subrs and the Charstrings when one was already there.

lib/matplotlib/_type1font.py

anntzer · May 6, 2025

lib/matplotlib/_type1font.py

@@ -35,10 +37,64 @@

 from matplotlib.cbook import _format_approx
 from . import _api
+if T.TYPE_CHECKING:
+    from collections.abc import Iterable


We tend not to have inline type hints (and I personally very much don't like them), but there are a few exceptions (e.g. _mathtext.py) so I guess it's up to you whether you want to leave them in.

Right – I saw that there are some files with type hints, and figured that the project might be in the process of adding them. I've found type hints pretty useful at work, but of course we should have a consistent style in the project. Has this been discussed in the past on the dev list or somewhere?

I'm probably among the most vocal opponents, but I'll defer to @QuLogic or @ksunden on this aspect.

I gave my opinion on our gitter, but I'll copy the main comment here for the sake of consolidation/potential for finding it in the future:

I would say type hints are a net positive in my opinion, though I acknowledge that there are problems (perhaps especially in a case like ours where APIs were designed well before type hints).

We went with stub files on initial implementation specifically to minimize some risk, specifically since the stub files have are not even loaded at runtime, there was no chance of them interfering.

However, I do think inline reduces a level ongoing maintenance risk, in particular the chance of the two files getting out of sync. (We have tooling/CI in place to help catch such things, but don't think it fully removes the risk)

The other factor is that stub files allow us draw the line at public APIs, and not have to worry about typing our internal logic. (Which has positives and negatives, positives being largely catching additional problems by type checker, negative largely being the surface area that needs to be covered.)

So in all, I think my personal recommendation would be a relatively slow uptake, where we keep the stub files for most public facing things in the interim, do inline type hints for internal logic when it makes sense to do so (e.g. when doing so is actively helpful for some refactor/feature addition/etc) then once the majority of internal logic has inline hints, consider inlining the hints of the more public facing APIs.

So looking at this example in particular, I think this does fall within my recommendation there. I am not personally opposed to it, but also not hard pulling for it. I do question a bit whether to advocate for "do minimal as you are working and motivated" or "the unit for adding type hints should be one file at a time". Doing the latter would carry the advantage that you get a better sense of how complete the hints are, but the disadvantage of asking people to type hint 3-4x what they were otherwise looking at (in this case, for example), which also impacts the review-ability of the PR as there are lots of changes that are actually orthogonal.

For what its worth, this particular file already has one function which got an inline typehint added in #27796

and @tacaswell responded

It is worth re-opening the discussion of in-line type hints
the world seems to have stabilized and in other projects I have had type hints catch actual bugs before I ran the code (but I have also had to spend 15 minutes trying to sort out how to placate the type checker for code that clearly works a couple of times)

The type hints were useful for me while working on this, since VS Code pointed out some obvious mistakes in real time. But I agree that it is not ideal to leave the file only partially annotated, since it does not pass a full type check in its current state.

I could remove the extra type hints from this PR and make a separate PR to add hints to the whole file?

Sure, let's just go for it. I'm not going to be able to resist this forever 🙄

lib/matplotlib/_type1font.py

The old optimization where the same font file and descriptor could be shared by multiple differently-encoded fonts is wrong in the presence of subsetting, unless we also take the exact glyph subset into account. It is very unlikely for the exact same subset to be used by different encodings, so just remove the optimization. Rearrange _embedTeXFont in neater blocks.

lib/matplotlib/backends/backend_pdf.py

QuLogic · Jun 4, 2025

It looks like you'll need to install heuristica and DejaVuSans texlive packages in .github/workflows/tests.yml to get those new tests running.

jkseppan · Jun 5, 2025

It looks like you'll need to install heuristica and DejaVuSans texlive packages in .github/workflows/tests.yml to get those new tests running.

Yes, but Debian (and hence Ubuntu) bundles them into texlive-fonts-extra with a large number of other fonts. Not sure that slowing down all test runners is worth it, but perhaps we could do it in just one runner?

[Edit: Or we could see if we can install packages via tlmgr. Debian recommends against it but I don't think we need to worry about messing with package dependencies in a CI script.]

.github/workflows/tests.yml

QuLogic · Jun 5, 2025

Here are the times for the "Install OS dependencies" step in the previous 3 commits:

Commit	3.11 minver	3.11	3.12	3.13	3.13 freethread	3.12 arm
`5006acc`	1m9s	1m21s	2m59s	1m52s	1m24s	3m54s
`546a767`	1m10s	1m24s	2m23s	1m52s	1m22s	3m19s
`5c2f6bd`	1m13s	1m15s	2m16s	2m42	1m15s	2m20s

And for the 2 last commits with the additional 1 or 2 packages:

Commit	3.11 minver	3.11	3.12	3.13	3.13 freethread	3.12 arm
`7741e9b`	2m31s	2m36s	5m3s	2m50s	2m10s	3m25s
`f411fb3`	2m30s	2m58s	2m56s	2m34s	2m5s	5m15s

It looks like it would add about ~1m per job?

It's unfortunate we aren't running on Fedora, where all texlive packages are available individually (though no guarantee that'd be faster.)

jkseppan · Jun 5, 2025

Some ideas:

install TeX without using the Ubuntu packages, with e.g. TinyTeX (not guaranteed to be faster)
run the tests inside Docker, build images in phases so the base OS and TeX are installed in advance, store the images in GitHub Container Registry (can almost certainly shave off several minutes off the build time, at the cost of some complexity and may need to pay GitHub for the GHCR storage)
set up TeX only in a subset of the test runners, or dedicate some runners to TeX tests only (the latter would increase parallelism and might reduce wall-clock time, but Amdahl's law applies)

tacaswell · Jun 5, 2025

I think option 3 (only run on a subset). I would go with the newest Python.

QuLogic · Jun 6, 2025

Should be able to set an optional addition in the matrix just like we do with matrix.extra-requirements.

jkseppan · Jun 6, 2025

I moved the packages texlive-latex-extra, texlive-fonts-extra and texlive-lang-cyrillic to one runner only. (The texlive-latex-extra package has been included for a long time and I'm not sure which exact LaTeX dependencies are used by our tests.)

Complete run times:

Job	Run time
Python 3.11 on ubuntu-22.04 (Minimum Versions)	18m 37s
Python 3.11 on ubuntu-22.04	17m 31s
Python 3.12 on ubuntu-22.04-arm	12m 22s
Python 3.13 on ubuntu-22.04 (Extra TeX packages)	20m 25s
Python 3.13t on ubuntu-22.04 Free-threaded	16m 58s
Python 3.12 on ubuntu-24.04	20m 3s
Python 3.11 on macos-13	24m 18s
Python 3.12 on macos-14	17m 44s
Python 3.13 on macos-14	16m 36s
Create issue on failure	0s
	2h 44m 34s

For comparison, a recent commit on main:

Job	Run time
Python 3.11 on ubuntu-22.04 (Minimum Versions)	19m 24s
Python 3.11 on ubuntu-22.04	17m 10s
Python 3.12 on ubuntu-22.04-arm	11m 15s
Python 3.13 on ubuntu-22.04	18m 24s
Python 3.13t on ubuntu-22.04 Free-threaded	16m 54s
Python 3.12 on ubuntu-24.04	20m 32s
Python 3.11 on macos-13	22m 39s
Python 3.12 on macos-14	16m 5s
Python 3.13 on macos-14	15m 9s
Create issue on failure	0s
	2h 37m 32s

In this case it seems to have added a little under two minutes to that runner, including both the time taken to install and the extra tests run. There seems to be quite a bit of variation from run to run, and most of the overall increase is due to the macos builders where I don't think we even install TeX.

QuLogic · Jun 6, 2025

https://matplotlib.org/devdocs/install/dependencies.html#latex-dependencies says it's for import, sfmath, and type1cm, and type1cm is used unconditionally in the preamble. According to the log, texlive-latex-extra is already being installed by dependency of something else which is why it didn't break, so I don't think it makes sense to move it to the extras job only. It should stay for explicitness in all jobs.

The DejaVu and Heuristica fonts are used by the type-1 font subsetting tests. Heuristica has a Cyrillic encoding and apparently cannot be loaded without installing texlive-lang-cyrillic.

jkseppan · Jun 6, 2025

texlive-latex-extra is already being installed by dependency of something else which is why it didn't break, so I don't think it makes sense to move it to the extras job only. It should stay for explicitness in all jobs.

Agreed! I changed that package back.

QuLogic · Jun 6, 2025

Closes #127.

Congrats on closing a 3-digit issue.

jkseppan added backend: pdf topic: text/fonts topic: text/usetex labels Jul 22, 2021

jkseppan force-pushed the type1-subset branch from b66579d to 6546417 Compare July 22, 2021 14:54

jkseppan added status: waiting for other PR status: work in progress labels Jul 22, 2021

jkseppan force-pushed the type1-subset branch from 6546417 to 3aaf4c9 Compare July 22, 2021 16:20

jkseppan force-pushed the type1-subset branch 2 times, most recently from f6861ad to d8ae364 Compare August 30, 2021 08:00

oscargus added status: needs rebase and removed status: waiting for other PR labels Apr 14, 2022

jkseppan force-pushed the type1-subset branch from d8ae364 to 4dcd073 Compare May 4, 2025 11:48

github-actions bot removed status: needs rebase topic: text/fonts labels May 4, 2025

jkseppan added the topic: text/fonts label May 4, 2025

jkseppan force-pushed the type1-subset branch from 4dcd073 to 9c55b18 Compare May 4, 2025 16:25

github-actions bot removed the topic: text/fonts label May 4, 2025

jkseppan force-pushed the type1-subset branch 2 times, most recently from 9c5d971 to 9a9dc05 Compare May 4, 2025 17:15

jkseppan marked this pull request as ready for review May 4, 2025 17:16

jkseppan force-pushed the type1-subset branch from 9a9dc05 to b139e50 Compare May 4, 2025 17:42

anntzer reviewed May 6, 2025

View reviewed changes

lib/matplotlib/_type1font.py Outdated Show resolved Hide resolved

anntzer reviewed May 6, 2025

View reviewed changes

lib/matplotlib/_type1font.py Outdated Show resolved Hide resolved

anntzer reviewed May 6, 2025

View reviewed changes

lib/matplotlib/_type1font.py Outdated Show resolved Hide resolved

anntzer reviewed May 6, 2025

View reviewed changes

lib/matplotlib/_type1font.py Show resolved Hide resolved

jkseppan force-pushed the type1-subset branch from 1d88a25 to 53355ca Compare May 30, 2025 10:52

github-actions bot removed the topic: text/mathtext label May 30, 2025

QuLogic reviewed Jun 4, 2025

View reviewed changes

lib/matplotlib/backends/backend_pdf.py Show resolved Hide resolved

Keep a non-functional type1Descriptors

8abab03

QuLogic approved these changes Jun 4, 2025

View reviewed changes

QuLogic added this to Font and text overhaul Jun 5, 2025

QuLogic moved this to Ready for Review in Font and text overhaul Jun 5, 2025

jkseppan force-pushed the type1-subset branch from 4d7543a to c90ae58 Compare June 5, 2025 03:49

QuLogic reviewed Jun 5, 2025

View reviewed changes

.github/workflows/tests.yml Outdated Show resolved Hide resolved

jkseppan force-pushed the type1-subset branch 2 times, most recently from f0c24a2 to d8b5aec Compare June 5, 2025 04:11

jkseppan force-pushed the type1-subset branch 2 times, most recently from c8e8c11 to 892df4f Compare June 6, 2025 03:34

Install extra TeX fonts on one runner only

d5ab3b0

The DejaVu and Heuristica fonts are used by the type-1 font subsetting tests. Heuristica has a Cyrillic encoding and apparently cannot be loaded without installing texlive-lang-cyrillic.

jkseppan force-pushed the type1-subset branch from 892df4f to d5ab3b0 Compare June 6, 2025 06:13

QuLogic merged commit 7c49d52 into matplotlib:main Jun 6, 2025
40 checks passed

github-project-automation bot moved this from Ready for Review to Done in Font and text overhaul Jun 6, 2025

jkseppan deleted the type1-subset branch June 6, 2025 12:44

jkseppan mentioned this pull request Jun 12, 2025

Update font-related documentation #30150

Merged

Search code, repositories, users, issues, pull requests...

Uh oh!

Type-1 font subsetting #20716

Type-1 font subsetting #20716

Uh oh!

Conversation

jkseppan commented Jul 22, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Summary

PR Checklist

Uh oh!

anntzer commented Oct 5, 2021

Uh oh!

jkseppan commented May 4, 2025

Uh oh!

jkseppan commented May 4, 2025

Uh oh!

jkseppan commented May 4, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

anntzer May 6, 2025

Choose a reason for hiding this comment

Uh oh!

jkseppan May 9, 2025

Choose a reason for hiding this comment

Uh oh!

anntzer May 9, 2025

Choose a reason for hiding this comment

Uh oh!

ksunden May 12, 2025

Choose a reason for hiding this comment

Uh oh!

jkseppan May 13, 2025

Choose a reason for hiding this comment

Uh oh!

anntzer May 13, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

QuLogic commented Jun 4, 2025

Uh oh!

jkseppan commented Jun 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

QuLogic commented Jun 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jkseppan commented Jun 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tacaswell commented Jun 5, 2025

Uh oh!

QuLogic commented Jun 6, 2025

Uh oh!

jkseppan commented Jun 6, 2025

Uh oh!

QuLogic commented Jun 6, 2025

Uh oh!

jkseppan commented Jun 6, 2025

Uh oh!

Uh oh!

QuLogic commented Jun 6, 2025

Uh oh!

Uh oh!

jkseppan commented Jul 22, 2021 •

edited

Loading

jkseppan commented Jun 5, 2025 •

edited

Loading

QuLogic commented Jun 5, 2025 •

edited

Loading

jkseppan commented Jun 5, 2025 •

edited

Loading