Commit 4f0d60f

feat: upgrade PDF.js to v5.6.205

1 parent c488cf8, commit 4f0d60f

11 files changed: +2,533 -2,036 (2,533 additions, 2,036 deletions)


README.md: +22 -38 (22 additions, 38 deletions)
@@ -1,40 +1,25 @@
 # unpdf
 
-A collection of utilities for PDF extraction and rendering. Designed specifically for serverless environments, but it also works in Node.js, Deno, Bun and the browser. `unpdf` is particularly useful for serverless AI applications, especially for summarizing PDF documents in document analysis workflows.
+Utilities for PDF extraction and rendering across all JavaScript runtimes – Node.js, Deno, Bun, the browser, and serverless environments like Cloudflare Workers. Especially useful for AI applications that need to summarize or analyze PDF documents.
 
-This library ships with a serverless build/redistribution of Mozilla's [PDF.js](https://github.com/mozilla/pdf.js) that is optimized for edge environments. Some string replacements, global mocks and inlining the PDF.js worker allow the browser code to become platform agnostic. See [`pdfjs.rollup.config.ts`](./pdfjs.rollup.config.ts) for the details.
-
-This library is also intended as a modern alternative to the unmaintained but still popular [`pdf-parse`](https://www.npmjs.com/package/pdf-parse).
+Ships with a serverless build of Mozilla's [PDF.js](https://github.com/mozilla/pdf.js), optimized for edge environments. If you're coming from [`pdf-parse`](https://www.npmjs.com/package/pdf-parse), `unpdf` is a modern, actively maintained alternative with broader runtime support.
 
 ## Features
 
-- 🏗️ Made for Node.js, browser and serverless environments
+- 🏗️ Works in Node.js, browser and serverless environments
 - 🪭 Includes serverless build of PDF.js ([`unpdf/pdfjs`](./package.json#L34))
 - 💬 Extract [text](#extract-text-from-pdf), [links](#extractlinks), and [images](#extractimages) from PDF files
 - 🧠 Perfect for AI applications and PDF summarization
-- 🧱 Opt-in to legacy PDF.js build
-- 💨 Zero dependencies
-
-## PDF.js Compatibility
-
-> [!Tip]
-> The serverless PDF.js bundle provided by `unpdf` is built from PDF.js v5.4.394.
-
-You can use an [official PDF.js build](#official-or-legacy-pdfjs-build) by using the [`definePDFJSModule`](#definepdfjsmodule) method. This is useful if you want to use a specific version or a custom build of PDF.js.
+- 🧱 Opt-in to official or legacy PDF.js build
 
 ## Installation
 
-Run the following command to add `unpdf` to your project.
-
 ```bash
 # pnpm
-pnpm add -D unpdf
+pnpm add unpdf
 
 # npm
-npm install -D unpdf
-
-# yarn
-yarn add -D unpdf
+npm install unpdf
 ```
 
 ## Usage
@@ -44,15 +29,11 @@ yarn add -D unpdf
 ```ts
 import { extractText, getDocumentProxy } from 'unpdf'
 
-// Either fetch a PDF file from the web or load it from the file system
+// Fetch a PDF from the web or load it from the file system
 const buffer = await fetch('https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf')
   .then(res => res.arrayBuffer())
-const buffer = await readFile('./dummy.pdf')
 
-// Then, load the PDF file into a PDF.js document
 const pdf = await getDocumentProxy(new Uint8Array(buffer))
-
-// Finally, extract the text from the PDF file
 const { totalPages, text } = await extractText(pdf, { mergePages: true })
 
 console.log(`Total pages: ${totalPages}`)
@@ -64,9 +45,9 @@ console.log(text)
 Usually you don't need to worry about the PDF.js build. `unpdf` ships with a serverless build of the latest PDF.js version. However, if you want to use the official PDF.js version or the legacy build, you can define a custom PDF.js module.
 
 > [!WARNING]
-> PDF.js v5.x uses `Promise.withResolvers`, which may not be supported in all environments, such as Node < 22. Consider to use the bundled serverless build, which includes a polyfill, or use an older version of PDF.js.
+> PDF.js v5.x uses `Promise.withResolvers`, which may not be supported in all environments, such as Node < 22. Consider using the bundled serverless build, which includes a polyfill, or use an older version of PDF.js.
 
-For example, if you want to use the official PDF.js build, you can do the following:
+For example, if you want to use the official PDF.js build:
 
 ```ts
 import { definePDFJSModule, extractText, getDocumentProxy } from 'unpdf'
@@ -107,6 +88,17 @@ const document = await getDocument(new Uint8Array(data)).promise
 console.log(await document.getMetadata())
 ```
 
+## How It Works
+
+> [!NOTE]
+> The serverless PDF.js bundle is built from PDF.js v5.6.205.
+
+Heart and soul of this package is the [`pdfjs.rollup.config.ts`](./pdfjs.rollup.config.ts) file. It uses [Rollup](https://rollupjs.org/) to bundle PDF.js into a single file for serverless environments. The key techniques:
+
+- **String replacements** strip browser-specific references from the PDF.js source.
+- **Worker inlining** embeds the PDF.js worker directly into the main bundle, since serverless runtimes can't load separate worker files.
+- **Global polyfills** provide missing APIs like `FinalizationRegistry` (unavailable in Cloudflare Workers).
+
 ## API
 
 ### `definePDFJSModule`
@@ -209,15 +201,7 @@ for (const link of links) console.log(link)
 
 ### `extractImages`
 
-Extracts images from a specific page of a PDF document, including necessary metadata such as width, height, and calculated color channels.
-
-> [!NOTE]
-> This method will only work in Node.js and browser environments.
-
-In order to use this method, make sure to meet the following requirements:
-
-- Use the official PDF.js build (see below for details).
-- Install the [`@napi-rs/canvas`](https://github.com/Brooooooklyn/canvas) package if you are using Node.js. This package is required to render the PDF page as an image.
+Extracts images from a specific page of a PDF document, including necessary metadata such as width, height, and calculated color channels. Works with both the serverless and official PDF.js build.
 
 **Type Declaration**
 
@@ -285,7 +269,7 @@ To render a PDF page as an image, you can use the `renderPageAsImage` method. Th
 
 In order to use this method, make sure to meet the following requirements:
 
-- Use the official PDF.js build (see below for details).
+- Use the official PDF.js build (see [Official or Legacy PDF.js Build](#official-or-legacy-pdfjs-build)).
 - Install the [`@napi-rs/canvas`](https://github.com/Brooooooklyn/canvas) package if you are using Node.js. This package is required to render the PDF page as an image.
 
 **Type Declaration**
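The README warning in this diff notes that PDF.js v5.x calls `Promise.withResolvers`, which may be missing on Node < 22, and that the bundled serverless build includes a polyfill. A minimal sketch of such a polyfill follows, assuming only an ES2015+ runtime; `withResolvers` and `Resolvers` are illustrative names, and this is not the code unpdf actually ships.

```typescript
// Minimal sketch of a Promise.withResolvers polyfill (illustrative, not unpdf's code).

interface Resolvers<T> {
  promise: Promise<T>
  resolve: (value: T | PromiseLike<T>) => void
  reject: (reason?: unknown) => void
}

function withResolvers<T>(): Resolvers<T> {
  let resolve!: (value: T | PromiseLike<T>) => void
  let reject!: (reason?: unknown) => void
  // The executor runs synchronously, so both callbacks are assigned before returning.
  const promise = new Promise<T>((res, rej) => {
    resolve = res
    reject = rej
  })
  return { promise, resolve, reject }
}

// Install only when the runtime lacks the native method (e.g. Node < 22).
if (typeof (Promise as any).withResolvers !== 'function') {
  (Promise as any).withResolvers = withResolvers
}

// Usage: settle the promise from outside its executor.
const { promise, resolve } = withResolvers<number>()
promise.then(value => console.log(value)) // logs 42
resolve(42)
```

One deliberate simplification: the native method uses its `this` value as the promise constructor, so subclasses get instances of themselves, whereas this sketch always builds a plain `Promise`. That is sufficient for code that only awaits the result.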

examples/cloudflare/package.json: +2 -2 (2 additions, 2 deletions)
@@ -6,7 +6,7 @@
     "dev": "esbuild --bundle --platform=neutral --outfile=build/index.js index.ts && wrangler dev build/index.js"
   },
   "devDependencies": {
-    "esbuild": "^0.25.12",
-    "wrangler": "^4.51.0"
+    "esbuild": "^0.28.0",
+    "wrangler": "^4.81.1"
   }
 }
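This example targets Cloudflare Workers, where, according to the README's How It Works section, `FinalizationRegistry` is unavailable and the serverless bundle provides a global polyfill. A minimal sketch of the general idea follows, assuming a no-op stand-in is acceptable (the cleanup callback simply never runs); `NoopFinalizationRegistry` is a hypothetical name, not taken from the unpdf source.

```typescript
// Hypothetical no-op stand-in for FinalizationRegistry (illustrative, not unpdf's code).
// Registrations are accepted but the cleanup callback is never invoked.
class NoopFinalizationRegistry<T> {
  constructor(_cleanup: (heldValue: T) => void) {}

  register(_target: object, _heldValue: T, _unregisterToken?: object): void {}

  unregister(_unregisterToken: object): boolean {
    // Nothing is ever tracked, so there is never anything to remove.
    return false
  }
}

// Install the stand-in only where the global is missing.
const g = globalThis as any
if (typeof g.FinalizationRegistry !== 'function') {
  g.FinalizationRegistry = NoopFinalizationRegistry
}
```

The ECMAScript specification already allows engines to never call cleanup callbacks, so well-behaved code cannot depend on them; that is what makes a no-op stand-in viable.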

package.json: +21 -17 (21 additions, 17 deletions)
@@ -2,7 +2,7 @@
   "name": "unpdf",
   "type": "module",
   "version": "1.4.0",
-  "packageManager": "pnpm@10.24.0",
+  "packageManager": "pnpm@10.33.0",
   "description": "PDF extraction and rendering across all JavaScript runtimes",
   "author": "Johann Schopplich <hello@johannschopplich.com>",
   "license": "MIT",
@@ -15,10 +15,17 @@
     "url": "https://github.com/unjs/unpdf/issues"
   },
   "keywords": [
+    "cloudflare",
+    "edge",
+    "extract",
     "parse",
-    "pdfjs-dist",
     "pdf",
-    "serverless"
+    "pdf.js",
+    "pdfjs-dist",
+    "rendering",
+    "serverless",
+    "text-extraction",
+    "workers"
   ],
   "sideEffects": false,
   "exports": {
@@ -70,24 +77,21 @@
     }
   },
   "devDependencies": {
-    "@antfu/eslint-config": "^6.2.0",
-    "@napi-rs/canvas": "^0.1.83",
-    "@rollup/plugin-alias": "^6.0.0",
-    "@rollup/plugin-inject": "^5.0.5",
+    "@antfu/eslint-config": "^8.2.0",
+    "@napi-rs/canvas": "^0.1.97",
     "@rollup/plugin-node-resolve": "^16.0.3",
     "@rollup/plugin-replace": "^6.0.3",
-    "@rollup/plugin-terser": "^0.4.4",
+    "@rollup/plugin-terser": "^1.0.0",
     "@rollup/plugin-typescript": "^12.3.0",
-    "@types/node": "^24.10.1",
-    "bumpp": "^10.3.2",
-    "eslint": "^9.39.1",
-    "fast-glob": "^3.3.3",
-    "pdfjs-dist": "~5.4.394",
-    "rollup": "^4.53.3",
-    "tinyglobby": "^0.2.15",
+    "@types/node": "^24.12.2",
+    "bumpp": "^11.0.1",
+    "eslint": "^10.2.0",
+    "pdfjs-dist": "~5.6.205",
+    "rollup": "^4.60.1",
+    "tinyglobby": "^0.2.16",
     "tslib": "^2.8.1",
-    "typescript": "^5.9.3",
+    "typescript": "^6.0.2",
     "unbuild": "^3.6.1",
-    "vitest": "^4.0.14"
+    "vitest": "^4.1.4"
   }
 }

pdfjs.rollup.config.ts: +1 -1 (1 addition, 1 deletion)
@@ -36,7 +36,7 @@ export default defineConfig({
 preventAssignment: true,
 values: {
   // Force inlining the PDF.js worker.
-  'await import(/*webpackIgnore: true*/this.workerSrc)': '__pdfjsWorker__',
+  'await import(\n /*webpackIgnore: true*/\n /*@vite-ignore*/\n this.workerSrc)': '__pdfjsWorker__',
   // Force setting up fake PDF.js worker.
   '#isWorkerDisabled = false': '#isWorkerDisabled = true',
   // Remove WASM code from the worker.
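The only change to `pdfjs.rollup.config.ts` is the replacement key for the worker import. The new key spans several lines and adds a `/*@vite-ignore*/` comment, which suggests the dynamic import in the PDF.js v5.6.205 source is now formatted that way, so the old single-line key would no longer match. The sketch below models this with plain literal replacement; `replaceLiterals` is a hypothetical helper and deliberately simpler than `@rollup/plugin-replace`, which also wraps each key in configurable delimiters.

```typescript
// Simplified model of literal key replacement (hypothetical helper; @rollup/plugin-replace
// additionally wraps each key in configurable delimiters). Keys must match the source exactly.
function replaceLiterals(source: string, values: Record<string, string>): string {
  let out = source
  for (const [from, to] of Object.entries(values)) {
    // split/join replaces every literal occurrence without regex escaping concerns
    out = out.split(from).join(to)
  }
  return out
}

// The old and new keys as rendered in the diff above.
const oldKey = 'await import(/*webpackIgnore: true*/this.workerSrc)'
const newKey = 'await import(\n /*webpackIgnore: true*/\n /*@vite-ignore*/\n this.workerSrc)'

// Hypothetical source fragment, built from the new key for illustration.
const source = `const worker = ${newKey};`

console.log(replaceLiterals(source, { [oldKey]: '__pdfjsWorker__' }) === source) // true (no match)
console.log(replaceLiterals(source, { [newKey]: '__pdfjsWorker__' })) // const worker = __pdfjsWorker__;
```

Because a key that no longer matches leaves the source untouched without raising any error, replacements like this one need revisiting on every PDF.js upgrade.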
