Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

A minimal docker image for running tabula-java for PDF table extraction

Notifications You must be signed in to change notification settings

grfeller/docker-tabula-java

Open more actions menu
 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 

Repository files navigation

docker-tabula-java

A minimal Docker image for running the tabula-java from a Docker container. Based on java:openjdk-8-jre-alpine.

Running

You can mount a PDF file available at /full/path/to/my/file.pdf to the docker container by using the -v option to the docker run command:

$ docker run --rm -v /full/path/to/my/file.pdf:/data.pdf sseemayer/tabula-java [tabula options] /data.pdf > output.csv

To get tabula help:

$ docker run --rm sseemayer/tabula-java --help
usage: tabula [-a <AREA>] [-c <COLUMNS>] [-d] [-f <FORMAT>] [-g] [-h] [-i]
       [-n] [-o <OUTFILE>] [-p <PAGES>] [-r] [-s <PASSWORD>] [-u] [-v]

Tabula helps you extract tables from PDFs
 -a,--area <AREA>           Portion of the page to analyze
                            (top,left,bottom,right). Example: --area
                            269.875,12.75,790.5,561. Default is entire
                            page
 -c,--columns <COLUMNS>     X coordinates of column boundaries. Example
                            --columns 10.1,20.2,30.3
 -d,--debug                 Print detected table areas instead of
                            processing.
 -f,--format <FORMAT>       Output format: (CSV,TSV,JSON). Default: CSV
 -g,--guess                 Guess the portion of the page to analyze per
                            page.
 -h,--help                  Print this help text.
 -i,--silent                Suppress all stderr output.
 -n,--no-spreadsheet        Force PDF not to be extracted using
                            spreadsheet-style extraction (if there are
                            ruling lines separating each cell, as in a PDF
                            of an Excel spreadsheet)
 -o,--outfile <OUTFILE>     Write output to <file> instead of STDOUT.
                            Default: -
 -p,--pages <PAGES>         Comma separated list of ranges, or all.
                            Examples: --pages 1-3,5-7, --pages 3 or
                            --pages all. Default is --pages 1
 -r,--spreadsheet           Force PDF to be extracted using
                            spreadsheet-style extraction (if there are
                            ruling lines separating each cell, as in a PDF
                            of an Excel spreadsheet)
 -s,--password <PASSWORD>   Password to decrypt document. Default is empty
 -u,--use-line-returns      Use embedded line returns in cells. (Only in
                            spreadsheet mode.)
 -v,--version               Print version and exit.

About

A minimal docker image for running tabula-java for PDF table extraction

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Dockerfile 100.0%
Morty Proxy This is a proxified and sanitized view of the page, visit original site.