docling-project/docling-java

Open more actions menu

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

352 Commits
352 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Welcome to the Docling Java Project!

Docling Java

This is the repository for Docling Java, a Java API for using Docling.


Docling simplifies document processing, parsing diverse formats, including advanced PDF understanding, and providing seamless integrations with the Generative AI ecosystem.

Features

  • 🗂️ Parsing of multiple document formats incl. PDF, DOCX, PPTX, XLSX, HTML, WAV, MP3, VTT, images (PNG, TIFF, JPEG, ...), and more
  • 📑 Advanced PDF understanding incl. page layout, reading order, table structure, code, formulas, image classification, and more
  • 🧬 Unified, expressive DoclingDocument representation format
  • ↪️ Various export formats and options, including Markdown, HTML, DocTags and lossless JSON
  • 🔒 Local execution capabilities for sensitive data and air-gapped environments
  • 🤖 Plug-and-play integrations including LangChain4j
  • 🔍 Extensive OCR support for scanned PDFs and images
  • 👓 Support for several Visual Language Models (e.g., GraniteDocling)
  • 🎙️ Audio support with Automatic Speech Recognition (ASR) models

Documentation

See the documentation for complete information on the various artifacts that are provided by this project.

Artifacts

This project provides the following artifacts:

Getting started

Use DoclingServeApi.convertSource() to convert individual documents (make sure both docling-serve-api and docling-serve-client are on your classpath).

For example:

import java.net.URI;

import ai.docling.serve.api.DoclingServeApi;
import ai.docling.serve.api.convert.request.ConvertDocumentRequest;
import ai.docling.serve.api.convert.request.source.HttpSource;
import ai.docling.serve.api.convert.response.ConvertDocumentResponse;

DoclingServeApi doclingServeApi = DoclingServeApi.builder()
    .baseUrl("<location of docling serve instance>")
    .build();

ConvertDocumentRequest request = ConvertDocumentRequest.builder()
    .source(
        HttpSource.builder()
            .url(URI.create("https://arxiv.org/pdf/2408.09869"))
            .build()
    )
    .build();

ConvertDocumentResponse response = doclingServeApi.convertSource(request);
System.out.println(response.getDocument().getMarkdownContent());

More usage information is available in the docs.
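To make visible what a conversion request roughly looks like on the wire, here is a minimal sketch that builds an equivalent request by hand with the JDK's `java.net.http` client, without docling-java. It is not how `DoclingServeApi` is implemented. It assumes a docling-serve instance that exposes `POST /v1/convert/source`, accepts a body of the form `{"sources":[{"kind":"http","url":...}]}`, and listens on port 5001 by default. Verify all three against the API docs of your docling-serve version. The class and helper names (`RawConvertSketch`, `buildBody`, `buildRequest`) are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RawConvertSketch {

    // Hypothetical helper: builds the JSON body for converting a document fetched over HTTP.
    // ASSUMPTION: docling-serve's v1 schema {"sources":[{"kind":"http","url":...}]}.
    static String buildBody(String documentUrl) {
        // Minimal escaping of backslashes and quotes; a real client should use a JSON library.
        String escaped = documentUrl.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"sources\":[{\"kind\":\"http\",\"url\":\"" + escaped + "\"}]}";
    }

    // Hypothetical helper: builds (but does not send) the POST request.
    // ASSUMPTION: the endpoint path is /v1/convert/source.
    static HttpRequest buildRequest(String baseUrl, String documentUrl) {
        // Drop trailing slashes so "http://host:5001/" does not produce "//v1/...".
        String base = baseUrl.replaceAll("/+$", "");
        return HttpRequest.newBuilder()
            .uri(URI.create(base + "/v1/convert/source"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(buildBody(documentUrl)))
            .build();
    }

    public static void main(String[] args) throws Exception {
        // ASSUMPTION: default docling-serve port 5001 when no base URL is given.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:5001";
        HttpRequest request = buildRequest(baseUrl, "https://arxiv.org/pdf/2408.09869");
        System.out.println(request.method() + " " + request.uri());

        // Only contact a server when its base URL was passed explicitly.
        if (args.length > 0) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
            System.out.println(response.body()); // raw JSON response, printed as-is
        }
    }
}
```

Run without arguments, the sketch only prints the request line it would send; pass the base URL of a running docling-serve instance as the first argument to actually send the request and print the raw JSON response.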

Get help and support

Please feel free to connect with us in the Discussions section.

Contributing

Please read Contributing to Docling Java for details.

License

The Docling codebase is released under the MIT license. For individual model usage, please refer to the model licenses found in the original packages.

IBM ❤️ Open Source AI

The project was started by the AI for knowledge team at IBM Research Zurich.

Contributors ✨

Thanks goes to these wonderful people (emoji key):

  • Eric Deandrea: 💻 🖋 📖 🤔 🚇 🚧 📆 ⚠️ 👀
  • Thomas Vitale: 💻 🖋 📖 🤔 🚇 🚧 📆 ⚠️ 👀
  • Alex Soto: 🤔 📆
  • Cesar Berrospi Ramis: 🤔
  • Michele Dolfi: 🎨 🤔 🚇 💬
  • Andrea Cosentino: 🎨 📣 🤔 💻 📖
  • jmb-streamsets: 🤔 🎨
  • insectengine: 🖋 🎨
  • Maxim Lysak: 🖋 🎨
  • warnulf: 🐛
  • Kristian Rickert: 💻 🤔 📖 ⚠️

This project follows the all-contributors specification. Contributions of any kind are welcome!
