jamesprowe/html-to-markdown

html-to-markdown


High-performance HTML to Markdown conversion powered by Rust. Ships as native bindings for Rust, Python, TypeScript/Node.js, Ruby, PHP, Go, Java, C#, Elixir, R, C (FFI), and WebAssembly with identical rendering across all runtimes.


Highlights

  • 150-280 MB/s throughput (10-80x faster than pure Python alternatives)
  • 12 language bindings with consistent output across all runtimes
  • Structured result: convert() returns ConversionResult with content, metadata, tables, images, and warnings
  • Metadata extraction: title, headers, links, images, structured data (JSON-LD, Microdata, RDFa)
  • Visitor pattern: custom callbacks for content filtering, URL rewriting, domain-specific dialects
  • Table extraction: extract structured table data (cells, headers, rendered Markdown) during conversion
  • Secure by default: built-in HTML sanitization via ammonia
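The visitor and metadata bullets describe callbacks that run during conversion and data collected along the way. As a rough, self-contained illustration of that idea only (this is not the library's API: ToyMarkdownConverter, convert_sketch, the on_link hook and the absolutize callback are all hypothetical names), the sketch below uses Python's standard html.parser to render headings, paragraphs and links as Markdown, lets a caller-supplied callback rewrite each link URL, and records the final URLs as simple metadata.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional, Tuple

HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class ToyMarkdownConverter(HTMLParser):
    """Hypothetical sketch: renders <h1>-<h6>, <p> and <a> as Markdown.

    `on_link` is a visitor-style callback (an assumption for illustration,
    not html-to-markdown's real API) that may rewrite each link URL.
    """

    def __init__(self, on_link: Optional[Callable[[str], str]] = None) -> None:
        super().__init__()
        self.on_link = on_link
        self.blocks: List[str] = []      # finished Markdown blocks
        self.buf: List[str] = []         # text of the block being built
        self.prefix = ""                 # "# ", "## ", ... for headings
        self.href: Optional[str] = None  # URL of the currently open <a>
        self.links: List[str] = []       # collected metadata: final link URLs

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.prefix = "#" * int(tag[1]) + " "
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            # Visitor hook: give the caller a chance to rewrite the URL.
            self.href = self.on_link(href) if self.on_link else href
            self.buf.append("[")

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            self.buf.append(f"]({self.href})")
            self.links.append(self.href)
            self.href = None
        elif tag == "p" or tag in HEADINGS:
            # Close the block: emit its text with the heading prefix, if any.
            text = "".join(self.buf).strip()
            if text:
                self.blocks.append(self.prefix + text)
            self.buf, self.prefix = [], ""

    def handle_data(self, data):
        self.buf.append(data)


def convert_sketch(html: str, on_link=None) -> Tuple[str, List[str]]:
    """Return (markdown, links) for the small HTML subset handled above."""
    parser = ToyMarkdownConverter(on_link)
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks), parser.links


def absolutize(url: str) -> str:
    """Example callback: prefix site-relative URLs with a hypothetical base."""
    return "https://example.com" + url if url.startswith("/") else url


if __name__ == "__main__":
    md, links = convert_sketch(
        '<h1>Hello</h1><p>See <a href="/docs">the docs</a>.</p>', absolutize
    )
    print(md)
    print(links)
```

Here the callback turns "/docs" into "https://example.com/docs" before the link is written, which is the kind of URL rewriting the visitor bullet mentions. The toy ignores everything outside p and h1 to h6 and does no sanitization; per the README, the real converter does its parsing, sanitization and rendering in the Rust core.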

Quick Start

# Rust
cargo add html-to-markdown-rs

# Python
pip install html-to-markdown

# TypeScript / Node.js
npm install @kreuzberg/html-to-markdown-node

# Ruby
gem install html-to-markdown

# CLI
cargo install html-to-markdown-cli
# or
brew install kreuzberg-dev/tap/html-to-markdown

See the Installation Guide for all languages including PHP, Go, Java, C#, Elixir, R, and WASM.

Usage

convert() is the single entry point. It returns a structured ConversionResult:

# Python
from html_to_markdown import convert

result = convert("<h1>Hello</h1><p>World</p>")
print(result["content"])        # # Hello\n\nWorld
print(result["metadata"])       # title, links, headings, …

// TypeScript / Node.js
import { convert } from "@kreuzberg/html-to-markdown-node";

const result = convert("<h1>Hello</h1><p>World</p>");
console.log(result.content);    // # Hello\n\nWorld
console.log(result.metadata);   // title, links, headings, …

// Rust
use html_to_markdown_rs::convert;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // `None`: no custom conversion options
    let result = convert("<h1>Hello</h1><p>World</p>", None)?;
    println!("{}", result.content.unwrap_or_default());
    Ok(())
}

Language Bindings

| Language | Package | Install |
|---|---|---|
| Rust | html-to-markdown-rs | cargo add html-to-markdown-rs |
| Python | html-to-markdown | pip install html-to-markdown |
| TypeScript / Node.js | @kreuzberg/html-to-markdown-node | npm install @kreuzberg/html-to-markdown-node |
| WebAssembly | @kreuzberg/html-to-markdown-wasm | npm install @kreuzberg/html-to-markdown-wasm |
| Ruby | html-to-markdown | gem install html-to-markdown |
| PHP | kreuzberg-dev/html-to-markdown | composer require kreuzberg-dev/html-to-markdown |
| Go | htmltomarkdown | go get github.com/kreuzberg-dev/html-to-markdown/packages/go/v3 |
| Java | dev.kreuzberg:html-to-markdown | Maven / Gradle |
| C# | KreuzbergDev.HtmlToMarkdown | dotnet add package KreuzbergDev.HtmlToMarkdown |
| Elixir | html_to_markdown | Add html_to_markdown to deps in mix.exs, then run mix deps.get |
| R | htmltomarkdown | install.packages("htmltomarkdown") |
| C (FFI) | releases | Pre-built .so / .dll / .dylib |

Part of the Kreuzberg Ecosystem

html-to-markdown is developed by kreuzberg.dev and powers the HTML conversion pipeline in Kreuzberg, a document intelligence library for extracting text from PDFs, images, and office documents.

Contributing

Contributions welcome! See CONTRIBUTING.md for setup instructions and guidelines.

License

MIT License — see LICENSE for details.

About

High-performance, CommonMark-compliant HTML to Markdown converter. Maintained by the Kreuzberg team. Kreuzberg is a fast, polyglot document intelligence engine with a Rust core. It extracts structured data from 56+ document formats using streaming parsers and built-in OCR.
