condenser.cpp

Focused FLUX.2 Klein inference engine. Fork of stable-diffusion.cpp stripped to a single model family, with a persistent JSON-over-stdio engine that keeps models in VRAM between generations.

stable-diffusion.cpp supports dozens of model architectures and sampling strategies. condenser.cpp trades that breadth for depth: one model family (FLUX.2 Klein), two frontends (CLI and engine), and a C API designed for embedding into desktop applications. The engine (cn-engine) runs as a child process, accepts NDJSON commands on stdin, and streams progress and results back on stdout — no HTTP server, no dependencies beyond the GPU driver.
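Because the wire format is plain NDJSON, a client needs nothing beyond a JSON library. A minimal sketch of the framing in Python, assuming only the request/response shapes shown in the Usage section below ({"cmd","id","params"} in, {"id","type","data"} out); function names here are illustrative, not part of the project:

```python
import json

def encode_command(cmd, id_, params=None):
    """Frame one engine command as an NDJSON line (UTF-8, newline-terminated)."""
    msg = {"cmd": cmd, "id": id_}
    if params is not None:
        msg["params"] = params
    return (json.dumps(msg) + "\n").encode("utf-8")

def decode_response(line):
    """Parse one NDJSON response line read from the engine's stdout."""
    return json.loads(line)
```

Each command carries a caller-chosen `id`, which the engine echoes back so replies can be matched to requests.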

Features

  • FLUX.2 Klein text-to-image and image-to-image (4GB and 9GB GGUF variants)
  • Persistent engine — load once, generate many; repeat generations run much faster than cold-start CLI invocations
  • Prompt conditioning cache — same prompt with different seeds skips the text encoder entirely
  • Reference image latent cache — same reference image across img2img runs skips the VAE encoder
  • Multi-backend — Vulkan, CUDA, Metal, CPU (and experimental ROCm, SYCL, OpenCL)
  • VRAM offloading — run on 4-8GB GPUs by keeping idle model components in system RAM
  • Flash attention — reduced memory footprint and faster inference where supported
  • C API — clean C interface (condenser.h) for embedding into any language

Build

git clone --recursive https://github.com/jcluts/condenser.cpp
cd condenser.cpp

Vulkan

cmake -B build -DSD_VULKAN=ON
cmake --build build --config Release

CUDA

cmake -B build -DSD_CUDA=ON
cmake --build build --config Release

Metal (macOS)

cmake -B build -DSD_METAL=ON
cmake --build build --config Release

CPU only

cmake -B build
cmake --build build --config Release

Binaries are output to build/bin/. See docs/build.md for advanced options and platform-specific notes.

Usage

CLI — Single-shot generation

./build/bin/cn-cli \
  --diffusion-model model.gguf \
  --vae ae.safetensors \
  --llm qwen.gguf \
  --prompt "a cat on a windowsill" \
  -W 1024 -H 1024 \
  --steps 4 \
  --seed 42 \
  --offload-to-cpu --fa \
  -o output.png

Engine — Persistent inference

cn-engine is designed to be spawned as a child process by a parent application. It reads JSON commands from stdin and writes JSON responses to stdout. All log output goes to stderr.

# Quick test
echo '{"cmd":"ping","id":"1"}' | ./build/bin/cn-engine
# → {"id":"1","type":"ok","data":{"status":"pong"}}
# Interactive session
./build/bin/cn-engine
{"cmd":"load","id":"1","params":{"diffusion_model":"model.gguf","vae":"ae.safetensors","llm":"qwen.gguf","offload_to_cpu":true,"flash_attn":true}}
{"cmd":"generate","id":"2","params":{"prompt":"a sunset over mountains","width":1024,"height":1024,"seed":42,"steps":4,"output":"output.png"}}
{"cmd":"generate","id":"3","params":{"prompt":"a sunset over mountains","width":1024,"height":1024,"seed":99,"steps":4,"output":"output2.png"}}
{"cmd":"quit","id":"4"}

The second generate command returns quickly: the model stays loaded, and because only the seed changed, the prompt conditioning cached from the first run is reused and the text encoder is skipped.
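The caching idea can be pictured as memoization on the expensive encode step. The following Python sketch is an illustration of the concept, not the engine's actual data structure: conditioning is keyed on the prompt text, so a repeat generation that changes only the seed never re-runs the encoder.

```python
import hashlib

class ConditioningCache:
    """Sketch of a prompt-conditioning cache: memoize the expensive
    text-encoder output keyed on the prompt, so generations that change
    only the seed skip the encoder entirely."""

    def __init__(self, encode_fn):
        self._encode = encode_fn   # the expensive text-encoder call
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, prompt):
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._encode(prompt)   # cold path: run encoder
        else:
            self.hits += 1                            # warm path: reuse result
        return self._cache[key]

# Two generations, same prompt, different seeds -> the encoder runs once.
cache = ConditioningCache(encode_fn=lambda p: f"embedding({p})")
cache.get("a sunset over mountains")   # seed 42: miss, encoder runs
cache.get("a sunset over mountains")   # seed 99: hit, encoder skipped
```

The reference-image latent cache works the same way, keyed on the reference image instead of the prompt.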

See tools/engine/README.md for the full protocol reference, caching behavior, and integration examples (Python, Node.js).
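A parent application can drive cn-engine with ordinary subprocess pipes. The sketch below assumes only the protocol shown above (one JSON command per stdin line, one JSON reply per stdout line, matched by `id`); the class and method names are illustrative, and handling of streamed progress events is simplified to skipping lines whose `id` does not match.

```python
import json
import subprocess

class CnEngine:
    """Sketch of a parent process driving cn-engine over NDJSON pipes."""

    def __init__(self, engine_path="./build/bin/cn-engine"):
        self.proc = subprocess.Popen(
            [engine_path],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,  # engine log output goes to stderr
            text=True,
        )
        self._next_id = 0

    def send(self, cmd, params=None):
        """Send one command and block until its matching reply arrives."""
        self._next_id += 1
        msg = {"cmd": cmd, "id": str(self._next_id)}
        if params is not None:
            msg["params"] = params
        self.proc.stdin.write(json.dumps(msg) + "\n")
        self.proc.stdin.flush()
        while True:  # skip interleaved lines until our id answers
            reply = json.loads(self.proc.stdout.readline())
            if reply.get("id") == msg["id"]:
                return reply

# Usage against a real build (same parameters as the session above):
#   engine = CnEngine()
#   engine.send("load", {"diffusion_model": "model.gguf",
#                        "vae": "ae.safetensors", "llm": "qwen.gguf"})
#   engine.send("generate", {"prompt": "a sunset over mountains",
#                            "width": 1024, "height": 1024,
#                            "seed": 42, "steps": 4, "output": "output.png"})
#   engine.send("quit")
```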

Key Runtime Flags

Flag               Effect
--offload-to-cpu   Keep model weights in system RAM; move them to VRAM only during compute
--fa               Enable flash attention (Vulkan, CUDA)
--vae-on-cpu       Run the VAE on CPU
--llm-on-cpu       Keep the text encoder on CPU entirely
--vae-tiling       Tile-based VAE decode for high-resolution output

See docs/INFERENCE_PARAMETERS.md for the complete parameter reference.

Supported Models

  • FLUX.2 Klein (4GB and 9GB GGUF variants)
  • FLUX.2

Credits

Based on stable-diffusion.cpp by leejet.
