Releases: ohnoitsaninja/stable-diffusion.cpp

Paralol GPU handoff build 2fac608

10 May 00:43


Paralol GPU handoff runtime build for stable-diffusion.cpp commit 2fac608a63f380261edfb7493659ff0d6a083007.

Fork documentation:

Includes:

  • GPU latent/image handoff APIs
  • COMFY_NORMAL full-frame VAE encode/decode path
  • CUDA implicit-GEMM VAE convolution backend
  • DPM++ SDE sampler variants used by ComfyUI-style SDXL workflows
  • caller-owned GPU image download API: sd_gpu_image_download_to_buffer
  • DLL-owned download free API: sd_free_downloaded_image
  • capability-gated safety refusals for unsupported VAE Encode GPU latent handoff
  • CUDA 13 / system CUDA dependency mode

Validated locally with sd-latent-smoke:

  • T2I bridge latent -> GPU decode -> older DLL-owned allocation download, released with sd_free_downloaded_image
  • T2I bridge latent -> GPU decode -> caller-owned RGBA8 buffer download
  • strict sampler GPU-resident refusal
  • VAE Encode GPU latent refusal
  • export/capability smoke
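The caller-owned RGBA8 download validated above implies that the caller sizes and owns the destination buffer. The following is a minimal sketch of that ownership pattern only. It does not use the fork's real signatures: `rgba8_size_bytes` and `download_to_caller_buffer` are hypothetical helpers, and the source vector stands in for a decoded image that real code would copy out of GPU memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Bytes needed for a tightly packed RGBA8 image (4 bytes per pixel).
inline std::size_t rgba8_size_bytes(std::uint32_t width, std::uint32_t height) {
    return static_cast<std::size_t>(width) * height * 4u;
}

// Hypothetical stand-in for a "download into caller-owned buffer" call.
// `src` plays the role of the decoded image held by the library.
// Returns false, writing nothing, if the caller's buffer is missing or too small.
inline bool download_to_caller_buffer(const std::vector<std::uint8_t>& src,
                                      std::uint8_t* dst,
                                      std::size_t dst_capacity) {
    if (dst == nullptr || dst_capacity < src.size()) {
        return false;
    }
    if (src.empty()) {
        return true;  // nothing to copy
    }
    std::memcpy(dst, src.data(), src.size());
    return true;
}
```

In this pattern the caller allocates `rgba8_size_bytes(width, height)` bytes up front and frees them itself, so no library-side free call is involved. A DLL-owned download, by contrast, has to be released through the library (here, sd_free_downloaded_image).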

Important limitation:
sd_sample_latent_gpu returns a sampled latent that the bridge uploads to the GPU; the sampler internals do not run entirely on the GPU. With SDCPP_STRICT_GPU_RESIDENT=1 set, that path is refused rather than reported as GPU-resident.
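One way to picture the strict-mode refusal is as a small gate in front of the handoff. This is a sketch under stated assumptions, not the fork's code: the `HandoffStatus` enum and both helper functions are hypothetical, and only the environment variable name comes from the release notes.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical status codes for this sketch; not the fork's real return values.
enum class HandoffStatus { Ok, RefusedNotGpuResident };

// Returns true when the named environment variable is set to exactly "1".
inline bool env_flag_enabled(const char* name) {
    const char* value = std::getenv(name);
    return value != nullptr && std::strcmp(value, "1") == 0;
}

// Hypothetical gate: a path that stages data through host memory is refused
// when strict mode is on, instead of being treated as GPU-resident.
inline HandoffStatus check_handoff(bool path_is_fully_gpu_resident, bool strict) {
    if (strict && !path_is_fully_gpu_resident) {
        return HandoffStatus::RefusedNotGpuResident;
    }
    return HandoffStatus::Ok;
}
```

A caller would evaluate `check_handoff(false, env_flag_enabled("SDCPP_STRICT_GPU_RESIDENT"))` for the bridge-uploaded path and surface the refusal as an error instead of continuing.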

Paralol ControlNet perf build a8e80ee

10 May 04:51


Paralol fork Windows CUDA runtime build for commit a8e80ee.

Changes since paralol-controlnet-f16-2061b5a:

  • Keeps ControlNet outputs backend/GPU-resident and feeds them directly into UNet.
  • Eliminates the per-step ControlNet output host materialization/download/re-upload path.
  • Caches the guided hint on the backend for reuse.
  • Skips unnecessary control-image VAE encode for ordinary external ControlNet paths.
  • Adds SDCPP_TRACE_CONTROLNET=1 timing logs for ControlNet compute, UNet compute, cache hits, and per-denoise-step totals.

Local validation:

  • SDXL 1024 ControlNet 8-step smoke succeeded with canny-sdxl-new-v2.safetensors and --type f16.
  • ControlNet outputs: host_materialize=0ms, d2h=0, gpu_bytes=102.50MB per pass.
  • ControlNet 8-step sampling: 7.56s.
  • Matching no-ControlNet 8-step sampling: 5.24s.
  • ControlNet compute: 16 passes, avg 141.3ms, sum 2261ms.
  • UNet compute stayed flat: ControlNet avg 322.1ms vs no-ControlNet avg 320.1ms.
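The figures above can be cross-checked with a little arithmetic: ControlNet adds 7.56 s - 5.24 s = 2.32 s of sampling time, and 16 passes at 141.3 ms average account for about 2261 ms of that. The sketch below reproduces the calculation; the struct and function names are illustrative only.

```cpp
// Timing figures as reported in the local validation list.
struct ControlNetTiming {
    double with_controlnet_s;     // 8-step sampling with ControlNet
    double without_controlnet_s;  // matching 8-step sampling without ControlNet
    int passes;                   // ControlNet passes over the whole run
    double avg_pass_ms;           // average ControlNet compute per pass
};

// Extra sampling time attributable to enabling ControlNet, in milliseconds.
inline double added_sampling_ms(const ControlNetTiming& t) {
    return (t.with_controlnet_s - t.without_controlnet_s) * 1000.0;
}

// Total ControlNet compute time, in milliseconds.
inline double controlnet_compute_ms(const ControlNetTiming& t) {
    return t.passes * t.avg_pass_ms;
}

// Fraction of the added sampling time explained by ControlNet compute itself.
inline double compute_share_of_added_time(const ControlNetTiming& t) {
    return controlnet_compute_ms(t) / added_sampling_ms(t);
}
```

With the reported values `{7.56, 5.24, 16, 141.3}` the share comes out at roughly 0.97. That leaves about 60 ms across the run for everything else, which is consistent with the host materialization and download path reporting zero.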

Staged DLL SHA256:
BBC4B51D20D3C5745E86C6FD42F8C944242598FE78E20FB8066BB74FB5866ECA

Asset SHA256:
23BF5D223E3A6034EA53402F1EA37B63C82EC26E396C7F18436E2A33199DCB85
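The digests on this page appear in both uppercase and lowercase hex, so a check against a locally computed SHA-256 should ignore case. A minimal sketch of such a comparison follows. The helper names are hypothetical, and computing the digest itself is left to whatever hashing tool is already in use.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True if `s` looks like a SHA-256 digest: exactly 64 hexadecimal characters.
inline bool is_sha256_hex(const std::string& s) {
    if (s.size() != 64) {
        return false;
    }
    for (unsigned char c : s) {
        if (!std::isxdigit(c)) {
            return false;
        }
    }
    return true;
}

// Case-insensitive comparison of two SHA-256 hex digests.
// Returns false if either input is not a well-formed digest.
inline bool same_digest(const std::string& a, const std::string& b) {
    if (!is_sha256_hex(a) || !is_sha256_hex(b)) {
        return false;
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i]))) {
            return false;
        }
    }
    return true;
}
```

Rejecting malformed input up front means a truncated copy-paste fails the check instead of accidentally matching a prefix.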

Paralol ControlNet f16 build 2061b5a

10 May 03:21


Paralol fork Windows CUDA runtime build for commit 2061b5a.

Changes:

  • Fixes ControlNet dtype-aware loading by reading ControlNet metadata before allocating params.
  • Propagates wtype/tensor type rules into the ControlNet loader.
  • Adds ControlNet dtype and byte histograms for source, expected, destination, and source-to-destination conversions.
  • Adds Diffusers ControlNet key mapping for controlnet_down_blocks, controlnet_mid_block, and controlnet_cond_embedding.
  • Increases the ControlNet graph budget for SDXL ControlNet graphs.

Local validation:

  • SDXL ControlNet smoke succeeded with canny-sdxl-new-v2.safetensors and --type f16.
  • ControlNet destination tensor bytes before allocation: f16=2384.63MB, f32=2.98MB.
  • Output image saved locally at build/controlnet-f16-smoke/controlnet-f16-smoke.png.
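The destination byte histogram above gives a rough sense of what dtype-aware loading saves. The sketch below assumes, purely for illustration, that every tensor now held as f16 would otherwise have been allocated as f32 (twice the bytes per element). The release notes do not state what the previous allocation size was, and the struct and function names are hypothetical.

```cpp
// Destination tensor sizes by dtype, in MB, as reported before allocation.
struct DtypeBytesMB {
    double f16_mb;
    double f32_mb;
};

// Total destination size with the reported dtype split.
inline double total_mb(const DtypeBytesMB& h) {
    return h.f16_mb + h.f32_mb;
}

// Hypothetical size if the f16 tensors were stored as f32 instead:
// each f16 element takes 2 bytes and each f32 element takes 4.
inline double all_f32_mb(const DtypeBytesMB& h) {
    return h.f16_mb * 2.0 + h.f32_mb;
}
```

For `{2384.63, 2.98}` this gives 2387.61 MB as loaded versus 4772.24 MB under the all-f32 assumption.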

Asset SHA256:
476806EF0BABA2B5C4F9F9EAED847EF8D36DB1B63F9E6A76A01EC6C3F13AD5E2

Paralol latent API 7ade90e

08 May 12:13


Paralol patched stable-diffusion.cpp build based on upstream 7ade90e.

Adds the resident latent C API used by Paralol:

  • sd_encode_image
  • sd_sample_latent
  • sd_decode_latent
  • free_sd_latent
  • free_sd_image
  • sd_release_clip_model_params
  • sd_release_diffusion_model_params

Windows x64 CUDA binary staged from the local Paralol validation build.

SHA256:
2B58D52117C26B623AFF44F2C0C5971B10C26DEAB986FBA06C9EC063BC9C19C5

Paralol DPM++ SDE sampler build 87f1783

08 May 13:07


Adds ComfyUI-compatible DPM++ SDE sampler names and CLI/API options for the Paralol latent API build.

Includes:

  • dpmpp_sde, dpmpp_sde_gpu
  • dpmpp_2m_sde, dpmpp_2m_sde_gpu
  • dpmpp_2m_sde_heun, dpmpp_2m_sde_heun_gpu
  • dpmpp_3m_sde, dpmpp_3m_sde_gpu

The *_gpu names currently alias the same C++ implementation; Brownian noise is a deterministic pass-1 approximation rather than a full BrownianTree cache.

Runtime asset SHA256:
2d56880baf4ad4585ca4db8fc26707b454ca4c853b9c9bdf832395c18dbf2691
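The *_gpu aliasing can be pictured as a name normalization step applied before sampler dispatch. The sketch below assumes each *_gpu name maps to the sampler with the same name minus the suffix. The function is hypothetical and does not reflect how the fork actually resolves names.

```cpp
#include <string>

// Hypothetical normalization: strip a trailing "_gpu" so that, for example,
// "dpmpp_2m_sde_gpu" resolves to the same implementation as "dpmpp_2m_sde".
// Names without the suffix, or consisting only of the suffix, are returned unchanged.
inline std::string canonical_sampler_name(const std::string& name) {
    const std::string suffix = "_gpu";
    if (name.size() > suffix.size() &&
        name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0) {
        return name.substr(0, name.size() - suffix.size());
    }
    return name;
}
```

Under this assumption, requesting a *_gpu sampler changes neither the noise source nor the device placement; only the accepted name differs.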
