Gemma 3:4B Multimodal CLIP Error [WinError -529697949] Windows Error 0xe06d7363 #2031

@PlatDrake2875
Expected Behavior

I am trying to load the multimodal model bartowski/google_gemma-3-4b-it-qat-GGUF using the Llama.from_pretrained method. The script is configured to use the Llama3VisionAlphaChatHandler with the appropriate mmproj file.

I expect the library to successfully load both the multimodal projector and the main language model onto the GPU (using n_gpu_layers=-1) and become ready for inference without crashing.
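
For context, the call I expect to work once loading succeeds is an ordinary multimodal chat completion. A minimal sketch, using the image_url/text content-part message shape that llama-cpp-python's vision chat handlers accept (the image URL here is only a placeholder):

    # Sketch of the intended inference call once the model loads.
    # The image URL below is a placeholder, not a real asset.
    response = llm.create_chat_completion(
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
                    {"type": "text", "text": "Describe this image."},
                ],
            }
        ]
    )
    print(response["choices"][0]["message"]["content"])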


Current Behavior

The library successfully loads and initializes the mmproj-google_gemma-3-4b-it-qat-f16.gguf file, detects the CUDA device, and loads the CLIP model to the CUDA backend. However, immediately after loading the CLIP model and before the main language model is fully loaded, the program terminates with a Windows C++ exception.

The script fails with the error: An error occurred during model operation: [WinError -529697949] Windows Error 0xe06d7363. Note that 0xE06D7363 is the SEH code MSVC uses for C++ exceptions, so an unhandled native C++ exception thrown inside the llama.cpp/CLIP code is escaping into Python.
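
One way to narrow this down would be to load the same GGUF without the vision chat handler; if that succeeds, the crash is specific to the CLIP/mmproj path rather than the main model. A minimal sketch of that isolation step:

    # Sketch: load the text model alone, with no vision handler attached.
    # If this succeeds with n_gpu_layers=-1, the crash is specific to the
    # CLIP/mmproj loading path rather than the main language model.
    from llama_cpp import Llama

    llm_text_only = Llama.from_pretrained(
        repo_id="bartowski/google_gemma-3-4b-it-qat-GGUF",
        filename="google_gemma-3-4b-it-qat-IQ2_M.gguf",
        n_ctx=2048,
        n_gpu_layers=-1,
        verbose=True,
    )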


Environment and Context

  • Hardware: NVIDIA GeForce RTX 3060 (Compute Capability 8.6)
  • Operating System: Windows 11 24H2
  • SDK Versions:
    • Python: 3.10.5
    • CUDA Toolkit: 12.4
    • llama-cpp-python: 0.3.9 (built from source with CUDA 12.4 support, as in the steps below)
    • torch: 2.5.1+cu124

Steps to Reproduce

  1. Set up a Python virtual environment on Windows with CUDA 12.4 installed.

  2. Install llama-cpp-python from source with CUDA support using PowerShell (a sanity check that the resulting build really has GPU offload enabled is sketched after the script below):

    $env:FORCE_CMAKE=1; $env:CMAKE_ARGS='-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=native'
    uv pip install --upgrade --force-reinstall --no-cache-dir --no-binary :all: llama-cpp-python
  3. Install other dependencies like torch, transformers, huggingface-hub.

  4. Run the following Python script which attempts to load the bartowski/google_gemma-3-4b-it-qat-GGUF model with full GPU offload.

    from pathlib import Path
    from llama_cpp import Llama
    from llama_cpp.llama_chat_format import Llama3VisionAlphaChatHandler
    
    def load_gemma_model(mmproj_path: Path):
        print("Attempting to load Gemma 3 model...")
        chat_handler = Llama3VisionAlphaChatHandler(clip_model_path=str(mmproj_path))
        llm = Llama.from_pretrained(
            repo_id="bartowski/google_gemma-3-4b-it-qat-GGUF",
            filename="google_gemma-3-4b-it-qat-IQ2_M.gguf", # Also fails with Q4_K_M and other quants
            chat_handler=chat_handler,
            n_ctx=2048,
            n_gpu_layers=-1, # Fails with full offload
            verbose=True
        )
        return llm
    
    if __name__ == "__main__":
        model_dir = Path("./models")
        model_dir.mkdir(parents=True, exist_ok=True)
        mmproj_filename = "mmproj-google_gemma-3-4b-it-qat-f16.gguf"
        mmproj_path = model_dir / mmproj_filename
    
        # (I assume mmproj file is downloaded and present at mmproj_path)
        try:
            if mmproj_path.exists():
                model = load_gemma_model(mmproj_path)
                print("Model loaded successfully!")
            else:
                print(f"Error: mmproj file not found at {mmproj_path}")
        except Exception as e:
            print(f"\nAn error occurred during model operation: {e}")

Failure Logs

The following log is produced when running the script. The crash occurs after the clip_model_load completes and before the main Llama model object is returned.

clip_model_load: loaded meta data with 16 key-value pairs and 439 tensors from models\mmproj-google_gemma-3-4b-it-qat-f16.gguf
clip_model_load: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
clip_model_load: - kv   0:                               general.architecture str              = clip
clip_model_load: - kv   1:                                clip.projector_type str              = gemma3
clip_model_load: - kv   2:                                clip.has_text_encoder bool           = false
clip_model_load: - kv   3:                                clip.has_vision_encoder bool           = true
clip_model_load: - kv   4:                               clip.has_llava_projector bool           = false
clip_model_load: - kv   5:                               clip.vision.image_size u32              = 896
clip_model_load: - kv   6:                               clip.vision.patch_size u32              = 14
clip_model_load: - kv   7:                         clip.vision.embedding_length u32              = 1152
clip_model_load: - kv   8:                      clip.vision.feed_forward_length u32              = 4304
clip_model_load: - kv   9:                             clip.vision.projection_dim u32              = 2560
clip_model_load: - kv  10:                                clip.vision.block_count u32              = 27
clip_model_load: - kv  11:                         clip.vision.attention.head_count u32              = 16
clip_model_load: - kv  12:                   clip.vision.attention.layer_norm_epsilon f32              = 0.000001
clip_model_load: - kv  13:                               clip.vision.image_mean arr[f32,3]       = [0.500000, 0.500000, 0.500000]
clip_model_load: - kv  14:                                 clip.vision.image_std arr[f32,3]       = [0.500000, 0.500000, 0.500000]
clip_model_load: - kv  15:                                        clip.use_gelu bool           = true
clip_model_load: - type  f32:  276 tensors
clip_model_load: - type  f16:  163 tensors
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
clip_model_load: CLIP using CUDA backend
clip_model_load: params backend buffer size =  811.79 MB (439 tensors)
key clip.vision.image_grid_pinpoints not found in file
key clip.vision.mm_patch_merge_type not found in file
key clip.vision.image_crop_resolution not found in file

An error occurred during model operation: [WinError -529697949] Windows Error 0xe06d7363
