
e2e medium job fails with OSError: Not enough free space to write 67108864 bytes #3298

@booxter

Description


Describe the bug

https://github.com/instructlab/instructlab/actions/runs/14519599987/job/40737534990?pr=3295


Permuting layer 0
Permuting layer 1
Permuting layer 2
Permuting layer 3
Permuting layer 4
Permuting layer 5
Permuting layer 6
Permuting layer 7
Permuting layer 8
Permuting layer 9
Permuting layer 10
Permuting layer 11
Permuting layer 12
Permuting layer 13
Permuting layer 14
Permuting layer 15
Permuting layer 16
Permuting layer 17
Permuting layer 18
Permuting layer 19
Permuting layer 20
Permuting layer 21
Permuting layer 22
Permuting layer 23
Permuting layer 24
Permuting layer 25
Permuting layer 26
Permuting layer 27
Permuting layer 28
Permuting layer 29
Permuting layer 30
Permuting layer 31
model.embed_tokens.weight                        -> token_embd.weight                        | F32    | [32008, 4096]
model.layers.0.self_attn.q_proj.weight           -> blk.0.attn_q.weight                      | F32    | [4096, 4096]
model.layers.0.self_attn.k_proj.weight           -> blk.0.attn_k.weight                      | F32    | [4096, 4096]
model.layers.0.self_attn.v_proj.weight           -> blk.0.attn_v.weight                      | F32    | [4096, 4096]
model.layers.0.self_attn.o_proj.weight           -> blk.0.attn_output.weight                 | F32    | [4096, 4096]
model.layers.0.mlp.gate_proj.weight              -> blk.0.ffn_gate.weight                    | F32    | [11008, 4096]
model.layers.0.mlp.up_proj.weight                -> blk.0.ffn_up.weight                      | F32    | [11008, 4096]
model.layers.0.mlp.down_proj.weight              -> blk.0.ffn_down.weight                    | F32    | [4096, 11008]
model.layers.0.input_layernorm.weight            -> blk.0.attn_norm.weight                   | F32    | [4096]
model.layers.0.post_attention_layernorm.weight   -> blk.0.ffn_norm.weight                    | F32    | [4096]
model.layers.1.self_attn.q_proj.weight           -> blk.1.attn_q.weight                      | F32    | [4096, 4096]
model.layers.1.self_attn.k_proj.weight           -> blk.1.attn_k.weight                      | F32    | [4096, 4096]
model.layers.1.self_attn.v_proj.weight           -> blk.1.attn_v.weight                      | F32    | [4096, 4096]
model.layers.1.self_attn.o_proj.weight           -> blk.1.attn_output.weight                 | F32    | [4096, 4096]
model.layers.1.mlp.gate_proj.weight              -> blk.1.ffn_gate.weight                    | F32    | [11008, 4096]
[155/291] Writing tensor blk.17.attn_q.weight                   | size   4096 x   4096  | type F32  | T+ 109
    sys.exit(ilab())
             ^^^^^^
  File "/actions-runner/_work/instructlab/instructlab/venv/lib64/python3.11/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/actions-runner/_work/instructlab/instructlab/venv/lib64/python3.11/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/actions-runner/_work/instructlab/instructlab/venv/lib64/python3.11/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/actions-runner/_work/instructlab/instructlab/venv/lib64/python3.11/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/actions-runner/_work/instructlab/instructlab/venv/lib64/python3.11/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/actions-runner/_work/instructlab/instructlab/venv/lib64/python3.11/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/actions-runner/_work/instructlab/instructlab/venv/lib64/python3.11/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/actions-runner/_work/instructlab/instructlab/venv/lib64/python3.11/site-packages/instructlab/clickext.py", line 356, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/actions-runner/_work/instructlab/instructlab/venv/lib64/python3.11/site-packages/instructlab/cli/model/train.py", line 524, in train
    full_train.train(
  File "/actions-runner/_work/instructlab/instructlab/venv/lib64/python3.11/site-packages/instructlab/model/full_train.py", line 395, in train
    llamacpp_convert_to_gguf.convert_llama_to_gguf(
  File "/actions-runner/_work/instructlab/instructlab/venv/lib64/python3.11/site-packages/instructlab/llamacpp/llamacpp_convert_to_gguf.py", line 1731, in convert_llama_to_gguf
    OutputFile.write_all(
  File "/actions-runner/_work/instructlab/instructlab/venv/lib64/python3.11/site-packages/instructlab/llamacpp/llamacpp_convert_to_gguf.py", line 1340, in write_all
    of.gguf.write_tensor_data(ndarray)
  File "/actions-runner/_work/instructlab/instructlab/venv/lib64/python3.11/site-packages/gguf/gguf_writer.py", line 417, in write_tensor_data
    tensor.tofile(fout)
OSError: Not enough free space to write 67108864 bytes

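The failed write is exactly 64 MiB (67,108,864 bytes). That matches the size of a single 4096 x 4096 F32 tensor (4096 × 4096 × 4 bytes), and the last progress line before the traceback reports writing blk.17.attn_q.weight, which is a tensor of that shape and type. The error is raised inside the gguf library, from its call to numpy's ndarray.tofile(). 64 MiB is not much, and the df -h output in the job shows that the machine has a 2 TB EBS volume that is nearly empty before the failure. This may be a bug in gguf or in some other underlying component.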
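A minimal diagnostic sketch, in Python like the failing code. The message appears to come from numpy rather than gguf itself: on Linux, `ndarray.tofile()` tries to preallocate the write with `fallocate()` and raises this `OSError` when that call fails with `ENOSPC`. One hypothesis worth ruling out is that the GGUF output directory sits on a different, smaller filesystem (for example a tmpfs or a container overlay) than the 2 TB EBS volume reported by `df -h`. The helpers `tensor_nbytes`, `free_bytes` and `ensure_space` are hypothetical and not part of instructlab or gguf; the sketch only confirms the 64 MiB figure and measures free space on the filesystem that actually holds a given path, using the standard library.

```python
import math
import shutil
import tempfile

# Bytes per element for GGUF tensor types (only F32 appears in this log).
BYTES_PER_ELEMENT = {"F32": 4, "F16": 2}


def tensor_nbytes(shape: tuple[int, ...], type_name: str) -> int:
    """Raw size in bytes of a dense tensor with the given shape and element type."""
    return math.prod(shape) * BYTES_PER_ELEMENT[type_name]


def free_bytes(path: str) -> int:
    """Bytes available to unprivileged users on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def ensure_space(output_dir: str, needed: int) -> None:
    """Hypothetical pre-flight check: fail early instead of mid-conversion."""
    available = free_bytes(output_dir)
    if available < needed:
        raise OSError(f"{output_dir}: {available} bytes free, {needed} bytes needed")


if __name__ == "__main__":
    # blk.17.attn_q.weight from the log: [4096, 4096], F32.
    needed = tensor_nbytes((4096, 4096), "F32")
    print(needed, needed // 2**20, "MiB")  # 67108864 64 MiB

    # Assumption: replace with the directory the GGUF file is actually written to.
    target = tempfile.gettempdir()
    print(target, free_bytes(target) // 2**30, "GiB free")
    ensure_space(target, needed)
```

Pointing `target` at the converter's output directory (rather than the volume root) and comparing the result with `df -h` run on that same directory would show whether the write really lands on the 2 TB volume.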


Metadata

Assignees: No one assigned
Labels: CI/CD (Affects CI/CD configuration), bug (Something isn't working), ci-failure (PR has at least one CI failure), stale
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
