Llama3 GGUF conversion with merged LORA Adapter seems to lose training data randomly #7062

Closed
@Sneakr

Description


I'm running Unsloth to fine-tune the Llama 3 8B Instruct model with a LoRA adapter.

1: I merge the model with the LoRA adapter into safetensors.
2: Running inference in Python, both with the merged model directly and with the Unsloth-loaded model with the adapter on top of it, produces correct outputs as per the fine-tune.
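For context, the merge in step 1 folds the adapter into the base weights as W' = W + (alpha / r) * B @ A. A minimal, self-contained sketch of that arithmetic (plain-Python matrices for illustration; a real merge would use e.g. PEFT's `merge_and_unload()` on the full model):

```python
# Sketch of the LoRA merge arithmetic: W' = W + (alpha / r) * B @ A.
# Shapes: W is (out, in), B is (out, r), A is (r, in).

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def merge_lora(W, A, B, alpha, r):
    delta = matmul(B, A)          # (out, r) @ (r, in) -> (out, in)
    s = alpha / r                 # LoRA scaling factor
    return [[W[i][j] + s * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]      # base weight (2x2)
A = [[0.5, -0.5]]                 # rank-1 A: (r=1, in=2)
B = [[2.0], [0.0]]                # rank-1 B: (out=2, r=1)
merged = merge_lora(W, A, B, alpha=2.0, r=1)
print(merged)                     # [[3.0, -2.0], [0.0, 1.0]]
```

The point is that the merge itself is exact in FP32/FP16 arithmetic, so any behavior difference should come from a later step such as the GGUF conversion.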

Bug:
GGUF conversion of the merged model does not produce the same output. The converted GGUF has lost some of the fine-tuned behavior while still retaining most of it.

I can ask it who it is, who created it, etc., and it responds "Llama" and "Meta" as usual, but it incorporates the fine-tuned speech style and humor into the response. My fine-tuned model (before conversion) does not behave this way.

1: I tried merging the LoRA adapter with the original (non-fine-tuned) GGUF using llama.cpp; same results.
2: I tried running the llama.cpp server on the original (non-fine-tuned) GGUF with the adapter loaded via the server command line; same results.
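For reference, the two workarounds above might look roughly like this. This is a hypothetical command sketch: tool names, flags, and adapter file formats have changed across llama.cpp versions, so treat every path and flag as a placeholder rather than an exact invocation.

```shell
# 1: merge the LoRA adapter into the original (non-fine-tuned) GGUF
#    (llama.cpp ships an export-lora tool; flag names vary by version)
./export-lora --model-base llama3-8b-instruct.gguf \
              --model-out  merged.gguf \
              --lora       lora-adapter.bin

# 2: or load the adapter at runtime on top of the original GGUF
./server -m llama3-8b-instruct.gguf --lora lora-adapter.bin
```

Seeing the same degraded output from both paths suggests the problem is in how the adapter/model weights end up in GGUF form, not in any single merge code path.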

It seems that GGUF conversion is losing fine-tuned data randomly during conversion.

If this is the case, all GGUF conversions of fine-tuned models are basically out the window, and the question is how much non-fine-tuned models are affected by this as well.

I've tried F16 and Q8; same issues.

This is not a quantization issue: I get the exact same (correct) results running FP16 as well as 4-bit in Python with the HF loader or Unsloth; both work fine as mentioned.
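One way to see why quantization alone is an unlikely culprit: a Q8_0-style round-trip is deterministic and its error is bounded by half a quantization step, so it cannot explain behavior that varies "randomly". A minimal sketch of that symmetric 8-bit round-trip (per-block absmax scaling, as GGUF's Q8_0 broadly does; simplified here):

```python
# Symmetric 8-bit round-trip, roughly in the style of Q8_0: scale a block
# by its absolute maximum, quantize to int8 range, then dequantize.

def q8_roundtrip(block):
    amax = max(abs(x) for x in block)
    scale = amax / 127.0 if amax else 1.0
    q = [max(-127, min(127, round(x / scale))) for x in block]
    return [qi * scale for qi in q]

w = [0.013, -0.207, 0.5, -0.031]
deq = q8_roundtrip(w)
err = max(abs(a - b) for a, b in zip(w, deq))
print(err)  # small and, crucially, identical on every run
```

The reported symptoms (some fine-tuned behavior lost, most retained, even at F16) point away from this kind of bounded rounding error and toward something in the conversion pipeline itself.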

