Labels
🐞 bug: Something isn't working.
Description
Self Checks
- I have searched for existing issues, including closed ones.
- I confirm that I am using English to submit this report (Language Policy).
- Non-English title submissions will be closed directly (Language Policy).
- Please do not modify this template :) and fill in all the required fields.
RAGFlow workspace code commit ID
N.A.
RAGFlow image version
v0.20.5 full
Other environment information
Running on an Unraid server (Linux-based) with two GPUs (an RTX 3090 and an RTX 3090 Ti), using the Docker images.
Actual behavior
I connected my llama-server (proxied through llama-swap) running GPT-OSS-20B to RAGFlow using the OpenAI-compatible API connection. When chatting with the model, the thinking tokens contain incomplete words, although the final output seems to be fine.
However, when I use the same model on Ollama, the output is more coherent.

I am not sure whether something is wrong with my settings, but when I use both Ollama and llama-server as backends in my Open WebUI chat interface, I have no problems and the outputs are generally consistent.
Expected behavior
I expect the behavior to be more like the output of Ollama running the same model, GPT-OSS-20B.
Steps to reproduce
1. Add GPT-OSS-20B as an OpenAI Compatible API model.
2. Use GPT-OSS-20B (served via the OpenAI-compatible API option) as the chat model.
3. Chat and observe the quality of the reasoning tokens (a script for inspecting the raw stream directly is sketched below).
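
For comparison, here is a minimal sketch that streams the same prompt directly from llama-server's OpenAI-compatible endpoint and prints reasoning deltas separately from content deltas, to check whether the garbled thinking tokens originate in the server or in RAGFlow's handling of the stream. The base URL and model id are assumptions; adjust them to your llama-swap setup. Some OpenAI-compatible servers emit thinking tokens in a non-standard `reasoning_content` delta field, so the script checks for it defensively.

```python
# Minimal sketch: stream a chat completion straight from llama-server and
# print reasoning vs. final-answer deltas separately.
# Assumptions: llama-server reachable at http://localhost:8080/v1,
# model registered (via llama-swap) as "gpt-oss-20b".
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")  # key is ignored by llama-server

stream = client.chat.completions.create(
    model="gpt-oss-20b",  # assumed model id; match your llama-swap config
    messages=[{"role": "user", "content": "Explain why the sky is blue."}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:  # some servers send a trailing usage-only chunk
        continue
    delta = chunk.choices[0].delta
    # Thinking tokens may arrive in a non-standard "reasoning_content" field;
    # fall back gracefully if it is absent.
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        print(f"[think] {reasoning!r}")
    if delta.content:
        print(f"[final] {delta.content!r}")
```

If the `[think]` deltas printed here are already fragmented, the problem lies with the server-side stream; if they are clean, the word breaks are being introduced by RAGFlow's parsing of the reasoning tokens.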
Additional information
No response