Labels
🐞 bug: Something isn't working.
Description
Self Checks
- I have searched for existing issues, including closed ones.
- I confirm that I am using English to submit this report (Language Policy).
- Non-English title submissions will be closed directly (Language Policy).
- Please do not modify this template :) and fill in all the required fields.
RAGFlow workspace code commit ID
N.A.
RAGFlow image version
v0.20.5 full
Other environment information
Running on an Unraid server (Linux-based) with two GPUs (an RTX 3090 and an RTX 3090 Ti), using the Docker images.
Actual behavior
I connected my llama-server (proxied through llama-swap) running GPT-OSS-20B to RAGFlow using the OpenAI-compatible API connection. When chatting with the model, the thinking tokens contain incomplete words, although the final output seems to be fine.
However, when I use the same model on Ollama, the output is more coherent.

I am not sure whether something is wrong with my settings, but when I use both Ollama and llama-server as backends in my Open WebUI chat interface, I have no problems and the outputs are generally consistent.
Expected behavior
I expect the behavior to be more like the output of Ollama running the same model, GPT-OSS-20B.
Steps to reproduce
1. Add GPT-OSS-20B as an OpenAI Compatible API model.
2. Use GPT-OSS-20B (served via the OpenAI-compatible API option) as the chat model.
3. Chat and observe the quality of the reasoning tokens (a script for inspecting the raw stream directly is sketched below).
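
For comparison, here is a minimal sketch that streams the same prompt directly from llama-server's OpenAI-compatible endpoint and prints reasoning deltas separately from content deltas, to check whether the garbled thinking tokens originate in the server or in RAGFlow's handling of the stream. The base URL and model id are assumptions; adjust them to your llama-swap setup. Some OpenAI-compatible servers emit thinking tokens in a non-standard `reasoning_content` delta field, so the script checks for it defensively.

```python
# Minimal sketch: stream a chat completion straight from llama-server and
# print reasoning vs. final-answer deltas separately.
# Assumptions: llama-server reachable at http://localhost:8080/v1,
# model registered (via llama-swap) as "gpt-oss-20b".
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")  # key is ignored by llama-server

stream = client.chat.completions.create(
    model="gpt-oss-20b",  # assumed model id; match your llama-swap config
    messages=[{"role": "user", "content": "Explain why the sky is blue."}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:  # some servers send a trailing usage-only chunk
        continue
    delta = chunk.choices[0].delta
    # Thinking tokens may arrive in a non-standard "reasoning_content" field;
    # fall back gracefully if it is absent.
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        print(f"[think] {reasoning!r}")
    if delta.content:
        print(f"[final] {delta.content!r}")
```

If the `[think]` deltas printed here are already fragmented, the problem lies with the server-side stream; if they are clean, the word breaks are being introduced by RAGFlow's parsing of the reasoning tokens.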
Additional information
No response