One-Shot-CFT

This repo contains the code for our EMNLP 2025 paper, Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem.

One-Shot Critique Fine-Tuning (CFT) is a simple, robust, and compute-efficient training paradigm for unleashing the reasoning capabilities of pretrained LLMs in both mathematical and logical domains. By leveraging critiques on just one problem, One-Shot CFT enables models like Qwen and LLaMA to match or even outperform reinforcement learning, while using 20× less compute.


Highlights

  • Unleashes Reasoning with One Example: One-Shot CFT uses critiques of diverse model-generated solutions to a single problem to significantly boost performance across math and logic tasks. For example, with just 5 GPU hours of training on Qwen2.5-Math-7B, One-Shot CFT achieves an average improvement of +15% on six math benchmarks and +16% on three logic reasoning benchmarks.
  • Outperforms RLVR and Full SFT with 20× Less Compute: One-Shot CFT outperforms both one-shot Reinforcement Learning with Verifiable Rewards (RLVR) and full-dataset supervised fine-tuning, while requiring only 5 GPU hours on a 7B model—offering a much more efficient and stable training alternative.
  • Robust Across Seeds and Model Scales: One-Shot CFT remains effective across different seed problem choices and model sizes—from 1.5B to 14B parameters—demonstrating strong generalization and scalability.

Getting Started

Installation

cd tools/
bash setup_env.sh

Preparing Datasets

bash prepare_data.sh

Training

  1. Train on Mathematical Reasoning
cd ../train/
bash train_on_math_reasoning.sh

We randomly select 500 math problems (excluding MATH-500) for validation. To validate after training:

cd train/Validation
bash start_validate.sh

This generates validation_summary.txt containing MATH-Validation scores per checkpoint. Select the checkpoint with the highest score as your final model.
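As a sketch of that selection step, assuming each line of validation_summary.txt pairs a checkpoint name with its MATH-Validation score (the actual format may differ, so adjust the field numbers accordingly), the best checkpoint can be picked with standard tools:

```shell
# Hypothetical sketch: assumes validation_summary.txt lines look like
# "checkpoint-30 0.574" (checkpoint name, then score).
# Sort numerically by the score column (descending) and print the
# checkpoint name from the top line.
sort -k2,2 -rn validation_summary.txt | head -n 1 | awk '{print $1}'
```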

  2. Train on Logic Reasoning
cd ../train/
bash train_on_logic_reasoning.sh

We do not use a separate validation set for logic tasks. Based on our experiments, checkpoints between ckpt-30 and ckpt-40 generally yield the best performance.

Evaluation

Edit the following scripts with your trained model path and output directory:

  • eval/eval_on_math_reasoning.sh
  • eval/eval_on_logic_reasoning.sh
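The edit is typically a matter of pointing two path variables at your setup; the variable names below are assumptions, so match them to whatever the scripts actually define:

```shell
# Hypothetical variable names -- check the actual scripts for the real ones.
MODEL_PATH=/path/to/your/trained/checkpoint
OUTPUT_DIR=/path/to/eval/results
```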

Then run:

cd eval/
bash eval_on_math_reasoning.sh
bash eval_on_logic_reasoning.sh

Our evaluation code is based on Qwen2.5-Math and BBEH.

Create Your Own Critique Data

You can create new critique data using the prompt templates in "prompts/" for:

  • Candidate solution generation
  • Teacher critique generation
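As a minimal sketch of the first step, a candidate-solution prompt can be built by substituting a problem into a template; the template filename and the {problem} placeholder below are assumptions, so substitute the actual files and placeholders found under prompts/:

```shell
# Hypothetical sketch: the template filename and {problem} placeholder
# are assumptions -- use the actual templates under prompts/.
PROBLEM="Compute the sum of the first 100 positive integers."
sed "s/{problem}/${PROBLEM}/" prompts/candidate_solution_template.txt \
    > candidate_prompt.txt
```

The filled prompt in candidate_prompt.txt can then be sent to a model to collect diverse candidate solutions, which the teacher critique template is applied to in turn.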

Citation

Cite our paper as:

@article{wang2025unleashing,
  title={Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem},
  author={Wang, Yubo and Nie, Ping and Zou, Kai and Wu, Lijun and Chen, Wenhu},
  journal={arXiv preprint arXiv:2506.03295},
  year={2025}
}
