RUC & Xiaomi: Efficient Fine-Tuning 🙌🎉

📰 News

  • 2025-4-29: Our paper has been accepted by IJCAI-25. Congratulations!
  • 2025-3-31: Delivered a prototype system for parameter-efficient and gradient-projection methods: a comprehensive benchmark against 10+ state-of-the-art efficient fine-tuning approaches.
  • 2024-12-30: Theoretical Insights into Fine-Tuning Attention Mechanism.

🎯 Introduction and Target

(1) Our insights (paper, in progress):

According to the traditional statistical learning viewpoint, performance can be decomposed into the sum of optimization error and generalization error. On the generalization (storage-friendly) side, Theorem 1 (information-theoretic generalization bounds) shows that, for the same $r$, fine-tuning $\mathbf{W}_q,\mathbf{W}_v$ consistently achieves results comparable to, or even surpassing, those of fine-tuning $\mathbf{W}_q,\mathbf{W}_k,\mathbf{W}_v$. This reduces the number of trainable parameters at the same $r$, tightens the generalization bound, and can bring memory savings. On the optimization (time-friendly) side, we study the learning dynamics of fine-tuning the attention mechanism; Theorem 2 shows that feature learning in the attention mechanism is efficient when the learning rate for $\mathbf{W}_v$ is set much larger than that for $\mathbf{W}_q,\mathbf{W}_k$ during fine-tuning. Building on these experimental and theoretical insights, one can design new algorithms that improve the efficiency (e.g., storage and time) of fine-tuning.

*(Figure: Theorem 1, information-theoretic generalization bound)*

*(Figure: Theorem 2, learning dynamics of the attention mechanism during fine-tuning)*
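
As a concrete illustration of the Theorem 2 insight, the sketch below builds PyTorch optimizer parameter groups that give $\mathbf{W}_v$ a much larger learning rate than $\mathbf{W}_q,\mathbf{W}_k$. The module-name patterns (`q_proj`, `k_proj`, `v_proj`) and the 10x ratio are illustrative assumptions, not values prescribed by the paper.

    # Sketch: larger learning rate for W_v than for W_q/W_k (Theorem 2 intuition).
    # Module-name patterns and the 10x scale are illustrative assumptions.
    from torch import nn, optim


    def build_attention_optimizer(model: nn.Module,
                                  base_lr: float = 1e-5,
                                  v_lr_scale: float = 10.0) -> optim.AdamW:
        qk_params, v_params, other_params = [], [], []
        for name, param in model.named_parameters():
            if not param.requires_grad:
                continue
            if "q_proj" in name or "k_proj" in name:
                qk_params.append(param)
            elif "v_proj" in name:
                v_params.append(param)
            else:
                other_params.append(param)

        groups = [
            {"params": qk_params, "lr": base_lr},              # W_q, W_k: small LR
            {"params": v_params, "lr": base_lr * v_lr_scale},  # W_v: much larger LR
            {"params": other_params, "lr": base_lr},
        ]
        return optim.AdamW([g for g in groups if g["params"]])  # drop empty groups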

(2) Target:

$\text{\textcolor{blue}{This project conducts comprehensive benchmarking of the following 10+ efficient fine-tuning methods.}}$

Notably, our proposed approach is orthogonal to these methods and can be combined with any of them.

📖 10+ efficient fine-tuning methods

⚙️ Install

  1. Install the dependencies:

    pip install -r requirements.txt

  2. (Optional) For SIFT & GaLore (a minimal GaLore optimizer sketch follows this list):

    git clone git@github.com:song-wx/SIFT.git
    cd SIFT
    pip install .
    pip install galore-torch
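
As background for the GaLore (gradient low-rank projection) path, here is a minimal sketch of how the `galore-torch` optimizer is typically constructed, following the package's documented parameter-group interface. The module filter and the rank/projection hyperparameters are illustrative assumptions, not the benchmark's actual settings; use the provided scripts for those.

    # Minimal GaLore sketch, following galore-torch's documented param-group usage.
    # The module filter and the rank/update_proj_gap/scale values below are
    # illustrative assumptions, not the benchmark's actual settings.
    from galore_torch import GaLoreAdamW


    def build_galore_optimizer(model, lr=1e-5):
        galore_params, regular_params = [], []
        for name, param in model.named_parameters():
            if not param.requires_grad:
                continue
            # Project gradients of 2-D weight matrices; leave everything else as-is.
            if param.ndim == 2 and "embed" not in name:
                galore_params.append(param)
            else:
                regular_params.append(param)

        param_groups = [
            {"params": regular_params},
            {"params": galore_params,
             "rank": 8, "update_proj_gap": 200, "scale": 0.25, "proj_type": "std"},
        ]
        return GaLoreAdamW(param_groups, lr=lr)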

🚀 Quick Start

Get Dataset

Download the datasets used in the experiments with `data_download.py`.
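
Assuming the script takes no required arguments (check it for any dataset-selection flags), a typical invocation is:

    python data_download.py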

Usage

  1. Ensure the shell scripts have execute permissions:

    chmod +x xxx.sh  # replace xxx with your script name

  2. Full fine-tuning, LoRA, AdaLoRA, DoRA, PiSSA, rsLoRA, OLoRA, EVA, SIFT (a LoRA configuration sketch follows this list):

    # choose the target method name and target modules in the script
    EfficientFT/sh/roberta-base-peft.sh
    EfficientFT/sh/llama-peft.sh

  3. GaLore:

    EfficientFT/sh/roberta_galore.sh
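
In line with Theorem 1, restricting LoRA to the query and value projections is a reasonable default for the PEFT-based runs. Below is a minimal sketch using Hugging Face PEFT; the model name, hyperparameters, and `query`/`value` target-module names assume a RoBERTa-style backbone (for LLaMA the analogues are typically `q_proj`/`v_proj`) and are not the benchmark's exact configuration.

    # Minimal LoRA sketch targeting only W_q and W_v (the Theorem 1 insight).
    # Model name, hyperparameters, and module names are illustrative assumptions.
    from transformers import AutoModelForSequenceClassification
    from peft import LoraConfig, get_peft_model

    model = AutoModelForSequenceClassification.from_pretrained(
        "roberta-base", num_labels=2
    )

    lora_config = LoraConfig(
        r=8,                                # LoRA rank r
        lora_alpha=16,
        target_modules=["query", "value"],  # W_q, W_v only; W_k stays frozen
        lora_dropout=0.05,
        task_type="SEQ_CLS",
    )

    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()      # far fewer trainable params than full FT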
    

😊 Some Results

*(Figure: benchmark results)*

📝 Citation

@article{yao2024theoretical,
  title={Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization},
  author={Yao, Xinhao and Qian, Hongjin and Hu, Xiaolin and Xu, Gengze and Liu, Yong and Liu, Wei and Luan, Jian and Wang, Bin},
  journal={arXiv preprint arXiv:2410.02247},
  year={2024}
}
