shadowbatcode/Enzoria

Enzoria

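Enzoria is the project name used consistently throughout this repository. The repository is mainly used for three-modal (sequence, structure, text) modeling experiments on enzymes, and for fine-tuning a binary function classifier on the ESIBank dataset.

Project overview

  • Training entry point: train_esi_function.py
  • Core code: Enzoria/
  • Data directory: dataset/
  • Results output: result/
  • Paper materials: 论文/

The current training script iterates over four data splits in order: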

  • all_split
  • enzyme_split
  • random_split
  • reaction_split
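
The sequential traversal described above can be pictured with a minimal sketch. This is not the code in train_esi_function.py: `run_all_splits` and the `train_one_split` callback are hypothetical names introduced only for illustration, and the only detail taken from this README is the list of split names and their order.

```python
from typing import Callable, Dict

# Split names exactly as listed above, in the same order.
SPLIT_TYPES = ("all_split", "enzyme_split", "random_split", "reaction_split")


def run_all_splits(train_one_split: Callable[[str], dict]) -> Dict[str, dict]:
    """Call a (hypothetical) per-split training function once per split, in order."""
    results = {}
    for split in SPLIT_TYPES:
        # One independent run per split; the result is keyed by the split name.
        results[split] = train_one_split(split)
    return results
```

Running each split as its own independent run fits the per-split result directories described in the output section.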

Directory structure

.
|-- train_esi_function.py
|-- dataset/
|   |-- ESIbank/
|   |-- PAIR/
|   `-- PDBbind/
|-- Enzoria/
|   |-- model/
|   |-- demo/
|   |-- scripts/
|   |-- requirements.txt
|   `-- weights/
|-- result/
|-- docs/plans/
`-- 论文/

Environment setup

Python 3.10 in a dedicated conda environment is recommended.

conda create -n enzoria python=3.10 -y
conda activate enzoria
pip install -r Enzoria/requirements.txt
pip install pandas numpy scipy matplotlib tqdm openpyxl

If you use the GPU build of FAISS, you can additionally install:

conda install pytorch::faiss-gpu=1.8.0 -y
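
After installation, a quick sanity check can confirm that the packages named in the commands above are importable. The sketch below is an optional helper, not part of the repository. The function name `missing_modules` is hypothetical, and it only covers the packages listed explicitly above, since the contents of Enzoria/requirements.txt are not shown here.

```python
import importlib.util

# Packages installed explicitly by the pip command above.
REQUIRED = ["pandas", "numpy", "scipy", "matplotlib", "tqdm", "openpyxl"]
# Only present if the optional FAISS step was run; the import name is "faiss".
OPTIONAL = ["faiss"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("missing required:", missing_modules(REQUIRED))
    print("missing optional:", missing_modules(OPTIONAL))
```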

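Weights directory

By default, the training script reads model weights from the following directory:

Enzoria/weights/Enzoria_650M/

The default configuration looks for: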

  • esm2_t33_650M_UR50D/
  • BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext/
  • foldseek_t30_150M/
  • Enzoria_650M.pt
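
Before starting a long run, it can help to confirm that these four entries are in place. The following sketch is a pre-flight check, not something the training script is known to do. The helper name `missing_weights` is hypothetical; the paths are the ones listed above.

```python
from pathlib import Path

WEIGHTS_DIR = Path("Enzoria/weights/Enzoria_650M")

# Entries the default configuration looks for, as listed above.
EXPECTED = [
    "esm2_t33_650M_UR50D",
    "BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext",
    "foldseek_t30_150M",
    "Enzoria_650M.pt",
]


def missing_weights(weights_dir=WEIGHTS_DIR):
    """Return the expected entries that do not exist under weights_dir."""
    root = Path(weights_dir)
    return [name for name in EXPECTED if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_weights()
    print("all weights found" if not missing else f"missing: {missing}")
```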

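Data preparation

By default, the training script reads:

dataset/ESIbank/processed_splits/

The repository currently stores four split archives, which must first be extracted into the directory above: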

  • dataset/ESIbank/all_split-20251014T071615Z-1-001.zip
  • dataset/ESIbank/enzyme_split-20251014T071617Z-1-001.zip
  • dataset/ESIbank/random_split-20251014T071619Z-1-001.zip
  • dataset/ESIbank/reaction_split-20251014T071620Z-1-001.zip

After extraction, the layout should look like:

dataset/ESIbank/processed_splits/
|-- all_split/
|-- enzyme_split/
|-- random_split/
`-- reaction_split/

PowerShell example:

New-Item -ItemType Directory -Force -Path dataset/ESIbank/processed_splits
Expand-Archive dataset/ESIbank/all_split-20251014T071615Z-1-001.zip dataset/ESIbank/processed_splits -Force
Expand-Archive dataset/ESIbank/enzyme_split-20251014T071617Z-1-001.zip dataset/ESIbank/processed_splits -Force
Expand-Archive dataset/ESIbank/random_split-20251014T071619Z-1-001.zip dataset/ESIbank/processed_splits -Force
Expand-Archive dataset/ESIbank/reaction_split-20251014T071620Z-1-001.zip dataset/ESIbank/processed_splits -Force
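
On systems without PowerShell, the same extraction can be sketched in Python with the standard-library `zipfile` module. The function name `extract_splits` is hypothetical. The sketch assumes each archive unpacks to a top-level folder named after its split, as the expected layout above implies, and that has not been verified against the archives themselves.

```python
import zipfile
from pathlib import Path


def extract_splits(archive_dir="dataset/ESIbank",
                   target="dataset/ESIbank/processed_splits"):
    """Extract every *_split-*.zip found in archive_dir into target.

    Returns the names of the archives that were extracted.
    """
    target = Path(target)
    target.mkdir(parents=True, exist_ok=True)  # same role as New-Item -Force
    extracted = []
    for archive in sorted(Path(archive_dir).glob("*_split-*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)  # overwrites existing files, like -Force
        extracted.append(archive.name)
    return extracted
```

Calling `extract_splits()` from the repository root uses the default paths shown in this section.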

Run training

python train_esi_function.py

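The default configuration uses all three inputs (sequence, structure, and text) and runs training, validation, and testing on each of the four splits in turn.

Output

Training results are saved to:

result/<split_type>/run_<timestamp>/

Common outputs include: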

  • logs/training_config.json
  • logs/batch_history.json
  • logs/epoch_history.json
  • logs/training_summary.json
  • checkpoints/best_model.pt
  • plots/metrics_curves.png
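
To inspect a finished run, one option is to locate the most recent run directory for a split and load its summary file. The sketch below is only an illustration. `latest_run` and `load_summary` are hypothetical helpers, the newest run is chosen by modification time because the timestamp format is not documented here, and no assumption is made about the fields inside training_summary.json.

```python
import json
from pathlib import Path


def latest_run(result_root="result", split_type="random_split"):
    """Return the most recently modified run_* directory for a split, or None."""
    base = Path(result_root) / split_type
    if not base.is_dir():
        return None
    runs = [p for p in base.glob("run_*") if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime) if runs else None


def load_summary(run_dir):
    """Load logs/training_summary.json from a run directory as parsed JSON."""
    path = Path(run_dir) / "logs" / "training_summary.json"
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    run = latest_run()
    if run is None:
        print("no runs found")
    else:
        print(run, load_summary(run))
```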

About

Enzoria: Three-modal (sequence + structure + text) enzyme function classification on ESIBank
