microsoft/augmented-interpretable-models: Interpretable and efficient predictors using pre-trained language models. Scikit-learn compatible.

Augmenting Interpretable Models with LLMs during Training

This repo contains code to reproduce the experiments in the Aug-imodels paper (Nature Communications, 2023). For a simple scikit-learn-compatible interface to Aug-imodels, use the imodelsX library. Below is a quickstart example.

Installation: pip install imodelsx

from imodelsx import AugLinearClassifier, AugTreeClassifier, AugLinearRegressor, AugTreeRegressor
import datasets
import numpy as np

# set up data
dset = datasets.load_dataset('rotten_tomatoes')['train']
dset = dset.select(np.random.choice(len(dset), size=300, replace=False))
dset_val = datasets.load_dataset('rotten_tomatoes')['validation']
dset_val = dset_val.select(np.random.choice(len(dset_val), size=300, replace=False))

# fit model
m = AugLinearClassifier(
    checkpoint='textattack/distilbert-base-uncased-rotten-tomatoes',
    ngrams=2, # use bigrams
)
m.fit(dset['text'], dset['label'])

# predict
preds = m.predict(dset_val['text'])
print('acc_val', np.mean(preds == np.array(dset_val['label'])))  # convert labels to an array for elementwise comparison

# interpret
print('Total ngram coefficients: ', len(m.coefs_dict_))
print('Most positive ngrams')
for k, v in sorted(m.coefs_dict_.items(), key=lambda item: item[1], reverse=True)[:8]:
    print('\t', k, round(v, 2))
print('Most negative ngrams')
for k, v in sorted(m.coefs_dict_.items(), key=lambda item: item[1])[:8]:
    print('\t', k, round(v, 2))
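
The interpret step above prints one coefficient per ngram. As a rough illustration of why such a dictionary is easy to inspect, the sketch below scores a text by summing the coefficients of the ngrams it contains. This is a minimal sketch under stated assumptions, not the imodelsX implementation: `extract_ngrams`, `score_text`, and `toy_coefs` are hypothetical names, the coefficients are made up rather than taken from a fitted model, and the whitespace tokenization and zero intercept may differ from what the library does.

```python
from typing import Dict, List


def extract_ngrams(text: str, max_n: int = 2) -> List[str]:
    """Return all word ngrams of length 1..max_n (simple whitespace tokenization)."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def score_text(text: str, coefs: Dict[str, float],
               intercept: float = 0.0, max_n: int = 2) -> float:
    """Sum the coefficients of every ngram in the text; unseen ngrams contribute 0."""
    return intercept + sum(coefs.get(g, 0.0) for g in extract_ngrams(text, max_n))


# Hypothetical coefficients for illustration only (NOT output of a fitted model)
toy_coefs = {"good": 1.0, "bad": -1.5, "not good": -2.0, "very good": 0.5}

for review in ["a very good film", "not good at all"]:
    s = score_text(review, toy_coefs)
    print(review, "->", s, "positive" if s > 0 else "negative")
```

Because each ngram contributes a fixed amount, a prediction in this sketch can be explained by listing the ngrams that were found and their coefficients.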

Reference:

@article{singh2023augmenting,
  title={Augmenting interpretable models with large language models during training},
  author={Singh, Chandan and Askari, Armin and Caruana, Rich and Gao, Jianfeng},
  journal={Nature Communications},
  volume={14},
  number={1},
  pages={7913},
  year={2023},
  publisher={Nature Publishing Group UK London}
}

Folders and files:

augdistill
auggam
auglm
augtree
docs
.gitignore
LICENSE
SECURITY.md
readme.md
