Releases · TransformerLensOrg/TransformerLens

v3.2.1

What's Changed

Hot fix for issues with Gemma3 multimodal interp.

Release v3.2.1 by @jlarson4 in #1295

Full Changelog: v3.2.0...v3.2.1

v3.2.0

What's Changed

Fix generate() when tokenizer is unset and add regression tests by @DityaChawla in #1267
Updated boot_transformers to use local hf_config, if provided by @jlarson4 in #1279
Prevent weight processing split between devices by @jlarson4 in #1281
Fix: IOIDataset generates diverse samples by @DivijChawla in #1282
Fix: preserve tokenizer.padding_side when reloading with add_bos_token by @DivijChawla in #1283
MPS CI Support by @huseyincavusbi in #1278
Add n_params_total property for total parameter count by @DivijChawla in #1284
Model Table Cleanup by @jlarson4 in #1285
Demos/bridge lm eval demo by @jlarson4 in #1286
Tokenize and Concatenate additional datasets by @jlarson4 in #1287
Resolution to Issues #477 and #264 by @jlarson4 in #1288
mT5 Support by @jlarson4 in #1289
SimpleStories Model verification by @jlarson4 in #1292
Verification System Improvements by @jlarson4 in #1293
Release v3.2.0 by @jlarson4 in #1294

New Contributors

@DityaChawla made their first contribution in #1267
@DivijChawla made their first contribution in #1282

Full Changelog: v3.1.0...v3.2.0

v3.1.0

What's Changed

3.0 CI Bugs by @jlarson4 in #1261
fix: use cfg.dtype instead of torch.get_default_dtype for KV cache init by @davidcyze in #1260
Fix type of HookedTransformerConfig.device by @brendanlong in #1230
Fix tests broken by a local GPU by @brendanlong in #1219
fix: handle LayerNorm folding correctly in load_and_process_state_dict by @VedantMadane in #1215
Fix HookedTransformerConfig rotary_base types by @brendanlong in #1231
Fixed Masking in HookedTransformer.generate by @tuomaso in #999
Add hooked transformer generate stream by @anthonyduong9 in #908
Add py.typed for type hints by @UFO-101 in #760
Created Baichuan Architecture adapter by @jlarson4 in #1262
Make FactoredMatrix compatible with tensor-like arguments by @JasonGross in #599
NanoGPT Conversation did not handle case when there were no biases in model by @dashstander in #629
[BUG] Batched Generation solution extended to run_with_cache and run_with_hooks for TransformerBridge by @jlarson4 in #1265
Added 1D tensor handling to TransformerBridge by @jlarson4 in #1266
Added n_ctx override to TransformerBridge by @jlarson4 in #1269
Feature/generate stream on bridge by @jlarson4 in #1268
Added warnings for users attempting to use MPS with Torch 2.8 by @jlarson4 in #1271
Improved Tokenize & Concatenate by @jlarson4 in #1273
Multi-Device Processing on Bridge by @jlarson4 in #1270
Adding Architecture Adapter Creation Guide to Docs by @jlarson4 in #1274
Fixed Quantization bug in TransformerLens 3.0 by @jlarson4 in #1276
TransformerLens 3.1.0 by @jlarson4 in #1277

New Contributors

@davidcyze made their first contribution in #1260
@VedantMadane made their first contribution in #1215
@tuomaso made their first contribution in #999
@dashstander made their first contribution in #629

Full Changelog: v3.0.0...v3.1.0

TransformerLens 3.0

What's Changed

This release migrates to a new way of implementing models via the TransformerBridge system, increasing model support from ~200 models to ~9,000 models.

Refactor the utilities file into utilities folder by @starship006 in #628
Raise exception when BERT is loaded with HookedTransformer instead of… by @degenfabian in #795
Circular dependency resolution by @bryce13950 in #803
fixed corner param by @bryce13950 in #817
bumped python min version by @bryce13950 in #802
Updates torch to use the most recent version by @bryce13950 in #822
updated python requirements by @bryce13950 in #821
Recent releases by @bryce13950 in #841
updated mypy limit by @bryce13950 in #880
Activation utils cleanup by @bryce13950 in #879
Restore consistency of hook_normalized between LayerNorm and RMSNorm by @degenfabian in #770
Fix that padding_side always defaults to "right" when no value is explicitly passed by @degenfabian in #814
Unified conversions by @bryce13950 in #881
Flatten state dictionary for proper weight loading by @degenfabian in #860
enabled actions on action pr by @bryce13950 in #882
Add weight conversion for Phi model by @degenfabian in #863
Add weight conversion for T5 models by @degenfabian in #859
Visualize weight conversions by @degenfabian in #852
Fixed test for ensuring weight conversions are provided by @bryce13950 in #883
Drop python 3.9 by @bryce13950 in #885
Conversion improved test coverage by @bryce13950 in #886
Component test coverage by @bryce13950 in #890
Bug new loading by @bryce13950 in #891
Weight conversion llama by @bryce13950 in #892
Refactor supported models module by @bryce13950 in #893
Bug neox by @bryce13950 in #895
Feature model adapter by @bryce13950 in #928
added test for making sure formatting works well by @bryce13950 in #932
Refactor final issues by @bryce13950 in #933
restored tokenizer content by @bryce13950 in #935
Refactor weight conversion by @bryce13950 in #931
added python 3.13 to CI by @bryce13950 in #843
upstream fixes from dev by @bryce13950 in #941
Flexible component mapping by @bryce13950 in #938
Move flatten dictionary to architecture_conversion by @degenfabian in #936
made new transformer bridge extend nn module properly by @bryce13950 in #955
brought in remaining hooked transformer functions by @bryce13950 in #954
Setup tokenizer in boot function by @degenfabian in #959
Bridged Robust Model Structure by @bryce13950 in #960
Remove transformers dependency from bridge tokenization by @degenfabian in #963
Dynamically add boot function to bridge by @degenfabian in #964
Pre release version publishing by @bryce13950 in #973
Setup deprecated hook aliases and got the majority of the main demo running properly by @bryce13950 in #976
Linear test coverage by @bryce13950 in #977
Create Bridge for every Gemma 3 module by @degenfabian in #966
Add Bridges for every module in GPT2 by @degenfabian in #967
Cache hook aliases & stop at layer by @bryce13950 in #978
Create Bridges for every module in Bloom models by @degenfabian in #970
Create Bridges for every module in Gemma 2 by @degenfabian in #971
Create bridges for every module in Gemma 1 by @degenfabian in #972
Create bridges for every module in Mistral by @degenfabian in #979
Remove that output_attention flag defaults to true in boot function by @degenfabian in #982
Create bridge for every module in GPT-J by @degenfabian in #974
Create bridge for every module in Llama by @degenfabian in #975
Unified aliases by @bryce13950 in #991
fixed hook alias positions by @bryce13950 in #992
Create bridge for every module in Mixtral by @degenfabian in #984
removed numpy ceiling by @bryce13950 in #994
Ensure hook and property backwards compatibility with HookedTransformer by @degenfabian in #990
Create bridge for every module in neox by @degenfabian in #995
Create bridges for every module in neo by @degenfabian in #987
Weight conversion renaming by @bryce13950 in #996
Attention shape normalization by @bryce13950 in #997
Joint hook handling by @bryce13950 in #1001
Add compatibility_mode feature by @degenfabian in #998
Add support for GPT-OSS by @degenfabian in #1004
Fix GPT-OSS initialization error by @degenfabian in #1007
added setters and hook utils to bridge by @bryce13950 in #1009
updated property access by @bryce13950 in #1026
feat: Bridge.boot should allow using alias model names, but show a deprecation warning by @hijohnnylin in #1028
Move QKV separation into bridge that wraps QKV matrix by @degenfabian in #1027
removed unnecessary import by @bryce13950 in #1030
Attn pattern shape by @bryce13950 in #1029
added cache layer for hook collection by @bryce13950 in #1032
Bridge unit test compatibility coverage by @bryce13950 in #1031
updated loading in interactive neuroscope demo to use transformer bridge by @degenfabian in #1017
map hook_pos_embed to rotary_emb, allow hook_aliases to be a list by @hijohnnylin in #1034
created new base config class by @bryce13950 in #1042
made sure to check for nested hooks by @bryce13950 in #1035
Fix warning for aliases when compatibility mode is turned off by @degenfabian in #1041
Feature kv cache by @bryce13950 in https...

v2.18.0

What's Changed

Isolate demo dependencies and pin orjson for CVE-2025-67221 mitigation by @evcyen in #1173
feat: Add LIT integration for interactive model analysis (#121) by @HetanshWaghela in #1163
fix: set n_ctx=512 for TinyStories models by @puranikyashaswin in #1162
Fix/tokenize and concatenate invalid token by @evcyen in #1179
Remove spurious warning for tokenize_and_concatenate by @evcyen in #1177
Add MMLU benchmark evaluation to evals by @CarlG0123 in #1183
Fix/1076 logit lens layer norm by @evcyen in #1180
Updating Interactive Neuroscope, CI to properly install demo by @jlarson4 in #1205
Fix tokenize_and_concatenate splitting tokens across chunk boundaries by @brainsnog in #1201
Fix deprecated IPython magic() calls in demo notebooks (issue #1036) by @brainsnog in #1203
Expose n_ctx override in HookedTransformer.from_pretrained (issue #1006) by @brainsnog in #1204
Added warning flags for usages of MPS by @jlarson4 in #1182
Add GPT-OSS-20B model support by @CarlG0123 in #1195
fixed the logit lens implementation inside ActivationCache.accumulated_resid to match the standard definition in literature and the expected and defined behavior as per the documentation in the docstring and in the docs by @hartigel in #1077
Add Apertus model support with XIeLU activation by @sinievanderben in #1197
Fix attention calculation on mps for torch 2.8.0 by @BrownianNotion in #1068
HuBERT support rollout by @david-wei-01001 in #1111
Pre-release testing by @jlarson4 in #1210
Fix backward hooks Runtime Error by @evcyen in #1175
Release v2.18.0 by @jlarson4 in #1211

New Contributors

@evcyen made their first contribution in #1173
@HetanshWaghela made their first contribution in #1163
@puranikyashaswin made their first contribution in #1162
@CarlG0123 made their first contribution in #1183
@brainsnog made their first contribution in #1201
@hartigel made their first contribution in #1077
@sinievanderben made their first contribution in #1197
@BrownianNotion made their first contribution in #1068
@david-wei-01001 made their first contribution in #1111

Full Changelog: v2.17.0...v2.18.0
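
The chunk-boundary fix to tokenize_and_concatenate listed above (#1201) addresses a general problem: when a long text is cut into pieces by character count before tokenization, a cut can land in the middle of a word and produce tokens that would not appear in the unsplit text. The sketch below shows one way to avoid that by moving each cut forward to the next whitespace character. It is only an illustration of the idea; `split_on_whitespace` is a hypothetical helper, not a TransformerLens function, and the actual fix in #1201 may work differently.

```python
def split_on_whitespace(text: str, num_chunks: int) -> list[str]:
    """Split text into pieces of roughly len(text) / num_chunks characters,
    cutting only at whitespace so that no word straddles two pieces.

    Hypothetical helper for illustration; not part of TransformerLens.
    """
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    target = max(1, len(text) // num_chunks)
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + target, len(text))
        # Advance the cut until it sits on whitespace or the end of the text.
        while end < len(text) and not text[end].isspace():
            end += 1
        chunks.append(text[start:end])
        start = end
    return chunks


text = "the quick brown fox jumps over the lazy dog"
chunks = split_on_whitespace(text, 4)
```

Joining the pieces reproduces the input exactly, and every piece after the first begins with the whitespace character at which the cut was made, so each piece can be tokenized separately without splitting a word.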

v3.0.0b3

What's Changed

Support callable filters in TransformerBridge.add_hook() by @jlarson4 in #1186
Update Patching Hook to avoid causing conflicts by @jlarson4 in #1187
Prevent Stale Joint QKV values from being incorporated into weight folding after Layer Norm application by @jlarson4 in #1188
Updated to remove hardcoded .cpu() processing by @jlarson4 in #1189
Return true initial batch size information by @jlarson4 in #1190
hook_result & Hook Aliases issues by @jlarson4 in #1191
updated loading in exploratory analysis demo to use transformer bridge by @degenfabian in #1014
updated loading in patchscopes generation demo to use transformer bridge by @degenfabian in #1021
Additional Exploratory analysis Demo fixes by @jlarson4 in #1192
update loading in bert demo to use transformer bridge by @degenfabian in #1015
updating loading in qwen demo to use transformer bridge by @degenfabian in #1025
updated loading in activation patching demo to use transformer bridge by @degenfabian in #1011
updating loading in t5 demo to use transformer bridge by @degenfabian in #1022
updated loading in attribution patching demo to use transformer bridge by @degenfabian in #1013
v3.0.0b3 – Notebook Demo Update & Bug Fixes by @jlarson4 in #1196
Verifying Additional Models by @jlarson4 in #1199
Feature/multimodal architecture adapters by @jlarson4 in #1200
Fix boolean 4D attention-mask handling in joint-QKV bridge attention reconstruction by @speediedan in #1198
Feature/llava next and onevision variants by @jlarson4 in #1202

Full Changelog: v3.0.0b2...v3.0.0b3
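
The callable hook filters mentioned in this release (#1186) can be pictured as predicates over hook names. The sketch below is an assumption-laden illustration, not the TransformerBridge implementation: `normalize_names_filter` is a hypothetical helper that turns `None`, a single name, a list of names, or a callable into one predicate, and the hook names are examples in the usual TransformerLens naming style.

```python
from typing import Callable, Optional, Sequence, Union

NamesFilter = Optional[Union[str, Sequence[str], Callable[[str], bool]]]


def normalize_names_filter(names_filter: NamesFilter) -> Callable[[str], bool]:
    """Turn None, a name, a list of names, or a predicate into a predicate.

    Hypothetical helper for illustration; not part of TransformerLens.
    """
    if names_filter is None:
        # No filter: every hook name matches.
        return lambda name: True
    if isinstance(names_filter, str):
        # A single name matches only itself.
        return lambda name: name == names_filter
    if callable(names_filter):
        # A callable is already a predicate.
        return names_filter
    # Otherwise treat the filter as a collection of allowed names.
    allowed = set(names_filter)
    return lambda name: name in allowed


hook_names = [
    "blocks.0.attn.hook_q",
    "blocks.0.attn.hook_pattern",
    "blocks.1.attn.hook_pattern",
    "blocks.1.hook_resid_post",
]

# Select every attention-pattern hook with a callable filter.
is_pattern = normalize_names_filter(lambda name: name.endswith("hook_pattern"))
selected = [name for name in hook_names if is_pattern(name)]
```

A callable filter is convenient when the set of hooks is easier to describe by a rule (for example, "every attention pattern") than by listing each name.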

v3.0.0b2

What's Changed

Release 2.16 by @bryce13950 in #945
Release 2.16.1 by @bryce13950 in #952
Update README.md by @jmole in #957
improve model properties table in docs by @mivanit in #769
Release v2.16.2 by @bryce13950 in #958
Add Gemma 3 and MedGemma model support by @huseyincavusbi in #1149
Add timestamp for 2.0 announcement [docs] by @MattAlp in #983
Add support for Qwen/Qwen3-0.6B-Base model by @mtaran in #1075
Repairing tests that were failing due to recent contributions by @jlarson4 in #1157
Fix 934 by @kapedalex in #1155
Fix 1130 and 1102 by @kapedalex in #1154
Fix key and value heads patching for models with different n_heads from n_key_value_heads by @nikolaystanishev in #981
updating the compatibility notebook by @jlarson4 in #1158
New Release – v2.17.0 by @jlarson4 in #1159
Integrate v2.17.0 phase1 by @jlarson4 in #1166
transformers v5 support by @jlarson4 in #1167
Improve TransformerBridge optimizer compatibility via dual PyTorch/TransformerLens parameter access API by @speediedan in #1143
Add HuggingFace ModelOutput support to TransformerLens generation API by @speediedan in #1144
Testing R1 Distills to confirm functional in TransformerLens by @jlarson4 in #1168
StableLM Architecture Adapter by @jlarson4 in #1171
Complete type checking for OLMo support (builds on #816) by @taziksh in #1081
Olmo3 support by @etomoscow in #1170
Setup and tested OLMo architecture adapters by @jlarson4 in #1174
Isolate demo dependencies and pin orjson for CVE-2025-67221 mitigation by @evcyen in #1173
feat: Add LIT integration for interactive model analysis (#121) by @HetanshWaghela in #1163
OpenELM Architecture Adapter by @jlarson4 in #1172
fix: set n_ctx=512 for TinyStories models by @puranikyashaswin in #1162
Architecture Benchmarks – Review & Extension by @jlarson4 in #1176
created initial model registry tool by @bryce13950 in #1151
Initial Verification Run by @jlarson4 in #1181
Additional Verification by @jlarson4 in #1184
Prepping for v3.0.0b2 by @jlarson4 in #1185

New Contributors

@jmole made their first contribution in #957
@huseyincavusbi made their first contribution in #1149
@MattAlp made their first contribution in #983
@mtaran made their first contribution in #1075
@kapedalex made their first contribution in #1155
@nikolaystanishev made their first contribution in #981
@taziksh made their first contribution in #1081
@etomoscow made their first contribution in #1170
@HetanshWaghela made their first contribution in #1163
@puranikyashaswin made their first contribution in #1162

Full Changelog: v3.0.0b1...v3.0.0b2

v2.17.0

We've got an exciting new release that includes several new models! Gemma 3, MedGemma, and Qwen3-0.6B-Base are now available as model options. In addition to these new models, this release includes a handful of bug fixes and other small non-breaking changes.

What's Changed

Update README.md by @jmole in #957
improve model properties table in docs by @mivanit in #769
Release v2.16.2 by @bryce13950 in #958
Add Gemma 3 and MedGemma model support by @huseyincavusbi in #1149
Add timestamp for 2.0 announcement [docs] by @MattAlp in #983
Add support for Qwen/Qwen3-0.6B-Base model by @mtaran in #1075
Repairing tests that were failing due to recent contributions by @jlarson4 in #1157
Fix 934 by @kapedalex in #1155
Fix 1130 and 1102 by @kapedalex in #1154
Fix key and value heads patching for models with different n_heads from n_key_value_heads by @nikolaystanishev in #981
updating the compatibility notebook by @jlarson4 in #1158
New Release – v2.17.0 by @jlarson4 in #1159

New Contributors

@jmole made their first contribution in #957
@huseyincavusbi made their first contribution in #1149
@MattAlp made their first contribution in #983
@mtaran made their first contribution in #1075
@kapedalex made their first contribution in #1155
@nikolaystanishev made their first contribution in #981

Full Changelog: v2.16.1...v2.17.0

v3.0.0b1

What's Changed

registered hook correctly by @bryce13950 in #1051
optimized QKV bridge a bit by @bryce13950 in #1046
Add support for layer norm and bias folding by @degenfabian in #1044
updated get params to fill zeroes when needed by @bryce13950 in #1049
Match device selection of TransformerBridge to HookedTransformer by @degenfabian in #1047
Improve TransformerBridge hook compatibility with HookedTransformers by @degenfabian in #1054
Enable setting cached hooks by @degenfabian in #1048
Create bridge for every module in Phi 1 by @degenfabian in #1055
Rename Neo bridges to be in line with new naming scheme by @degenfabian in #1056
Rename Mixtral bridges to be in line with new naming scheme by @degenfabian in #1057
added test and made sure backwards hooks are working by @bryce13950 in #1058
Remove second layer norm from phi component mapping by @degenfabian in #1059
Create bridge for every module in pythia by @degenfabian in #1060
Create bridge for every module in Qwen 2 by @degenfabian in #1061
Processing functions by @bryce13950 in #1053
Attempted Processing match by @bryce13950 in #1063
Process restoration by @bryce13950 in #1064
Add missing configuration parameters by @degenfabian in #1065
Properly set up normalization_type and layer_norm_folding attributes in initialized components by @degenfabian in #1066
Process accuracy by @bryce13950 in #1067
Ablation hugging face weights by @bryce13950 in #1070
Ci fixes by @bryce13950 in #1072
Revision extra forwards by @bryce13950 in #1073
Test coverage by @bryce13950 in #1074
Attention hooks full coverage for folding by @bryce13950 in #1078
Ci job splitting by @bryce13950 in #1079
fixed batch dimension by @bryce13950 in #1082
fixed cache hooks by @bryce13950 in #1083
fixed bias displaying by @bryce13950 in #1084
fixed return type none by @bryce13950 in #1085
Create pass through for hooks in compatibility mode by @bryce13950 in #1086
fixed alias hook props by @bryce13950 in #1087
made all hooks show properly by @bryce13950 in #1088
updated loading in main demo to use transformers bridge by @bryce13950 in #1010
switch from poetry to uv by @mivanit in #1037
added full kv cache by @bryce13950 in #1089
Added full hook coverage for previous keys by @bryce13950 in #988
updated loading in arena content demo to use transformer bridge by @degenfabian in #1012
regenerated with new hooks by @bryce13950 in #1091
added test coverage for ensuring compatibility by @bryce13950 in #989
Test hook shape coverage by @bryce13950 in #1000
Hook compatibility by @bryce13950 in #1092
Final compatibility coverage by @bryce13950 in #1090
tested llama 3.1 by @bryce13950 in #1096
fixed stop at layer by @bryce13950 in #1100
Duplicate hook fix by @bryce13950 in #1098
Gemma2 fix by @bryce13950 in #1099
Fix gpt oss by @bryce13950 in #1101
created benchmark suite by @bryce13950 in #1104
finalized t5 adapter by @bryce13950 in #1095
Model improvements by @bryce13950 in #1105
decoupling weight processing completely from hooked transformer by @bryce13950 in #1103
removed invalid comparison by @bryce13950 in #1107
Revert "decoupling weight processing completely from hooked transformer" by @bryce13950 in #1108
finalized benchmark logic by @bryce13950 in #1109
Fix opt by @bryce13950 in #1106
Benchmarking and compatibility only by @bryce13950 in #1112
Decouple weight processing by @bryce13950 in #1114
optimized benchmarks a bit by @bryce13950 in #1115
fixed tensor storing by @bryce13950 in #1116
added skip condition by @bryce13950 in #1117
Gpt2 weight match by @bryce13950 in #1118
Gemma3 match by @bryce13950 in #1119
setup real aliases by @bryce13950 in #1121
Gpt oss match by @bryce13950 in #1120
trimmed memory a bit by @bryce13950 in #1122
created benchmark suite for unsupported models in hooked transformer by @bryce13950 in #1123
fixed remaining gemma 3 benchmarks by @bryce13950 in #1124
Gated MLP bridge by @bryce13950 in #1110
setup benchmark suite, and trimmed out extra tests by @bryce13950 in #1125
Attention cleanup by @bryce13950 in #1126
Benchmarking cross comparison revision by @bryce13950 in #1127
Oss match by @bryce13950 in #1128
Cleanup by @bryce13950 in #1129
Weight processing generalization by @bryce13950 in #1131
Processing cleanup by @bryce13950 in #1132
Final cleanup by @bryce13950 in #1135
Supported Architectures – code artifact cleanup by @jlarson4 in #1136
Qwen3 adapter by @bryce13950 in #1138
Model Bridge – Source Keys Cleanup by @jlarson4 in #1137
cleaned up a lot of things by @bryce13950 in #1113
Transformer bridge layer norm folding by @bryce13950 in #1071
Updated release workflow by @bryce13950 in #1146

New Contributors

@jlarson4 made their first contribution in #1136

Full Changelog: v3.0.0a8...v3.0.0b1

v3.0.0a8

Another update that rounds out the API for our new module.

What's Changed

created new base config class by @bryce13950 in #1042
made sure to check for nested hooks by @bryce13950 in #1035
Fix warning for aliases when compatibility mode is turned off by @degenfabian in #1041
Feature kv cache by @bryce13950 in #1045
Split weights instead of logits for models with joint QKV matrix by @degenfabian in #1043

Full Changelog: v3.0.0a7...v3.0.0a8

