Fix GPT-OSS initialization error #1007

Merged
bryce13950 merged 7 commits into TransformerLensOrg/TransformerLens:dev-3.x from TransformerLensOrg/TransformerLens:add_support_for_gpt_oss
Aug 16, 2025

Conversation

@degenfabian
Collaborator

Description

This PR fixes an error that occurred during initialization of the GPT-OSS model. The error was caused by trying to load nn.Parameters as LinearBridges.
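
For illustration, here is a minimal sketch of the kind of type check that avoids this class of error. It is not the code from this PR. To stay runnable without PyTorch it uses stand-in classes: `Parameter` and `Linear` mimic `torch.nn.Parameter` and `torch.nn.Linear`, while the `LinearBridge` shown here and the `wrap_component` helper are hypothetical simplifications, not the TransformerLens API.

```python
class Parameter:
    """Stand-in for torch.nn.Parameter: a bare tensor, not a module."""

    def __init__(self, data):
        self.data = data


class Linear:
    """Stand-in for torch.nn.Linear: a module that owns a weight parameter."""

    def __init__(self, weight):
        self.weight = Parameter(weight)


class LinearBridge:
    """Hypothetical wrapper that is only valid around a Linear module."""

    def __init__(self, module):
        if not isinstance(module, Linear):
            raise TypeError(
                f"LinearBridge expects Linear, got {type(module).__name__}"
            )
        self.module = module


def wrap_component(component):
    """Wrap linear modules in a bridge; pass bare parameters through as-is."""
    if isinstance(component, Parameter):
        return component
    return LinearBridge(component)


wrapped_linear = wrap_component(Linear([[1.0, 0.0], [0.0, 1.0]]))
raw_param = wrap_component(Parameter([[0.5, 0.5]]))
```

The design choice in this sketch is to decide per attribute, based on its type, whether a bridge applies at all, rather than assuming every weight-bearing attribute is a linear module.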

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@bryce13950 bryce13950 merged commit 4ece3c4 into dev-3.x Aug 16, 2025
28 checks passed
@bryce13950 bryce13950 deleted the add_support_for_gpt_oss branch August 16, 2025 09:28
degenfabian pushed a commit that referenced this pull request Sep 6, 2025
Fix GPT-OSS initialization error (#1007)

* Add support for GPT-OSS

* Add conversion method to config in gpt-oss architecture adapter

* Fix missing comma

* fixed doc string issues

* fix missing import

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>

added setters and hook utils to bridge (#1009)

* added setters and hook utils to bridge

* ran format

* added better type checking

* cleaned up a bit

* cleaned up comments

* removed extra function

* added setter test

* added typing

---------

Co-authored-by: degenfabian <fabian.degen@tuta.com>

updated model name

updated property access (#1026)

* updated property access

* removed extra function

feat: Bridge.boot should allow using alias model names, but show a deprecation warning (#1028)

* Automatically replace aliased model name and show deprecation warning

* add test for aliased model name and deprecation
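
The alias handling described in this commit can be sketched roughly as follows. This is an assumption about the general pattern, not the actual `Bridge.boot` code: the `MODEL_ALIASES` table and the `resolve_model_name` helper are hypothetical names, and the single alias entry is only an example.

```python
import warnings

# Hypothetical alias table mapping an alias to the official model name.
MODEL_ALIASES = {"gpt2-small": "gpt2"}


def resolve_model_name(name: str) -> str:
    """Return the official model name, warning if a deprecated alias was used."""
    official = MODEL_ALIASES.get(name)
    if official is None:
        # Not an alias: use the name unchanged.
        return name
    warnings.warn(
        f"Model alias '{name}' is deprecated; use '{official}' instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return official
```

Note that Python hides `DeprecationWarning` by default outside of `__main__` and test runners, so a real implementation might choose a different warning category if the message needs to reach end users.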

Move QKV separation into bridge that wraps QKV matrix (#1027)

* Move QKV separation to bridge that directly wraps QKV matrix

* Fix typing issues

* Fix hook collection issues

* Ensuring standardized hook shape

* Fix syntax error

* Run CI again

* adjust test to reflect new hook names in qkv bridge

* simplify getattr in base component

* Add parameter for conversion rule of hook_in and hook_out in qkvbridge

* moved hook point wrapper, and added more test coverage

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>

removed unnecessary import (#1030)

Attn pattern shape (#1029)

* Move QKV separation to bridge that directly wraps QKV matrix

* Fix typing issues

* Fix hook collection issues

* Ensuring standardized hook shape

* Fix syntax error

* Run CI again

* adjust test to reflect new hook names in qkv bridge

* simplify getattr in base component

* Add parameter for conversion rule of hook_in and hook_out in qkvbridge

* moved hook point wrapper, and added more test coverage

* matched attn pattern shape to hooked transformer

* revised hook pattern application

* updated outdated test

---------

Co-authored-by: degenfabian <fabian.degen@tuta.com>
Co-authored-by: Fabian Degen <106864199+degenfabian@users.noreply.github.com>

added cache layer for hook collection (#1032)

* added cache layer for hook collection

* added hook registry

* merged setattr

* fixed type issue

* made sure aliases were used during hook registration from generalized components

* resolved aliased hooks properly

* resolved remaining hook alias issues
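
As a rough sketch of what a cache layer over hook collection can look like (assuming hooks live on a tree of components, which the commit messages suggest but do not spell out), the registry below scans the tree once and reuses the result until it is invalidated. `Node`, `HookRegistry`, and all hook names are hypothetical.

```python
class Node:
    """Hypothetical component: `hooks` maps local names to hook objects."""

    def __init__(self, hooks=None, children=None):
        self.hooks = hooks or {}
        self.children = children or {}


class HookRegistry:
    """Collects hooks from a component tree and caches the result."""

    def __init__(self, root):
        self.root = root
        self._cache = None
        self.scans = 0  # number of full traversals, kept only for illustration

    def _scan(self, node, prefix=""):
        """Recursively collect hooks, including ones on nested components."""
        found = {prefix + name: hook for name, hook in node.hooks.items()}
        for child_name, child in node.children.items():
            found.update(self._scan(child, f"{prefix}{child_name}."))
        return found

    def hooks(self):
        """Return the cached hook dict, scanning the tree only when needed."""
        if self._cache is None:
            self.scans += 1
            self._cache = self._scan(self.root)
        return self._cache

    def invalidate(self):
        """Drop the cache, for example after a component is replaced."""
        self._cache = None


tree = Node(children={"blocks": Node(children={"0": Node(hooks={"hook_in": object()})})})
registry = HookRegistry(tree)
registry.hooks()
registry.hooks()  # served from the cache, no second traversal
```

Returning the cached dictionary itself rather than a copy avoids repeated allocation, at the cost that callers must not mutate it.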

Bridge unit test compatibility coverage (#1031)

* added test coverage for ensuring compatibility

* ran format

* fixed unit tests

* resolved type issue

* added init files

* added init file

* removed broken test

* reverted type change

* removed attention mask test

* ran format

* fixed test

* removed failing test

* ran format

updated loading in interactive neuroscope demo to use transformer bridge (#1017)

* updated loading in interactive neuroscope demo to use transformer bridge

* update cells

* cleared cell output

* removed extra file

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>

map hook_pos_embed to rotary_emb, allow hook_aliases to be a list (#1034)

* map hook_pos_embed to rotary_emb, allow hook_aliases to be a list

* ran format

* removed neuroscope

* added full coverage for hook aliases change

* updated cache to accept list

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
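
One way to let `hook_aliases` hold either a single target or a list of targets is sketched below. The fallback rule (first target that exists wins) is an assumption, as are the `resolve_alias` helper and the concrete target names.

```python
def resolve_alias(alias, hook_aliases, available):
    """Resolve an alias to the first candidate target that actually exists."""
    targets = hook_aliases[alias]
    if isinstance(targets, str):
        # Normalize the single-target form to a one-element list.
        targets = [targets]
    for target in targets:
        if target in available:
            return target
    raise KeyError(f"No target for alias '{alias}' among {targets}")


# Hypothetical alias table: a model with rotary embeddings has no pos_embed,
# so the alias falls through to the second candidate.
hook_aliases = {
    "hook_pos_embed": ["pos_embed.hook_out", "rotary_emb.hook_out"],
    "hook_embed": "embed.hook_out",
}
available = {"embed.hook_out", "rotary_emb.hook_out"}

pos_target = resolve_alias("hook_pos_embed", hook_aliases, available)
embed_target = resolve_alias("hook_embed", hook_aliases, available)
```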

created new base config class (#1042)

* created new base config class

* cleaned up imports

* reorganized config

* setup transformer bridge config

* fixed docstring issue

* fixed typing issues

* ran format

* fixed docstring

* fixed import issues

* ran format

* fixed type checking again

* fixed import again

* fixed doc string

* removed import

* separated devices

* cleaned up utils a bit

* ran format

* fixed import

* fixed imports

* ran format

* fixed typing

* updated typing

* updated name

* changed to python 3.12

* cleaned up comments

* removed extra functions

made sure to check for nested hooks (#1035)

* made sure to check for nested hooks

* removed extra check

* Skip original_model in hook scanning

* Remove extra traversal through general components submodules

* Remove adding aliases to hook registry

* Fix typing error

* Fix typing error

* Remove constant copying of hook dictionary to save memory

---------

Co-authored-by: degenfabian <fabian.degen@tuta.com>

Fix warning for alias when compatibility mode is turned off (#1041)

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>

Split weights instead of logits for models with joint QKV activation
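
To illustrate why splitting the weights is equivalent to splitting the fused output (presumably what this commit title calls logits), here is a small pure-Python sketch. It assumes the fused matrix is laid out as `[Q | K | V]` along the output dimension; real models may interleave per head, in which case the split has to follow that layout. The helper names are hypothetical.

```python
def matmul(x, w):
    """Multiply a row vector x (length d_in) by a matrix w (d_in x d_out)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_columns(w, parts):
    """Split a matrix column-wise into `parts` equally sized blocks."""
    width = len(w[0]) // parts
    return [[row[p * width:(p + 1) * width] for row in w] for p in range(parts)]


x = [1.0, 2.0]
w_qkv = [
    [1.0, 0.0, 2.0, 0.0, 3.0, 0.0],
    [0.0, 1.0, 0.0, 2.0, 0.0, 3.0],
]

# Option A: run the fused projection, then slice the output.
fused = matmul(x, w_qkv)
q_a, k_a, v_a = fused[0:2], fused[2:4], fused[4:6]

# Option B: slice the weight matrix once, then run three projections.
w_q, w_k, w_v = split_columns(w_qkv, 3)
q_b, k_b, v_b = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
```

Both options produce the same Q, K, and V, but option B yields separate weight matrices that can each be wrapped and hooked like an ordinary linear layer.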

Adjust tests accordingly

Set split_qkv_matrix function inside init

Remove debugging print statements

Add support for layer norm folding and value bias folding
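
For readers unfamiliar with layer norm folding, the sketch below shows the underlying identity in pure Python. It is not the implementation from this commit. It assumes a layer norm with scale `gamma` and shift `beta` followed by a linear layer computing `y @ w + b`; `fold_layer_norm` is a hypothetical helper.

```python
def matmul(x, w):
    """Multiply a row vector x (length d_in) by a matrix w (d_in x d_out)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def fold_layer_norm(gamma, beta, w, b):
    """Fold layer norm scale and shift into the following linear layer.

    With x_hat the normalized input:
        (x_hat * gamma + beta) @ w + b == x_hat @ w_folded + b_folded
    """
    d_in, d_out = len(w), len(w[0])
    w_folded = [[gamma[i] * w[i][j] for j in range(d_out)] for i in range(d_in)]
    b_folded = [b[j] + sum(beta[i] * w[i][j] for i in range(d_in)) for j in range(d_out)]
    return w_folded, b_folded


x_hat = [1.0, -1.0]  # already centered and scaled
gamma, beta = [2.0, 0.5], [1.0, 0.0]
w, b = [[1.0, 2.0], [4.0, 0.0]], [0.5, 0.5]

# Unfolded path: apply scale and shift, then the linear layer.
scaled = [x_hat[i] * gamma[i] + beta[i] for i in range(len(x_hat))]
unfolded = [z + b[j] for j, z in enumerate(matmul(scaled, w))]

# Folded path: only the normalization remains in front of the linear layer.
w_folded, b_folded = fold_layer_norm(gamma, beta, w, b)
folded = [z + b_folded[j] for j, z in enumerate(matmul(x_hat, w_folded))]
```

After folding, the layer norm reduces to pure centering and scaling with no learned parameters, which simplifies reading the weights of the following layer.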
