head_dim expansion managed at get_model() level #476
rzbhatti merged 8 commits into foundation-model-stack:main from override_head_dim
Conversation
Commits (each signed off by Rashed Z. Bhatti, PhD <rzbhatti@us.ibm.com>):

- …ake head_dim as kwarg
- …` to `serialization.py`
- override_hf_pretrained_config to allow get_model() take head_dim as kwarg…
- …step
fms/utils/serialization.py (outdated diff)
| if "attn.in_proj.query" in layer: | ||
| expansion_factor = ( | ||
| model_config.head_dim | ||
| * model_config.nheads | ||
| // input_sd[layer].size(0) | ||
| ) | ||
| break |
we cannot assume that this will happen before key or value. It might, but it's not guaranteed
you might have to pick layer_dim first, and then have a second dictionary indicating whether you need to multiply by model_config.nheads or model_config.kvheads
I have updated the expansion factor calculation to account for all of QKV and the Dense weights. When you get a chance, please take a look at it and resolve this conversation.
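A minimal sketch of the direction suggested above, assuming FMS-style state-dict keys (`attn.in_proj.query/key/value`, `attn.dense`) and a config exposing `head_dim`, `nheads`, and `kvheads`; the helper name, key patterns, and the choice of which tensor dimension scales with `head_dim` are illustrative, not the merged code:

```python
# Illustrative sketch, not the merged implementation: map each attention
# weight to the head count that determines its head_dim-scaled dimension.
_HEADS_ATTR_FOR_WEIGHT = {
    "attn.in_proj.query": "nheads",
    "attn.in_proj.key": "kvheads",
    "attn.in_proj.value": "kvheads",
    "attn.dense": "nheads",
}


def _expansion_factor(layer_name, weight, model_config):
    """Derive the expansion factor from whichever QKV/Dense weight is seen."""
    for pattern, heads_attr in _HEADS_ATTR_FOR_WEIGHT.items():
        if pattern in layer_name:
            target_dim = model_config.head_dim * getattr(model_config, heads_attr)
            # Assumption: the dense (output) projection scales on its input
            # dimension, while the in_proj weights scale on their output dimension.
            current_dim = weight.size(1) if pattern == "attn.dense" else weight.size(0)
            return target_dim // current_dim
    return 1
```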
```python
# When emb_dim // nheads < head_dim, expand QKV and Dense Weights
def _weight_expansion_for_mismatched_head_dim(
    input_sd: Mapping[str, Any], model_config
) -> Mapping[str, Any]:
```
we may want to assert here that head_dim exists in the config. I don't believe all of the models have a head_dim.
Very good point, I can add an assertion.
The application that registers this adapter extension must make sure that head_dim is either part of the model config or passed as a kwarg override.
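A minimal sketch of the assertion being discussed, assuming the adapter receives the FMS model config object; the message text and the elided body are illustrative:

```python
from typing import Any, Mapping


def _weight_expansion_for_mismatched_head_dim(
    input_sd: Mapping[str, Any], model_config
) -> Mapping[str, Any]:
    # Not every model config defines head_dim, so fail loudly instead of
    # expanding weights against an undefined target size.
    assert getattr(model_config, "head_dim", None) is not None, (
        "weight expansion requires head_dim; set it in the model config or "
        "pass head_dim=... as a kwarg override to get_model()"
    )
    ...  # expansion logic elided in this sketch
```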
This PR does the following:

- Moves the weights expansion adapter function `_weight_expansion_for_mismatched_head_dim` to `serialization.py`, so that it can be registered with other models like llama 1b, gpt_oss, etc. too.
- Removes the weights expansion adapter registration from `granite.py`. This allows the application layer (at the `get_model()` level) to decide when the weights expansion should be done, e.g. via `override_hf_pretrained_config` in `get_model()`, which allows overriding model config parameters like `head_dim=128` as a kwarg when `architecture = hf_pretrained`, e.g. in `inference.py` (see the illustrative sketch below).

This is how inference is run from the command line:
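A hypothetical sketch (not the PR's actual `inference.py` code or command line) of what the override looks like at the `get_model()` level, assuming the `fms.models.get_model(architecture, variant, **kwargs)` entry point; the checkpoint id is a placeholder, while `architecture="hf_pretrained"` and `head_dim=128` come from the description above:

```python
# Hypothetical usage sketch; only architecture="hf_pretrained" and the
# head_dim=128 kwarg override are taken from this PR.
from fms.models import get_model

model = get_model(
    architecture="hf_pretrained",
    variant="ibm-granite/some-granite-checkpoint",  # placeholder HF checkpoint id
    head_dim=128,  # config override applied via override_hf_pretrained_config
)
```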