fms-extras

This repository is part of the foundation-model-stack organization and hosts new features staged for integration into foundation-model-stack. It is the home for extensions, research and in-development work, and fms-based models trained by IBM.

Installation

Local

pip install -e .
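
Run the editable install from the root of a local clone. A quick way to confirm the install worked is to import the package; the module name fms_extras below is taken from the repository layout described later in this README.

# Quick post-install sanity check; prints the location of the installed package.
import fms_extras
print(fms_extras.__file__)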

Notable Features

  1. MLPSpeculator: a lightweight speculator model that can be used alongside a generative model to speed up inference (currently deployed in IBM TGIS, with training support in fms-fsdp)
  2. PagedKVCacheManager: an implementation of kv-cache management that provides users with the inputs needed to use paged attention with their own models (currently deployed in IBM TGIS)
  3. PagedLLaMA: a LLaMA implementation that uses paged attention in its multi-head attention layers and compiles without graph breaks
  4. Speculative generation: a reference implementation of speculative_generate() using PagedKVCacheManager and MLPSpeculator (sketched just after this list)
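
To show how these pieces fit together, here is a minimal sketch of the speculative-generation flow, assuming a paged-attention model loaded through the fms registry and a trained MLPSpeculator. The import paths, constructor arguments, and speculative_generate() keyword names below are assumptions for illustration only and may not match the actual API; see scripts/ for the working reference implementation.

import torch
from fms.models import get_model

# Import paths assumed from the repository layout; verify against fms_extras.
from fms_extras.models.speculator import MLPSpeculator
from fms_extras.utils.cache.paged import PagedKVCacheManager
from fms_extras.utils.generation import speculative_generate

device = "cuda"

# PagedLLaMA via the fms model registry; architecture, variant, and path strings are placeholders.
model = get_model("paged_llama", "7b", "/path/to/checkpoint", device_type=device)

# A trained speculator would normally be loaded from a checkpoint; these hyperparameters are illustrative only.
speculator = MLPSpeculator(emb_dim=4096, inner_dim=4096, vocab_size=32000, n_predict=3).to(device)

# The cache manager allocates paged kv-cache blocks and produces the block-table /
# slot-mapping inputs that the paged-attention kernels expect (argument names assumed).
kv_cache_manager = PagedKVCacheManager(
    num_layers=model.config.nlayers,
    num_heads=model.config.nheads,
    emb_dim=model.config.emb_dim,
    device=device,
)

prompt_ids = torch.tensor([[1, 2, 3, 4]], device=device)  # placeholder token ids

# Each step the speculator proposes a few candidate tokens and the base model
# verifies them in a single paged-attention forward pass (keyword names assumed).
output_ids = speculative_generate(model, prompt_ids, speculator, kv_cache_manager, new_tokens=64)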

Structure and contents of this repository

This repo follows a structure similar to that of foundation-model-stack:

  • fms_extras/models/ - Pure PyTorch implementations of popular model architectures, with no common interface required beyond nn.Module. Each model configuration is registered with fms.models.register_model() so that instances can be obtained through fms.models.get_model('architecture', 'variant', '/path/to/data'). Each model can also register sources/formats/versions of data to load (e.g. checkpoints provided by Meta, HF, or trained from this repo).
  • fms_extras/models/hf/ - Adapters that wrap our native PyTorch FMS model architecture implementations in HF-compatible interfaces. Each FMS model implements an adapter, and adapted instances are obtained via fms.models.hf.to_hf_api(model) (see the short example after this list).
  • fms_extras/utils/ - Other utilities useful when working with LLMs, including the speculative_generate() function and the PagedKVCacheManager class for easy-to-use kv-cache management with paged-attention kernels.
  • scripts/ - Various inference scripts (paged generation and speculative generation).
  • csrc/ - Custom kernels used in fms-extras, currently related to paged attention.
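
For reference, the registry and adapter calls mentioned above look roughly like the following; the architecture, variant, and path strings are placeholders copied from the description above.

from fms.models import get_model
from fms.models.hf import to_hf_api

# Models registered in fms_extras/models/ become retrievable by name through the fms registry.
model = get_model("architecture", "variant", "/path/to/data")

# Wrap the native FMS model in an HF-compatible interface.
hf_model = to_hf_api(model)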
