Aetheris: Hybrid Mamba-MoE Experiment

Aetheris is a hobbyist research project and experimental implementation exploring the intersection of State Space Models (Mamba) and Mixture of Experts (MoE).

The goal of this project was to learn by doing: attempting to combine the linear-time inference of Mamba with the sparse scaling capacity of MoE from scratch in PyTorch. It is designed as a playground for understanding these modern architectures, not as a published academic paper or production-ready foundation model.

🧪 The Experiment

Current LLM architectures are evolving rapidly. I built Aetheris to investigate a specific question:

Can we successfully interleave Mamba blocks (for long context) with sparse MoE layers (for capacity) to train an efficient model on consumer hardware?

This project implements a hybrid architecture that attempts to:

  1. Replace Attention: Use Mamba (SSM) blocks to achieve $O(N)$ sequence scaling.
  2. Scale Parameters Sparsely: Use MoE layers to increase model size without exploding the computational cost per token.
  3. Run Locally: Optimize the implementation for single-GPU training (gradient checkpointing, efficient routing).
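
To make the $O(N)$ claim in item 1 concrete, here is a minimal pure-Python sketch of a scalar linear recurrence, h_t = a_t * h_{t-1} + b_t * x_t, processed in a single pass with a fixed-size state. This is a toy illustration and not the SSMBlock from this repository: the name `linear_scan` and the scalar per-step coefficients are assumptions made for the example, whereas Mamba itself uses input-dependent, multi-dimensional parameters.

```python
from typing import List, Sequence


def linear_scan(x: Sequence[float], a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Run h_t = a_t * h_{t-1} + b_t * x_t over the sequence in one pass.

    Hypothetical toy helper: one multiply-add per token and a single scalar
    of state, so the work grows linearly with the sequence length.
    """
    if not (len(x) == len(a) == len(b)):
        raise ValueError("x, a and b must have the same length")
    h = 0.0  # hidden state carried from one token to the next
    outputs = []
    for x_t, a_t, b_t in zip(x, a, b):
        h = a_t * h + b_t * x_t  # decay the old state, mix in the new input
        outputs.append(h)
    return outputs
```

Calling `linear_scan([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0])` visits each token exactly once; doubling the sequence length doubles the work, in contrast to the pairwise comparisons of full attention.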

🏗️ Architecture Implementation

Aetheris alternates between custom implementations of two core modules:

  • SSMBlock (The Backbone): Implements the selective scan mechanism described in the Mamba paper. This handles the sequence mixing and "memory" of the model.
  • SparseMoELayer (The Scaling): A router-based layer that dispatches tokens to Top-K experts (Feed-Forward Networks). This allows the model to "specialize" parts of its parameters for different types of tokens.
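
The Top-K dispatch described for SparseMoELayer can be sketched in plain Python for a single token. This is a simplified illustration under stated assumptions, not the repository's implementation: the function names (`route_top_k`, `moe_forward`) are hypothetical, experts are stand-in callables on a scalar, and renormalizing the gate weights over the selected experts is one common choice that may differ from what Aetheris does.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of router logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return (expert_index, gate_weight) for the k highest-scoring experts.

    Assumption: gate weights are renormalized so the selected k sum to 1.
    """
    if not 1 <= k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(
    token: float,
    router_logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int = 2,
) -> float:
    """Evaluate only the selected experts and mix their outputs by gate weight."""
    return sum(w * experts[i](token) for i, w in route_top_k(router_logits, k))
```

Only `k` of the experts run for each token, which is why the total parameter count can grow with the number of experts while the per-token compute stays roughly fixed.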

🚀 Quick Start

This code is provided for educational purposes and for others who want to experiment with hybrid architectures.

Installation

Option 1: Local Python Environment

git clone https://github.com/Pomilon-Intelligence-Lab/Aetheris.git
cd Aetheris
pip install -r requirements.txt

Option 2: Docker

We provide Dockerfiles for both CPU (slim) and GPU (NVIDIA) environments.

# CPU Version
docker build -t aetheris-cpu -f Dockerfile .
docker run -p 7860:7860 aetheris-cpu

# GPU Version (Requires NVIDIA Container Toolkit)
docker build -t aetheris-gpu -f Dockerfile-nvidia .
docker run --gpus all -p 7860:7860 aetheris-gpu

Usage (CLI)

Aetheris includes a CLI to train the model, run inference, or serve it over an API.

1. Training (From Scratch)

# Trains a small model defined in configs/default.yaml
python -m aetheris.cli.main train --config configs/default.yaml

2. Generation (CLI)

python -m aetheris.cli.main generate --prompt "The quick brown fox" --checkpoint_dir checkpoints

3. API Server (OpenAI-Compatible)

Start a local API server that simulates OpenAI's chat completions endpoint.

python -m aetheris.cli.main serve --host 0.0.0.0 --port 8000

You can then interact with it using standard tools:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "aetheris-hybrid",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
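
The same request can be made from Python using only the standard library. The sketch below assumes the server streams OpenAI-style server-sent events (lines of the form `data: {...}` carrying `choices[0].delta.content`, ending with `data: [DONE]`); that wire format is an assumption based on the "OpenAI-compatible" description, not something verified against this repository. The helper names (`build_request`, `parse_sse_line`, `stream_text`, `main`) are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Optional

# URL taken from the serve command above (port 8000).
API_URL = "http://localhost:8000/v1/chat/completions"


def build_request(prompt: str, stream: bool = True) -> urllib.request.Request:
    """Build the same POST request as the curl example."""
    payload = {
        "model": "aetheris-hybrid",
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_sse_line(line: str) -> Optional[str]:
    """Extract the text delta from one SSE line, or None if there is none.

    Assumes OpenAI-style chunks; adjust if the server's format differs.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None
    data = line[len("data:"):].strip()
    if not data or data == "[DONE]":
        return None
    chunk = json.loads(data)
    choices = chunk.get("choices") or [{}]
    return choices[0].get("delta", {}).get("content")


def stream_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text pieces from an iterable of raw response lines."""
    for raw in lines:
        text = parse_sse_line(raw.decode("utf-8"))
        if text:
            yield text


def main() -> None:
    """Send one prompt and print the streamed reply (needs a running server)."""
    with urllib.request.urlopen(build_request("Hello!")) as response:
        for piece in stream_text(response):
            print(piece, end="", flush=True)
    print()
```

Call `main()` once the server is running. Keeping the parsing separate from the network call makes it easy to check the parser against captured output if the actual stream format turns out to differ.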

Development & Testing

To run the test suite:

pytest tests/

⚙️ Configuration

You can tweak the hyperparameters in configs/. I've included a "Debug" config that is small enough to train on a laptop CPU for testing the code flow.

| Config File | Description |
| --- | --- |
| configs/default.yaml | Standard experimental setup (requires GPU). |
| configs/debug.yaml | Tiny model (2 layers) for code debugging. |

📚 Acknowledgements & References

This project is an implementation study and relies heavily on the brilliant theoretical work of others. It does not claim to have originated the Mamba or MoE concepts.

  • Mamba Architecture: Gu, A., & Dao, T. (2023). Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv:2312.00752
  • Mixture of Experts: Shazeer, N., et al. (2017). Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. arXiv:1701.06538
  • Inspiration: Jamba (AI21 Labs) and OpenMoE.

🧠 Model Weights & Checkpoints

All pre-trained checkpoints are hosted on the Hugging Face Hub.

| Model Artifact | Step | Description | Download |
| --- | --- | --- | --- |
| Aetheris-Base | 17k | Early convergence checkpoint (Loss ~1.81). Good for analyzing router behavior. | 🤗 Hugging Face |
| Aetheris-Chat | -- | Coming Soon (Post-SFT) | -- |

⚠️ Important: Aetheris uses a custom Hybrid Mamba-MoE architecture. You cannot load it directly with transformers.AutoModel. You must use the interface provided in this repository.

🐍 How to Load

# Before running, rename the checkpoint file inside the folder to checkpoint_current.pth
python -m aetheris.cli.main generate --prompt "The quick brown fox" --checkpoint_dir path/to/checkpoints_folder
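
The rename step can be scripted. Here is a small sketch that copies a checkpoint file into a folder under the name `checkpoint_current.pth`, so the folder can be passed to `--checkpoint_dir`. The helper name `prepare_checkpoint_dir` is hypothetical, and copying (rather than renaming in place) is a choice made here to leave the original file untouched.

```python
import shutil
from pathlib import Path

# Filename the generate command expects, per the note above.
EXPECTED_NAME = "checkpoint_current.pth"


def prepare_checkpoint_dir(checkpoint_file: str, checkpoint_dir: str) -> Path:
    """Copy a checkpoint into checkpoint_dir under the expected filename."""
    source = Path(checkpoint_file)
    if not source.is_file():
        raise FileNotFoundError(f"checkpoint not found: {source}")
    target_dir = Path(checkpoint_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / EXPECTED_NAME
    # Skip the copy if the file is already in place under the right name.
    if source.resolve() != target.resolve():
        shutil.copy2(source, target)
    return target
```

After calling it, pass the same folder to the CLI as `--checkpoint_dir`.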

Note: I'll add a better inference interface later; for now, use this scuffed version. :D

Note: These weights are from an experimental run. While they demonstrate the architectural capabilities, do not expect GPT-5 or even Google Bard-level coherence. :D This project was made for learning and fun!

License

MIT
