CodeBase-280B: Next-Gen MoE LLM Launched
Summary
The 404Development | Software Hub community has launched the first phase of CodeBase-280B, their next-generation Mixture of Experts (MoE) Large Language Model. This model features 128 experts with 9 active per token, a massive 384,000 token context window, and optimized inference via 8-bit quantization. This launch is significant because it offers a highly scalable and efficient LLM designed for advanced performance across multi-GPU setups.
🚀 Introducing CodeBase-280B: Phase 1
Hey everyone,
We’re excited to share the first phase of CodeBase-280B, our next-generation language model built for performance, scalability, and advanced AI capabilities. Here’s what makes it special:
💡 Key Highlights
- Mixture of Experts (MoE): 128 experts with only 9 active per token, so each token runs through a small slice of the network, keeping inference fast and efficient.
- Massive Context Window: Handles up to 384,000 tokens at once, allowing it to understand extremely long documents or conversations.
- Compressed Attention: Optimized memory usage with partial KV sharing to keep inference fast.
- Efficient Inference: 8-bit quantization with KV cache compression reduces memory requirements without sacrificing quality.
- Parallel & Distributed: Designed for multi-GPU setups with support for distributed training and mixed precision (bfloat16) for optimal performance.
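To make the MoE idea concrete, here is a minimal top-k routing sketch in NumPy. It uses toy dimensions and a random router purely for illustration; the actual routing code in the repo may differ.

```python
import numpy as np

def moe_route(token_hidden, router_weights, k=9):
    """Pick the top-k experts for one token from router logits.

    token_hidden:   (hidden_size,) activation for a single token
    router_weights: (hidden_size, num_experts) router projection
    Returns the chosen expert indices and their softmax gate weights.
    """
    logits = token_hidden @ router_weights              # (num_experts,)
    top_k = np.argsort(logits)[-k:][::-1]               # indices of the k largest logits
    gate = np.exp(logits[top_k] - logits[top_k].max())  # softmax over selected experts only
    gate /= gate.sum()
    return top_k, gate

# Toy hidden size; the real model uses hidden_size=7168 with 128 experts.
rng = np.random.default_rng(0)
hidden, experts = 64, 128
idx, weights = moe_route(rng.normal(size=hidden),
                         rng.normal(size=(hidden, experts)))
print(len(idx))  # 9 experts selected; their gate weights sum to 1
```

Only the 9 selected experts' feed-forward blocks execute for that token, which is why the active parameter count stays far below the 280B total.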
📊 Specs at a Glance
- Hidden Size: 7,168
- Layers: 75
- Attention Heads: 52
- Experts: 128 total, 9 active
- Context Window: 384,000 tokens
- Parameters: ~280B total (~18B active per token)
- Vocabulary: 51,200 tokens
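To see what these numbers imply for hardware, here is a quick back-of-envelope calculation. It ignores KV cache, activations, and framework overhead, so treat it as illustrative arithmetic rather than measured figures.

```python
# Rough weight-memory math for the specs above.
GB = 1024 ** 3

total_params  = 280e9   # ~280B weights stored
active_params = 18e9    # ~18B weights touched per token

bf16_weights = total_params * 2 / GB   # 2 bytes/param in bfloat16
int8_weights = total_params * 1 / GB   # 1 byte/param after 8-bit quantization

print(f"bf16 weights: ~{bf16_weights:.0f} GiB")                 # ~522 GiB
print(f"int8 weights: ~{int8_weights:.0f} GiB")                 # ~261 GiB
print(f"active fraction: {active_params / total_params:.1%}")   # 6.4%
```

The halving from bfloat16 to int8 is what makes multi-GPU serving practical, and the ~6% active fraction is the compute saving the MoE design buys per token.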
🛠️ Project Structure & Usage
CodeBase-280B ships everything you need to train, test, and run inference on the model:
- Transformer architecture and MoE modules
- Compressed attention and RoPE positional encoding
- Quantization utilities for memory-efficient inference
- Training scripts with multi-GPU support
- Open-source configuration for customization
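As a sketch of what an 8-bit quantization utility typically does, here is a generic per-tensor absmax int8 scheme. This is an assumption for illustration; the repo's actual utilities may use a different scheme (e.g., per-channel scales or calibration).

```python
import numpy as np

def quantize_int8(weights):
    """Per-tensor absmax quantization: float32 -> int8 plus one scale factor."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover an approximate float tensor for compute."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
print(q.nbytes / w.nbytes)  # 0.25: int8 takes a quarter of float32 storage
```

Each int8 tensor stores one float scale alongside it, so the memory cost is essentially 1 byte per parameter, matching the inference savings described above.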
Installation is simple: clone the repo and install dependencies via pip. You can train, generate text, benchmark, or run tests with provided scripts.