Getting Started
This guide will help you get started with OptiReduce.
System Requirements
Hardware Requirements
- Network Interface Card (NIC):
  - Recommended: Mellanox ConnectX NICs (supports flow bifurcation)
  - Alternative: Two NICs (one for TCP-based communication, one DPDK-compatible NIC)
- CPU: At least 4 dedicated cores for OptiReduce
- Memory: 16GB of hugepages
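The CPU and memory minimums above can be checked on a Linux host before installing. The following is a minimal sketch, not part of OptiReduce: the helper names (`parse_hugepages`, `hugepages_gib`, `check_host`) are hypothetical, it assumes "16GB" means 16 GiB, and it can only count the cores the OS reports, not confirm that four of them are actually dedicated to OptiReduce.

```python
import os


def parse_hugepages(meminfo_text):
    """Return (total_pages, page_size_kb) from /proc/meminfo-style text."""
    total, size_kb = 0, 0
    for line in meminfo_text.splitlines():
        if line.startswith("HugePages_Total:"):
            total = int(line.split()[1])
        elif line.startswith("Hugepagesize:"):
            size_kb = int(line.split()[1])
    return total, size_kb


def hugepages_gib(meminfo_text):
    """Total reserved hugepage memory in GiB."""
    total, size_kb = parse_hugepages(meminfo_text)
    return total * size_kb / (1024 * 1024)


def check_host(meminfo_text, cpu_count, min_gib=16, min_cores=4):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    gib = hugepages_gib(meminfo_text)
    if gib < min_gib:
        problems.append(f"hugepages: {gib:.1f} GiB reserved, need {min_gib}")
    if cpu_count < min_cores:
        problems.append(f"CPU: {cpu_count} cores visible, need {min_cores}")
    return problems


if __name__ == "__main__":
    # /proc/meminfo only exists on Linux; skip the live check elsewhere.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            found = check_host(f.read(), os.cpu_count() or 0)
        print("\n".join(found) or "Host meets the listed minimums.")
```

With the default 2048 kB page size, 8192 reserved pages correspond to 16 GiB.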
Recommended Software Requirements
- Ubuntu 22.04 LTS
- Python 3.9.19
- CUDA 11.7 and cuDNN 8.5 (optional, for GPU training)
Installation Options
You can install OptiReduce in two ways:
1. Using Ansible (Recommended)
For automated deployment across multiple nodes:
git clone https://github.com/OptiReduce/ansible.git
cd ansible
make optireduce-full
2. Manual Installation
Clone and install the core repository on all nodes manually:
git clone https://github.com/OptiReduce/setup.git
cd setup
make install
For detailed installation instructions, see our Installation Guide.
Quick Start
1. Configure Environment
export GLOO_ALGO=Optireduce
export GLOO_SOCKET_IFNAME="ens17" # Your NIC name
export GLOO_DPDK_THREADS_OFFSET=11 # OptiReduce cores (will be 11-15 here)
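If you launch training from Python rather than a shell, the same three variables can be set in-process. This is a minimal sketch under two assumptions: `optireduce_env` is a hypothetical helper (not an OptiReduce API), and the variables are read when the process group is initialized, so they should be set before that call.

```python
import os


def optireduce_env(ifname, threads_offset):
    """Build the Quick Start environment variables as a dict of strings."""
    if not ifname:
        raise ValueError("ifname must be a non-empty NIC name")
    if threads_offset < 0:
        raise ValueError("threads_offset must be a non-negative core index")
    return {
        "GLOO_ALGO": "Optireduce",
        "GLOO_SOCKET_IFNAME": ifname,
        "GLOO_DPDK_THREADS_OFFSET": str(threads_offset),
    }


# Same values as the shell example; replace "ens17" with your NIC name.
os.environ.update(optireduce_env("ens17", 11))
```

Keeping the values in one function makes it harder to forget one of the three settings when moving between nodes with different NIC names.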
2. Set Up Your Code
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
# Initialize the process group with the Gloo backend
dist.init_process_group(backend="gloo")
# Wrap your existing model (a torch.nn.Module) in DDP
model = DDP(model, bucket_cap_mb=1350)  # bucket_cap_mb=1350 is required
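To get a feel for the size that `bucket_cap_mb=1350` implies, the sketch below converts it to a number of gradient elements. It is only arithmetic: it assumes PyTorch's convention that one MB here is 1024 * 1024 bytes and that gradients are float32 (4 bytes). The function names and the example model size are hypothetical, this guide does not state why the value is required, and the way DDP actually assigns parameters to buckets is more involved than a single size comparison.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: MB convention used for bucket_cap_mb


def bucket_capacity_elements(bucket_cap_mb, bytes_per_element=4):
    """Number of gradient elements that fit under the bucket cap."""
    return int(bucket_cap_mb * BYTES_PER_MB) // bytes_per_element


def within_bucket_cap(num_params, bucket_cap_mb=1350, bytes_per_element=4):
    """True if all gradients together are no larger than the bucket cap."""
    return num_params <= bucket_capacity_elements(bucket_cap_mb, bytes_per_element)


example_params = 140_000_000  # hypothetical model size, for illustration only
print(bucket_capacity_elements(1350))     # 353894400 float32 elements
print(within_bucket_cap(example_params))  # True
```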
3. Run Training
We provide ready-made training scripts for popular models (VGG19, BERT, BART, RoBERTa, GPT2) in our benchmark repository. See our benchmarking guide for running these scripts.
What's Next?
Understanding OptiReduce
- Check the Usage Guide for detailed configuration options
- Read the Technical Details to learn about OptiReduce's architecture
Optimizing Performance
- Follow our Benchmarking Guide to evaluate performance
- Learn how to simulate different network environments
- Compare OptiReduce with other communication schemes
Getting Help
If you encounter issues:
- Check the documentation pages linked above
- Review existing GitHub issues
- Open a new issue with a minimal example
Contributing
We welcome contributions to OptiReduce! Whether it's improving documentation, fixing bugs, optimizing performance, or adding new features, your help is appreciated. Please check our Contributing Guide for guidelines on how to get started.