
Installation Guide

Prerequisites

Network Interface Card (NIC) Requirements

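OptiReduce works best with Mellanox ConnectX NICs because they support DPDK flow bifurcation, where the NIC hardware splits incoming traffic between the kernel network stack and a DPDK application. This allows: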

  • Single NIC operation for both PyTorch/Gloo TCP-based communication and OptiReduce DPDK-based communication
  • Efficient packet steering and processing
  • Better performance through hardware offloading

If not using Mellanox ConnectX NICs, you will need:

  1. One NIC for standard TCP-based PyTorch and Gloo communication
  2. A separate DPDK-compatible NIC for OptiReduce
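
To decide which of the two setups applies to a host, one option is to look for a ConnectX device in the PCI listing. The sketch below assumes `lspci` (from pciutils) is installed and that the device description contains the string "ConnectX"; the helper name `has_connectx` is hypothetical and not part of OptiReduce.

```shell
#!/bin/sh
# has_connectx: hypothetical helper, not part of OptiReduce.
# Reads lspci-style text on stdin and succeeds if any line mentions
# a ConnectX device (case-insensitive match; assumed naming).
has_connectx() {
    grep -qi 'connectx'
}

# Feed the helper the real PCI listing, if lspci is available.
if command -v lspci >/dev/null 2>&1 && lspci | has_connectx; then
    echo "ConnectX NIC found: single-NIC operation should be possible"
else
    echo "No ConnectX NIC detected: plan for a separate DPDK-compatible NIC"
fi
```

A match only shows that the hardware is present; whether flow bifurcation actually works also depends on the installed drivers.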

Note

DPDK v20.11 will be installed automatically as part of OptiReduce.

Info

While CUDA and cuDNN are supported for GPU training, they are not required for OptiReduce to work. OptiReduce can also be used with CPU-only training.

Installation Options

There are two ways to install OptiReduce and its dependencies:

Option 1: Ansible Deployment

For automated deployment across multiple nodes:

git clone https://github.com/OptiReduce/ansible.git
cd ansible
make optireduce-full

For detailed instructions on using the Ansible deployment, visit our Ansible documentation.

Option 2: Manual Installation

  1. Install prerequisites:

    • Mellanox OFED drivers (if using Mellanox NICs)
    • Anaconda
    • CUDA and cuDNN (optional, for GPU training)
  2. Install OptiReduce components:

    # Clone the optireduce setup repository
    git clone https://github.com/OptiReduce/setup.git
    cd setup
    
    # Install all components
    make install
    
    # Or install specific components
    make dpdk        # Install DPDK only
    make optireduce  # Setup OptiReduce only
    make hadamard    # Install Hadamard CUDA only
    

Tip

Use make help to see all available installation options.
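
Before running the manual installation, it can help to confirm that the prerequisites from step 1 are on the PATH. The following is a minimal sketch, not part of the OptiReduce setup repository: `check_tool` is a hypothetical helper, and it assumes that `conda`, `ofed_info`, and `nvcc` are the commands provided by Anaconda, Mellanox OFED, and the CUDA toolkit respectively.

```shell
#!/bin/sh
# check_tool: hypothetical helper, not part of OptiReduce.
# Prints "ok <name>" and returns 0 if the command is on PATH,
# otherwise prints "missing <name>" and returns 1.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok $1"
    else
        echo "missing $1"
        return 1
    fi
}

# Tools needed to clone the repository and run the make targets.
check_tool git   || echo "  git is needed to clone the setup repository"
check_tool make  || echo "  make is needed to run the install targets"

# Prerequisites listed in step 1 (command names are assumptions).
check_tool conda     || echo "  Anaconda is listed as a prerequisite"
check_tool ofed_info || echo "  Mellanox OFED is only needed with Mellanox NICs"

# CUDA is optional: CPU-only training works without it.
check_tool nvcc  || echo "  CUDA toolkit not on PATH; GPU training needs it"
```

The script only reports what it finds and never aborts, since the OFED drivers and CUDA are optional depending on the hardware.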

Directory Structure

setup/
├── Makefile          # Build and installation scripts
├── patches/          # Required patches for OptiReduce logic

Next Steps

For detailed instructions on using OptiReduce in your distributed training setup, please refer to our usage guide.

To evaluate OptiReduce's performance and compare different communication schemes in your environment, please refer to our benchmarking guide.

Additional Documentation