Ansible Deployment Guide

Download

Clone the OptiReduce ansible repository:

git clone https://github.com/OptiReduce/ansible.git
cd ansible

Prerequisites

Ansible Installation

# For Ubuntu/Debian
sudo apt update
sudo apt install software-properties-common
sudo apt-add-repository --yes --update ppa:ansible/ansible
sudo apt install ansible

# For RHEL/CentOS
sudo yum install epel-release
sudo yum install ansible

# Verify installation
ansible --version

SSH Setup

  • Ensure SSH access to target machines
  • Configure SSH keys for passwordless authentication
  • Test connection to all target machines
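
The three steps above can be sketched in shell. This is a minimal example, not part of the repository: the function name check_ssh is made up here, and the user "test" and the two addresses are taken from the example inventory later in this guide.

```shell
# Print "ok <host>" or "FAILED <host>" for each host given after the user name.
check_ssh() {
    user="$1"; shift
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password,
        # so success here means key-based login works.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" true </dev/null; then
            echo "ok $host"
        else
            echo "FAILED $host"
        fi
    done
}

# One-time setup, run manually on the control machine:
#   ssh-keygen -t ed25519
#   ssh-copy-id test@192.168.1.101
#   ssh-copy-id test@192.168.1.102
# Then verify every target:
#   check_ssh test 192.168.1.101 192.168.1.102
```

Any host reported as FAILED still needs its key copied or its network path fixed before running the playbook.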

Directory Structure

ansible/
├── ansible.cfg
├── inventory
│   └── hosts
├── group_vars
│   └── all.yml
├── optireduce_deploy.yml
├── Makefile
└── roles/
    ├── cuda/
    ├── mellanox/
    ├── anaconda/
    ├── optireduce/
    └── benchmark/

Configuration

Inventory Setup

Edit inventory/hosts to specify your target machines:

[gpu_nodes]
node1 ansible_host=192.168.1.101 ansible_user=test ansible_become_password=test
node2 ansible_host=192.168.1.102 ansible_user=test ansible_become_password=test
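
Once the inventory is filled in, Ansible's built-in ping module is a quick way to confirm that the file parses and that every node accepts a login. The sketch below wraps that call; the function name ping_nodes is hypothetical, and the defaults simply mirror the file and group names used in this guide.

```shell
# Run Ansible's ping module against one inventory group.
# Arguments (both optional): inventory path, group name.
ping_nodes() {
    ansible -i "${1:-inventory/hosts}" "${2:-gpu_nodes}" -m ping
}

# Usage, from the repository root:
#   ping_nodes
#   ping_nodes inventory/hosts gpu_nodes
```

If you would rather not keep ansible_become_password in the inventory file, ansible-playbook can prompt for it with --ask-become-pass; whether the Makefile targets pass that flag through is not covered here.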

Variable Configuration

Edit group_vars/all.yml to customize versions and settings:

# CUDA settings
cuda_version: "11.7.0-1"
nvidia_version: "560"
cudnn_version: "8.5.0.96-1+cuda11.7"

# Python/Conda settings
python_version: "3.9.19"

# DPDK settings
dpdk_version: "v20.11"

# Other settings...
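
The cuDNN version string embeds the CUDA release it targets (the +cuda11.7 suffix), so changing cuda_version will usually mean changing cudnn_version as well. The helper below is a minimal sketch of that consistency check. The function name is hypothetical, and the rule it encodes is an assumption drawn from the version strings shown here, not something the playbook is known to enforce.

```shell
# Succeed when the "+cudaX.Y" suffix of a cuDNN version matches the
# major.minor part of a CUDA version.
#   $1: CUDA version,  e.g. "11.7.0-1"
#   $2: cuDNN version, e.g. "8.5.0.96-1+cuda11.7"
cudnn_matches_cuda() {
    cuda_mm="$(printf '%s\n' "$1" | cut -d. -f1,2)"   # "11.7.0-1" -> "11.7"
    cudnn_cuda="${2##*+cuda}"                          # "...+cuda11.7" -> "11.7"
    [ "$cuda_mm" = "$cudnn_cuda" ]
}

# Usage:
#   cudnn_matches_cuda "11.7.0-1" "8.5.0.96-1+cuda11.7" && echo "versions agree"
```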

Deployment Options

The deployment can be customized using the provided Makefile:

Full Installation

make optireduce-full

Selective Installation

# Install only CUDA
make cuda-only

# Install only benchmarking tools
make benchmark-only

# Custom installation
make deploy INSTALL_CUDA=true INSTALL_BENCHMARK=true

Check Configuration

make check
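
A common pattern is to run the check before every deployment and stop if it fails. The wrapper below is a sketch of that idea; deploy_checked is a made-up name, and it assumes make check exits with a non-zero status when it finds a problem, which this guide does not confirm.

```shell
# Run "make check" first and continue to the requested target only if it succeeds.
#   $1: Makefile target to run afterwards (defaults to optireduce-full)
deploy_checked() {
    target="${1:-optireduce-full}"
    make check && make "$target"
}

# Usage, from the repository root:
#   deploy_checked             # check, then full installation
#   deploy_checked cuda-only   # check, then CUDA only
```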

Available Components

You can selectively install the following components:

  • CUDA (11.7) and cuDNN (8.5)
  • Mellanox OFED
  • Anaconda with Python 3.9.19
  • DPDK v20.11
  • OptiReduce core
  • Benchmarking tools

Environment Variables

Set any of the following variables to true or false, either in the environment or on the make command line, to choose which components are installed:

INSTALL_CUDA=true/false
INSTALL_MELLANOX=true/false
INSTALL_ANACONDA=true/false
INSTALL_OPTIREDUCE=true/false
INSTALL_BENCHMARK=true/false
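
Typing all five assignments by hand is error-prone, so a small helper can build them from a list of component names. This is an illustrative sketch: install_flags is not part of the repository, and it sets every unlisted component explicitly to false rather than relying on whatever defaults the Makefile uses.

```shell
# Print INSTALL_* assignments: true for each component named in the
# arguments (case-insensitive), false for all the others.
install_flags() {
    flags=""
    for comp in CUDA MELLANOX ANACONDA OPTIREDUCE BENCHMARK; do
        value=false
        for want in "$@"; do
            # Upper-case the requested name before comparing.
            if [ "$(printf '%s' "$want" | tr '[:lower:]' '[:upper:]')" = "$comp" ]; then
                value=true
            fi
        done
        flags="$flags INSTALL_$comp=$value"
    done
    # Drop the leading space before printing.
    printf '%s\n' "${flags# }"
}

# Usage (the command substitution is left unquoted on purpose so that
# each assignment becomes a separate make argument):
#   make deploy $(install_flags cuda benchmark)
```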

Common Issues and Troubleshooting

SSH Connection Issues

  • Verify SSH keys are properly set up
  • Check network connectivity
  • Ensure proper permissions on SSH keys
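
OpenSSH refuses to use a private key that other users can read, so permission problems are worth ruling out early. The sketch below resets the usual modes; the function name is hypothetical, and it assumes keys follow the default id_* naming inside the directory you pass in.

```shell
# Restrict an SSH directory and the key files inside it.
#   $1: SSH directory (defaults to ~/.ssh)
tighten_ssh_perms() {
    dir="${1:-$HOME/.ssh}"
    chmod 700 "$dir"
    for f in "$dir"/id_*; do
        [ -e "$f" ] || continue          # no matching key files
        case "$f" in
            *.pub) chmod 644 "$f" ;;     # public keys may be world-readable
            *)     chmod 600 "$f" ;;     # private keys: owner only
        esac
    done
}

# Usage:
#   tighten_ssh_perms
#   ssh -v test@192.168.1.101 true   # verbose output shows which key is offered
```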

CUDA Installation Failures

  • Verify system compatibility
  • Check for sufficient disk space
  • Ensure proper network connectivity to NVIDIA repositories
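
The disk-space check can be scripted so that it runs before a long installation starts. The helper below is a sketch; has_free_gb is a made-up name, and the threshold you pass is your own estimate, since this guide does not state how much space the CUDA packages need.

```shell
# Succeed when the filesystem holding a path has at least the given
# number of gigabytes available.
#   $1: path to check, e.g. /
#   $2: minimum free space in GB
has_free_gb() {
    path="$1"
    min_gb="$2"
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # column 4 of the second line is the available space.
    avail_kb="$(df -Pk "$path" | awk 'NR == 2 { print $4 }')"
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Usage on a target node (10 GB is a placeholder, not a documented requirement):
#   has_free_gb / 10 || echo "not enough free space on /"
```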

OFED Installation Issues

  • Verify kernel compatibility
  • Check system prerequisites
  • Ensure proper network connectivity
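
Driver packages such as OFED typically build or load kernel modules, so a useful first step is to see which kernel a node is running and whether matching headers are installed. The sketch below assumes the conventional /lib/modules/<release>/build location used by most Linux distributions; the function name and its optional second argument are illustrative only.

```shell
# Succeed when build headers for a kernel release are present.
#   $1: kernel release (defaults to the running kernel)
#   $2: modules root   (defaults to /lib/modules)
kernel_headers_present() {
    release="${1:-$(uname -r)}"
    modules_root="${2:-/lib/modules}"
    [ -e "$modules_root/$release/build" ]
}

# Usage on a target node:
#   uname -r
#   kernel_headers_present || echo "kernel headers for $(uname -r) not found"
```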

Additional Resources