Ansible Deployment Guide
Download
Clone the ansible repository:
git clone https://github.com/OptiReduce/ansible.git
cd ansible
Prerequisites
Ansible Installation
# For Ubuntu/Debian
sudo apt update
sudo apt install software-properties-common
sudo apt-add-repository --yes --update ppa:ansible/ansible
sudo apt install ansible
# For RHEL/CentOS
sudo yum install epel-release
sudo yum install ansible
# Verify installation
ansible --version
SSH Setup
- Ensure SSH access to target machines
- Configure SSH keys for passwordless authentication
- Test connection to all target machines
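The three steps above might look like the following for the sample hosts used later in this guide. This is a minimal sketch, not part of the repository: the key path, the `DRY_RUN` switch, and the `run` and `setup_ssh` helpers are assumptions made for illustration. By default it only prints the commands it would run.

```shell
# Sketch: install one SSH key on every target and test passwordless login.
# KEY, DRY_RUN, run and setup_ssh are illustrative names, not part of the repository.
KEY="${KEY:-$HOME/.ssh/id_ed25519}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = also execute them

run() {
    # Always show the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

setup_ssh() {
    # $1 = remote user, remaining arguments = target addresses
    user="$1"; shift
    # Create a key pair only if none exists at $KEY.
    [ -f "$KEY" ] || run ssh-keygen -t ed25519 -f "$KEY" -N ""
    for host in "$@"; do
        # Install the public key on the target.
        run ssh-copy-id -i "$KEY.pub" "$user@$host"
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        run ssh -o BatchMode=yes -i "$KEY" "$user@$host" true
    done
}

# Example: setup_ssh test 192.168.1.101 192.168.1.102
```

Once key-based login works, `ansible -i inventory/hosts gpu_nodes -m ping` should confirm the same thing from Ansible's side.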
Directory Structure
ansible/
├── ansible.cfg
├── inventory
│   └── hosts
├── group_vars
│   └── all.yml
├── optireduce_deploy.yml
├── Makefile
└── roles/
    ├── cuda/
    ├── mellanox/
    ├── anaconda/
    ├── optireduce/
    └── benchmark/
Configuration
Inventory Setup
Edit inventory/hosts to specify your target machines:
[gpu_nodes]
node1 ansible_host=192.168.1.101 ansible_user=test ansible_become_password=test
node2 ansible_host=192.168.1.102 ansible_user=test ansible_become_password=test
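Before deploying, it can help to confirm which hosts a group actually contains. The helper below is a hypothetical sketch (not part of the repository) that prints each host alias and its `ansible_host` address for one group of an INI-style inventory like the one above. It assumes one host per line and plain `[group]` headers.

```shell
# Sketch: list "alias address" pairs for one group of an INI-style inventory.
# list_group_hosts is a hypothetical helper, not part of the repository.
list_group_hosts() {
    # $1 = group name, $2 = inventory file
    awk -v group="$1" '
        # A section header switches the "inside our group" flag on or off.
        /^\[/ { in_group = ($0 == ("[" group "]")); next }
        # Inside the group: skip blank lines and comments, print the rest.
        in_group && NF && $1 !~ /^[#;]/ {
            addr = $1                      # fall back to the alias itself
            for (i = 2; i <= NF; i++)
                if ($i ~ /^ansible_host=/) addr = substr($i, 14)
            print $1, addr
        }
    ' "$2"
}

# Example: list_group_hosts gpu_nodes inventory/hosts
```

Ansible's own `ansible-inventory -i inventory/hosts --graph` gives a similar overview as Ansible parses it.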
Variable Configuration
Edit group_vars/all.yml to customize versions and settings:
# CUDA settings
cuda_version: "11.7.0-1"
nvidia_version: "560"
cudnn_version: "8.5.0.96-1+cuda11.7"
# Python/Conda settings
python_version: "3.9.19"
# DPDK settings
dpdk_version: "v20.11"
# Other settings...
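To double-check which versions are currently configured before a run, a small lookup can be handy. The helper below is a hypothetical sketch, not part of the repository. It assumes simple top-level `key: "value"` entries like those shown above and does not handle nested YAML or values containing `#`.

```shell
# Sketch: print the value of a top-level "key: value" entry in a YAML file.
# show_var is a hypothetical helper; it handles only simple scalar entries.
show_var() {
    # $1 = variable name, $2 = YAML file
    sed -n "s/^$1:[[:space:]]*//p" "$2" |
        sed 's/[[:space:]]*#.*$//; s/^"\(.*\)"$/\1/'   # drop trailing comment and quotes
}

# Example: show_var cuda_version group_vars/all.yml
```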
Deployment Options
The deployment can be customized using the provided Makefile:
Full Installation
make optireduce-full
Selective Installation
# Install only CUDA
make cuda-only
# Install only benchmarking tools
make benchmark-only
# Custom installation
make deploy INSTALL_CUDA=true INSTALL_BENCHMARK=true
Check Configuration
make check
Available Components
You can selectively install the following components:
- CUDA (11.7) and cuDNN (8.5)
- Mellanox OFED
- Anaconda with Python 3.9.19
- DPDK v20.11
- OptiReduce core
- Benchmarking tools
Environment Variables
The following environment variables can be set to customize the deployment:
INSTALL_CUDA=true/false
INSTALL_MELLANOX=true/false
INSTALL_ANACONDA=true/false
INSTALL_OPTIREDUCE=true/false
INSTALL_BENCHMARK=true/false
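Because each flag accepts only `true` or `false`, a typo such as `INSTALL_CUDA=yes` is easy to miss. The following is a minimal sketch of a pre-flight check. `check_flags` is a hypothetical helper, not part of the repository, and how the Makefile treats unset flags is not specified here, so unset flags are simply let through.

```shell
# Sketch: validate the INSTALL_* flags before handing them to make.
# Flag names come from the list above; check_flags is a hypothetical helper.
check_flags() {
    status=0
    for name in INSTALL_CUDA INSTALL_MELLANOX INSTALL_ANACONDA \
                INSTALL_OPTIREDUCE INSTALL_BENCHMARK; do
        eval "value=\${$name:-}"      # read the variable whose name is in $name
        case "$value" in
            ""|true|false) ;;         # unset flags are left to the Makefile
            *) echo "invalid $name=$value (expected true or false)" >&2
               status=1 ;;
        esac
    done
    return "$status"
}

# Example:
#   export INSTALL_CUDA=true INSTALL_BENCHMARK=true
#   check_flags && make deploy
```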
Common Issues and Troubleshooting
SSH Connection Issues
- Verify SSH keys are properly set up
- Check network connectivity
- Ensure proper permissions on SSH keys
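For the permissions check, OpenSSH refuses private keys that are accessible to other users. The sketch below tests that a key file grants nothing to group or others. `key_perms_ok` is a hypothetical helper, and the key path in the example is an assumption.

```shell
# Sketch: succeed only if a private key grants no permissions to group/others.
# key_perms_ok is a hypothetical helper, not part of the repository.
key_perms_ok() {
    # $1 = path to the private key
    # Characters 5-10 of the ls mode string are the group and other bits.
    perms="$(ls -ld "$1" | cut -c5-10)" || return 2
    if [ "$perms" = "------" ]; then
        return 0
    fi
    echo "$1 is too open; consider: chmod 600 $1" >&2
    return 1
}

# Example: key_perms_ok ~/.ssh/id_ed25519
```

If the permissions look right and the connection still fails, `ssh -v` against one of the targets prints the authentication steps in detail.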
CUDA Installation Failures
- Verify that the GPU, driver, and OS release are compatible with the configured CUDA version
- Check for sufficient disk space
- Ensure proper network connectivity to NVIDIA repositories
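The disk-space check can be scripted. This is a minimal sketch: `has_free_space` is a hypothetical helper, and the path and threshold in the example are placeholders, since this guide does not state how much space the CUDA role needs.

```shell
# Sketch: succeed if the filesystem holding a path has at least N GiB free.
# has_free_space is a hypothetical helper, not part of the repository.
has_free_space() {
    # $1 = path, $2 = required free space in GiB
    # df -Pk prints POSIX-format output in 1 KiB blocks; field 4 is "Available".
    avail_kb="$(df -Pk "$1" | awk 'NR == 2 { print $4 }')"
    [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Example (placeholder path and threshold):
#   has_free_space /usr/local 10 || echo "not enough free space" >&2
```

On the target itself, `lspci | grep -i nvidia` shows whether a GPU is visible on the PCI bus at all.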
OFED Installation Issues
- Verify that the running kernel is supported by the OFED release being installed
- Check system prerequisites
- Ensure proper network connectivity
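One common prerequisite, when an installer has to build kernel modules, is that headers for the running kernel are present. The sketch below checks for the conventional `build` directory under `/lib/modules`. `kernel_headers_present` is a hypothetical helper, and whether the mellanox role actually needs the headers is an assumption.

```shell
# Sketch: succeed if build headers for a kernel release appear to be installed.
# kernel_headers_present is a hypothetical helper, not part of the repository.
kernel_headers_present() {
    # $1 = kernel release (defaults to the running kernel)
    release="${1:-$(uname -r)}"
    [ -d "/lib/modules/$release/build" ]
}

# Example:
#   kernel_headers_present || echo "headers for $(uname -r) not found" >&2
```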