Publications

Research Paper

OptiReduce: Resilient and Tail-Optimal AllReduce for Distributed Deep Learning in the Cloud
Presented at USENIX NSDI 2025

Authors

  • Ertza Warraich (Purdue University)
  • Omer Shabtai (NVIDIA)
  • Khalid Manaa (NVIDIA)
  • Shay Vargaftik (VMware Research)
  • Yonatan Piasetzky (NVIDIA)
  • Matty Kadosh (NVIDIA)
  • Lalith Suresh (Feldera)
  • Muhammad Shahbaz (University of Michigan)

Citation

Please cite this paper when using OptiReduce:

@inproceedings{warraich2025optireduce,
    title={OptiReduce: Resilient and Tail-Optimal AllReduce for Distributed Deep Learning in the Cloud},
    author={Warraich, Ertza and Shabtai, Omer and Manaa, Khalid and Vargaftik, Shay and Piasetzky, Yonatan and Kadosh, Matty and Suresh, Lalith and Shahbaz, Muhammad},
    booktitle={22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25)},
    year={2025},
    publisher={USENIX Association}
}

Technical Documentation

We maintain detailed technical documentation for OptiReduce.

Contact

For research-related queries or collaborations:

  • Email: ewarraic@purdue.edu