Publications
Research Paper
OptiReduce: Resilient and Tail-Optimal AllReduce for Distributed Deep Learning in the Cloud
Presented at USENIX NSDI 2025
Authors
- Ertza Warraich (Purdue University)
- Omer Shabtai (NVIDIA)
- Khalid Manaa (NVIDIA)
- Shay Vargaftik (VMware Research)
- Yonatan Piasetzky (NVIDIA)
- Matty Kadosh (NVIDIA)
- Lalith Suresh (Feldera)
- Muhammad Shahbaz (University of Michigan)
Citation
Please cite this paper when using OptiReduce:
@inproceedings{warraich2025optireduce,
  title={OptiReduce: Resilient and Tail-Optimal AllReduce for Distributed Deep Learning in the Cloud},
  author={Warraich, Ertza and Shabtai, Omer and Manaa, Khalid and Vargaftik, Shay and Piasetzky, Yonatan and Kadosh, Matty and Suresh, Lalith and Shahbaz, Muhammad},
  booktitle={22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25)},
  year={2025},
  publisher={USENIX Association}
}
Technical Documentation
We maintain detailed technical documentation for OptiReduce.
Contact
For research-related queries or collaborations:
- Email: ewarraic@purdue.edu