Ammar Ahmad Awan
Microsoft
Verified email at osu.edu
Title
Cited by
Year
S-caffe: Co-designing mpi runtimes and caffe for scalable deep learning on modern gpu clusters
AA Awan, K Hamidouche, JM Hashmi, DK Panda
ACM PPoPP '17 52 (8), 193-205, 2017
Cited by: 109; Year: 2017
Privacy-aware searching with oblivious term matching for cloud storage
Z Pervez, AA Awan, AM Khattak, S Lee, EN Huh
The Journal of Supercomputing 63 (2), 538-560, 2013
Cited by: 40; Year: 2013
Efficient large message broadcast using NCCL and CUDA-aware MPI for deep learning
AA Awan, K Hamidouche, A Venkatesh, DK Panda
Proceedings of the 23rd European MPI Users' Group Meeting, 15-22, 2016
Cited by: 32; Year: 2016
An in-depth performance characterization of CPU-and GPU-based DNN training on modern architectures
AA Awan, H Subramoni, DK Panda
Proceedings of the Machine Learning on HPC Environments, 1-8, 2017
Cited by: 29; Year: 2017
Optimized broadcast for deep learning workloads on dense-GPU InfiniBand clusters: MPI or NCCL?
AA Awan, CH Chu, H Subramoni, DK Panda
Proceedings of the 25th European MPI Users' Group Meeting, 1-9, 2018
Cited by: 27; Year: 2018
Scalable distributed dnn training using tensorflow and cuda-aware mpi: Characterization, designs, and performance evaluation
AA Awan, J Bédorf, CH Chu, H Subramoni, DK Panda
2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2019
Cited by: 16; Year: 2019
CUDA kernel based collective reduction operations on large-scale GPU clusters
CH Chu, K Hamidouche, A Venkatesh, AA Awan, DK Panda
2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2016
Cited by: 16; Year: 2016
OC-DNN: Exploiting advanced unified memory capabilities in CUDA 9 and volta GPUs for out-of-core DNN training
AA Awan, CH Chu, H Subramoni, X Lu, DK Panda
2018 IEEE 25th International Conference on High Performance Computing (HiPC …, 2018
Cited by: 15; Year: 2018
Efficient and scalable multi-source streaming broadcast on gpu clusters for deep learning
CH Chu, X Lu, AA Awan, H Subramoni, J Hashmi, B Elton, DK Panda
2017 46th International Conference on Parallel Processing (ICPP), 161-170, 2017
Cited by: 13; Year: 2017
Intercloud message exchange middleware
MB Amin, WA Khan, AA Awan, S Lee
Proceedings of the 6th International Conference on Ubiquitous Information …, 2012
Cited by: 12; Year: 2012
Exploiting GPUDirect RDMA in designing high performance OpenSHMEM for NVIDIA GPU clusters
K Hamidouche, A Venkatesh, AA Awan, H Subramoni, CH Chu, ...
2015 IEEE International Conference on Cluster Computing, 78-87, 2015
Cited by: 11; Year: 2015
Designing non-blocking personalized collectives with near perfect overlap for rdma-enabled clusters
H Subramoni, AA Awan, K Hamidouche, D Pekurovsky, A Venkatesh, ...
International Conference on High Performance Computing, 434-453, 2015
Cited by: 9; Year: 2015
Exploiting hardware multicast and GPUDirect RDMA for efficient broadcast
CH Chu, X Lu, AA Awan, H Subramoni, B Elton, DK Panda
IEEE Transactions on Parallel and Distributed Systems 30 (3), 575-588, 2018
Cited by: 7; Year: 2018
CUDA-Aware OpenSHMEM: Extensions and Designs for High Performance OpenSHMEM on GPU Clusters
K Hamidouche, A Venkatesh, AA Awan, H Subramoni, CH Chu, ...
Parallel Computing 58, 27-36, 2016
Cited by: 7; Year: 2016
HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training using TensorFlow
AA Awan, A Jain, Q Anthony, H Subramoni, DK Panda
arXiv preprint arXiv:1911.05146, 2019
Cited by: 5; Year: 2019
Performance characterization of dnn training using tensorflow and pytorch on modern clusters
A Jain, AA Awan, Q Anthony, H Subramoni, DK Panda
2019 IEEE International Conference on Cluster Computing (CLUSTER), 1-11, 2019
Cited by: 5; Year: 2019
Optimized large-message broadcast for deep learning workloads: MPI, MPI+ NCCL, or NCCL2?
AA Awan, KV Manian, CH Chu, H Subramoni, DK Panda
Parallel Computing 85, 141-152, 2019
Cited by: 4; Year: 2019
Nv-group: link-efficient reduction for distributed deep learning on modern dense gpu systems
CH Chu, P Kousha, AA Awan, KS Khorassani, H Subramoni, DK Panda
Proceedings of the 34th ACM International Conference on Supercomputing, 1-12, 2020
Cited by: 3; Year: 2020
Communication Profiling and Characterization of Deep-Learning Workloads on Clusters With High-Performance Interconnects
AA Awan, A Jain, CH Chu, H Subramoni, DK Panda
IEEE Micro 40 (1), 35-43, 2019
Cited by: 3; Year: 2019
On-demand connection management for OpenSHMEM and OpenSHMEM+ MPI
S Chakraborty, H Subramoni, J Perkins, AA Awan, DK Panda
2015 IEEE International Parallel and Distributed Processing Symposium …, 2015
Cited by: 3; Year: 2015