S-Caffe: Co-designing MPI runtimes and Caffe for scalable deep learning on modern GPU clusters AA Awan, K Hamidouche, JM Hashmi, DK Panda Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of …, 2017 | 119 | 2017 |
Efficient inter-node MPI communication using GPUDirect RDMA for InfiniBand clusters with NVIDIA GPUs S Potluri, K Hamidouche, A Venkatesh, D Bureddy, DK Panda 2013 42nd International Conference on Parallel Processing, 80-89, 2013 | 118 | 2013 |
MVAPICH-PRISM: A proxy-based communication framework using InfiniBand and SCIF for Intel MIC clusters S Potluri, D Bureddy, K Hamidouche, A Venkatesh, K Kandalla, ... Proceedings of the International Conference on High Performance Computing …, 2013 | 50 | 2013 |
A case for application-oblivious energy-efficient MPI runtime A Venkatesh, A Vishnu, K Hamidouche, N Tallent, D Panda, D Kerbyson, ... SC'15: Proceedings of the International Conference for High Performance …, 2015 | 35 | 2015 |
Efficient large message broadcast using NCCL and CUDA-aware MPI for deep learning AA Awan, K Hamidouche, A Venkatesh, DK Panda Proceedings of the 23rd European MPI Users' Group Meeting, 15-22, 2016 | 33 | 2016 |
Designing efficient small message transfer mechanism for inter-node MPI communication on InfiniBand GPU clusters R Shi, S Potluri, K Hamidouche, J Perkins, M Li, D Rossetti, DK Panda 2014 21st International Conference on High Performance Computing (HiPC), 1-10, 2014 | 32 | 2014 |
Designing optimized MPI broadcast and allreduce for Many Integrated Core (MIC) InfiniBand clusters K Kandalla, A Venkatesh, K Hamidouche, S Potluri, D Bureddy, DK Panda 2013 IEEE 21st Annual Symposium on High-Performance Interconnects, 63-70, 2013 | 30 | 2013 |
Designing MPI library with dynamic connected transport (DCT) of InfiniBand: early experiences H Subramoni, K Hamidouche, A Venkatesh, S Chakraborty, DK Panda International Supercomputing Conference, 278-295, 2014 | 29 | 2014 |
Parallel Smith-Waterman comparison on multicore and manycore computing platforms with BSP++ K Hamidouche, FM Mendonca, J Falcou, ACMA de Melo, D Etiemble International Journal of Parallel Programming 41 (1), 111-136, 2013 | 26 | 2013 |
Power-Check: An energy-efficient checkpointing framework for HPC clusters RR Chandrasekar, A Venkatesh, K Hamidouche, DK Panda 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2015 | 24 | 2015 |
HAND: A hybrid approach to accelerate non-contiguous data movement using MPI datatypes on GPU clusters R Shi, X Lu, S Potluri, K Hamidouche, J Zhang, DK Panda 2014 43rd International Conference on Parallel Processing, 221-230, 2014 | 23 | 2014 |
A scalable and portable approach to accelerate hybrid HPL on heterogeneous CPU-GPU clusters R Shi, S Potluri, K Hamidouche, X Lu, K Tomko, DK Panda 2013 IEEE International Conference on Cluster Computing (CLUSTER), 1-8, 2013 | 21 | 2013 |
A framework for an automatic hybrid MPI+OpenMP code generation K Hamidouche, J Falcou, D Etiemble SpringSim (HPC), 48-55, 2011 | 20 | 2011 |
Hybrid bulk synchronous parallelism library for clustered SMP architectures K Hamidouche, J Falcou, D Etiemble Proceedings of the fourth international workshop on High-level parallel …, 2010 | 20 | 2010 |
Designing scalable out-of-core sorting with hybrid MPI+PGAS programming models J Jose, S Potluri, H Subramoni, X Lu, K Hamidouche, K Schulz, H Sundar, ... Proceedings of the 8th International Conference on Partitioned Global …, 2014 | 19 | 2014 |
Scalable Graph500 design with MPI-3 RMA M Li, X Lu, S Potluri, K Hamidouche, J Jose, K Tomko, DK Panda 2014 IEEE International Conference on Cluster Computing (CLUSTER), 230-238, 2014 | 18 | 2014 |
MIC-RO: enabling efficient remote offload on heterogeneous many integrated core (MIC) clusters with InfiniBand K Hamidouche, S Potluri, H Subramoni, K Kandalla, DK Panda Proceedings of the 27th international ACM conference on International …, 2013 | 17 | 2013 |
CUDA kernel based collective reduction operations on large-scale GPU clusters CH Chu, K Hamidouche, A Venkatesh, AA Awan, DK Panda 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2016 | 16 | 2016 |
MVAPICH2-MIC: A high performance MPI library for Xeon Phi clusters with InfiniBand S Potluri, K Hamidouche, D Bureddy, DK Panda 2013 Extreme Scaling Workshop (XSW 2013), 25-32, 2013 | 16 | 2013 |
Three high performance architectures in the parallel APMC boat K Hamidouche, A Borghi, P Esterie, J Falcou, S Peyronnet 2010 Ninth International Workshop on Parallel and Distributed Methods in …, 2010 | 16 | 2010 |