Personal Site

Projects:

Open MPI

  • Kernel-assisted Intra-node Collective component: KNEM Collective
  • Kernel-assisted Hierarchical Collective component: HierKNEM Collective
  • Network-specific support for Open MPI's point-to-point communication: ELAN BTL for Quadrics networks, SICORTEX BTL, and SM/KNEM BTL (a kernel-assisted inter-process communication approach)

Cell Messaging Layer (CML)

  • Development of non-blocking communication libraries for the Cell Messaging Layer, an MPI-style lightweight communication library for LANL's supercomputer `Roadrunner'.

FT-MPI

  • FT-MPI has been developed within the framework of the HARNESS project. The goal of FT-MPI is to give the end user a communication library with an MPI API that benefits from the fault tolerance of the HARNESS system. To that end, FT-MPI implements the whole MPI-1.2 specification and some parts of the MPI-2 document, and extends some of the semantics of MPI to give the application the possibility to recover from failed processes. FT-MPI survives the crash of n-1 processes in an n-process job and, if required, can respawn them. However, it is still the responsibility of the application to recover the data structures and the data on the crashed processes.

DAGuE

  • DAGuE aims to enable scientific computing on large-scale distributed environments featuring many cores, accelerators, and high-speed networks. The framework includes libraries, a runtime system, and development tools to help application developers tackle the difficult task of porting their applications to highly heterogeneous and diverse environments. Current and future computing environments leverage highly parallel and heterogeneous hardware systems. Taking advantage of the hardware parallelism exposed by these environments requires highly technical approaches that combine multiple divergent programming models. This daunting task is critical to exposing the hardware parallelism to the application, but because of the complexity involved, it remains a solution that only a few programmers can reasonably approach. This project departs from the mainstream sequential, multi-programming model by proposing a data-flow approach. The application describes a set of data dependencies between tasks, and the DAGuE runtime manages the data transfers and copies needed to complete the execution of the application. It seamlessly integrates accelerators into the execution environment, allowing applications to portably maximize their efficiency on any heterogeneous distributed computing environment.
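
The data-flow idea can be illustrated with a small, single-process sketch. This is not DAGuE's API or its task-description format. The names `run_dataflow`, `tasks`, and `deps` are hypothetical, chosen only for illustration, and the sketch ignores distribution, accelerators, and data movement entirely. It only shows how an application can declare dependencies between tasks and leave the execution order to a runtime.

```python
from collections import deque


def run_dataflow(tasks, deps):
    """Run tasks in an order that respects their data dependencies.

    tasks: dict mapping a task name to a function that takes a dict of
           {producer_name: produced_value} and returns the task's output.
    deps:  dict mapping a task name to the list of tasks whose output
           it consumes. (Both structures are hypothetical, for illustration.)
    """
    # Count the unmet inputs of each task and record who consumes each output.
    pending = {name: len(deps.get(name, [])) for name in tasks}
    consumers = {name: [] for name in tasks}
    for name, producers in deps.items():
        for producer in producers:
            consumers[producer].append(name)

    # Tasks with no inputs can start immediately.
    ready = deque(name for name in tasks if pending[name] == 0)
    results, order = {}, []

    while ready:
        name = ready.popleft()
        # Hand the task the outputs of the tasks it depends on.
        inputs = {p: results[p] for p in deps.get(name, [])}
        results[name] = tasks[name](inputs)
        order.append(name)
        # Completing a task may release the tasks waiting on its output.
        for consumer in consumers[name]:
            pending[consumer] -= 1
            if pending[consumer] == 0:
                ready.append(consumer)

    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return results, order


# The application only states what each task needs, not when it runs.
tasks = {
    "load_a": lambda inputs: 2,
    "load_b": lambda inputs: 3,
    "multiply": lambda inputs: inputs["load_a"] * inputs["load_b"],
    "report": lambda inputs: f"product = {inputs['multiply']}",
}
deps = {"multiply": ["load_a", "load_b"], "report": ["multiply"]}

results, order = run_dataflow(tasks, deps)
print(order)
print(results["report"])
```

In this toy version the scheduler runs one task at a time. A real runtime would be free to run any ready tasks concurrently, since tasks in the ready queue have no unmet dependencies on each other.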