The Muenster Skeleton Library

  • The Münster Skeleton Library

    The Muenster Skeleton Library (Muesli) is a C++ programming library that enables hassle-free programming of heterogeneous clusters equipped with multi-core CPUs as well as many-core GPUs and Xeon Phi coprocessors by implementing the concept of algorithmic skeletons, i.e. predefined patterns of parallel computation (such as map, fold, or farm) that the user instantiates with problem-specific functions. With Muesli, the low-level details of parallel programming are encapsulated inside the library, which raises parallel programming to a higher level of abstraction. Users do not need to deal with MPI, OpenMP, and/or CUDA directly, but can implement parallel programs as if they were sequential. In essence, Muesli makes parallel programming easier, safer, and less error-prone.
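    To illustrate the skeleton idea, here is a minimal sketch of a map skeleton written with plain C++11 threads. The function `parallel_map_in_place` and its `num_threads` parameter are hypothetical and not part of Muesli's API; Muesli itself relies on MPI, OpenMP, and CUDA internally. The point is the division of labor: the skeleton partitions the data and manages the threads, while the user supplies only a sequential per-element function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical "map" skeleton: the caller supplies only the per-element user
// function f; partitioning the data and managing threads stay hidden inside.
// Illustrates the skeleton concept only; this is not Muesli's implementation.
template <typename T, typename F>
void parallel_map_in_place(std::vector<T>& data, F f, unsigned num_threads = 4) {
  assert(num_threads > 0);
  const std::size_t n = data.size();
  const std::size_t chunk = (n + num_threads - 1) / num_threads;  // ceiling division
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    const std::size_t begin = t * chunk;
    const std::size_t end = std::min(n, begin + chunk);
    if (begin >= end) break;  // fewer elements than threads
    // Each thread updates its own contiguous block, so no locking is needed.
    workers.emplace_back([&data, f, begin, end] {
      for (std::size_t i = begin; i < end; ++i) data[i] = f(data[i]);
    });
  }
  for (auto& w : workers) w.join();  // wait until every block is done
}
```

    For example, `parallel_map_in_place(v, [](int x) { return x * x; });` squares every element of `v` without the caller ever touching a thread.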
  • Main Features

    • Three execution configurations for CPU-, GPU-, or Xeon Phi-based heterogeneous clusters: from a single program source, separate binaries can be built for heterogeneous clusters based on multi-core CPUs, many-core GPUs, or Xeon Phi coprocessors.
    • Parallel containers in the form of distributed data structures (1D array and 2D matrix) abstract from the memory hierarchy of heterogeneous clusters and establish coarse-grained parallelization. They provide a flexible data (re)distribution mechanism, automatic memory management, and implicit (lazy) data transfer between different memory areas.
    • The data parallel skeletons map, zip, fold, and mapStencil, together with their variants (InPlace, IndexInPlace), are implemented as member functions of the distributed data structures and establish fine-grained parallelization.
    • A task parallel farm skeleton can be used for simultaneous CPU+GPU execution. A dynamic load balancing mechanism ensures a reasonable workload distribution between the different execution units.
    • Flexible and convenient mechanisms for implementing and providing the skeleton user functions, covering both functional approaches (C++11 lambdas) and object-oriented approaches (C++ functors). As a key feature, Muesli functors define an interface for providing additional arguments to the user function.
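    The dynamic load balancing of a farm can be pictured with the following minimal sketch, which assumes plain C++11 threads as stand-ins for the CPU and GPU execution units. The `farm` function template is hypothetical and is not Muesli's farm skeleton: each worker repeatedly claims the next unprocessed task from a shared atomic counter, so a faster worker automatically ends up processing more tasks. This is one common way to balance load dynamically; Muesli's actual mechanism may differ.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical farm-style sketch with dynamic load balancing (not Muesli's API).
// Workers pull task indices from a shared atomic counter instead of receiving
// a fixed share up front, so faster workers simply take more tasks.
template <typename In, typename Out, typename F>
std::vector<Out> farm(const std::vector<In>& tasks, F worker, unsigned num_workers) {
  assert(num_workers > 0);
  std::vector<Out> results(tasks.size());
  std::atomic<std::size_t> next{0};  // index of the next unclaimed task
  std::vector<std::thread> pool;
  for (unsigned w = 0; w < num_workers; ++w) {
    pool.emplace_back([&] {
      // fetch_add hands out every index to exactly one worker
      for (std::size_t i = next.fetch_add(1); i < tasks.size(); i = next.fetch_add(1)) {
        results[i] = worker(tasks[i]);  // distinct indices, so no data race on results
      }
    });
  }
  for (auto& t : pool) t.join();
  return results;  // results keep the order of the input tasks
}
```

    A call such as `farm<int, int>(tasks, [](int x) { return 2 * x; }, 3)` returns the doubled inputs in their original order, regardless of which worker handled which task.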
  • Code Example

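    The following code example computes the Frobenius norm of a matrix A, i.e. the square root of the sum of the squares of all its elements, ||A||_F = sqrt(Σ_i Σ_j a_ij²). It first squares every element with mapInPlace and then sums the results with fold.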

     #include <cmath>     // std::sqrt
     #include "muesli.h"
     #include "dmatrix.h"

     using namespace msl;

     typedef float T;     // element type of the matrix

     int main(int argc, char** argv) {
       initSkeletons(argc, argv);  // initialize Muesli

       // create distributed matrix; randomFloat(row, col) is not defined in this excerpt
       auto init = [] (int row, int col) {return randomFloat(row, col);};
       DMatrix<T> A(8, 8, Muesli::num_total_procs, 1, init, Distribution::DIST);

       // create user functions
       auto square = [] MSL_GPUFUNC (T a) {return a*a;};
       auto sum = [] MSL_GPUFUNC (T a, T b) {return a+b;};

       // apply skeletons: square every element in place, then sum all elements
       A.mapInPlace(square);
       T f_norm = A.fold(sum);
       printv("||A||_F = %f\n", std::sqrt(f_norm));

       terminateSkeletons();  // terminate Muesli
       return 0;
     }
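    The value printed by the example can be cross-checked against a sequential sketch in standard C++17. The function `frobenius_norm` is hypothetical and assumes the matrix elements are stored in a flat `std::vector<double>`; `std::transform_reduce` fuses the map step (squaring) and the fold step (summation) into a single pass.

```cpp
#include <cmath>
#include <functional>
#include <numeric>
#include <vector>

// Sequential reference (hypothetical helper, not part of Muesli): squares every
// element (map) and sums the squares (fold), then takes the square root.
double frobenius_norm(const std::vector<double>& a) {
  double sum_of_squares = std::transform_reduce(
      a.begin(), a.end(), 0.0,
      std::plus<>(),                    // fold: combine partial results
      [](double x) { return x * x; });  // map: square each element
  return std::sqrt(sum_of_squares);
}
```

    Like a parallel fold, `std::transform_reduce` may regroup and reorder the additions, so its combining function should be associative and commutative, and floating-point results may differ slightly from a strict left-to-right summation.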

  • Downloads

    The most up-to-date version of Muesli is v3.0, which provides the features listed above.

    There are also older versions of Muesli that provide a slightly different feature set (see below).

  • Old Versions

    • Download Muesli 2.3
      Main Features:
      • Algorithmic skeletons for multi-core clusters (MPI + OpenMP)
      • Parallel containers: Distributed Array, Matrix, Sparse Matrix
      • Data parallel skeletons: map, zip, fold, scan, and variants (InPlace, IndexInPlace)
      • Task parallel skeletons: Pipe, Farm, Filter, Branch&Bound, Divide&Conquer
      • Currying of user functions
    • Download Muesli 1.0
      Main Features:
      • Algorithmic skeletons for (multi-core) clusters (MPI)
      • Parallel containers: Distributed Array, Matrix
      • Data parallel skeletons: map, zip, fold, scan
      • Task parallel skeletons: Pipe, Farm, Filter, Loop
      • Currying of user functions
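    Both older versions list currying of user functions. The idea can be sketched with C++14 generic lambdas, assuming a skeleton that accepts only unary user functions: a two-argument function is curried, its first argument is fixed in advance, and the resulting one-argument function is passed to the skeleton. The names `curry`, `scale`, and `seq_map` are hypothetical; this is not Muesli's currying mechanism, and `seq_map` is only a sequential stand-in for a map skeleton.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical currying helper: turns a binary function f(x, y)
// into a chain of unary functions, curry(f)(x)(y) == f(x, y).
auto curry = [](auto f) {
  return [f](auto x) {
    return [f, x](auto y) { return f(x, y); };
  };
};

// Example user function with an "extra" parameter (factor).
int scale(int factor, int value) { return factor * value; }

// Sequential stand-in for a map skeleton that accepts only unary functions.
template <typename T, typename F>
std::vector<T> seq_map(const std::vector<T>& in, F f) {
  std::vector<T> out(in.size());
  std::transform(in.begin(), in.end(), out.begin(), f);
  return out;
}
```

    Here `curry(scale)(3)` yields a function that triples its argument, so mapping it over `{1, 2, 3}` gives `{3, 6, 9}`.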
  • Publications

    2016

    • Steffen Ernsting and Herbert Kuchen. Data Parallel Algorithmic Skeletons with Accelerator Support. International Journal of Parallel Programming, pages 1–17, 2016. Available as 'Online First': doi: 10.1007/s10766-016-0416-7

    2015

    • Steffen Ernsting and Herbert Kuchen. Java Implementation of Data Parallel Skeletons on GPUs. In Parallel Computing: On the Road to Exascale, Proceedings of the International Conference on Parallel Computing, ParCo 2015, 1–4 September 2015, Edinburgh, Scotland, UK, pages 155–164, 2015.

    2014

    • Steffen Ernsting and Herbert Kuchen. A Scalable Farm Skeleton for Hybrid Parallel and Distributed Programming. International Journal of Parallel Programming, 42(6):968–987, 2014.

    2013

    • Steffen Ernsting and Herbert Kuchen. A Scalable Farm Skeleton for Heterogeneous Parallel Programming. In Parallel Computing: Accelerating Computational Science and Engineering (CSE), Proceedings of the International Conference on Parallel Computing, ParCo 2013, 10–13 September 2013, Garching (near Munich), Germany, pages 72–81, 2013.

    2012

    • Steffen Ernsting and Herbert Kuchen. Algorithmic skeletons for multi-core, multi-GPU systems and clusters. IJHPCN, 7(2):129–138, 2012.
    • Steffen Ernsting and Herbert Kuchen. Data Parallel Skeletons in Java. In Proceedings of the International Conference on Computational Science, ICCS 2012, Omaha, Nebraska, USA, 4–6 June, 2012, pages 1817–1826, 2012.

    2011

    • Steffen Ernsting and Herbert Kuchen. Data Parallel Skeletons for GPU Clusters and Multi-GPU Systems. In Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August – 3 September 2011, Ghent, Belgium, pages 509–518, 2011.

    2010

    • Philipp Ciechanowicz and Herbert Kuchen: Enhancing Muesli's Data Parallel Skeletons for Multi-Core Computer Architectures. In: Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications (HPCC). Melbourne, Victoria, Australia, pp. 108-113, DOI 10.1109/HPCC.2010.23.

    2009

    • Philipp Ciechanowicz, Michael Poldner, Herbert Kuchen: The Münster Skeleton Library Muesli - A Comprehensive Overview. ERCIS Working Paper No. 7, 2009.

    2008

    • Philipp Ciechanowicz: Algorithmic Skeletons for General Sparse Matrices on Multi-Core Processors. In: Proceedings of The 20th IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS). Orlando, Florida, USA, 2008, pp. 188-197.
    • Philipp Ciechanowicz, Stephan Duglosz, Herbert Kuchen, Ulrich Müller-Funk: Exploiting Training Example Parallelism with a Batch Variant of the ART 2 Classification Algorithm. In: Proceedings of The IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN) as part of The 26th IASTED International Multi-Conference on Applied Informatics. Innsbruck, Austria, 2008, pp. 195-201.
    • Michael Poldner, Herbert Kuchen: Task Parallel Skeletons for Divide and Conquer. Proceedings of Workshop of the Working Group Programming Languages and Computing Concepts of the German Computer Science Association GI, Bad Honnef, 2008.
    • Michael Poldner, Herbert Kuchen: Optimizing Skeletal Stream Processing for Divide and Conquer. Proceedings of the 3rd International Conference on Software and Data Technologies (ICSOFT), pages 181-189, INSTICC PRESS, 2008.
    • Michael Poldner, Herbert Kuchen: Algorithmic Skeletons for Branch and Bound. ICSOFT 2006, CCIS 10, pages 204–219, Springer, 2008.
    • Michael Poldner, Herbert Kuchen: On Implementing the Farm Skeleton. Parallel Processing Letters, Vol. 18, No. 1, pages 117-131, March 2008.
    • Michael Poldner, Herbert Kuchen: Skeletons for Divide and Conquer Algorithms. Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN), Innsbruck, Austria, February 2008, IASTED/ACTA Press, 2008.

    2006

    • Michael Poldner, Herbert Kuchen: Algorithmic Skeletons for Branch & Bound. Proceedings of 1st International Conference on Software and Data Technology (ICSOFT), Vol. 1, pages 291-300, Setubal, Portugal, 2006.
    • Michael Poldner, Herbert Kuchen: Scalable Farms. Proceedings of the International Conference ParCo 2005, NIC Series, Vol. 33, pages 795-802, 2006.

    2005

    • Michael Poldner, Herbert Kuchen: On Implementing the Farm Skeleton. Proceedings of the 3rd International Workshop on High-level Parallel Programming and Applications (HLPP), Warwick, 2005.

    2002

    • Herbert Kuchen. A Skeleton Library. In Proceedings of the 8th International Euro-Par Conference on Parallel Processing, Euro-Par’02, pages 620–629, London, UK, 2002. Springer-Verlag. ISBN 3-540-44049-6.
    • H. Kuchen and J. Striegnitz. Higher-Order Functions and Partial Applications for a C++ Skeleton Library. In Proceedings of the 2002 joint ACM-ISCOPE Conference on Java Grande, pages 122–130. ACM, 2002.
    • Herbert Kuchen and Murray Cole. The Integration of Task and Data Parallel Skeletons. Parallel Processing Letters, 12(02):141–155, 2002.
  • Licensing

    Muesli is available under the MIT license. For more information please refer to the LICENSE.txt file. 
  • Contact