Indice

Class work 1

Instructions to use the remote machine

Once you have implemented on your own laptop the program, in order to test it on andromeda (the 8 core, 16 contexts machine remotely available, do the following steps:

  1. ssh spm@andromeda.di.unipi.it
  2. create a subdirectory with your name, e.g. mkdir marcodanelutto
  3. exit
  4. scp localfile.cpp spm@andromeda.di.unipi.it:marcodanelutto/
  5. ssh spm@andromeda.di.unipi.it
  6. cd marcodanelutto
  7. then compile with gcc/g++ and run as you did on your own laptop

Pthread routines of interests

More grain in a computation

To increase the time spent in the f(x) computed by the pipeline stage or by the map workers, use a loop such as

dummyload.c
for(int i=0; i<LARGEVALUE; i++) 
   x = sin(x); 

For large values of iterations (millions) it is easy to get a dely ranging in fractions of seconds.

Sample pipeline

Using C (or C++) and pthread (only), implement a three stage pipeline where:

Each stage should be implemented as a concurrent activity (a pthread).

The stream should be implemented in such a way a special tag “EOS” is propagate when no more stream items are available. Concurrent activities receiving (or generating, in case of the first stage) the EOS should propagate (if the case) the EOS and then terminate.

The communication channels implementing the stream should be properly implemented using any of the standard data structures available in C/C++ or in the standard libraries.

Proper “timing” code should be used to print the amount of time spent computing the whole pipeline.

Sample code

Sample map

Using a code similar to the one used to implement the farm (the one discussed in class lessons), implement a map where a floating point vector of N items is processed by *NW* workers to obtain the vector with each item being the square of its original value. As for the other exercise, you should implement the map using (only) pthreads. Once it works, modify the code in such a way a stream of vectors is processed, rather than a single one.

Class work 2

Crivello di Eratostene

Set up a pipeline filtering a stream of integers from 2 to MAX, such that:

       2
[GEN] ---> [CHK( )] ---> [CHK( )] ---> [CHK( )] ---> [DISCARD]
       3     
[GEN] ---> [CHK(2)] ---> [CHK( )] ---> [CHK( )] ---> [DISCARD]
       4       3
[GEN] ---> [CHK(2)] ---> [CHK( )] ---> [CHK( )] ---> [DISCARD]
       5       (4)   3                                           4 discarded
[GEN] ---> [CHK(2)] ---> [CHK( )] ---> [CHK( )] ---> [DISCARD]
       6       5             
[GEN] ---> [CHK(2)] ---> [CHK(3)] ---> [CHK( )] ---> [DISCARD]
       7       (6)   5                                           6 discarded
[GEN] ---> [CHK(2)] ---> [CHK(3)] ---> [CHK( )] ---> [DISCARD]
       8       7                   5             
[GEN] ---> [CHK(2)] ---> [CHK(3)] ---> [CHK(5)] ---> [DISCARD]

Adding stages to a pipeline

A pipeline may be created and stages may be added by invoking the method add_stage, e.g. as in

myPipe.add_stage(new Stage(...))

Prime numbers in an interval

Consider the possibility to list all the prime numbers in an interval [MIn, MAX] by:

This implies using a completely different parallel pattern with respect to the previous exercise, of course.

Sample solutions

Sample solutions for the prime number examples here

Class work 3

Matrix multiplication

Implement a C++ program computing the standard ijk matrix multiplication algorithm between two matrices A and B of size MxP and PxN, respectively.

Then parallelize the code using the FastFlow ParallelFor and ParalleForReduce patterns. Implement three versions:

The seq ijk matrix multiplication code looks as follows:

for(size_t i=0; i<M; i++) 
  for(size_t j=0; j<P; j++) {
    C[i][j] = 0.0;
    for(size_t k=0; k<P; k++) 
       C[i][j] += A[i][k]*B[k][j];
  }

Sample solutions

See the matmul example in the FastFlow tutorial source code.

Class work 4

Macro Data-Flow matrix multiplication

Implement by using the FastFlow MDF pattern (ff_mdf), the matrix multiplication algorithm starting from the following sequential code (matrices are of size NxN, double precision elements) :

for(size_t i=0;i<N; ++i)
  for(size_t j=0;j<N;++j) {
    A[i*N+j] = i+j;
    B[i*N+j] = i*j;
  }
for(size_t i=0;i<N; ++i)
  for(size_t j=0;j<N;++j) {
    C[i*N +j] = 0;
    for(size_t k=0;k<N;++k) {
      C[i*N + j] += A[ i*N+k ]*B[ j*N+k ];  // B is transposed !
    }
  }

The initialization of the two matrices A and B has to be overlapped with the computation of the elements of the C matrix.

Sample solutions

Sample code here.

Class work 5

Map skeleton with dynamic scheduling

Sample solutions

Sample code here,