magistraleinformaticanetworking:spm:spm14exe1

Once you have implemented on your own laptop the program, in order to test it on andromeda (the 8 core, 16 contexts machine remotely available, do the following steps:

**ssh spm@andromeda.di.unipi.it**- create a subdirectory with your name, e.g.
**mkdir marcodanelutto** **exit****scp localfile.cpp spm@andromeda.di.unipi.it:marcodanelutto/****ssh spm@andromeda.di.unipi.it****cd marcodanelutto**- then compile with gcc/g++ and run as you did on your own laptop

**pthread_mutex_t**the type of the mutexes (to be initialized with a**PTHREAD_MUTEX_INITIALIZER**)**pthread_cont_t**the type of the condition variables (to be initialized with a**PTHREAD_COND_INITIALIZER**)**pthread_mutex_lock(pthread_mutex_t *)****pthread_mutex_unlock(pthread_mutex_t *)****pthread_cond_signal(pthread_cond_t *)****pthread_cond_wait(pthread_cond_t *, pthread_mutex_lock *)****pthread_create(pthread_t *tid, pthread_attr_t attributes, void *(*routine) (void *) , void * param)**and**pthread_join(pthread_t tid, void * *retval)**should be used to create a thread and await termination

To increase the time spent in the **f(x)** computed by the pipeline stage or by the map workers, use a loop such as

- dummyload.c
for(int i=0; i<LARGEVALUE; i++) x = sin(x);

For large values of iterations (millions) it is easy to get a dely ranging in fractions of seconds.

Using C (or C++) and **pthread** (only), implement a three stage pipeline where:

- the first stage generates a stream of floats from 0.00 to
*max*(*max*given through command line parameters) - the second stage squares the items appearing onto the input stream and delivers them to the output stream
- the last stage accumulates (summing them up to 0.00) all the values received onto the input stream.

Each stage should be implemented as a concurrent activity (a pthread).

The stream should be implemented in such a way a special tag “EOS” is propagate when no more stream items are available. Concurrent activities receiving (or generating, in case of the first stage) the EOS should propagate (if the case) the EOS and then terminate.

The communication channels implementing the stream should be properly implemented using any of the standard data structures available in C/C++ or in the standard libraries.

Proper “timing” code should be used to print the amount of time spent computing the whole pipeline.

Using a code similar to the one used to implement the farm (the one discussed in class lessons), implement a map where a floating point vector of **N** items is processed by *NW* workers to obtain the vector with each item being the square of its original value. As for the other exercise, you should implement the map using (only) **pthreads**.
Once it works, modify the code in such a way a stream of vectors is processed, rather than a single one.

Set up a pipeline filtering a stream of integers from 2 to MAX, such that:

- a first stage generates integers towards the rest of the pipeline stages
- the I-th stage:
- if has no number stored, stores the first number received and does not forward it
- if it has a number stored
- passes the received number if it is not multiple of the stored integer
- discards the received number if it is a multiple of the stored integer

- after EOS, each stage prints its stored number (a prime number)

2 [GEN] ---> [CHK( )] ---> [CHK( )] ---> [CHK( )] ---> [DISCARD] 3 [GEN] ---> [CHK(2)] ---> [CHK( )] ---> [CHK( )] ---> [DISCARD] 4 3 [GEN] ---> [CHK(2)] ---> [CHK( )] ---> [CHK( )] ---> [DISCARD] 5 (4) 3 4 discarded [GEN] ---> [CHK(2)] ---> [CHK( )] ---> [CHK( )] ---> [DISCARD] 6 5 [GEN] ---> [CHK(2)] ---> [CHK(3)] ---> [CHK( )] ---> [DISCARD] 7 (6) 5 6 discarded [GEN] ---> [CHK(2)] ---> [CHK(3)] ---> [CHK( )] ---> [DISCARD] 8 7 5 [GEN] ---> [CHK(2)] ---> [CHK(3)] ---> [CHK(5)] ---> [DISCARD]

A pipeline may be created and stages may be added by invoking the method add_stage, e.g. as in

myPipe.add_stage(new Stage(...))

Consider the possibility to list all the prime numbers in an interval [MIn, MAX] by:

- splitting the interval in a number
*nw*of subintervals - assigning each subinterval to a farm worker
- the worker
- for each i in the interval
- checks if i is prime
- either using the test a^(i-1) mod i == 1? (this may give false positives)
- or using the test at this page

This implies using a completely different parallel pattern with respect to the previous exercise, of course.

Sample solutions for the prime number examples here

Implement a C++ program computing the standard ijk matrix multiplication algorithm between two matrices A and B of size MxP and PxN, respectively.

Then parallelize the code using the FastFlow ParallelFor and ParalleForReduce patterns. Implement three versions:

- ParallelFor applied to the first (external) loop (index
*i*) - ParallelFor applied to the second loop (index
*j*) - ParallelForReduce applied to the third loop (index
*k*)

The seq ijk matrix multiplication code looks as follows:

for(size_t i=0; i<M; i++) for(size_t j=0; j<P; j++) { C[i][j] = 0.0; for(size_t k=0; k<P; k++) C[i][j] += A[i][k]*B[k][j]; }

See the matmul example in the FastFlow tutorial source code.

Implement by using the FastFlow MDF pattern (ff_mdf), the matrix multiplication algorithm starting from the following sequential code (matrices are of size NxN, double precision elements) :

for(size_t i=0;i<N; ++i) for(size_t j=0;j<N;++j) { A[i*N+j] = i+j; B[i*N+j] = i*j; } for(size_t i=0;i<N; ++i) for(size_t j=0;j<N;++j) { C[i*N +j] = 0; for(size_t k=0;k<N;++k) { C[i*N + j] += A[ i*N+k ]*B[ j*N+k ]; // B is transposed ! } }

The initialization of the two matrices A and B has to be overlapped with the computation of the elements of the C matrix.

Sample code here.

Sample code here,

magistraleinformaticanetworking/spm/spm14exe1.txt · Ultima modifica: 17/12/2014 alle 06:29 (10 anni fa) da Massimo Torquati