Yusong's Blog Don't Panic

Loop Parallelism

2017-08-19

Coursera - Parallel Programming in Java Week3

http://www.omsn.de/blog/how-to-parallelize-loops-with-java-7-fork-join-framework

https://github.com/habanero-rice/PCDP

https://stackoverflow.com/questions/42342502/how-to-parallelize-loops-with-java-8s-fork-join-framework

Parallel Loops

Pointer chasing example :

finish {
for (p = head; p != null ; p = p.next) async compute(p);
}

Counted-for loops (the number of iterations is known on entry on the loop):

forall (i : [0:n-1]) a[i] = b[i] + c[i]

Above vector addition example can also be written with Java Stream :

a = IntStream.rangeClosed(0, N-1).parallel().toArray(i -> b[i] + c[i]);

Reading : Executing Streams in Parallel

Parallel Matrix Multiplication

c [ i ][ j ] = ∑n−1k=0 a[ i ][ k ] ∗ b[ k ][ j ]

Sequential algorithm :

for(i : [0:n-1]) {
  for(j : [0:n-1]) { c[i][j] = 0;
    for(k : [0:n-1]) {
      c[i][j] = c[i][j] + a[i][k]*b[k][j]
    }
  }
}

we can see that it is safe to convert for-i and for-j into forall loops, but for-k must remain a sequential loop to avoid data races. There are some trickier ways to also exploit parallelism in the for-k loop, but they rely on the observation that summation is algebraically associative even though it is computationally non-associative.

Reading : Matrix Multiplication Algorithm

Barriers in Parallel Loops

forall (i : [0:n-1]) {
        myId = lookup(i); // convert int to a string 
        print HELLO, myId;
        next(); // barrier
        print BYE, myId;
}

Inserting a barrier between the two print statements could ensure that all HELLO’s would be printed before any BYE’s.

Thus, barriers extend a parallel loop by dividing its execution into a sequence of phases. While it may be possible to write a separate forall loop for each phase, it is both more convenient and more efficient to instead insert barriers in a single forall loop, e.g., we would need to create an intermediate data structure to communicate the myId values from one forall to another forall if we split the above forall into two (using the notation next) loops. Barriers are a fundamental construct for parallel loops that are used in a majority of real-world parallel applications.

Parallel One-Dimensional Iterative Averaging

  • Stencil Code
  • Using barrier to reduce the number of tasks created from N^2 to N

Iteration Grouping/Chunking in Parallel Loops

Reduce the number of tasks created by Grouping or Chunking.

  • by block
  • by cyclic

Get update from Yusong's blog by Email on → Feedburner

Comments

Content