- Parallel Loops
- Parallel Matrix Multiplication
- Barriers in Parallel Loops
- Parallel One-Dimensional Iterative Averaging
- Iteration Grouping/Chunking in Parallel Loops
Coursera - Parallel Programming in Java Week3
http://www.omsn.de/blog/how-to-parallelize-loops-with-java-7-fork-join-framework
https://github.com/habanero-rice/PCDP
https://stackoverflow.com/questions/42342502/how-to-parallelize-loops-with-java-8s-fork-join-framework
Parallel Loops
Pointer chasing example :
finish {
for (p = head; p != null ; p = p.next) async compute(p);
}
Counted-for loops (the number of iterations is known on entry on the loop):
forall (i : [0:n-1]) a[i] = b[i] + c[i]
Above vector addition example can also be written with Java Stream :
a = IntStream.rangeClosed(0, N-1).parallel().toArray(i -> b[i] + c[i]);
Reading : Executing Streams in Parallel
Parallel Matrix Multiplication
c [ i ][ j ] = ∑n−1k=0 a[ i ][ k ] ∗ b[ k ][ j ]
Sequential algorithm :
for(i : [0:n-1]) {
for(j : [0:n-1]) { c[i][j] = 0;
for(k : [0:n-1]) {
c[i][j] = c[i][j] + a[i][k]*b[k][j]
}
}
}
we can see that it is safe to convert for-i and for-j into forall loops, but for-k must remain a sequential loop to avoid data races. There are some trickier ways to also exploit parallelism in the for-k loop, but they rely on the observation that summation is algebraically associative even though it is computationally non-associative.
Reading : Matrix Multiplication Algorithm
Barriers in Parallel Loops
forall (i : [0:n-1]) {
myId = lookup(i); // convert int to a string
print HELLO, myId;
next(); // barrier
print BYE, myId;
}
Inserting a barrier between the two print statements could ensure that all HELLO’s would be printed before any BYE’s.
Thus, barriers extend a parallel loop by dividing its execution into a sequence of phases. While it may be possible to write a separate forall loop for each phase, it is both more convenient and more efficient to instead insert barriers in a single forall loop, e.g., we would need to create an intermediate data structure to communicate the myId values from one forall to another forall if we split the above forall into two (using the notation next) loops. Barriers are a fundamental construct for parallel loops that are used in a majority of real-world parallel applications.
Parallel One-Dimensional Iterative Averaging
- Stencil Code
- Using barrier to reduce the number of tasks created from N^2 to N
Iteration Grouping/Chunking in Parallel Loops
Reduce the number of tasks created by Grouping or Chunking.
- by block
- by cyclic