2015年3月4日 星期三

[Parallel Algo.] 10. Balanced Binary Tree

Parallel Algorithm

10. Balanced Binary Tree

Main idea


      [1] Build a balanced binary tree on the input (or output) elements
           用input以balanced binary tree的方式建起一棵樹

      [2] Traverse the tree forward or backward to and from the root
           從root或往root尋訪整棵樹來找出答案

Example: Summing 

InputA[1...n] = {3, 7, 8, 3, 9, 2, 3, 7} (n=8) 
Output: A[1] + A[2] + ... + A[n]





[Pseudo code]


for k = m 􏰏 1 step –1 to 0 do
      for
i, 1 􏰁 i 􏰁 2k, pardo A[i] = A[2i 􏰏1] + A[2i

[Parallel Algo.] 11. Pointer Jumping(Doubling)

Parallel Algorithm

11. Pointer Jumping (Doubling)

Main idea

      每次和距離乘以二的processor的資料做運算直到遇到self-loop root為止

      Original version: 

          The computation proceeds by a recursive application of the calculation in hand to all   
          elements over a certain distance (in the data structure) from each individual element. 
          This distance doubles in successive steps. 

Example: Parallel prefix on a rooted directed tree
 

Input: [1...n] = {2, 5, 2, 7, 5, 8, 6, 3, 1} and W[1...n] = {1, 2, 3, 1, 0, 3, 1, 2, 3}.
           (P[i] and W[i], is the parent and weight of node i, respectively.)Output: For each node i, W[i] is set equal to the sum of the weights of nodes on the path from i to the root of its tree. 



 The tree is self-loop at its root and the weight of the root is 0.

      [Pseudo code]

      for i from 1 􏰁 to 􏰁 n, parallel do S[i] = P[i]
      for k = 1 to log n do

             for i from 1 􏰁 to 􏰁 n, parallel do 
             begin
                   W[i] = W[i] + W[S[i]]
                   S[i] = S[S[i]] 
             end 

[Parallel Algo.] 9. Partition

Parallel Algorithm

9. Partition

Main idea

      [1] Break up the given problem into p independent subproblems of almost equal size
          將問題分成p個獨立且大小幾乎一樣的子問題

     [2] Solve the p subproblems concurrently, where p is the number of processors available
          同時解決p個問題,其中p是可使用的processor數量

      Next, We discuss the difference between D&G and Partition:
      接下來,我們討論其和D&G的不同

      Divide-and-conquer: usually lies in the merging of sub-solutions
                                       通常倚賴在子結果的整合上

      Partition: lies in carefully partitioning the problem into independent sub-problems
                     倚賴在如何把問題分成獨立的子問題

Example: Merging

      Input: two sorted sequences A[1...n] and B[1...n
      Output: a sorted sequence C[1...2n]

(RA[i] be the rank of A[i] in (the number of values <= A[i] in B.) RB[i] is defined similarly. )


Stage 1, Determine RA[m], 1 <= i <= n/m. Then, partition A and B into Ai's and Bi's respectively, where Ai = {A[(i–1) x m+1], ..., A[i x m]} and Bi = {B[RA[(i–1) x m]–((i– 1) x m)+1], ..., B[RA[i x m]–i x m]}. 



Stage 2, For each pair of Ai and Bi, 1 <= i <= n/m, if |Bi|=O(log n), then rank all elements in Ai and Bi sequentially. Otherwise, apply the partition technique used in stage 1 to partition Ai and Bi into Ai,j's and Bi,j's of size O(log n) (in this case, Bi plays the role of A and Ai plays the role of B). And then, for each pair of Ai,j and Bi,j, rank all elements in Ai,j and Bi,j sequentially.
 
Remark:

The merge problem can be viewed as that of determining RA[1...n] and RB[1...n]. For simplicity, assume that all elements in A and B are distinct.


[Parallel Algo.] 平行程式設計技巧Index

Parallel Algorithm Index [未完]

1. Symmetry Breaking
因為 problem 有對稱性, 導致無法平行處理,於是先把問題的對稱性打破, 再去平行解

2. Brnet's Theorem
原本是很多個 processors 處理 M 工作量,花費的時間是 T 可以用較少的 P processors 來 simulate,時間變成 O(M / P + T),藉此降低cost

3. Euler-Tour Technique

4. Tree Contraction

"同時"利用 rake operations 可以在 O(lg n) 回合後,把 tree 變成 3-node binary tree

5. Sampling
透過取樣來估計答案,或縮小答案可能的範圍

6. Overlapping
將 computation 和 communication (或 broadcasting) 結合,兩件事同時做

7. Multi-level
將一回合的暴力做法分成多個 stages, 降低暴力法的 cost

8. Divide & Conquer [click]
將輸入分成幾乎相同大小的子問題,平行且遞迴地解決每個子問題,將每個子問題的輸出結合在一起

9. Partition [click]
將問題分成p個獨立且大小幾乎一樣的子問題,同時解決p個問題,其中p是可使用的processor數量

10. Balanced Binary Tree Method [click]
用input以balanced binary tree的方式建起一棵樹並從root或往root尋訪整棵樹來找出答案
11. Pointer Jumping(Doubling) [click]
每次和距離乘以二的processor的資料做運算直到遇到self-loop root為止

12. Accelerated Cascading
13. Pipeline
14. Double Logarithm-depth Tree

[Parallel Algo.] 8. Divide & Conquer

Parallel Algorithm

8. Divide & Conquer

Main idea
      
      [1] Partition the input into subproblems of almost equal sizes
            將輸入分成幾乎相同大小的子問題

      [2] Solve recursively the subproblems in parallel
            平行且遞迴地解決每個子問題

      [3] Combine the solutions of the subproblems
            將每個子問題的輸出結合在一起

Example: Convex hull

      Input: a set S = {v1, v2, ..., vn} of n points
      Output: The convex hull CH(S) of S (clockwise)

      In the following figure, S is our input points, CH(S) is the output convex hull, 
      UH(S) is the upper hull of S and LH(S) is the lower hull of S.




Step 0, Sort all the points in S by their x coordinates. compute UH(S).

Algorithm UH(S)
Step 1, If |S| <= 4, then use a brute-force method to determine UH(S), and exit.
Step 2, Let S1 = {V1, V2, ..., Vn/2}and S2 = {Vn/2+1, ..., Vn-1, Vn}
Step 3, Recursively, do step 1 with S1 and S2.
Step 4, Find the upper common tangent between UH(S1) and UH(S2), and deduce UH(S)



Similarly, we can use this idea to find LH(S). Combine LH(S) and UH(S) to find CH(S).

D&G used in Step 2 ~Step 4, the Algorithm UH(S).