2015年3月4日 星期三

[Parallel Algo.] 10. Balanced Binary Tree

Parallel Algorithm

10. Balanced Binary Tree

Main idea


      [1] Build a balanced binary tree on the input (or output) elements
           用input以balanced binary tree的方式建起一棵樹

      [2] Traverse the tree forward or backward to and from the root
           從root或往root尋訪整棵樹來找出答案

Example: Summing 

InputA[1...n] = {3, 7, 8, 3, 9, 2, 3, 7} (n=8) 
Output: A[1] + A[2] + ... + A[n]





[Pseudo code]


for k = m 􏰏 1 step –1 to 0 do
      for
i, 1 􏰁 i 􏰁 2k, pardo A[i] = A[2i 􏰏1] + A[2i

[Parallel Algo.] 11. Pointer Jumping(Doubling)

Parallel Algorithm

11. Pointer Jumping (Doubling)

Main idea

      每次和距離乘以二的processor的資料做運算直到遇到self-loop root為止

      Original version: 

          The computation proceeds by a recursive application of the calculation in hand to all   
          elements over a certain distance (in the data structure) from each individual element. 
          This distance doubles in successive steps. 

Example: Parallel prefix on a rooted directed tree
 

Input: [1...n] = {2, 5, 2, 7, 5, 8, 6, 3, 1} and W[1...n] = {1, 2, 3, 1, 0, 3, 1, 2, 3}.
           (P[i] and W[i], is the parent and weight of node i, respectively.)Output: For each node i, W[i] is set equal to the sum of the weights of nodes on the path from i to the root of its tree. 



 The tree is self-loop at its root and the weight of the root is 0.

      [Pseudo code]

      for i from 1 􏰁 to 􏰁 n, parallel do S[i] = P[i]
      for k = 1 to log n do

             for i from 1 􏰁 to 􏰁 n, parallel do 
             begin
                   W[i] = W[i] + W[S[i]]
                   S[i] = S[S[i]] 
             end 

[Parallel Algo.] 9. Partition

Parallel Algorithm

9. Partition

Main idea

      [1] Break up the given problem into p independent subproblems of almost equal size
          將問題分成p個獨立且大小幾乎一樣的子問題

     [2] Solve the p subproblems concurrently, where p is the number of processors available
          同時解決p個問題,其中p是可使用的processor數量

      Next, We discuss the difference between D&G and Partition:
      接下來,我們討論其和D&G的不同

      Divide-and-conquer: usually lies in the merging of sub-solutions
                                       通常倚賴在子結果的整合上

      Partition: lies in carefully partitioning the problem into independent sub-problems
                     倚賴在如何把問題分成獨立的子問題

Example: Merging

      Input: two sorted sequences A[1...n] and B[1...n
      Output: a sorted sequence C[1...2n]

(RA[i] be the rank of A[i] in (the number of values <= A[i] in B.) RB[i] is defined similarly. )


Stage 1, Determine RA[m], 1 <= i <= n/m. Then, partition A and B into Ai's and Bi's respectively, where Ai = {A[(i–1) x m+1], ..., A[i x m]} and Bi = {B[RA[(i–1) x m]–((i– 1) x m)+1], ..., B[RA[i x m]–i x m]}. 



Stage 2, For each pair of Ai and Bi, 1 <= i <= n/m, if |Bi|=O(log n), then rank all elements in Ai and Bi sequentially. Otherwise, apply the partition technique used in stage 1 to partition Ai and Bi into Ai,j's and Bi,j's of size O(log n) (in this case, Bi plays the role of A and Ai plays the role of B). And then, for each pair of Ai,j and Bi,j, rank all elements in Ai,j and Bi,j sequentially.
 
Remark:

The merge problem can be viewed as that of determining RA[1...n] and RB[1...n]. For simplicity, assume that all elements in A and B are distinct.


[Parallel Algo.] 平行程式設計技巧Index

Parallel Algorithm Index [未完]

1. Symmetry Breaking
因為 problem 有對稱性, 導致無法平行處理,於是先把問題的對稱性打破, 再去平行解

2. Brnet's Theorem
原本是很多個 processors 處理 M 工作量,花費的時間是 T 可以用較少的 P processors 來 simulate,時間變成 O(M / P + T),藉此降低cost

3. Euler-Tour Technique

4. Tree Contraction

"同時"利用 rake operations 可以在 O(lg n) 回合後,把 tree 變成 3-node binary tree

5. Sampling
透過取樣來估計答案,或縮小答案可能的範圍

6. Overlapping
將 computation 和 communication (或 broadcasting) 結合,兩件事同時做

7. Multi-level
將一回合的暴力做法分成多個 stages, 降低暴力法的 cost

8. Divide & Conquer [click]
將輸入分成幾乎相同大小的子問題,平行且遞迴地解決每個子問題,將每個子問題的輸出結合在一起

9. Partition [click]
將問題分成p個獨立且大小幾乎一樣的子問題,同時解決p個問題,其中p是可使用的processor數量

10. Balanced Binary Tree Method [click]
用input以balanced binary tree的方式建起一棵樹並從root或往root尋訪整棵樹來找出答案
11. Pointer Jumping(Doubling) [click]
每次和距離乘以二的processor的資料做運算直到遇到self-loop root為止

12. Accelerated Cascading
13. Pipeline
14. Double Logarithm-depth Tree

[Parallel Algo.] 8. Divide & Conquer

Parallel Algorithm

8. Divide & Conquer

Main idea
      
      [1] Partition the input into subproblems of almost equal sizes
            將輸入分成幾乎相同大小的子問題

      [2] Solve recursively the subproblems in parallel
            平行且遞迴地解決每個子問題

      [3] Combine the solutions of the subproblems
            將每個子問題的輸出結合在一起

Example: Convex hull

      Input: a set S = {v1, v2, ..., vn} of n points
      Output: The convex hull CH(S) of S (clockwise)

      In the following figure, S is our input points, CH(S) is the output convex hull, 
      UH(S) is the upper hull of S and LH(S) is the lower hull of S.




Step 0, Sort all the points in S by their x coordinates. compute UH(S).

Algorithm UH(S)
Step 1, If |S| <= 4, then use a brute-force method to determine UH(S), and exit.
Step 2, Let S1 = {V1, V2, ..., Vn/2}and S2 = {Vn/2+1, ..., Vn-1, Vn}
Step 3, Recursively, do step 1 with S1 and S2.
Step 4, Find the upper common tangent between UH(S1) and UH(S2), and deduce UH(S)



Similarly, we can use this idea to find LH(S). Combine LH(S) and UH(S) to find CH(S).

D&G used in Step 2 ~Step 4, the Algorithm UH(S).

2015年1月13日 星期二

[String Matching] Suffix Array and Suffix Tree

[Suffix Array]


Suffix Array(a.k.a. SA) is a sorted array of all suffixes of a string.
Suffix Array(簡稱SA)是一個字串的所有suffix排序後存起來的陣列,例如:

String S = "banana$"

All of its suffixes are listed below
S的所有suffix都列在下方:

row isuffix
1 banana$
2 anana$
3 nana$
4 ana$
5 na$
6 a$
7 $

After sorting in lexicographical order will looked like this
以字母順序排序後看起來會像是這樣:

row isuffix
7 $
6 a$
4 ana$
2 anana$
1 banana$
5 na$
3 nana$

So, the SA of S is 
所以S的SA是
{ 7, 6, 4, 2, 1, 5, 3 }

[Suffix Tree]

We still use S = "banana$" as our example
我們仍然用S = "banana$"做為我們的範例

The root is empty and the smallest of its child should be on the left subtree and the larger should be on the right subtree.
Root必須是空的,且ROOT最小的child要被放在左邊,最大的要在右邊

First, we put the root and its smallest child on the graph
首先,我們把ROOT和其最小的child放在圖上


The next one we should put on is "a$", but "ana$" and "anana$" both start with "a", so we draw like this:
我們下個要放的應該是"a$",但是"ana$" 和 "anana$都是以"a"開頭,所以我們要這樣畫:
Clearly, we can see that if 2 or more suffixes have the same prefix, then the prefix will be their parent.
我們可以看出來兩個suffix如果有相同的prefix,那那串相同的prefix就是他們的parent。

Finally, out suffix tree with contents of SA on it will look like this:
最後,加上SA的內容後,我們的suffix tree會長成這樣:





2014年3月7日 星期五

[openGL] implementation of radiosity

以下直接略過數學公式的解釋,直接開始介紹 implementation 的方法:

[一] 原理

      將每個 face 分成多個 patch,以熱輻射的觀點,每個 patch 都會有散發出熱能,

      而每個 patch 的顏色都會被附近 patch 的熱能所散發的輻射所影響,

      form factor 就是一個能量在 patch 之間傳遞比例的影響公式

[二] 想法

      因為在 form factor 中,每個 patch 的面積、patch 跟 patch 之間相鄰的距離也會有影響,

      為了簡化計算,我們將每個 patch 的面積都設為 1,

      同時以 hemicube 的概念來假設任兩 patch 之間的距離也都是 1,如此簡化計算。

      另外一個問題是某兩個 patch 不一定是面對面的情況,如圖,

      綠色面不應該傳遞熱輻射到 patch,即該 face 上所有 patch 不會影響目前 patch 的顏色

   

      因此將以 patch 所能看到的 "畫面" 來決定哪些面的 form factor 應該會影響到目前 patch。

      憑藉著這樣的概念,我們將目前這個 path 所看到的畫面切割成 8 x 8個方塊,


      又因為每個 patch 都可以看到一個切割成 8 x 8 個方塊的畫面,

      且透過之前將 form factor 簡化的方式,如此一來,對於每個 patch i 和 patch j 來說,

      因為 form factor 只要是被兩 patch 之間的距離和兩 patch 的面積所影響,

      patch i 所看到方塊 (0, 0) 傳過來的能量比例跟 patch j 所看到方塊 (0, 0) 傳過來的相同,

      所以少去了每個 patch 都要重新計算 form factor 的情形

[三] pseudo code

      Step 1: Divide each face into many patches (squares)
                  把每一個面分成多個小方塊應該不難吧

      Step 2: Calculate the form factor of all patches
                  由上面的想法可以知道,只要計算一次就好

      Step 3: Calculate the color with form factor
                  將所看到的畫面中所有方塊的顏色乘以該方塊的 form factor
                  再加上目前 patch 的顏色後平均即為能量傳遞一次的新顏色

      Step 4: Interpolation of adjacent patches
                  因為這樣出來的情形是一格一格的,所以對相鄰的兩個 patch 做 interpolate
                  來模糊格子之間的邊界,但是範例中是去用貼圖幫我模糊掉的    

[四] 後記

      因為不好解釋,所以附上程式碼,有參考其他人做得之後再做過修改