DI-logo

Data Fusion Phase

A hierarchical merge procedure combines pairwise the multiple gene profiles into a uniques expression profile. The algorithm performs exactly n - 1 iterations (in case of n different data matrices), as depicted in the below figure, and each iteration consists of the following 3 distinctive steps:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

  1. Pairwise alignment. The newly constructed expression matrices (see Feature selection) are aligned pairwise against each other using the DTW alignment algorithm(1). Subsequently, the optimal DTW alignment path and the corresponding DTW distance for each matrix pair are obtained. The expression values need to be standardized (subtracting the mean and dividing by the standard deviation) before performing the DTW alignment. A detail explanation of DTW algorithm can be found in the article of Hermans and Tsiporkova(2) or downloaded from GenTχWarper tool(3) website.

  2. Optimal order. Assume that the expression matrices are numbered from 1 to n. The goal is to re-order them in such a way that the subsequent merge is performed on matrix pairs which are relatively close to each other in terms of DTW distance. The matrix pair with the lowest DTW distance is first identified. Then from the remaining (n - 2) matrices, the one with the lowest DTW distance to the one of the first two needs to be found. Subsequently, this matrix is appended and a new list is created. Then the same procedure as above is repeated, but this time for the two border matrices of the new list. One needs to carry on in this fashion until no matrices are left in the original list.

  3. Pairwise merge. Each adjacent matrix pair, i.e. (1,2),(2,3),..., (n - 1,n) is merged, including the timing information, into a single matrix by just taking the mean of each aligned value pair as specified by the optimal DTW alignment path.

Thus n - 1 new merged matrices are created: 1&2,..., (n - 1)&n. These can again be aligned pairwise, then ordered and subsequently merged generating another n - 2 matrices: 1&2&3,..., (n - 2)&(n - 1)&n. At step n - 1, the alignment and merge process converges into a unique matrix 1&2& ... &n.


(1) Sakoe,H. and Chiba,S. Dynamic programming algorithm optimization for spoken word recognition, IEEE Trans. on Acoust., Speech, and Signal Process, ASSP-26 (1978) 43-49.

(2) Hermans,F. and Tsiporkova,E. Merging microarray cell synchronization experiments through curve alignment, Bioinformatics, 23 (2007) e64-e70.

(3) Criel, J. and Tsiporkova, E. Gene Time Eχpression Warper: A tool for alignment, template matching and visualization of gene expression time series. Bioinformatics 22 2 (2006) 251-252.

Technical University of Sofia-branch Plovdiv, Tsanko Dyustabanov 25, 4000 Plovdiv, Bulgaria