Fusing Time Series Expression Data through Hybrid Aggregation and Hierarchical Merge

It is common practice nowadays to monitor a particular biological phenomenon in multiple high-throughput experiments. A multi-experiment setup has the advantage of providing diverse evidence about gene role and function and, consequently, may lead to interesting insights into the gene interaction mechanisms underlying this phenomenon. The ability to reliably combine data from different microarray studies is therefore crucial for the results of microarray data mining. This work(1) combines hybrid aggregation with hierarchical alignment and merge algorithms to integrate time series expression data from different experiments. The proposed algorithms are evaluated and demonstrated on gene expression time series data from a study of the global cell-cycle control of gene expression in the fission yeast Schizosaccharomyces pombe(2).

First, a p-value for regulation(3) is calculated for each gene in each experiment. These p-values are then aggregated recursively using a set of different aggregation operators. When the recursive aggregation process converges, each gene is assigned an overall p-value, which can be interpreted as the consensus p-value supported by all the experiments. These consensus p-values are used to select the subset of genes of potential interest for the studied biological phenomenon, e.g., either by applying a predefined p-value threshold or by retaining a certain percentage of the genes with the lowest p-values. Finally, the multiple-experiment expression profiles of the selected genes are fused via a hierarchical merge procedure, which employs Dynamic Time Warping (DTW) alignment techniques(4) to adequately account for a possible phase shift between the different experiments.
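
The recursive aggregation and gene selection steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the operator set (arithmetic, geometric and harmonic means), the tolerance, and the names `hybrid_aggregate` and `select_genes` are all hypothetical. In each round every operator is applied to the current vector of p-values, and the operator outputs form the next vector. Each of these means lies strictly between the smallest and largest input unless all inputs are equal, so the spread of the vector shrinks round after round until the outputs agree on a single consensus value. Operators such as min and max are deliberately left out, since they would reproduce the extremes in every round and the recursion would never converge.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical operator set; the operators used in the original work may differ.
def arithmetic_mean(x: Sequence[float]) -> float:
    return sum(x) / len(x)

def geometric_mean(x: Sequence[float]) -> float:
    return math.exp(sum(math.log(v) for v in x) / len(x))

def harmonic_mean(x: Sequence[float]) -> float:
    return len(x) / sum(1.0 / v for v in x)

Operator = Callable[[Sequence[float]], float]
DEFAULT_OPERATORS: Tuple[Operator, ...] = (arithmetic_mean, geometric_mean, harmonic_mean)


def hybrid_aggregate(p_values: Sequence[float],
                     operators: Tuple[Operator, ...] = DEFAULT_OPERATORS,
                     tol: float = 1e-12, max_iter: int = 1000) -> float:
    """Apply every operator to the current vector, recursively, until the outputs agree."""
    x = [float(p) for p in p_values]
    if not x or any(p <= 0.0 or p > 1.0 for p in x):
        raise ValueError("p-values must lie in (0, 1]")  # log and 1/p need p > 0
    for _ in range(max_iter):
        x = [op(x) for op in operators]  # next vector = one output per operator
        if max(x) - min(x) < tol:        # converged: all operators agree
            return arithmetic_mean(x)
    raise RuntimeError("aggregation did not converge")


def select_genes(consensus: Dict[str, float], threshold: Optional[float] = None,
                 fraction: Optional[float] = None) -> List[str]:
    """Keep genes below a p-value threshold, or the given fraction with the lowest p-values."""
    if (threshold is None) == (fraction is None):
        raise ValueError("give exactly one of threshold or fraction")
    ranked = sorted(consensus, key=consensus.get)  # lowest consensus p-value first
    if threshold is not None:
        return [g for g in ranked if consensus[g] <= threshold]
    return ranked[:int(round(fraction * len(ranked)))]


if __name__ == "__main__":
    # Made-up per-experiment p-values for three hypothetical genes.
    per_gene = {"gene_a": [0.001, 0.004, 0.02],
                "gene_b": [0.30, 0.65, 0.12],
                "gene_c": [0.01, 0.08, 0.03]}
    consensus = {g: hybrid_aggregate(ps) for g, ps in per_gene.items()}
    print(consensus)
    print(select_genes(consensus, threshold=0.05))
```

Requiring strictly positive p-values is a consequence of the chosen operators (logarithms and reciprocals), not a requirement stated in the source.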

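The hierarchical merge step can be illustrated with the following Python sketch. It rests on several assumptions that are not taken from the source. It works on a single gene's profiles (one value per time point), whereas the original procedure may align whole experiments. It uses the classic unconstrained DTW recursion with an absolute-difference local cost rather than the slope-constrained step patterns of Sakoe and Chiba(4). It also follows a hypothetical agglomerative scheme: the two profiles with the smallest DTW distance are merged first, by averaging aligned time points weighted by how many experiments each profile already represents. The names `dtw` and `hierarchical_merge` are illustrative only.

```python
from typing import List, Sequence, Tuple

Profile = List[float]
Path = List[Tuple[int, int]]


def dtw(a: Sequence[float], b: Sequence[float]) -> Tuple[float, Path]:
    """Unconstrained DTW with |a_i - b_j| as local cost; returns (distance, warping path)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from (n, m), always stepping to the cheapest predecessor cell.
    path: Path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


def hierarchical_merge(profiles: Sequence[Sequence[float]]) -> Profile:
    """Agglomeratively merge profiles, closest pair (by DTW distance) first."""
    if not profiles or any(len(p) == 0 for p in profiles):
        raise ValueError("need at least one non-empty profile")
    # Each pool entry: (profile, number of experiments it already represents).
    pool: List[Tuple[Profile, int]] = [(list(p), 1) for p in profiles]
    while len(pool) > 1:
        best = None
        for x in range(len(pool)):
            for y in range(x + 1, len(pool)):
                dist, path = dtw(pool[x][0], pool[y][0])
                if best is None or dist < best[0]:
                    best = (dist, x, y, path)
        _, x, y, path = best
        (a, wa), (b, wb) = pool[x], pool[y]
        # Weighted average of every aligned pair of time points along the path.
        merged = [(wa * a[i] + wb * b[j]) / (wa + wb) for i, j in path]
        pool = [item for k, item in enumerate(pool) if k not in (x, y)]
        pool.append((merged, wa + wb))
    return pool[0][0]


if __name__ == "__main__":
    # Made-up profiles of one gene in three experiments; the peak is phase-shifted.
    experiments = [[0.1, 1.0, 0.2, 0.0], [0.0, 0.2, 1.1, 0.1], [0.1, 0.9, 0.3, 0.0]]
    print(hierarchical_merge(experiments))
```

Because DTW may match one time point of one experiment to several time points of another, a merged profile can be longer than its inputs. This is how a phase shift between experiments is absorbed by the alignment instead of being blurred by point-wise averaging.
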
(1) Tsiporkova, E., Boeva, V. Fusing Time Series Expression Data through Hybrid Aggregation and Hierarchical Merge. Bioinformatics 24(16) (2008) i63-i69.

(2) Rustici, G., Mata, J., Kivinen, K., Lio, P., Penkett, C. J., Burns, G., Hayles, J., Brazma, A., Nurse, P., Bähler, J. Periodic gene expression program of the fission yeast cell cycle. Nature Genetics 36 (2004) 809-817.

(3) de Lichtenberg, U., Jensen, L. J., Fausbøll, A., Jensen, T. S., Bork, P., Brunak, S. Comparison of computational methods for the identification of cell cycle-regulated genes. Bioinformatics 21(7) (2005) 1164-1171.

(4) Sakoe, H., Chiba, S. Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-26 (1978) 43-49.

Technical University of Sofia, Branch Plovdiv, Tsanko Dyustabanov 25, 4000 Plovdiv, Bulgaria