A Multi-purpose Time Series Data Standardization Approach

In this work, a novel data transformation method is proposed that targets multi-purpose data standardization and is inspired by gene-centric clustering approaches. The idea is to standardize the data by template matching: each expression profile is compared with all other expression profiles, using the Dynamic Time Warping (DTW) alignment algorithm(1) to measure the similarity between them. This algorithm facilitates the identification of a cluster of genes whose expression profiles are related, possibly with a non-linear time shift, to the profile of the gene supplied as a template. Consequently, for each gene profile, a varying number of neighboring gene profiles (depending on the degree of similarity) is identified for use in the subsequent standardization phase. That phase uses a recursive aggregation algorithm(2) to reduce the set of neighboring expression profiles to a single profile, which represents the standardized version of the profile in question.
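The template-matching step can be illustrated with a minimal sketch. It assumes the textbook dynamic-programming form of DTW with an absolute-difference local cost and no warping-window constraint. The function names `dtw_distance` and `neighbors`, the fixed cutoff `max_dist`, and the toy profiles are hypothetical and are not taken from the method itself, which may select neighbors by a different rule. The recursive aggregation phase is not sketched here.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook DTW cost between two 1-D sequences (absolute-difference local cost)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def neighbors(profiles, template, max_dist):
    """Names of profiles whose DTW cost to the template is at most max_dist.

    The fixed cutoff is an illustrative assumption, not the selection rule
    of the described method.
    """
    t = profiles[template]
    return [
        name
        for name, p in profiles.items()
        if name != template and dtw_distance(t, p) <= max_dist
    ]


# Toy expression profiles (made-up values, for illustration only).
profiles = {
    "g1": [0.0, 1.0, 2.0, 1.0, 0.0],
    "g2": [0.0, 0.0, 1.0, 2.0, 1.0],  # g1 delayed by one time point
    "g3": [2.0, 2.0, 2.0, 2.0, 2.0],  # flat profile, unrelated to g1
}

print(neighbors(profiles, "g1", max_dist=1.5))
```

With these toy values the time-shifted profile `g2` stays close to the template `g1` under DTW, while the flat profile `g3` does not. Because each template gets its own neighbor list, the number of neighbors varies from gene to gene.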

(1) Sakoe, H. and Chiba, S. (1978) Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. on Acoust., Speech, and Signal Process., ASSP-26, 43-49.

(2) Tsiporkova, E. and Boeva, V. (2004) Nonparametric Recursive Aggregation Process. Kybernetika. J. of the Czech Society for Cybernetics and Inf. Sciences, 40(1), 51-70.