DTWimpute Algorithm
The Dynamic Time Warping (DTW) based algorithm for missing value estimation of time series data, called DTWimpute, employs DTW distance to measure similarity between expression profiles and a dedicated algorithm for construction of a gene estimation list, which indentifies for each gene with missing values a varying number (based on the degree of similarity) of candidate genes for imputation.
Assume that the expression profile of some gene gi has a missing value at time point t.Note that the accuracy of the DTW algorithm may degrade considerably when operating on expression profiles with a very high degree of missing values, due to the fact that there will not be enough data points to perform an optimal DTW alignment. Therefore, the DTWimpute algorithm works in two passes, as illustrated in the figure below.
An initial rough imputation is first performed by filling in the missing values with the mean expression over the respective row or, with the average of two non-missing neighbours in the row or, with another imputation method as for instance, the KNNimpute proposed in the article of Troyanskaya et al. (1). The second pass is a DTW-based algorithm, which first constructs a gene estimation list for each gene with missing values, consisting of genes with expression profiles which exhibit at least minimum relative (preliminary defined) similarity in terms of DTW distance to a target expression profile (one of the gene with missing values). Consequently, the missing value at the time point t of gene gi is estimated by using those values of the genes in the estimation list, which are aligned to time point t of gi. The latter is possible since the missing value at time point t has been estimated during the first pass of the imputation algorithm.
(1) Troyanskaya,O., Cantor,M., Sherlock,G., Brown,P., Hastie,T., Tibshirani,R., Botstein,D., Altman,R.B. (2001) Missing value estimation methods for DNA microarrays, Bioinformatics, 17, no. 6 520-525.