Complexity measures of symbolic sequences and their application to DNA analysis
Dr Nadia Chuzhanova, Department of Computer Science, Cardiff University.
Abstract: In this talk, I will introduce two generalizations of the Ziv-Lempel complexity measure. Both take into account direct, symmetric, and, more generally, any isomorphic repeats occurring in symbolic sequences. By isomorphic repeats I mean fragments that are identical or symmetric modulo some permutation of the alphabet letters. The first measure, a complexity vector, is designed for small alphabets and characterizes a sequence (especially a long one) by several complexity values; it shows which type of regularity predominates in the text being analysed. The second, scalar, measure can be used for alphabets of arbitrary cardinality. Both measures can be computed in linear time. I will show how to use them to explore both the similarity and the modularity of genetic sequences, and I will report some interesting structures related to growth hormone gene promoters. Finally, I will briefly describe other approaches under development and their applications.
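For readers unfamiliar with the baseline measure, here is a minimal Python sketch of the classic Lempel-Ziv (1976) complexity, which counts the components of a sequence's exhaustive history. Each component is the shortest fragment that cannot be copied from the preceding text, so it consists of a copy followed by one new symbol. The `transforms` parameter and the `reverse` helper are hypothetical illustrations of how an extra repeat type (here a mirror repeat) might be admitted as a valid copy. This is not the speaker's complexity vector or scalar measure. It does not handle isomorphic repeats modulo an alphabet permutation, and it uses naive substring search rather than a linear-time algorithm.

```python
from typing import Callable, Sequence


def identity(w: str) -> str:
    """Direct repeat: the fragment is copied as-is."""
    return w


def reverse(w: str) -> str:
    """Hypothetical mirror repeat: the fragment is copied read backwards."""
    return w[::-1]


def lz_complexity(s: str,
                  transforms: Sequence[Callable[[str], str]] = (identity,)) -> int:
    """Count components of the Lempel-Ziv (1976) exhaustive history of s.

    A fragment s[i:i+l] is treated as copyable if some transform of it
    occurs in the history s[:i+l-1] (overlap with the fragment is allowed).
    Each component is the shortest non-copyable fragment; the final
    component may be a pure copy if the sequence ends mid-copy.
    Naive search, so this runs in polynomial rather than linear time.
    """
    n = len(s)
    i = 0
    components = 0
    while i < n:
        l = 1
        # Extend the fragment while some transformed version is in the history.
        while i + l <= n and any(t(s[i:i + l]) in s[:i + l - 1] for t in transforms):
            l += 1
        components += 1
        i += l  # skip past the copied part plus the one new symbol
    return components


print(lz_complexity("0001101001000101"))             # 6: 0.001.10.100.1000.101
print(lz_complexity("abccba"))                       # 5: a.b.c.cb.a
print(lz_complexity("abccba", (identity, reverse)))  # 4: a.b.c.cba
```

Admitting the mirror copy lets the tail "cba" be produced as one component, which lowers the count; a richer set of transforms would make more kinds of regularity "free" in the same way.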