
Universal DNA Sequence Representation and Prediction of Coding


Stephen S-T. Yau

15:30:00 - 16:30:00

308, Mathematics Research Center Building (ori. New Math. Bldg.)

"Universal DNA Sequence Representation and Prediction of Protein Coding Region"

Graphical representation of DNA sequences provides a simple and intuitive way of viewing, anchoring, and computing various gene structures, so a simple and non-degenerate graphical representation is attractive to both biologists and computational biologists. We shall present a universal graphical representation for DNA sequences which is a generalization of S. S.-T. Yau's method. The method adopts a trigonometric function to represent the four nucleotides A, G, C, T. We apply frequency analysis to our representations of DNA sequences, which suggests possible applications to predicting protein-coding regions. Based on statistical results from the frequency analysis, a simple coding region predictor and an optimized one are presented. Experiments on the widely used ROSETTA data set demonstrate that the performance of the optimized predictor is comparable to that of other popular methods. This is joint work with Xianyang Jiang and Dominique Lavenier.
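To make the idea of frequency analysis on a trigonometric representation concrete, here is a minimal sketch in Python. It is not the method from the talk: the abstract does not give the actual mapping, so the angles in `ANGLES` and the helper names `encode`, `power_at_period`, and `period3_score` are assumptions chosen for illustration. The sketch relies only on the well-known observation that protein-coding regions tend to show a period-3 component in their spectrum.

```python
import cmath
import math

# ASSUMPTION: illustrative angles only; the talk's actual trigonometric
# mapping is not stated in the abstract.
ANGLES = {
    "A": math.pi / 4,
    "G": 3 * math.pi / 4,
    "C": 5 * math.pi / 4,
    "T": 7 * math.pi / 4,
}


def encode(seq):
    """Map each nucleotide to a point on the unit circle (cos t + i sin t)."""
    return [cmath.exp(1j * ANGLES[base]) for base in seq.upper()]


def power_at_period(signal, period):
    """Squared magnitude of the Fourier component at frequency 1/period,
    divided by the signal length."""
    n = len(signal)
    if n == 0:
        return 0.0
    total = sum(
        x * cmath.exp(-2j * math.pi * k / period)
        for k, x in enumerate(signal)
    )
    return abs(total) ** 2 / n


def period3_score(seq):
    """Hypothetical score: spectral power at period 3 of the encoded sequence."""
    return power_at_period(encode(seq), 3)
```

A simple predictor in this spirit could slide a window along a sequence, compute `period3_score` for each window, and flag windows whose score exceeds a threshold. The threshold and window size would have to be tuned on labeled data; no values are suggested here.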