Maximum Likelihood Analysis of Phylogenetic Data
One of the five demonstrations that Indiana University researchers contributed to the SC98 High Performance Networking and Computing Conference was an extended, numerically intensive bioinformatics computation carried out on 33 advanced computers geographically dispersed across three continents. The systems, linked by the vBNS, TransPAC, and APAN networks, were located at Indiana University, the Institute of High Performance Computing at the National University of Singapore, and the Cooperative Research Centre for Advanced Computational Systems (ACSys CRC) in Australia.
Our analysis uses FastDNAml [Olsen et al. 1994, based on Felsenstein 1981], modified and extended to run on a heterogeneous and widely distributed parallel virtual machine. Starting with aligned, corresponding DNA sequences from a number of species, this program computes the likelihood of various phylogenetic trees, based upon experimental results concerning DNA replication modification rates. It explores all possible phylogenetic trees for a small initial set of species; it then adds further species and compares different arrangements to produce a sequence of estimated phylogenies. Varying the order in which species are introduced addresses the "local trapping problem," in which the search settles on a tree that is only locally optimal.
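To make the likelihood step concrete, the following is a minimal sketch of how the likelihood of one fixed tree can be computed with the pruning recursion described by Felsenstein (1981). It is not FastDNAml code. It assumes the simple Jukes-Cantor substitution model, fixed rather than optimized branch lengths, and toy sequences invented for illustration; the names `jc69_prob`, `partial`, `log_likelihood`, and `cherry` are hypothetical.

```python
import math

BASES = "ACGT"

def jc69_prob(same, t):
    """Jukes-Cantor probability of ending in the same (or one specific
    different) base after a branch of length t (substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e

def partial(node, site):
    """Conditional likelihoods of the data below `node` at one site,
    one value per possible base at `node`.
    A leaf is a sequence string; an internal node is a tuple of
    (child, branch_length) pairs."""
    if isinstance(node, str):
        return [1.0 if node[site] == b else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:
        child_l = partial(child, site)
        for i in range(4):
            result[i] *= sum(jc69_prob(i == j, t) * child_l[j]
                             for j in range(4))
    return result

def log_likelihood(root, n_sites):
    """Sum of per-site log-likelihoods, with equal base frequencies
    (0.25 each) at the root. Sites are treated as independent."""
    total = 0.0
    for site in range(n_sites):
        total += math.log(sum(0.25 * x for x in partial(root, site)))
    return total

def cherry(a, b, t):
    """Hypothetical helper: join two subtrees with equal branch lengths."""
    return ((a, t), (b, t))

# Toy aligned sequences (invented for illustration, not real data).
human, rat = "ACGTAC", "ACGTAC"
bovine, yeast = "ACTTGC", "ACTTGC"

t = 0.1  # assumed branch length, identical on every branch
grouped = cherry(cherry(human, rat, t), cherry(bovine, yeast, t), t)
split = cherry(cherry(human, bovine, t), cherry(rat, yeast, t), t)

ll_grouped = log_likelihood(grouped, len(human))
ll_split = log_likelihood(split, len(human))
print("grouped:", ll_grouped)
print("split:  ", ll_split)
```

In this sketch the arrangement that pairs identical sequences scores a higher log-likelihood than the one that splits them. A search program repeats this kind of comparison over many candidate arrangements, also optimizing branch lengths, which is what makes the computation expensive enough to benefit from many distributed processors.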
Two data sets were analyzed. In the first, contributed by collaborators at the BioInformatics Center at the National University of Singapore, cytoplasmic coat proteins [involved in intracellular membrane transport] were sequenced from human, rat, bovine, and yeast organisms. The second addressed the controversial phylogenetic placement of microsporidia [a parasite group including important human pathogens], using a data set that includes representatives of most eukaryotic lineages [> 100 taxa]. Some genetic studies find microsporidia to be highly degenerate fungi, while others, based upon small subunit (SSU) rRNA, suggest an ancient eukaryotic lineage; resolving this question bears upon the reliability of SSU rRNA-based phylogenetic analysis.
Demonstrating computationally intensive analyses using a globally distributed
collection of computational nodes paves the way for scientists connected by
advanced networks to access remote servers in the worldwide computational grid,
contribute key data sets, and collaborate with distant researchers.
Our initial focus is on molecular biology in Indiana, Singapore, and Australia.
Plans are underway to extend this partnership, addressing questions of performance
analysis, virtual accounting schemes, and the development and expansion of the user community.
Participants were: David Hart, Don Berry, Eric Wernert, Craig Stewart, Will Fischer, Chris Parkinson, Jeff Palmer, Meena Sakharkar, Zhang Lou Xin, and Tan Tin Wee. Special thanks are due to Mary Papakhian and Dan Lauer of IU's Research and Technical Services group, Tan Chee Chiang of the NUS Computer Centre's Supercomputing & Visualisation Unit, and Markus Buchhorn of ACSys CRC.