Honours and Masters projects offered in 2010

A Novel Method for Building Phylogenetic Trees

Phylogeny is the study of the relatedness of species. These days this is done through the computational analysis of genes in living organisms (after all, we cannot go back in time to track speciation events as they happened). The phylogeny of organisms is often depicted as phylogenetic trees, and there is a considerable literature on how best to construct such trees. Most methods take as input the sequences of a single gene or protein. That is, the same gene is found in all the species of interest, and its sequences are then compared to build the tree. The problem with this approach is that it assumes the gene is "typical", i.e. that evolutionary pressures have acted on that gene in the same way across all the species. A second problem is finding a gene that is both ubiquitous and conserved in its function, yet variable enough to differentiate the species possessing it. In this project you will create an application that extracts the data generated by an existing genome analysis application as it traverses whole bacterial chromosomes. After normalising the elements of the resulting data vectors, you will try different methods for building phylogenetic trees from the data. In other words, rather than trying to find the ideal gene around which to build a tree, this method will use all the data available in chromosomes or, by extension, entire genomes.
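As a rough illustration of the pipeline described above, the sketch below assumes that each species has already been reduced to a fixed-length numeric vector by the genome analysis application; the species names and values are hypothetical. It z-score normalises each vector element across species, measures Euclidean distances between species, and builds a tree topology by UPGMA (average-linkage clustering). UPGMA is only one of the candidate tree-building methods the project would compare, and the choice of z-scores and Euclidean distance is likewise an assumption.

```python
import math
from itertools import combinations


def normalise(vectors):
    """Z-score each vector element across species so elements on different scales weigh equally."""
    names = list(vectors)
    columns = list(zip(*(vectors[n] for n in names)))
    result = {n: [] for n in names}
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        for n, x in zip(names, col):
            # A constant element carries no information; map it to 0.
            result[n].append((x - mean) / sd if sd > 0 else 0.0)
    return result


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def upgma(vectors):
    """Return the UPGMA (average-linkage) tree topology as a Newick string, without branch lengths."""
    names = {i: name for i, name in enumerate(sorted(vectors))}
    # Active clusters: id -> (Newick subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in names.items()}
    # Pairwise distances keyed by (smaller id, larger id).
    dist = {(i, j): euclidean(vectors[names[i]], vectors[names[j]])
            for i, j in combinations(clusters, 2)}
    next_id = len(clusters)
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        i, j = min(dist, key=dist.get)
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        # Size-weighted average distance from the merged cluster to every remaining cluster.
        for k in clusters:
            dik = dist.pop((min(i, k), max(i, k)))
            djk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        del dist[(i, j)]
        clusters[next_id] = (f"({ti},{tj})", ni + nj)
        next_id += 1
    (tree, _), = clusters.values()
    return tree + ";"


# Hypothetical per-species data vectors.
example = {
    "A": [1.0, 2.0],
    "B": [1.1, 2.1],
    "C": [5.0, 0.0],
    "D": [5.2, 0.1],
}

if __name__ == "__main__":
    print(upgma(normalise(example)))  # ((A,B),(C,D));
```

UPGMA implicitly assumes roughly constant rates of change across lineages, so running neighbour-joining on the same distance matrix would be a natural comparison within the project.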

Computing the Genetic Basis for the Sit-and-Wait Hypothesis of Bacterial Pathogenicity

The Sit-and-Wait Hypothesis (Walther & Ewald 2004) relates the environmental durability of non-vector-borne microbial pathogens to their pathogenicity. Non-vector-borne pathogens, such as E. coli, are passed directly from host to host rather than being transmitted by another vector species, such as mosquitoes. (For example, Ross River Fever, seen around Perth, is mosquito-borne, while influenza is transmitted from person to person. Both, however, are caused by viruses.) In the hypothesis, durability (the ability to survive the stresses of spending a period outside a host) is, in effect, a cofactor for pathogenicity, acting in concert with the necessary presence of virulence factors. That is, without an assortment of virulence factors a microorganism is unable to colonise a host. But if the microorganism is labile, i.e. unable to withstand stresses such as desiccation, the pathogen cannot afford to be too virulent: an immobilised, sick host cannot move about and so cannot spread the infection, which will then die out.

GeneOntology (Gene Ontology Consortium 2000) is a set of three ontologies specific to molecular biology, covering biological process, molecular function and cellular component. The project will involve mapping GeneOntology (GO) terms to genes in bacterial genomes. This will be done either directly, from annotations associated with the genes when the genomes were annotated, or indirectly, via text mining for keywords extracted from the annotation using techniques similar to those employed in the Protein Annotators Assistant (Wise 2000). A third track will associate genes with particular protein families based on hidden Markov model searches.
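The indirect, text-mining route can be sketched very simply as keyword matching over each gene's free-text annotation. This is a deliberately naive stand-in, not the information retrieval method of the Protein Annotators Assistant: the GO identifiers are the ones named in this project description, but the keyword table, the gene IDs and the annotation strings below are all hypothetical.

```python
import re

# Hypothetical keyword -> GO term table. The GO terms come from the project
# description; the keywords are illustrative only.
KEYWORD_TO_GO = {
    "cold shock": "GO:0009409",     # response to cold
    "desiccation": "GO:0009414",    # response to water deprivation
    "hyperosmotic": "GO:0042538",   # hyperosmotic salinity response
    "salt stress": "GO:0042538",
}


def go_terms_for(annotation):
    """Return the set of GO terms whose keywords occur as whole words in an annotation."""
    text = annotation.lower()
    return {
        go_id
        for keyword, go_id in KEYWORD_TO_GO.items()
        if re.search(r"\b" + re.escape(keyword) + r"\b", text)
    }


def map_genes(annotations):
    """Map each gene ID to the GO terms inferred from its annotation text."""
    return {gene: go_terms_for(text) for gene, text in annotations.items()}


# Hypothetical gene IDs and annotation strings.
annotations = {
    "gene_001": "Cold shock protein, major",
    "gene_002": "ABC transporter, induced under hyperosmotic stress",
    "gene_003": "DNA gyrase subunit A",
}

if __name__ == "__main__":
    for gene, terms in sorted(map_genes(annotations).items()):
        print(gene, sorted(terms))
```

A real implementation would weight terms and rank candidate GO terms rather than relying on exact phrase hits, but the output has the same shape: a set of GO terms per gene.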

Term extraction will focus on the GeneOntology biological process terms related to tolerance of abiotic stress. Examples might include GO:0042538 (hyperosmotic salinity response), GO:0009414 (response to water deprivation) and GO:0009409 (response to cold). In the hierarchical GeneOntology system, all these terms come under the parent terms GO:0009628 (response to abiotic stimulus) and GO:0006950 (response to stress). The sets of genes related to tolerance of abiotic stress from the different bacterial species will then be correlated with information from the literature about the durability and known pathology of each species, much as was reported in Walther and Ewald (2004), except that in this case durability will be inferred from the presence of the genes thought to be most closely related to tolerance of abiotic stress.
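The sketch below illustrates the two steps in the paragraph above: deciding whether a gene's GO term falls under both parent terms by walking up the hierarchy, and then correlating each species' count of such genes with a durability score. The parent table is a simplified, hypothetical fragment that links each example term directly to the two parents (the real ontology has intermediate terms and should be loaded from a GO release), and Spearman rank correlation is an assumed choice of correlation measure, not one prescribed by the project.

```python
from collections import deque

# Simplified, hypothetical fragment of the GO hierarchy: child -> parents.
# The real ontology has intermediate terms between these.
PARENTS = {
    "GO:0042538": ["GO:0009628", "GO:0006950"],  # hyperosmotic salinity response
    "GO:0009414": ["GO:0009628", "GO:0006950"],  # response to water deprivation
    "GO:0009409": ["GO:0009628", "GO:0006950"],  # response to cold
}
ABIOTIC_STRESS_PARENTS = {"GO:0009628", "GO:0006950"}


def ancestors(term):
    """All terms reachable from `term` by following parent links (breadth-first)."""
    seen, queue = set(), deque([term])
    while queue:
        for parent in PARENTS.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def is_abiotic_stress_term(term):
    # A term qualifies if both parent terms lie among its ancestors.
    return ABIOTIC_STRESS_PARENTS <= ancestors(term)


def stress_gene_count(gene_terms):
    """Count genes annotated with at least one qualifying GO term."""
    return sum(any(is_abiotic_stress_term(t) for t in terms) for terms in gene_terms.values())


def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the tie-averaged ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

In use, each species would contribute one count from `stress_gene_count` and one durability score from the literature, and `spearman(counts, durability)` would summarise how strongly the two agree; a rank-based measure is used here because literature-derived durability scores are likely to be ordinal rather than precise.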

References and Readings
1. Gene Ontology Consortium (2000) Gene Ontology: tool for the unification of biology. Nature Genetics, 25, 25-29.
2. Walther, B. A. and Ewald, P. W. (2004) Pathogen Survival in the External Environment and the Evolution of Virulence. Biological Reviews, 79, 849-869.
3. Wise, M. J. (2000) Protein Annotators Assistant: A Novel Application of Information Retrieval Techniques. Journal of the American Society for Information Science (JASIS), 51(12), 1131-1136, John Wiley.