PhD Project

Supervisors: Prof Amitava Datta and Prof Michael Wise

RNA pseudoknots: a challenging twist

There are two types of nucleic acids in the living cell: deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). For nearly a century, RNA was seen solely as the passive carrier of genetic information from DNA to proteins. In the 1980s, the field was shaken by the discovery that RNA can act as a catalyst. Numerous non-coding RNAs have since been discovered, and the prevailing view among scientists is that RNA is an extremely versatile molecule whose range of functions and structures rivals that of proteins.

Because laboratory techniques for determining RNA structure are intricate, biologists need computational structure prediction methods. Several robust and efficient algorithms for RNA secondary structure prediction are available, but for ease of computation they neglect crossing structure elements, so-called pseudoknots. Over the last decade, however, pseudoknots have turned out to be of great biological relevance: they perform essential functions as part of the cellular transcription machinery, in regulatory processes and in viral replication. Consequently, prediction of these functional units in RNA sequences has important implications for antiviral drug design.
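To make "crossing structure elements" concrete, the following minimal Python sketch treats a structure as a list of base pairs (i, j) and reports every two pairs that cross, i.e. that satisfy i < k < j < l. The function name `crossing_pairs` and the example coordinates are illustrative assumptions, not part of any existing tool.

```python
from itertools import combinations

def crossing_pairs(pairs):
    """Return every two base pairs (i, j), (k, l) with i < k < j < l.

    A structure is pseudoknot-free exactly when this list is empty.
    """
    # Normalise each pair to (5' index, 3' index) and sort by 5' index.
    normalised = sorted((min(a, b), max(a, b)) for a, b in pairs)
    return [
        (p, q)
        for p, q in combinations(normalised, 2)
        if p[0] < q[0] < p[1] < q[1]
    ]

# A plain hairpin: all pairs are nested, so nothing crosses.
nested = [(0, 9), (1, 8), (2, 7)]

# Two stems whose strands interleave (hypothetical coordinates).
knotted = [(0, 10), (1, 9), (5, 15), (6, 14)]

print(crossing_pairs(nested))
print(len(crossing_pairs(knotted)))
```

Algorithms that exclude pseudoknots can assume the first list is always empty, which is what allows the usual nested dynamic programming recursions.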

From a theoretical point of view, general pseudoknot prediction is hard: it has been shown to be an NP-complete problem. In practice, most methods reported in the literature suffer from long running times and low accuracy on longer sequences.

Figure 1: Three different views of a simple hairpin type RNA pseudoknot.

Proposed Study

Due to their biological relevance and abundance, pseudoknots should no longer be neglected in RNA structure prediction for ease of computation. Demand for structure prediction that includes pseudoknots is very high; however, most existing methods are inefficient and unreliable for longer sequences. This study proposes two main points.

First, the majority of algorithms incorporate restricted types of pseudoknots into an algorithmic framework designed for secondary structure prediction. Generally, this leads to long running times and the use of a simplified energy model. In contrast, this study proposes a different and much more efficient heuristic route: identify pseudoknot candidates as a first step and subsequently verify them against a sophisticated energy model. After pseudoknot detection, the remaining sequence can be folded using state-of-the-art secondary structure prediction methods.
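The detect-then-verify route can be sketched in Python as follows. This is only an illustration under stated assumptions: stems are assumed to be supplied by some upstream stem finder as (i, j, length) triples, candidates are restricted to simple hairpin-type (H-type) arrangements, and `toy_energy` is a placeholder that merely counts base pairs; it is not the energy model proposed in this study. All function names are hypothetical.

```python
from typing import Callable, List, Tuple

# (i, j, length): bases i..i+length-1 pair with bases j..j-length+1
Stem = Tuple[int, int, int]

def crosses_h_type(s1: Stem, s2: Stem) -> bool:
    """True if the strands are ordered s1-5', s2-5', s1-3', s2-3'."""
    i1, j1, l1 = s1
    i2, j2, l2 = s2
    return (i1 + l1 - 1 < i2             # s1 5' strand ends before s2 5' strand
            and i2 + l2 - 1 < j1 - l1 + 1  # s2 5' strand ends before s1 3' strand
            and j1 < j2 - l2 + 1)          # s1 3' strand ends before s2 3' strand

def detect_candidates(stems: List[Stem]) -> List[Tuple[Stem, Stem]]:
    """Step 1: enumerate stem combinations that form an H-type candidate."""
    return [(a, b) for a in stems for b in stems if crosses_h_type(a, b)]

def verify(candidates, energy_fn: Callable[[Stem, Stem], float], threshold: float):
    """Step 2: keep candidates whose energy is at or below the threshold."""
    scored = [(energy_fn(a, b), a, b) for a, b in candidates]
    return sorted(c for c in scored if c[0] <= threshold)

def remaining_segments(n: int, accepted) -> List[Tuple[int, int]]:
    """Step 3: inclusive intervals left over for ordinary secondary structure folding."""
    covered = sorted((a[0], b[1]) for _, a, b in accepted)
    segments, pos = [], 0
    for start, end in covered:
        if start > pos:
            segments.append((pos, start - 1))
        pos = max(pos, end + 1)
    if pos < n:
        segments.append((pos, n - 1))
    return segments

def toy_energy(a: Stem, b: Stem) -> float:
    # Placeholder only: -1 per base pair, no loop terms at all.
    return -float(a[2] + b[2])

stems = [(2, 20, 4), (10, 28, 4), (32, 45, 5)]  # hypothetical stem finder output
accepted = verify(detect_candidates(stems), toy_energy, threshold=-6.0)
segments = remaining_segments(50, accepted)
print(accepted)
print(segments)
```

Keeping the energy function as a parameter reflects the separation argued for above: candidate detection stays cheap, while the expensive model is evaluated only on the few candidates that survive.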

Second, additive energy models are used successfully in RNA secondary structure prediction. For pseudoknots, however, this principle fails: a sophisticated pseudoknot energy model has to take stem-loop correlations into account. This is crucial for successful prediction, yet there is no method in the literature that efficiently employs such a model. The main goal of this study is to incorporate an advanced pseudoknot energy model, resulting in a rapid and reliable pseudoknot prediction method.
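The contrast between the two kinds of model can be written schematically as below. The notation is an illustrative assumption rather than the formulation of any particular paper: S_1 and S_2 denote the two stems of an H-type pseudoknot, L_1 and L_2 its loops, and T the temperature.

```latex
% Additive model: every stem and loop contributes independently.
\Delta G_{\text{additive}}
  = \sum_{h \,\in\, \text{stems}} \Delta G_{\text{stem}}(h)
  + \sum_{\ell \,\in\, \text{loops}} \Delta G_{\text{loop}}(\ell)

% Schematic pseudoknot model: the loop entropy depends on the stems
% the loops span, so the loop term no longer separates into a sum.
\Delta G_{\text{pk}}
  = \Delta G_{\text{stem}}(S_1) + \Delta G_{\text{stem}}(S_2)
  - T \, \Delta S_{\text{loop}}(L_1, L_2 \mid S_1, S_2)
```

In the second expression the loop term cannot be tabulated per loop in isolation, which is the sense in which additivity fails for pseudoknots.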

From a practical point of view, the goal of this project is to search a wide range of viral genomes for pseudoknots and to record the locations of those found. As structure is strongly related to function, computational prediction can provide the basis for laboratory experiments on detected pseudoknots. Most methods in the literature are tested only on short sequence stretches containing known pseudoknots. In contrast, the benefit of this approach will be demonstrated by applying it to long sequences and ultimately to viral genomes.
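One simple way to apply a detector to a long sequence and record locations is to slide overlapping windows along the genome and translate each hit back to genome coordinates. The sketch below assumes exactly that; the window size, the step, and the `toy_detector` (which only searches for a fixed motif) are hypothetical stand-ins for a real pseudoknot detector.

```python
from typing import Callable, Iterator, List, Tuple

Hit = Tuple[int, int]  # half-open (start, end) interval

def windows(n: int, size: int, step: int) -> Iterator[Hit]:
    """Yield half-open (start, end) windows covering a sequence of length n."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    start = 0
    while True:
        end = min(start + size, n)
        yield start, end
        if end == n:
            break
        start += step

def scan(seq: str, detector: Callable[[str], List[Hit]],
         size: int = 200, step: int = 100) -> List[Hit]:
    """Run the detector on each window and report hits in genome coordinates."""
    hits = set()  # overlapping windows can report the same hit twice
    for start, end in windows(len(seq), size, step):
        for a, b in detector(seq[start:end]):
            hits.add((start + a, start + b))
    return sorted(hits)

def toy_detector(window: str) -> List[Hit]:
    # Placeholder: reports occurrences of a fixed motif, not real pseudoknots.
    motif = "GGGAAACCC"
    found, pos = [], window.find(motif)
    while pos != -1:
        found.append((pos, pos + len(motif)))
        pos = window.find(motif, pos + 1)
    return found

genome = "A" * 150 + "GGGAAACCC" + "A" * 300  # synthetic example sequence
locations = scan(genome, toy_detector)
print(locations)
```

The step should not exceed the window size minus the longest structure of interest; otherwise a pseudoknot that straddles a window boundary can be missed.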


Selected References

Biological Background

  • Brierley, I., Pennell, S., and Gilbert, R.J. (2007). Viral RNA pseudoknots: versatile motifs in gene expression and replication. Nat Rev Microbiol, 5 (8): 598-610.

  • Staple, D.W. and Butcher, S.E. (2005). Pseudoknots: RNA structures with diverse functions. PLoS Biol, 3 (6): e213.

Pseudoknot Detection

  • Sperschneider, J. and Datta, A. (2008). KnotSeeker: Heuristic pseudoknot detection in long RNA sequences. RNA, 14 (4): 630-640.

  • Huang, C.H., Lu, C.L., and Chiu, H.T. (2005). A heuristic approach for detecting RNA H-type pseudoknots. Bioinformatics, 21 (17): 3501-3508.

Pseudoknot Energy Model

  • Chen, S.J. (2008). RNA folding: conformational statistics, folding kinetics, and ion electrostatics. Annu Rev Biophys, 37: 197-214.

  • Cao, S. and Chen, S.J. (2006). Predicting RNA pseudoknot folding thermodynamics. Nucleic Acids Res, 34 (9): 2634-2652.

  • Gultyaev, A.P., van Batenburg, F.H., and Pleij, C.W. (1999). An approximation of loop free energy values of RNA H-pseudoknots. RNA, 5 (5): 609-617.