Database and algorithmic applications in nucleic acid sequence, structure and NMR frequencies, and an efficient chemical depiction

Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)




Philip N. Borer


Nucleic acid, Sequence analysis, Molecular structure, Chemical shifts

Subject Categories



A total of 1760 Human Immunodeficiency Virus type 1 (HIV-1) sequences from GenBank covering the 5'-leader region (388 nucleotides long) have been analyzed. Variation at each position was obtained by parsing the output of the sequence alignment program BLAST. Comparative sequence analysis was performed, generating a total of 2218 covariances and highly favorable variances. Potential secondary structures for this region are discussed. Results from this research should be helpful for studies of viral packaging and other interactions with the 5'-leader.
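
As a rough illustration of this kind of comparative analysis, the sketch below computes per-position variation and flags covarying column pairs in a set of already aligned, equal-length sequences. It is not the method used in the dissertation: it does not parse BLAST output, it uses mutual information as a stand-in covariation score, and the function names and the threshold value are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column(alignment, i):
    """Return the residues found at column i of the alignment."""
    return [seq[i] for seq in alignment]


def position_variation(alignment):
    """Fraction of sequences that differ from the most common residue, per column."""
    n = len(alignment)
    variation = []
    for i in range(len(alignment[0])):
        counts = Counter(column(alignment, i))
        consensus_count = counts.most_common(1)[0][1]
        variation.append(1 - consensus_count / n)
    return variation


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j.

    Assumption: mutual information is used here only as an illustrative
    covariation score; the source does not state which score was used.
    """
    n = len(alignment)
    col_i, col_j = column(alignment, i), column(alignment, j)
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        mi += p_ab * log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


def covarying_pairs(alignment, threshold=0.5):
    """Column pairs whose mutual information reaches the (hypothetical) threshold."""
    length = len(alignment[0])
    return [
        (i, j)
        for i, j in combinations(range(length), 2)
        if mutual_information(alignment, i, j) >= threshold
    ]
```

On a toy alignment such as `["GACUC", "GACUC", "AACUU", "AACUU"]`, the first and last columns change together (G with C, A with U), so they are the only pair reported as covarying.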

The three-dimensional conformation of a 24-nucleotide variant of the RNA binding sequence for the coat protein of bacteriophage R17 has been analyzed using NMR, molecular dynamics, and energy minimization. The unpaired A8 residue is stacked in the stem, and the entire region from G7 to C15 in the upper stem and loop appears to be flexible. Several of these residues have a large fraction of S-puckered ribose rings. There is considerable variation in the low-energy loop conformations that satisfy the distance constraints at this preliminary level of refinement. The Shine-Dalgarno ribosome binding site is exposed, and only two apparently weak base pairs would have to break for the 16S ribosomal RNA to bind and the ribosome to initiate translation of the replicase gene.

The distribution of ¹H nuclear magnetic resonance chemical shifts of nucleic acids in double-stranded regions has been analyzed from a database consisting of 1395 non-exchangeable proton chemical shifts. A three-term expression to describe the proton chemical shift has been proposed using a nearest-neighbor approach. Values for those three terms have been determined for both sugar and base (H6 or H8) protons for each residue using singular value decomposition. The general agreement between the observed and calculated chemical shifts indicates that the expression accounts for the main effects on nucleic acid chemical shifts. Using this three-term expression, the chemical shifts of nucleic acids can be predicted for double-stranded sequences to within 0.1 ppm.
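
The abstract does not spell out the three-term expression. One plausible nearest-neighbor form, written here purely as an assumption, adds a term for the residue itself to one term for each flanking neighbor. The symbols δ₀, Δ₅′, Δ₃′, A, p, and d are illustrative names and do not come from the source.

```latex
% Assumed form: intrinsic term plus 5' and 3' nearest-neighbor terms
\delta_i^{\mathrm{calc}} = \delta_0(X_i) + \Delta_{5'}(X_{i-1}) + \Delta_{3'}(X_{i+1})

% Stacking the observed shifts into d and the unknown terms into p gives
% A p \approx d, where each row of A selects the three terms for one proton.
% With the singular value decomposition A = U \Sigma V^{\mathsf{T}},
% the minimum-norm least-squares solution is
\hat{p} = V \Sigma^{+} U^{\mathsf{T}} d
```

Here Σ⁺ is the pseudoinverse of Σ, which inverts only the nonzero singular values. This is why singular value decomposition is convenient when the columns of A are not all independent.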

An algorithm has been developed to depict two-dimensional organic structures ranging from simple chain and ring systems to fused, bridged, and caged ring systems. The algorithm combines algorithmic and template strategies to handle complicated cases, such as structures that mix chains, fused rings, and caged ring systems. The algorithm is implemented in C on both UNIX and Microsoft Windows platforms.
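
For the simplest cases mentioned above, isolated rings and unbranched chains, 2D coordinates can be generated directly from geometry. The sketch below is a minimal illustration under stated assumptions (unit bond length, regular-polygon rings, 120° zigzag chains). It is written in Python for brevity rather than the C of the original implementation, the function names are hypothetical, and it does not attempt the fused, bridged, or caged systems that the dissertation handles with templates.

```python
import math

BOND = 1.0  # assumed unit bond length, in arbitrary drawing units


def ring_coords(n, bond=BOND):
    """Place n ring atoms on a regular polygon whose edge length equals `bond`."""
    # For a regular n-gon, edge length = 2 * R * sin(pi / n).
    radius = bond / (2 * math.sin(math.pi / n))
    return [
        (radius * math.cos(2 * math.pi * k / n),
         radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def chain_coords(n, bond=BOND, angle_deg=120.0):
    """Place n chain atoms in a zigzag with the given bond angle."""
    # Each bond is tilted up or down from the x-axis by half the
    # supplement of the bond angle (30 degrees for a 120 degree angle).
    tilt = math.radians((180.0 - angle_deg) / 2)
    dx, dy = bond * math.cos(tilt), bond * math.sin(tilt)
    coords = [(0.0, 0.0)]
    for k in range(1, n):
        x, y = coords[-1]
        # Alternate the vertical direction to produce the zigzag.
        coords.append((x + dx, y + (dy if k % 2 else -dy)))
    return coords
```

Calling `ring_coords(6)` gives a hexagon suitable for a six-membered ring, and `chain_coords(4)` gives a four-atom zigzag. A full depiction program would also need to join such fragments and resolve overlaps, which this sketch leaves out.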


Surface provides description only. Full text is available to ProQuest subscribers. Ask your Librarian for assistance.