Interpretation methods for complex and low quality DNA profiles

Date of Award

December 2018

Degree Type


Degree Name

Doctor of Philosophy (PhD)




Advisor(s)

Roy D. Welch


Keywords

capillary electrophoresis, DEPArray, DNA interpretation, DNA mixtures, forensic DNA, machine learning

Subject Categories

Physical Sciences and Mathematics


DNA has long been the “gold standard” for the identification of organisms. It plays a central role in revealing the past by answering the questions who, what, where and when. A priori knowledge of the source of the DNA from a specimen can yield answers that relate to evolution, migration, the spread of disease, familial relationships and criminal activity.

A significant challenge across the biological sciences is the interpretation of DNA profiles generated from non-pristine or complex samples, namely when samples are of low quality, low copy number, or contain multiple contributors. This is of particular concern in subfields focusing on ancient DNA, forensic DNA and processed food DNA, where non-pristine DNA samples are commonplace. Often, the interpretational complexity of these sample types precludes high-confidence conclusions. However, it is possible to decrease interpretational difficulty through: (1) wet-bench procedures that maximize yield and enrich for the DNA targets of interest and (2) implementation of computationally advanced software solutions that intelligently reduce complexity and increase the efficiency of analyses.

The work presented herein represents a multifaceted approach to the analysis of complex DNA samples through four distinct yet contextually related projects:

(1) Improvements in DNA isolation methods can significantly decrease the complexity of downstream analyses. The first section describes a method to isolate opium poppy (Papaver somniferum L.) DNA from heroin. The method uses a combination of commercial DNA extraction methods, post-extraction purification and a dual-stage, dual-polymerase PCR amplification. DNA was successfully isolated from raw and cooked opium and from black tar, white powder and brown powder heroin.

(2) The second section describes a pre-extraction cell targeting and collection method for sexual assault samples using the DEPArray™ instrument. We demonstrate that DEPArray™-mediated cell targeting leads to single-source male profiles in 96.2% of mixed samples (sperm and epithelial cell), whereas single-source profiles were obtained in only 32.2% of samples using traditional differential extraction methods. We further demonstrate that single sperm cell analysis is possible using the DEPArray™ paired with standard PCR techniques used in forensics.

(3) Process-related components in the laboratory workflow significantly impact the interpretational complexity of DNA profiles. Artifacts introduced during amplification and detection procedures may preclude reliable conclusions regarding individual donors or the number of donors in a DNA profile. We introduce methods to maximize the information content of capillary electrophoresis-based fragment data through statistical and machine learning-based modeling. The system imposes a locus-sample-specific analytical threshold for peak calling, followed by stutter peak correction and a machine-learning algorithm that reduces aberrant signal. We observed an allele detection accuracy of 97.2%, with a less than 1% incidence of misidentified artifacts, whereas the use of a 50 RFU threshold yields an accuracy of only 85.8%.

(4) The objective of forensic DNA mixture analyses is the separation (deconvolution) of the individual donors. Accurate deconvolution relies on the correct estimation of the number of contributors (NOC) in a sample. This prediction can be fraught with complexity due to low levels of DNA, disparate proportions of contributors, and high levels of allele sharing. Standard methods of NOC prediction often underestimate the NOC, particularly in high-order mixtures. We have developed a probabilistic machine learning-based method that is over 98% accurate in predicting the number of contributors in samples with one to four contributors. This method outperforms standard maximum allele count methods with over 6% improvement in the classification of three-contributor samples and greater than 20% improvement in classification of NOC in four-contributor samples.
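For readers unfamiliar with the maximum allele count baseline mentioned above, the following is a minimal sketch of the rule as it is commonly described: each diploid contributor can add at most two distinct alleles per locus, so the minimum NOC is the largest per-locus allele count divided by two, rounded up. The abstract does not give the details of the baseline used in this work, so the function name `mac_noc`, the input layout and the example allele labels are all illustrative assumptions, not material from the dissertation.

```python
from math import ceil


def mac_noc(profile):
    """Minimum number of contributors by the maximum allele count rule.

    `profile` maps a locus name to the allele labels detected at that locus.
    This is a generic sketch of the rule, not the dissertation's code.
    """
    # Count distinct alleles per locus and keep the largest count.
    max_alleles = max(
        (len(set(alleles)) for alleles in profile.values()),
        default=0,
    )
    # Each diploid contributor explains at most two alleles per locus.
    return ceil(max_alleles / 2)


# Hypothetical profile: allele labels are made up for illustration.
example_profile = {
    "D3S1358": ["14", "15", "16", "17"],
    "vWA": ["16", "17", "18"],
    "FGA": ["20", "21", "22", "23", "24"],
}

print(mac_noc(example_profile))  # 3, because FGA shows five distinct alleles
```

Because the rule only gives a lower bound, it returns too few contributors whenever donors share alleles or alleles drop out, which is consistent with the underestimation in high-order mixtures noted above.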


SURFACE provides description only. Full text may be available to ProQuest subscribers. Please ask your Librarian for assistance.
