Date of Award

5-14-2023

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Communication Sciences and Disorders

Advisor(s)

Jonathan Preston

Keywords

automated mispronunciation detection, clinical speech technology, clinical trials, principles of motor learning, rhotics, speech sound disorders

Abstract

Approximately 1-2% of the American population enters adulthood with a residual speech sound disorder (RSSD) that impacts the clear pronunciation of speech sounds, most frequently /ɹ/ in fully-rhotic dialects of American English. RSSD is commonly encountered by clinicians, but traditional treatment practices have not been consistently effective in improving an individual’s clarity of /ɹ/. Unresolved RSSD may lead to a lifelong negative impact on quality of life. Recent research has shown that motor-based intervention, which involves high-intensity speech practice that adapts in difficulty, can improve production of /ɹ/ even for those who have not responded to traditional treatment. However, not all who might benefit from sufficiently intense intervention are able to receive this level of clinical service in real-world scenarios, resulting in an intervention intensity gap between the traditional intensity of available practice and the practice demonstrated to be therapeutic in recent research. Computerized therapy with automatic speech analysis might be one way to narrow the intensity gap, but no available system targets /ɹ/, and existing systems, broadly, are likely insufficient for clinical use. The three fundamental issues impacting the development of effective clinical speech technology systems have been the lack of clinically relevant speech samples for system training, limited technical descriptions of system development, and few empirical assessments of therapeutic benefit for existing tools. Each of these issues is addressed in this dissertation, which describes the development of the PERCEPT speech analysis Engine. Chapter 1 of this dissertation empirically assesses the benefit of a clinical speech technology system for RSSD in a phase II, multiple-baseline, single-case clinical trial with five participants. Participants in this study received ten sessions of artificial intelligence (AI)-assisted motor-based intervention.
Practice during nine of these ten sessions was largely conducted by computerized motor-based intervention software, Speech Motor Chaining, that was driven by perceptual predictions from the PERCEPT Engine. This combined tool is called ChainingAI. Study outcomes were derived from masked expert listener perceptual ratings of /ɹ/ produced by learners throughout no-treatment baseline, treatment, and post-treatment phases. Productions of /ɹ/ in treated stimuli were rated as having significantly more rhoticity after ChainingAI than directly before, providing efficacy evidence for ChainingAI. Separately, perceptual ratings of /ɹ/ on untreated words showed significant nonoverlap with ratings from the no-treatment baseline phase, indicating a response to the AI-assisted treatment package for three of the five participants. All five participants demonstrated statistically significant improvements in /ɹ/ from pre-treatment to post-treatment, with standardized effect sizes ranging from 0.36 to 1.6 and a group-average improvement of 30% over baseline accuracy. PERCEPT-clinician agreement (i.e., F1-score) when rating /ɹ/ in practice attempts was largely within the range of agreement seen between human clinicians for four of five participants. Exploration of survey data indicated that parents and participants largely felt that computerized intervention could positively impact service delivery for those with RSSD, most frequently mentioning hybrid models in which computerized systems facilitate at-home practice. Chapter 2 presents a series of supervised machine learning experiments evidencing the technical development of the PERCEPT-R Classifier. The goal of these experiments was to determine the acoustic features that best distinguish clinically correct and incorrect /ɹ/ in word-level audio recordings from children with RSSD, as well as to train a neural network classifier to predict how a human clinician would have rated these sounds.
All testing was done with speakers whom PERCEPT had never heard before, which is important for validly estimating accuracy for future use in therapy. To achieve these goals, formant features and Mel-frequency cepstral coefficient features were extracted from the /ɹ/ within each recorded participant utterance. Shallow and deep neural network classifiers were trained to associate input feature patterns with PERCEPT-R Corpus labels indicating human perceptual judgment of /ɹ/ (i.e., correct/fully rhotic, incorrect/derhotic). Age-and-sex-normalized formants outperformed other feature sets. In replicated experiments, the gated recurrent neural network trained on these features outperformed the participant-specific average F1-score from existing literature by 17 points (x̅ = .81, σx = .10, med = .83, n = 48). An explainability analysis indicated that the age-and-sex-normalized third formant was the most influential feature in classifier predictions, aligning with acoustic phonetic descriptions of /ɹ/. Exploration of model performance by participant age and sex did not reveal model bias with respect to these demographic variables in the current set of participants. Chapter 3 details the curation of datasets that permitted the training of the PERCEPT-R Classifier. The open-access PERCEPT Corpora contain over 36 hours of audio, comprising 125,000 syllable, word, and phrase utterances. These data come from children, adolescents, and young adults aged 6-24 with speech sound disorder and age-matched peers, and have been published in PhonBank. Sample educational exercises are included with the chapter appendices to emphasize the educational utility of this corpus. Together, the chapters of this dissertation directly confront three main hindrances to the development of clinical speech technology for RSSD impacting /ɹ/ in American English.
This work accelerates the long-term development of paradigm-shifting treatment for RSSD through clinician-supervised, AI-assisted precision treatments that adapt to a child’s specific speech patterns. Such tools may ultimately narrow the intervention intensity gap and improve therapeutic and quality of life outcomes for individuals with RSSD.
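
The abstract reports classifier performance as a participant-specific F1-score, computed only on speakers the model had not heard before. The sketch below is one way such an evaluation might be set up and is not the dissertation's actual code. The function names, the toy records, and the choice of "correct/fully rhotic" as the positive class are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, median


def f1_score(truth, predicted, positive=1):
    """F1 for the positive class; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(truth, predicted) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(truth, predicted) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def split_by_speaker(records, held_out):
    """Speaker-disjoint split: held-out speakers contribute no training data."""
    train = [r for r in records if r[0] not in held_out]
    test = [r for r in records if r[0] in held_out]
    return train, test


def per_participant_f1(records):
    """records: iterable of (participant_id, clinician_label, predicted_label)."""
    by_speaker = defaultdict(lambda: ([], []))
    for speaker, truth, predicted in records:
        by_speaker[speaker][0].append(truth)
        by_speaker[speaker][1].append(predicted)
    return {s: f1_score(t, p) for s, (t, p) in by_speaker.items()}


# Toy records invented for illustration only
# (1 = correct/fully rhotic, 0 = incorrect/derhotic).
records = [
    ("p01", 1, 1), ("p01", 1, 0), ("p01", 0, 0), ("p01", 1, 1),
    ("p02", 1, 1), ("p02", 0, 1), ("p02", 0, 0), ("p02", 1, 1),
    ("p03", 1, 1), ("p03", 0, 0),
]

train, test = split_by_speaker(records, held_out={"p01", "p02"})
scores = per_participant_f1(test)
summary = {"mean": mean(scores.values()), "median": median(scores.values())}
```

Scoring each held-out participant separately, then summarizing across participants, keeps a speaker with many recordings from dominating the estimate, which matches the abstract's reporting of a mean, standard deviation, and median across participants.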

Access

Open Access
