Combining machine-readable lexical resources with a principle-based parser

Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)


Information Science and Technology


Sung Myaeng


natural language, Information Systems, Linguistics

Subject Categories

Library and Information Science


This research was motivated by the premise that the ability to process unconstrained natural language text would ultimately provide information retrieval (IR) with a very useful tool. To date, most syntax-based Natural Language Processing (NLP) systems that support IR have taken one of two approaches: domain-independent syntactic processing, or syntactic and semantic processing in limited domains. The purpose of this research was to investigate an approach to domain-independent semantic processing: the combination of a principle-based parser (PBP) with a semantically enhanced machine-readable dictionary (MRD).

The parser is an implementation of Chomsky's Government-Binding (GB) theory and therefore provides complete syntactic coverage. The coverage of a parsing system is, however, ultimately a function of the size and richness of its lexicon. To provide both size and richness, the lexicon for the system was extracted from Longman's Dictionary of Contemporary English (LDOCE) and semantically enhanced using Roget's International Thesaurus.

The research investigated: (1) the impact of using an MRD as the lexicon for a PBP; (2) the automatic extraction of thematic roles from the MRD; and (3) methods to enhance those roles using Roget's.

The results show that: (1) An MRD can indeed be used with a PBP, though the larger, more ambiguous lexicon requires controls in the parser to avoid producing a large forest of candidate parse trees. With such controls, the impact of the larger lexicon becomes no greater for a PBP than for a traditional phrase structure grammar (e.g., ATN, APSG) dealing with lexical ambiguity. (2) LDOCE contains patterns in its definitions that can be exploited to determine thematic roles, a simple form of semantics. The majority of these roles were extracted using simple lexical patterns. (3) The simple thematic roles can be enhanced using semi-automatic methods. A decomposition of Roget's hierarchy allowed for a procedural mapping of the simple thematic roles to over 1000 roles with 7 levels of abstraction. It is anticipated, but not shown here, that the enhanced roles will provide an improvement in IR capabilities over the simpler thematic roles.
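
As a rough illustration of what extracting thematic roles from dictionary definitions with simple lexical patterns might look like, the following Python sketch matches a few surface patterns against definition text. The patterns, the role labels, the function name extract_roles, and the sample definitions are all hypothetical and chosen only for illustration; they are not taken from the dissertation or from LDOCE.

```python
import re

# Hypothetical pattern table: each entry pairs a surface pattern found in
# definition text with an assumed thematic role label. None of these
# patterns or labels come from the dissertation itself.
ROLE_PATTERNS = [
    (re.compile(r"\b(?:used for|for) (\w+ing)\b"), "PURPOSE"),
    (re.compile(r"\bby means of (?:an? |the )?(\w+)"), "INSTRUMENT"),
    (re.compile(r"\bwith (?:an? |the )?(\w+)"), "INSTRUMENT"),
]


def extract_roles(definition):
    """Return (role, filler) pairs for every pattern that matches the definition."""
    text = definition.lower()
    roles = []
    for pattern, role in ROLE_PATTERNS:
        for match in pattern.finditer(text):
            # group(1) is the word that fills the role, e.g. "knife"
            roles.append((role, match.group(1)))
    return roles


# Invented definition strings, for illustration only.
print(extract_roles("to cut with a knife"))
print(extract_roles("a tool used for cutting wood"))
```

A pattern table like this is easy to inspect and extend, which suits the semi-automatic flavor of the approach described above, but it would need far more patterns and disambiguation to cope with a full dictionary.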

