Title
Empirical selection of NLP-driven document representations for text categorization
Date of Award
5-2006
Degree Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Electrical Engineering and Computer Science
Advisor(s)
Can Isik
Keywords
Document representations, Text categorization, Natural language processing, Classification
Subject Categories
Computer Engineering
Abstract
Text categorization is the task of assigning predefined labels to textual documents. Current research in this field has focused on word-based document representations, called bag-of-words (BOW), combined with strong statistical learners. Few studies have explored more complex Natural Language Processing (NLP) driven representations based on phrases, proper names, and word senses, and none of them produced definitive results on the benefits of these features for text categorization problems.
This dissertation extensively studies the use of NLP-driven document representations, captured at many different levels of language processing, for text categorization, and shows that these representations improve text categorization. A methodology, called "Empirical Selection Methodology for NLP-driven document representations", was developed to select document representations for each category in the categorization problem. A highly configurable software system was developed to create document representations and carry out experiments. The methodology was tested on two widely used text categorization evaluation datasets, and the results showed that statistical learners generalize better with the help of NLP-driven document representations.
Access
Surface provides description only. Full text is available to ProQuest subscribers. Ask your Librarian for assistance.
Recommended Citation
Yilmazel, Ozgur, "Empirical selection of NLP-driven document representations for text categorization" (2006). Electrical Engineering and Computer Science - Dissertations. 33.
https://surface.syr.edu/eecs_etd/33