"A neurogenetic agent system for text categorization and feature select" by Edmund Szu-li Yu

Electrical Engineering and Computer Science - Dissertations

Title

A neurogenetic agent system for text categorization and feature selection

Author

Edmund Szu-li Yu, Syracuse University

Date of Award

2001

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Electrical Engineering and Computer Science

Keywords

Intelligent agents, Neurogenetic, Text categorization, Feature selection

Subject Categories

Computer Sciences | Physical Sciences and Mathematics

Abstract

Text categorization is the problem of automatically assigning predefined categories to natural language texts. A major difficulty of this problem stems from the high dimensionality of its feature space, which normally consists of tens or even hundreds of thousands of unique words and phrases. It would be a formidable task to train a neural net with so many inputs. Hence, reducing the dimensionality without sacrificing accuracy is of great importance if neural networks are to be applied successfully to text categorization. In our research into neurogenetic algorithms, a genetic algorithm with a special crossover operator is used to search the space of all possible subsets of those words. For each candidate feature subset, a 3-layer feedforward neural net is trained on that subset, and its performance is used as one factor in determining the subset's fitness. The other factor is what we call 'parsimony,' a measure of how many features we have managed to 'save.' The Baldwin effect, which concerns the tradeoffs between learning and evolution, is used to guide and improve the evolution of the feature subsets. Due to the nature of this 'wrapper' approach, our system is also a complete, fully functional text categorization system. Experimental results on the de facto benchmark data, the Reuters 22173 data set, show that its performance can surpass that of neural-net-based systems while using far fewer features. Most recent research in machine learning relies on the wrapper approach rather than the filter approach, and our neurogenetic approach can be considered the first attempt at exploring the wrapper approach in text categorization. Furthermore, our neurogenetic system has also been applied to text mining on the Web and combined with agent technology for the Internet. The resulting system is a collaborative multi-agent system that can automatically classify and filter new or existing Web pages and Web sites, and mine numerical online data. Benchmarks of these aspects of the system also show promise.
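The wrapper search described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the dissertation's method: the linear weighting of accuracy and parsimony, the generic uniform crossover (standing in for the special crossover operator), the truncation selection, and the `toy_accuracy` function, which replaces the training of a 3-layer feedforward net on each subset. The Baldwin effect is not modeled.

```python
import random


def parsimony(mask):
    """Fraction of features left out of the subset (the features 'saved')."""
    return 1.0 - sum(mask) / len(mask)


def fitness(mask, accuracy, weight=0.8):
    """Hypothetical blend of classifier accuracy and parsimony.

    The abstract names these two factors but not how they are combined;
    the linear form and the default weight are assumptions.
    """
    return weight * accuracy + (1.0 - weight) * parsimony(mask)


def uniform_crossover(a, b, rng):
    """Generic uniform crossover, a stand-in for the special operator."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(mask, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]


def toy_accuracy(mask, informative=(0, 1, 2)):
    """Stand-in evaluator: share of 'informative' features kept.

    In a real wrapper this would train a classifier on the selected
    features and return its held-out performance.
    """
    return sum(mask[i] for i in informative) / len(informative)


def evolve(n_features, evaluate, pop_size=20, generations=30, seed=0):
    """Search bit-mask feature subsets with a simple elitist GA."""
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(n_features)]
        for _ in range(pop_size)
    ]

    def score(mask):
        return fitness(mask, evaluate(mask))

    for _ in range(generations):
        # Keep the better half unchanged, refill with mutated offspring.
        parents = sorted(population, key=score, reverse=True)[: pop_size // 2]
        children = [
            mutate(
                uniform_crossover(rng.choice(parents), rng.choice(parents), rng),
                0.02,
                rng,
            )
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=score)


best = evolve(10, toy_accuracy)
```

Each candidate is a list of 0/1 flags, one per feature. Because the evaluator is called inside the fitness function, the classifier itself drives the search, which is what distinguishes a wrapper from a filter that ranks features independently of the classifier.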

Access

Surface provides description only. Full text is available to ProQuest subscribers. Ask your Librarian for assistance.

Recommended Citation

Yu, Edmund Szu-li, "A neurogenetic agent system for text categorization and feature selection" (2001). Electrical Engineering and Computer Science - Dissertations. 113.
https://surface.syr.edu/eecs_etd/113

http://libezproxy.syr.edu/login?url=http://proquest.umi.com/pqdweb?did=726042711&sid=5&Fmt=2&clientId=3739&RQT=309&VName=PQD

Link to Full Text
