Document Type

Conference Document

Date

10-2010

Keywords

Computational social science, evaluation, corpora, text categorization, topic clustering

Language

English

Disciplines

Library and Information Science | Linguistics

Description/Abstract

In this poster we describe a pilot study of searching the social science literature for legacy corpora that can be used to evaluate text mining algorithms. The emerging field of computational social science demands large amounts of social science data to train and evaluate computational models. We argue that legacy corpora annotated by social science researchers through traditional Qualitative Data Analysis (QDA) are ideal data sets for evaluating text mining methods such as text categorization and clustering. As a pilot study, we searched for articles involving content analysis and discourse analysis in leading communication journals, and then contacted the authors about the availability of the annotated texts. Regrettably, nearly all of the corpora that we found were not adequately maintained, and many were no longer available, even though they were less than ten years old. This situation calls for more effort to better maintain and use legacy social science data for future computational social science research.

Recommended Citation

Yu, B. and Ku, M. (2010). Collecting legacy corpora from social science research for text mining evaluation. ASIST 2010 Annual Meeting, Pittsburgh, PA, October 22-27, 2010.

Creative Commons License

This work is licensed under a Creative Commons Attribution 3.0 License.

Included in

Library and Information Science Commons, Linguistics Commons

School of Information Studies - Faculty Scholarship

Collecting Legacy Corpora from Social Science Research for Text Mining Evaluation
