Document Type

Conference Document

Date

4-2010

Embargo Period

11-22-2010

Keywords

language, linguistics, natural language processing

Language

English

Disciplines

Library and Information Science | Linguistics

Description/Abstract

This paper presents an analysis of the legislative speech records from the 101st-108th U.S. Congresses using machine learning and natural language processing methods. We use word vectors to represent the speeches in both the Senate and the House, and then use text categorization methods to classify the speakers by their ideological positions. The classification accuracy indicates the level of distinction between the liberal and the conservative ideologies. Our experimental results demonstrate increasing partisanship in Congress between 1989 and 2006. Ideology classifiers trained on the House speeches predict the Senators' ideological positions well (House-to-Senate prediction); however, Senate-to-House prediction is less successful. Our results provide evidence for a long-term increase in partisanship in both chambers, with the House consistently more ideologically divided than the Senate.
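
The cross-chamber setup described in the abstract can be illustrated with a minimal sketch. The abstract does not name the classifier or the exact vector weighting, so this example assumes plain bag-of-words counts and swaps in a small multinomial Naive Bayes classifier as a stand-in. The speeches, labels, and function names are invented for illustration and are not drawn from the Congressional record or from the paper's code.

```python
import math
from collections import Counter


def word_vector(speech):
    """Represent one speaker's text as a bag-of-words count vector."""
    return Counter(speech.lower().split())


def train_naive_bayes(vectors, labels):
    """Fit a multinomial Naive Bayes model with add-one smoothing.

    Assumption: a stand-in classifier, not the method used in the paper.
    """
    vocab = set()
    for vec in vectors:
        vocab.update(vec)
    word_counts = {label: Counter() for label in set(labels)}
    for vec, label in zip(vectors, labels):
        word_counts[label].update(vec)
    doc_counts = Counter(labels)
    model = {"vocab": vocab, "log_prior": {}, "log_like": {}}
    for label, counts in word_counts.items():
        total = sum(counts.values()) + len(vocab)
        model["log_prior"][label] = math.log(doc_counts[label] / len(labels))
        model["log_like"][label] = {
            word: math.log((counts[word] + 1) / total) for word in vocab
        }
    return model


def predict(model, vector):
    """Return the label with the highest log-posterior score."""
    scores = {}
    for label, prior in model["log_prior"].items():
        score = prior
        for word, count in vector.items():
            if word in model["vocab"]:  # skip words unseen in training
                score += count * model["log_like"][label][word]
        scores[label] = score
    return max(scores, key=scores.get)


def accuracy(model, vectors, labels):
    """Fraction of speakers whose predicted label matches the true label."""
    correct = sum(predict(model, v) == y for v, y in zip(vectors, labels))
    return correct / len(labels)


# Hypothetical toy data: one short "speech" per speaker.
house = [
    ("cut taxes and reduce spending", "conservative"),
    ("protect workers and expand health care", "liberal"),
    ("lower taxes strengthen defense", "conservative"),
    ("expand education funding protect environment", "liberal"),
]
senate = [
    ("reduce taxes and spending", "conservative"),
    ("expand health care for workers", "liberal"),
]

house_vectors = [word_vector(text) for text, _ in house]
house_labels = [label for _, label in house]
senate_vectors = [word_vector(text) for text, _ in senate]
senate_labels = [label for _, label in senate]

# House-to-Senate prediction: train on one chamber, evaluate on the other.
model = train_naive_bayes(house_vectors, house_labels)
house_to_senate = accuracy(model, senate_vectors, senate_labels)
print("House-to-Senate accuracy:", house_to_senate)
```

Under this reading, higher cross-chamber accuracy means the vocabulary separating the two ideologies in the training chamber carries over to the other chamber. Swapping the roles of `house` and `senate` gives the Senate-to-House direction.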