language, linguistics, natural language processing
Library and Information Science | Linguistics
This paper presents an analysis of the legislative speech records from the 101st-108th U.S. Congresses using machine learning and natural language processing methods. We represent the speeches in both the Senate and the House as word vectors, and then use text categorization methods to classify the speakers by their ideological positions. The classification accuracy indicates how sharply the liberal and conservative ideologies are distinguished in the speeches. Our experimental results demonstrate increasing partisanship in Congress between 1989 and 2006. Ideology classifiers trained on House speeches predict Senators' ideological positions well (House-to-Senate prediction); however, Senate-to-House prediction is less successful. Our results provide evidence for a long-term increase in partisanship in both chambers, with the House consistently more ideologically divided than the Senate.
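The cross-chamber setup described in the abstract can be illustrated with a minimal sketch. It assumes that "word vectors" means bag-of-words count vectors and swaps in a small multinomial Naive Bayes classifier purely for illustration; the abstract does not name the classifier actually used. The speeches, labels, and function names below are invented toy examples, not data or code from the study, so the resulting accuracies say nothing about the real corpus.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model on bag-of-words counts."""
    word_counts = {}              # label -> Counter of word frequencies
    doc_counts = Counter(labels)  # label -> number of training documents
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior for one document."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # words unseen in training are ignored
                score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(predict_nb(model, d) == y for d, y in zip(docs, labels))
    return hits / len(docs)


# Hypothetical toy "speeches" (invented for illustration only).
house_docs = [
    "cut taxes and shrink government spending",
    "protect workers and expand health care",
    "lower taxes for small business",
    "expand health care for working families",
]
house_labels = ["conservative", "liberal", "conservative", "liberal"]
senate_docs = ["we must cut taxes now", "health care for workers"]
senate_labels = ["conservative", "liberal"]

# Train on one chamber, evaluate on the other, in both directions.
house_to_senate = accuracy(train_nb(house_docs, house_labels),
                           senate_docs, senate_labels)
senate_to_house = accuracy(train_nb(senate_docs, senate_labels),
                           house_docs, house_labels)
print("House-to-Senate accuracy:", house_to_senate)
print("Senate-to-House accuracy:", senate_to_house)
```

In this framing, a higher accuracy means the two ideological groups use more distinguishable vocabulary, which is the sense in which the abstract treats classification accuracy as a measure of partisanship.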
Yu, B. and Diermeier, D. (2010). A longitudinal study of language and ideology in Congress. The 68th National Conference of the Midwest Political Science Association, Chicago, IL, April 2010.