Description/Abstract
When surveys ask about race or ethnicity, a growing number of Americans select more than one category. The multiracial population now represents over 10% of the U.S. population and is the fastest-growing racial group in the country. Yet researchers routinely collapse these individuals into an “other race” category for statistical analysis, rendering specific subgroups invisible. This brief introduces CATAcode, a free software tool that helps researchers systematically explore, document, and prepare check-all-that-apply demographic data for statistical modeling. In a demonstration with over 8,000 high school students, CATAcode revealed 85 distinct racial identity combinations from just eight response options. The tool also shows how strongly coding decisions affect representation: in one dataset, standard approaches identified only 12 Native American participants, while a priority approach increased that number to 128. The analysis shows that even seemingly small methodological choices can mean the difference between a community being statistically invisible and being present in findings.
Document Type
Research Brief
Keywords
check all that apply, demographic data, social identity, self-identification, R package, open data, open materials
Disciplines
Databases and Information Systems | Demography, Population, and Ecology | Programming Languages and Compilers | Race and Ethnicity
Date
2-3-2026
Language
English
Acknowledgements
The author used Claude (Anthropic) to assist with brainstorming, organizing, and editing this brief. The extent of use was moderate. All content was reviewed, verified, and edited by the author, who takes full responsibility for the accuracy and integrity of this work. The author thanks Alyssa Kirk and Shannon Monnat for assistance with copyediting and publication.
Recommended Citation
Merrin, G. J. (2026). A New Tool for Handling Multiracial and Multi-Identity Data in Social Science Research. Lerner Center Population Health Research Brief Series. Research Brief #141. Accessed at: https://doi.org/10.14305/rt.lerner.2026.3.
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Included in
Databases and Information Systems Commons, Demography, Population, and Ecology Commons, Programming Languages and Compilers Commons, Race and Ethnicity Commons
