Description/Abstract

When surveys ask about race or ethnicity, a growing number of Americans select more than one category. The multiracial population now represents over 10% of the U.S. population and is the fastest-growing racial group in the country. Yet researchers routinely collapse these individuals into an “other race” category for statistical analysis, rendering specific subgroups invisible. This brief introduces CATAcode, a free software tool that helps researchers systematically explore, document, and prepare check-all-that-apply demographic data for statistical modeling. In a demonstration with over 8,000 high school students, CATAcode revealed 85 distinct racial identity combinations from just eight response options. The tool also shows how strongly coding decisions affect representation: in one dataset, standard approaches identified only 12 Native American participants, while a priority approach increased that number to 128. The analysis shows that even seemingly small methodological choices can determine whether communities are statistically invisible or present in findings.

Document Type

Research Brief

Keywords

check all that apply, demographic data, social identity, self-identification, R package, open data, open materials

Disciplines

Databases and Information Systems | Demography, Population, and Ecology | Programming Languages and Compilers | Race and Ethnicity

Date

2-3-2026

Language

English

Acknowledgements

The author used Claude (Anthropic) to assist with brainstorming, organizing, and editing this brief. The extent of use was moderate. All content was reviewed, verified, and edited by the author, who takes full responsibility for the accuracy and integrity of this work. The author thanks Alyssa Kirk and Shannon Monnat for assistance with copyediting and publication.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.
