Document Type




Embargo Period



Statistical information access, Naïve users, Natural language expressions, Information needs, Statistical queries, Natural language processing, NLP




Library and Information Science


Naïve users need access to statistical information, but they frequently lack the understanding required to translate their information needs into the structure and vocabulary of the sites that currently provide such information. These users can, however, state quite straightforwardly in their own terms what they are looking for. One approach to serving the many citizens who need statistical information is to automatically map their natural language expressions of those needs onto the metadata structure and terminology that define and describe the content of statistical tables. To this end, we analyzed 1,000 user email queries seeking statistical information. Our aim was to better understand the dimensions of interest in naïve users’ typical statistical queries, as well as the linguistic regularities that can be captured in a statistical-query sublanguage grammar. We developed an ontology of query dimensions from this data-up analysis of the queries and extended it where necessary with values from actual tables. We then developed an NLP statistical-query sublanguage grammar that enables the system to semantically parse users’ queries and produce a template-based internal query representation. This representation can be mapped to the tables’ metadata to retrieve relevant tables, which are displayed to users with the relevant cell’s value highlighted.
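As a rough illustration of the pipeline the abstract describes (parse a query into a template, then match the template against table metadata), the following minimal Python sketch may help. It is not the authors' grammar or ontology: the dimension names (`measure`, `place`, `year`), the phrase lists, the `QueryTemplate` class, and the sample table metadata are all hypothetical, and simple phrase lookup stands in for a real sublanguage grammar.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabulary mapping users' phrases to metadata terms.
# A real system would derive these from an ontology of query dimensions.
MEASURES = {
    "unemployment rate": "unemployment_rate",
    "median income": "median_income",
}
PLACES = {"ohio": "Ohio", "new york": "New York"}


@dataclass
class QueryTemplate:
    """Template-style internal representation of a statistical query."""
    measure: Optional[str] = None
    place: Optional[str] = None
    year: Optional[int] = None


def parse_query(text: str) -> QueryTemplate:
    """Fill template slots by phrase lookup (a stand-in for grammar-based parsing)."""
    lowered = text.lower()
    template = QueryTemplate()
    for phrase, term in MEASURES.items():
        if phrase in lowered:
            template.measure = term
            break
    for phrase, term in PLACES.items():
        if phrase in lowered:
            template.place = term
            break
    year_match = re.search(r"\b(19|20)\d{2}\b", lowered)
    if year_match:
        template.year = int(year_match.group(0))
    return template


# Hypothetical table metadata; the field names are assumptions for this sketch.
TABLES = [
    {"id": "T1", "measure": "unemployment_rate",
     "places": ["Ohio", "New York"], "years": [1999, 2000]},
    {"id": "T2", "measure": "median_income",
     "places": ["Ohio"], "years": [2000]},
]


def find_tables(template: QueryTemplate, tables: list) -> list:
    """Return ids of tables whose metadata is compatible with every filled slot."""
    hits = []
    for table in tables:
        if template.measure is not None and table["measure"] != template.measure:
            continue
        if template.place is not None and template.place not in table["places"]:
            continue
        if template.year is not None and template.year not in table["years"]:
            continue
        hits.append(table["id"])
    return hits


template = parse_query("What was the unemployment rate in Ohio in 2000?")
matching_ids = find_tables(template, TABLES)
```

Unfilled slots are treated as wildcards, so an underspecified query still returns candidate tables. Locating and highlighting the specific cell would require row and column metadata that this sketch does not model.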

Creative Commons License

This work is licensed under a Creative Commons Attribution 3.0 License.