Date of Award

June 2015

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

School of Information Studies

Advisor(s)

Jian Qin

Keywords

Cyberinfrastructure, Data Reuse, eScience, Open Science, Research Data, Scholarly Communication

Subject Categories

Social and Behavioral Sciences

Abstract

The development of e-Research infrastructure has enabled data to be shared and accessed more openly. Policy mandates for data sharing have contributed to the increasing availability of research data through data repositories, which create favorable conditions for the reuse of data for purposes not always anticipated by original collectors. Despite the current efforts to promote transparency and reproducibility in science, data reuse cannot be assumed, nor merely considered a “thrifting” activity where scientists shop around in data repositories considering only the ease of access to data.

This research was driven by three main questions: 1) What are the factors that influence scientists’ research data reuse? 2) To what degree do these factors influence scientists’ research data reuse? and 3) To what extent do scientists reuse research data? Following a sequential mixed-method approach, this study sought to provide a more nuanced view of the underlying factors that affect social scientists’ intentions to reuse data, as well as the impact of these factors on the actual reuse of data.

Findings from a preliminary small-scale exploratory study with 13 social scientists produced 25 factors that were found to influence their perceptions and experiences, including both their unsuccessful and successful attempts to reuse data. These factors were grouped into six theoretical variables: perceived benefits, perceived risks, perceived effort, social influence, facilitating conditions, and perceived reusability. The variables were articulated in a conceptual model drawing upon the Unified Theory of Acceptance and Use of Technology (UTAUT) in order to examine social scientists’ intentions and behaviors towards the reuse of research data. The proposed hierarchical component model and the research hypotheses were validated through a survey, which was distributed to 4,500 social scientists randomly selected from the Pivot/Community of Science (CoS) database.

A total of 743 social scientists participated in the survey, of which 564 cases were included in the analysis. The survey data were analyzed using the Partial Least Squares Structural Equation Modeling (PLS-SEM) technique, supplemented by ad-hoc group comparison analyses. Survey results demonstrated that social scientists’ data reuse intention and reuse behavior were indeed influenced by factors beyond frugality. More specifically, the more practical and social benefits social scientists perceived from reusing research data, the more likely they were to intend to reuse data. Similarly, peer and disciplinary influence had a positive effect on social scientists’ intention to reuse data collected or produced by others. In contrast, perceived risks were found to negatively influence social scientists’ intention to reuse existing research data collected by others. Facilitating conditions and intention to reuse were found to correlate positively with actual data reuse behavior. Perceived effort was not found to be statistically significant, indicating that reusing data from others did not involve as much effort as collecting or producing primary data. Perceived reusability could not be measured because the construct lacked convergent validity. Ad-hoc group comparison tests found that intention and data reuse behavior depended on sub-disciplines’ traditions and the methodological approach social scientists followed.

The findings of this research provide an in-depth understanding of the reuse of research data in the context of open science, and identify a collection of factors that influence social scientists’ decisions to reuse research data collected by others. Additionally, they update our knowledge of data reuse behavior and contribute to the body of data reuse literature by establishing a conceptual model that can be validated by future research. In terms of practice, they offer recommendations for policy makers, data scientists, and stakeholders from data repositories on defining strategies and initiatives to leverage data reuse and make publicly available data more actionable.

Access

Open Access
