AU Scholarly Repository

Reusable for who? Discussing "data science ready" repository design

Author

Krzton, Ali
ORCID: 0000-0001-9979-2471

Abstract

Research data repositories are intended to make data FAIR (findable, accessible, interoperable, and reusable), but implementing these principles in practical terms entails deciding how to evaluate FAIRness. Recent interpretations of interoperability and reusability, in particular, have based their metrics of implementation progress on how well data facilitates machine processes. In other words, improving the “I” and the “R” of data in a repository translates into structuring data so that automated agents can crawl and ingest it and incorporate it as seamlessly as possible into aggregate datasets. Data prepared in this way is referred to as “data science ready” research data, and some repositories are working towards this vision in consultation with computer scientists. This talk explores the implications of prioritizing the machine digestibility of research data in repository curation processes. What does this perspective imply about the relative value of datasets intended for reuse by researchers who work manually to interpret, restructure, and analyze the data? What does it imply about the fields of study that primarily produce data of that nature? There are tradeoffs in metadata structure and content when the intended “audience” of data is human rather than machine, and there are also risks in stripping research data of its context, as can easily happen when big data methodologies are employed. Finally, the talk considers potential impacts on the research ecosystem. For instance, when data is automatically scraped at scale, what happens if the researchers who made it available miss out on attribution and citation for their work? Who will have oversight over later use, and perhaps misuse, of that data? How would the original creators know if this happened, and what could they do about it? These considerations are important because design choices that can shape the future of research practice should not go forward unexamined and unchallenged.