"Everyone wants to do the model work, not the data work": Data Cascades in High-Stakes AI
Those who work with data have learned the importance of provenance, documentation, standardization, context, and metadata in maintaining the quality of datasets. Historically, these practices preserved datasets' utility for human reuse and re-examination, but in recent years the emphasis on machine-readability has grown, in part to enable their use in artificial intelligence (AI) applications. Just as those who create and maintain datasets benefit from a better understanding of how their data might be used with AI, the developers of AI systems should attend to issues affecting the data on which their models rely. Several Google researchers present this perspective in "'Everyone wants to do the model work, not the data work': Data Cascades in High-Stakes AI", a conference paper based on their qualitative study of AI practitioners (Sambasivan et al., 2021).