Continuing the discussion from Technical measures to improve/incentivise better data quality:
The Open Ag team has learned that focusing on individual fields/elements/attributes (“you’re missing this, you’re missing that”) is the quickest way to get data publishers to tune out. In our conversations with donors, it has been much more effective to focus on data use cases or themes: how’s your results data? How’s your location data? This is how we intend to frame any data validation tools that we develop or fund.
@YohannaLoucheur took it a step further (as she does!) by proposing that we generate a small set of questions, ~10, where each one covers multiple elements/attributes that together enable data users to accomplish something useful. Here are her illustrative suggestions:
- How much will the publisher’s operational projects disburse in country x in the next 12 months? (this requires planned disbursements broken down by quarter)
- Which national NGOs are involved in the delivery of the publisher’s activities? (this requires all the details on the implementing partner)
- Has the publisher implemented projects in support of the following sectors (work needed to create a list of sectors where 5-digit codes are necessary, e.g. primary education, basic health infrastructure)? (this would test the use of 5-digit DAC codes instead of 3-digit)
- Has the publisher implemented projects at the district/village/something level (choosing a relevant admin 3 level)? (this would test geographic data)
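To make the idea concrete for anyone working on validation tools, here is a minimal sketch of how one of these questions could become an automated check. It assumes IATI 2.0x conventions, where `<sector vocabulary="1">` carries 5-digit DAC purpose codes and `vocabulary="2"` carries 3-digit category codes; the sample XML is illustrative, not real publisher data.

```python
# Sketch: turning "does the publisher use 5-digit DAC codes?" into a check.
# Element names follow the IATI activity standard; sample data is invented.
import xml.etree.ElementTree as ET

SAMPLE = """
<iati-activities>
  <iati-activity>
    <sector vocabulary="1" code="11220"/>
  </iati-activity>
  <iati-activity>
    <sector vocabulary="2" code="112"/>
  </iati-activity>
</iati-activities>
"""

def uses_five_digit_dac(activity):
    """True if the activity declares at least one 5-digit DAC sector code."""
    return any(
        s.get("vocabulary", "1") == "1" and len(s.get("code", "")) == 5
        for s in activity.findall("sector")
    )

root = ET.fromstring(SAMPLE)
results = [uses_five_digit_dac(a) for a in root.findall("iati-activity")]
print(results)  # [True, False]
```

The same pattern (question, then the specific elements/attributes it requires) would apply to the disbursement, implementing-partner, and location questions above, each with its own predicate over the activity XML.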
If we can get the right set of questions, it would be of practical use to those working on validation tools, and more generally as questions we can ask donors, publishers, our colleagues, and ourselves about data quality without sounding too pushy.
So we need to do some joint thinking. I’ve started a Google Doc for that purpose. If you would like to join this effort, raise your hand in the comments or jump into the doc and start ideating!