How to ask "How's your data quality?"

Continuing the discussion from Technical measures to improve/incentivise better data quality:

The Open Ag team has learned that focusing on fields/elements/attributes (“you’re missing this, you’re missing that”) is the quickest way to get data publishers to tune out. In our conversations with donors, it’s been much more effective to focus on data use cases or themes - how’s your results data? How’s your location data? - and that’s how we intend to frame any data validation tools we develop or fund.

@YohannaLoucheur took it a step further (as she does!) by proposing that we generate a small set of questions - around 10 - where each one covers multiple elements/attributes that together enable data users to accomplish something useful. Here are her illustrative suggestions:

  • How much will the publisher’s operational projects disburse in country X in the next 12 months? (This requires planned disbursements broken down by quarter.)
  • Which national NGOs are involved in delivering the publisher’s activities? (This requires full details on the implementing partner.)
  • Has the publisher implemented projects in support of the following sectors? (Work is needed to create a list of sectors where 5-digit codes are necessary, e.g. primary education, basic health infrastructure. This would test the use of 5-digit DAC codes instead of 3-digit ones.)
  • Has the publisher implemented projects at the district/village level (choosing a relevant admin level 3)? (This would test geographic data.)

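To make the mapping from questions to elements concrete, here's a minimal sketch of what automated checks for these four questions might look like against a single IATI 2.x activity. The element and attribute names follow the 2.x activity standard (`planned-disbursement`, `participating-org` with role 4 = Implementing, `sector` with vocabulary 1 = 5-digit DAC, `location/administrative`), but the sample XML and the exact pass/fail logic are invented for illustration - a real validator would need far more nuance.

```python
# Sketch: "can a data user answer question X?" checks on one IATI 2.x activity.
# Element names follow the IATI 2.x standard; the sample activity is invented.
import xml.etree.ElementTree as ET

SAMPLE = """
<iati-activity>
  <participating-org role="4" type="22">
    <narrative>Local Health NGO</narrative>
  </participating-org>
  <sector vocabulary="1" code="12230"/>
  <planned-disbursement>
    <period-start iso-date="2024-01-01"/>
    <period-end iso-date="2024-03-31"/>
    <value currency="USD" value-date="2024-01-01">50000</value>
  </planned-disbursement>
  <location>
    <administrative vocabulary="G1" level="3" code="1453782"/>
  </location>
</iati-activity>
"""

def has_period_planned_disbursements(activity):
    """Q1: planned disbursements with explicit periods.
    (Does not yet verify that the periods are actually quarterly.)"""
    return any(pd.find("period-start") is not None and pd.find("period-end") is not None
               for pd in activity.findall("planned-disbursement"))

def implementing_partners(activity):
    """Q2: names of implementing organisations (role 4 in the 2.x codelist)."""
    return [org.findtext("narrative")
            for org in activity.findall("participating-org[@role='4']")]

def uses_5digit_dac(activity):
    """Q3: at least one sector coded with the 5-digit DAC vocabulary (vocabulary 1)."""
    return any(len(s.get("code", "")) == 5
               for s in activity.findall("sector[@vocabulary='1']"))

def has_admin3_location(activity):
    """Q4: a location with administrative level 3 or finer."""
    return any(int(adm.get("level", "0")) >= 3
               for adm in activity.iter("administrative"))

activity = ET.fromstring(SAMPLE)
```

The point of the sketch is the framing: each user-facing question compiles down to a small bundle of element/attribute checks, which is exactly what a validation tool could report on.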
If we can get the right set of questions, it would be of practical use to those working on validation tools, and more generally as something we can ask donors, publishers, our colleagues, and ourselves about data quality without sounding too pushy.

So we need to do some joint thinking. I’ve started a Google Doc for that purpose. If you would like to join this effort, raise your hand in the comments or jump into the doc and start ideating!

Thanks for starting this thread @reidmporter, and count me in!

I forgot a key dimension in my initial suggestions:

  • Can recipient country officials understand the project titles and descriptions of the publisher’s activities? (i.e. are they in an official language of the country?)

I’ll add it to the Google Doc, but also wanted to share here to perhaps help generate more ideas.
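This language dimension is also checkable in principle: IATI narratives carry `xml:lang` attributes, so a tool could compare them against a country's official languages. A minimal sketch, where the country-to-language mapping is an assumed placeholder and activities without explicit `xml:lang` (which fall back to the activity's default language) are not handled:

```python
# Sketch: are title narratives tagged with an official language of the
# recipient country? The OFFICIAL_LANGS mapping is illustrative only.
import xml.etree.ElementTree as ET

# xml: is a predefined namespace prefix, expanded by the parser as follows.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

OFFICIAL_LANGS = {"SN": {"fr"}, "KE": {"en", "sw"}}  # assumed, not authoritative

def title_in_official_language(activity_xml, country_code):
    """True if any title narrative is tagged with an official language."""
    activity = ET.fromstring(activity_xml)
    langs = OFFICIAL_LANGS.get(country_code, set())
    return any(n.get(XML_LANG) in langs
               for n in activity.findall("title/narrative"))

sample = ('<iati-activity><title>'
          '<narrative xml:lang="fr">Projet de sant\u00e9</narrative>'
          '</title></iati-activity>')
```

A tag check like this only shows the language was declared, of course - whether the text is actually understandable is a human judgement.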

One thing I’d like to do in the document is work out in detail how each question would be answered - the actual data elements required, and the sequence in which they’re used. This could become useful guidance for data users. There may be more than one potential set of elements for a given question, but one would likely be more desirable, so an assessment could give a higher score if the publisher’s data enables set A, a lower score for set B, and so on.
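The tiered idea above (full marks for the preferred element set A, partial marks for a fallback set B) can be sketched very simply. The question name, element sets, and point values here are all invented placeholders - the real sets are exactly what the Google Doc exercise would work out:

```python
# Sketch: tiered scoring per question. For each question, element sets are
# listed best-first; the publisher scores the points of the first set its
# data fully supports. Sets and weights below are placeholders.

QUESTION_SETS = {
    "planned_disbursements": [
        (2, {"planned-disbursement/period-start", "planned-disbursement/value"}),  # set A
        (1, {"budget/value"}),                                                     # set B
    ],
}

def score(question, present_elements):
    """Return the score for the best element set the data supports."""
    for points, required in QUESTION_SETS[question]:
        if required <= present_elements:  # subset test: all required present?
            return points
    return 0
```

Summing these per-question scores would give a simple, use-case-oriented quality number per publisher.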

Watching with interest! I also love the idea of framing this as “can I do x with this data?” (I remember @stevieflow proposing the very same approach at the Open Ag workshop!)

Hope you’ll do more than watch!

I think we’re going to dust this off and possibly use it at the Open Ag Workshop next week - anybody want to add their two cents/pence (and take a break from all the 2.03 discussion ;-))?

I’ve been thinking (with some guilt) about this document. Did you end up using it for the workshop?

No :-( Too many things, not enough time… Let’s keep thinking about it with guilt, and eventually we’ll act on it. I was hoping it would be useful to @robredpath as CoVE for IATI takes shape, to help us focus our “opinionated review” (credit to @andylolz there, I think) on specific user needs - making data more useful, usable, and in use (credit to @stevieflow there!).