Thanks for your thoughts @bill_anderson, @siemvaessen, @hayfield and @Herman
I would still strongly argue that preventing the publication of invalid data via the Registry would make a much-needed difference to the quality and usability of IATI data, which is key to IATI achieving the impact it seeks.
The key benefit of validating data on initial publication and each update is that it ensures we catch errors at the point of entry - and at the time when publishers are most invested in making their data available.
If such a measure is introduced, we must think carefully about the publisher experience. If a publisher attempts to publish invalid data via the Registry, a clear user interface should show them:
- What technical issues exist in their data
- Exactly where these issues appear in their file/s, and
- Clear guidance on how to fix each issue, so that they can publish their data.
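To illustrate the kind of check being described (this is a hypothetical sketch, not the Registry's actual implementation, and a real check would validate against the IATI schema rather than just well-formedness), a pre-publication step could reject a broken file at the point of entry and tell the publisher exactly where the problem is:

```python
# Hypothetical sketch of a publish-time check: reject malformed IATI XML
# and report the location of the error, instead of accepting the file and
# chasing the publisher later. Function and message wording are illustrative.
import xml.etree.ElementTree as ET

def check_before_publish(xml_text):
    """Return (ok, message); the message is what the UI would show."""
    try:
        ET.fromstring(xml_text)
        return True, "File is well-formed; continue publishing."
    except ET.ParseError as err:
        line, col = err.position
        # str(err) includes expat's description, e.g. "mismatched tag: ..."
        return False, f"Cannot publish. Fix line {line}, column {col}: {err}"

# A broken file: the activity element is never closed.
bad = "<iati-activities><iati-activity></iati-activities>"
ok, message = check_before_publish(bad)
print(ok, message)
```

The point of the sketch is the feedback loop: the error message arrives while the publisher is still at their keyboard, with a file location they can act on immediately.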
Other approaches (including scoring invalid data more harshly in publisher statistics) are good ideas but are inherently reactive: we would be chasing publishers to resolve issues (hours, weeks or months later?) after they have already moved on to other tasks. Previous experience along these lines has resulted in limited impact, whilst inefficiently consuming support resources.
With particular regard to publisher statistics, the effectiveness of that approach assumes publishers routinely look at, and care about, their performance there - which may or may not be the case.
I disagree that preventing publication of invalid data amounts to censorship. All serious attempts to make use of IATI data at any scale seem to have encountered significant data quality issues that limited outcomes. Given the renewed focus on data usage, we are saying that we want IATI to be a good quality and usable source of data, thus fulfilling IATI’s mission to ‘make information about aid spending easier to access, use and understand’.