This is a timely topic, particularly given that new updates to DAC codelists keep on coming! Additionally, representatives from the IATI Technical Team met with the OECD DAC last week to discuss how we can better support each other's work.
Our use of replicated codelists was a key topic in this meeting. We gained a better understanding of how DAC codelists are produced, and shared our experiences of working with them.
There were a number of issues which will be of interest to the IATI and development data communities:
Machine readability of codelists
As we know, the DAC codelists are made available in XLS (spreadsheet) and XML (fully machine-readable) formats. However, the DAC recognise that in the past there have been occasions where the content of the XLS and XML versions has been inconsistent. In part, this appears to be related to the resources available to manage the codelists. Currently, codes are stored at source in an internal (SQL) database, with the output XLS and XML versions curated manually.
However, by Autumn 2017, the DAC are planning to implement an automated system to generate both the XLS and XML versions, which should put an end to inconsistencies between versions and make it possible to work with DAC codelists in a fully machine-readable way.
Changelog
The DAC have been responsive to requests from their user community and have added a summary sheet to the main spreadsheet detailing changes since the last version. This is a great help for highlighting modifications, as it removes the need to identify these differences manually.
Example from the latest spreadsheet of codes:
With full machine readability of codes, it will be possible to generate changelogs more easily anyway. I would suggest that a daily script is created to identify changes to the XML version and store each new version in a git repository.
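To illustrate the comparison step such a script would need, here is a minimal sketch in Python. It assumes a simplified XML layout (a `codelist-item` element containing `code` and `name` children), which may not match the actual DAC output, and the codes and names in the sample data are invented placeholders. The function names `parse_codelist` and `diff_codelists` are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_codelist(xml_text):
    """Return {code: name} from a codelist XML string.

    Assumes a simplified layout: <codelist-item> elements that each
    contain a <code> and a <name> child. Items without a code are skipped.
    """
    root = ET.fromstring(xml_text)
    codes = {}
    for item in root.iter("codelist-item"):
        code = item.findtext("code", default="").strip()
        if code:
            codes[code] = item.findtext("name", default="").strip()
    return codes


def diff_codelists(old, new):
    """Compare two {code: name} mappings and return a simple changelog."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "renamed": sorted(c for c in shared if old[c] != new[c]),
    }


# Invented sample data, for illustration only.
OLD_XML = """<codelist>
  <codelist-item><code>A1</code><name>Example name</name></codelist-item>
  <codelist-item><code>B2</code><name>Second example</name></codelist-item>
</codelist>"""

NEW_XML = """<codelist>
  <codelist-item><code>A1</code><name>Example name (revised)</name></codelist-item>
  <codelist-item><code>C3</code><name>Third example</name></codelist-item>
</codelist>"""

changelog = diff_codelists(parse_codelist(OLD_XML), parse_codelist(NEW_XML))
print(changelog)
```

For the sample data, the changelog reports C3 as added, B2 as removed and A1 as renamed. A daily job could run this against the previous day's file and commit the new version only when the changelog is non-empty.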
Reuse of codes
From our discussion, it was clear that the DAC share our frustration when codes are re-used. It seems that they feel forced into reusing codes when member organisations supply data. Nonetheless, this is something that they are striving to avoid. Additionally, we are grateful to DAC staff, who have been quick and helpful in answering queries about the scope of changes to code names (for example relating to DAC sector code 15114).
Making old codes available
The DAC are responsive to this ask and are planning to modify their source database to include metadata fields for code introduction (and presumably withdrawal) dates. Our understanding is that they will seek to make this public in output versions too. The above idea of storing codelists in a git repository will provide a way to generate this metadata even if it is not fed through to output versions.
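As a rough sketch of how that metadata could be derived, the Python below takes a series of dated snapshots (as might be read back from the commits in such a repository) and works out when each code was first seen and when it disappeared. The function name `code_lifespans` and the sample codes and dates are invented for illustration. The dates are only as precise as the snapshot frequency, and a code that is withdrawn and later reused would need extra handling.

```python
from datetime import date


def code_lifespans(snapshots):
    """Derive approximate introduction and withdrawal dates per code.

    snapshots: list of (snapshot_date, set_of_codes) pairs, in any order.
    Returns {code: (first_seen, withdrawn)}, where withdrawn is the date of
    the first snapshot taken after the code was last seen, or None if the
    code is still present in the latest snapshot.
    """
    ordered = sorted(snapshots, key=lambda s: s[0])
    first_seen, last_seen = {}, {}
    for snap_date, codes in ordered:
        for code in codes:
            first_seen.setdefault(code, snap_date)  # keep the earliest date
            last_seen[code] = snap_date             # overwrite with the latest
    dates = [d for d, _ in ordered]
    result = {}
    for code, first in first_seen.items():
        later = [d for d in dates if d > last_seen[code]]
        result[code] = (first, later[0] if later else None)
    return result


# Invented snapshots, for illustration only.
snapshots = [
    (date(2017, 1, 1), {"A1", "B2"}),
    (date(2017, 2, 1), {"A1", "B2", "C3"}),
    (date(2017, 3, 1), {"A1", "C3"}),
]

lifespans = code_lifespans(snapshots)
for code in sorted(lifespans):
    print(code, lifespans[code])
```

Here B2 is reported as withdrawn on the date of the first snapshot in which it no longer appears, while A1 and C3 have no withdrawal date because they are present in the latest snapshot.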
Summary
All in all, this was an encouraging meeting and a good insight into working practices. The publication of these lists in fully machine-readable XML format will be a game-changer for the management of the codes.
We would continue to encourage the DAC user community to make visible the positive impact that the production of DAC codelists has on the wider development data community, as well as the importance of codelist management policies and our excitement about the potential of full machine-readability to improve efficiency and effectiveness.