This post is mostly a write up of the lightening talk I gave at the TAG, with some additional info on how the work is progressing.
CSV2IATI is going to be retired this year, so at Open Data Services Co-operative we’ve been looking at alternate approaches for converting spreadsheets to IATI XML.
For OCDS (Open Contracting Data Standard) and 360Giving, we built flatten-tool. This is a command line tool, and python library, for converting from flat spreadsheet formats to nested structured formats (JSON or XML). We’re working on adding IATI XML support to this.
Key features of flatten-tool:
No mapping tool
Instead of a mapping tool, we require that headings contain an xpath (or close to an xpath) for the part of the IATI XML the column corresponds to.
e.g. this spreadsheet:
becomes this XML:
Most users won’t need to worry about xpath though, as we’ll have a pre-made template for them to download and populate. However, the approach does also allow custom templates to be easily crafted for particular use cases.
Since all information is contained in the spreadsheet, it will be possible to upgrade to a new IATI standard version by just changing the data, without any change required to the tool.
One-to-many relationships
In IATI, one activity usually has multiple child elements of certain types (e.g. multiple transactions). flatten-tool supports multiple ways of representing this:
- Multiple columns
- Separate sheets, linked by the IATI Identifier in both sheets
- Repeated rows, for compatibility with CSV2IATI
e.g. multiple columns might be convenient for mulitiple recipient countries:
but a separate sheet might be more convenient for multiple transactions:
Excel workbook (XLSX) support, in addition to CSV
CSV and XLSX are the supported input file formats.
CSVs exported from Excel often have problems - e.g. encodings and date formats. By supporting XLSX we can more easily get at the original data. It’s also an easy way to supply multiple sheets with one-to-many relationships (above).
Progress so far
Basic XML support has been merged into flatten-tool, and there’s instructions for using it to produce IATI XML in the docs. We’re already using this code internally to help people publish IATI data. As @johnadams would say, we’re eating our own dog food.
There’s a bunch of issues still to be worked on. The most pressing, which we’re working on currently, are sorting the XML correctly, and allowing spreadsheets to specify the IATI standard version. We’re currently working around these by manually adding the version number, and sorting with this script.
Future plans
We’re working on a web interface, so that users can upload a file, instead of having to use a commandline tool.
We want this web interface to have integrated validation. Additionally validation messages will list the cell in the spreadsheet that caused the error, allowing data to be fixed more easily. @stevieflow made a presentation about this approach to the IATI Members’ Assembly. Here’s a screenshot of what this currently looks like for 360Giving:
For OCDS and 360Giving we also have conversion to spreadsheet (which means it’s possible to roundtrip, ie. edit and then convert back to XML). We think this could be useful for IATI too, but at the moment we’re focusing on conversion from spreadsheet first, because that appears to have the greater need.