Challenge: create super-simple conversion from Excel to IATI XML

My Challenge to the IATI community: Are you interested in pitching for seed funds to develop something super human-friendly that converts Excel worksheets into IATI XML with no fuss?

So this is my concept: the source data from the user would be in an Excel workbook with multiple worksheets.Pretty much every person I work with in organisations uses Excel at some point, so the skills are already there - no learning new tools. I also think workbooks are more human-friendly way to work than one massive Excel sheet/CSV file. And it’s way easier to collect data from different systems into different worksheets in one workbook.

The source Excel workbook could be based on a downloadable template so it is pre-populated with the right elements/column headings to make the mapping go more smoothly. Or you could tell users the headings to use in their workbook. Some systems eg. SAGE also allow you to export data to sheets with alias field names that could replicate the required headings.

Then the ‘something’ would take all the data across all the worksheets (which probably have the activity IDs in the same field on every worksheet to link the data) and map it to the IATI elements.

It will create the XML following the schema (it needs to produce data to the latest IATI Standard version without the user even knowing what a ‘version’ is) and validate it. So the user will get the XML and the ‘green ticks of perfection’ with little to no effort.

And then all the publisher has to do is host the XML data on their website and upload the link to the Registry. Job done.

Oh yes, and the something also needs to be …

  • available both online and as a download/app to use offline (but with auto-check for updates when the user is also connected to the internet).

  • available in French and Spanish and Dutch/Flemish.

  • ideally would also work with OpenOffice spreadsheets and Google Sheets for the online version (although the majority of users will be on Excel).

  • usable on mobile devices.

  • open source and stand alone, ie not embedded into another tool.

  • sit alongside other publishing software that is currently available, to give publishers a great choice of options.

The ‘something’ would need to have a clear business model - so it would may sit alongside/complement other products you already offer, for example. I think it would need to be subsidised initially by a/some donors so that it can be offered at little or no cost for small organisations. But I can also see the potential that medium to large organisations would pay you to integrate it into their processes.

What do you think? Any takers?

A few months ago I wrote down some thoughts on “spreadsheets to IATI”.

We’re exploring some options for an organisation. I think the “integrate it into their processes” is actually the key requirement here: that also means integrating it with quality and compliance, publishing, and using the data in one way or another.

Thanks for sharing the document Rolf, great technical summary of the challenges and potential solutions.

dear Sarah, i think this is a great idea and i would be willing to provide contributions to this suggested work. my organisation is just starting to publish data according to IATI and as we have funding from the NMFA, for the consistency reasons, i will be advising the senior management to publish data using NMFA’s extended requirements. i can foresee that we are going to have difficulties in transforming our financial data from Quickbooks to CSV or even more so to IATI XLM formats. To transfer data manually would be too time-consuming.

i am looking at Rolf’s document and trying to figure out if it already brings solution, as apparently it allows for converting Excel into IATI XLM format? we publish data through Aidstream.

i am not sure how helpful i can be as our organisation have beginner’s level in terms of IATI. do you know if many organisations publish the financial data at the transaction level? as i believe only in this case the solution for an easy conversion is really important (contrarily to the scenario when aggregated data per quarter is published)?

many thanks

Thanks for putting this post and it was really interesting to read though the thoughts by @rolfkleef too. Together, they set out a vision with a greatly improved user-experience than the IATI Secretariat’s existing CSV2IATI tool.

I like the idea of having multiple sheets within a single workbook, which would deal with the multi-dimensional nature of the IATI Standard. However, some thought might be needed to deal with elements that are themselves multi-dimensional (for example, an iati-activity can have multiple transaction elements, which itself can have multiple sector elements). The usability of such sheets in the workbook would need some consideration, particularly if the entire tool is to fulfil its goal of being ‘IATI beginner’-friendly.
Of note on this point - the element iati-activity/result/indicator/title/narrative is the ‘deepest’ repeatable element that I have found in that v2.02 standard.

In terms of development, I know IATI Studio and Aidstream were considering adding the ability to generate IATI XML from spreadsheet-like formats - perhaps @siemvaessen or @bibhusan might be able to comment on their plans??

Whilst the secretariat have made a commitment to host the existing CSV2IATI tool until new tools and publishing solutions emerge to replace it, it is highly unlikely that there will be further development work on this legacy tool as our focus for development work is currently focussed on streamlining the internal architecture and supporting the data usage agenda.

Greetings,

Would it not be better to have a web service/REST API hosted on the same domain as iatistandard to perform the conversions as opposed to distributing software that would be liable to change at each update of the specification?

The service could verify submitted Excel workbooks and maintained centrally. Just like a website it would serve desktops, tablets and mobiles.

Looking at the last post, it seems Aidstream essentially provides a way of does this

Regards

Thanks for raising this @SJohns

I agree that there is a need for something around this. I did a lightening talk at the IATI members’ assembly to highlight experiences of how people can get “confused” during the conversion and validation process (warning: contains cats!)

Over at the 360Giving data standard, we’ve worked hard to support organisations to publish spreadsheets (via templates), which can then be “easily” converted to the JSON format (360 is JSON rather than XML) via such tools as CoVE. In this context. it has felt like a smoother process to publication, as organisations are able to control and publish data in a format they know - which is particularly important in smaller places where there is litlle or no IT support/infrastructure.

Of course, it should be noted that no process (imho) can ever be “super simple”. Even with 360 there are a few gotchas. We’ve all been down the date formats in windows / excel rabbit hole!

We should be minded that IATI has lots of one-to-many relationships: a typical activity will have multiple transactions at the very least, but perhaps also more than one participating-org, document-link, sectors and possibly geographies. When we get to results, then this starts to get more complex - multiple indicators with multiple time periods for reporting targets and actuals anyone?

As you state, this starts to make more sense when in a workbook. It should also be worth flagging some of the efforts around this already. I made a version of 2.01 (warning: this isn’t fully tested) and added the files here. These files provide “user friendly” titles for the various parts of the IATI schema, which is your point about the mapping to the standard. Indeed, with Bond we provided guidance about working with such files, “flattening” them to a CSV, and then running through CSV2IATI to mint IATI XML. I’d agree that this isn’t “super-simple”, but it was interesting to see some organisations adopt this approach, once initial support / training / explanation had been given.

There’s another school of thought which we should flag: “messing” around with spreadsheets can lead to inefficiencies and data quality loss. Outputting IATI XML data in the right and valid format should not be anywhere near the frontline. We’ve seen many organisations produce IATI data via teams working together at their strengths - with IT controlling the data production end. I just wanted to flag this, as no matter how great a tool is, it could be argued that solutions are better sourced in-house, within processes and systems already in place. That’s a whole other discussion though!

That said, at Open Data Services Co-operative, we will start to work around some of this in 2017. We’re not ready to announce timelines just yet, but will be testing how the various libraries and tools we have can address these user needs, starting with our own work. We will, of course, share our thoughts along the way.

Hi Sarah and All,

We made a standalone offline excel publishing tool a few years back and still use it for our publication - see https://github.com/WaterAid/iati-xl2xml/raw/master/src/xl2xml.xlsm . We needed to take the technical aspects out of publishing IATI and allow the various people involved in the publishing process (different areas of responsibility, approval processes etc.) access to the data on their terms so we chose Excel as a common tool. There is a different worksheet for each of the main areas of the standard, eg basic information, Transactions, Activity dates etc. linked via the first column of each table, and we use a mixture of manual data entry and automatic data feeds (where we have corresponding systems). We only fill in the blue cells of the table (grey and green are autofilled based on the IATI standard and WaterAid’s publishing model - saving about 2/3rds of the publishing burden).

To create the XML you just run the “GenerateXml” macro and choose an output directory. At the time we anticipated our country offices would need to enter some of the data so we made this tool 100% offline and requires no additional installers etc. beyond having a windows machine with Excel installed (vba using standard windows libraries) - meaning it can be emailed back and forth with all relevant sign-off data in one place (useful for approval processes).

Although we have used WaterAid publication as an example, technically it can publish as much or as little of the IATI standard as you like without restriction - you just change/ add / remove the table column headings (row 20) as appropriate and copy in new sheets for any additional repeating elements (under the hood the tool starts with the “iati-activities” sheet then traverses sheets (linking to its appropriate parent through cell C1) to recursively build out each sub-element on a row by row basis before moving to the next row, using the IDs in column C and row 20 to guide output - so its actually a general purpose excel to XML export tool really!).

While this has proven great for our publishing, what’s needed to make this super simple for other organisations is a number of standard templates (with the worksheets configured accordingly) that users can choose according to their publishing style and language. Even better would be a front-end setup GUI allowing users to choose which elements of the IATI standard they wish to publish (the tool would then use the IATI schema to create worksheets, columns and codelist drop-downs appropriately, even setting up auto fills) - we started looking into this (eg using the IATI validator rulesets), however we couldn’t justify further time on the project as we didn’t have an internal need at the time. Instead we open sourced it in the hope that others might take this further (@stevieflow - this is the same tool and workflow I talked you through the other year, should this help jog memory!). We’d be very happy to talk more about this, and given the new DFID requirements for NGOs to ensure our partners publish data we may well start another round of development/ create some new templates for this - anyone interested in collaborating? Funding would also help maintain/ build communications for this tool - anything available?

Best,

Mike

I’ve just done a spreadsheet-to-IATI conversion for Aids Fonds last week, which enable me to set up a first rough tool chain following the “pseudo-iati merger” approach from my document:

  • They can extract individual Excel spreadsheets from their Management Information Systems (in their case one, but could be multiple): basically tables of information with an IATI activity identifier in each row (results, transactions, basic project information, …)
  • I convert them to CSV files, set up the mapping, and merge the results into a single IATI file. Setting up the mapping is still a manual step that needs to happen once.

We’re working along the way of Steven’s “different school of thought”: extracting useful spreadsheets from existing systems, rather than setting up a special IATI spreadsheet. It’s an incentive to fix the data at the source, rather than during an IATI publication process.

And the mapping allows for some additional transformations (for instance from internal codes to IATI code list values), and mostly is a one-time set up.

Hi @rolfkleef - sounds like some great progress with Aids Fonds.
Please do consider the tool I mentioned - it could save some effort in having to do your second bullet as your first bullet sounds compatible with the tool (and we used to do the second bullet before we made the tool) - always happy to walk people through the code - especially if it means we build a tool for the sector rather than just one-offs for specific organisations!

I agree that where possible one should aim to fix data quality issues at source and not set up special downstream IATI spreadsheets. We’ve actually used our IATI reporting process to highlight where we need to pinpoint and fix the problems in our source systems, then have slowly migrated the fixes back up the chain (using the data driven evidence (and business case!) that we didn’t have before our IATI reporting process :slight_smile: ). The problem is, practically we’ve found there will always be some manual fixes you just can’t/ shouldn’t propagate back to source (not worth the investment to fix the source system just for IATI!). More than happy to discuss more!

Hi Mike - Thanks for this, very useful tool.

We’re trying to commit to publishing as much data as we can manage, am I right in thinking we can just add the additional columns to the main info tab to accommodate data for which you select one variable per activity (or where the split is 100%)?

If there are multiple rows, per activity (e.g. documents, conditions) can that be added without changes to the macro?

We’re also starting to look at the impact of DFID requirements and how we might support our partners in publishing. This could be a good starting place for that so thanks again for sharing.

Gary

Hi @garybrough , great to hear that you are also looking into how to support your partners to publish - it would be great to hear how you get on - maybe you could post a new topic with your progress?

To answer your specific questions, yep all changes can be done in the spreadsheets, no need to change any macros etc.:

  • you can just add any additional columns into any of the sheets and it will write the new element under the corresponding IATI element listed in cell C1 of each sheet (most of the additional elements you might want to add are on the main sheet “Activity Main Information”, but you might also want to add to others such as transactions etc. - you can use the navigation tree on the left hand side of http://iatistandard.org/202/activity-standard/iati-activities/iati-activity/ to figure out where a particular element should go)
  • for multiple rows per activity you add another sheet following the same pattern as the other sheets - I’ve already done this for documents so thought I’d attach it incase it is of use (just copy the whole sheet into the tool): IATIDocuments.xlsx (16.4 KB)

Hope that helps!

Mike

Hi all,

Just to chip in here. As part of our development for the ZOOM project (a data analysis platform built for Aidsfonds in cooperation with HumanityX -Leiden University-) we are in the final stages of the Datamapper component: a (lots of different types of) CSV convertor to the ZOOM data model. ZOOM is built on top of the IATI Studio core, which runs on top of the OIPA Datastore, so we’re trying to re-use existing technology.

With regards to the ZOOM Datamapper component, I’d be happy to see how we can amend that conversion process so it could fit a ‘generic’ CSV file for organisations to publish to IATI. In this setup we have many moving parts: data input, data normalisation, data storage and data output (API).

Seeing how ZOOM will touch some of the IATI Studio UI elements and OIPA, I could see the following scenario play out:

Datamapper (ZOOM data conversion component)
CSV input (different types)

OIPA Datastore (input/conversion/storage/API)
IATI XML input

IATI Studio
IATI Publisher component (Manage IATI data/connect to Registry etc.)

I’d be happy to have a call on the ZOOM Datamapper for:

  • CSV ‘IATI’ Ready files for input
  • Map CSV ‘IATI’ to OIPA/IATI Studio
  • Publish the converted CSV ‘IATI’ XML file to the Registry.
  • Continue & repeat

Can we setup a call and see how we would like to move on this specific subject?

Thanks, Siem

I’ve now written up our approach to Spreadsheet to IATI conversion, as introduced in my lightening talk at the TAG:

The command line tool is currently usable offline, but I would concede that’s not super simple. We’re working on a simple web interface though, but that won’t be available offline.

Interested to discuss this further, to see if we could better address @SJohns bullet points. One approach we discussed very briefly at the TAG, was whether @mikesmith’s tool and ours could be made more compatible.