Future infrastructure to support the IATI Standard

Since the TAG meeting in Tanzania, I have had several strands of thought floating around in my head and on scraps of paper.

These are concerned with the idea of how we in the tech community can help make IATI more usable.

I’m encouraged by the range of people wanting to use the data for a wide range of purposes, but I also hear from people that they are frustrated at not being able to get hold of the data in the way that suits them best.

You can read the latest iteration on this Google Doc. Thank you to those who have given constructive feedback so far.

In summary, I am proposing four actions:

  1. Separate the definition of the standard from its technical implementation
  2. Build a Repository, not just a Registry
  3. Simplify the standard for different user needs through a modular approach
  4. Encourage the development of interoperable tool components as global public goods

I’d love to hear what you think, either online here or if you are able to be with the MA in Rome next week. Please post your comments in this discussion thread and not in the document.


Interesting proposal(s) John. I applaud the spirit, perhaps I am missing the context/user story, but I’m scratching my head a bit:

  1. Simplify the standard by moving from one format to many? This seems like Androiding the standard - there would be a multitude of formats to consider. You admitted this technical challenge, but aren’t we struggling to get validation and upgrade-adapted tools as is? Wouldn’t this user need be better met by better multi-format export and data manipulation tools? Funny, I know some guyzz

  2. At first I thought “so your solution to the creation of 3 separate databases is to build a 4th?” But I’m intrigued by the quality gatekeeping and enrichment capabilities you described. (Centralized repo, quality gatekeeping, post-publication data enrichment - I feel like I just wandered into a minefield. Should this post have a trigger warning?:wink: )

  3. I think I’m with you here. Open Ag has already anticipated the name-pending element with its own extension (credit to Open Data Services there - http://openagfunding.opendataservices.coop/en/latest/extensions/) There are also several rulesets out there already - Open Ag has one, PWYF has one, etc., some of which we’re integrating with ODS’ Cove/Validation tool as part of the “Open Aid Publishers Toolchain” (working title for now). This doesn’t seem as earth-shattering as your other proposals. So - sure, let’s do it!

  4. This one deserves special attention:

In order to build the integrated and interoperable infrastructure that IATI needs, it would be good to agree the areas where investment is needed, and seek a collaborative approach among tool developers to build those tools and components. That might require IATI to act as a broker, seeking broad consensus on tools and possibly pooling funding towards common goals. This approach is challenging where the community is made of independent organisations and businesses.

Amen! But similar to 3, isn’t this already a thing? The majority of tool developers have been in the same room not once, not twice, but three times in one calendar year so far, and all their work is open source (and I say that not to toot my own horn but to toot the horns of Z&Z, Young Innovations, Wet Genes, the DI Tech Team, Dev Gateway, Foundation Center, Open Data Services and Neon Tribe :trumpet:) As @stevieflow says, “what’s to stop us from doing this again?” Funding and coordination, yep. I suspect most members feel like they already contribute their fair share to support the standard/initiative - so where would this come from? My suggestion: a consortium of donors that either care strongly about data accessibility or require their implementers to publish IATI data (therefore, the logic goes, that data is a public, tax-payer-funded good, and shouldn’t be restricted to the technically literate population).


Hi Reid, many thanks for being the first to comment and react - your comments are very helpful. A few comments back (maybe each of the points needs a separate thread!)

  1. The core user need I have in mind here is to help the standard work for people whose natural technology is Excel. Publishers should not need to know about the IATI relational model - tools (like CoVE) should abstract this away for them. Similarly, end users should be able to download in different formats, and again should not need to know the full detail of the IATI data model. I think I’m asking here for tools to hide the relational model from end users, much like we already do with BI tools within organisations.

  2. I think there is a need for IATI itself to provide a curated data store so that end users can extract the data they need in the format that they need it. Otherwise, everyone who needs to use the data has to build one, or understand the differences between the different aggregators. But I’m not sure whether that is indeed a consensus among the TAG community, and I’d be interested to hear what others feel about it.

  3. Yes, although I think we need to revisit how extensions and rulesets would work better, and some practical examples would help.

  4. Yes, this is indeed already a thing, and the community is brilliant at building things in the open. But I wonder how we could be more strategic in co-ordinating this effort so that we can better meet data users’ needs. I like your thinking around a consortium of donors who would be willing to support strategic investment.


I think this is a really interesting proposal and look forward to discussing more next week. I wondered though if we can also take one step back. If the core issue for usability is data that makes sense, is it also a good idea to re-look at the XML file requirement for publishers?

Although I understand the merits of XML in relation to the nested structure of the data standard and the open nature of the data, it’s not a file type that’s easy to produce using the software most organisations have access to. And although the number of publishers is growing rapidly, we still have only one stable, tested, free public-access tool that converts CSV data into XML - AidStream (I know CoVE and Aid Studio are being developed to do this conversion, but they are not yet available and I’m not sure what the sustainability plan is for these tools). The market forces that would encourage more conversion tools to be developed are not working, and it’s really not a great state of affairs to have the majority of publishers dependent on just one tool. If we can’t get the tools, let’s cut out the conversion process entirely.

There is a proposal in your paper to effectively ‘normalise’ the data at the point of entry to the Repository. If this is the case, and given that you want to hide the workings from end users and the fact that JSON and CSV can be used as export options, isn’t it time to open up the import options too? Imagine how many more publishers you would get if they could publish in CSV. Maybe the point has passed that this is possible because the standard is so nested and complicated now, but we should still consider the idea of making it way more simple to publish and create usable data.

I’d be interested in what others think.


Sarah, I do think that would be a good goal, and I would like us to move towards that.

Tools that do the CSV-IATI translation are a good first step, since they preserve the IATI relational model.

Hi,

What would the advantages of a repository be over the Datastore? Most users (currently) just want IATI data in an Excel sheet - I am not sure they would notice the difference between a curated repo and a registry. On the other hand, if IATI has a repository, IATI is taking far more direct responsibility for the numbers/quality due to the curation…lots of issues there - e.g. the OECD curate CRS data and it takes them 12 months to do so.

I am not sure what making it more modular means, e.g. in comparison to having everything included and just using the bits you want. On the other hand, I think that there are a lot of expansions to the standard that would help, and perhaps these could be called modules, e.g. X is publishing IATI 2.03 with OpenAg extension module 1.04 - and this does suggest that the modules/extensions need to be self-contained to a degree, i.e. in their own part of the tree.

I am not convinced that there is a need for more money for tools (and I may regret saying this a lot) - but, for example, how much money has the OECD CRS spent on making tools, websites, visualisations, integrations etc.? Comparatively nothing, and yet it is still far more used than IATI. This suggests that the problems really lie elsewhere: either we are making the wrong tools, or the data is not what people are looking for. I suspect a bit of both.

As I read all the recent country profiles of IATI use that keep popping up, I think that your suggestion that ‘data quality is at a median score of 35%’ is much closer to the truth. Add that to ‘IATI data is not official data’ and ‘IATI data cannot solve double counting’ and I think that is as much of the problem as the format/complexity issues. There are a few situations where I prefer IATI data to OECD data, e.g. single-donor queries for DfID data: the quality means that it is as good as the OECD data, it is official (as it is the same as used on their website), and with a tool like http://spreadsheets.aidonbudget.org I can easily get it into an easy-to-use flat format. Perhaps this is a ‘ruleset’ like you suggest - a more strictly enforced quality standard that means the data it covers meets a specific need.

Now if the OECD decided to only accept IATI data as an input for the CRS, then we would be talking…how about that OECD?

Matt

I like the idea of IATI data being held in a repository which can be accessed in different formats, not confined to the XML format only.

Going forward we need to adopt a collaborative method amongst developers to come up with tools that will simplify the data.

I say this so that we are able to communicate easily and so that we do not rebuff the new publishers and users coming on board, who may cite it as being too technical.

Agree 100%

Disagree 100% (and I think you will regret this). FOSS means free to use, not free to develop. If we want to stimulate the community to develop tools and services that are of benefit to all publishers/developers/users then someone needs to pay for this. The marketplace isn’t that mature (yet?) to sustain this.

Indeed. @OJ_ is WP-STAT ready for this???

@bill_anderson @matmaxgeds

if IATI has a repository, IATI is taking far more direct responsibility for the numbers/quality due to the curation…lots of issues there - e.g. the OECD curate CRS data and it takes them 12 months to do so.

I don’t doubt the challenges in maintaining a repository. The IATI Tech Team are already maintaining the Datastore and are using it for IATI quality monitoring.

But how much more useful would a Datastore be if it were able to serve up different cuts of data (by provider, country, sector, and other dimensions), in different output formats (XML, JSON, spreadsheet)?

The alternative without a Repository is that every user has to download all the raw XML data from the Registry links and then process those files. There must be a better way.
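To make the "cuts of data" idea concrete, here is a minimal sketch of what serving a filtered extract could look like (Python; the sample data and the `cut_by_country` function are my own illustration, using only a tiny subset of real IATI 2.x element names):

```python
# Sketch: serve a "cut" of IATI data - filter parsed activities by
# recipient country and write a flat CSV extract.
import csv
import io
import xml.etree.ElementTree as ET

# Illustrative sample: two activities in valid (but heavily simplified) IATI 2.x shape
XML = """<iati-activities version="2.03">
  <iati-activity>
    <iati-identifier>XM-EX-1</iati-identifier>
    <title><narrative>Water project</narrative></title>
    <recipient-country code="TZ"/>
  </iati-activity>
  <iati-activity>
    <iati-identifier>XM-EX-2</iati-identifier>
    <title><narrative>Health project</narrative></title>
    <recipient-country code="KE"/>
  </iati-activity>
</iati-activities>"""

def cut_by_country(xml_text, country_code):
    """Return a CSV string of activities whose recipient-country matches."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["identifier", "title", "country"])
    for act in root.findall("iati-activity"):
        country = act.find("recipient-country")
        if country is not None and country.get("code") == country_code:
            writer.writerow([
                act.findtext("iati-identifier"),
                act.findtext("title/narrative"),
                country.get("code"),
            ])
    return out.getvalue()

print(cut_by_country(XML, "TZ"))
```

A real repository would obviously need to handle the full schema, percentages, and much larger volumes, but the shape of the operation - parse once, filter by dimension, export flat - is the same.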

The benefit of IATI is that it has everyone’s data, therefore the whole is definitely greater than the sum of the parts. The network tells the story much better than any individual provider’s dataset.

The Datastore provides this already:

  • CSV via a user interface
  • A more advanced search, with XML and JSON outputs, via the API

The problems with the current datastore are:

  • It is not comprehensive (e.g. results and locations are not accessible)
  • Its update procedures are not sufficiently robust
  • Its user interface and API are incomplete

Work on fixing these problems will start before the end of the year. The Tech Team is committed to providing a robust, basic ‘vanilla’ service covering all activities, all elements, all filters, etc.


And I am fully supportive of those upcoming changes, Bill. Particularly as those changes are based on the robust user research carried out earlier this year.

In addition to the comments made by @reidmporter, @matmaxgeds and @bill_anderson, I doubt that it is possible to define a repository which will serve all the different use cases. A lot of functional design decisions would have to be made which are use-case dependent. E.g.:

  • Do we split up transactions by their sector and country percentages in order to facilitate data use? If so, how? By sector, by region/country or both?
  • How would we implement traceability? Would we attribute outgoing flows to the relative contribution of incoming flows or not?
  • Etc.

Secondly, looking at the conclusions drawn over the last few years about the use of IATI data in country pilots, a very important problem is the completeness and accuracy of the published data. Wouldn’t we therefore be better off investing our very limited resources in tooling, procedures, etc. that enable publishers to publish better quality data, instead of trying to build another repository?

It would indeed be nice if the technical representation of IATI data were separated from its semantic definition. One way to do this would be to keep the current XML representation as the core representation and provide tooling to:

  1. Convert non-XML IATI data to IATI XML (e.g. convert CSV to IATI)
  2. Retrieve non-XML IATI data from IATI XML (e.g. convert IATI to CSV)

So I would keep the rich and proven XML format as the core technical representation for IATI data. The advantage is that we keep one canonical technical representation for all IATI data which has enough power to model all use-cases.

In such an approach it would, in my opinion, be important to automatically check the consistency and conformance of the data being converted to IATI XML. So the tooling should not only do the technical conversion, but should also run an automated quality check. Ideally there would be a centrally maintained data quality service, which could be used by all tool developers.
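As a rough illustration of the CSV-to-IATI direction with a basic quality gate built in, here is a minimal sketch (Python; the CSV column names and the required-field check are my own assumptions, and only a tiny subset of the IATI 2.x schema is covered - real tooling would validate against the full schema and rulesets):

```python
# Sketch: flatten one CSV row into a minimal IATI-style activity element,
# rejecting rows that fail a simple required-field check on the way in.
import csv
import io
import xml.etree.ElementTree as ET

# Illustrative flat input; the column names are assumptions, not a standard
CSV_DATA = """identifier,title,recipient_country
XM-EXAMPLE-001,Rural water supply,TZ
"""

REQUIRED = ("identifier", "title")

def row_to_activity(row):
    """Convert one flat CSV row into an <iati-activity> element."""
    # Basic quality gate: refuse rows missing required fields
    missing = [f for f in REQUIRED if not row.get(f)]
    if missing:
        raise ValueError(f"missing required field(s): {missing}")
    activity = ET.Element("iati-activity")
    ET.SubElement(activity, "iati-identifier").text = row["identifier"]
    title = ET.SubElement(activity, "title")
    ET.SubElement(title, "narrative").text = row["title"]
    if row.get("recipient_country"):
        ET.SubElement(activity, "recipient-country",
                      code=row["recipient_country"])
    return activity

root = ET.Element("iati-activities", version="2.03")
for row in csv.DictReader(io.StringIO(CSV_DATA)):
    root.append(row_to_activity(row))

print(ET.tostring(root, encoding="unicode"))
```

The point of the sketch is that conversion and checking happen in the same pass: bad data is stopped before it ever becomes IATI XML, which is the behaviour a central quality service would want to guarantee.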


We can’t blame OECD for our own blunders - we have designed the IATI standard in a way that allows for a conversion of DAC-formatted CRS++ data into a (simple) IATI-file, but we have made it impossible to go the other way; IATI activity-files cannot be converted into CRS++ and are thus unable to fit into, or feed into the statistical databases of the DAC.

What we can do (and I make this statement at any chance) is to advise the DAC to use IATI as a secondary data source whenever they draft statistical analyses that transcend CRS++ (e.g. assessing flows from private charity funds or working on the TOSSD measure for the future).

Hi @OJ_ I would be really interested to know about why IATI > CRS++ is not possible - is there anywhere I can read about it?

I wrote a paper for a session at one of the TAGs in Canada - not at hand right now - but the obstacle identified at that time was that the org_type etc. codes of IATI can’t be translated into CRS channel codes. I will need to update that paper in order to verify whether this problem is solved with version 2.03 of our standard.

Another issue should be noted as well: we must recognise that the DAC has a strong need for a ‘single point of contact’ in the governments of donor countries, for dialogue and data validation. Even though some of us are currently reporting on behalf of our Government, making sure that our IATI-reported ODA volume reaches 100% of our DAC-reported ODA, this is not the case for all donors, and might not continue to be the case for us; IATI is designed for reporting by individual organisations rather than Governments. There are advantages, certainly, but it will be an organisational challenge for IATI-formatted CRS reporting.

Thanks @OJ_ really interesting to know - I found it in the 2015 schedule in case anyone else was there and can share the paper


If you allow me,

I do support this. Most data users depend mainly on some of the attributes (fields), so we could have something such as “Basic” or “Core” data and “Extended” or “Complete” data. This will:

  • Simplify exporting the “Basic” data to Excel format for the user

  • Simplify the publishing of the “Basic” data, which minimizes invalid data and increases the reliability and quality of the data

Hi @ibrahim,

I think this is a great idea. Does the recent work that was done on ‘IATI users’ help to define what data is needed by the different use cases? Or, for the ‘basic’ needs, I can share the data that the various countries I work with would love to be able to easily access (in flat/Excel format): basically, a row per project with funder(s), implementer(s), total project value, local sector, start date, end date, grant/loan, disbursements to date, disbursement in last local FY, disbursements in next local FY, description (incl. in local language), local contact name, local contact email, ADM1 geographic information, humanitarian/development.

If agreed, we could then work on making sure that that very restricted set of fields is comprehensive, avoids double counting and uses local languages.

Matt

Hi @matmaxgeds,
Thanks for your reply. These fields seem fair enough to me. I only want to add a note on how to represent multi-valued fields such as sectors or donors in a flat Excel file: I propose separating them with “,” or “/” or some other delimiter. This will make it easier to import the data into a database later, and keeps each row to a single activity.
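A minimal sketch of that unpacking step (Python; the column names are my own illustrative choices, and I have used “;” as the delimiter rather than “,”, since commas often appear inside free-text fields like titles):

```python
# Sketch: a one-row-per-activity CSV with ";"-separated multi-valued fields,
# split back into one-row-per-sector for loading into a database.
import csv
import io

# Illustrative flat export: ACT-1 has two sectors packed into one cell
FLAT = """activity,funder,sectors
ACT-1,Donor A,Agriculture;Water
ACT-2,Donor B,Health
"""

rows = []
for row in csv.DictReader(io.StringIO(FLAT)):
    # Unpack the multi-valued column: one output row per sector
    for sector in row["sectors"].split(";"):
        rows.append({"activity": row["activity"],
                     "funder": row["funder"],
                     "sector": sector.strip()})

for r in rows:
    print(r["activity"], r["sector"])
```

Whichever delimiter is chosen, the key is that it is documented and never appears inside the values themselves; otherwise the round trip back to a database breaks.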