Update on IATI Datastore API

New DataStore API

As everyone is aware, the new IATI DataStore is in the process of being built. There have been a few questions about the API that this Discuss post seeks to address. We start with some background on the old DataStore for context, followed by information about the new DataStore API.

History of the old Datastore

When the IATI Standard was launched there were strong arguments against IATI maintaining a database: this was seen as overlapping with the service provided by the OECD CRS, which is a curated database. The datastore was instead agreed as an uncurated view of the files held on the Registry.

The original datastore was built in 2011/12 by Open Knowledge (who also managed the IATI Registry at the time). It replicated all (readable) data on the Registry, and was sold on the promise that everything in the Registry today will be in the datastore (DS) tomorrow.
Locations and results data were not included in this alpha: only transactions, budgets and activity-level data were imported into the DataStore (the whole activity was only available through an XML blob stored for each activity). The alpha product did not clean data; it just showed exactly what was on the Registry.

The plan had been that these additional features would be added in future phases of the DataStore but due to a number of reasons (including budget and contractual matters) the project never went beyond phase one.

New IATI datastore

The new datastore is based on the existing open source software ‘OIPA’, which is actively maintained by Zimmerman & Zimmerman. OIPA is in use by a variety of international organisations and governments, including UNESCO, IOM, DFID, MFA and many others. The new datastore will scrape the IATI Registry for IATI publishers and their XML data sources, validate the XML using the new IATI Validator service, and then transform, store and expose that data through an API for anyone to use. The API has 14 different endpoints, each with its own specific purpose. The datastore will also allow users to export data in XML, CSV and XLSX formats if so desired. Snapshot of functionality as per the original specification:

  1. ETL (Extract, Transform, Load) from XML to JSON

  2. Validation provided by new IATI Validator

  3. IATI Version support

  4. XML exports

  5. Range of filters available

  6. API output

  7. CSV/XLSX Serialisations
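
To give a feel for how this might be used, here is a minimal sketch of a client call. The base URL, endpoint path, filter names and publisher identifier below are placeholders for illustration only, not the final API; the real parameters will be in the documentation released with the DataStore.

```python
# Illustrative only: base URL, endpoint path, filter names and the format
# parameter are placeholders, not confirmed parts of the new DataStore API.
import requests

BASE_URL = "https://datastore.example.org/api"  # placeholder base URL

def fetch_activities(filters, fmt="json"):
    """Fetch activities, applying filters and requesting one of the
    advertised serialisations (JSON, XML, CSV or XLSX)."""
    params = dict(filters, format=fmt)
    response = requests.get(f"{BASE_URL}/activities/", params=params, timeout=30)
    response.raise_for_status()
    return response.json() if fmt == "json" else response.content

# Example: all activities reported by a given (dummy) publisher identifier, as JSON.
activities = fetch_activities({"reporting_organisation": "XM-EXAMPLE-1"})
```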

Timeline for delivery

The new DataStore will be launched together with the IATI Validator this summer.

The IATI technical team met with Zimmerman and Zimmerman along with Data4Development earlier this month to agree how the two systems will integrate. We will share more information on the system integration of the two products in early May with an update on the timeline.

Moving from old to new API

The new OIPA-powered Datastore will differ from the current one both in API calls and in results returned. This is because it exposes more of the IATI data, and a new structure is needed to do so logically.

For API calls we have limited the changes as much as possible whilst still delivering a product that has a different core structure and more capabilities. In most cases all that will be required is a small tweak to the URL in use so that it points to the new location. The mapping is not going to be a 1:1 mapping, so we cannot keep the old API structure and use redirects.

For returned results, the underlying structure will once again be mostly the same, with a few changes. The new Datastore allows for a more comprehensive and precise result set to be explored. An example is the participating-org result row: in the current system, users receive a result containing participating-org.role = 3, while the new Datastore offers an expanded view, providing both participating-org.role.code = 3 and participating-org.role.name = Extending, effectively removing the need for cross-checks and extra calls.
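
To illustrate the kind of change consuming code might need, here is a small sketch. The exact JSON field names are assumptions based on the participating-org example above; the codelist shown is the published IATI OrganisationRole codelist.

```python
# Sketch of how consuming code might change; the JSON shapes below are
# assumptions based on the participating-org example above.

# Old DataStore: role arrives as a bare code, so a codelist lookup is needed.
ORGANISATION_ROLE = {"1": "Funding", "2": "Accountable", "3": "Extending", "4": "Implementing"}

def role_name_old(row):
    return ORGANISATION_ROLE.get(str(row["participating-org.role"]), "Unknown")

# New DataStore: code and name are both returned, so no cross-check is needed.
def role_name_new(participating_org):
    return participating_org["role"]["name"]

print(role_name_old({"participating-org.role": 3}))                 # -> Extending
print(role_name_new({"role": {"code": "3", "name": "Extending"}}))  # -> Extending
```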

Don’t panic!

The technical team are here to help. There will be documentation of all the new parameters, queries possible and outputs so that the transition can happen as smoothly as possible.

We are also making sure there is a grace period in which the old DataStore will exist in parallel until the end of 2019, so that there is plenty of time to make the necessary changes.

Details of who is using the current DataStore are being collected here.

Thanks for the update on the status of the DS. Is there an overview of what functionality will be delivered when, against the functional requirements as specified in the final version of the Terms of Reference for the DS? If not, it would be very helpful to have such an overview, because it would enable planning of the migration of existing applications from the old DS to the new DS.

Kind regards
Herman

3 Likes

+1 for the overview against the ToR

Full details of the project are publicly available here: https://github.com/zimmerman-zimmerman/OIPA/projects/2 so you can clearly see the implementation of the project.

When the IATI DataStore is launched there will be clear user documentation and developer documentation made available that will help to smooth the transition between the old DataStore and the new one. On the question “what will be delivered when”: the full ToR will be met at the point of release. Any future releases would be for upgrades / new features.

1 Like

Thanks for the update.

Please can you explain / review the below paragraph? If the new API will return the original XML and full, unpaginated results (as I believe it will), then I don’t understand why it would not be a fairly trivial process to convert the old URLs to the new URLs, by mapping the current (very limited!) list of filters on the old Datastore to their equivalent values on the new Datastore.

Please can you confirm that this will include meeting the following requirement included in the draft ToR, and if not, why the Secretariat decided to remove this requirement?

Provides support for (all ?) existing routes for the IATI Datastore? So that existing software using the current Datastore API does not break.

At the very least, the IATI Secretariat should be working to ensure continuity of service for anyone who has managed to begin using IATI data over the last ten years, especially when it should be fairly simple to implement once on the IATI Secretariat’s end rather than many times on many users’ ends. Please can this be reconsidered? :slight_smile:

2 Likes

As mentioned in our original post, there is no 1:1 mapping, and therefore the mapping would be infinite. It would also introduce a new potential source of technical debt on the new Datastore. By issuing a redirect at Datastore level:

  • We would essentially be introducing a static mapping file

  • It would by no means sort out the old-domain-to-new-domain issue (e.g. the redirect from old-datastore.iatistandard.org to new-datastore.iatistandard.org would have to live on the old Datastore’s server; this redirect will cease to exist once the old datastore is switched off, so there’s no benefit to it)

  • In future iterations of 3rd party software, the new URLs will need to be used

There are 6 months available to update the API calls and there will be information available with the launch of the new Datastore to help people transition.

We have already said that we will be meeting the terms of the ToR. Where there has had to be a deviation in order to bring technical benefits to the whole community, we have posted here to notify users. If there are any other changes there will be notifications.

The IATI Secretariat is committed to ongoing improvement of its services. In order to do this, change will sometimes be required. Changes are not made in an arbitrary way; they are considered carefully and we engage with the community to keep everyone informed of possible impacts. In this instance, in order to provide a better datastore, the structure of things has had to shift. We are providing a long grace period within which we are happy to speak to you directly if you need further support.

The response above has worried me a lot. This decision breaks many of the few country systems using IATI data, which will set us back significantly in terms of data use at country level, so it seems worth spending some time seeing if we can avoid it. I put together a quick mapping file here which suggests it could be fairly straightforward to redirect requests (at least for XML data). Perhaps we can have a quick chat about any remaining technical barriers?

1 Like

I’ve had a go at turning @markbrough’s quick mapping file into a tiny redirect application:

The code is on github.
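
For anyone who doesn’t want to follow the link, the shape of such a shim is roughly the sketch below. This is a minimal illustration rather than the actual application: the old route shown, the new-Datastore base URL and the parameter names are all placeholders / assumptions.

```python
# Minimal illustration of a redirect shim, not the linked application.
# The old route, new base URL and parameter names are placeholders.
from urllib.parse import urlencode

from flask import Flask, redirect, request

app = Flask(__name__)

NEW_BASE = "https://new-datastore.example.org/api/activities/"  # placeholder
# Static mapping from old filter names to their assumed new equivalents.
PARAM_MAP = {
    "reporting-org": "reporting_organisation",
    "recipient-country": "recipient_country",
    "sector": "sector",
}

@app.route("/api/1/access/activity.xml")  # placeholder old-style route
def redirect_activity_xml():
    """Translate an old-style query string and issue a 302 to the new Datastore."""
    params = {PARAM_MAP.get(key, key): value for key, value in request.args.items()}
    params["format"] = "xml"
    return redirect(f"{NEW_BASE}?{urlencode(params)}", code=302)
```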

3 Likes

Thanks (amazing) @andylolz

Not suggesting you should do it right now, but trying to think about any other potential issues: I guess it wouldn’t be hard to add the header that comes with the datastore XML?

<result><ok>True</ok>
<iati-activities generated-datetime="2019-05-09T10:11:11.870934">
<query><total-count>1</total-count><start>0</start>
<limit>50</limit></query>

The redirects for the datafiles route seem to require the internal name(?), but I guess we could do a lookup for this - or maybe the new datastore will include both options.

It also looks like, as soon as the new datastore can do =activity&transaction rather than just =transaction, we should also be able to redirect those.
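
On the header point, one option would be for the shim to proxy the XML (rather than redirect) and wrap it in the legacy envelope itself. A rough sketch follows; the envelope fields mirror the snippet above, but the exact nesting of the real envelope and the count logic here are assumptions.

```python
# Rough sketch: wrap XML returned by the new Datastore in the legacy <result>
# envelope shown above. Nesting and count logic are assumptions.
import datetime
import xml.etree.ElementTree as ET

def wrap_in_legacy_envelope(activities_xml: bytes, start: int = 0, limit: int = 50) -> bytes:
    activities = ET.fromstring(activities_xml)  # expected root: <iati-activities>
    activities.set("generated-datetime", datetime.datetime.utcnow().isoformat())

    result = ET.Element("result")
    ET.SubElement(result, "ok").text = "True"
    result.append(activities)

    query = ET.SubElement(result, "query")
    ET.SubElement(query, "total-count").text = str(len(activities.findall("iati-activity")))
    ET.SubElement(query, "start").text = str(start)
    ET.SubElement(query, "limit").text = str(limit)

    return ET.tostring(result, encoding="utf-8")
```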

This is super good, thanks @andylolz! I think it shows how simple it would be to avoid breaking systems at country level. On @matmaxgeds’ point: as the XML output is still under active development, and some additional metadata (total number of results and status) would be useful (something similar is already in the JSON output), perhaps @siemvaessen could just adjust the XML output to include this metadata?

Can I make the suggestion that we refer to “datastore v1” and “datastore v2”, rather than “old datastore” and “new datastore”?

“Old” and “new” are not future-proof terms. The passage of time means “new” will inevitably become “old” some day!

But more importantly, I think this helps encourage a more agile mindset. The “new” datastore is not the solution, but rather the next iteration. As with all software projects, it probably won’t perfectly solve all problems for all users. There will probably be future iterations (the “new new” datastore? Or v3, v4 etc.)

Similarly, the “old” datastore was part of the same process – an iteration we can gain insights from. It should be deprecated in such a way as to minimise disruption to existing users, just as we’d deprecate other end-of-life software products and services.

2 Likes

Considering the differences between the two Datastores where v1 and v2 data are concerned, I’d say even that’s confusing.

Luckily enough, we only have 2 products, so it should be easy enough for everyone to know which one is the current one and which is the next iteration of it. Plus (fingers crossed!) we’ll have this new iteration for quite a long time, and we should never end up in a situation where we have more than 2 tools.

Whether we call it v1/v2, current/next or old/new makes absolutely no difference; it’s clear enough which one is which anyway.

It would be good to talk about the lifecycle of the new datastore. Will it launch as alpha / beta / stable? Is there a planned timeline for moving to a stable release?

I ask this with the current datastore in mind, which never progressed out of its alpha release. Providing a schedule for moving to a stable release could help give developers and users the confidence to rely on the new datastore.

Thanks

Hi Andy, as we said, the new Datastore will be soft-launched for a month of testing (which we can consider our public Beta release), after which we’ll start supporting our stakeholders with their upgrades and migrations for a period currently set at 5 months but flexible. This can be called the Release Candidate period, during which both iterations of the Datastore will live alongside each other. Automatic redirects will start being rolled out.

After these months, we’ll have the Stable release, which will also see the old Datastore being decommissioned.

Locations (URLs, that is) of each instance and a basic user manual will be shared promptly with users as we approach the soft launch (which will also be announced on Discuss, so watch this space).

For what it’s worth, the Beta testing month is aimed at having our stakeholders, and whoever else is interested, try the Datastore and feed back on it, so no redirects will be in place, nor will we support usage of the tool in an existing production environment (as we need to consolidate its functionality in a real-life scenario).

5 Likes

Great – many thanks for this update.

So: a month of beta, ~5 months of RC, and a production-ready release in early 2020.

Yes, with caveats. The decision to extend beyond those months will ultimately sit with us, after proper consultation with our active stakeholders during those ~6 months prior to the live release and the deprecation of the old one.
(The above simply means that it’s a rough estimate of the upcoming months, and we are happy to be accommodating and flexible around it, so nothing’s set in stone.)

1 Like