Generating a list of all IATI file URLs

Any thoughts on the best way to generate a list of all IATI file URLs? And how about doing so on a regular basis?

For a project, I manually generated a regional list by copying the URLs from iatiregistry.org, for example, but this took some time and several steps.

The only source that you can be sure is always up to date is the IATI Registry API. You can use it directly through https://iatiregistry.org/api/action/package_search (only suitable for programmers, I guess).
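For anyone who wants to try the API route, here is a minimal sketch in Python, not a definitive implementation. It assumes the registry behaves like a standard CKAN instance: that package_search accepts the usual `start` and `rows` paging parameters and returns packages under `result.results`, each with a `resources` list whose items carry a `url`. The page size of 500 and the function names are my own choices for illustration.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://iatiregistry.org/api/action/package_search"


def fetch_page(start, rows):
    """Fetch one page of package_search results as a list of package dicts."""
    query = urllib.parse.urlencode({"start": start, "rows": rows})
    with urllib.request.urlopen(f"{API_URL}?{query}") as response:
        payload = json.load(response)
    # Assumption: standard CKAN response layout.
    return payload["result"]["results"]


def iter_packages(fetch=fetch_page, rows=500):
    """Yield every package, paging until an empty page comes back."""
    start = 0
    while True:
        page = fetch(start, rows)
        if not page:
            return
        yield from page
        # Advance by what was actually returned, in case the server
        # caps the page size below the requested number of rows.
        start += len(page)


def dataset_urls(packages):
    """Yield (dataset name, resource URL) pairs, skipping empty URLs."""
    for package in packages:
        for resource in package.get("resources", []):
            url = resource.get("url")
            if url:
                yield package.get("name"), url


if __name__ == "__main__":
    for name, url in dataset_urls(iter_packages()):
        print(f"{name}\t{url}")
```

Run on a schedule (cron or similar), this would cover the "regular basis" part of the question. The fetch function is passed in as an argument so the paging logic can be tried out without hitting the registry.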

It would be good if the datasets page at the registry also offered a CSV download.

@BrentPhillips - the IATI-Registry-Refresher can do this for you, although again it’s more of a developer tool.

Running the grab_urls.php script will find the data URLs for each publisher, and create one file per publisher in a urls folder. Each file will contain the dataset names and the dataset URL for every dataset published by that organisation.

The above code is considered legacy, and will at some point be replaced by functionality in the iati.fetch part of the IATI Python Library - an experimental start on this type of thing has been made here, although it won’t be able to download dataset URLs at this point.

@VincentVW Are there any &fields= type filters on that API call, to just return the actual resource.url field in a list?

I don’t see anything for it in the documentation.

It would indeed be good to have something like this that is more performant / flexible. Side note: I would not recommend using OIPA for this use case; the registry API should be the single source of truth here.

+1 for using IATI Registry Refresher. If you use the version on this branch, it fetches all the URLs ~10x faster.

[Thinking out loud… I guess if data pipes accepted JSON source, it would be easy to generate CSV output from CKAN. Or is there another tool to do this, @rufuspollock?]
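In the meantime, turning CKAN JSON into CSV can be sketched with nothing but the Python standard library. This is a rough illustration rather than a finished tool: it assumes package dicts shaped like standard CKAN output (an `organization` object with a `name`, plus a `resources` list with `url` fields), and the function name and column names are hypothetical.

```python
import csv
import io


def packages_to_csv(packages, out):
    """Write one CSV row per resource: publisher, dataset name, URL."""
    writer = csv.writer(out)
    writer.writerow(["publisher", "dataset", "url"])
    for package in packages:
        # Assumption: "organization" may be missing or null in the JSON.
        publisher = (package.get("organization") or {}).get("name", "")
        for resource in package.get("resources", []):
            writer.writerow(
                [publisher, package.get("name", ""), resource.get("url", "")]
            )


if __name__ == "__main__":
    # Made-up sample data standing in for package_search results.
    sample = [
        {
            "name": "example-dataset",
            "organization": {"name": "example-publisher"},
            "resources": [{"url": "https://example.org/activities.xml"}],
        }
    ]
    buffer = io.StringIO()
    packages_to_csv(sample, buffer)
    print(buffer.getvalue(), end="")
```

Writing to any file-like object means the same function can target a file on disk or an in-memory buffer.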
