Data use observation: people go to the documents

(discussion started at Members’ Assembly 17)

To contribute to the focus on people using IATI data, I wanted to share some observations from supporting people to do so. Hopefully this is useful. I’d suggest that we refrain from jumping to a solution immediately, and instead gather insights on how people actually use IATI.

A first observation relates to documents. Specifically, this is about the log frames, evaluation strategies, reports and such that are often linked to IATI activities. DFID, for example, publish links to many documents via their IATI data: here’s a random example.

My observation is really quite simple: when using interfaces such as devtracker or d-portal, people often end up at these documents.

People usually do this after a search and filter process: “show me the Agriculture activities in Tanzania”. On receipt of a list of relevant projects, people then scan through, check, and home in on the ones they are interested in. Once they land on a page about one of these activities, the documents are a hotspot for attention. Often, people are both surprised and engaged by the access to these documents: it seems to make transparency more evident than the numbers and dates we more readily associate with data.

And that’s it! It’s nothing groundbreaking, but worth noting.

What does this mean for the concept of data use and IATI? Perhaps this presents the community with a couple of challenges:

1 - would we consider people accessing documents via IATI data to be data users?
2 - and if documents are useful, then would publishing them instead of data be acceptable (it’s possible to point to a results document, for example, without publishing results data)?

Underneath all this is probably a discussion about the relative merits and uses of quantitative and qualitative information. I want to stress, however: this is an observation from people using IATI data…

Has anybody witnessed the same?

I think the documents frequently provide more information, and are easier to read and comprehend, than the long list of results in d-portal, which frequently gets jumbled so that portions of the results don’t make sense at all. The biggest challenge is that the documents with the best and most useful information are very time consuming to search through to find what you are looking for.

At ZZ we have been extracting data from external documents to enrich search and context analysis. It’s not implemented in any front-end yet, but it is available in the OIPA API.

@VincentVW was this released into the Master version yet? Or still in development branch?

The ‘search in document-link texts’ functionality is on the master branch of OIPA. We don’t have the documents indexed on any of the instances we host though. We did try that once and the disk was full after an hour, filled with 500+ MB PDFs.

If anyone’s wondering how it performs: indexing the text of the documents (~620k documents atm) takes a long time, but once indexed it should be quick to search through. For example, retrieving all activities that have a document attached that mentions the word ‘cashew’ should take a few seconds at most.
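To illustrate why search is quick once the slow indexing step is done, here is a minimal sketch of a word-to-activity index. This is not OIPA’s actual implementation; the function names, activity identifiers and sample texts are all made up for illustration.

```python
import re
from collections import defaultdict


def build_index(doc_texts):
    """Map each lowercase word to the set of activity IDs whose documents contain it."""
    index = defaultdict(set)
    for activity_id, text in doc_texts:
        # Deduplicate words per document before adding them to the index.
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            index[word].add(activity_id)
    return index


def activities_mentioning(index, word):
    """Look up a single word; the cost is one dictionary access, not a scan of every PDF."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical (activity ID, extracted document text) pairs.
doc_texts = [
    ("XX-EXAMPLE-001", "Evaluation of the cashew value chain in Tanzania."),
    ("XX-EXAMPLE-002", "Logframe for a maize storage programme."),
    ("XX-EXAMPLE-001", "Annual review: cashew exports and farmer incomes."),
]

index = build_index(doc_texts)
print(activities_mentioning(index, "cashew"))
```

The expensive part is building the index over every document; each query afterwards is a lookup.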

[sorry to be a bit technical, but…] If storage is an issue, can’t you download, index, dump? Surely each PDF only needs to be stored until indexing of its contents is complete?
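A rough sketch of the “download, index, dump” idea, assuming the download and text-extraction steps are supplied by the caller. Here `fetch` and `extract_text` are hypothetical stand-ins, not real OIPA functions, and the index is just a dictionary.

```python
import os
import tempfile


def index_then_discard(url, fetch, extract_text, index):
    """Write one document to a temporary file, index its text, then delete the file."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(fetch(url))
        index[url] = extract_text(path)
    finally:
        # The file only exists on disk while its contents are being indexed.
        os.remove(path)
    return path


def fake_fetch(url):
    """Stand-in for an HTTP download; returns the document bytes."""
    return b"cashew value chain evaluation"


def fake_extract_text(path):
    """Stand-in for PDF text extraction; here the 'PDF' is plain text."""
    with open(path, "rb") as handle:
        return handle.read().decode("utf-8")


index = {}
used_path = index_then_discard(
    "https://example.org/report.pdf", fake_fetch, fake_extract_text, index
)
print(index)
print(os.path.exists(used_path))
```

With this approach disk use stays at roughly one document at a time, at the cost of downloading again whenever the indexing improves.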

[np, I’ll try to keep it to a short tech answer haha] True, it’s not really a problem; we just need to up the storage if we want to enable this. The upsides to storing the documents are that we:

  • Don’t have to re-download the document when we improve our document indexing capabilities.
  • Can do checksum comparison when updating the indexes (download the document and, if it didn’t change, skip updating the indexes, which is better for performance) [as Andy pointed out, this is not a valid reason and we actually store the checksum]
  • Can host the files and be a mirror for them if ever necessary? Not sure if that is ok to do, and it’s definitely not the primary reason.
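On the checksum point: as noted in the bracketed correction, only the checksum needs to be kept, not the file. A minimal sketch of that comparison, assuming SHA-256 and a plain dictionary as the checksum store (both are illustrative choices, not necessarily what OIPA uses):

```python
import hashlib


def needs_reindex(url, content, stored_checksums):
    """Return True and record the new checksum if the content differs from last time."""
    digest = hashlib.sha256(content).hexdigest()
    if stored_checksums.get(url) == digest:
        return False  # unchanged, so the existing index entry is still valid
    stored_checksums[url] = digest
    return True


checksums = {}
url = "https://example.org/report.pdf"  # hypothetical document URL
first = needs_reindex(url, b"version 1", checksums)
second = needs_reindex(url, b"version 1", checksums)
third = needs_reindex(url, b"version 2", checksums)
print(first, second, third)
```

The document still has to be downloaded to compute the digest, but the index update is skipped when nothing changed.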

Polite reminder > @andylolz @VincentVW @siemvaessen

I know you’re keen :slight_smile:

Absolutely – you’re right. It’s okay, we took it to twitter.

From the research I have done on in-country data use, this was also absolutely the case: people needed very basic data to find/identify a project, and then for anything more complex wanted either the documents or the email address of the project manager.

I think this has implications for IATI development, i.e. even if we keep adding more features, can we realistically expect the standard to compete with the linked documents, e.g. for M&E, or results tracking, or descriptions of the target populations, or even for the detailed locations? I think maybe not.

@matmaxgeds thanks. Yes. As the data standard becomes more complex in scale and implementation, the observation that people value the narrative documents seems “inconvenient”. As @ariag rightly adds, these documents then need time to digest and comprehend.

I guess my original observation was simply: this is a thing. If we want to talk about data use, then we should be prepared for the fact that much of this may be right-clicking and saving PDF documents. And (for the tooling folk): perhaps we can respond to these user stories.

It was just pointed out to me that USAID has added links to evaluation documents and, like the World Bank project appraisal documents, they are long. They are potentially good info that is actually helpful and useful, which just makes me want a tool to search and filter so that the data is the trifecta of available, useful and accessible. My first thought is that the tool would let you search for services provided (results of all kinds) and subnational locations (in addition to the search criteria that are available already) before opening the doc, so it can be a way to find the projects that meet the criteria you are looking for. Maybe I am dreaming and asking for too much, but if it’s possible, maybe we can figure out how to make it happen?

Just a very quick update/fyi on this - Shi noticed this conversation/other feedback about documents in IATI and made some subtle changes to make the document links on an activity page (such as this one) a bit more prominent in d-portal. A small change but hopefully useful for users interested in an activity’s docs!

Subnational locations would be taken care of in project data, ideally (can’t wait to test the auto-coder from OpenAg!). So you’d use the geolocation data to identify projects that meet your geo criteria, then assume the documents will be relevant (rather than identifying projects via documents).

@Matt

very cool! Maybe it can be made collapsible e.g. give a taste and click to see the rest - in case adding documents really takes off and projects have hundreds!

… A hidden aspect of document links could be that it’s feasible for a publisher to create a very basic IATI activity (just observing the minimum for the schema) and then add narrative, budget, results and conditions (for example) as PDFs … the IATI Document Category codes would support this.

Would people consider that against the spirit of IATI , or welcome transparency?

(side note: apparently, PDF is now 3-star open data - via @rory_scott @andylolz)
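To make the “very basic activity plus PDFs” idea concrete, here is a rough sketch that parses such an activity and counts its document links per category. The XML is a hand-written illustration, not schema-validated, and leaves out other elements the schema requires; the identifier and URLs are made up. The category codes follow my reading of the IATI Document Category codelist (A04 conditions, A05 budget, A08 results), so please check them against the codelist before relying on them.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, stripped-down activity: narrative detail lives in linked PDFs.
ACTIVITY_XML = """
<iati-activity>
  <iati-identifier>XX-EXAMPLE-0001</iati-identifier>
  <title><narrative>Example activity</narrative></title>
  <document-link url="https://example.org/budget.pdf" format="application/pdf">
    <title><narrative>Budget</narrative></title>
    <category code="A05"/>
  </document-link>
  <document-link url="https://example.org/results-2016.pdf" format="application/pdf">
    <title><narrative>Results 2016</narrative></title>
    <category code="A08"/>
  </document-link>
  <document-link url="https://example.org/results-2017.pdf" format="application/pdf">
    <title><narrative>Results 2017</narrative></title>
    <category code="A08"/>
  </document-link>
  <document-link url="https://example.org/conditions.pdf" format="application/pdf">
    <title><narrative>Conditions</narrative></title>
    <category code="A04"/>
  </document-link>
</iati-activity>
"""


def documents_by_category(activity):
    """Count document links per category code for one activity element."""
    return Counter(
        category.get("code")
        for link in activity.findall("document-link")
        for category in link.findall("category")
    )


activity = ET.fromstring(ACTIVITY_XML.strip())
counts = documents_by_category(activity)
print(dict(counts))
```

A portal could use a count like this to show how many documents an activity has in each category.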

@matmaxgeds there’s the option to show/hide various aspects now on the individual project pages, including the list of documents - thanks for the suggestion (useful on pages like this! http://d-portal.org/ctrack.html#view=act&aid=SE-0-SE-6-7100174403-BIH-15150)

@Matt ace, thanks (and nice choice of example project!)

wow

Might be useful to have a count and some indication of the category

+1!

Also found this one interesting, as an example of a project with lots and lots of documents. (I came across it in relation to another discussion with @matmaxgeds about related projects reported by GAC and DFID. I want to start a thread on comparing the 3, both in terms of data published and presentation on portal.)

@YohannaLoucheur - yes, please start the thread. In addition to our discussion, I think that different IATI portals showing different data is going to start being a v. serious problem: true for documents, but v. bad for numbers. Which is the ‘right one’? Or do you have to start quoting the source portal when telling someone to ‘get it from IATI’?