[sorry to be a bit technical, but…] If storage is an issue, can’t you download, index, dump? Surely each PDF only needs to be stored until indexing of its contents is complete?
[np, I’ll try to keep it to a short tech answer haha] True, it’s not really a problem; we just need to increase the storage if we want to enable this. The upsides to storing the documents are that we:
- Don’t have to re-download the document when we improve our document indexing capabilities.
- Can do a checksum comparison when updating the indexes (download the document and, if it hasn’t changed, don’t update the indexes, which is better for performance) [as Andy pointed out, this is not a valid reason, since we actually store the checksum]
- Can host the files and act as a mirror for them if ever necessary? Not sure whether that is OK to do; definitely not the primary reason.
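For the tooling folk, here is a minimal sketch of the “download, index, dump” flow from the question, combined with the checksum check from the second bullet. It is illustrative only: `process_document`, the `fetch` and `index` callables and the in-memory `checksums` dict are hypothetical stand-ins, not d-portal’s (or anyone’s) actual pipeline. Only the SHA-256 digest survives between runs; the PDF bytes are dropped once indexing is done.

```python
import hashlib
from typing import Callable, Dict


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 checksum of a document's raw bytes as hex."""
    return hashlib.sha256(content).hexdigest()


def process_document(
    doc_url: str,
    fetch: Callable[[str], bytes],        # hypothetical downloader
    checksums: Dict[str, str],            # hypothetical store: url -> digest
    index: Callable[[str, bytes], None],  # hypothetical text indexer
) -> bool:
    """Download a document, re-index it only if it changed, then discard it.

    Returns True if the document was (re)indexed, False if unchanged.
    """
    content = fetch(doc_url)              # held in memory only, never saved
    digest = sha256_hex(content)
    if checksums.get(doc_url) == digest:
        return False                      # unchanged: skip re-indexing
    index(doc_url, content)               # extract and index the contents
    checksums[doc_url] = digest           # keep the checksum, not the file
    return True                           # `content` goes out of scope here


if __name__ == "__main__":
    store: Dict[str, str] = {}
    seen = []
    fake_fetch = lambda url: b"%PDF-1.4 example"
    print(process_document("https://example.org/a.pdf", fake_fetch, store,
                           lambda u, c: seen.append(u)))
    print(process_document("https://example.org/a.pdf", fake_fetch, store,
                           lambda u, c: seen.append(u)))
```

The catch, per the first bullet: because the bytes are not kept, improving the indexer later means fetching every document again.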
From the research I have done on in-country data use, this was also absolutely the case: people needed very basic data to find/identify a project, and then, for anything more complex, wanted either the documents or the email address of the project manager.
I think this has implications for IATI development: even if we keep adding more features, can we realistically expect the standard to compete with the linked documents, e.g. for M&E, results tracking, descriptions of the target populations, or even detailed locations? I think maybe not.
@matmaxgeds thanks. Yes. As the data standard becomes more complex in scale and implementation, the observation that people value the narrative documents seems “inconvenient”. As @ariag rightly adds, these documents then need time to digest and comprehend.
I guess my original observation was simply: this is a thing. If we want to talk about data use, then we should be prepared for the fact that much of this may be right-clicking and saving PDF documents. And (for the tooling folk): perhaps we can respond to these user stories.
It was just pointed out to me that USAID has added links to evaluation documents, and, like the World Bank project appraisal documents, they are long. That is potentially good information that is actually helpful and useful, which just makes me want a tool to search and filter, so that the data hits the trifecta of available, useful and accessible. My first thought is that such a tool would let you search by services provided (results of all kinds) and subnational locations (in addition to the search criteria already available) before opening the document, so it could be a way to find the projects that meet the criteria you are looking for. Maybe I am dreaming and asking for too much, but if it’s possible, maybe we can figure out how to make it happen??
Just a very quick update/fyi on this - Shi noticed this conversation/other feedback about documents in IATI and made some subtle changes to make the document links on an activity page (such as this one) a bit more prominent in d-portal. A small change but hopefully useful for users interested in an activity’s docs!
Subnational locations would be taken care of in project data, ideally (can’t wait to test the auto-coder from OpenAg!). So you’d use the geolocation data to identify projects that meet your geo criteria, then assume the documents will be relevant (rather than identifying projects via documents).
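A rough sketch of that “filter by geo, then take the documents” workflow, assuming coordinates have already been extracted from each activity’s location data and the URLs from its `document-link` elements. The `Activity` class, the field names and the simple bounding-box test (which ignores boxes that cross the antimeridian) are hypothetical simplifications for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Activity:
    # Hypothetical, heavily simplified view of an IATI activity.
    iati_identifier: str
    locations: List[Tuple[float, float]] = field(default_factory=list)  # (lat, lon)
    document_urls: List[str] = field(default_factory=list)


def in_bbox(point: Tuple[float, float],
            south: float, west: float, north: float, east: float) -> bool:
    """True if a (lat, lon) point lies inside the box (no antimeridian wrap)."""
    lat, lon = point
    return south <= lat <= north and west <= lon <= east


def documents_for_area(activities: List[Activity],
                       south: float, west: float,
                       north: float, east: float) -> Dict[str, List[str]]:
    """Pick activities with any location in the box, then return their
    document links, assuming those documents are relevant to the area."""
    result: Dict[str, List[str]] = {}
    for act in activities:
        if any(in_bbox(p, south, west, north, east) for p in act.locations):
            result[act.iati_identifier] = list(act.document_urls)
    return result
```

The point of the design is that the geo filtering runs on the structured data, so nobody has to open a PDF to decide whether a project is in their area.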
Very cool! Maybe it could be made collapsible, e.g. show a taste and click to see the rest, in case adding documents really takes off and projects have hundreds!
… A hidden aspect of document links could be that it’s feasible for a publisher to create a very basic IATI activity (meeting just the schema minimum) and then add the narrative, budget, results and conditions (for example) as PDFs … the IATI Document Category codes would support this.
Would people consider that against the spirit of IATI, or welcome transparency?
(side note: apparently, PDF is now 3-star open data - via @rory_scott @andylolz)
@matmaxgeds there’s the option to show/hide various aspects now on the individual project pages, including the list of documents - thanks for the suggestion (useful on pages like this! http://d-portal.org/ctrack.html#view=act&aid=SE-0-SE-6-7100174403-BIH-15150)
+1!
Also found this one interesting, as an example of a project with lots and lots of documents. (I came across it in relation to another discussion with @matmaxgeds about related projects reported by GAC and DFID; I want to start a thread comparing the 3, both in terms of data published and presentation on the portals.)
@YohannaLoucheur - yes, please start the thread. In addition to our discussion, I think that different IATI portals showing different data is going to start being a v. serious problem: true for documents, but v. bad for numbers. Which is the ‘right one’? Or do you have to start quoting the source portal when telling someone to ‘get it from IATI’?
I’m not sure I fully understand from the examples cited (d-portal | open Sida) - but look forward to the new thread!
Slightly edited my previous message, hoping it’s a bit clearer why I posted the SIDA example.
In terms of the 3 portals showing different things, it’s totally normal: they are different projects, and one isn’t “more true” than the other. But each is missing some useful data, so that’s interesting to compare and contrast. They also present the information in very different ways, again interesting to compare, especially for those of us trying to improve presentation tools. Would be great to have user feedback (hence the thread, hopefully later today).
Thanks @YohannaLoucheur
Yes, interesting how different portals use the data. I think that could be a whole new Data Use Observation thread - agree? Could include screenshots too!