Terms of Reference for a new IATI Datastore

We would like to invite you to participate in a consultation on the Terms of Reference for a new IATI Datastore.

In realising IATI’s vision that “transparent, good quality information on development resources and results is available and used by all stakeholder groups”, it is the responsibility of the IATI secretariat to ensure that a reliable, aggregated flow of all data published to the standard is accessible. We need to provide a robust, timely and comprehensive data service which developers and data scientists can use to produce information products tailored to their specific needs.

The current datastore, developed in the early days of IATI, is not fit for purpose and needs replacing.

At a meeting of the IATI Governing Board in January this year, it was agreed that putting a reliable and sustainable data service in place was now a priority. The developer workshop held in Manchester in January included a brainstorming session on the desired functionality, which has formed the basis of these ToRs.

As our Technical Team is currently stretched (over and above its day-to-day commitments) with the consolidation and re-design of IATI’s many websites and web-delivered services, we advised the Board at its March meeting that we should consider outsourcing both the build and initial maintenance of a new datastore.

Your comments on our draft document are most welcome. If you could manage this in the next two weeks, we would be very grateful.

Please add comments relating to specific text in the document itself, but add more general feedback in this thread.

6 Likes

Is there some user research for each of the 3 personas identified (to which @stevieflow wants to add a 4th) to supplement the work we did in Manchester? It’d be great to see the “so that…” part of the user stories. Reading through, I felt it read more like a requirements spec using parts of user story language than like user stories.

I’d be interested in the user research too. Is this something @IATI-techteam could share?

Thanks, Bill, for sharing these specs with us. Reading through the specs and all the comments already presented, I see three crucial topics being discussed:

1 - Should the DS accept all publisher data or not? And if not, what are the acceptance criteria? Derived from this is the question of whether an IATI data validator should be part of the datastore (DS) or not.

2 - Should the DS limit how often, how much and what users can search on? This question seems to be driven by perceived technological limitations.

3 - The requirement that all software developed for the IATI validator should be written using pyIATI, on the assumption that in the end the IATI technical team will maintain all IATI core software products.

Ad 1
Since the existing IATI data quality is one of the major obstacles to using IATI data in practice, I think the answer to the first question should be no. This implies that the definition of the acceptance/rejection criteria should be part of the design of the DS. It has been argued by several people that data validation should not be part of the DS itself, to keep its functionality ‘lean & mean’.
Therefore I suggest that a separate data validator should be developed, possibly reusing the software already being developed by Rolf Kleef for a number of organizations, including the Netherlands MFA and DFID.
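To make this concrete, here is a rough sketch of how an acceptance gate could sit in front of the DS. This is purely my own illustration: the schema file paths and the version handling are assumptions, not part of any existing validator.

```python
# Sketch of a pre-ingest acceptance gate: a file only enters the DS if it
# is well-formed XML and validates against the schema for its declared
# version. Schema file locations below are hypothetical.
from lxml import etree

SCHEMAS = {
    "2.03": etree.XMLSchema(etree.parse("schemas/2.03/iati-activities-schema.xsd")),
    "2.02": etree.XMLSchema(etree.parse("schemas/2.02/iati-activities-schema.xsd")),
}

def accept_for_ingest(xml_path):
    """Return True if the file may enter the DS; rejected files would be
    handed to a separate validator service that reports back to the publisher."""
    try:
        doc = etree.parse(xml_path)
    except etree.XMLSyntaxError:
        return False  # not well-formed XML: reject outright
    version = doc.getroot().get("version")
    schema = SCHEMAS.get(version)
    return schema is not None and schema.validate(doc)
```

Whether the gate stops at schema validation or also applies content rules (valid codelist values, coherent dates, etc.) is exactly the acceptance-criteria question that needs answering in the design.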

Ad 2
Although I understand the inclination of the more technical people to limit the technical complexity of the DS, in my opinion the user needs should drive the design process. Technology should follow (and not the other way around). We are not talking ‘big data’ here, and there is a lot of experience in the market in designing flexible and robust data marts.

Ad 3
If we want to have a chance of achieving any results in the next 12-18 months, I think we should reuse as much existing software as we can. Rebuilding everything from scratch with the as-yet-unproven pyIATI library would preclude the reuse of any existing software components (e.g. the Data Validator software mentioned under point 1).

I think we should seriously consider whether all IATI core products must, in the long term, be maintained solely by the IATI technical team. In a loosely coupled ecosystem of applications/software components it is, in my opinion, quite possible to have multiple vendors responsible for the maintenance of IATI core products. The most crucial responsibility of the IATI technical team in such a scenario would be to define and guard the overall conceptual and technical architecture, especially with regard to the interfaces between the software components. This would have the additional benefit of limiting the resources needed from the IATI technical team, which is already overburdened as it is.

@herman

On Ad 1:
I feel this is derailing the main objective of a DS ToR, tbh. Stating supplier preference is something we should avoid at this stage. Let’s keep this exercise very agnostic and functional.

Ad 1 is about whether to accept all data or not. Let’s talk about acceptance criteria first, then, before talking about the many validators out there.

On Ad 2
I believe the ToR has been produced with the user needs in mind (user stories), or that’s my understanding from reviewing the ToR. I see the user stories definitely need more work, but they are in essence built on the user needs.

Perhaps the RfP should require an analysis phase with the different types of users who may use the DS, panning out user stories, rather than speccing all the user needs upfront.

And what is a ‘Data Mart’?

On Ad 3
I agree about re-using software, but re-using for the sake of re-usability does not make sense to me.

I am not too sure either about rallying around pyIATI without any real-life scenarios of how pyIATI is currently being used by IATI publishers and consumers alike. I am not aware of any platforms/tools that currently make use of pyIATI.

I also agree with scoping the technical architecture to allow multiple vendors to take part in an ecosystem of IATI data services, with the IATI technical team leading the technical architecture effort, oversight and overall strategy for technical development in the short and long term.

Ad 1:
The wording was too strong (‘possibly’ should be ‘e.g.’). Discussion should indeed be about requirements and acceptance criteria.

Since the consensus seems to be that data validation is not part of the data store functionality, this discussion is not directly relevant to the data store requirements. What is important, though, is to make sure that bad-quality data are not fed into the data store and that we have clear criteria for what constitutes bad data.

Ad 2:
A ‘data mart’ is a well-known data warehouse component which enables high-volume data queries with flexibility and good response times.

Ad 3:
Agreed, reuse should not be done for the sake of reuse. It should be done to avoid duplication of effort, limit throughput times and reduce costs. This is especially important given the limited IATI budgets.

I wanted to bring out one conversation from the document. This may seem a somewhat esoteric discussion for most of the community but I think it is important for ensuring we get a useful and working Datastore as soon as possible.

For me the Datastore is the #1 most important tool for enabling data use at country level. Given that pyIATI has not yet been implemented in any user-facing tools, I am against using the Datastore as a “guinea-pig” to attempt to demonstrate the usefulness of pyIATI – and thereby potentially holding up Datastore development even more. If there are useful features of pyIATI then the developers should want to use this library anyway. So I am against including this as a requirement in the ToRs.

Finally, it would be great to understand how this conversation proceeds from here! Will a more formal ToR / specifications document now be shared for community feedback? Thanks!

3 Likes

My understanding from the discussion at the MA was that the RFP has been finalized and we are actively soliciting bids. With that understanding I’ve been trying to find the final version of the specs/ToR, so I can attempt to get my head around what the query capabilities of the new and improved datastore are going to be. Above is a link to drafts, but I can’t seem to find “final” documents. Can someone point me to the correct place on the site to find the RFP currently circulating?

Thanks in advance for your help.
Kind Regards,
Michelle

Ah, very good point @Michelle_IOM! It’s available here:

https://www.ungm.org/Public/Notice/74108

The deadline to submit proposals to create IATI’s new Datastore has been extended slightly, to 23:59 UTC on Tuesday, 7 August 2018, to avoid closing the bidding on a Sunday night. Please see the UN Global Marketplace for more.

Are bids, or portions of information about bids, going to be made public? It would be interesting to see who responded and generally what folks are proposing to do, mainly out of curiosity.

2 Likes

Hi @IATI-techteam,

I think I read somewhere that the contractor for this has been selected. Is there anywhere we can follow the development progress?

I also wanted to know who to ask to confirm a) whether the query API will remain stable between the current datastore and the new one, and b) whether the datastore will supply the original data, as well as the version mentioned in the ToR that has been transformed to other versions of the standard.
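For context, this is the sort of call our tools currently make against the existing datastore. The endpoint and parameters follow the classic API’s documented pattern as I understand it, but treat the specific values as illustrative rather than definitive:

```python
# A typical client call against the classic datastore (illustrative values).
# A breaking change to this contract affects every tool making such calls.
import requests

resp = requests.get(
    "http://datastore.iatistandard.org/api/1/access/activity.xml",
    params={
        "reporting-org": "GB-GOV-1",   # example publisher identifier
        "recipient-country": "ZW",     # example recipient country code
        "stream": "True",              # stream the full result set
    },
)
resp.raise_for_status()
original_xml = resp.content  # parsed downstream by the consuming tool
```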

Thanks a lot, Matt

3 Likes

Hi @IATI-techteam,

Sorry to impatiently bump this, but we are in the middle of development and need to take decisions. I think I have identified that Zimmerman & Zimmerman are building the new Datastore - if you can confirm, I can contact them directly. Or can you confirm, @siemvaessen, and perhaps share some more information on what the new datastore will be able to do?

Thanks a lot,

Matt

1 Like

Hi Matt,

Contract process in place, but nothing signed yet. This is the slow train…

If you have any questions, do send me a message offline.

Thanks, Siem

Thanks @siemvaessen - slow is fine by me, just keen to know which train is coming :slight_smile: Will message offline to ask for more details.

Hi @siemvaessen - I don’t have your email address, so I emailed the generic Zimmerman one. Checking whether it has reached you, or if there is a better way - should I send you a Twitter PM so you can send me your email address?

Hi @matmaxgeds, I did not see it. Do send a PM or use my first name @ companyname.nl.

I believe Zimmerman & Zimmerman and the IATI secretariat sat down to define final technical requirements on the Friday after the TAG in Kathmandu. It would be great if someone from the secretariat or Zimmerman & Zimmerman could share this so we all have an idea of what’s currently in/out of scope.

/cc @matmaxgeds @siemvaessen @IATI-techteam

1 Like

Hi @matmaxgeds @andylolz

@IATI-techteam and Zimmerman & Zimmerman indeed had a kick-off after the IATI TAG in Kathmandu. Based on that conversation, Zimmerman & Zimmerman requested some clarifications, and ZZ extracted point-by-point tasks from the original ToR.

Those tasks have been transferred to a GitHub project running under the OIPA repository, which anyone can (re)view here.

Essentially the focus going forward is to plan and deliver Phase 1 from the ToR as an initial deliverable in early 2019. Any other outstanding issues that Zimmerman & Zimmerman have for OIPA will also be included in that same project; otherwise we would have to manage multiple boards. We expect Phase 2 to follow directly, so we are currently aiming to deliver both phases in Q1 2019.

On the topic of non-ToR work, should we find a way to create and maintain an inventory of potential features/requirements/nice-to-haves etc. by and for the community, and come up with a good way of ‘upvoting’ or ‘downvoting’ them?

2 Likes

Thanks @siemvaessen - great to see the detail.

Can you or @IATI-techteam explain a bit more about what a ‘standardised output/activity’ is, ideally giving an example of a standardised activity with a diff against the original XML?
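To illustrate what I imagine it might mean (purely my guess, not anything confirmed in the ToR): normalising activities published at older versions of the standard up to a single current version, e.g. wrapping a 1.0x free-text title in the narrative element that 2.x requires:

```python
# Hypothetical example of "standardisation": upgrading a 1.0x-style title
# to the 2.x structure. The function name and its narrow scope are my own
# illustration, not the tech team's definition.
from lxml import etree

def upgrade_title_to_2x(activity):
    """Rewrite <title>text</title> (1.0x) as <title><narrative>text</narrative></title> (2.x)."""
    for title in activity.findall("title"):
        if title.text and not title.findall("narrative"):
            narrative = etree.SubElement(title, "narrative")
            narrative.text = title.text.strip()
            title.text = None
    return activity

old = etree.fromstring("<iati-activity><title>Rural roads programme</title></iati-activity>")
print(etree.tostring(upgrade_title_to_2x(old), pretty_print=True).decode())
# prints (whitespace aside):
# <iati-activity><title><narrative>Rural roads programme</narrative></title></iati-activity>
```

An actual diff of a standardised activity against the original XML would show whether it is this kind of structural upgrade or something more invasive.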

Can I also confirm whether this will be a breaking change to the current datastore API endpoints? If so, I think we need to discuss how this is going to be managed for all the existing tools that use the current datastore API. Is someone able to share a list of (and/or perhaps start logging) all the current datastore API requests, so we can see what level of issue this is going to be?
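Even a rough tally from the current service’s access logs would help. Something like this (the log file name and format are assumptions on my part) would show which endpoints and query parameters are actually in use:

```python
# Tally endpoint + parameter-name usage from a common-format access log,
# to see which query patterns a breaking change would actually affect.
# "datastore-access.log" is a hypothetical file name.
import collections
import re
from urllib.parse import urlparse, parse_qs

counts = collections.Counter()
request_re = re.compile(r'"GET (\S+) HTTP')  # request line in common log format

with open("datastore-access.log") as f:
    for line in f:
        m = request_re.search(line)
        if not m:
            continue
        url = urlparse(m.group(1))
        params = frozenset(parse_qs(url.query))  # parameter names only
        counts[(url.path, params)] += 1

for (path, params), n in counts.most_common(20):
    print(n, path, sorted(params))
```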

I also wanted to comment on the conversion to USD. This doesn’t seem like a task for a datastore to me; rather, it is the role of the different tools that draw on the datastore as a data source and repackage it for users to provide their own exchange rates and, in doing so, explain their choice of rates to those users.
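Something as simple as this, living in the consuming tool with its own documented rate table, seems preferable to baking rates into the datastore (the rates below are placeholders, not real figures):

```python
# Client-side currency conversion using the tool's own documented rates.
# Placeholder rates for illustration only.
RATES_TO_USD = {"EUR": 1.13, "GBP": 1.27, "USD": 1.0}

def to_usd(value, currency):
    """Convert a transaction value using the tool's own rate table,
    failing loudly when the documented rate policy doesn't cover a currency."""
    try:
        return value * RATES_TO_USD[currency]
    except KeyError:
        raise ValueError(f"No documented rate for {currency}")

print(to_usd(1000.0, "EUR"))  # 1130.0 with the placeholder rate (up to float rounding)
```

Each tool can then document its rate source and dates, which is exactly the transparency users need.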