I’ve been working on the notes below, but the more I look at this, the more confused I get. So I’m posting for now, in the hope this parses clearly enough, and plan to return to it later, perhaps with a few more worked examples.
(initial) Problem analysis
It would be interesting to understand what business logic (if any) data users currently employ to cope with the fact that, when working with data from different sources, they have to look at both the activity and the transaction level to understand funding flows and to find out what sectors an activity is in.
Whilst in theory the rules might aim to get at 100% of the financial flows, I’m not sure how far this works in practice.
Under the current rules, I think the business logic (taking sector as an example) would go something like this:
If iati-activity/sector exists but not iati-activity/transaction/sector, then:
… For each sector/@vocabulary:
… … Sum up sector/@percentage and check it equals 100
… … For each sector in that @vocabulary
… … … Multiply the total activity value (budget? planned disbursements? commitments?) by sector/@percentage to get the total allocated to that sector
If iati-activity/transaction/sector exists but not iati-activity/sector, then:
… For each iati-activity/transaction/sector/@vocabulary:
… … For each transaction using that sector vocabulary:
… … … For each mentioned sector in that @vocabulary
… … … … Add up the value of commitment transactions to get the total allocated to that sector
If iati-activity/sector and iati-activity/transaction/sector both exist, then:
… What? Error condition.
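To make the three branches above concrete, here is a rough Python sketch of how I read them. It is only a sketch under assumptions: the function name `sector_totals` is mine, I assume 2.x-style codes (transaction type "2" for a commitment), that a missing @vocabulary means DAC ("1"), and that a missing @percentage counts as 100. The caller has to supply the activity total, since which figure to use is exactly the open question above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

COMMITMENT = "2"     # assumption: 2.x TransactionType code for a commitment
DEFAULT_VOCAB = "1"  # assumption: a missing @vocabulary means DAC


def sector_totals(activity, activity_total):
    """Return {(vocabulary, sector code): amount} for one iati-activity.

    activity_total is whichever figure the caller settles on (budget,
    planned disbursements, commitments); the notes leave that open.
    """
    act_sectors = activity.findall("sector")
    tx_sectors = activity.findall("transaction/sector")
    if act_sectors and tx_sectors:
        # Third branch: both levels used, which the current rule forbids.
        raise ValueError("sector reported at both levels")

    totals = defaultdict(float)
    if act_sectors:
        # First branch: split the activity total by percentage, per vocabulary.
        by_vocab = defaultdict(list)
        for s in act_sectors:
            by_vocab[s.get("vocabulary", DEFAULT_VOCAB)].append(s)
        for vocab, sectors in by_vocab.items():
            # assumption: a missing @percentage counts as 100
            shares = [float(s.get("percentage", "100")) for s in sectors]
            if abs(sum(shares) - 100) > 0.01:
                raise ValueError(f"vocabulary {vocab} sums to {sum(shares)}")
            for s, share in zip(sectors, shares):
                totals[(vocab, s.get("code"))] += activity_total * share / 100
    else:
        # Second branch: add up commitment transactions per sector.
        for tx in activity.findall("transaction"):
            tx_type = tx.find("transaction-type")
            if tx_type is None or tx_type.get("code") != COMMITMENT:
                continue
            value = float(tx.findtext("value", "0"))
            for s in tx.findall("sector"):
                key = (s.get("vocabulary", DEFAULT_VOCAB), s.get("code"))
                totals[key] += value
    return dict(totals)
```

Totals are keyed by vocabulary as well as sector code, so a transaction tagged in two vocabularies is counted once in each rather than twice in one.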
Faceted browse tools that want to find all the activities in a given sector have to look at both iati-activity/sector and iati-activity/transaction/sector.
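For the faceted browse case, the lookup might be sketched roughly as below. The function names are hypothetical, and I am again assuming that a missing @vocabulary means DAC ("1").

```python
import xml.etree.ElementTree as ET


def activity_sectors(activity):
    """Collect every (vocabulary, code) pair on an activity, from either level."""
    elements = activity.findall("sector") + activity.findall("transaction/sector")
    # assumption: a missing @vocabulary means DAC ("1")
    return {(s.get("vocabulary", "1"), s.get("code")) for s in elements}


def activities_in_sector(activities, vocabulary, code):
    """Filter a list of iati-activity elements down to those in one sector."""
    return [a for a in activities if (vocabulary, code) in activity_sectors(a)]
```

A tool that only checked iati-activity/sector would silently miss any activity coded at the transaction level only.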
(Would be good to hear if anyone is using business logic like this…)
But, in cases where we might have two sector coding vocabularies, this creates problems. For example, an activity could be coded:
- At the activity level with the DAC Vocabulary (@vocabulary=‘1’)
- At the transaction level with an agriculture vocabulary (@vocabulary=‘99’)
This breaks the rule that “Sector can also be reported at the transaction level rather than the activity level. Sector must only be reported at EITHER transaction level OR activity level.”
There is also a rule for iati-activity/transaction/sector which states that “This element can be used multiple times, but only one sector can be reported per vocabulary.”
- i.e. Each transaction can only be tagged with one sector from each vocabulary
(I think this last issue is something we’ve not picked up properly in our analysis to date…)
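The one-sector-per-vocabulary rule for transactions is easy enough to check mechanically. A minimal sketch, with a hypothetical function name and the same assumption that a missing @vocabulary means DAC ("1"):

```python
import xml.etree.ElementTree as ET
from collections import Counter


def repeated_vocabularies(transaction):
    """Return the vocabularies used more than once on a single transaction."""
    counts = Counter(
        s.get("vocabulary", "1")  # assumption: missing @vocabulary means DAC
        for s in transaction.findall("sector")
    )
    return sorted(vocab for vocab, n in counts.items() if n > 1)
```

An empty list means the transaction respects the rule; anything else names the vocabularies with more than one sector.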
Possible responses
(1) Require sectors/countries/regions at the activity level, but make them optional at the transaction level
A validation rule might be added along the lines of:
The value of transactions (commitment) to a sector should not exceed the value of the budget/planned disbursements (?) [1] for that sector as calculated using activity level sector percentages
Whilst the logic to calculate this isn’t entirely trivial, it would be a job for data validation, not for individual users.
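As a rough illustration of what such a validation check could look like, here is a sketch in Python. Everything about it is an assumption on my part: the function name, the use of transaction type "2" for commitments, the defaults for missing @vocabulary and @percentage, and above all the `activity_total` argument, since note [1] leaves open which figure the sector percentages should be applied to.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

COMMITMENT = "2"  # assumption: 2.x TransactionType code for a commitment


def over_committed_sectors(activity, activity_total, tolerance=0.01):
    """Return {(vocabulary, code): (committed, allowed)} for each sector whose
    commitment transactions exceed its activity-level share."""
    allowed = {}
    for s in activity.findall("sector"):
        key = (s.get("vocabulary", "1"), s.get("code"))
        # assumption: a missing @percentage counts as 100
        allowed[key] = activity_total * float(s.get("percentage", "100")) / 100
    activity_vocabs = {vocab for vocab, _ in allowed}

    committed = defaultdict(float)
    for tx in activity.findall("transaction"):
        tx_type = tx.find("transaction-type")
        if tx_type is None or tx_type.get("code") != COMMITMENT:
            continue
        value = float(tx.findtext("value", "0"))
        for s in tx.findall("sector"):
            committed[(s.get("vocabulary", "1"), s.get("code"))] += value

    return {
        key: (amount, allowed.get(key, 0.0))
        for key, amount in committed.items()
        # Only compare vocabularies that are also used at the activity level.
        if key[0] in activity_vocabs
        and amount > allowed.get(key, 0.0) + tolerance
    }
```

Transaction sectors in a vocabulary that does not appear at the activity level are skipped rather than flagged, because there is no activity-level percentage to compare them against.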
This change would make things easier for users: they can get high-level summaries at the activity level, and dig deeper at the transaction level.
From the publisher perspective, it introduces some redundancy, and the need for publishers who have transaction-level data to calculate the activity level information, but otherwise it introduces little extra burden.
This would impact publishers who are currently providing only transaction-level sectors, locations etc., but it seems to me from the dashboard that there might not be very many such publishers right now.
(2) Adjust the restrictions to only apply to OECD DAC Codes, or just to be per-vocabulary
For example, updated rules could state something along the lines of:
Any particular sector vocabulary must only be reported at EITHER transaction level OR activity level. Any individual vocabulary should not be used at both levels.
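A check for this per-vocabulary version of the rule might look something like the sketch below (hypothetical function name, and assuming a missing @vocabulary means DAC, "1"):

```python
import xml.etree.ElementTree as ET


def vocabularies_at_both_levels(activity):
    """Return the sector vocabularies used at both activity and transaction level."""
    at_activity = {s.get("vocabulary", "1") for s in activity.findall("sector")}
    at_transaction = {
        s.get("vocabulary", "1") for s in activity.findall("transaction/sector")
    }
    return sorted(at_activity & at_transaction)
```

Under this rule the earlier example (DAC at the activity level, an agriculture vocabulary at the transaction level) returns an empty list and so passes, whereas it fails the current either/or rule.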
This doesn’t really deal with the recipient country issue though.
(3) Propose an alternative <tag> element for the kinds of coding we are looking for
This would help with the sector issue, but not country.
Tag could be defined differently from sector as:
- Any concept associated with this activity or transaction that indicates the nature of the funding or activity
I’m generally cautious about adding new fields, but if the semantics of the tagging we are aiming for are different from the semantics of sector, then maybe we should have a different field.
Notes:
[1]: I realise here I’m not entirely clear on the business logic for aggregations from the transaction level and what it equates to. Clarifications welcome.