Using IATI data - uniquely identifying transactions

stevieflow · November 1, 2018, 9:01pm

Hi everyone

I wanted to share something @David_Megginson & @ximboden encountered via importing IATI data into the Financial Tracking System (FTS).

This might be an issue that some have hit before. It’d be great to hear from others who regularly ingest IATI data into their projects, to understand if this is a headache, or not, or whether you have a workaround. Thinking of @Herman @markbrough @tdavis @siemvaessen @matmaxgeds and others…

Scenario

With FTS, we want to import IATI activities, to construct Flows. An (obvious) vital cog in this is the IATI transaction.

We also know that IATI publishers will be adding transactions to their activities over time - which is expected behavior.

So -

On first import an activity has two transactions.
When the activity is updated, a month later, two more transactions have been added by the publisher. The activity now has four transactions.
When we import the updated activity, we know it’s some data we’ve encountered before, because the activity-identifier is present. That’s helpful!
However, the transactions have no identifier. Our import now gives us six transactions (the first two, plus all four from the updated activity)!

I can picture people frowning at this! Of course, our system and data import routines could have some extra process to generate and store a hash of the data, for example - or even check the transactions dates and values etc etc. We understand that, but the point is: do we think it efficient to shift this overhead from the publisher to data users, each of whom would have to work out the problem and compute a workaround?
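As a rough illustration of the hashing workaround mentioned above, a data user might derive a fingerprint from the fields of each transaction and use it as a stand-in identifier. This is only a sketch: the dictionary field names (`type`, `date`, `currency`, `value`), the function name and the choice of SHA-256 are assumptions for illustration, not a description of how FTS works.

```python
import hashlib

def transaction_fingerprint(activity_id, tx):
    """Derive a stand-in identifier from transaction fields (hypothetical field names)."""
    parts = [activity_id, tx["type"], tx["date"], tx["currency"], tx["value"]]
    # Join with a separator that will not appear in the data itself.
    joined = "\x1f".join(str(p) for p in parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

first_import = [
    {"type": "3", "date": "2018-09-01", "currency": "USD", "value": "1000"},
    {"type": "3", "date": "2018-09-15", "currency": "USD", "value": "2500"},
]
# A month later the publisher republishes the activity with two more transactions.
second_import = first_import + [
    {"type": "3", "date": "2018-10-01", "currency": "USD", "value": "400"},
    {"type": "3", "date": "2018-10-20", "currency": "USD", "value": "750"},
]

store = {}
for batch in (first_import, second_import):
    for tx in batch:
        key = transaction_fingerprint("EX-AMPLE-ORG-ACTIVITY100", tx)
        store.setdefault(key, tx)  # already-seen fingerprints are not added again

print(len(store))  # 4 rather than 6
```

The fingerprint changes whenever the publisher corrects any of the hashed fields, and two genuinely separate transactions with identical fields collapse into one, which is part of the overhead being described.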

A solution

The IATI transaction element does actually include a @ref attribute, which could be used to declare an identifier for a transaction. For example:

<transaction ref="1">
<transaction ref="2">
<transaction ref="3">
<transaction ref="4">

Or, more verbosely:

<transaction ref="EX-AMPLE-ORG-ACTIVITY100-transaction1">
<transaction ref="EX-AMPLE-ORG-ACTIVITY100-transaction2">
<transaction ref="EX-AMPLE-ORG-ACTIVITY100-transaction3">
<transaction ref="EX-AMPLE-ORG-ACTIVITY100-transaction4">

Or even:

<transaction ref="oranges">
<transaction ref="apples">
<transaction ref="pears">
<transaction ref="cashews">

(OK, that last example isn’t advisable, but @reidmporter might like it)

For our import to FTS, we know this solution actually works! When publishers include a @ref, it acts as an identifier we can check, to understand if this is a transaction we’ve seen before.
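A minimal sketch of that check, assuming every transaction carries a @ref that is unique within its activity and stable over time: transactions are stored under the pair (activity identifier, @ref), so a re-import updates known transactions instead of duplicating them. The `import_activity` and `sample_activity` functions and the in-memory `store` are hypothetical, and the sample XML is a cut-down fragment rather than a complete IATI file.

```python
import xml.etree.ElementTree as ET

def sample_activity(transactions):
    """Build a tiny activity document for the demo (not a complete IATI file)."""
    body = "".join(
        f'<transaction ref="{ref}"><transaction-type code="3"/>'
        f'<transaction-date iso-date="{date}"/>'
        f'<value currency="USD" value-date="{date}">{amount}</value></transaction>'
        for ref, date, amount in transactions
    )
    return (
        "<iati-activity><iati-identifier>EX-AMPLE-ORG-ACTIVITY100</iati-identifier>"
        f"{body}</iati-activity>"
    )

def import_activity(xml_text, store):
    """Upsert transactions keyed by (activity identifier, @ref); return the new keys."""
    activity = ET.fromstring(xml_text)
    activity_id = activity.findtext("iati-identifier").strip()
    added = []
    for tx in activity.findall("transaction"):
        ref = tx.get("ref")
        if ref is None:
            raise ValueError("transaction has no @ref; a fallback rule is needed")
        key = (activity_id, ref)
        if key not in store:
            added.append(key)
        store[key] = {
            "type": tx.find("transaction-type").get("code"),
            "date": tx.find("transaction-date").get("iso-date"),
            "value": tx.findtext("value").strip(),
        }
    return added

store = {}
first = [("1", "2018-09-01", 1000), ("2", "2018-09-15", 2500)]
update = first + [("3", "2018-10-01", 400), ("4", "2018-10-20", 750)]

import_activity(sample_activity(first), store)
newly_added = import_activity(sample_activity(update), store)
print(len(store), [ref for _, ref in newly_added])  # 4 ['3', '4']
```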

We are aware of scenarios such as transactions being edited, or data being added to a transaction (a receiver-org activity-identifier, for example), but would appreciate we try and stick to the main point here: should we strive to include a @ref that can act as a transaction identifier within any activity, to help data users?

It’d be great to hear from others.

David_Megginson · November 1, 2018, 9:23pm

Thanks, @stevieflow — that was a good overview of the problem.

We’ve always assumed that activities would be first-class objects in an IATI user’s database — we attach identifiers not only for cross-references, but so that the user knows if they’re dealing with the same activity they saw last time (possibly with some updates), or an entirely-new activity.

Unfortunately, without transaction/@ref, we can’t keep track of transactions the same way. So if a data user does extra work, like linking a transaction to one of their own business objects (eg a budget line in a partner country’s accounting system), they have to redo that work every time they read an updated IATI activity file.

matmaxgeds · November 4, 2018, 8:55pm

Thanks from me too for setting this all out @stevieflow

Because transactions have no unique reference, all the imports I have worked on present every available transaction to the user importing the activity and let them select the specific transactions to import. As you can tell, this means that using IATI data is semi-automatic at best, and only one activity at a time - certainly not the sales pitch that is circulated at the high-level meetings about machines talking to machines etc! I know the system @markbrough and I worked on in Bangladesh allowed for a degree of dynamic import, e.g. once a user has semi-automatically imported an activity, the system can interpret those choices going forwards to keep it up to date - but Mark is much better placed to explain how.

I think transaction refs would be great (what can we do to make this happen?) - and the publisher systems I have seen either have them anyway, or if not, could easily make them by concatenating timestamps with some other transaction data.

My only other thought was that these references only have to be unique within the activity (could they even be just a counter that increases by 1 for each transaction?) and in combination with the activity code, they would then be globally unique - and that would be very cool to be able to instantly point to a single transaction.

Matt

tdavis · November 5, 2018, 9:07pm

Hi Steven, indeed we have encountered this issue. To avoid duplicates when importing IATI data to AMP, we check if the transaction already exists in the AMP activity by checking if the transaction type, adjustment type, transaction date, currency and transaction amount in the source activity match any of the transactions in the AMP activity. Having a unique identifier for the transaction would make that a simpler process.

andylolz · November 7, 2018, 7:49am

As mentioned above, transaction refs do already exist in the standard. From the definition, it’s apparent that they weren’t originally intended for this purpose, though. They’re missing some guarantees that would make them useful for this purpose.

For this to work, transaction refs would need to at least be guaranteed immutable (i.e. can’t change over time), and (as @matmaxgeds mentions) unique within an activity. This would mean changing the definition.

bill_anderson · November 6, 2018, 4:20pm

I know FTS is different from an AIMS, but I would still opt for this advice. Replace all transactions on every import.

Handling Activity Updates

As an activity progresses your development partners will continue to keep their project and financial management records up to date and modified versions of the activity will be published. The way IATI works is for the whole record, not just those parts that have been changed, to be republished each time. In a typical IATI reporting scenario most of the details of the activity are in place when first reported, and updates contain new financial transactions, activity budgets (forward-looking predictions of spend) and, where reported, information on results.

If your import system is configured to alert you to any and every difference between your partner’s version of an activity and your own record, all the changes you made when the activity was new (as suggested in the previous section), will be flagged up as discrepancies to be resolved every time the activity is updated. This will bog your staff down in hours of unnecessary time wasting. For that reason it is suggested that:

Financial transactions, budgets and results are automatically accepted during updates.

Other fields that you decide (in your configuration) you will never modify may be added to this list.

You NEVER make any changes to these fields. (See the next section on fixing data in these fields)

All other fields are NOT updated. If your system is clever enough to alert you to when your partner has made a change (from their original) to one of these fields – this is a bonus.
https://sites.google.com/site/useofiatidataincountrysystems/the-guide/auto-configure
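
The update policy quoted above could be sketched roughly as follows. The record layout (plain dictionaries), the field names and the `apply_update` function are assumptions made for illustration; the guide itself does not prescribe an implementation. The sketch keeps the partner's previously imported version so that it can report when the partner has changed a field from their original.

```python
# Fields that are accepted from the partner on every update (per the guidance above).
AUTO_ACCEPT = {"transactions", "budgets", "results"}

def apply_update(local, previous, incoming, auto_accept=AUTO_ACCEPT):
    """Merge a republished activity into the local record.

    local    -- the record held locally, possibly edited by local staff
    previous -- the partner's version as it was last imported
    incoming -- the partner's newly published version
    Returns (merged record, fields the partner changed that were not applied).
    """
    merged = dict(local)
    changed_by_partner = []
    for field, new_value in incoming.items():
        if field in auto_accept:
            merged[field] = new_value          # replaced wholesale, never edited locally
        elif previous.get(field) != new_value:
            changed_by_partner.append(field)   # local value kept, change only reported
    return merged, sorted(changed_by_partner)

local = {"title": "Health programme (Phase 1)", "description": "Clinics", "transactions": [100]}
previous = {"title": "Health prog", "description": "Clinics", "transactions": [100]}
incoming = {"title": "Health prog", "description": "Clinics and training", "transactions": [100, 250]}

merged, changed = apply_update(local, previous, incoming)
print(merged["title"], merged["transactions"], changed)
```

Here the locally edited title is left alone and is not flagged, the transactions are replaced in full, and the partner's change to the description is reported without being applied.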

markbrough · November 6, 2018, 5:33pm

Hi @stevieflow thanks for raising this issue. Here is the methodology we used in Bangladesh for handling updates. We use a similar process to @tdavis’ approach in AMP. We first see whether the transaction already exists in the AIMS, considering a transaction as unique based on the transaction date, the transaction type and the transaction value (see the code here). We stop import if there is data in the AIMS that is not in IATI, on the basis that a user may have entered data into the AIMS that they wouldn’t then want overwritten by IATI (because the effect would be to either double-count funds or to remove existing information).

This seems to have worked in practice (because it is unlikely to happen that a transaction with identical values for (date, type, value) is retrospectively added on the same date as an existing transaction). Having said that, I think I agree with @bill_anderson that wiping all the transactions and then inserting new ones is the safest way of proceeding. I guess it would just make things slower, though possibly not that much slower, I’m not sure.
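
For readers who want a feel for the matching rule described above, here is a rough sketch, not the Bangladesh AIMS code itself. It treats (date, type, value) as the key, stops when the AIMS holds a transaction that IATI does not, and otherwise lists the IATI transactions still to be added. The function names and dictionary fields are hypothetical.

```python
def tx_key(tx):
    """Composite key: a transaction is treated as unique by date, type and value."""
    return (tx["date"], tx["type"], tx["value"])

def plan_import(aims_transactions, iati_transactions):
    """Decide what an update would do, without changing any data."""
    aims_keys = {tx_key(t) for t in aims_transactions}
    iati_keys = {tx_key(t) for t in iati_transactions}
    only_in_aims = aims_keys - iati_keys
    if only_in_aims:
        # Someone may have entered data by hand; do not risk overwriting or double-counting.
        return {"status": "stopped", "only_in_aims": sorted(only_in_aims), "to_add": []}
    to_add = [t for t in iati_transactions if tx_key(t) not in aims_keys]
    return {"status": "ok", "only_in_aims": [], "to_add": to_add}

t1 = {"date": "2018-09-01", "type": "3", "value": 1000}
t2 = {"date": "2018-09-15", "type": "3", "value": 2500}
t3 = {"date": "2018-10-01", "type": "3", "value": 400}
manual = {"date": "2018-09-20", "type": "3", "value": 90}

clean = plan_import([t1, t2], [t1, t2, t3])
blocked = plan_import([t1, t2, manual], [t1, t2, t3])
print(clean["status"], len(clean["to_add"]), blocked["status"])  # ok 1 stopped
```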

I agree it would be nice to have unique references for transactions and perhaps we can move that way over time, but in the interim we will need techniques like this to continue to handle this kind of issue.

danmihaila · November 7, 2018, 2:02pm

My 2 cents about this.
When I was in DRC with @bill_anderson for the first IATI Pilot, we wiped all existing transactions (for 3 donors) and imported the transactions coming from IATI feeds. It was a matter of trust, and a decision by the local AIMS, which judged that the IATI transactions were more accurate than what they had.
This can also happen on an update: old transactions will be dropped and the system will get the latest published transactions.

It would indeed be great if transactions could carry an ID (as activities and organizations do). I assume it is not an easy task, but it could be a composite ID: date, type, value, ref, etc.
As a recommendation: at the AIMS level there could be rules that determine whether transactions are updated, dropped or left untouched when IATI feeds or other inputs bring in changes.

samueldjohnson · November 7, 2018, 4:07pm

I’m very new to IATI (just trying to get my head around it all), but I’ve worked with a wide range of information systems, and although your suggestion to replace all transactions makes sense, the problem with doing this without an identifier is that you’re very limited in the processing and analysis that you can then do with this data. If, for example, you add further coding or link this data to other datasets, then without identifiers you’ll have to re-do this coding/linking for the full history of your data every time it’s re-imported - whereas if you use identifiers, then even if you refresh the full dataset, your existing codes and mappings will still work.
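
A small sketch of that last point, under the assumption that transactions carry a stable @ref: local coding is stored separately, keyed by (activity identifier, ref), so the transactions can be wiped and reloaded on every import while the coding still attaches. The `budget_line` field, the `refresh` function and the sample values are hypothetical.

```python
def refresh(activity_id, published, annotations):
    """Replace all transactions, then re-attach local coding by (activity_id, ref)."""
    fresh = {}
    for tx in published:
        key = (activity_id, tx["ref"])
        # annotations.get returns None for transactions nobody has coded yet
        fresh[key] = dict(tx, budget_line=annotations.get(key))
    return fresh

ACTIVITY = "EX-AMPLE-ORG-ACTIVITY100"

# Coding added by a local user after the first import.
annotations = {(ACTIVITY, "1"): "BL-042"}

# The publisher later republishes the whole activity with more transactions.
republished = [
    {"ref": "1", "date": "2018-09-01", "value": 1000},
    {"ref": "2", "date": "2018-09-15", "value": 2500},
    {"ref": "3", "date": "2018-10-01", "value": 400},
]

transactions = refresh(ACTIVITY, republished, annotations)
print(transactions[(ACTIVITY, "1")]["budget_line"])  # BL-042
print(transactions[(ACTIVITY, "3")]["budget_line"])  # None
```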

Herman · November 9, 2018, 8:39am

When the trx id is a composition of existing trx elements, there is no need anymore for a separate technical id (it can be derived automatically if needed, so there is no need for complicating the standard). So the question seems to be what are the elements which make a real world trx unique and immutable?

matmaxgeds · November 9, 2018, 9:04am

I understand the logic, but on that basis, shouldn’t we also campaign for removing the activity IDs on the basis that they complicate the standard, as presumably they could also be replaced by an ID made from a composition of their elements?

I suspect that either making a composite ID that is valid in all the edge cases is too complex (is it not valid/possible to have two transactions on the same day for the same amount?), or that there is also something inherently useful about having an independent ID e.g. as a simple reference - assuming that we are not suggesting to hash the unique information for each transaction to get the composite ID?
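
The edge case raised above can be shown with a few lines. In this sketch (field names and values are made up), two transactions on the same day for the same amount produce the same composite key, so a set-based comparison sees two transactions where three were published.

```python
from collections import Counter

def composite_key(tx):
    """A composite identifier built from date, type and value (illustrative choice)."""
    return (tx["date"], tx["type"], tx["value"])

published = [
    {"date": "2018-10-01", "type": "3", "value": 500},
    {"date": "2018-10-01", "type": "3", "value": 500},  # same day, same amount
    {"date": "2018-10-20", "type": "3", "value": 750},
]

distinct_keys = {composite_key(tx) for tx in published}
key_counts = Counter(composite_key(tx) for tx in published)
collisions = {key: n for key, n in key_counts.items() if n > 1}

print(len(published), len(distinct_keys))  # 3 2
print(collisions)                          # {('2018-10-01', '3', 500): 2}
```

Nothing in the data says whether the repeated key is a second genuine payment or an accidental duplicate, which is the ambiguity an independent identifier would remove.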

bill_anderson · November 9, 2018, 11:07am

The intention of the (optional) transaction/@ref was to allow publishers to make a link back to their own systems.

To change the standard to make this mandatory would involve:

A major upgrade
Backfilling all existing activities

While I understand the problem, I don’t think transaction/@ref is the solution.

Firstly, my reading of the community is that there is currently no appetite for a major upgrade.
Secondly, it is possible that members could object to such a proposal.
Thirdly, if actioned, it would take a further couple of years to be implemented.

Herman · November 9, 2018, 11:32am

@matmaxgeds The comparison between the activity-id and the transaction-id is an interesting point you raised. The key difference i.m.o. between those two identifiers is that the activity-id should be globally unique and the transaction-id is publisher-unique. That is the reason why i.m.o. the transaction does not ‘deserve’ its own identifier. The use case for the transaction-id seems to be purely technical, and not functional.

Another question I have is if it is really necessary to have 100% correctness by handling all edge-cases, since IATI is not an accounting system?

I doubt that there are many real life transactions with the same amount on the same date to the same receiver with the same currency. When I find such transactions in an IATI publication, I treat them as duplicates since it happens that activities are being published in multiple files from the same publisher.

matmaxgeds · November 9, 2018, 12:25pm

@Herman - a) I am not sure I understand your technical vs functional differentiation - the case for a transaction ID is a technical thing that would help systems that use IATI data to function better (plus other stuff like direct links to specific transactions). b) It is not necessary to be 100% accurate but this change would make it easier to be - why wouldn’t we want that? Is the ‘accounting system’ reference your way of saying that you feel that the extra effort to use unique transaction references is not worth the benefit?

@bill_anderson - a) seven respondents (out of 9) said that having transaction IDs would be helpful/the best option, as they currently have to use a workaround that has some flaws identified in the thread. b) Would members object…it would be great to know - how do we ask them? c) If actioned, it would take several years to be implemented…sounds like we had better get started asap then! d) I am not sure why anything would have to be made mandatory - a first step could be changing the guidance to say that if you do use transaction references, please keep them unique within an activity?
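
If the guidance were changed along the lines of (d), a data user or publisher could check a file against it with something like the sketch below. It reports, per activity, how many transactions lack a @ref and which refs are repeated. The `check_refs` function is hypothetical, and the sample XML is stripped down to the attributes being checked, so it is not a valid IATI file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def check_refs(xml_text):
    """Report missing and repeated transaction refs for each activity in a file."""
    report = {}
    root = ET.fromstring(xml_text)
    for activity in root.iter("iati-activity"):
        activity_id = (activity.findtext("iati-identifier") or "").strip()
        refs = [tx.get("ref") for tx in activity.findall("transaction")]
        counts = Counter(r for r in refs if r is not None)
        report[activity_id] = {
            "missing": sum(1 for r in refs if r is None),
            "duplicated": sorted(r for r, n in counts.items() if n > 1),
        }
    return report

SAMPLE = """
<iati-activities>
  <iati-activity>
    <iati-identifier>EX-AMPLE-ORG-ACTIVITY100</iati-identifier>
    <transaction ref="1"/>
    <transaction ref="1"/>
    <transaction/>
  </iati-activity>
</iati-activities>
"""

result = check_refs(SAMPLE)
print(result)
```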

bill_anderson · November 9, 2018, 1:51pm

Surely, from a user point of view you need to adopt a lowest common denominator scenario. If some publishers don’t have refs you need logic to deal with them, so you might as well apply that logic to everybody. Personally I have no objection to changing the guidance, but, being pragmatic, I’m not sure it would help you that much.

For all our good intentions this forum is a bit of an echo chamber for quite a small group of committed enthusiasts

Here, but see ^^

Herman · November 9, 2018, 3:44pm

I.m.o. the IATI standard should only contain non-derivable atomic business objects for which there is a common understanding (e.g. ‘currency’, ‘disbursement’, ‘sector’, etc.).

What I mean is that a unique identifier can be derived by the data ingesting system (e.g. FTS) by combining the existing transaction elements (as mentioned by @danmihaila). As you already mentioned, there might be some edge cases, but as long as those have a low frequency, it might be sufficient to produce sensible information. It is i.m.o. not necessary that IATI has the same accuracy as an accounting system (where those edge-cases would not be acceptable).

matmaxgeds · November 11, 2018, 9:24pm

I think this was the point of @stevieflow’s original post

I agree with @stevieflow that it would be more efficient if this was shifted to data publishers, rather than data users (especially as I suspect that most publishers have this data already), with the benefits outlined by @David_Megginson, and side benefits of a) no more edge cases, and b) the ability to directly link to transactions.

@bill_anderson - good point, without making it mandatory, we will need both the @ref and logic - but it seems like a step in the right direction (with no impact for those publishers who want to opt out) to me. For users who want to use data from secondary publishers, the @ref will become even more useful as an alternative to adding the publisher to the lengthening list of factors needed to make a transaction unique.

@Herman - it sounds like you have had some bad experiences with accounting systems! However, I suspect that those using IATI for e.g. results data might prefer accuracy, as might those importing IATI data into systems like the FTS/AIMS, where things often need to add up a little bit more precisely.

bill_anderson · November 12, 2018, 6:13am

Don’t want to sidetrack this conversation, but results data and accuracy are a bit of an oxymoron.

I think @Herman makes an important point. Neither IATI nor AIMS are accounting systems. IATI data may be derived from a selective report run off a publisher’s accounting system. Most AIMS are a couple of steps removed from treasury systems. With a bit of luck they both contain good management information, but that is not the same thing.

David_Megginson · November 27, 2018, 5:59pm

Bill – your suggestion to replace all transactions on every import is suitable for a system that passively tracks IATI reporting. For any system that involves active curation (e.g. transactions manually checked for duplicates, linked to non-IATI information like ministry accounting systems, etc) that requires an enormous amount of redundant manual effort on every import, which means that (at best) such systems will update from IATI less often because of the high cost.

I agree with leaving them optional, but I think it would be good (a) to update the guidance in the next major number release to require persistence and activity-scope uniqueness, and (b) to require them for specific initiatives.

In fact, there’s nothing inconsistent with the standard in formulating stronger requirements for specific applications (e.g. if you want your IATI reporting to be counted for humanitarian response plan or Grand Bargain counting, you must follow these additional constraints). But it sounds like making those constraints general (when transaction refs are present) in the next major release would be helpful to a wide range of users, not just FTS.