Modify definition of secondary publisher (included 2.03)

Any secondary publisher that commits significant unfunded effort to surfacing a primary publisher’s open data as IATI datasets would need to consider the risk that the primary publisher could simply “adopt” the datasets (download, retag and upload) without acknowledgement or engagement, and then upload no new datasets or stop at the next budget or election. That would marginalise the secondary publisher, with the net result that its efforts are stopped.

This topic has been included for consideration in the formal 2.03 proposal

Notes from consultation calls w/c 3rd July

Discussion:
There were some questions around double counting; the difference between ‘secondary publisher’ and a ‘proxy’ was explained
There was agreement that more use cases would be good in addition to a clearer definition.
IATI tech team to provide more examples of secondary publishers and more use cases showing what information existing secondary publishers are publishing.

Outcomes:
The proposal was reviewed by those on the call and there was no objection from the group.

According to yesterday’s update, this modification is listed as having consensus.

This status is not clear to me from the above. If people asked for more examples of secondary publishers and use cases, does it not mean the proposal is not fully supported (or at least fully understood)?

I agree with @YohannaLoucheur. It seems no harm is done if we take some more time to find out what is the business case.

If someone is of the opinion that harm is done and explains why, we might have our business case.

I think the use case is clear.

Swedish SIDA publishes another institution’s activities but is NOT a secondary publisher - because it is mandated to publish the Ministry of Foreign Affairs activities as primary data. The current definition makes no distinction between this case, and publishers such as FTS or AidData.

Ok, from that perspective the definition change is fine with me.

For my understanding: does the definition of secondary publisher mean that when an activity is published by a secondary publisher, it has already been published by the primary publisher AND that a different IATI identifier is used by the secondary publisher than by the primary publisher?

Cannot find any guidance in standard about this topic, but maybe I am missing something.

No.

  • OCHA’s FTS publishes ALL its data irrespective of whether the primary source publishes to IATI.
  • The US Foundation Center has recently published “$4.3 billion worth of grants from nearly 1,900 funders to more than 3,000 organizations around the world”, but “to avoid duplication of data on the IATI Registry, we have removed funders already publishing to IATI from our IATI data”

Yes

NB that on Standards Day there was another proposal …

Add attribute reporting-org/@secondary-unique
“A flag indicating that this activity, reported by a secondary reporter, is not reported to IATI by a primary publisher and can therefore be expected to be unique.”

… which wasn’t taken forward.

@bill_anderson Maybe not for the 2.03 discussion, but would it be an idea to use exactly the same IATI identifier as the primary publisher, when you are republishing an activity as a secondary publisher?

Now there is no way to find out that the same activity is published twice with different IATI identifiers, leading to double counting. Doesn’t that violate a core IATI principle: publish once, use often? When using the same IATI identifier, you can at least identify that an activity is a duplicate.

Just to clarify, I assume you mean using the same Activity ID as the primary publisher?

Would support this 1,000%. In fact, I naively assumed this was the case… Not using the same Activity ID should be a cardinal sin of republishing.

Reusing the same IATI identifier, albeit for the same (republished) activity, breaks a standard ruleset rule:

It MUST be globally unique among all activities published through the IATI Registry

I’d suggest instead a new RelatedActivityType. But I’d also agree that this is out of scope for 2.03 discussions.

@andylolz Yes I see your point. The concern is about the duplication of activities with all the risks of inconsistencies and double counting. Your suggestion to add a new RelatedActivity type might help at data use time to identify such duplications.

@reidmporter started a discussion about this subject in the community zone.

@YohannaLoucheur Yes, that is what I meant. @andylolz though has a valid objection against using the same identifier when republishing.

In fact, the problem isn’t so much the need for a globally unique activity identifier - it would remain unique - but the requirement that the activity ID start with the reporting org ID even in the case of re-publishers. (Side note: we should distinguish secondary publishers and re-publishers. This issue arises with re-publishers, not secondary publishers.)

Let’s say we publish an activity CA-3-D123456, and this activity is also reported/pulled to FTS. If FTS publishes the exact same activity again, they could name it CA-3-D123456. Why should they rename it some random FTS number in IATI data if it’s the same activity and this activity was already published in IATI under CA-3-D123456?

This rule seems to be the source of the problem:
“This MUST be prefixed with EITHER the current IATI organisation identifier for the reporting organisation (reporting-org/@ref) OR a previous identifier reported in other-identifier, and suffixed with the organisation’s own activity identifier.”
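To make the quoted rule concrete, here is a minimal Python sketch of the prefix check. It is only an illustration, not the official ruleset implementation: the function name `has_valid_prefix`, the hyphen separator between prefix and suffix, and the re-publisher reference `XX-FTS` are all assumptions made for this example.

```python
def has_valid_prefix(iati_identifier, reporting_org_ref, other_identifiers=()):
    """Return True if the identifier starts with an accepted organisation prefix.

    Accepted prefixes are the current reporting-org/@ref plus any previous
    identifiers the publisher has reported in other-identifier.
    """
    accepted = [reporting_org_ref, *other_identifiers]
    # Assumption: the prefix and the organisation's own activity identifier
    # are joined by a hyphen, and the suffix must not be empty.
    return any(
        iati_identifier.startswith(prefix + "-")
        and len(iati_identifier) > len(prefix) + 1
        for prefix in accepted
    )


# The primary publisher reporting its own activity.
print(has_valid_prefix("CA-3-D123456", "CA-3"))
# A re-publisher (hypothetical ref "XX-FTS") reusing the same identifier.
print(has_valid_prefix("CA-3-D123456", "XX-FTS"))
```

Under these assumptions the first call passes and the second fails, which is exactly the situation described above: the identifier is still unique, but it no longer starts with the re-publisher's own organisation identifier.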

Should this rule be relaxed in the case of re-publishers?

@YohannaLoucheur: for my understanding, what is in your view the fundamental difference between a secondary publisher and a republisher? Don’t they both use existing IATI data, modify or add to these data, and publish the modified data as IATI again?

I would suggest that a secondary publisher is publishing data from organizations that don’t publish IATI data themselves - like Interaction or US Foundation (per examples provided by Reid and Bill). This creates low/no risk of double-counting.

Republishing, by contrast, involves taking data from IATI and publishing it again, like FTS. In some cases they may add content to it (as could happen, for instance, if someone adds detailed agriculture codes or geo locations), but for the most part it’s data already available in IATI format - hence high risks of double-counting.

Given these definitions, wouldn’t this suggest that:

1 - ‘Secondary publishers’ should use the organization prefix of the primary publisher in the activity IDs. There is no risk of publishing the same activity twice, since the primary publisher does not publish itself. The secondary publisher is nothing more than an administrative service provider.

2 - ‘Republishers’ (e.g. FTS) should NOT reuse the already published activity identifiers of the primary publishers, since that would cause confusion about who is the original data owner. It would also introduce great risks for double counting. Republishers should additionally ALWAYS mark an activity as ‘Republished’ and preferably refer back to the original activity with the related activity type, as suggested by @andylolz . This would enable data users to easily distinguish between original data and republished data.
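As a rough illustration of point 2, the sketch below shows how a data user might separate original from republished activities if republishers linked back with a related-activity element. This is only a sketch under stated assumptions: the type code `99` is hypothetical (no ‘Republished’ code exists in the RelatedActivityType codelist), and the identifiers `XX-FTS` and `XX-FTS-0001` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical code for a "republished from" relationship; it is NOT part of
# the current RelatedActivityType codelist.
REPUBLISHED_TYPE = "99"

# Made-up sample data: one original activity and one republished copy.
SAMPLE = """<iati-activities>
  <iati-activity>
    <iati-identifier>CA-3-D123456</iati-identifier>
    <reporting-org ref="CA-3"/>
  </iati-activity>
  <iati-activity>
    <iati-identifier>XX-FTS-0001</iati-identifier>
    <reporting-org ref="XX-FTS" secondary-reporter="1"/>
    <related-activity ref="CA-3-D123456" type="99"/>
  </iati-activity>
</iati-activities>"""


def split_original_and_republished(xml_text):
    """Return (originals, republished) where republished maps copy -> original."""
    root = ET.fromstring(xml_text)
    originals, republished = [], {}
    for activity in root.findall("iati-activity"):
        identifier = activity.findtext("iati-identifier").strip()
        link = activity.find(f"related-activity[@type='{REPUBLISHED_TYPE}']")
        if link is None:
            originals.append(identifier)
        else:
            republished[identifier] = link.get("ref")
    return originals, republished


originals, republished = split_original_and_republished(SAMPLE)
print(originals)
print(republished)
```

With a link like this, a user summing budgets could skip every activity that appears as a key in `republished`, avoiding double counting while both identifiers stay globally unique.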

I am not sure though if the proposed definition of a ‘secondary publisher’ according to @bill_anderson matches with your definitions above.

Maybe the use of the term ‘secondary publisher’ is too confusing. Wouldn’t the terms ‘original publication’ and ‘republication’ be a better way to describe the status of the data? It looks more important to know if you are dealing with the original data or the reprocessed data, than to know that you are processing the data of someone else.

@IATI-techteam: The name of the attribute in the standard is @secondary-reporter, not @secondary-publisher (ref). Could the proposal be amended to reflect that?

@YohannaLoucheur: If both publishers use the same IATI identifier then it is not globally unique. I expect this global uniqueness is core to various systems, so I think it would likely be problematic to relax that (i.e. by allowing/encouraging republishers to use the same identifier). As an example: If I were to look up the activity on d-portal, what would I expect http://d-portal.org/ctrack.html?#view=act&aid=CA-3-D123456 to show? Should it amalgamate all information from all publishers of iati-activitys with that IATI identifier? I can anticipate problems with that.
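To illustrate the ambiguity, here is a small Python sketch of what a consuming tool would have to detect if two publishers reported the same IATI identifier. The function name, the input shape (pairs of identifier and reporting-org ref) and the values `XX-FTS` and `CA-3-D654321` are assumptions for this example, not anything taken from d-portal or the Registry.

```python
from collections import defaultdict


def identifiers_with_multiple_reporters(activities):
    """Map each IATI identifier reported by more than one org to those orgs.

    `activities` is an iterable of (iati_identifier, reporting_org_ref) pairs.
    """
    reporters = defaultdict(set)
    for iati_identifier, reporting_org_ref in activities:
        reporters[iati_identifier].add(reporting_org_ref)
    return {
        identifier: sorted(orgs)
        for identifier, orgs in reporters.items()
        if len(orgs) > 1
    }


# Made-up sample: "XX-FTS" is a hypothetical re-publisher reference.
sample = [
    ("CA-3-D123456", "CA-3"),
    ("CA-3-D123456", "XX-FTS"),
    ("CA-3-D654321", "CA-3"),
]
print(identifiers_with_multiple_reporters(sample))
```

Any identifier returned here is one where a lookup by identifier alone can no longer return a single activity, so the tool would have to choose a publisher or merge the records.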

When a new reporting org starts publishing, a ‘secondary publisher’ (of the reporting org’s activities) becomes a ‘republisher’. That change is outside of the control of the secondary publisher. So I don’t think we should expect the secondary publisher/republisher to declare which one they are, because I suppose that information could easily become inaccurate.

This proposal has been included in the 2.03 upgrade. It can be viewed in the following two Discuss posts: