Redefine selected codelists as “Non-embedded” (included 2.03)

This proposal is part of the 2.03 upgrade process. Please comment by replying below.

**Standard** Activity and Organisation

**Schema Object** None

**Type of Change** Redefine embedded codelist as non-embedded

**Issue** The IATI standard contains a mixture of “embedded” and “non-embedded” codelists. Embedded codelists can only be modified through the formal upgrade process. Non-embedded codelists can be modified through a [light-touch consultation process](http://iatistandard.org/202/codelists/codelist-management/). In order to increase the flexibility and responsiveness of the standard only those codelists that impact on the functionality of data processing should remain embedded and all others should be redefined as “Non-embedded”.

**Proposal** Redefine the following Embedded codelists as Non-embedded.

  • ActivityScope; BudgetIdentifier; BudgetIdentifierSector-category; BudgetIdentifierSector; BudgetIdentifierVocabulary; CRSAddOtherFlags; ConditionType; ContactType; DescriptionType; DisbursementChannel; DocumentCategory-category; DocumentCategory; GazetteerAgency; GeographicExactness; GeographicLocationClass; GeographicLocationReach; GeographicVocabulary; GeographicalPrecision; IndicatorMeasure; LoanRepaymentPeriod; LoanRepaymentType; OtherIdentifierType; PolicyMarker; PolicyMarkerVocabulary; PublisherType; RegionVocabulary; ResultType; SectorVocabulary; TiedStatus; VerificationStatus
**Standards Day** Accepted in principle but check details of which should be moved

**Links** http://bit.ly/2m1jy70 Previous discussions - https://discuss.codeforiati.org/t/vocab-codelists-make-non-embedded/495

Migrating a comment from the previous discussion

By changing Codelists from Embedded to Non-Embedded, what are they deemed to be at an earlier version of the standard?

For example:

  1. You have a Codelist. It is currently Embedded.
  2. By this change, it becomes Non-Embedded.
  3. At some point, it is decided to withdraw certain values on the Non-Embedded version of the Codelist.

Are these values deemed withdrawn against versions of the Standard at which the Codelist was Embedded?

  • If yes, is this permitted? There is nothing under Codelist management or the withdrawal discussion to indicate whether it is permitted to withdraw a value from an Embedded Codelist outside an integer upgrade (it’s backwards incompatible since Embedded Codelists are a fixed part of the Standard).
  • If no, the versions of the Standard where the Codelists are each Embedded and Non-Embedded are backwards incompatible. As such, changing Codelists from Embedded to Non-Embedded would have to be an integer change.

Also, there are similar questions about adding new values and whether they are deemed part of the Codelist for earlier versions of the Standard (but without the backwards-incompatibility problems).

Hi Hayden
In my opinion, code lists should never deprecate old values, since these values could legitimately have been used in past activities that are still published. A solution could be to flag old values. Values in non-embedded code lists that are no longer valid could be flagged by two fields, ‘valid from’ and ‘valid until’, where ‘valid until’ is empty when the value is current.

In this way downward compatibility is guaranteed.
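
A minimal sketch of how such a check might look, assuming each code carries the two date fields suggested above. The class, field names, and sample codes are hypothetical and are not taken from the IATI codelist schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Code:
    """One codelist entry; field names are illustrative, not IATI schema names."""
    value: str
    valid_from: date
    valid_until: Optional[date] = None  # None means the code is still current


def is_valid_on(code: Code, day: date) -> bool:
    """Return True if `code` could legitimately be used on `day`."""
    if day < code.valid_from:
        return False
    return code.valid_until is None or day <= code.valid_until


# Made-up entries, for illustration only.
current = Code("1", valid_from=date(2011, 1, 1))
retired = Code("2", valid_from=date(2011, 1, 1), valid_until=date(2015, 12, 31))
```

With this shape, an activity dated inside a retired code's validity window still passes, while new data using that code after ‘valid until’ can be flagged.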

At 2.02 a method was added to deprecate (withdraw) codes through use of the status, activation-date and withdrawal-date attributes on Codelists. The proposal wasn’t, however, fleshed out to cover usage. For example, the following points were not answered:

  • What are valid values for the status attribute?
  • Given the stated values of active and withdrawn for the status attribute, what do they mean?
  • How are narrative modifications dealt with? Withdraw the old value and add a new active one? Overwrite the existing narrative? Something else?
  • How serious is a withdrawal in terms of backwards-compatibility and use in data? Rule->Guideline? Rule->Rule? Guideline->Guideline?

As such, the points raised in my previous post still stand.

This topic has been included for consideration in the formal 2.03 proposal

I think the following codelists may impact on the functionality of data processing:

These codelists don’t seem to be used anywhere:

Can we consider adjusting the language here? “Embedded” v “non-embedded” is not intuitive language, and several different sorts of codelists are mixed up together in the existing categorisation. How about something like:

  • flexible => can be adjusted between upgrades
  • core => can only be adjusted in decimal upgrades
  • third-party => can be adjusted between upgrades, should (generally?) remain faithful to external codelists.

Good idea - I’d add a +1 for this. IATI essentially maintains two types of codelists which currently fall under the ‘non-embedded codelist’ banner (‘flexible’ and ‘third-party’ using the terminology that you suggested).

In terms of language, could I suggest we use the term ‘replicated’ instead of ‘third-party’? This might better convey that we seek to simply copy the latest version of these codelists as soon as we spot they are released, regardless of their content (e.g. duplicate codes, backwards-incompatible changes, etc). I would also suggest that changes to these codelists should not require consultation on Discuss, with changes noted on a changelog that is presented alongside the codelist on the IATI Standard documentation pages.


:thumbsup: agree – ‘replicated’ better conveys the desired meaning.

Eek… This suggests losing status="withdrawn" code information… Is that right? I think it’s probably preferable to maintain that information for replicated lists :sweat:

This suggests losing status="withdrawn" code information… Is that right?

Our (mine / Dale’s) current thinking is that the status attribute should be better defined to accommodate the various types of state that a Code may be in. This would lead to something like the following list of statuses:

  • active
  • modified
  • withdrawn
  • removed
  • external

With these statuses that a Code may have, a Code may be withdrawn (marked with status="withdrawn" (or some language-independent equivalent)) from a Replicated Codelist, but never removed (physically removed from the Codelist).

It would also provide a method of maintaining previous definitions of Codes so that Replicated Codelists clearly indicate when a third party has changed the definition of a Code. We’ve not fully looked at how implementation of this part may work.
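
The "withdrawn but never removed" rule described above could be sketched roughly as follows. This is an illustration only: the function name and the merge behaviour are assumptions, and the meaning of most statuses is still undefined, so only `active` and `withdrawn` are actually used.

```python
from enum import Enum
from typing import Dict, Set


class Status(Enum):
    # The five statuses floated in the list above; their semantics are not yet defined.
    ACTIVE = "active"
    MODIFIED = "modified"
    WITHDRAWN = "withdrawn"
    REMOVED = "removed"
    EXTERNAL = "external"


def sync_replicated(ours: Dict[str, Status], upstream: Set[str]) -> Dict[str, Status]:
    """Merge the latest third-party codes into our copy of a Replicated Codelist.

    Codes that have vanished upstream are kept and marked withdrawn, never deleted.
    Codes still present upstream keep whatever status they already had.
    """
    merged = {}
    for code, status in ours.items():
        merged[code] = status if code in upstream else Status.WITHDRAWN
    for code in upstream:
        # Codes we have not seen before are added as active.
        merged.setdefault(code, Status.ACTIVE)
    return merged
```

The result always contains every code that was in our copy before the sync, which is the property that lets older published data still be looked up.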

Okay, cool. But it should be clear that ‘replicated’ doesn’t mean it’s a direct copy, because the list also includes withdrawn codes that are no longer part of the original.

Names & descriptions of third party codes are changed quite frequently. I worry you’ll make a rod for your own backs if you attempt to track these changes.

I’m not sure I can guess what the external code means here.

I really like the premise of streamlining the third party codelist management process. But I’m concerned that further extending the status options could begin to make that process more involved.

This discussion digressed quite far from the proposal! I wonder if renaming codelist types (i.e. embedded; non-embedded) should become a new thread. Is it actually a doable thing in the near future?

Btw I suspect that the meaning of embedded & non-embedded changed over time, but the names were never updated. The description of the non-embedded codelist repo is: “IATI codelists that are derived from third party lists.” which is not entirely correct. Pretty sure I’ve seen other muddled definitions elsewhere.

Was thinking about this and policies for externally-managed codelists. I understand there is an argument that including status="withdrawn" information is useful so that withdrawn codes that are referenced in publishers’ datasets can be identified later.

However is there an argument that IATI should not store modified or withdrawn codes for these lists precisely because external codelists are meant to be externally managed by someone else? Should IATI be getting into the business of archiving codes for codelists that we have no jurisdiction over? Just a thought anyway…

This is a good question! Perhaps the official @IATI-techteam account can comment about this given the rules for decimal upgrades?


Linked to the issue of redefining codelists, I’d also suggest adding something to the codelist schema, so that the category a codelist is in can be defined within the source. I’d suggest a new attribute under the top-level codelist element, perhaps source-category, with allowed values ‘core’, ‘flexible’, and ‘replicated’.
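
As a rough sketch of how a consumer might read such an attribute: `source-category` is only the name proposed here and is not part of the current codelist schema, and the XML below is a simplified, made-up codelist source.

```python
import xml.etree.ElementTree as ET

ALLOWED_SOURCE_CATEGORIES = {"core", "flexible", "replicated"}

# Hypothetical codelist source; "source-category" is the proposed attribute
# and the codelist name and content are invented for illustration.
EXAMPLE_XML = """
<codelist name="ExampleList" source-category="flexible">
  <codelist-items>
    <codelist-item><code>1</code></codelist-item>
  </codelist-items>
</codelist>
"""


def source_category(xml_text: str) -> str:
    """Return the proposed source-category of a codelist, rejecting unknown values."""
    root = ET.fromstring(xml_text.strip())
    category = root.get("source-category")
    if category not in ALLOWED_SOURCE_CATEGORIES:
        raise ValueError(f"unexpected source-category: {category!r}")
    return category
```

Keeping the category in the source would let tools decide how to treat a codelist without relying on which repository it lives in.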

The existing category-codelist attribute seems to cover something different, but I’m unsure how this is meant to be used.

Indeed, related to the codelist schema, many of the elements and attributes could also do with an xs:documentation element to add a definition of their purpose. However, this may be classed as a bug fix - will leave @bill_anderson to comment.

This is not the intent as far as I understand. This is about an internal IATI registry rule which will stop validating sector vocabulary codes as part of the standard check. Will not affect the IATI standard itself.

-MA

Yes, category-codelist is the codelist for the category, not the category for the codelist.
As an example, the AidType code A01 has category A. category-codelist="AidType-category" tells us that we can go look this up on the AidType-category codelist, to find out that the English name for that category is: “Budget support”, and a description explaining how it is used.
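
The lookup described above can be sketched as below. The dictionaries are tiny in-memory stand-ins rather than the real codelist files, and the structure is an assumption; only the A01 / A / “Budget support” values come from the example.

```python
# Minimal stand-ins for two codelists. The layout is hypothetical;
# real codelists are published as XML/JSON/CSV files.
CODELISTS = {
    "AidType": {
        "category-codelist": "AidType-category",
        "codes": {"A01": {"category": "A"}},
    },
    "AidType-category": {
        "codes": {"A": {"name": "Budget support"}},
    },
}


def category_name(codelist: str, code: str) -> str:
    """Follow category-codelist to find the name of the category that `code` sits in."""
    source = CODELISTS[codelist]
    category_code = source["codes"][code]["category"]
    category_list = CODELISTS[source["category-codelist"]]
    return category_list["codes"][category_code]["name"]


print(category_name("AidType", "A01"))  # prints "Budget support"
```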

@bjwebb Could you please elaborate on the reasons for keeping ConditionType as embedded? We agree that there is a case for keeping IndicatorMeasure as embedded, but cannot see how ConditionType impacts on the functionality of data processing.

I don’t have any specific reasons for ConditionType, only the reasoning that a code that can’t be “other” and is required might be handled in a way that assumes it’s one of those three codes.

This proposal has been included in the 2.03 upgrade. It can be viewed in the following two Discuss posts: