Language - recommend use of ISO 639-1 (included 2.03)

This proposal is part of the 2.03 upgrade process. Please comment by replying below.

Standard
Activity and Organisation

Schema Object
All uses of @xml:lang attribute

Type of Change
Change Definition

Issue
The xml:lang attribute references the generic XML spec, which allows for a range of languages, locales, regions and scripts. Our schema uses it with a narrower definition that specifies sole use of ISO 639-1. This restriction is not enforceable. Moreover, some users need to specify languages that aren’t on the ISO 639-1 list but can be expressed with BCP 47.

Proposal
Change all definitions of xml:lang

  • From: ISO 639-1 code specifying the language of text in this element.
  • To: A code specifying the language of text in this element. It is recommended that wherever possible only codes from ISO 639-1 are used.

Standards Day
Accepted

Links

This topic has been included for consideration in the formal 2.03 proposal

Notes from consultation calls w/c 3rd July

The proposal was reviewed by those on the call and there was no objection from the group.

@petyakangalova has noted that Codelists have a boolean complete attribute which impacts usage in a way relevant to this proposal.

The documentation on this attribute is:

Some codelists, such as the ISO country codes, are not ‘complete’ lists of all possible values that might be used. In the case of countries, publishers may use extra user defined codes (such as ‘XK’ for Kosovo) or valid historical values that are not on our maintained list.

For other codelists, such as the DescriptionType codelist, if the value is not on the codelist the data doesn’t make any sense - it is invalid. This is an example of a ‘complete’ codelist.

We distinguish between these two types of codelists by the use of an xml attribute: complete="1"

and

complete - boolean that describes whether the codelist is ‘complete’ ie. having a value not on the codelist is definitely invalid. An example of an incomplete codelist is country codes, where extra codes may exist for disputed countries.

Reading through this, it appears that:

  • complete="1" - use of values from this Codelist is mandatory - using other values makes the data invalid
  • complete="0" - use of values from this Codelist is recommended, though using other values is absolutely fine (but might lead to a warning in a validation tool)
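To make that reading concrete, here is a minimal sketch of how a validation tool might treat the two cases. The function name and the codelist subsets are made up for illustration; they are not the real codelists or any existing IATI tool.

```python
def check_code(value, codelist_codes, complete):
    """Return 'ok', 'warning' or 'invalid' for a value checked against a codelist."""
    if value in codelist_codes:
        return "ok"
    # complete="1": a value that is not on the list makes the data invalid.
    # complete="0": a value that is not on the list is allowed, but a tool may warn.
    return "invalid" if complete else "warning"


# Illustrative subsets only (assumptions), not the maintained codelists.
DESCRIPTION_TYPE = {"1", "2", "3", "4"}
LANGUAGE = {"en", "fr", "es", "pt"}

print(check_code("9", DESCRIPTION_TYPE, complete=True))
print(check_code("haw", LANGUAGE, complete=False))
print(check_code("fr", LANGUAGE, complete=False))
```

Under this interpretation an off-list language code such as "haw" only ever produces a warning, because the Language codelist is not complete.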

The Language Codelist is marked as complete="0". As such, this proposal appears redundant in its current state.

An alternative course of action would be to improve documentation around the complete attribute.

This change makes the xml:lang definition consistent with the fact that the Language codelist is complete="0". So I don’t see it as redundant.

That said, this isn’t my proposal and I have no strong feelings either way.

So it does… redundant probably isn’t the best term.

Looking at this again, there are currently multiple definitions for valid values within the xml:lang attribute:

  1. ISO 639-1 code (from the attribute definition)
  2. A value on the Language codelist (auto-generated documentation)
  3. Any value, recommended that it is on the Language Codelist (Codelist documentation)
  4. Any valid BCP 47 value (XML spec)

Looking at how these interact…

  • Points 1, 2 and 3 are part of the IATI Standard.
  • Point 4 is part of a standard that IATI builds upon.
  • Points 1 and 2 are stricter than Point 3.
  • Point 4 restricts the values permitted by Point 3.
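As a rough illustration of how the strictest and the most permissive definitions differ in practice, the sketch below classifies a few values. Everything here is an assumption for the sake of the example: the ISO 639-1 set is a small sample, and the regular expression is a simplified shape check for language[-Script][-REGION] only. Real BCP 47 validation covers more subtag types and needs the IANA registry.

```python
import re

# Small illustrative sample of ISO 639-1 codes (assumption, not the full list).
ISO_639_1_SAMPLE = {"en", "fr", "es", "pt", "ar"}

# Simplified shape check: 2-3 letter language, optional 4-letter script,
# optional region (2 letters or 3 digits). Not a full BCP 47 validator.
SIMPLE_TAG = re.compile(
    r"^[A-Za-z]{2,3}(-[A-Za-z]{4})?(-([A-Za-z]{2}|[0-9]{3}))?$"
)


def classify(value):
    """Report which of the two definitions a value would satisfy."""
    return {
        "iso_639_1": value.lower() in ISO_639_1_SAMPLE,
        "bcp47_shape": bool(SIMPLE_TAG.match(value)),
    }


for tag in ["fr", "haw", "pt-BR", "fr_FR"]:
    print(tag, classify(tag))
```

Values like "haw" and "pt-BR" pass the looser check but fail the ISO 639-1 one, which is the gap the proposal is trying to close.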

At present, there is no documented manner in which contradictions within the IATI Standard should be resolved. I would propose that, where statements contradict each other, the most permissive of the valid interpretations be deemed the correct interpretation of the IATI Standard.

Based on the above, this proposal does not need to go through the 2.03 upgrade process, and should instead be implemented as a backwards-compatible bug fix.


Separately, the auto-generated documentation stating presence on Codelists should be fixed to take into account the complete attribute.

Ace – sounds good to me.

This proposal has been included in the 2.03 upgrade. It can be viewed in the following two Discuss posts: