Non-statistical secondary sectors (excluded 2.03)

This proposal is part of the 2.03 upgrade process. Please comment by replying below.

Standard
Activity

**Schema Object** iati-activity/sector

**Type of Change** Attribute and rule

**Issue** There is currently no place in the standard to include tags, keywords or sector-type classifications that cannot be assigned a percentage split of the activity’s costs. Rather than create a new tag or keyword element, it is proposed that the sector element be adapted for this purpose.

**Proposal** Add an attribute iati-activity/sector/@no-aggregation

  • Format: Boolean
  • Definition: A flag to indicate that the sector specified is provided as additional information and should not be assigned a percentage in apportioning shares of the activity’s resources. The default value, if not present, is false.
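
As a rough illustration of the proposed attribute, the Python sketch below reads `@no-aggregation` with the stated default of false. The XML fragment is hypothetical: the codes, the percentages and the use of vocabulary "99" for a publisher-defined classification are assumptions made for the example, and `is_no_aggregation` is an illustrative helper, not part of any existing tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical activity fragment; codes and percentages are illustrative only.
ACTIVITY_XML = """
<iati-activity>
  <sector vocabulary="1" code="11110" percentage="60"/>
  <sector vocabulary="1" code="12220" percentage="40"/>
  <sector vocabulary="99" code="CASHEW" no-aggregation="true"/>
</iati-activity>
"""

def is_no_aggregation(sector):
    """Read @no-aggregation as an XML Schema boolean; a missing attribute means false."""
    return sector.get("no-aggregation", "false").strip() in ("true", "1")

activity = ET.fromstring(ACTIVITY_XML)
sectors = activity.findall("sector")

# Sectors that take part in the percentage split.
statistical = [s.get("code") for s in sectors if not is_no_aggregation(s)]
# Sectors supplied as additional information only.
informational = [s.get("code") for s in sectors if is_no_aggregation(s)]

print(statistical)    # ['11110', '12220']
print(informational)  # ['CASHEW']
```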

**Standards Day** Proponents of this proposal said that there is a clear user need for it. Others supported it because it would make activities more searchable. An alternative approach was also mentioned: the introduction of a 'tag' element. It was suggested that changes to the sector element may cause existing tools to break, and that the least disruptive approach should be taken.

**Links** Previous discussions - https://discuss.codeforiati.org/t/standards-day-proposal-accommodating-non-statistical-secondary-sectors/753


I think it’s a reasonable approach, though using the same top-level element might cause confusion.

The real problem I’m seeing is that no one (so far) has told me they actually believe the percentage splits. I’d be tempted to make the default value TRUE if it weren’t for the problem of backwards compatibility.

D

Rereading the proposal, I doubt that adding this attribute to the sector element is the way forward. The use case seems clear and valid, but the proposed solution uses the sector element for something it was not designed to do.

Therefore I like the alternative solution (adding a TAG element) much more. It does not create confusion or backward compatibility problems. Sectors will remain purely statistical, as they are now. The current definition of the sector element will not change.

The TAG element allows for any classification to be added, allowing a great deal of flexibility to search through and select activities.

From the process point of view: adding an attribute to the sector element is not backward compatible for IATI data users (existing software that relies on the statistical nature of sector classifications would have to be changed). Therefore, in my opinion, it cannot be part of this decimal upgrade. The TAG alternative can, though.

> adding an attribute to the sector element is not backward compatible for IATI data users (existing software making use of the statistical nature of sector classifications, must be changed)

It would be a change that is backwards-compatible (any dataset valid at 2.02 would still be valid and have the same meaning at 2.03 under this change), but is not forward-compatible (valid 2.03 data with the 2.03-specific parts stripped may be interpreted incorrectly at 2.02).
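
To make that distinction concrete, here is a minimal Python sketch. The two reader functions are hypothetical stand-ins for a 2.02-era tool and a 2.03-aware tool, and the XML fragments (codes, percentages, vocabulary "99") are illustrative assumptions rather than real data.

```python
import xml.etree.ElementTree as ET

# Valid at 2.02: no @no-aggregation anywhere (illustrative codes).
DATA_202 = """
<iati-activity>
  <sector vocabulary="1" code="11110" percentage="60"/>
  <sector vocabulary="1" code="12220" percentage="40"/>
</iati-activity>
"""

# Valid only under the proposed change: one sector is flagged as non-statistical.
DATA_203 = """
<iati-activity>
  <sector vocabulary="1" code="11110" percentage="100"/>
  <sector vocabulary="99" code="CASHEW" no-aggregation="true"/>
</iati-activity>
"""

def reader_202(xml_text):
    """Hypothetical 2.02-era tool: every sector is assumed to be statistical."""
    return [s.get("code") for s in ET.fromstring(xml_text).findall("sector")]

def reader_203(xml_text):
    """Hypothetical 2.03-aware tool: skips sectors flagged with @no-aggregation."""
    return [
        s.get("code")
        for s in ET.fromstring(xml_text).findall("sector")
        if s.get("no-aggregation", "false") not in ("true", "1")
    ]

# Backwards-compatible: old data means the same thing to both readers.
print(reader_202(DATA_202) == reader_203(DATA_202))  # True

# Not forward-compatible: the old reader counts CASHEW as a spending sector.
print(reader_202(DATA_203))  # ['11110', 'CASHEW']
print(reader_203(DATA_203))  # ['11110']
```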

At the moment, only backwards-compatibility is required.


I agree that the addition of a <tag> element would be a cleaner solution since forward-compatibility should be aimed for where possible even if not strictly followed in all circumstances.

I agree with @Herman on this - I think a new element would be preferable. Any tools that already exist for using IATI data would expect sectors to be statistical rather than non-statistical, and would not know to check for such an attribute. It would potentially cause problems for data users, many of whom would not be able to adjust their tools in any reasonable timeframe.

So I would also be in favour of the <tag> alternative.

Agree with @Herman and @markbrough for reasons stated here and at TAG (as in Dar, not as in alternative solution :slight_smile: )

The proposal as it stands has been rejected in favour of adding a new “tag” element as raised in the discussion.

If you feel that this should still be included in the current upgrade, please do respond here.

With the addition of a <tag> element, how would it be used and which attributes / sub-elements of <sector> should be included along with it?

If the proposal is to add a “non-statistical sector”, this would add the following:

  • iati-activity/tag/@vocabulary
  • iati-activity/tag/@vocabulary-uri
  • iati-activity/tag/@code
  • iati-activity/tag/narrative
  • iati-activity/tag/narrative/@xml:lang
  • iati-activity/transaction/tag/@vocabulary
  • iati-activity/transaction/tag/@vocabulary-uri
  • iati-activity/transaction/tag/@code
  • iati-activity/transaction/tag/narrative
  • iati-activity/transaction/tag/narrative/@xml:lang
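
For discussion purposes, the paths listed above could look something like the fragment in this Python sketch. Everything in the XML is an assumption: the vocabulary value "99", the example.org URI, the code and the narratives are made up for illustration, and `read_tags` is a hypothetical helper rather than an existing library function.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical fragment with a tag at activity level and at transaction level.
ACTIVITY_XML = """
<iati-activity>
  <tag vocabulary="99" vocabulary-uri="http://example.org/vocab/crops" code="CASHEW">
    <narrative xml:lang="en">Cashew nuts</narrative>
    <narrative xml:lang="es">Marañón</narrative>
  </tag>
  <transaction>
    <tag vocabulary="99" vocabulary-uri="http://example.org/vocab/crops" code="CASHEW">
      <narrative xml:lang="en">Cashew nuts</narrative>
    </tag>
  </transaction>
</iati-activity>
"""

def read_tags(parent):
    """Collect the direct <tag> children of an element as plain dictionaries."""
    return [
        {
            "vocabulary": tag.get("vocabulary"),
            "vocabulary_uri": tag.get("vocabulary-uri"),
            "code": tag.get("code"),
            "narratives": {n.get(XML_LANG): n.text for n in tag.findall("narrative")},
        }
        for tag in parent.findall("tag")
    ]

activity = ET.fromstring(ACTIVITY_XML)
activity_tags = read_tags(activity)
transaction_tags = [read_tags(t) for t in activity.findall("transaction")]

print(activity_tags[0]["code"], activity_tags[0]["narratives"])
print(len(transaction_tags[0]))
```

Because `findall("tag")` only looks at direct children, activity-level and transaction-level tags stay separate, which is relevant to the question of whether both levels are needed.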

As such, a few questions…

  1. Are all the listed attributes and elements required?
  2. Are there any other attributes or elements that a tag should have?
  3. Are tags required under both transaction and activity?
  4. Should tags be available as sub-elements of anything that a sector is not?
  5. Which Codelists (if any) should be used as restrictions for the various attributes?

I don’t think we have the discussion about the mix of sector codes at activity and transaction level in the right space at the moment, so I would suggest that tag is held at activity level only…


Although I have a few reservations about this solution, I can see the appeal and think it’s a decent compromise. My greatest reservation remains that we should really be looking to radically reshape classification at the next integer upgrade, and adding this element will make it harder to rationalise the standard down the line**.

That being said, @hayfield has preempted my questions very well. Here are some responses:

Are all the listed attributes and elements required?

Yes - a tag without a code and a vocabulary is, in my view, basically just noise.

Are there any other attributes or elements that a tag should have?

Depends a bit on whether you think there should be statistical tags :stuck_out_tongue:. Seriously though, this is a legitimate question. Is a statistical tag different from a sector? For the use cases I had in mind when I devised this, I think we could live without an aggregation-status, but do we really want to be splitting our use of a given vocabulary over two different classification elements based on whether or not our intended use is statistical?

Are tags required under both transaction and activity?

I think a precedent has been set to, by and large, allow for similar classification at the two levels, and I think it would be strange not to follow it. (Brief aside: sectors are non-statistical at the transaction level, but regardless…).

Should tags be available as sub-elements of anything that a sector is not?

I think this would be needlessly confusing, and unless someone has a use-case in mind, I wouldn’t consider it.

Which Codelists (if any) should be used as restrictions for the various attributes?

I’m not even sure this needs a codelist associated with it. It should certainly be completely compatible with both the Sector and Sector Category codelists, but it should be useable with any of the other sector vocabularies on offer.

** Some of the confusion above is, in my view, the result of inheriting the CRS approach to classification wholesale (i.e. adding flags and use-case-specific fields, e.g. sector vs policy marker vs humanitarian classifiers), without abstracting it to one element and making it modular. I strongly believe that at 3.01, IATI should be moving towards a <classification/> element, which adds a new attribute to specify what type of classification it is and allows shared semantics through various codelists. Obviously this is out of scope for this discussion, but I believe we should be moving towards this approach regardless of how we solve the current problem in the short term. This touches on another reason I preferred the sector attribute approach, which was to avoid adding more and more to the standard if possible.

As I said above, this isn’t to say that I’m against the newly agreed solution; I just think we can do better than it within the scope of an integer upgrade.
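
Purely as an illustration of the <classification/> idea in the footnote above, a unified element might be read along these lines. The attribute name `type`, its values and all codes here are hypothetical; nothing like this exists in the standard, and this is a sketch of the concept rather than a proposal for exact syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical unified element; the @type attribute and its values are assumptions.
ACTIVITY_XML = """
<iati-activity>
  <classification type="sector" vocabulary="1" code="11110" percentage="100"/>
  <classification type="policy-marker" vocabulary="1" code="1"/>
  <classification type="tag" vocabulary="99" code="CASHEW"/>
</iati-activity>
"""

activity = ET.fromstring(ACTIVITY_XML)

# Group codes by classification type, so one loop replaces per-element handling.
by_type = {}
for item in activity.findall("classification"):
    by_type.setdefault(item.get("type"), []).append(item.get("code"))

print(by_type)  # {'sector': ['11110'], 'policy-marker': ['1'], 'tag': ['CASHEW']}
```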


Our proposal for the tag element is very simple: a single free-text element at activity level containing comma or semi-colon delimited values. No vocabulary, code or even language.

To go back to the Standards Day ‘cashew nuts’ example, is there really a need to build a whole new taxonomy around keywords that could help enrich standard classifications?

Yes. As I said above, I really think that free-text is unhelpful. The point of this proposal was that there are multiple use-cases for semantically rigorous but non-statistical classification. This is not the same as a free-text tag. The former can be used to compare things like-for-like with reference to rich definitions and vocabularies which can be mapped to others. The latter, though not completely unhelpful, really can’t be used to do much other than to associate an activity or transactions with one or a few words.

I understand that in the case of ‘Cashew’, the semantics aren’t exactly controversial, but the point is this: how can the classifications made in this element ever contribute to a joining up of IATI data with external data if they have no external semantics? And what is the argument against adding the proposed attributes?


Well, @bill_anderson actually said ‘cashew nuts’ rather than (unpluralised) ‘cashew’ as it’s commonly known. Of course it’s also known as ‘Anacardium’ (which appears in this activity description, along with ‘marañón’ and ‘anacardo’).
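
The spelling variants above show why exact keyword matching is fragile. A small Python sketch, assuming three hypothetical activities tagged once with delimited free text and once with a shared (made-up) vocabulary code:

```python
import re

# Hypothetical free-text tags, using spellings mentioned in this thread.
FREE_TEXT = {
    "activity-1": "cashew nuts",
    "activity-2": "cashew",
    "activity-3": "Anacardium; marañón, anacardo",
}

# The same activities tagged with a code from a shared vocabulary (assumed values).
CODED = {
    "activity-1": ("99", "CASHEW"),
    "activity-2": ("99", "CASHEW"),
    "activity-3": ("99", "CASHEW"),
}

def keywords(value):
    """Split a comma- or semicolon-delimited string into lower-cased keywords."""
    return [k.strip().lower() for k in re.split(r"[;,]", value) if k.strip()]

def find_by_keyword(keyword):
    """Exact keyword match against the free-text tags."""
    return sorted(a for a, text in FREE_TEXT.items() if keyword.lower() in keywords(text))

def find_by_code(vocabulary, code):
    """Match on the (vocabulary, code) pair."""
    return sorted(a for a, tag in CODED.items() if tag == (vocabulary, code))

print(find_by_keyword("cashew"))     # ['activity-2']
print(find_by_code("99", "CASHEW"))  # ['activity-1', 'activity-2', 'activity-3']
```

Fuzzy matching or synonym lists could narrow the gap for free text, but at that point one is effectively rebuilding a vocabulary.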

(To be explicit: I agree with @rory_scott :slight_smile: )

I should be more specific, sorry. I think we do need to treat this element as one that has a taxonomy around it, but no, I don’t think we need to build a whole new one. We could just plug in existing codelists and taxonomies.


FWIW I supported the original proposal, but appreciate the issues with making the change as part of a decimal upgrade.

[Aside: As a data user new to IATI, I actually interpreted <sector> as if it were a tag, and was surprised to learn (many months later) that the element in fact had some other True Meaning.]

Long term, I like the idea of merging elements (as Rory suggests with the <classification> element) to streamline and simplify the standard, so I don’t believe adding <tag> in a decimal upgrade would be a step in the right direction. I’d rather see something closer to the original proposal incorporated in 3.01.


Taking one step back …

  • (A) There is a need to describe the functional ‘sector/s’ of activities.
  • (B) There is a need to apportion the cost of each activity across these ‘sectors’.
  • (C) There is a need to apportion activity costs in a universally comparable way so that users can make sense of sectoral spend across all publishers.
  • (D) There is a need to classify activities according to other taxonomies without apportioning costs.
  • (E) There is a need to classify activities with words or concepts that do not come from a formal taxonomy.

Which raises the following questions for me:

  • The existing <sector> element delivers (A) and (B) but the increase in the number of vocabularies employed undermines (C).
  • Should we be adding a new element similar to <sector> but without a percentage? (D)
  • To achieve (C) should we limit the vocabularies allowed in <sector> and move the rest to the new element (D)?
  • Is there a need for (E) or could this be included in the <description>?

(D) and (E) are doable in a decimal upgrade, but would it be better to discuss further first?

In addition, cashew(s) are also known as noix de cajou, of course.

Strongly against a free-text field on its own.


I think <description> does this sufficiently well as it stands. I suppose a ‘keyword’ Description Type could be added to provide tagging functionality without adding a new element… But again, I’m not sure if there’s a need.

This increase would only undermine (C) if it was leading to less or worse usage of OECD DAC CRS Purpose codes. I don’t think this is the case, but I’d be interested to hear what others think. Without this, the only upshot of said increase is a greater pool of comparative data between CRS and other codelists, which seems like a good thing to me.

That, to me, is a sensible interim option if we want to be sure that the meaning of old data hasn’t changed, though I think we should be clear that there are better, more long term ways of doing classifications in IATI (I will keep on repeating this point, though I know there’s low appetite for a big re-working of IATI next integer).

I touched on this in my first response, but it’s worth expanding upon: what exactly is the issue with there being a number of different sector vocabularies available? Is it that there are just too many, or that the vocabularies included are sub-standard in some way? Again, as long as people are still using the CRS codes, I don’t see what’s being lost.

I believe this can be done in a description for now.

N.b. If and when people are amenable to a proper reworking of classification in IATI, I’d want to see a free-text ‘tag’, along with ‘statistical’ and ‘non-statistical’, as @types of classification, rather than as separate elements.

I’ll be joining the call today. I definitely think it’s worth discussing first, and though I’m unsure of the need for (E) given the existence of description elements with their own type codelists, I’m convinced that (D) should be given high priority, not least because some great work is being done around it as we speak, and without (D) taking place, there will be a lot of useful information which won’t be able to make it into the IATI standard.

I look forward to discussing this later today.

This issue was discussed in the consultation call this afternoon. There was consensus on two issues.

  • We need a new element to express useful sector classifications that bear no relation to percentage splits on finances.
  • The most useful classifications will come from existing formal taxonomies (vocabularies)

We therefore propose the following amendment (the activity-level part of @hayfield’s proposal above).

  • The name “tag” for the element is a placeholder. We are seeking a better suggestion
  • The issue of free-text key-words was discussed. These could be handled either by a generic “Unspecified” vocabulary, or guidance could advise use of the activity description element.