Discussion post: Identifying government entities

TimDavies · April 27, 2017, 8:57am

Alongside the development of the updated org-id list of organisation identifier lists (which contains an increasing number of company and NGO/charity lists) we have been exploring the state of government entity identification.

A full paper on this is forthcoming, but to summarise the current state of play:

The problem

There are few reliable sources of identifiers for government entities (i.e. lists which are stable, well maintained and provide unique identifiers that map to particular legal entities, or sub-units of legal entities).

The problem is compounded when looking beyond national government departments, to also consider schools, hospitals or other kinds of entities that might engage in transactions (e.g. contracts, spending, or receiving aid funding directly or indirectly).

The current organisation identifier methodology in use in Open Contracting, IATI and 360Giving is predicated on identifying a ‘legal entity’. However, in many cases the legal status of state entities is not directly analogous to that of companies or charities.

Current state of play

It is difficult to answer questions such as:

How much money has been spent by the UK Department for Health with small or medium enterprises?

or

How much money has the official Ugandan education system received from government donors in the last five years?

because the data available lacks identifiers, names in data are presented inconsistently, and where identifiers exist they are not related to meta-data which can support analysis.

Requirements

We have identified 11 requirements against which any identification approach should be evaluated:

An identifier system should…

…allow identification of central government departments.
…allow identification of local government units.
…allow identification of government agencies.
…allow identification of bodies such as schools, hospitals etc.
…provide access to information on the level of government the body operates at.
…provide access to information on the history of the body (e.g. bodies it replaced).
…provide access to information on the hierarchical position of the body.
…uniquely identify a single entity (e.g. two distinct government entities should not share a code).
…provide persistent identifiers that only change when the integrity or legal status of the identified organisation changes.
…be appropriate for use in a single country.
…be appropriate for use across countries (for example, identifying all the mining ministries around the world).

These requirements may be prioritised differently by different use cases, and it may be difficult to meet all requirements through a single approach.

Initial recommendations

Based on the analysis to date, we propose a two-track approach to government entity identification.

(1) Maintaining the distributed organisation identifier list approach - but focussing on (a) encouraging government to publish registers of government entities; and (b) working to create a ‘list of last resort’ via WikiData - which can accept user-contributions.

(2) Requiring separate publication of descriptive meta-data elements for government entities - to provide information for identification, de-duplication and data analysis.

In particular, this would involve adding attributes to organisation fields based on the EC’s Core Public Organisation Vocabulary for:

purpose - using COFOG codes to indicate the function of the government entity;
spatial - using either (a) ADM1 / ADM2 / ADM3 to indicate the level of government, or (b) some established gazetteer to indicate the geographic area covered by the government entity;

As well as including a field for:

jurisdiction - the ISO country code of the organisation;

And encouraging additional publication (either as attributes or sub-fields of an organisation object) of:

full name - without use of acronyms
address - a registered or operational address of the organisation
uri - the homepage of the organisation

Taken together, these additional elements:

(1) Prevent false-positive matches - as tools can disambiguate two ‘Department of Education’ records if they have a different jurisdiction or spatial coverage.
(2) Support de-duplication - as tools can use matching purpose, jurisdiction, address, uri and spatial values to support matching of organisations even when names or identifiers are different;
(3) Answer general queries that do not require specific organisation identification. For example, using COFOG codes to identify spending with government education entities in a given area.
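
As a rough illustration of points (1) to (3), here is a minimal Python sketch. It assumes each organisation record carries the proposed purpose, spatial and jurisdiction values. The `OrgRecord` class, the helper names, the placeholder jurisdiction codes "xa" and "xb", and the second Nepal name variant are all hypothetical and only serve the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrgRecord:
    name: str
    jurisdiction: str  # ISO country code
    purpose: str       # COFOG code, e.g. "09" for education
    spatial: str       # level of government, e.g. "ADM1"


def could_be_same(a: OrgRecord, b: OrgRecord) -> bool:
    """True when the descriptive fields match, whatever the names say."""
    return (a.jurisdiction.lower(), a.purpose, a.spatial) == (
        b.jurisdiction.lower(), b.purpose, b.spatial)


def by_purpose(records, cofog_prefix, jurisdiction):
    """Select records by COFOG prefix and country, without needing identifiers."""
    return [r for r in records
            if r.purpose.startswith(cofog_prefix)
            and r.jurisdiction.lower() == jurisdiction.lower()]


# Illustrative records only; "xa" and "xb" are placeholder country codes.
r1 = OrgRecord("Department of Education", "xa", "09", "ADM1")
r2 = OrgRecord("Department of Education", "xb", "09", "ADM1")
r3 = OrgRecord("Ministry of Education, Government of Nepal", "np", "09", "ADM1")
r4 = OrgRecord("Min. of Education (Nepal)", "np", "09", "ADM1")  # made-up variant
records = [r1, r2, r3, r4]

print(could_be_same(r1, r2))  # same name, different jurisdiction
print(could_be_same(r3, r4))  # different names, same description
print([r.name for r in by_purpose(records, "09", "NP")])
```

A match on these fields is only a hint that two records may describe the same body; it is not proof, since several entities in one country can share purpose and level of government.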

Guidance should be developed so that COFOG purpose codes are only used for government entities.

Implications and IATI worked example

For IATI (and other standards) the above approach would involve:

Continued work to maintain a list of organisation identifier lists;
Addition of new fields to organisation elements (e.g. <participating-org>)

For example, the block below adds purpose, spatial and jurisdiction codes to identify the UK Department for International Development.

<participating-org ref="GB-GOV-1" role="1" type="10" org-purpose="01.2" org-spatial="ADM1" org-jurisdiction="gb">
        <narrative>UK Department for International Development</narrative>
</participating-org>

The block below identifies an Education Ministry in Nepal, for which no identifier reference could be located:

<participating-org ref="" role="4" type="10" org-purpose="09" org-spatial="ADM1" org-jurisdiction="np">
        <narrative>Ministry of Education, Government of Nepal</narrative>
</participating-org>

Note that this additional meta-data about organisations would be provided by data publishers. It could be drawn from an identifier list if such a list exists (to avoid it having to be entered each time), but ultimately the values of these additional meta-data fields would be based on the judgement of those entering the data (so there is no solid guarantee that two publishers would describe a single organisation in exactly the same way).
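
For data users, reading these values could look roughly like the Python sketch below. It assumes the attributes proposed in this post (org-purpose, org-spatial, and the jurisdiction attribute spelled org-jurisdiction), which are not part of the current IATI standard. The wrapping `<iati-activity>` element and the `describe_orgs` helper are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Sample data using the attributes proposed in this post (not current IATI).
SAMPLE = """
<iati-activity>
  <participating-org ref="GB-GOV-1" role="1" type="10"
      org-purpose="01.2" org-spatial="ADM1" org-jurisdiction="gb">
    <narrative>UK Department for International Development</narrative>
  </participating-org>
  <participating-org ref="" role="4" type="10"
      org-purpose="09" org-spatial="ADM1" org-jurisdiction="np">
    <narrative>Ministry of Education, Government of Nepal</narrative>
  </participating-org>
</iati-activity>
"""


def describe_orgs(xml_text):
    """Return one dict per participating-org, with an empty ref mapped to None."""
    root = ET.fromstring(xml_text)
    described = []
    for org in root.iter("participating-org"):
        described.append({
            "ref": org.get("ref") or None,
            "name": (org.findtext("narrative") or "").strip(),
            "purpose": org.get("org-purpose"),
            "spatial": org.get("org-spatial"),
            "jurisdiction": org.get("org-jurisdiction"),
        })
    return described


for org in describe_orgs(SAMPLE):
    print(org)
```

The Nepal record has no identifier, yet it still yields a usable description (purpose 09, ADM1, np).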

Discussion

Feedback is invited on:

Whether the approach of splitting identification and description of organisations into two separate problems is an appropriate one;
The implications of this approach for publishers;
The implications of this approach for data users;
The implications of this approach for other data standards (e.g. OCDS, 360 etc.)

Once feedback is in, we’ll incorporate it into a full paper and host an online discussion on ways forward, identifying whether to propose changes to existing standards.

JohnAdams · February 23, 2017, 9:16am

Tim, are you talking with Thom Townsend in GDS on any of this?

TimDavies · February 23, 2017, 10:00am

We’ve briefly discussed it but not in the last few months.

bill_anderson · February 23, 2017, 10:46am

If we went down this path I would suggest a new element <org-details> or <org-metadata> containing attributes for all metadata required. This could then be nested under any of the elements using organisation identifiers, including for example transaction/receiver-org.
This could apply to all organisations, not just government agencies?

For example, a transaction going to a ministry in a country where we don’t have identifiers:

<transaction>
...
<receiver-org type="10">
  <narrative>Vanuatu Ministry of Health</narrative>
  <org-details jurisdiction="VU" purpose="07" scope="4" />
</receiver-org>
...
</transaction>

jurisdiction uses Country codes
purpose uses COFOG codes
scope uses existing ActivityScope codes

Thoughts?

bill_anderson · February 23, 2017, 10:51am

Is it worth proposing a standard for what metadata lists of identifiers should contain and how it should be accessed?

bill_anderson · February 23, 2017, 11:01am

Another thought

The identifier “VU-07-04” here describes a national health agency in Vanuatu. Although not unique, isn’t this a pragmatic solution in itself? It is an organisation prototype, rather than an identifier.
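
A minimal Python sketch of how such a prototype key might be derived. It assumes the key is simply jurisdiction, COFOG purpose and a zero-padded scope code joined with hyphens. The function names and the second name variant are hypothetical.

```python
from collections import defaultdict


def prototype_key(jurisdiction: str, purpose: str, scope: str) -> str:
    """Join country code, COFOG purpose and zero-padded scope, e.g. VU-07-04."""
    return "-".join([jurisdiction.upper(), purpose, str(scope).zfill(2)])


def group_by_prototype(orgs):
    """Map each prototype key to the organisation names that share it."""
    groups = defaultdict(list)
    for org in orgs:
        key = prototype_key(org["jurisdiction"], org["purpose"], org["scope"])
        groups[key].append(org["name"])
    return dict(groups)


orgs = [
    {"name": "Vanuatu Ministry of Health",
     "jurisdiction": "VU", "purpose": "07", "scope": "4"},
    # Made-up spelling variant, to show that variants collapse onto one key.
    {"name": "Ministry of Health, Vanuatu",
     "jurisdiction": "vu", "purpose": "07", "scope": "4"},
]

print(prototype_key("VU", "07", "4"))
print(group_by_prototype(orgs))
```

Several distinct bodies can share a key, so it groups records for analysis rather than identifying one legal entity.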

TimDavies · February 23, 2017, 12:42pm

Exactly. My sense is that this ‘prototype’ (nice framing!) can answer a set
of the questions users might ask without needing unique identification of a
legal entity - so in splitting the description and identification problems
we are able to better address each.

markbrough · February 23, 2017, 2:34pm

@TimDavies, great to see you’ve put a bunch of thought into this tricky problem. I want to throw a couple of thoughts into the mix.

I like and want to build on @bill_anderson’s point about having a solution in countries where we don’t have a solid list. I think it is unlikely for the foreseeable future for many countries to have solid (complete, up to date, machine-readable) lists of all of their government ministries and agencies, let alone things like sub-national public bodies and individual schools and clinics.

I think we really need a methodology that allows for a way of identifying public bodies even if there isn’t a single authoritative list for that country. I think a good place to start would be budget classifications. A couple of worked examples below.

In Liberia, there are the following codes in the government’s 2016 Chart of Accounts (which is basically the set of classifications used in the budget and financial management systems):

101 - National Legislature
102 - Ministry of State for Presidential Affairs
103 - Office of the Vice President
104 - Ministry of Finance
105 - Ministry of Internal Affairs

etc. In the Liberian CoA, Departments within Ministries are also identified. This seems like a good place to start. So we could imagine an identifier such as LR-COA-2016-104 referring to the Ministry of Finance (code 104), as identified in the 2016 CoA. Of course CoAs change from time to time (new ministries get created etc.), so it would be good if possible to capture the year of the CoA (or at least a year in which this code definitely existed).

In Bangladesh, this would also work - according to the current Chart of Accounts:

01 - Office of the President
02 - Parliament
03 - Prime Minister’s Office
04 - Cabinet Division

etc. So something like BD-COA-2016-02 could refer to Parliament (code 02), as identified in the 2016 CoA.
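
To make the pattern concrete, here is a small Python sketch that builds and parses identifiers of the form CC-COA-YYYY-code. The pattern is only the one suggested in this post, not a registered org-id scheme, and the `CoaId`, `make_coa_id` and `parse_coa_id` names are hypothetical.

```python
import re
from typing import NamedTuple


class CoaId(NamedTuple):
    country: str  # ISO country code, e.g. "LR"
    year: int     # year of the Chart of Accounts the code was taken from
    code: str     # administrative code, kept as text so "02" keeps its zero


_COA_PATTERN = re.compile(r"([A-Z]{2})-COA-(\d{4})-(\w+)")


def make_coa_id(country: str, year: int, code: str) -> str:
    """Build an identifier such as LR-COA-2016-104."""
    return f"{country.upper()}-COA-{year}-{code}"


def parse_coa_id(identifier: str) -> CoaId:
    """Split an identifier back into country, year and code."""
    match = _COA_PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a CoA-style identifier: {identifier!r}")
    return CoaId(match.group(1), int(match.group(2)), match.group(3))


print(make_coa_id("lr", 2016, "104"))
print(parse_coa_id("BD-COA-2016-02"))
```

Keeping the year in the identifier means a code can be traced back to the specific CoA it came from, at the cost of needing a mapping between years.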

Come to think of it… these kind of are authoritative lists. Aren’t they?

markbrough · February 23, 2017, 2:41pm

I think these are two separate points? One is a distributed approach, the second set is a centralised approach. a) is centralised and controlled; b) is centralised but organic

TimDavies · February 23, 2017, 3:03pm

Hey Mark. Thanks for the thoughts on this.

I agree that in some situations Charts of Accounts can provide a possible
source of entity identifiers that could be used in the @ref of an
organisation element ideally alongside the descriptive elements also
discussed in this thread.

The challenge I’ve found when looking at some examples is that Charts of
Accounts or Counterparty IDs often seem to include things that might not be
considered to be distinct entities, so some care might be needed to either
(a) check this before the COA for a given country is added as a recognised
organisation identifier list; or (b) document constraints and caveats of
each COA.

Essentially this comes back to making sure we don’t confuse identifying an
organisation with identifying descriptions that imperfectly overlap the
organisation as when a COA code describes the entities responsible for a
given policy area, rather than uniquely describing the lead entity…

One thing we’ve been thinking about in org-id is whether, as well as having
metadata on organisations in the org-id repo, we could also have a library of
helper scripts with some standard interfaces that could save users the
trouble of, e.g., knowing when LR-COA-2016-104 can be
mapped to LR-COA-2017-104 or not. Though this is definitely something for
future thinking rather than right now…
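
Purely as a thought experiment, such a helper might look like the Python sketch below. It assumes per-year code lists are available as plain dictionaries and uses a naive rule: a code maps across years only if it exists in both lists with the same name. The `can_map` function is hypothetical, and the 2017 entries are invented for illustration only; they are not taken from any real Chart of Accounts.

```python
# 2016 entries are the ones quoted earlier in this thread.
# 2017 entries are HYPOTHETICAL placeholders, invented for this example.
COA_LISTS = {
    ("LR", 2016): {
        "104": "Ministry of Finance",
        "105": "Ministry of Internal Affairs",
    },
    ("LR", 2017): {
        "104": "Ministry of Finance",
        "105": "Hypothetical Renamed Ministry",
    },
}


def can_map(old_id: str, new_id: str, coa_lists) -> bool:
    """Naive check: both codes exist and carry the same name in their year."""
    def lookup(identifier):
        country, _, year, code = identifier.split("-", 3)
        return coa_lists.get((country, int(year)), {}).get(code)

    old_name, new_name = lookup(old_id), lookup(new_id)
    return old_name is not None and old_name == new_name


print(can_map("LR-COA-2016-104", "LR-COA-2017-104", COA_LISTS))
print(can_map("LR-COA-2016-105", "LR-COA-2017-105", COA_LISTS))
```

Name equality is a weak test: a ministry can be renamed without changing in substance, or keep its name while its remit changes. A real helper would likely need a curated change log per country.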

bill_anderson · February 23, 2017, 3:11pm

If they are published in a consistent place from year to year

I like ??-COA as a prototype registration agency

matmaxgeds · February 23, 2017, 3:41pm

Hi all,

Most country budgets include the CoA as an annex (or it can be easily worked out from the tables) e.g. see here for Sierra Leone - and, where published, these are normally fairly easy to find.

RE overlap, as long as you are taking the administrative classification (Ministries, Agencies, Directorates etc), and staying within one level, I am not sure that there can be any overlap otherwise it would be unclear where the government funds were going. There could be overlap e.g. with two Ministries being under one code on the Functional/Programmatic (e.g. Sector/deliverable) part of the CoA - but this can be avoided by sticking to the Administrative codes.

I have not yet seen a CoA that would identify individual schools, hospitals etc so something else would be needed for that - I suspect a text field with the name would be sufficient in combination with the geographic data also being filled out.

RE CoFOG, several (many?) countries do not have CoFOG aligned to their CoA.

RE local government units - you might need to look into the separate subnational CoA for countries with federal systems to pick these out.

This would be great to see given the continued reluctance of even in-country DPs to report against the national CoA.

Worth noting that humanitarian agencies might not want to use a national CoA to report against out of principle but I presume would then also not be supporting any of the organisations it describes.

Matt

markbrough · February 23, 2017, 4:02pm

Just to echo Matt - CoAs will contain a range of different sets of classifications, but these are distinct classifications and (though when presented they may be nested into each other) refer to different things. There will maybe be some cases where there is no recognised administrative classification, though there is clearly still some way in which the central budget gives money to Ministry A rather than Ministry B. But generally, these things should exist.

I think my main argument is that it would be much less of a lift to get CoAs published than to get an entirely new set of lists written from scratch. Especially given that CoAs are a core part of government financial management processes, so they have to exist and be maintained. Whereas I am not sure what the incentives would be for developing and maintaining another list.

@bill_anderson - ha, I guess OECD codelists are a good example of this :). CoAs are often not published on websites, probably partly because they’re not considered very interesting to anyone except for a handful of people, most of whom already have access to them anyway. Still: the codes can be referred to even if the source document is not always visible, especially as budget documents will almost always use the same codes.

bill_anderson · February 23, 2017, 6:19pm

YohannaLoucheur · February 24, 2017, 11:26pm

Would like to go back to the opening post and discuss briefly the case where there is in fact a list.

Tim identified 11 requirements for an identifier system. As some of you may know, but perhaps not all, Canada recently published a machine-readable list of “legal department names and numbers” for all “organization listed in Schedules I, 1.1 and II of the Financial Administration Act authorized to use the Consolidated Revenue Fund”. In other words, the federal ministries and organizations (175 of them).
http://open.canada.ca/data/en/dataset/22090865-f8a6-4b83-9bad-e9d61f26a821

This list does not meet all of the 11 requirements defined by Tim. For instance, Canada has a federal system so this won’t include all provincial or municipal entities. It also doesn’t provide access to history (though it may in future). But it may still be a useful example to look at, if only to identify what’s wrong and move closer to defining a workable system.

Regarding the recommendations, I’m a bit confused by the notion that the meta-data must be published separately. Shouldn’t the (distant) goal be that all this information is available in the individual registers? In which case their separate publication would only be required if they were not included in the register. In practical terms, perhaps at the moment all the meta-data would have to be published separately, but it would be good to be clear on what the long-term vision is, so we know what to work towards.

TimDavies · March 1, 2017, 1:44pm

Thanks Yohanna. The Canada data is a really useful example of the kinds of
lists it would be great to see governments produce and maintain.

On the question of keeping the meta-data only in registries or including in
datasets that refer to the organisations (e.g. IATI datasets) I would
advocate for the redundancy of always including meta-data in the datasets
referring to organisations, and encouraging publication in the registries
or organisation lists that exist. This is on the basis of usability.

I.e.

It removes the need for basic users of the data to dereference and look up
each organisation in a source list (which risks being offline, poorly
formatted or versions out of sync with the dataset) before they can perform
analysis based on organisational meta-data.

But it allows the advanced user to choose to compare entries in the dataset
with information provided in source lists and to use this in preference in
their analysis if they choose.

This keeps the data from being too brittle, whilst maintaining the option
of deferring to authority lists.
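
A minimal Python sketch of that fallback, assuming the dataset carries embedded meta-data and that a source list, when reachable, is available as a dictionary keyed by identifier. The `resolve_metadata` function and the contents of `SOURCE_LIST` are hypothetical.

```python
def resolve_metadata(embedded, ref, source_list=None, prefer_source=False):
    """Return the meta-data to analyse for one organisation reference."""
    if prefer_source and source_list is not None and ref in source_list:
        # Advanced use: values from the authority list override embedded ones.
        return {**embedded, **source_list[ref]}
    # Basic use, or list offline / ref unknown: rely on the dataset itself.
    return dict(embedded)


# Embedded values as in the DFID example from the opening post.
embedded = {"purpose": "01.2", "spatial": "ADM1", "jurisdiction": "gb"}

# Hypothetical source-list entry, standing in for a real register lookup.
SOURCE_LIST = {
    "GB-GOV-1": {
        "name": "UK Department for International Development",
        "purpose": "01.2",
        "spatial": "ADM1",
        "jurisdiction": "gb",
    }
}

print(resolve_metadata(embedded, "GB-GOV-1"))
print(resolve_metadata(embedded, "GB-GOV-1", SOURCE_LIST, prefer_source=True))
print(resolve_metadata(embedded, "", SOURCE_LIST, prefer_source=True))
```

The basic path never touches the source list, so analysis still works when the list is offline or out of sync.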

YohannaLoucheur · July 20, 2017, 5:26pm

Hey Tim

Really early in this post you mentioned a paper coming out at some point. Where are we at on this?

Just going through the discussion again, one of the quite practical issues mentioned was access to CoA - this would be required to take Mark’s suggestion (which seems a reasonable one, at least for ADMIN1 entities) forward.

I’m increasingly convinced that CoA should be made available as open data, ideally by a central body that would be able to provide a change log etc. Who should this be: IMF? Which stakeholders might have an interest and could help make it happen (eg INTOSAI? ICGFM?)? How do we launch a lobbying campaign?

TimDavies · July 26, 2017, 4:10pm

Hey Yohanna,

The paper has been submitted to DevInit, and we are just waiting for it to be reviewed.

I definitely agree Chart of Accounts will be useful - but I think it plays more towards identifying functions of government, rather than organizational components (albeit the notion of which parts of government are independent organizations remains a generally tricky one, and one that depends on the perspective from which it is being asked)

YohannaLoucheur · July 26, 2017, 4:32pm

As Mark and Matt mentioned, CoA would normally include both functional and administrative (i.e. organizational components) classifications, although each CoA may present or nest them differently.

Look forward to seeing the paper.

TimDavies · October 7, 2017, 1:30pm

The paper is now up at http://juds.joinedupdata.org/discussion-papers/paper-7-identifying-government-entities/

@markbrough has questioned whether it addresses the Chart of Accounts question enough.

In summary, my view is that:

(1) Chart of Account Administrative Divisions may in some cases be the best list of organisation identifiers for government entities in a country - but this is going to be a judgement call on a country-by-country basis, depending on the rules that are applied in maintenance of the CoA, and whether or not it is proactively published and well maintained;
(2) The org-id.guide methodology already accommodates this, so that in countries where a CoA Administrative Divisions list is published, its meta-data can be recorded as part of an entry in the organisation identifier list register, and, in the event it is the best open source for government identifiers in a country (based on the ranking algorithm employed) it would be recommended for use;
(3) There were no strong examples at the time of writing the paper to show CoAs being used widely as organisation identifier lists;

To operationalise (2) we could certainly improve the research handbook at http://docs.org-id.guide/en/latest/research/ for guidance for researchers on how to find and validate potential CoA sources. Suggestions/pull requests welcome.