Alongside the development of the updated org-id list of organisation identifier lists (which contains an increasing number of company and NGO/charity lists) we have been exploring the state of government entity identification.
A full paper on this is forthcoming, but to summarise the current state of play:
The problem
There are few reliable sources of identifiers for government entities (i.e. lists which are stable, well maintained and provide unique identifiers that map to particular legal entities, or sub-units of legal entities).
The problem is compounded when looking beyond national government departments to schools, hospitals or other kinds of entities that might engage in transactions (e.g. contracts, spending, or receiving aid funding directly or indirectly).
The current organisation identifier methodology in use in Open Contracting, IATI and 360Giving is predicated on identifying a “legal entity”. However, in many cases the legal status of state entities is not directly analogous to that of companies or charities.
Current state of play
It is difficult to answer questions such as:
- How much money has been spent by the UK Department for Health with small or medium enterprises?
or
- How much money has the official Ugandan education system received from government donors in the last five years?
because the data available lacks identifiers, names in data are presented inconsistently, and where identifiers exist they are not related to meta-data which can support analysis.
Requirements
We have identified 11 requirements against which any identification approach should be evaluated:
An identifier system should…
- …allow identification of central government departments.
- …allow identification of local government units.
- …allow identification of government agencies.
- …allow identification of bodies such as schools, hospitals etc.
- …provide access to information on the level of government the body operates at.
- …provide access to information on the history of the body (e.g. bodies it replaced).
- …provide access to information on the hierarchical position of the body.
- …uniquely identify a single entity (e.g. two distinct government entities should not share a code).
- …provide persistent identifiers that only change when the integrity or legal status of the identified organisation changes.
- …be appropriate for use in a single country.
- …be appropriate for use across countries (for example, identifying all the mining ministries around the world).
These requirements may be prioritised differently by different use cases, and it may be difficult to meet all requirements through a single approach.
Initial recommendations
Based on the analysis to date, we propose a two-track approach to government entity identification.
(1) Maintaining the distributed organisation identifier list approach - but focussing on (a) encouraging government to publish registers of government entities; and (b) working to create a “list of last resort” via WikiData - which can accept user-contributions.
(2) Requiring separate publication of descriptive meta-data elements for government entities - to provide information for identification, de-duplication and data analysis.
In particular, this would involve adding attributes to organisation fields based on the EC’s Core Public Organisation Vocabulary for:
- purpose - using COFOG codes to indicate the function of the government entity;
- spatial - using either (a) ADM1 / ADM2 / ADM3 to indicate the level of government, or (b) some established gazetteer to indicate the geographic area covered by the government entity;
As well as including a field for:
- jurisdiction - the ISO country code of the organisation;
And encouraging additional publication (either as attributes or sub-fields of an organisation object) of:
- full name - without use of acronyms
- address - a registered or operational address of the organisation
- uri - the homepage of the organisation
Taken together, these additional elements:
- (1) Prevent false-positive matches - as tools can disambiguate two “Department of Education” records if they have a different jurisdiction or spatial coverage.
- (2) Support de-duplication - as tools can use matching purpose, jurisdiction, address, uri and spatial values to support matching of organisations even when names or identifiers are different;
- (3) Answer general queries that do not require specific organisation identification. For example, using COFOG codes to identify spending with government education entities in a given area.
Guidance should be developed so that COFOG purpose codes are only used for government entities.
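As a rough illustration of points (1) to (3), the sketch below shows how a tool might use the proposed purpose, spatial and jurisdiction values. The `OrgRecord` class and the `could_be_same`, `cofog_division` and `with_purpose` helpers are hypothetical names invented for this example, and the sample records are made up. The sketch also assumes that purpose codes are compared only at the top COFOG division (the part before the dot), since publishers may code at different levels of detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OrgRecord:
    """Hypothetical flat view of an organisation with the proposed metadata."""
    name: str
    ref: Optional[str] = None           # organisation identifier, if one exists
    purpose: Optional[str] = None       # COFOG code, e.g. "09" for education
    spatial: Optional[str] = None       # level of government, e.g. "ADM1"
    jurisdiction: Optional[str] = None  # ISO country code, e.g. "np"


def cofog_division(code: str) -> str:
    """Return the top-level COFOG division, e.g. "09.1" -> "09"."""
    return code.split(".")[0]


def could_be_same(a: OrgRecord, b: OrgRecord) -> bool:
    """Return False when the metadata rules a match out, True otherwise.

    Missing values never rule a match out; they only fail to confirm it.
    """
    for field in ("jurisdiction", "spatial"):
        x, y = getattr(a, field), getattr(b, field)
        if x and y and x.lower() != y.lower():
            return False
    if a.purpose and b.purpose:
        if cofog_division(a.purpose) != cofog_division(b.purpose):
            return False
    return True


def with_purpose(records, division):
    """Select records whose purpose falls in the given COFOG division."""
    return [r for r in records
            if r.purpose and cofog_division(r.purpose) == division]


# Illustrative records only; not drawn from any real dataset.
records = [
    OrgRecord("Department of Education", purpose="09",
              spatial="ADM1", jurisdiction="np"),
    OrgRecord("Department of Education", purpose="09.1",
              spatial="ADM1", jurisdiction="gb"),
    OrgRecord("Ministry of Education, Government of Nepal", purpose="09",
              spatial="ADM1", jurisdiction="NP"),
    OrgRecord("UK Department for International Development", ref="GB-GOV-1",
              purpose="01.2", spatial="ADM1", jurisdiction="gb"),
]

print(could_be_same(records[0], records[1]))  # False: different jurisdiction
print(could_be_same(records[0], records[2]))  # True: candidates for merging
print(len(with_purpose(records, "09")))       # 3 education-related records
```

A `True` result here only means the metadata does not contradict a match. A real tool would still need name comparison or human review before merging records.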
Implications and IATI worked example
For IATI (and other standards) the above approach would involve:
- Continued work to maintain a list of organisation identifier lists;
- Addition of new fields to organisation elements (e.g. <participating-org>).
For example, the block below adds purpose, spatial and jurisdiction codes to identify the UK Department for International Development.
<participating-org ref="GB-GOV-1" role="1" type="10" org-purpose="01.2" org-spatial="ADM1" org-jurisdiction="gb">
<narrative>UK Department for International Development</narrative>
</participating-org>
The block below identifies an Education Ministry in Nepal, for which no identifier reference could be located:
<participating-org ref="" role="4" type="10" org-purpose="09" org-spatial="ADM1" org-jurisdiction="np">
<narrative>Ministry of Education, Government of Nepal</narrative>
</participating-org>
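To show how a data user might read these proposed attributes, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The attribute names `org-purpose`, `org-spatial` and `org-jurisdiction` are the proposals from this post and are not part of the published IATI standard. The `describe_orgs` function is a hypothetical helper, and the wrapping `<iati-activity>` element is added only so the snippet parses as a single document.

```python
import xml.etree.ElementTree as ET

# The two worked examples, wrapped in one parent element so they parse together.
SNIPPET = """
<iati-activity>
  <participating-org ref="GB-GOV-1" role="1" type="10" org-purpose="01.2" org-spatial="ADM1" org-jurisdiction="gb">
    <narrative>UK Department for International Development</narrative>
  </participating-org>
  <participating-org ref="" role="4" type="10" org-purpose="09" org-spatial="ADM1" org-jurisdiction="np">
    <narrative>Ministry of Education, Government of Nepal</narrative>
  </participating-org>
</iati-activity>
"""


def describe_orgs(xml_text):
    """Extract the identifier and descriptive metadata for each organisation."""
    root = ET.fromstring(xml_text)
    orgs = []
    for el in root.iter("participating-org"):
        orgs.append({
            # An empty ref means no identifier could be located.
            "ref": el.get("ref") or None,
            "name": (el.findtext("narrative") or "").strip(),
            "purpose": el.get("org-purpose"),
            "spatial": el.get("org-spatial"),
            "jurisdiction": el.get("org-jurisdiction"),
        })
    return orgs


for org in describe_orgs(SNIPPET):
    label = org["ref"] or "(no identifier)"
    print(label, org["jurisdiction"], org["purpose"], org["name"])
```

Even where `ref` is empty, as in the Nepal example, the record still carries enough description to be grouped by country or function.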
Note that this additional meta-data about organisations would be provided by data publishers. It could be drawn from an identifier list if such a list exists (to avoid it having to be entered each time), but ultimately the values of these additional meta-data fields would be based on the judgement of those entering the data (so there is no solid guarantee that two publishers would describe a single organisation in exactly the same way).
Discussion
Feedback is invited on:
- Whether the approach of splitting identification and description of organisations into two separate problems is an appropriate one;
- The implications of this approach for publishers;
- The implications of this approach for data users;
- The implications of this approach for other data standards (e.g. OCDS, 360Giving etc.).
Following feedback, we’ll combine the responses into a full paper and host an online discussion on ways forward, identifying whether to propose changes to existing standards.