Python3 Upgrade: Proposed Publishing Statistics Changes (comment by 15th April)

As outlined in a previous Discuss post, the Technical Team has been working towards upgrading the programming language used across a number of IATI tools from Python 2.x to Python 3.x.

One of the tools that is undergoing upgrade work is IATI’s Publishing Statistics site.

Some of the functionality changes between Python2 and Python3 have led to some scores in the Publishing Statistics pages altering slightly. These changes - and how we propose to handle them - are summarised below:

Change to ‘Bankers’ Rounding’

  • In Python2, x.5 numbers are rounded away from 0. E.g. 75.5 rounds to 76, 76.5 rounds to 77.
  • In Python3, x.5 numbers are rounded to the nearest even integer. Therefore 75.5 still rounds to 76, but 76.5 would also round to 76.
  • This means that in Python3 two publishers whose initial scores differ by a whole point (e.g. 75.5 and 76.5) could be assigned the same score: 76 in this case.
  • The Technical Team proposes that we amend the code so that the Python2 functionality is replicated (while still upgrading the language itself to Python3), so that those two publishers would continue to score 76 and 77, respectively.
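To illustrate the rounding proposal, here is a minimal sketch of one way to get "round half away from zero" in Python3. It is not taken from the Publishing Statistics code; the function name round_half_away is hypothetical, and using the standard library decimal module is just one possible approach. Note that Python2's round() returned a float, whereas this sketch returns an int.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_away(value):
    """Round to the nearest integer, with x.5 ties going away from zero.

    Hypothetical helper that mimics how Python 2's round() treated ties.
    Going via str() avoids picking up binary floating-point noise.
    """
    quantized = Decimal(str(value)).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(quantized)


# Python 3's built-in round() uses bankers' rounding (ties go to even):
print(round(75.5), round(76.5))                      # 76 76
# The helper keeps the two scores distinct, as Python 2 did:
print(round_half_away(75.5), round_half_away(76.5))  # 76 77
```

In the decimal module, ROUND_HALF_UP means ties go away from zero, so negative values behave the same way (-0.5 becomes -1).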

Change to integer division functionality

  • In Python2, an integer divided by another integer yields an integer, with the fractional part discarded. E.g. (2 * 100) / 3 = 66
  • In Python3, an integer divided by another integer yields a ‘float’. E.g. (2 * 100) / 3 = 66.66…
  • The Technical Team proposes that the core Python3 functionality is retained. Therefore, where an initial calculation of 66.66 is currently output as 66, it will instead be output as 67.
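The division difference can be reproduced in Python3 alone, because the // operator performs the floor division that Python2 applied to integers with /. The sketch below is illustrative only: the function names are hypothetical and are not taken from the Publishing Statistics code.

```python
def percentage_py2_style(count, total):
    """Integer percentage as Python 2 computed it: the fraction is discarded."""
    return count * 100 // total


def percentage_py3_style(count, total):
    """Percentage using Python 3 true division, rounded only at the end."""
    return round(count * 100 / total)


print(percentage_py2_style(2, 3))  # 66
print(percentage_py3_style(2, 3))  # 67

# Order of operations matters with floor division: dividing first loses everything.
print((2 // 3) * 100)              # 0
```

Because floor division always discards the fraction for these non-negative values, the Python3 result can only be equal to or higher than the Python2 one.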

The only material difference to the metrics (as a result of the integer division change) is that some values in the Publishing Statistics calculations will be one unit higher than they currently are. Therefore, no publisher will be penalised by the changes that we propose.

We welcome the community’s thoughts on this before proceeding with the amendments and deploying to the live Publishing Statistics site. Please provide your input by Wednesday 15th April. If there are no objections, the upgrade will be implemented after that date.

Many thanks,
IATI Tech Team

This seems fine to me. A purely technical upgrade like this shouldn’t affect the output, so I think it’s okay to reproduce python2 rounding (and consider migrating to bankers rounding at a later date).

This sounds like a bug in the existing code – 66.66 shouldn’t be rounded to 66. In python2, integers should be cast to floats first. However, I’m not able to see an instance of this bug in the publishing statistics code. Please could you point to an example? (I see it in the coverage code, but obviously that is not currently in use.) Thanks!

Hi Andy,

Examples are scattered throughout the code, but here’s one in particular: https://github.com/IATI/IATI-Publishing-Statistics/blob/master/IATI-Dashboard/summary_stats.py#L105

You can see the value is cast to int before being divided by a length, which is also an integer.


Thanks Alex! You’re absolutely right.

Okay, so… It looks like there are a number of rounding issues in this code. There’s a lot of coercing and casting going on. It might be clearer and more accurate to use floats throughout, and leave all the rounding to presentation in the template (rounding up where applicable, as per this).
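As a rough sketch of what I mean by leaving rounding to presentation (the function names and input data below are made up for illustration, and are not from the actual code):

```python
def combined_score_premature(parts):
    """Truncate at every step, as integer division did under Python 2."""
    percentages = [count * 100 // total for count, total in parts]
    return sum(percentages) // len(percentages)


def combined_score_late(parts):
    """Keep full float precision; the caller rounds once, for display."""
    percentages = [count * 100 / total for count, total in parts]
    return sum(percentages) / len(percentages)


# Hypothetical (count, total) pairs for two components of a score.
parts = [(2, 3), (1, 3)]

print(combined_score_premature(parts))    # 49
print(round(combined_score_late(parts)))  # 50
```

Each truncation on its own loses less than one unit, but the losses add up once intermediate values are combined.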

I’d suggest doing this as a separate piece of work to the python3 upgrade, since it’s not really related to that piece of work. It’s a happy coincidence that python3 happens to handle division differently… But it looks like there are other rounding issues here.

Is there any way to list / identify which values will be affected? Whilst it has been flagged that nobody will be negatively affected, it’s important to know what will change, as publishers have no control over / input to this change… (usually, a publisher can only influence their metrics on publishing statistics by updating their IATI publication)

The Tech Team has completed the coding of the upgrade from Python2 to Python3. In order to test the exact differences the code introduces, we have created two versions of the output based on a static pull of the Registry data. You can see the differences in the versions here: https://github.com/akmiller01/Publishing-Changes/pull/1/files

As we suspected, the difference in the way that Python2 and Python3 treat integer division has resulted only in a handful of the statistics increasing by 1, and in some instances by 2 units.

For the cases where the statistic increased by 2 units, we concluded that this was an artifact of the two versions being run at slightly different times (for example, once a date passes, an activity is no longer considered forward-looking).

The underlying JSON files behind the HTML are identical, so all of the above changes are only the result of the integer division and rounding differences. Please note the stats generated in this study were based on a static pull of the Registry data, and as such are not directly comparable with the live version of publishing statistics (only with each other).

Even while trying to hold the data constant, the nature of how the statistics are generated does depend heavily on the computer clock at the time of running, so it is possible that some of the small increases are due to the two versions being run at slightly different times.

@andylolz These differences are due to the intrinsic ways in which Python2 and Python3 vary with regard to integers and rounding. While we could try to separate this slight change in the statistics from the version upgrade, it would be an exercise in re-introducing deprecated behavior into the newer version.


To be clear: no objections from me re. the proposals. I’m certainly in favour of upgrading to python 3 while retaining existing functionality as much as possible (which I think is the spirit of the first proposal), and fixing bugs including rounding bugs (which I think is the spirit of the second proposal).

I suspect there are other rounding errors in this code (due to premature rounding) but those should be addressed separate to the python 3 upgrade.