Machine translation and secondary publishers

Hi,

I was thinking about machine translation of IATI data a while back, and the recent discussion on globaldevhub has revived my interest.

One issue I ran into last time: I had imagined we would look up the recipient country of each activity, cross-reference it with a list of national languages, and then republish all the text fields in those languages. However, my understanding is that republishing has to be done as a ‘secondary publisher’, and it is very difficult to see data from secondary publishers in any of the user-facing tools.
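For what it's worth, the pipeline I had in mind could be sketched roughly like this (Python; the country-to-language table, the `translate()` placeholder and the `add_translations` helper are all illustrative assumptions, not existing IATI tooling):

```python
import xml.etree.ElementTree as ET

# Fully-qualified name of the built-in xml:lang attribute
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative subset only -- a real run would need a complete table
NATIONAL_LANGUAGES = {
    "SN": ["fr"],  # Senegal
    "MZ": ["pt"],  # Mozambique
}

def translate(text, target_lang):
    """Placeholder for a real machine-translation call."""
    return f"[{target_lang}] {text}"

def add_translations(activity):
    """Append translated <narrative> elements for each national language
    of the activity's recipient countries (title/description only here)."""
    countries = [c.get("code") for c in activity.findall("recipient-country")]
    langs = {l for c in countries for l in NATIONAL_LANGUAGES.get(c, [])}
    for container in activity.findall("title") + activity.findall("description"):
        narratives = container.findall("narrative")
        # Use the untagged (default-language) narrative as the source text
        source = next((n for n in narratives
                       if n.get(XML_LANG) is None and n.text), None)
        if source is None:
            continue
        existing = {n.get(XML_LANG) for n in narratives}
        for lang in sorted(langs - existing):
            new = ET.SubElement(container, "narrative")
            new.set(XML_LANG, lang)
            new.text = translate(source.text, lang)
    return activity
```

The output would then be republished as a separate dataset, which is where the secondary-publisher question above comes in.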

For example, I was just trying to find some secondary publishers in d-portal but I can’t get them to appear in the publisher list. The FAQ says they are excluded from the calculations, which suggests they should still appear in the publisher list?

This isn’t really about d-portal; rather, I’m asking for feedback on whether there is likely to be an easy route for end users to see machine-translated data, so that I can decide whether setting up machine translation would be worth it (although it wouldn’t be that much effort).

Thoughts much appreciated,

Matt

Good idea. Not sure secondary publishing is the answer though. On the wish list for DataStore Phase 3?

Would doing it as a primary publisher be better, or do you think that publishing translations isn’t the way forward at all?

I think I am against the idea that the datastore should be adjusting, tweaking, imputing, assuming or doing anything else to data; otherwise tools that use the datastore will get different results from tools that use the registry, which I think would be pretty disruptive to the ecosystem. The job of the datastore is to make it easier to access data. Note that I am already concerned that the datastore will follow OIPA in doing things like making assumptions where percentages don’t add up (for me, this is the job of the ruleset that is now being discussed a lot).

If the translated versions were published, the datastore could allow you to select which language you wanted to see/download for fields with multiple language options available. I presume it would already do that; publishing translations would just mean there were more options to pick from. Is there a timeline for Datastore v3? I had guessed it was several years away.
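As a rough sketch of what that selection could look like against IATI’s existing multilingual `<narrative xml:lang="...">` fields (the `pick_narrative` helper name and the example XML are illustrative, not part of any existing tool):

```python
import xml.etree.ElementTree as ET

# Fully-qualified name of the built-in xml:lang attribute
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def pick_narrative(container, preferred):
    """Return the narrative text in `preferred`, falling back to the
    untagged (default-language) narrative if no match is found."""
    fallback = None
    for n in container.findall("narrative"):
        lang = n.get(XML_LANG)
        if lang == preferred:
            return n.text
        if lang is None:
            fallback = n.text
    return fallback

title = ET.fromstring(
    "<title>"
    "<narrative>School project</narrative>"
    '<narrative xml:lang="fr">Projet scolaire</narrative>'
    "</title>"
)
print(pick_narrative(title, "fr"))  # Projet scolaire
print(pick_narrative(title, "de"))  # falls back to: School project
```

The point being that published translations would slot into this mechanism with no changes on the consumer side; they are just extra `<narrative>` options.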

Why would you need to republish translated text? Would it not make more sense to offer machine translation as an option for the end user? Something like a button on, say, d-portal that would offer to translate the content for you (a bit like Google does)?

For all those end users who do not have machine translation built into their systems — plus the efficiency of ‘do once, use everywhere’. I am not complaining if d-portal, OIPA, AidStream, the datastore, open UNICEF, each AIMS etc. all want to implement it, but a) that doesn’t seem the most efficient approach, and b) none of them have done so far, whereas by publishing translations it becomes available in all systems without them doing anything.