Posted in English

Tallinn Calling

So I think I mentioned I’ve been doing map and flag quizzes in Portuguese to try to boost my knowledge of geography and to familiarise myself with the Portuguese names of countries. One of them is Worldle, which has one daily map for users to guess and then asks follow-up questions about language, flag, capital, etc. Today’s happened to be a country that looked familiar.

I have questions though.

First of all, have the Açores and Madeira drifted a lot since I last checked? What are they doing just off the coast there? Could you swim from Coimbra to Funchal?

But the language round was even weirder. The first language is easy enough, obviously, but the second?

As you can see from the screenshot, I tried Galego and Mirandês as the two other native languages. I might have been wrong to choose Galego, because I think it’s spoken on the Spanish side of the border, but Mirandês has a proper linguistic community in the north-east of Portugal and arguably has a claim to be the second language of Portugal. The other two languages Wikipedia lists are Barranquenho and Minderico, neither of which I’d even heard of.

As for non-native languages, I’d probably have guessed English, French or Spanish. There’s been an upsurge in refugees recently (e.g. from Ukraine, roughly 60,000) and in economic migrants (probably mainly from other lusophone countries like Brasil and Angola). But I’m pretty sure that if you add together the British immig… er, sorry, “expats” (50,000), Americans (10,000) and people from various other anglophone countries, plus the fact that the Portuguese education system seems to be doing an amazing job of teaching English as a second language, English must be pretty high on the list. Then there are quite a few Italian, French and Spanish migrants, and a few years ago there was a massive uptick in Venezuelans, descendants of Portuguese migrants, returning home to escape the benefits of that socialist utopia, so I ended up guessing Spanish as my third and final option.

The answer they give is Estonês. I was estonêsed… er… I mean astonished, but I didn’t want to write it off so I did a bit of research to see if there really was a huge Estonian diaspora in Portugal.

Nope. Estonians are 86th on the list of immigrants by country according to the chart on this page. So what’s going on?

My first guess was that whoever made the quiz had picked from a list of languages in which espanhol and estonês sit next to each other alphabetically, and had simply clicked on the wrong one. However, my brother does the same quiz in English, and he was surprised to see Estonian pop up as the second language of Portugal too. Spanish and Estonian definitely aren’t next to each other in an alphabetical list of English language names, so my theory looked shaky.

Digging further, languageknowledge.eu reckons 1.89 percent of the population of Portugal speak Estonian, the same percentage the quiz gives. Does 1.89% sound plausible? The population of Portugal is about ten million and that of Estonia less than one and a half million, so for this to be true there would have to be roughly 190,000 Estonian speakers in Portugal. That means about fifteen percent of Estonia’s population would have had to emigrate to Portugal, and they would outnumber the Brits three or four times over. Hmmm… 🤔
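For anyone who wants to check my sums, here’s the back-of-envelope arithmetic as a few lines of Python. It only uses the rough figures from this post, plus one assumption of mine: that Estonia has about 1.3 million people, which fits “less than one and a half million”.

```python
# Back-of-envelope check of the 1.89% figure, using the rough numbers from the post.
PORTUGAL_POP = 10_000_000
ESTONIA_POP = 1_300_000      # assumption: "less than one and a half million"
BRITS_IN_PORTUGAL = 50_000
ESTONIAN_SHARE = 0.0189      # 1.89% of Portugal's population, as the quiz claims

# How many Estonian speakers the 1.89% figure implies.
implied_speakers = PORTUGAL_POP * ESTONIAN_SHARE
# What fraction of Estonia that would be.
share_of_estonia = implied_speakers / ESTONIA_POP
# How many times bigger than the British community that would be.
ratio_to_brits = implied_speakers / BRITS_IN_PORTUGAL

print(f"Implied Estonian speakers in Portugal: {implied_speakers:,.0f}")
print(f"Share of Estonia's population: {share_of_estonia:.1%}")
print(f"Ratio to British residents: {ratio_to_brits:.1f}x")
```

With a slightly bigger Estonian population the share drops a bit, but it stays well above ten percent either way.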

Global Estonian, which bills itself as a forum for Estonians around the world, gives the figure as 77 Estonians living in Portugal. That seems awfully precise, but I’d bet the true number is a hell of a lot closer to 77 than to 190,000.

So how did they arrive at such a huge number? Maybe at some point the figure was 190, and a data entry clerk typed that into a database without noticing that the column said “population in thousands”. That single, seemingly insignificant error then got picked up by other sites and eventually incorporated into the model answers for the quiz.
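To show how neatly that guess fits, here’s a toy Python version of the slip. The 190 is purely my hypothetical figure, not something I’ve found in any source; the point is just that one number typed into the wrong-unit column, multiplied back out by everyone downstream, lands very close to the 1.89% figure.

```python
# Toy illustration of the hypothetical "population in thousands" slip.
true_count = 190             # hypothetical raw head count of Estonians in Portugal
column_unit = 1_000          # the database column actually means "thousands of people"

stored_value = true_count              # clerk types 190 into the "thousands" column
read_back = stored_value * column_unit # every downstream reader multiplies by 1,000

portugal_pop = 10_000_000    # "about ten million", as in the post
print(f"Read back: {read_back:,} Estonians")                  # Read back: 190,000 Estonians
print(f"Share of Portugal: {read_back / portugal_pop:.2%}")   # Share of Portugal: 1.90%
```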

I think the lesson here is that sloppy data seeps out and pollutes everything downstream of where it’s keyed in. The consequences in this case are trivial, but it’s a nice little case study in data pollution. Imagine a similar error creeping into a database used for planning or policy-making: you could end up with a serious miscalculation rather than just an annoyed quiz contestant.
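One cheap defence is a plausibility check at the point where data is keyed in. The sketch below is entirely hypothetical (the record type, the `check_record` function and the 5% threshold are my own inventions, not taken from any real database): it flags any diaspora figure that would mean an implausibly large slice of the origin country lives in a single destination, and suggests checking the units. Some genuine diasporas would trip a threshold like this too, so it should prompt a human review rather than reject the entry outright.

```python
from dataclasses import dataclass


@dataclass
class DiasporaRecord:
    origin: str
    destination: str
    people: int  # raw head count, NOT thousands


# Hypothetical threshold: flag claims that over 5% of a country lives in one destination.
MAX_SHARE_OF_ORIGIN = 0.05


def check_record(record: DiasporaRecord, origin_population: int) -> list[str]:
    """Return warnings for a record; an empty list means it looks plausible.

    origin_population must be a positive head count.
    """
    warnings = []
    if record.people < 0:
        warnings.append("negative head count")
    share = record.people / origin_population
    if share > MAX_SHARE_OF_ORIGIN:
        warnings.append(
            f"{record.people:,} people is {share:.1%} of {record.origin}'s population; "
            "check whether the source was in thousands"
        )
    return warnings


estonia_pop = 1_300_000  # assumption, roughly "less than one and a half million"
suspect = DiasporaRecord("Estonia", "Portugal", 190_000)
plausible = DiasporaRecord("Estonia", "Portugal", 77)

for record in (suspect, plausible):
    print(record.people, check_record(record, estonia_pop))
```

The 190,000 record comes back with a warning asking about units, while the 77 from Global Estonian passes without complaint.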

Author:

Just a data nerd