My investigation stems from Alex Bellos’s “Alex’s Adventures in Numberland”.
In Chapter 0, he describes the oddities in the native language of the Munduruku tribe.
In short, the language has words for ‘one’, ‘two’, and ‘many’, but no (definite) words for the numbers 4, 5, 6…
If that isn’t shocking enough, further experiments showed that the members of the tribe arrange numbers in accordance with a logarithmic scale. (Link to the paper)(Supplementary)
Participants were shown groups of dots and asked to move the cursor to the position they thought the group should be placed at. Participants from the tribe placed the dots in a logarithmic fashion.
Similar tests run with different demographics were used to back up the authors’ claim that humans innately think in a logarithmic fashion when it comes to numbers. (Further interesting observations are made in the paper, so it’s well worth giving it a look.)
Debate over this claim is inevitable; however, it seems natural to consider the effect of logarithmic thinking on the way we say numbers as well. The number line may be linear, but numbers are clearly not ‘linearly hard’ to say, e.g. ‘hundred’ isn’t 10 times more verbose than ‘ten’, ‘thousand’ isn’t 10 times as verbose as ‘hundred’, and so on.
I’ve tried to quantify this by assuming that it should show up in the number of syllables used to say the number. Under this assumption, I have plotted “Syllables in the number’s word vs. the number” for different languages. For the time being I’ve only looked at English; the rest will be added later.
Before going to the graphs, I should clarify that the theory of syllables is not clear-cut. There is no single set of rules that works for every language; quite the opposite, in reality. (Wiki article)
Lastly, I have chosen 1,000 and 10^6 as the cutoffs for the following reason:
10^6: French → un million, German → eine Million, Russian → миллион (pronounced roughly the same as ‘million’), Arabic → مليون (pronounced roughly the same)
Japanese and Chinese differ; this will be dealt with later. Nonetheless, they follow the same pattern up to 1,000.
For English:
Strict counting is used, so the ‘and’ is maintained.
So “75193” is “seventy five thousand, and one hundred and ninety three”. (This is to ensure “1002” is read as “one thousand and two”, not “one thousand two”.)
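To make the counting rule concrete, here is a minimal sketch of how it could be done. It is not necessarily the script behind the graphs: the function names, the per-word syllable tally, and the rule that ‘and’ follows both ‘thousand’ and ‘hundred’ whenever something comes after them are my assumptions, chosen to reproduce the “75193” example above.

```python
# Syllables per word: a hand tally (assumption), e.g. "seven" = 2, "eleven" = 3.
SYLLABLES = {
    "one": 1, "two": 1, "three": 1, "four": 1, "five": 1, "six": 1,
    "seven": 2, "eight": 1, "nine": 1, "ten": 1, "eleven": 3, "twelve": 1,
    "thirteen": 2, "fourteen": 2, "fifteen": 2, "sixteen": 2,
    "seventeen": 3, "eighteen": 2, "nineteen": 2,
    "twenty": 2, "thirty": 2, "forty": 2, "fifty": 2, "sixty": 2,
    "seventy": 3, "eighty": 2, "ninety": 2,
    "hundred": 2, "thousand": 2, "million": 2, "and": 1,
}

ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def below_thousand(n):
    """Words for 1 <= n <= 999, with 'and' after 'hundred' if more follows."""
    words = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        words += [ONES[hundreds], "hundred"]
        if rest:
            words.append("and")
    if rest >= 20:
        tens, units = divmod(rest, 10)
        words.append(TENS[tens])
        if units:
            words.append(ONES[units])
    elif rest:
        words.append(ONES[rest])
    return words


def number_words(n):
    """Words for 1 <= n <= 1,000,000 under the strict 'and' convention."""
    if not 1 <= n <= 1_000_000:
        raise ValueError("n must be between 1 and 1,000,000")
    if n == 1_000_000:
        return ["one", "million"]
    words = []
    thousands, rest = divmod(n, 1000)
    if thousands:
        words += below_thousand(thousands) + ["thousand"]
        if rest:
            words.append("and")  # 'and' after 'thousand' (assumed rule)
    if rest:
        words += below_thousand(rest)
    return words


def syllable_count(n):
    """Total syllables in the spoken form of n."""
    return sum(SYLLABLES[w] for w in number_words(n))
```

With this tally, ‘ten’ comes out at 1 syllable while ‘one hundred’ and ‘one thousand’ both come out at 3, which is the slow growth the plots are meant to capture.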
scipy.optimize is used for curve fitting.
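For a model of the form a·ln(n) + b (my assumption about the shape of the logarithmic fit; the exact model isn't spelled out here), the parameters enter linearly, so the least-squares answer can also be written in closed form. The sketch below uses only the standard library instead of scipy.optimize, and `fit_log` is a hypothetical helper name. For this model, scipy.optimize.curve_fit should land on the same a and b up to numerical tolerance.

```python
import math


def fit_log(xs, ys):
    """Least-squares a, b for y ~ a*ln(x) + b (ordinary regression on ln x)."""
    ts = [math.log(x) for x in xs]
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    a = sxy / sxx
    b = mean_y - a * mean_t
    return a, b


# Synthetic sanity check only: points generated from a known curve,
# not real syllable data.
xs = range(1, 1001)
ys = [2.0 * math.log(x) + 1.0 for x in xs]
a, b = fit_log(xs, ys)
```

On real data, `xs` would be the numbers and `ys` their syllable counts; the synthetic points are only there to check that the fit recovers parameters it should know.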
For larger values (up to 10^6):
Not the best of fits.
Fortunately, assuming that the graph is logarithmic is still reasonable.
Justification:
In the Desmos graph (link), I’ve plotted the first 1,000 and the first 10,000 points and tried a logarithmic fit on each; the best-fit curves for these two cases are not far off from the best fit for 1,000,000. In comparison, a power function of the form ax^b + c gives solutions that vastly undershoot the best-fit curve for 1,000,000.
In other words, the power-function solution fits the data points it was given, while the logarithmic solution captures the nature of the plot. [1]
I’m not interested in more complex fittings. Fit your own elephant.
The post will be updated with the other languages by next week.
[1] I’ve attempted to illustrate this in the Desmos graph.