The surprisingly useful mathematical patterns in some real-world data

Electronic stock data board.

“Look up stock market prices and you might see the pattern…”

Muhla1/Getty Images

If you were to look at the front page of a newspaper, you would probably find that it contains lots of numbers: amounts of money, population sizes, measurements of length or area. If you pulled all those numbers out and put them in a list, you would have what looks like a collection of random numbers.

But those numbers wouldn’t be as random as you might think. In real-world data, like cash totals or the heights of buildings, the first digit in any given number is surprisingly likely to be 1. If the digits were truly random, around one in nine numbers would start with 1, but in practice the figure is often closer to 30 per cent. The digit 9 is least likely to lead the way, occurring roughly 1/20th of the time, and the frequencies of the other digits fall along a curve between these two.
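The curve has a well-known closed form that the article doesn't state: the probability that the first digit is d is log10(1 + 1/d). The short Python sketch below tabulates it for each digit; the function name `benford_probability` is our own, chosen for illustration.

```python
import math


def benford_probability(d: int) -> float:
    """Benford's law: probability that a number's first digit is d (1 to 9)."""
    if not 1 <= d <= 9:
        raise ValueError("first digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


# Print the expected share of numbers starting with each digit.
for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.3f}")
```

It gives about 0.301 for the digit 1, falling steadily to about 0.046 for 9 (closer to 1/22 than 1/20), and the nine values sum to 1 because the logarithms telescope to log10(10).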

This pattern, known as Benford’s law, is a distribution of first digits commonly observed in certain types of datasets, particularly ones whose values are drawn from a large range whose size isn’t fixed in advance. You don’t see it happening with things like human heights (where the numbers all lie within a small range) or dates (where there are restrictions on the values the number can take).

But if you asked a group of people to check the amount of money in their bank account, or give their house number, or look up stock market prices (pictured), you might see the pattern – these are all numbers that could span several orders of magnitude. Some streets have only a few houses, while others have hundreds. It is this spread across scales that produces the pattern.

Imagine a street with nine properties: the house numbers would be split equally among the nine possible first digits. But in a street with 19 houses, more than half start with 1 (house 1 plus houses 10 to 19, or 11 of the 19). These two extremes keep recurring as the number of houses grows: with 100 houses, the initial digits are spread roughly equally; boost this to 200 and, again, more than half of them start with 1.
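The street example is easy to check by brute force. This minimal Python sketch, using a hypothetical helper named `share_starting_with_1`, counts how many of the house numbers 1 to n begin with the digit 1.

```python
def share_starting_with_1(n_houses: int) -> float:
    """Fraction of the house numbers 1..n_houses whose first digit is 1."""
    count = sum(1 for k in range(1, n_houses + 1) if str(k)[0] == "1")
    return count / n_houses


# Compare streets of different lengths.
for n in (9, 19, 99, 100, 199, 200):
    print(n, round(share_starting_with_1(n), 3))
```

Running it shows the swing between the two extremes: streets of 9 or 99 houses give exactly one in nine, 100 houses give 12 per cent, while 19 and 199 houses give roughly 58 and 56 per cent, and 200 houses still give 55.5 per cent.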

Since each item of real-world data comes from a set of unknown size, the average probability of a number starting with 1 ends up somewhere between these two extremes, at around 30 per cent. Similar calculations can be done for the other digits, and this gives us the overall frequency with which each appears. The effect is most visible in large collections of data.

One reason this is useful is that it gives you a clue when data has been faked. If you looked at a set of business accounts, you would expect to find Benford-like distributions in the sales figures. But if someone has fabricated data by picking random numbers, a plot of the first-digit frequencies won’t show the characteristic curve. This is one trick forensic accountants use to detect suspicious activity.
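One way to turn “it won’t have the characteristic curve” into a single number is a chi-square goodness-of-fit test against the Benford proportions. This is a sketch under stated assumptions, not a description of any particular auditing procedure: the values are assumed to be positive integers (for example, amounts in whole pence), powers of 2 stand in for genuine Benford-like data because they are known to follow the law closely, and uniformly random four-digit numbers stand in for fabricated figures. The helper names `first_digit` and `benford_chi_square` are hypothetical.

```python
import math
import random
from collections import Counter


def first_digit(n: int) -> int:
    """Leading digit of a positive integer."""
    if n <= 0:
        raise ValueError("values must be positive integers")
    return int(str(n)[0])


def benford_chi_square(values) -> float:
    """Chi-square statistic comparing observed first digits with Benford's law."""
    digits = Counter(first_digit(v) for v in values)
    total = sum(digits.values())
    stat = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        stat += (digits[d] - expected) ** 2 / expected
    return stat


# Stand-in for genuine data: the first 1,000 powers of 2 follow Benford closely.
genuine_like = [2 ** k for k in range(1, 1001)]

# Stand-in for fabricated data: first digits of these are uniform over 1 to 9.
rng = random.Random(42)
fabricated = [rng.randint(1000, 9999) for _ in range(1000)]

print(round(benford_chi_square(genuine_like), 2))
print(round(benford_chi_square(fabricated), 2))
```

With nine digit categories there are 8 degrees of freedom, so a statistic above about 15.51 would be surprising at the 5 per cent level if the data really followed Benford’s law. The powers of 2 land well under that threshold, while the uniformly random figures land far above it. A high score is a reason to look more closely, not proof that the figures were invented.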

So next time you are checking your accounts or comparing the lengths of rivers, keep an eye on how many numbers start with 1 – you might just have spotted Benford’s law in action!

Katie Steckles is a mathematician, lecturer, YouTuber and author based in Manchester, UK. She is also adviser for New Scientist’s puzzle column, BrainTwister. Follow her @stecks

For other projects visit newscientist.com/maker
