What about the data?

Posted: Wed Jan 29, 2025 10:25 am
by mdsah5125
This simple recipe – learning algorithms plus computing power plus data – produces stunning predictive results.

The nonprofit research institute Epoch AI estimates that between 2012 and 2023, the computing power needed to reach a given performance threshold halved roughly every eight months. That cost efficiency is the product of recent algorithmic innovations in neural networks.
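As a rough sanity check, a halving time of eight months compounds dramatically over an eleven-year window. The sketch below is my own back-of-the-envelope arithmetic from Epoch AI's stated halving rate, not a figure they publish:

```python
# Back-of-the-envelope: compute needed for a fixed performance
# threshold halves every 8 months (Epoch AI's 2012-2023 estimate).
halving_months = 8
window_months = (2023 - 2012) * 12   # 132 months

halvings = window_months / halving_months   # 16.5 halvings
efficiency_gain = 2 ** halvings             # roughly 90,000-fold

print(f"Halvings over the window: {halvings:.1f}")
print(f"Compute reduction factor: ~{efficiency_gain:,.0f}x")
```

In other words, hitting the same benchmark that once took a data center's worth of compute can now, by this estimate, take tens of thousands of times less.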

But the long-term value of these algorithms is much harder to determine. Digital code is vulnerable to imitation and theft. The pace of future innovation is hard to predict. The human talent currently sitting in the tech giants’ AI labs could easily leave.

The second ingredient – raw computing power – is a simpler proposition, but its economics hinge on the sheer volume of chips and electricity consumed. The projected capital expenditures have grown so huge that many investors are rightly becoming wary of the area. AMD forecasts that the AI chip market will reach $400 billion by 2027, while US researchers extrapolating current trends predict that annual AI investment will hit $3 trillion, with the first cluster of $1 trillion data centers opening two years after that. It seems that computer hardware, not software, is now eating the world.

There’s a flaw in this reasoning, however. The first two ingredients of AI—algorithms and computation—are worthless without the third: data. In fact, the better the data, the less valuable the computing power becomes.

It was easy to miss this fact. The best-known applications of AI are general-purpose chatbots trained on large volumes of unverified text scraped from the Internet – systems designed for quantity, not quality. Morgan Stanley, the American financial conglomerate, estimates that OpenAI's ChatGPT-4 required at least 10,000 graphics chips to process more than 9.5 petabytes of text.

Special-purpose AI applications are less prominent, but they show where the field is likely headed.

ICONICA programmer Andrey Petritsa:


Google DeepMind's AlphaFold model solved a monumental biological problem – predicting a protein's 3D structure from its amino-acid sequence – that scientists had struggled with for about fifty years. And remarkably, it took fewer than 200 graphics chips to do it.

This was possible because AlphaFold was trained on a carefully curated database of roughly 170,000 protein structures. High-quality data thus radically improves not only the efficiency of AI models but also the economics of the technology.
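Putting the two chip counts quoted in this thread side by side makes the point concrete. The 50x ratio below is simple division of the figures above, nothing more:

```python
# Chip counts cited earlier in the thread.
chatbot_chips = 10_000    # Morgan Stanley estimate for the GPT-4-scale run
alphafold_chips = 200     # upper bound cited for AlphaFold

ratio = chatbot_chips / alphafold_chips
print(f"Curated data cut the chip budget by at least {ratio:.0f}x")
```

And that understates the gap, since the quantity of training data shrank far more than the chip count did.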