Data is eating the world. It has been the most important tech trend since the 1990s. But to paraphrase Robert Solow, “You can see data everywhere but in the Internet statistics.” The most astute and influential observers of the tech landscape have been counting and reporting on the number of devices, the number of users, the volume of eCommerce, the number of online ads, the number of apps, the number of images and videos, and so on. What’s largely been missing is data on data, on what drives the growth of all of these digital entities and what drives new businesses and business models and innovation and change.
Identifying “global trends that drive innovation and change” has been the calling card of one of the most influential observers of the tech landscape and a successful investor in and shaper of this landscape—Mary Meeker. Since 1995, Meeker has delivered a 30-minute presentation of more than 300 slides summarizing all the key stats that’s fit to present in a given year. This (almost) annual event never failed to become for at least a few days the talk of the global tech town and has served as a reliable and reliably updated online source of data on the state of the tech economy. The only data missing has been data on data.
According to various reports, while delivering her 2019 presentation on June 11, Meeker told her audience: “If it feels like we’re all drinking from a data firehose, it’s because we are.” Still, it’s only in the 5th section of the report, starting on slide #121, that she gets to “data growth.” Most of this section is devoted to an interesting discussion of how before 1995 businesses used human data and insights to improve customer experiences and after 1995 shifted to using digital data and insights to do the same. Meeker provides many examples of the startups (and their revenue growth) providing the tools that allow established businesses to improve customer experience and satisfaction. Only on slides #151-157 does she provide data (from an IDC study) and observations on data growth, volume, and stewardship, before moving on (there are 333 slides in the 2019 edition) to discussing Internet usage numbers, the open Internet, cybersecurity, and other topics and trends.
The IDC study is also quoted in last year’s report, on slide #189, showing data’s “torrid growth” since 2006. It’s possible that this was the first time Meeker used data on data from this study, although the study has been published annually since 2007 (I checked some but not all of the annual editions of Meeker’s Internet Trends report).
If it sounds like I’m criticizing Meeker, let me clarify: I sincerely believe that for almost a quarter of a century she has performed an important public service by sharing with us the data she has collected with the support of her well-heeled employers. Now that she has recently established her own $1+ billion growth fund, Bond, she is providing an archive of all her presentations since 1995—a treasure trove of historical data on the tech economy that will serve entrepreneurs, executives, and researchers for years to come.
Moreover, there is no doubt in my mind that each year Meeker has correctly teased out the key trends shaping the tech landscape and that the data she chose to present has been significant for this increasingly important sector of the global economy. I haven’t studied carefully all of Meeker’s presentations, and it’s possible that there have been passing references (as in 2018 and 2019) to stats on data. But I believe that she, like most other astute tech observers, has not realized that the growth of data has been the most important driver of the tech economy.
And there has been data on data. In 2011, Martin Hilbert and Priscila Lopez published their study which estimated that in 1986, 99.2% of all storage capacity was analog, but in 2007, 94% of storage capacity was digital, a complete reversal of roles (in 2002, digital data storage surpassed non-digital for the first time).
It is this remarkably rapid shift from analog to digital that is encapsulated in “(digital) data is eating the world.” I thought I coined the phrase two years ago, when I wrote:
In eating the world, data has not only transformed the management of IT and the IT industry, it has also blurred previously rigid industry boundaries and destroyed the sharp distinction between what is considered “consumer” and what is considered “enterprise.” When everything looks like ones and zeros and you focus on collecting and mining as many ones and zeros as possible, old categories just fade away.
Alas, Google tells me the first (?) mention of “data is eating the world” was by Denny Britz in 2013. This was, of course, a brilliant take on Marc Andreessen’s 2011 observation that “software is eating the world.” More on this later, but first, a quick overview of other key attempts to come up with stats on data volume and growth (for a full account, see my “A Very Short History of Big Data”).
In October 2000, Peter Lyman and Hal Varian at UC Berkeley published “How Much Information?”—the first comprehensive study to quantify, in computer storage terms, the total amount of new and original information (not counting copies) created in the world annually (in 1999, the world produced 1.5 exabytes of original data). In March 2007, John Gantz, David Reinsel and other researchers at IDC published the first study to estimate and forecast the amount of digital data created and replicated each year (161 exabytes in 2006, estimated to increase more than six-fold to 988 exabytes in 2010, or doubling every 18 months).
Full disclosure: I commissioned both studies and they were sponsored by my employer at the time, EMC (in recent years, the IDC study has been sponsored by Seagate). EMC was focused on just one segment of the industry, computer storage, and was a leading example of the re-structuring of the industry in the 1990s from a bunch of vertically integrated companies to a bunch of companies focused on just one “horizontal” IT layer (e.g., semiconductors, storage, operating systems). Cisco, focused on (and dominating) computer networking, was also interested in terabytes and exabytes and in June 2008 started releasing an annual “Visual Networking Index,” tracking and forecasting IP traffic and predicting that it would nearly double every two years through 2012, reaching half a zettabyte.
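As a quick sanity check on the IDC figures quoted above (161 exabytes in 2006, a forecast 988 exabytes in 2010), here is a minimal sketch of the doubling-time arithmetic. It assumes steady exponential growth between the two years, which is a simplification of mine and not something the study itself states, and the function name is purely illustrative.

```python
import math


def doubling_time_months(start, end, years):
    """Months needed to double, assuming constant exponential growth
    from `start` to `end` over `years` years (an illustrative assumption)."""
    growth_factor = end / start
    return 12 * years * math.log(2) / math.log(growth_factor)


# IDC figures quoted in the text: exabytes created and replicated per year.
idc_2006_eb = 161
idc_2010_eb = 988

months = doubling_time_months(idc_2006_eb, idc_2010_eb, years=2010 - 2006)
print(f"Growth factor: {idc_2010_eb / idc_2006_eb:.2f}x")
print(f"Implied doubling time: {months:.1f} months")
```

Under that assumption, a bit more than six-fold growth over four years does work out to a doubling time of roughly a year and a half, consistent with the “doubling every 18 months” phrasing.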
The interest in lots of bytes and the use of forecasting language reminiscent of Moore’s Law were no accident. Both EMC and Cisco represented not only the restructuring of the industry but also a move away from the dominant computer industry paradigm so well encapsulated by Moore’s Law. Faster and faster processors have been perceived by entrepreneurs, established IT companies, IT buyers, and investors/observers such as Mary Meeker as the single most important driver of growth and innovation ever since the term “data processing” was coined in 1954.
Data processing. It’s not that all industry participants ignored data. But data was perceived as the effect and not the cause, the result of having faster and faster devices processing more data and larger and larger containers (also enabled by Moore’s Law) to store it.
By 1995, when Mary Meeker first delivered her Internet Trends presentation, that paradigm was disrupted by data—it became the most important driver of growth and innovation. With his “software is eating the world” Andreessen tried to capture the move away from the processor-centric paradigm, from hardware at the center of everything. You don’t need increasingly powerful processors (hardware is now a commodity, Moore’s Law is slowing down, etc.) when you have powerful software.
“We are in the middle of a dramatic and broad technological and economic shift in which software companies are poised to take over large swathes of the economy,” wrote Andreessen. “The single most dramatic example of this phenomenon of software eating a traditional business,” was Amazon, according to Andreessen. Similarly, “today’s largest direct marketing platform is a software company — Google.”
I would argue that a more accurate description of Amazon and Google is that they are both data-driven companies. Of course, both companies have built their impressive market caps on the skills and creativity of their software engineers. Just like hardware, software is a very important foundation for the success of top tech companies and for a time, can serve as a competitive differentiator. But Amazon is a prime example of how the “amazing software engine for selling virtually everything online” (per Andreessen in 2011) can be replicated by competitors’ software engineers—see Walmart and Target, for example.
Software innovation has been very important in Amazon’s success, but more important have been its data mining skills and creativity. Having been born digital, living the online life, meant not only excelling in software development, but also innovating in the collection and analysis of the mountains of data produced by online transactions. Data has taken over from hardware and software as the center of everything, the lifeblood of tech companies. And increasingly, the lifeblood of any type of business.
That is why, for Jeff Bezos, there are no “industries,” only ones and zeros to collect, to mine, and to move around. It explains why Amazon today is so much more than “the world’s largest bookseller” (as Andreessen described it in 2011), so much more than just an eCommerce company. Google has also used its data smarts to diversify beyond its origins as a “direct marketing company,” and the title of the most innovative in this category belongs today to another data-obsessed company: Facebook.
One of the early investors in Facebook (and other data-driven companies), Andreessen wrote “software is eating the world” almost 20 years after he made, with the Mosaic/Netscape browser, the second most important contribution to the big data big bang, the first being the invention of the World Wide Web by Tim Berners-Lee.
The Web is a digital platform that makes the consuming, creating, and moving of data far easier than it has ever been, making any additional member of the Internet community (50% of the world’s population today, up from 20% ten years ago) a contributor to the exponential growth of data. Moreover, each additional link between people, devices, and other online entities (“things”) accelerates the rate of data growth.
A parallel development in the early 1990s, which Meeker described in the Data Growth section of her 2019 presentation (and a key demand-side contributor to the re-structuring of the IT industry), was a radical change in business executives’ attitude towards data. They stopped throwing it away and started storing it for longer periods of time, sharing it among internal departments and even with suppliers and customers, and, most important, analyzing it to improve various business activities, customer relations, and decision-making.
These two trends are merging today in the cloud, turning all businesses into data-driven businesses (a.k.a. “digital transformation”). The most important recent tech development, what is generally known as “AI” and what is more accurately labeled “deep learning,” is sophisticated statistical analysis of lots and lots of data, the merging of software development and data mining skills (a.k.a. “data science”).