
How to deal with too much data

Did I mention that I’m in love with The Economist’s 16-page supplement on the data deluge? It crystallises so many thoughts from my career in technology and about where we’ve got to today.

There are diamond and gold nuggets of information throughout the report.

Here are a few highlights:

  • The amount of digital information increases tenfold every five years.
  • By 2013 the amount of traffic flowing over the internet annually will reach 667 exabytes, according to Cisco.
  • According to a 2008 study by International Data Corp (IDC), around 1,200 exabytes (1.2 billion terabytes) of digital data will be generated this year.
  • Researchers at the University of California in San Diego (UCSD) examined the flow of data to American households. They found that in 2008 such households were bombarded with 3.6 zettabytes of information (or 34 gigabytes per person per day). The biggest data hogs were video games and television. In terms of bytes, written words are insignificant, amounting to less than 0.1% of the total. However, the amount of reading people do, previously in decline because of television, has almost tripled since 1980, thanks to all that text on the internet.
  • Providing access to data “creates a culture of accountability”, says Vivek Kundra, the federal government’s CIO.
  • Where traditional businesses generally collect information about customers from their purchases or from surveys, internet companies have the luxury of being able to gather data from everything that happens on their sites. The biggest websites have long recognised that information itself is their biggest treasure. And it can immediately be put to use in a way that traditional firms cannot match.
  • Experiments at the Large Hadron Collider at CERN, Europe’s particle-physics laboratory near Geneva, generate 40 terabytes every second.
  • Wal-Mart handles more than 1m customer transactions every hour, feeding databases estimated at more than 2.5 petabytes—the equivalent of 167 times the books in America’s Library of Congress.
  • Cablecom reduced customer defections from one-fifth of subscribers a year to under 5% by crunching its numbers. Its software spotted that although customer defections peaked in the 13th month, the decision to leave was made much earlier, around the ninth month (as indicated by things like the number of calls to customer support services). So Cablecom offered certain customers special deals seven months into their subscription and reaped the rewards.
  • Best Buy found that 7% of its customers accounted for 43% of its sales, so it reorganised its stores to concentrate on those customers’ needs.
  • The Royal Shakespeare Company (RSC) sifted through seven years of sales data for a marketing campaign that increased regular visitors by 70%. By examining more than 2m transaction records, the RSC discovered a lot more about its best customers: not just income, but things like occupation and family status, which allowed it to target its marketing more precisely.
  • [Using Hadoop, a form of cloud computing] Visa crunched two years of test records, or 73 billion transactions, amounting to 36 terabytes of data. The processing time fell from one month with traditional methods to a mere 13 minutes.
  • One company that capitalises on real-time information flows is Li & Fung, one of the world’s biggest supply-chain operators. Founded in Guangzhou in southern China a century ago, it does not own any factories or equipment but orchestrates a network of 12,000 suppliers in 40 countries, sourcing goods for brands ranging from Kate Spade to Walt Disney. Its turnover in 2008 was $14 billion.
  • One of the most important technologies has turned out to be videoconferencing. It allows buyers and manufacturers to examine the colour of a material or the stitching on a garment. “Before, we weren’t able to send a 500MB image—we’d post a DVD. Now we can stream it to show vendors in our offices. With real-time images we can make changes quicker,” says Manuel Fernandez, Li & Fung’s chief technology officer. Data flowing through its network soared from 100 gigabytes a day only 18 months ago to 1 terabyte.
  • “What we are seeing is the ability to have economies form around the data—and that to me is the big change at a societal and even macroeconomic level,” says Craig Mundie, head of research and strategy at Microsoft. Data are becoming the new raw material of business: an economic input almost on a par with capital and labour.
  • “They are uncomfortable bringing so much attention to this because it is at the heart of their competitive advantage,” says Tim O’Reilly, a technology insider and publisher. “Data are the coin of the realm. They have a big lead over other companies that do not ‘get’ this.”
  • Google handles around half the world’s internet searches, answering around 35,000 queries every second.
  • Google’s innovation was to count the number of inbound links from other web pages. Such links act as “votes” on what internet users at large believe to be good content. More links suggest a webpage is more useful, just as more citations of a book suggest it is better. But although Google’s system was an improvement, it too was open to abuse from “link spam”, created only to dupe the system. The firm’s engineers realised that the solution was staring them in the face: the search results on which users actually clicked and stayed. A Google search might yield 2m pages of results in a quarter of a second, but users often want just one page, and by choosing it they “tell” Google what they are looking for. So the algorithm was rejigged to feed that information back into the service automatically. From then on Google realised it was in the data-mining business. To put the model in simple economic terms, its search results give away, say, $1 in value, and in return (thanks to the user’s clicks) it gets 1 cent back. When the next user visits, he gets $1.01 of value, and so on … Together, all this is in line with the company’s audacious mission to “organise the world’s information”. Yet the words are carefully chosen: Google does not need to own the data. Usually all it wants is to have access to them (and see that its rivals do not). In an initiative called “Data Liberation Front” that quietly began last September, Google is planning to rejig all its services so that users can discontinue them very easily and take their data with them. In an industry built on locking in the customer, the company says it wants to reduce the “barriers to exit”.
  • In a study by IBM, half the managers quizzed did not trust the information on which they had to base decisions. Many say that the technology meant to make sense of it often just produces more data. Instead of finding a needle in the haystack, they are making more hay.
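Two of the numbers in the list above can be sanity-checked with a quick back-of-envelope calculation. A minimal sketch, assuming roughly 300 million Americans in 2008, a 30-day month for the Visa comparison, and decimal units (1 GB = 10^9 bytes); these assumptions are mine, not the report’s:

```python
# Back-of-envelope checks on two figures quoted above.
ZETTABYTE = 10**21
GIGABYTE = 10**9

# UCSD: 3.6 zettabytes a year to US households, quoted as 34 GB per person per day.
per_person_per_day = 3.6 * ZETTABYTE / (300e6 * 365) / GIGABYTE
print(f"{per_person_per_day:.0f} GB per person per day")  # about 33 GB, close to the quoted 34

# Visa: one month of processing cut to 13 minutes.
speedup = (30 * 24 * 60) / 13
print(f"~{speedup:.0f}x speedup")  # roughly a 3,300-fold improvement
```

The slight gap on the first figure (33 vs 34 GB) comes down to the population estimate and rounding, so the quoted number holds up.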
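The report doesn’t describe how Cablecom’s software actually works, but the idea it relays, watching early signals such as calls to customer support around the ninth month rather than waiting for the cancellation in month 13, can be sketched in a few lines. The window, threshold and subscriber data below are illustrative assumptions of mine, not Cablecom’s:

```python
def at_risk(subscriber_months, calls_per_month, window=(8, 10), threshold=3):
    """Flag a subscriber if support calls inside the decision window exceed a threshold."""
    calls_in_window = sum(
        calls
        for month, calls in zip(subscriber_months, calls_per_month)
        if window[0] <= month <= window[1]
    )
    return calls_in_window >= threshold

# Two illustrative subscribers: months 1..12 with monthly support-call counts.
months = list(range(1, 13))
quiet = [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
noisy = [0, 0, 1, 0, 0, 0, 0, 2, 3, 1, 0, 0]

print(at_risk(months, quiet))  # False: no spike in the decision window
print(at_risk(months, noisy))  # True: a candidate for a retention offer
```

The point of the Cablecom story is exactly this gap: the behavioural signal fires months before the churn shows up in the subscription data, leaving time to make an offer.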
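The Google bullet above describes inbound links acting as “votes”. A toy PageRank-style power iteration makes that concrete; the four-page graph and damping factor are my illustrative choices, and Google’s production algorithm is of course far more elaborate:

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the pages it links out to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            share = rank[page] / len(outlinks)  # a page splits its vote among its outlinks
            for target in outlinks:
                new[target] += damping * share
        rank = new
    return rank

# Pages B, C and D all link to A; A links back only to B.
graph = {
    "A": ["B"],
    "B": ["A"],
    "C": ["A"],
    "D": ["A"],
}
ranks = pagerank(graph)
best = max(ranks, key=ranks.get)
print(best)  # A, the page with the most inbound "votes"
```

Note that B outranks C and D even with only one inbound link, because its single vote comes from the highly ranked A: votes are weighted by the voter’s own rank, which is the refinement over simply counting links.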


About Chris M Skinner

Chris Skinner is best known as an independent commentator on the financial markets through his blog, the Finanser.com, as author of the bestselling book Digital Bank, and Chair of the European networking forum the Financial Services Club. He has been voted one of the most influential people in banking by The Financial Brand (as well as one of the best blogs), a FinTech Titan (Next Bank), one of the Fintech Leaders you need to follow (City AM, Deluxe and Jax Finance), as well as one of the Top 40 most influential people in financial technology by the Wall Street Journal’s Financial News.
