Big Data is all about knowing what questions to ask

Like everyone, I was shocked by the news of
the Westgate Shopping Mall shootings in Nairobi, Kenya.

The real shock is how they determined who to shoot, singling
out individuals and asking whether they could name the Prophet Muhammad’s
mother:

Reports from separate
floors of the building in the first hours of the assault told how the
attackers, speaking rough Swahili and English, shouted at Muslims to identify
themselves. Many people came forward. They were ordered to speak in Arabic, or
to recite a verse from the Koran, or to name the Prophet Mohammed’s mother.
Those who passed this test were allowed to flee. Those that did not were
executed, including children.

It almost seems perfunctory to relate this to banking, but
it did sit firmly in my mind as I chaired a meeting around Big Data last night.

I hate the term Big Data, as mentioned before, and feel it needs a context and so here is the context.

I don’t know the name of the Prophet’s mother but, within
two seconds, I can Google the answer:  Aminah
bint Wahb.

I don’t know a verse of the Koran but can find one in
seconds online: Assalamu alaikum wa rahmatullahi
wa barakatuh
(May the peace, mercy, and blessings of Allah be with you).

And there is the context of Big Data: if you don’t know the
question, how can you find the answer?

The discussion about Big Data last night was in the context
of Fraud and Anti-Money Laundering (AML), and it was a wide-ranging conversation.

Big Data for fraud and AML is all about cost avoidance whilst,
on the other hand, much of the Big Data conversation is about marketing and
sales for revenue uptick.

Both are valid uses of Big Data analytics, but this market
is nothing new.

Teradata was doing all this stuff in the 1990s with propensity
modelling and data mining, with Wal*Mart their biggest customer in the world back
then, with a 27 terabyte database.

The change today is that the world produces 27 terabytes every
few seconds thanks to social media.

This is well illustrated by Maria Conner’s recent blog entry:

In 2012, every day 2.5 quintillion bytes of data (1
followed by 18 zeros) are created, with 90% of the world’s data created in
the last two years alone. As a society, we’re producing and capturing more data
each day than was seen by everyone since the beginning of the earth.

This vast amount of digital data would fill a stack of DVDs
reaching from the Earth to the moon and back.
To put things in perspective, the entire works of
William Shakespeare (in text form) represent about 5 MB of data. So,
you could store about 1,000 copies of Shakespeare on a single DVD. The text in
all the books in the Library of Congress would fit comfortably on a stack of
DVDs the height of a single-story house.

The world’s technological per-capita capacity to store
information has roughly doubled every 40 months since the 1980s, according to Martin
Hilbert and Priscila López.

Unstructured data accounts for 80% of the data in
the world, and we know much of that is from social media, so that gets special
attention.

How much data is generated through social media tools?

  • People send more than 144.8 billion email messages a day.
  • People and brands on Twitter send more than 340 million tweets a day.
  • People on Facebook share more than 684,000 bits of content a day.
  • People upload 72 hours (259,200 seconds) of new video to YouTube a minute.
  • Consumers spend $272,000 on Web shopping a day.
  • Google receives over 2 million search queries a minute.
  • Apple receives around 47,000 app downloads a minute.
  • Brands receive more than 34,000 Facebook ‘likes’ a minute.
  • Tumblr blog owners publish 27,000 new posts a minute.
  • Instagram photographers share 3,600 new photos a minute.
  • Flickr photographers upload 3,125 new photos a minute.
  • People perform over 2,000 Foursquare check-ins a minute.
  • Individuals and organizations launch 571 new websites a minute.
  • WordPress bloggers publish close to 350 new blog posts a minute.
  • The Mobile Web receives 217 new participants a minute.

The most up-to-date
numbers are available from the sites themselves.
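
The headline figures quoted above can be sanity-checked with a little arithmetic. The sketch below is only a back-of-envelope check, not anything from the quoted blog entry: it assumes decimal terabytes and a single-layer 4.7 GB DVD, and the function names are made up for illustration. On those assumptions, 2.5 quintillion bytes a day works out at roughly 29 terabytes a second, so a Wal*Mart-sized 27 terabyte database is produced in about a second, and a DVD holds about 940 copies of a 5 MB Shakespeare, which squares with "about 1,000".

```python
# Back-of-envelope check of the figures quoted in the post.
BYTES_PER_DAY = 2.5e18          # 2.5 quintillion bytes a day (quoted figure)
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
TERABYTE = 1e12                 # decimal terabyte (assumption)
WALMART_DB_TB = 27              # 1990s Wal*Mart database size (quoted figure)
DVD_BYTES = 4.7e9               # single-layer DVD capacity (assumption)
SHAKESPEARE_BYTES = 5e6         # about 5 MB of text (quoted figure)


def terabytes_per_second(bytes_per_day=BYTES_PER_DAY):
    """Average data production rate in terabytes per second."""
    return bytes_per_day / SECONDS_PER_DAY / TERABYTE


def seconds_to_produce(terabytes, bytes_per_day=BYTES_PER_DAY):
    """Seconds the world needs to produce the given number of terabytes."""
    return terabytes / terabytes_per_second(bytes_per_day)


def copies_per_dvd(work_bytes=SHAKESPEARE_BYTES, dvd_bytes=DVD_BYTES):
    """Whole copies of a work of the given size that fit on one DVD."""
    return int(dvd_bytes // work_bytes)


if __name__ == "__main__":
    print(f"World output: {terabytes_per_second():.1f} TB per second")
    print(f"Time to produce {WALMART_DB_TB} TB: "
          f"{seconds_to_produce(WALMART_DB_TB):.2f} seconds")
    print(f"Copies of Shakespeare per DVD: {copies_per_dvd()}")
    print(f"72 hours of video in seconds: {72 * 60 * 60:,}")
```

Change the constants to test other assumptions, such as a dual-layer DVD or a different daily total.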

So what?

Well, the “so what” is that twenty years ago we could not produce, search, analyse and track so
much data because it was too costly.

Teradata used to refer to their systems as BFOBs (Big F-Off
Boxes), and it would take a $20 million-plus investment to get one up and running
effectively.  Today, you can do that
analysis in the cloud for peanuts.

This means that we couldn’t analyse and leverage the data in
the past, but we can today. The question then is how do you do it?

Bring all the data into one big enterprise bucket, and then
apply Hadoop to it?

Possibly, but that does not work in many banks as they have everything
still structured in siloed boxes, some of which are segregated by law.  For example, integrating the insurance data
with the banking data in a bancassurance group is still claimed to be a big
no-no.

That does not wash today however, and I suspect that
regulations are used as an excuse for inertia rather than being a real block.  After all, Tesco Bank claim this will be their major opportunity:

“In our move from retailing products to bank retailing,
it amazes me that the current incumbents reward the new customer rather than
the existing one. That encourages promiscuity and commoditisation. If you can
reward the existing customer more than the new one, by learning more about
them, then you can price your products better. For example, our Clubcard (their
major loyalty program) data allows us to price our products 15% more accurately
than the Royal Bank of Scotland for any particular risk type by customer
segment. This means we can be the best at risk-adjusted pricing.”

In other words,
integrating retail data with financial data is not a big leap of thinking.  It needs to be on a permissions basis however,
as I would not appreciate you offering me baby products if I didn’t know my
partner was pregnant or, worse, my daughter.

Target started sending
coupons for baby items to customers according to their pregnancy scores, resulting
in an angry man going into a Target store in Minneapolis, demanding to talk to
the manager.

 “My daughter got this in the mail!” he said.
“She’s still in high school, and you’re sending her coupons for baby clothes
and cribs? Are you trying to encourage her to get pregnant?”

The manager didn’t
have any idea what the man was talking about and apologised.  He called a few days later to apologise again
but the father was somewhat abashed. “I had a talk with my daughter,” he said.
“It turns out there’s been some activities in my house I haven’t been
completely aware of. She’s due in August. I owe you an apology.”

Data analytics is the new battleground and the first step is
to get the data sorted for the purpose of the question you are trying to
answer.

Then there is another interesting aside: it’s not just the
internal data.

As we talked last night, many of the attendees felt the
hardest part would be organising the data internally, with Forrester saying that companies only use around 12% of the internal data available to
them.

But what about all the external data?  When people leave digital footprints built
over years in Facebook, LinkedIn, Twitter, Tumblr, Flickr and more, it
becomes far easier to track individuals’ histories and identify them than ever
before.

That’s what criminals are finding, as referenced in the recent
report by Sophos who cracked open a criminal gang using malware in Russia
thanks to their social media footprints,
so shouldn’t we be using these for finding the criminals who launder or defraud?

It’s obviously not a simple thing however, as building data
banks that hold all the data about an individual in the public domain and internally
would be a massive task … but today’s technologies allow you to tackle such
massive tasks.  As mentioned, you can do what
Wal*Mart were doing twenty years ago for a few pennies today.

I guess the conclusion is that if data is the battleground,
then you need to arm yourselves with as much weaponry as possible and, for
those who invest the most in their warfare, the rewards will be increased
market share and decreased cost.

That’s as long as you know the question to be asked of
course.

“It’s not just about
looking for needles in haystacks, but removing some of the hay.”  
Martha Bennett,
Forrester 

 

 

 

About Chris M Skinner

Chris M Skinner
Chris Skinner is best known as an independent commentator on the financial markets through his blog, the Finanser.com, as author of the bestselling book Digital Bank, and Chair of the European networking forum the Financial Services Club. He has been voted one of the most influential people in banking by The Financial Brand (as well as one of the best blogs), a FinTech Titan (Next Bank), one of the Fintech Leaders you need to follow (City AM, Deluxe and Jax Finance), as well as one of the Top 40 most influential people in financial technology by the Wall Street Journal’s Financial News.

2 comments

  1. Great summary of the Big Data topic from SIBOS, and I agree that the question is the main factor that should drive Big Data processes. We refer to Edna St. Vincent Millay’s poem titled “Huntsman, What Quarry” back in the year 1939 that addressed the question of Big Data:
    Upon this gifted age, in its dark hour,
    rains from the sky a meteoric shower
    of facts…they lie, unquestioned, uncombined.
    Wisdom enough to leach us of our ill
    is daily spun; but there exists no loom
    to weave it into a fabric.
    Here is the article where I found this. A different view on the issue of Big Data:
    http://www.perceptualedge.com/articles/visual_business_intelligence/big_data_big_ruse.pdf
    Paul

  2. Great overview Chris. The sheer number of statistics available on Big Data probably fills up quite a few DVDs :)
    One thing that I think doesn’t get enough attention is how incredibly freaking difficult it is to deal with that 80% of data that is unstructured. It’s really really hard, but despite that fact, unstructured data gets all the press.
    And look, I get it. Social media is undeniably sexy. Just look at all of those statistics! Look at all of the data consumers are putting out there about themselves and their preferences! We’ve got to start mining right now!
    However, as you point out, the thing that is missing from the Big Data discussion is focus. Focus on the right questions. Focus on a smaller pile of hay, as Martha Bennett astutely points out.
    From my perspective, that focus first needs to be directed at the 20% of data that is structured. Before we worry about the 80%, let’s ensure that banks are asking the right questions and getting useful answers out of the structured data they already have access to. Well constructed attributes based on structured internal and external data can drive a tremendous amount of value. Let’s not get distracted by the shiny objects.
    Here is an article that discusses building attributes in more detail.
    http://www.zootweb.com/blog/index.php/building-attributes-start-data-6/1911/
