Big data is data that exceeds the processing capacity of conventional database
systems: the data is too big, moves too fast, or does not fit the structures of traditional
database architectures. In other words, big data is an all-encompassing term for any
collection of data sets so large and complex that it becomes difficult to process using on-hand
data management tools or traditional data processing applications.
To gain value from this data, you must choose an alternative way to process it. Big data
represents the next generation of data warehousing and business analytics, and it is poised to
deliver top-line revenue growth cost-efficiently for enterprises. The term is also popularly
used to describe the exponential growth and availability of data, both structured and
unstructured.
Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the
world today has been created in the last two years alone. This data comes from everywhere:
sensors used to gather climate information, posts to social media sites, digital pictures and
videos, purchase transaction records, and cell phone GPS signals, to name a few. This data is
big data.
Big data usually includes data sets with sizes beyond the ability of commonly used software
tools to capture, curate, manage, and process within a tolerable elapsed time.
Big data is high-volume, high-velocity, and high-variety information assets that demand
cost-effective, innovative forms of information processing for enhanced insight and
decision-making.
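The daily volume quoted above, 2.5 quintillion bytes, is easier to grasp when converted into standard storage units. A quick back-of-the-envelope sketch (assuming decimal SI units, where 1 EB = 10^18 bytes):

```python
# Convert the quoted figure of 2.5 quintillion bytes per day into
# exabytes and petabytes (decimal SI units assumed).
daily_bytes = 2.5 * 10**18        # 2.5 quintillion bytes

exabytes = daily_bytes / 10**18   # 1 EB = 10**18 bytes
petabytes = daily_bytes / 10**15  # 1 PB = 10**15 bytes

print(f"{exabytes} EB per day = {petabytes:,.0f} PB per day")
# prints: 2.5 EB per day = 2,500 PB per day
```

In other words, humanity generates roughly 2,500 petabytes of new data every day by this estimate.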
Big data is often boiled down to a few varieties, including social data, machine data, and
transactional data. Social media data provides companies with remarkable insights into
consumer behavior and sentiment, and it can be integrated with CRM data for analysis: some
230 million tweets are posted on Twitter per day, 2.7 billion Likes and comments are added to
Facebook every day, and 60 hours of video are uploaded to YouTube every minute (this is
what we mean by the velocity of data).
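The per-day and per-minute figures above become more vivid as per-second rates, which is what velocity really describes. A small sketch using the statistics quoted in the text:

```python
# Convert the daily/minute statistics quoted above into per-second rates
# to illustrate the "velocity" dimension of big data.
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400

tweets_per_sec = 230_000_000 / SECONDS_PER_DAY       # 230M tweets/day
likes_per_sec = 2_700_000_000 / SECONDS_PER_DAY      # 2.7B Likes+comments/day
video_hours_per_sec = 60 / 60                        # 60 hours uploaded/minute

print(f"~{tweets_per_sec:,.0f} tweets/s, ~{likes_per_sec:,.0f} Likes/s, "
      f"~{video_hours_per_sec:.0f} hour of video/s")
# prints: ~2,662 tweets/s, ~31,250 Likes/s, ~1 hour of video/s
```

At these rates, any system consuming such feeds must ingest thousands of new records every second, continuously.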
Machine data consists of information generated from industrial equipment, real-time data
from sensors that track parts and monitor machinery (often also called the Internet of Things),
and even web logs that track user behavior online. At CERN, the largest particle physics
research center in the world, the Large Hadron Collider (LHC) generates 40 terabytes of data
every second during experiments. Regarding transactional data, large
retailers and even B2B companies can generate multitudes of data on a regular basis
considering that their transactions consist of one or many items, product IDs, prices, payment
information, manufacturer and distributor data, and much more.
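The LHC figure quoted above, 40 terabytes per second, implies a staggering hourly volume. A quick conversion (assuming decimal units, 1 PB = 1,000 TB):

```python
# Scale out the quoted LHC rate of 40 TB/s to an hourly volume
# (decimal units assumed: 1 PB = 1,000 TB).
TB_PER_SECOND = 40
SECONDS_PER_HOUR = 3600

tb_per_hour = TB_PER_SECOND * SECONDS_PER_HOUR
pb_per_hour = tb_per_hour / 1000

print(f"{tb_per_hour:,} TB/hour = {pb_per_hour:,.0f} PB/hour")
# prints: 144,000 TB/hour = 144 PB/hour
```

This is why experiments at this scale must filter and discard the vast majority of raw sensor data before storage: no conventional system could retain it all.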
Major retailers like Amazon.com, which posted $10B in sales in Q3 2011, and restaurants
like US pizza chain Domino's, which serves over 1 million customers per day, are generating
petabytes of transactional big data. The key point is that big data can resemble traditional
structured data or unstructured, high-frequency information.