Big data is data that exceeds the processing capacity of conventional database systems:
the data is too big, moves too fast, or does not fit the structures of traditional database
architectures. In other words, big data is an all-encompassing term for any collection
of data sets so large and complex that they become difficult to process using on-hand data
management tools or traditional data processing applications. To gain value from this
data, you must choose an alternative way to process it. Big data is the next generation
of data warehousing and business analytics, and is poised to deliver top-line revenue
cost-efficiently for enterprises. Big data is also a popular term used to describe the
exponential growth and availability of data, both structured and unstructured.
Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the
world today has been created in the last two years alone. This data comes from
everywhere: sensors used to gather climate information, posts to social media sites,
digital pictures and videos, purchase transaction records, and cell phone GPS signals to
name a few. This data is big data.
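To make the scale quoted above more concrete, the following back-of-envelope sketch converts 2.5 quintillion bytes into more familiar storage units; the conversion factors are standard decimal (SI) units, and only the 2.5-quintillion figure comes from the text:

```python
# Illustrative arithmetic only: expressing "2.5 quintillion bytes per day"
# in petabytes and exabytes (decimal/SI units).
DAILY_BYTES = 2.5e18  # 2.5 quintillion bytes, as quoted above

PETABYTE = 10**15
EXABYTE = 10**18

daily_exabytes = DAILY_BYTES / EXABYTE    # 2.5 EB per day
daily_petabytes = DAILY_BYTES / PETABYTE  # 2,500 PB per day
yearly_exabytes = daily_exabytes * 365    # roughly 912 EB per year

print(f"Per day:  {daily_exabytes:.1f} EB ({daily_petabytes:,.0f} PB)")
print(f"Per year: roughly {yearly_exabytes:,.1f} EB")
```

In other words, the world produces on the order of 2,500 petabytes of new data every day, which is far beyond what any single conventional database server can store or process.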
Big data usually includes data sets with sizes beyond the ability of commonly used
software tools to capture, curate, manage, and process the data within a tolerable
elapsed time. Big data is high-volume, high-velocity, and high-variety information
assets that demand cost-effective, innovative forms of information processing for
enhanced insight and decision making.