In life, NO ONE and NOTHING will help you until you start helping YOURSELF.
--Your friends at LectureNotes

Note for BIG DATA - BD by zabeen b

  • Note
  • 7 Topics
  • 54 Offline Downloads
  • Uploaded 1 year ago
Zabeen B
Zabeen B
0 User(s)
Download PDFOrder Printed Copy

Share it with your friends

Leave your Comments

Text from page-3

Big data is often boiled down to a few varieties including social data, machine data, and transactional data. Social media data is providing remarkable insights to companies on consumer behavior and sentiment that can be integrated with CRM data for analysis, with 230 million tweets posted on Twitter per day, 2.7 billion Likes and comments added to Facebook every day, and 60 hours of video uploaded to YouTube every minute (this is what we mean by velocity of data). Machine data consists of information generated from industrial equipment, real-time data from sensors that track parts and monitor machinery (often also called the Internet of Things), and even web logs that track user behavior online. At arcplan client CERN, the largest particle physics research center in the world, the Large Hadron Collider (LHC) generates 40 terabytes of data every second during experiments. Regarding transactional data, large retailers and even B2B companies can generate multitudes of data on a regular basis considering that their transactions consist of one or many items, product IDs, prices, payment information, manufacturer and distributor data, and much more. Major retailers like Amazon.com, which posted $10B in sales in Q3 2011, and restaurants like US pizza chain Domino's, which serves over 1 million customers per day, are generating petabytes of transactional big data. The thing to note is that big data can resemble traditional structured data or unstructured, high frequency information. Big Data Analytics Big (and small) Data analytics is the process of examining data—typically of a variety of sources, types, volumes and / or complexities—to uncover hidden patterns, unknown correlations, and other useful information. The intent is to find business insights that were not previously possible or were missed, so that better decisions can be made.

Text from page-4

Big Data analytics uses a wide variety of advanced analytics to provide 1. Deeper insights. Rather than looking at segments, classifications, regions, groups, or other summary levels you ’ll have insights into all the individuals, all the products, all the parts, all the events, all the transactions, etc. 2. Broader insights. The world is complex. Operating a business in a global, connected economy is very complex given constantly evolving and changing conditions. As humans, we simplify conditions so we can process events and understand what is happening. But our best-laid plans often go astray because of the estimating or approximating. Big Data analytics takes into account all the data, including new data sources, to understand the complex, evolving, and interrelated conditions to produce more accurate insights. 3. Frictionless actions. Increased reliability and accuracy that will allow the deeper and broader insights to be automated into systematic actions. Advanced Big data analytics Big data analytic applications

Text from page-5

3 dimensions / characteristics of Big data 3Vs (volume, variety and velocity) are three defining properties or dimensions of big data. Volume refers to the amount of data, variety refers to the number of types of data and velocity refers to the speed of data processing. Volume: The size of available data has been growing at an increasing rate. The volume of data is growing. Experts predict that the volume of data in the world will grow to 25 Zettabytes in 2020. That same phenomenon affects every business – their data is growing at the same exponential rate too. This applies to companies and to individuals. A text file is a few kilo bytes, a sound file is a few mega bytes while a full length movie is a few giga bytes. More sources of data are added on continuous basis. For companies, in the old days, all data was generated internally by employees. Currently, the data is generated by employees, partners and customers. For a group of companies, the data is also generated by machines. For example, Hundreds of millions of smart phones send a variety of information to the network infrastructure. This data did not exist five years ago. More sources of data with a larger size of data combine to increase the volume of data that has to be analyzed. This is a major issue for those looking to put that data to use instead of letting it just disappear. Peta byte data sets are common these days and Exa byte is not far away.

Text from page-6

Velocity: Data is increasingly accelerating the velocity at which it is created and at which it is integrated. We have moved from batch to a real-time business. Initially, companies analyzed data using a batch process. One takes a chunk of data, submits a job to the server and waits for delivery of the result. That scheme works when the incoming data rate is slower than the batch-processing rate and when the result is useful despite the delay. With the new sources of data such as social and mobile applications, the batch process breaks down. The data is now streaming into the server in real time, in a continuous fashion and the result is only useful if the delay is very short. Data comes at you at a record or a byte level, not always in bulk. And the demands of the business have increased as well – from an answer next week to an answer in a minute. In addition, the world is becoming more instrumented and interconnected. The volume of data streaming off those instruments is exponentially larger than it was even 2 years ago. Variety: Variety presents an equally difficult challenge. The growth in data sources has fuelled the growth in data types. In fact, 80% of the world’s data is unstructured. Yet most traditional methods apply analytics only to structured information. From excel tables and databases, data structure has changed to loose its structure and to add hundreds of formats. Pure text, photo, audio, video, web, GPS data, sensor data, relational data bases, documents, SMS, pdf, flash, etc. One no longer has control over

Lecture Notes