
Note for BIG DATA - BD By Ketan Darandale Patil


Big Data Analytics (T1-IT53)
Prepared By: Prof. Deepak Pandita, JSPM’s Jayawant Institute of Management Studies.

CHAPTER NO: 1
“BIG DATA” IN THE ENTERPRISE

INTRODUCTION:
Big data is a blanket term for the non-traditional strategies and technologies needed to collect, organize, process, and gain insights from large datasets. Big data is:
  • large datasets
  • the category of computing strategies and technologies that are used to handle large datasets

"Large dataset" means a dataset too large to reasonably process or store with traditional tooling or on a single computer.

In 2001, Doug Laney (then at META Group, which Gartner later acquired) first presented what became known as the "three Vs of big data" to describe some of the characteristics that make big data different from other data processing:

  • Volume: The sheer scale of the information processed helps define big data systems. These datasets can be orders of magnitude larger than traditional datasets, which demands more thought at each stage of the processing and storage life cycle.
  • Velocity: Another way in which big data differs significantly from other data systems is the speed at which information moves through the system. Data frequently flows into the system from multiple sources and is often expected to be processed in real time to gain insights and update the current understanding of the system.
  • Variety: Big data problems are often unique because of the wide range of both the sources being processed and their relative quality. Data can be ingested from internal systems like application and server logs, from social media feeds and other external APIs, from physical device sensors, and from other providers. Big data seeks to handle potentially useful data regardless of where it comes from by consolidating all information into a single system.
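To make the Variety point concrete, the following is a minimal Python sketch of consolidating records from three kinds of sources (a server log, a social media API and a device sensor) into one common structure. The record formats, field names and function names are assumptions invented for this illustration and are not part of the notes; a real system would use a distributed ingestion framework rather than in-memory lists.

```python
import json
from datetime import datetime, timezone


def from_server_log(line):
    # Assumed log format: "<ISO timestamp with offset> <level> <message>"
    ts, level, message = line.split(" ", 2)
    return {
        "source": "server_log",
        "timestamp": datetime.fromisoformat(ts),
        "payload": {"level": level, "message": message},
    }


def from_social_api(raw_json):
    # Assumed API payload: JSON with "created_at" as Unix seconds (UTC)
    doc = json.loads(raw_json)
    return {
        "source": "social_api",
        "timestamp": datetime.fromtimestamp(doc["created_at"], tz=timezone.utc),
        "payload": {"user": doc["user"], "text": doc["text"]},
    }


def from_sensor_csv(row):
    # Assumed sensor row: "<device id>,<ISO timestamp with offset>,<reading>"
    device_id, ts, reading = row.split(",")
    return {
        "source": "sensor",
        "timestamp": datetime.fromisoformat(ts),
        "payload": {"device_id": device_id, "reading": float(reading)},
    }


def consolidate(log_lines, api_posts, sensor_rows):
    # Convert every input into the same record shape, then order by time.
    # All timestamps are assumed to carry a UTC offset so they are comparable.
    records = [from_server_log(line) for line in log_lines]
    records += [from_social_api(post) for post in api_posts]
    records += [from_sensor_csv(row) for row in sensor_rows]
    return sorted(records, key=lambda r: r["timestamp"])


if __name__ == "__main__":
    merged = consolidate(
        ["2024-01-01T10:00:00+00:00 ERROR disk full"],
        ['{"created_at": 1704103205, "user": "alice", "text": "site is slow"}'],
        ["sensor-7,2024-01-01T10:00:02+00:00,21.5"],
    )
    for record in merged:
        print(record["timestamp"].isoformat(), record["source"], record["payload"])
```

Each source keeps its own parser, so a new source only needs one more small function that returns the same three keys; the consolidation step itself does not change.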


What is the importance of Big Data?
The importance of big data lies in how you use the data you own. Data can be fetched from any source and analyzed to find answers that enable:
  1) Cost reductions
  2) Time reductions
  3) New product development and optimized offerings
  4) Smart decision making

When big data is combined with high-powered analytics, it can have a great impact on business strategy, for example:
  • Finding the root cause of failures, issues and defects in real-time operations.
  • Generating coupons at the point of sale based on the customer’s buying habits.
  • Recalculating entire risk portfolios in just minutes.
  • Detecting fraudulent behavior before it affects your organization.

CHALLENGES:
Huge amounts of data are produced every minute. Every large company is struggling to find ways to make this data useful, but this is not an easy task. The amount of data produced makes it very difficult to store, manage, analyse and utilize. The development of various big data analysis tools has helped with data handling to a great extent.

The main challenges faced in big data analysis:

Data storage and quality


Companies and organizations are growing at a very fast pace, and this growth rapidly increases the amount of data they produce. Storing this data is becoming a challenge for everyone. Options like data lakes and data warehouses are used to collect and store massive quantities of unstructured data in its native format. The problem arises when a data lake or warehouse tries to combine inconsistent data from disparate sources and encounters errors.

Good quality analysis
Companies and organizations use the big data they produce to make the best decisions possible. Consequently, the data they use should be accurate. Inaccurate data leads to ill-advised decisions that are ultimately detrimental to the future success of the business. This high reliance on data quality makes testing a high-priority issue, and ensuring the accuracy of the information requires a lot of resources.

Security and privacy of the data
Once companies and organizations figure out how to use big data, it gives them a wide range of opportunities. However, it also involves big risks when it comes to the security and privacy of the data. The tools used for analysis store, manage, analyse, and utilize data from a wide variety of sources. This creates a risk of exposing the data and makes it highly vulnerable. Therefore, the production of more and more data increases security and privacy concerns.

Various sources of data
Dealing with the volume of data being produced and the velocity at which it is produced is a challenge. It is also a challenge to manage the enormous number of sources that produce this data. The data comes from the company’s internal sources, such as finance and marketing, while external sources like social media produce a huge amount of data as well. As a result, the data is extremely diverse and massive. No number of tools and big data experts will be enough to manage and utilize this amount of data optimally.

OPPORTUNITIES FROM BIG DATA:
Big data refers to the dynamic, large and disparate volumes of data being created by people, tools and machines. It requires new, innovative and scalable technology to collect, host and analytically process the vast amount of data gathered in order to derive real-time business insights that relate to consumers, risk, profit, performance, productivity management and enhanced shareholder value.

The biggest opportunities are in the continued evolution of big data:
  • Tie in real-time operations to telecom providers. Managing things centrally makes sense.
  • All companies are scrambling to become part of Amazon. Vendors want to make their products available on AWS.
  • Redundant, virtualized systems in the public cloud. This is a CapEx versus OpEx versus core competency decision.


  • The immediate opportunity is to find effective, efficient ways to correlate use cases, and to leverage common big data stores to address multiple needs.
  • More real-time convergence of disparate data sets for real-time analytics.
  • Machine learning: being able to ask questions of big data and get answers back.
  • IoT will provide real-time data on traffic and threats.
  • Life sciences: the field struggles with large data sets; big data makes it possible to predict microbiotics’ impact on humans.
  • Resurgence of small businesses integrating with online businesses. It is easy to open a store online, look at the analytics and then replicate it in brick and mortar.
  • Retail will be able to use the same technology developed for casinos (e.g., Space Meter) to track walk-bys and drive-bys. There will be more analytics involved in monitoring the performance and potential of retail stores, even down to the use of shelf space.
  • Better education at the grassroots level. Computer engineering degrees need to teach the concepts and give future computer engineers experience in big data processing. It takes a significant mind shift to move away from traditional data processing concepts, such as relational databases, and toward big data processing concepts.
  • Cloud providers need to keep refining their tools to make them easier and easier to use. To be fair, they are doing this.
  • Machine learning and advanced analysis as a follow-on phase to big data processing is already a revolution. The longer-term impact of this technology is likely to be politically and economically profound, and it is currently grossly underestimated by most people.

ENTERPRISE INFORMATION MANAGEMENT:
Enterprise information management (EIM) is a field within information technology. It specializes in finding solutions for the optimal use of information within organizations, for instance to support decision-making processes or day-to-day operations that require the availability of knowledge. It tries to overcome traditional IT-related barriers to managing information at an enterprise level.

EIM combines enterprise content management (ECM), business process management (BPM), customer experience management (CEM), and business intelligence (BI). Whereas BI and ECM focus on the management of structured and unstructured information respectively, EIM does not make this distinction but approaches the management of information from the perspective of the whole enterprise.

Big data has opened up possibilities for speedy, economical, and more grassroots types of data solutions. However, businesses are not willing to compromise on the data quality and governance that they have come to expect since the days of data warehouses. Modern businesses cannot afford to be trapped in standard reporting by IT-savvy users; they need just-in-time, fast, and accurate information to aid daily decision making.

NEW APPROACH TO ENTERPRISE INFORMATION MANAGEMENT FOR BIG DATA:
