In order to succeed, we must first believe that we can.
--Your friends at LectureNotes

Note for Data Mining And Data Warehousing - DMDW By Anurag Kumar

  • Data Mining And Data Warehousing - DMDW
  • Note
  • Rajiv Gandhi Proudyogiki Vishwavidyalaya Bhopal - rgpv
  • Computer Science Engineering
  • 11 Offline Downloads
  • Uploaded 1 year ago
0 User(s)
Download PDFOrder Printed Copy

Share it with your friends

Leave your Comments

Text from page-1

Anurag Kumar, Asst. Prof. CSE, Dr. APJ Abdul Kalam UIT Jhabua Subject: Data Mining UNIT I Lecture I Data warehouse A data warehouse is a large collection of business data used to help an organization make decisions. The large amount of data in data warehouses comes from different places such as internal applications such as marketing, sales, and finance; customer -facing apps; and external partner systems, among others. On a technical level, a data warehouse periodically pulls data from those apps and systems; then, the data goes through formattin g and import processes to match the data already in the warehouse. The data warehouse stores this processed data so it’s ready for decision makers to access. How frequently data pulls occur, or how data is formatted, etc., will vary depending on the needs of the organization. Data warehousing Data warehousing is a technology that aggregates structured data from one or more sources so that it can be compared and analyzed for greater business intelligence. It is electronic storage of a large amount of information by a business which is designed for query and analysis instead of transaction processing. It is a process of transforming data into information and making it available to users in a timely manner to make a difference. What do I need to know about data warehousing? Data warehouses are typically used to correlate broad business data to provide greater executive insight into corporate performance. How is a data warehouse different from a regular database? Data warehouses use a different design from standard operational databases. The latter are optimized to maintain strict accuracy of data in the moment by rapidly updating real-time data. Data warehouses, by contrast, are designed to give a long-range view of data over time. They trade off transaction volume and instead specialize in data aggregation. What are data warehouses used for? Many types of business data are analyzed via data warehouses. The need for a data warehouse often becomes evident when analytic requirements run afoul of the ongoing performance of operational databases. Running a complex query on a database requires the database to enter a temporary fixed state. This is often untenable for transactional databases. A data warehouse is employed to do the analytic work, leaving the transactional database free to focus on transactions. The other benefits of a data warehouse are the ability to analyze data from multiple sources and to negotiate differences in storage schema using the ETL process. The decision support database (Data Warehouse) is maintained separately from the organization's operational database. However, the data warehouse is not a product but an environment. It is an architectural construct of an information system which provides users with current and historical decision support information which is difficult to access or present in the traditional operational data store. 1

Text from page-2

Anurag Kumar, Asst. Prof. CSE, Dr. APJ Abdul Kalam UIT Jhabua The data warehouse is the core of the BI system which is built for data analysis and reporting. You many know that a 3NF-designed database for an inventory system many have tables related to each other. For example, a report on current inventory information can include more than 12 joined conditions. This can quickly slow down the response time of the query and report. A data warehouse provides a new design which can help to reduce the response time and helps to enhance the performance of queries for reports and analytics. Data warehouse system is also known by the following name: • • • • • • Decision Support System (DSS) Executive Information System Management Information System Business Intelligence Solution Analytic Application Data Warehouse History of Datawarehouse The Datawarehouse benefits users to understand and enhance their organization's performance. The need to warehouse data evolved as computer systems became more complex and needed to handle increasing amounts of Information. However, Data Warehousing is a not a new thing. Here are some key events in evolution of Data Warehouse• 1960- Dartmouth and General Mills in a joint research project, develop the terms dimensions and facts. • • 1970- A Nielsen and IRI introduces dimensional data marts for retail sales. 1983- Tera Data Corporation introduces a database management system which is specifically designed for decision support • Data warehousing started in the late 1980s when IBM worker Paul Murphy and Barry Devlin developed the Business Data Warehouse. • However, the real concept was given by Inmon Bill. He was considered as a father of data warehouse. He had written about a variety of topics for building, usage, and maintenance of the warehouse & the Corporate Information Factory. How Datawarehouse works? A Data Warehouse works as a central repository where information arrives from one or more data sources. Data flows into a data warehouse from the transactional system and other relational databases. Data may be: 1. Structured 2. Semi-structured 3. Unstructured data 2

Text from page-3

Anurag Kumar, Asst. Prof. CSE, Dr. APJ Abdul Kalam UIT Jhabua The data is processed, transformed, and ingested so that users can access the processed data in the Data Warehouse through Business Intelligence tools, SQL clients, and spreadsheets. A data warehouse merges information coming from different sources into one comprehensive database. By merging all of this information in one place, an organization can analyze its customers more holistically. This helps to ensure that it has considered all the information available. Data warehousing makes data mining possible. Data mining is looking for patterns in the data that may lead to higher sales and profits. Types of Data Warehouse Three main types of Data Warehouses are: 1. Enterprise Data Warehouse: Enterprise Data Warehouse is a centralized warehouse. It provides decision support service across the enterprise. It offers a unified approach for organizing and representing data. It also provide the ability to classify data according to the subject and give access according to those divisions. 2. Operational Data Store: Operational Data Store, which is also called ODS, are nothing but data store required when neither Data warehouse nor OLTP systems support organizations reporting needs. In ODS, Data warehouse is refreshed in real time. Hence, it is widely preferred for routine activities like storing records of the Employees. 3. Data Mart: A data mart is a subset of the data warehouse. It specially designed for a particular line of business, such as sales, finance, sales or finance. In an independent data mart, data can collect directly from sources. General stages of Data Warehouse Earlier, organizations started relatively simple use of data warehousing. However, over time, more sophisticated use of data warehousing begun. The following are general stages of use of the data warehouse: Offline Operational Database: In this stage, data is just copied from an operational system to another server. In this way, loading, processing, and reporting of the copied data do not impact the operational system's performance. Offline Data Warehouse: Data in the Datawarehouse is regularly updated from the Operational Database. The data in Datawarehouse is mapped and transformed to meet the Datawarehouse objectives. Real time Data Warehouse: In this stage, Data warehouses are updated whenever any transaction takes place in operational database. For example, Airline or railway booking system. Integrated Data Warehouse: In this stage, Data Warehouses are updated continuously when the operational system performs a transaction. The Datawarehouse then generates transactions which are passed back to the operational system. 3

Text from page-4

Anurag Kumar, Asst. Prof. CSE, Dr. APJ Abdul Kalam UIT Jhabua Components of Data warehouse Four components of Data Warehouses are: Load manager: Load manager is also called the front component. It performs with all the operations associated with the extraction and load of data into the warehouse. These operations include transformations to prepare the data for entering into the Data warehouse. Warehouse Manager: Warehouse manager performs operations associated with the management of the data in the warehouse. It performs operations like analysis of data to ensure consistency, creation of indexes and views, generation of denormalization and aggregations, transformation and merging of source data and archiving and baking-up data. Query Manager: Query manager is also known as backend component. It performs all the operation operations related to the management of user queries. The operations of this Data warehouse components are direct queries to the appropriate tables for scheduling the execution of queries. End-user access tools: This is categorized into five different groups like 1. Data Reporting 2. Query Tools 3. Application development tools 4. EIS tools, 5. OLAP tools and data mining tools. Who needs Data warehouse? Data warehouse is needed for all types of users like: • Decision makers who rely on mass amount of data • Users who use customized, complex processes to obtain information from multiple data sources. • It is also used by the people who want simple technology to access the data • It also essential for those people who want a systematic approach for making decisions. • If the user wants fast performance on a huge amount of data which is a necessity for reports, grids or charts, then Data warehouse proves useful. • Data warehouse is a first step If you want to discover 'hidden patterns' of data-flows and groupings. What Is a Data Warehouse Used For? Here, are most common sectors where Data warehouse is used: Airline: In the Airline system, it is used for operation purpose like crew assignment, analyses of route profitability, frequent flyer program promotions, etc. Banking: It is widely used in the banking sector to manage the resources available on desk effectively. Few banks also used for the market research, performance analysis of the product and operations. Healthcare: Healthcare sector also used Data warehouse to strategize and predict outcomes, generate patient's treatment reports, share data with tie-in insurance companies, medical aid services, etc. 4

Lecture Notes