Note for Data Mining And Data Warehousing - DMDW by kishan chandra

  • Data Mining And Data Warehousing - DMDW
  • Note
  • GD Goenka University
  • Computer Science Engineering
  • B.Tech

Unit 1: Introduction and Data Processing

Introduction to Data Mining

There is a huge amount of data available in the information industry. This data is of no use until it is converted into useful information, so it must be analyzed and the useful information extracted from it. Extraction is not the only process involved, however; data mining also includes other processes such as data cleaning, data integration, data transformation, data mining, pattern evaluation and data presentation. Once all these processes are complete, the information can be used in many applications such as fraud detection, market analysis, production control, science exploration, etc.

Data mining integrates approaches and techniques from various disciplines such as machine learning, statistics, artificial intelligence, neural networks, database management, data warehousing, data visualization, spatial data analysis, probability, graph theory, etc. In short, data mining is a multi-disciplinary field.

Definition of Data Mining

Data mining is defined as extracting information from huge sets of data. In other words, data mining is the procedure of mining knowledge from data. The information or knowledge so extracted can be used for any of the following applications:

  • Market Analysis
  • Fraud Detection
  • Customer Retention
  • Production Control
  • Science Exploration

Data Mining Applications

Data mining is highly useful in the following domains:

  • Market Analysis and Management
  • Corporate Analysis & Risk Management
  • Fraud Detection

Apart from these, data mining can also be used in the areas of production control, customer retention, science exploration, sports, astrology, and Internet Web Surf-Aid.

Market Analysis and Management

Listed below are the various fields of the market where data mining is used:

  • Customer Profiling: Data mining helps determine what kind of people buy what kind of products.
  • Identifying Customer Requirements: Data mining helps in identifying the best products for different customers. It uses prediction to find the factors that may attract new customers.
  • Cross Market Analysis: Data mining finds associations and correlations between product sales.
  • Target Marketing: Data mining helps to find clusters of model customers who share the same characteristics such as interests, spending habits, income, etc.
  • Determining Customer Purchasing Patterns: Data mining helps in determining customers' purchasing patterns.
  • Providing Summary Information: Data mining provides various multidimensional summary reports.
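
As a rough illustration of the cross market analysis mentioned above, the sketch below measures how often two products are sold together. The baskets, the item names and the helper functions `pair_support` and `confidence` are all hypothetical and are not part of the original notes; support and confidence are just one common way to express such associations.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def pair_support(baskets):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair has one canonical (alphabetical) key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Estimate how often `consequent` is bought when `antecedent` is bought."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


support = pair_support(transactions)
print("support(bread, butter):", support[("bread", "butter")])
print("confidence(bread -> butter):", confidence(transactions, "bread", "butter"))
```

In this toy data, a high support and confidence for bread and butter would suggest that the two products are worth promoting together.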

Corporate Analysis and Risk Management

Data mining is used in the following fields of the corporate sector:

  • Finance Planning and Asset Evaluation: It involves cash flow analysis and prediction, and contingent claim analysis to evaluate assets.
  • Resource Planning: It involves summarizing and comparing resources and spending.
  • Competition: It involves monitoring competitors and market directions.

Fraud Detection

Data mining is also used in the fields of credit card services and telecommunications to detect fraud. For fraudulent telephone calls, it helps to find the destination of the call, the duration of the call, the time of the day or week, etc. It also analyzes patterns that deviate from expected norms.

Data Preprocessing and Forms of Data Preprocessing

Data preprocessing is a data mining technique that involves transforming raw data into an understandable format. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain many errors. Data preprocessing is a proven method of resolving such issues; it prepares raw data for further processing. Data preprocessing is used in database-driven applications such as customer relationship management and in rule-based applications (like neural networks).

Data goes through a series of steps during preprocessing:

  • Data Cleaning: Data is cleansed through processes such as filling in missing values, smoothing noisy data, or resolving inconsistencies in the data.
  • Data Integration: Data with different representations are put together and conflicts within the data are resolved.
  • Data Transformation: Data is normalized, aggregated and generalized.
  • Data Reduction: This step aims to present a reduced representation of the data in a data warehouse.
  • Data Discretization: This step reduces the number of values of a continuous attribute by dividing the range of the attribute into intervals.

User Interface

The user interface is the module of a data mining system that handles communication between users and the data mining system. The user interface allows the following functionalities:

  • Interact with the system by specifying a data mining query task.
  • Provide information to help focus the search.
  • Perform mining based on intermediate data mining results.

  • Browse database and data warehouse schemas or data structures.
  • Evaluate mined patterns.
  • Visualize the patterns in different forms.

Data Integration

Data integration is a data preprocessing technique that merges data from multiple heterogeneous data sources into a coherent data store. The merged data may be inconsistent, so data integration usually needs to be followed by data cleaning.

Data Cleaning

Data cleaning is a technique that is applied to remove noisy data and correct inconsistencies in data. It involves transformations that correct wrong data, and it is performed as a data preprocessing step while preparing the data for a data warehouse.

Data cleaning is a process used to identify inaccurate, incomplete or unreasonable data and then improve its quality by correcting the detected errors and omissions. In general, data cleaning reduces errors and improves data quality. Correcting errors in data and eliminating bad records can be a time-consuming and tedious process, but it cannot be ignored. Data mining, as a technique for discovering interesting information in data, is a key technique for data cleaning.

Data Selection

Data selection is the process in which data relevant to the analysis task are retrieved from the database. Sometimes data transformation and consolidation are performed before the data selection process.

Clusters

A cluster is a group of similar objects. Cluster analysis refers to forming groups of objects that are very similar to each other but highly different from the objects in other clusters.

Data Transformation

In this step, data is transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations.
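
The cleaning, transformation and discretization steps described in this unit can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the sample `ages` values and the function names `fill_missing`, `min_max_normalize` and `equal_width_bins` are hypothetical, and mean filling, min-max scaling and equal-width binning are just one possible choice for each step, not the only method.

```python
from statistics import mean

# Hypothetical raw attribute values; None marks a missing value.
ages = [23, None, 35, 41, None, 58]


def fill_missing(values):
    """Data cleaning: replace missing entries with the mean of the known ones."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Data discretization: map each value to one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


cleaned = fill_missing(ages)
normalized = min_max_normalize(cleaned)
binned = equal_width_bins(cleaned, 3)

print("cleaned:   ", cleaned)
print("normalized:", [round(v, 3) for v in normalized])
print("binned:    ", binned)
```

Each function takes a plain list and returns a new list, so the steps can be reordered or swapped for other methods (for example, median filling instead of mean filling) without changing the rest of the sketch.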
