Data Warehouse And Data Mining By Kumar Kishan Chandra
Unit 1: Introduction and Data Processing

Introduction to Data Mining

There is a huge amount of data available in the information industry. This data is of no use until it is converted into useful information, so it must be analyzed and the useful information extracted from it. Extraction of information is not the only step we need to perform; the overall process also involves Data Cleaning, Data Integration, Data Transformation, Data Mining, Pattern Evaluation and Data Presentation. Once all these steps are complete, the information can be used in many applications such as Fraud Detection, Market Analysis, Production Control, Science Exploration, etc.

Data mining integrates approaches and techniques from various disciplines such as machine learning, statistics, artificial intelligence, neural networks, database management, data warehousing, data visualization, spatial data analysis, probability, graph theory, etc. In short, data mining is a multi-disciplinary field.
Definition of Data Mining

Data mining is defined as extracting information from huge sets of data. In other words, data mining is the procedure of mining knowledge from data. The information or knowledge so extracted can be used for any of the following applications:
Market Analysis
Fraud Detection
Customer Retention
Production Control
Science Exploration

Data Mining Applications

Data mining is highly useful in the following domains:
Market Analysis and Management
Corporate Analysis and Risk Management
Fraud Detection
Apart from these, data mining can also be used in the areas of production control, customer retention, science exploration, sports, astrology, and Internet Web Surf-Aid.

Market Analysis and Management

Listed below are the various areas of marketing where data mining is used:
Customer Profiling: Data mining helps determine what kind of people buy what kind of products.
Identifying Customer Requirements: Data mining helps in identifying the best products for different customers. It uses prediction to find the factors that may attract new customers.
Cross Market Analysis: Data mining finds associations and correlations between the sales of different products.
Target Marketing: Data mining helps to find clusters of model customers who share the same characteristics such as interests, spending habits, income, etc.
Determining Customer Purchasing Patterns: Data mining helps in determining customers' purchasing patterns.
Providing Summary Information: Data mining provides various multidimensional summary reports.
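To make the idea of associations between product sales more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the transaction data and the function names pair_support and confidence are hypothetical and do not come from these notes. Support is taken here as the fraction of baskets containing both products, and confidence as the fraction of baskets containing the first product that also contain the second.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is the basket of one customer.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def pair_support(transactions):
    """Return, for each product pair, the fraction of baskets containing both."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair always appears in the same (alphabetical) order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: count / n for pair, count in counts.items()}


def confidence(transactions, antecedent, consequent):
    """Fraction of baskets with `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in transactions if antecedent in b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent in b)
    return both / len(with_antecedent)


support = pair_support(transactions)
print(support[("bread", "milk")])                  # support of {bread, milk}
print(confidence(transactions, "bread", "milk"))   # confidence of bread -> milk
```

A pair with high support and high confidence would be a candidate for cross-selling. A real system would typically use a dedicated algorithm such as Apriori rather than counting every pair directly.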
Corporate Analysis and Risk Management

Data mining is used in the following fields of the corporate sector:
Finance Planning and Asset Evaluation: It involves cash flow analysis and prediction, and contingent claim analysis to evaluate assets.
Resource Planning: It involves summarizing and comparing resources and spending.
Competition: It involves monitoring competitors and market directions.

Fraud Detection

Data mining is also used in credit card services and telecommunications to detect fraud. For fraudulent telephone calls, it helps to identify the destination of the call, the duration of the call, the time of day or week, etc. It also analyzes patterns that deviate from expected norms.

Data Preprocessing and Forms of Data Preprocessing

Data preprocessing is a data mining technique that involves transforming raw data into an understandable format. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain many errors. Data preprocessing is a proven method of resolving such issues: it prepares raw data for further processing. Data preprocessing is used in database-driven applications such as customer relationship management and in rule-based and model-based applications (such as neural networks).

Data goes through a series of steps during preprocessing:
Data Cleaning: Data is cleansed through processes such as filling in missing values, smoothing noisy data, or resolving inconsistencies in the data.
Data Integration: Data with different representations are put together and conflicts within the data are resolved.
Data Transformation: Data is normalized, aggregated and generalized.
Data Reduction: This step aims to present a reduced representation of the data in a data warehouse.
Data Discretization: This step reduces the number of values of a continuous attribute by dividing the range of the attribute into intervals.

User Interface

The user interface is the module of a data mining system that handles communication between users and the data mining system. It provides the following functionalities:
Interacting with the system by specifying a data mining query or task.
Providing information to help focus the search.
Performing further mining based on the intermediate data mining results.
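Returning to the preprocessing steps listed earlier, the following Python sketch illustrates three of them on a single numeric attribute: data cleaning (filling missing values), data transformation (normalization) and data discretization. It is a minimal sketch under stated assumptions, not a prescribed method: the sample data and the function names are hypothetical, missing values are filled with the mean of the observed values, normalization is min-max scaling to [0, 1], and discretization uses equal-width intervals.

```python
def fill_missing(values):
    """Data cleaning: replace None with the mean of the observed values.

    Assumes at least one value is observed (not None).
    """
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max_normalize(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Data discretization: map each value to the index of an equal-width interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [20, None, 40, 60]  # hypothetical raw attribute with one missing value
cleaned = fill_missing(ages)
normalized = min_max_normalize(cleaned)
binned = equal_width_bins(cleaned, 2)
print(cleaned, normalized, binned)
```

Other choices are equally valid for each step, for example filling missing values with the median, or using equal-frequency intervals for discretization.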