UNIT-I INTRODUCTION

1. DATA MINING:
Data mining is defined as extracting, or mining, knowledge or information from huge amounts of data. In other words, data mining is the procedure of mining knowledge from data. The knowledge extracted in this way can be used for any of the following applications:
1. Market Analysis
2. Fraud Detection
3. Customer Retention
4. Production Control

The key properties of data mining are:
1. Automatic discovery of patterns
2. Focus on large databases
3. Focus on large data sets

2. KINDS OF PATTERNS THAT CAN BE MINED:
Data mining functions are grouped by the kind of patterns they find. On the basis of the kind of data to be mined, there are two categories of functions:
1. Descriptive
2. Classification and Prediction

2.1 Descriptive Function
The descriptive function deals with the general properties of data in the database. Here is the list of descriptive functions:
1. Class/Concept Description
2. Mining of Frequent Patterns
3. Mining of Associations
4. Mining of Correlations
5. Mining of Clusters

2.1.1 Class/Concept Description: Class/concept refers to the data to be associated with classes or concepts. For example, in a company, the classes of items for sale include computers and printers, and the concepts of customers include big spenders and budget spenders. Descriptions of a class or concept can be derived in the following two ways:
• Data Characterization: This refers to summarizing the data of the class under study. The class under study is called the Target Class.
• Data Discrimination: This refers to comparing the general features of the target class with those of one or more predefined contrasting classes.
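The two ways of deriving a class/concept description can be illustrated with a minimal Python sketch. The customer records, the field names, and the helper functions `characterize` and `discriminate` are all hypothetical, and discrimination is taken here to mean comparing the target class with one contrasting class.

```python
from statistics import mean

# Hypothetical customer records (illustrative data, not from the notes).
customers = [
    {"segment": "big_spender", "age": 45, "annual_spend": 5200},
    {"segment": "big_spender", "age": 38, "annual_spend": 4800},
    {"segment": "budget_spender", "age": 24, "annual_spend": 600},
    {"segment": "budget_spender", "age": 30, "annual_spend": 900},
]


def characterize(records, target):
    """Data characterization: summarize the general features of the target class."""
    rows = [r for r in records if r["segment"] == target]
    return {
        "count": len(rows),
        "mean_age": mean(r["age"] for r in rows),
        "mean_annual_spend": mean(r["annual_spend"] for r in rows),
    }


def discriminate(records, target, contrast):
    """Data discrimination: compare the target class summary with a contrasting class."""
    t = characterize(records, target)
    c = characterize(records, contrast)
    # Difference of the class means, feature by feature.
    return {key: t[key] - c[key] for key in ("mean_age", "mean_annual_spend")}


summary = characterize(customers, "big_spender")
difference = discriminate(customers, "big_spender", "budget_spender")
print(summary)
print(difference)
```

A real system would usually report such summaries as generalized relations, charts, or rules; plain averages are used here only to keep the sketch short.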
2.1.2 Mining of Frequent Patterns: Frequent patterns are patterns that occur frequently in transactional data. Here is the list of kinds of frequent patterns:
➢ Frequent Item Set: A set of items that frequently appear together, for example, milk and bread.
➢ Frequent Subsequence: A sequence of patterns that occurs frequently, such as the purchase of a camera being followed by the purchase of a memory card.
➢ Frequent Substructure: Substructure refers to different structural forms, such as graphs, trees, or lattices, which may be combined with item sets or subsequences.

2.1.3 Mining of Associations: Associations are used in retail sales to identify items that are frequently purchased together. Association mining is the process of uncovering relationships among data and determining association rules. For example, a retailer generates an association rule showing that milk is sold with bread 70% of the time, while biscuits are sold with bread only 30% of the time.

2.1.4 Mining of Correlations: This is an additional analysis performed to uncover interesting statistical correlations between associated attribute-value pairs or between two item sets, in order to determine whether they have a positive, negative, or no effect on each other.

2.1.5 Mining of Clusters: A cluster is a group of similar objects. Cluster analysis refers to forming groups of objects that are very similar to each other but highly different from the objects in other clusters.

2.2 Classification and Prediction
Classification is the process of finding a model that describes the data classes or concepts. The purpose is to be able to use this model to predict the class of objects whose class label is unknown. The derived model is based on the analysis of sets of training data. Two functions fall under this category:

2.2.1 Classification: It predicts the class of objects whose class label is unknown. Its objective is to find a derived model that describes and distinguishes data classes or concepts. The derived model is based on the analysis of a set of training data, i.e. data objects whose class labels are known.

2.2.2 Prediction: It is used to predict missing or unavailable numerical data values rather than class labels. Regression analysis is generally used for prediction. Prediction can also be used to identify distribution trends based on the available data.
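The retail example in 2.1.3 can be made concrete with a small Python sketch. The transactions below are made up so that the numbers match the example, and reading the 70% and 30% figures as the confidence of the rules bread → milk and bread → biscuits is an assumption; the helper names `count`, `support`, and `confidence` are hypothetical.

```python
# Hypothetical market-basket transactions, built so that bread appears in
# 10 baskets, 7 of which contain milk and 3 of which contain biscuits.
transactions = (
    [{"bread", "milk"}] * 5
    + [{"bread", "milk", "biscuits"}] * 2
    + [{"bread", "biscuits"}] * 1
    + [{"bread"}] * 2
    + [{"milk"}] * 2
)


def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket)


def support(itemset, baskets):
    """Fraction of all baskets that contain the item set."""
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return count(both, baskets) / count(antecedent, baskets)


bread_milk = confidence({"bread"}, {"milk"}, transactions)
bread_biscuits = confidence({"bread"}, {"biscuits"}, transactions)
print(f"bread -> milk: confidence {bread_milk:.0%}")
print(f"bread -> biscuits: confidence {bread_biscuits:.0%}")
print(f"support of {{bread, milk}}: {support({'bread', 'milk'}, transactions):.2f}")
```

Support measures how often an item set occurs overall, while confidence measures how reliable the rule is once the antecedent is present; a mining system typically keeps only rules that pass a minimum threshold on both.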
3. Data Mining Task Primitives
We can specify a data mining task in the form of a data mining query. This query is the input to the system. A data mining query is defined in terms of data mining task primitives. These primitives allow us to communicate interactively with the data mining system. Here is the list of data mining task primitives:
1. Set of task-relevant data to be mined
2. Kind of knowledge to be mined
3. Background knowledge to be used in the discovery process
4. Interestingness measures and thresholds for pattern evaluation
5. Representation for visualizing the discovered patterns

3.1. Set of task-relevant data to be mined: This is the portion of the database in which the user is interested. This portion includes the following:
A. Database attributes
B. Data warehouse dimensions of interest

3.2. Kind of knowledge to be mined: This refers to the kind of functions to be performed. These functions are:
a. Characterization
b. Discrimination
c. Association and Correlation Analysis
d. Classification
e. Prediction
f. Clustering
g. Outlier Analysis
h. Evolution Analysis

3.3. Background knowledge: Background knowledge allows data to be mined at multiple levels of abstraction. Concept hierarchies, for example, are one form of background knowledge that makes this possible.

3.4. Interestingness measures and thresholds for pattern evaluation: These are used to evaluate the patterns discovered by the knowledge discovery process. Different kinds of knowledge have different interestingness measures.

3.5. Representation for visualizing the discovered patterns: This refers to the form in which the discovered patterns are to be displayed. These representations may include the following:
• Rules
• Tables
• Charts
• Graphs
• Decision Trees
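One way to picture how the five primitives fit together in a single query is the Python sketch below. The `MiningTask` class, its field names, and the example values (attribute names, concept hierarchy, thresholds) are all hypothetical; they are not the syntax of any actual data mining query language.

```python
from dataclasses import dataclass, field

# Allowed values taken from the lists in 3.2 and 3.5.
KNOWLEDGE_KINDS = {
    "characterization", "discrimination", "association", "classification",
    "prediction", "clustering", "outlier_analysis", "evolution_analysis",
}
PRESENTATIONS = {"rules", "tables", "charts", "graphs", "decision_trees"}


@dataclass
class MiningTask:
    """A data mining query expressed through the five task primitives."""
    relevant_data: list                                        # 3.1 attributes or dimensions
    knowledge_kind: str                                        # 3.2 function to perform
    background_knowledge: dict = field(default_factory=dict)   # 3.3 e.g. concept hierarchies
    thresholds: dict = field(default_factory=dict)             # 3.4 interestingness thresholds
    presentation: str = "rules"                                # 3.5 display form

    def __post_init__(self):
        # Reject queries that name an unknown function or display form.
        if self.knowledge_kind not in KNOWLEDGE_KINDS:
            raise ValueError(f"unknown kind of knowledge: {self.knowledge_kind}")
        if self.presentation not in PRESENTATIONS:
            raise ValueError(f"unknown presentation: {self.presentation}")


task = MiningTask(
    relevant_data=["customer.age", "customer.income", "sales.item"],
    knowledge_kind="association",
    background_knowledge={"location": ["street", "city", "country"]},
    thresholds={"min_support": 0.05, "min_confidence": 0.7},
    presentation="rules",
)
print(task)
```

Keeping the primitives in one structure makes it easy to rerun the same task with a different threshold or presentation, which is the interactive style of use described above.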
4. Knowledge Discovery in Databases
Knowledge discovery in databases (KDD) is the process of discovering useful knowledge from a collection of data. This widely used process includes data preparation and selection, data cleansing, incorporating prior knowledge about the data sets, and interpreting accurate solutions from the observed results. Data mining is simply an essential step in the process of knowledge discovery. KDD as a process is depicted in Fig 1.1.

Here is the list of steps involved in the knowledge discovery process:
• Data Cleaning: In this step, noise and inconsistent data are removed.
• Data Integration: In this step, multiple data sources are combined.
• Data Selection: In this step, data relevant to the analysis task are retrieved from the database.
• Data Transformation: In this step, data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations.
• Data Mining: In this step, intelligent methods are applied in order to extract data patterns.
• Pattern Evaluation: In this step, the data patterns are evaluated.
• Knowledge Presentation: In this step, the knowledge is represented.

The following diagram shows the process of knowledge discovery:
Fig 1.1
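The seven steps above can be sketched as a chain of small Python functions. Everything here is illustrative: the two data sources, the field names, and the spending threshold are made up, and the `mine` step is only a simple threshold rule standing in for a real mining algorithm.

```python
from collections import defaultdict

# Two hypothetical data sources (illustrative records only).
store_sales = [
    {"customer": "A", "item": "milk", "amount": 3.0},
    {"customer": "B", "item": "bread", "amount": None},  # incomplete record
    {"customer": "A", "item": "bread", "amount": 2.0},
]
web_sales = [
    {"customer": "B", "item": "camera", "amount": 250.0},
    {"customer": "C", "item": "milk", "amount": 3.0},
]


def clean(records):
    """Data cleaning: drop records with a missing amount."""
    return [r for r in records if r["amount"] is not None]


def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]


def select(records, fields):
    """Data selection: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records):
    """Data transformation: aggregate the amounts per customer."""
    totals = defaultdict(float)
    for r in records:
        totals[r["customer"]] += r["amount"]
    return dict(totals)


def mine(totals, threshold):
    """Data mining (stand-in): label each customer by total spend."""
    return {c: "big spender" if t >= threshold else "budget spender"
            for c, t in totals.items()}


def evaluate(patterns, interesting_label):
    """Pattern evaluation: keep only the patterns of interest."""
    return {c: label for c, label in patterns.items() if label == interesting_label}


def present(patterns):
    """Knowledge presentation: render the patterns as readable lines."""
    return [f"{c}: {label}" for c, label in sorted(patterns.items())]


combined = integrate(clean(store_sales), clean(web_sales))
totals = transform(select(combined, ["customer", "amount"]))
report = present(evaluate(mine(totals, threshold=100.0), "big spender"))
print(totals)
print(report)
```

Each function consumes the output of the previous one, which mirrors the order of the steps in the list; in practice the process is iterative, and results from later steps often send the analyst back to earlier ones.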