
Note for Data Mining And Data Warehousing - DMDW By Ramesh Chiluvuri001



2.1.2 Mining of Frequent Patterns: Frequent patterns are patterns that occur frequently in transactional data. The kinds of frequent patterns are:
➢ Frequent Item Set: A set of items that frequently appear together, for example, milk and bread.
➢ Frequent Subsequence: A sequence of patterns that occurs frequently, such as the purchase of a camera being followed by the purchase of a memory card.
➢ Frequent Sub Structure: Substructure refers to different structural forms, such as graphs, trees, or lattices, which may be combined with item sets or subsequences.

2.1.3 Mining of Association: Associations are used in retail sales to identify items that are frequently purchased together. Association mining is the process of uncovering relationships among data and determining association rules. For example, a retailer generates an association rule showing that milk is sold with bread 70% of the time, while biscuits are sold with bread only 30% of the time.

2.1.4 Mining of Correlations: This is an additional analysis performed to uncover interesting statistical correlations between associated attribute-value pairs or between two item sets, in order to determine whether they have a positive, negative, or no effect on each other.

2.1.5 Mining of Clusters: A cluster is a group of similar objects. Cluster analysis refers to forming groups of objects that are very similar to each other but highly different from the objects in other clusters.

2.2 Classification and Prediction: Classification is the process of finding a model that describes the data classes or concepts. The purpose is to be able to use this model to predict the class of objects whose class label is unknown. The derived model is based on the analysis of sets of training data, and it can be presented in various forms, such as classification (IF-THEN) rules, decision trees, mathematical formulae, or neural networks.

2.2.1 Classification - It predicts the class of objects whose class label is unknown. Its objective is to find a derived model that describes and distinguishes data classes or concepts. The derived model is based on the analysis of a set of training data, i.e., data objects whose class label is known.

2.2.2 Prediction - It is used to predict missing or unavailable numerical data values rather than class labels. Regression analysis is generally used for prediction. Prediction can also be used to identify distribution trends based on the available data.
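To make the difference between classification (2.2.1) and prediction (2.2.2) concrete, here is a minimal sketch in Python. It is an illustration only: the customer data, the labels, and the function names are made up, and the nearest-neighbour rule is just one simple way to build a classifier, not a method prescribed by these notes. The prediction half uses a least-squares regression line, in keeping with the remark that regression analysis is generally used for prediction.

```python
# Toy training data: (monthly_visits, avg_basket_value) -> known class label.
# All values and labels are invented for illustration.
TRAINING = [
    ((2, 10.0), "occasional"),
    ((3, 12.0), "occasional"),
    ((12, 40.0), "regular"),
    ((15, 55.0), "regular"),
]


def classify(sample, training=TRAINING):
    """Classification: return the label of the closest training object (1-NN)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(training, key=lambda pair: sq_dist(pair[0], sample))
    return label


def fit_line(xs, ys):
    """Prediction: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Return the numerical value the fitted line gives for x."""
    return slope * x + intercept


if __name__ == "__main__":
    # Classification outputs a class label for an unlabeled object.
    print(classify((13, 45.0)))
    # Prediction outputs a number for an unseen x.
    slope, intercept = fit_line([1, 2, 3, 4], [3.0, 5.0, 7.0, 9.0])
    print(predict(5, slope, intercept))
```

The classifier returns a label drawn from the training data, while the regression returns a number that need not appear in the training data at all; that is the distinction the two subsections draw.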


3. Data Mining Task Primitives: We can specify a data mining task in the form of a data mining query, which is input to the system. A data mining query is defined in terms of data mining task primitives. These primitives allow us to communicate with the data mining system in an interactive manner. The data mining task primitives are:
1. Set of task-relevant data to be mined
2. Kind of knowledge to be mined
3. Background knowledge to be used in the discovery process
4. Interestingness measures and thresholds for pattern evaluation
5. Representation for visualizing the discovered patterns

3.1. Set of task-relevant data to be mined: This is the portion of the database in which the user is interested. This portion includes the following:
A. Database attributes
B. Data warehouse dimensions of interest

3.2. Kind of knowledge to be mined: This refers to the kind of functions to be performed. These functions are:
a. Characterization
b. Discrimination
c. Association and Correlation Analysis
d. Classification
e. Prediction
f. Clustering
g. Outlier Analysis
h. Evolution Analysis

3.3. Background knowledge: Background knowledge allows data to be mined at multiple levels of abstraction. Concept hierarchies are one example of such background knowledge.

3.4. Interestingness measures and thresholds for pattern evaluation: These are used to evaluate the patterns discovered by the process of knowledge discovery. There are different interestingness measures for different kinds of knowledge.

3.5. Representation for visualizing the discovered patterns: This refers to the form in which discovered patterns are to be displayed. These representations may include the following:
• Rules
• Tables
• Charts
• Graphs
• Decision Trees
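One way to picture the five primitives is as the fields of a single query object. The sketch below is a hypothetical illustration, not the syntax of any real data mining query language: the class name `MiningQuery`, its field names, the `min_support` threshold key, and the city-to-country hierarchy are all assumptions made for the example. Only the allowed knowledge kinds and presentation forms are taken from the lists in 3.2 and 3.5.

```python
from dataclasses import dataclass, field

# Allowed values taken from the lists in sections 3.2 and 3.5.
KNOWLEDGE_KINDS = {
    "characterization", "discrimination", "association", "classification",
    "prediction", "clustering", "outlier analysis", "evolution analysis",
}
PRESENTATIONS = {"rules", "tables", "charts", "graphs", "decision trees"}


@dataclass
class MiningQuery:
    """Hypothetical container holding the five task primitives."""
    relevant_data: list                                     # 3.1 attributes / dimensions
    knowledge_kind: str                                     # 3.2 function to perform
    concept_hierarchy: dict = field(default_factory=dict)   # 3.3 background knowledge
    thresholds: dict = field(default_factory=dict)          # 3.4 interestingness thresholds
    presentation: str = "rules"                             # 3.5 output form

    def validate(self):
        """Reject queries that fall outside the lists given in the notes."""
        if not self.relevant_data:
            raise ValueError("no task-relevant data specified")
        if self.knowledge_kind not in KNOWLEDGE_KINDS:
            raise ValueError(f"unknown kind of knowledge: {self.knowledge_kind}")
        if self.presentation not in PRESENTATIONS:
            raise ValueError(f"unknown presentation: {self.presentation}")
        return self


def generalize(value, hierarchy):
    """Climb one level of a concept hierarchy; values without a parent are unchanged."""
    return hierarchy.get(value, value)


# Example query: mine associations over two attributes, shown as rules.
query = MiningQuery(
    relevant_data=["item", "city"],
    knowledge_kind="association",
    concept_hierarchy={"Chicago": "USA", "Vancouver": "Canada"},  # made-up hierarchy
    thresholds={"min_support": 0.5},                              # hypothetical measure name
    presentation="rules",
).validate()
```

Here `generalize("Chicago", query.concept_hierarchy)` moves a value from the city level to the country level, which is what mining "at multiple levels of abstraction" amounts to in this toy setting.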


4. Knowledge Discovery in Databases: Knowledge discovery in databases (KDD) is the process of discovering useful knowledge from a collection of data. This widely used data mining technique is a process that includes data preparation and selection, data cleansing, incorporating prior knowledge on data sets, and interpreting accurate solutions from the observed results. Data mining is simply an essential step in the process of knowledge discovery. KDD as a process is depicted in Fig 1.1.

The steps involved in the knowledge discovery process are:
• Data Cleaning - In this step, noise and inconsistent data are removed.
• Data Integration - In this step, multiple data sources are combined.
• Data Selection - In this step, data relevant to the analysis task are retrieved from the database.
• Data Transformation - In this step, data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations.
• Data Mining - In this step, intelligent methods are applied in order to extract data patterns.
• Pattern Evaluation - In this step, data patterns are evaluated.
• Knowledge Presentation - In this step, knowledge is represented.

Fig 1.1: The process of knowledge discovery
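The seven steps listed above can be sketched as a chain of small functions, one per step. This is a toy illustration under stated assumptions: the two "stores", their records, the function names, and the `min_support` threshold are invented for the example, and counting co-occurring item pairs merely stands in for the "intelligent methods" of the data mining step.

```python
from collections import Counter
from itertools import combinations

# Two made-up sources of (transaction_id, item) records; None marks a noisy record.
STORE_A = [(1, "milk"), (1, "bread"), (2, "milk"), (2, None), (2, "bread")]
STORE_B = [(3, "bread"), (3, "biscuits"), (4, "milk"), (4, "bread"), (4, "pen")]


def clean(records):
    """Data cleaning: drop records whose item is missing."""
    return [(t, item) for t, item in records if item is not None]


def integrate(*sources):
    """Data integration: combine several sources into one record list."""
    return [rec for source in sources for rec in source]


def select(records, relevant_items):
    """Data selection: keep only items relevant to the analysis task."""
    return [(t, item) for t, item in records if item in relevant_items]


def transform(records):
    """Data transformation: consolidate records into one item set per transaction."""
    baskets = {}
    for t, item in records:
        baskets.setdefault(t, set()).add(item)
    return list(baskets.values())


def mine(baskets):
    """Data mining: count the transactions that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts


def evaluate(pair_counts, n_baskets, min_support):
    """Pattern evaluation: keep pairs found in at least min_support of transactions."""
    return {
        pair: count / n_baskets
        for pair, count in pair_counts.items()
        if count / n_baskets >= min_support
    }


def present(patterns):
    """Knowledge presentation: render the surviving patterns as readable lines."""
    return [f"{a} + {b}: {share:.0%} of transactions"
            for (a, b), share in sorted(patterns.items())]


def run_kdd(min_support=0.5):
    """Run all seven steps in the order given in the notes."""
    records = integrate(clean(STORE_A), clean(STORE_B))
    records = select(records, {"milk", "bread", "biscuits"})
    baskets = transform(records)
    patterns = evaluate(mine(baskets), len(baskets), min_support)
    return present(patterns)


if __name__ == "__main__":
    for line in run_kdd():
        print(line)
```

With this toy data, only the pair bread and milk survives the default threshold, appearing in 3 of the 4 transactions (75%). Lowering `min_support` to 0.25 also lets the pair biscuits and bread through, which shows how the evaluation threshold controls what counts as knowledge worth presenting.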


Major KDD application areas include marketing, fraud detection, telecommunication, and manufacturing. Originally, data mining and knowledge discovery were performed manually. As time passed, the amount of data in many systems grew beyond terabyte size and could no longer be maintained manually. Moreover, for the successful existence of any business, discovering underlying patterns in data is considered essential. As a result, several software tools were developed to discover hidden data and make assumptions, which formed a part of artificial intelligence.

The KDD process has reached its peak in the last 10 years. It now houses many different approaches to discovery, including inductive learning, Bayesian statistics, semantic query optimization, knowledge acquisition for expert systems, and information theory. The ultimate goal is to extract high-level knowledge from low-level data.

KDD includes multidisciplinary activities. These encompass data storage and access, scaling algorithms to massive data sets, and interpreting results. The data cleansing and data access processes included in data warehousing facilitate the KDD process. Artificial intelligence also supports KDD by discovering empirical laws from experimentation and observations. The patterns recognized in the data must be valid on new data and possess some degree of certainty. These patterns are considered new knowledge. The steps involved in the KDD process are those listed in section 4.

5. Data Mining System Classification: Data mining is an advanced and growing field of research and development that lies at the intersection of widely varied domains such as database systems, statistics, visualization, and machine learning. It is therefore considered an interdisciplinary field. A data mining system can be classified according to these contributing disciplines. Apart from these, a data mining system can also be classified based on the kind of:
(a) Databases mined
(b) Knowledge mined
(c) Techniques utilized
(d) Applications adapted

• Classification Based on the Databases Mined: We can classify a data mining system according to the kind of databases mined. Database systems can be classified according to different criteria such as data models, types of data, etc., and the data mining system can be classified accordingly. For example, if we classify a database according to the data model, then we may have a relational, transactional, object-relational, or data warehouse mining system.
