
Note for Data Mining - DM By Bakka Jyothsna


Market Analysis and Management

Listed below are the various fields of the market where data mining is used −

  • Customer Profiling − Data mining helps determine what kind of people buy what kind of products.
  • Identifying Customer Requirements − Data mining helps identify the best products for different customers. It uses prediction to find the factors that may attract new customers.
  • Cross Market Analysis − Data mining finds associations and correlations between product sales.
  • Target Marketing − Data mining helps find clusters of model customers who share the same characteristics, such as interests, spending habits, and income.
  • Determining Customer Purchasing Patterns − Data mining helps determine customer purchasing patterns.
  • Providing Summary Information − Data mining provides various multidimensional summary reports.

Corporate Analysis and Risk Management

Data mining is used in the following fields of the corporate sector −

  • Finance Planning and Asset Evaluation − It involves cash flow analysis and prediction, and contingent claim analysis to evaluate assets.
  • Resource Planning − It involves summarizing and comparing resources and spending.
  • Competition − It involves monitoring competitors and market directions.

Fraud Detection

Data mining is also used in credit card services and telecommunications to detect fraud. For fraudulent telephone calls, it helps find the destination of the call, the duration of the call, the time of day or week, etc. It also analyzes patterns that deviate from expected norms.

Knowledge discovery in databases (KDD)

Knowledge discovery in databases (KDD) is the process of discovering useful knowledge from a collection of data. This widely used process includes data preparation and selection, data cleansing, incorporation of prior knowledge about the data sets, and interpretation of accurate solutions from the observed results.
Here is the list of steps involved in the knowledge discovery process −


  • Data Cleaning − In this step, noise and inconsistent data are removed.
  • Data Integration − In this step, multiple data sources are combined.
  • Data Selection − In this step, data relevant to the analysis task are retrieved from the database.
  • Data Transformation − In this step, data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations.
  • Data Mining − In this step, intelligent methods are applied in order to extract data patterns.
  • Pattern Evaluation − In this step, data patterns are evaluated.
  • Knowledge Presentation − In this step, the mined knowledge is represented for the user.

The following diagram shows the process of knowledge discovery −

Fig 1: Data Mining as a process of knowledge discovery
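
The seven steps listed above can be sketched as a small chain of functions. This is only an illustrative sketch, not a method prescribed by these notes: the record layout, the two sample sources (STORE_A, STORE_B), all function names, and the simple threshold rule standing in for a real mining method are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical raw records from two sources: (customer, product, amount).
STORE_A = [("ann", "dvd", 12.0), ("bob", "dvd", None), ("ann", "game", 30.0)]
STORE_B = [("cid", "game", 25.0), ("bob", "dvd", -5.0), ("cid", "dvd", 10.0)]


def clean(records):
    """Data cleaning: drop records with missing or negative amounts."""
    return [r for r in records if r[2] is not None and r[2] >= 0]


def integrate(*sources):
    """Data integration: combine multiple sources into one collection."""
    return [r for source in sources for r in source]


def select(records, product):
    """Data selection: keep only records relevant to the analysis task."""
    return [r for r in records if r[1] == product]


def transform(records):
    """Data transformation: aggregate spending per customer."""
    totals = defaultdict(float)
    for customer, _product, amount in records:
        totals[customer] += amount
    return dict(totals)


def mine(totals, threshold):
    """Data mining (stand-in only): keep customers whose total exceeds threshold."""
    return {c: t for c, t in totals.items() if t > threshold}


def evaluate(patterns, top_k):
    """Pattern evaluation: rank patterns and keep the top_k highest totals."""
    ranked = sorted(patterns.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


def present(ranked):
    """Knowledge presentation: render the results as readable lines."""
    return [f"{customer}: {total:.2f}" for customer, total in ranked]


def run_pipeline():
    cleaned = integrate(clean(STORE_A), clean(STORE_B))
    relevant = select(cleaned, product="game")
    totals = transform(relevant)
    patterns = mine(totals, threshold=20.0)
    return present(evaluate(patterns, top_k=1))


if __name__ == "__main__":
    for line in run_pipeline():
        print(line)
```

In a real system each stage would be far richer (for example, the mining stage would apply clustering, classification, or association analysis), but the ordering of the stages mirrors the list above.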


Architecture of data mining system

Fig 2: Architecture of data mining system


Data Mining Functionalities - What can be discovered?

The kinds of patterns that can be discovered depend on the data mining tasks employed. By and large, there are two types of data mining tasks: descriptive tasks, which describe the general properties of the existing data, and predictive tasks, which attempt to make predictions based on inference from the available data. The data mining functionalities and the variety of knowledge they discover are briefly presented in the following list:

  • Characterization: Data characterization is a summarization of the general features of objects in a target class, and produces what are called characteristic rules. The data relevant to a user-specified class are normally retrieved by a database query and run through a summarization module to extract the essence of the data at different levels of abstraction. For example, one may want to characterize the OurVideoStore customers who regularly rent more than 30 movies a year. With concept hierarchies on the attributes describing the target class, the attribute-oriented induction method can be used, for example, to carry out data summarization. Note that with a data cube containing a summarization of the data, simple OLAP operations fit the purpose of data characterization.
  • Discrimination: Data discrimination produces what are called discriminant rules and is basically the comparison of the general features of objects between two classes, referred to as the target class and the contrasting class. For example, one may want to compare the general characteristics of the customers who rented more than 30 movies in the last year with those who rented fewer than 5. The techniques used for data discrimination are very similar to those used for data characterization, except that data discrimination results include comparative measures.
  • Association analysis: Association analysis is the discovery of what are commonly called association rules. It studies the frequency of items occurring together in transactional databases and, based on a threshold called support, identifies the frequent item sets. Another threshold, confidence, which is the conditional probability that an item appears in a transaction when another item appears, is used to pinpoint association rules. Association analysis is commonly used for market basket analysis. For example, it could be useful for the OurVideoStore manager to know what movies are often rented together, or whether there is a relationship between renting a certain type of movie and buying popcorn or pop. The discovered association rules are of the form P -> Q [s,c], where P and Q are conjunctions of attribute-value pairs, s (for support) is the probability that P and Q appear together in a transaction, and c (for confidence) is the conditional probability that Q appears in a transaction when P is present. For example, the hypothetical association rule
      RentType(X, "game") AND Age(X, "13-19") -> Buys(X, "pop") [s=2%, c=55%]
    would indicate that 2% of the transactions considered are of customers aged between 13 and 19 who are renting a game and buying a pop, and that there is a certainty of 55% that teenage customers who rent a game also buy pop.
  • Classification: Classification analysis is the organization of data in given classes. Also known as supervised classification, it uses given class labels to order the objects in the data collection. Classification approaches normally use a training set where
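
Returning to the association rules described above, the support and confidence of a rule P -> Q can be computed directly from their definitions. The sketch below is a minimal illustration, assuming each transaction is a plain set of item names; the sample transactions and the function names (support, confidence, frequent_itemsets) are made up for the example, and the brute-force enumeration is not an efficient frequent-itemset algorithm such as Apriori.

```python
from itertools import combinations

# Hypothetical transactions, each a set of items rented or bought together.
TRANSACTIONS = [
    {"game", "pop"},
    {"game", "pop", "popcorn"},
    {"game"},
    {"movie", "popcorn"},
    {"movie", "pop"},
]


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = frozenset(items)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Conditional probability of `consequent` given `antecedent` (P -> Q)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


def frequent_itemsets(transactions, min_support, size):
    """Brute-force list of item sets of a given size meeting min_support."""
    all_items = sorted(set().union(*transactions))
    return [
        combo
        for combo in combinations(all_items, size)
        if support(transactions, combo) >= min_support
    ]


if __name__ == "__main__":
    s = support(TRANSACTIONS, {"game", "pop"})
    c = confidence(TRANSACTIONS, {"game"}, {"pop"})
    print(f"game -> pop [s={s:.0%}, c={c:.0%}]")
    print(frequent_itemsets(TRANSACTIONS, min_support=0.4, size=2))
```

On this sample data, 2 of the 5 transactions contain both "game" and "pop", and 3 contain "game", so the rule game -> pop has a support of 2/5 and a confidence of 2/3.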
