Note for Data Mining And Data Warehousing - DMDW By Anurag Kumar

Anurag kumar, Asst. Prof., CSE Dept, Dr. APJ Abdul Kalam UIT Jhabua (M.P.)
Subject: Data Mining, Unit III, Lecture I

What is Data Mining?
Data mining is the search for hidden, valid, and potentially useful patterns in large data sets. It aims to discover unsuspected or previously unknown relationships in the data. It is a multidisciplinary field that draws on machine learning, statistics, AI, and database technology. The insights derived from data mining can be used for marketing, fraud detection, scientific discovery, and more. Data mining is also called knowledge discovery, knowledge extraction, data/pattern analysis, or information harvesting.

Market Analysis and Management
Data mining is used in the following areas of marketing −
• Customer Profiling − Data mining helps determine what kind of people buy what kind of products.
• Identifying Customer Requirements − Data mining helps identify the best products for different customers. It uses prediction to find the factors that may attract new customers.
• Cross Market Analysis − Data mining finds associations and correlations between product sales.
• Target Marketing − Data mining helps find clusters of model customers who share characteristics such as interests, spending habits, and income.
• Determining Customer Purchasing Patterns − Data mining helps determine how customers' purchases are patterned.
• Providing Summary Information − Data mining provides various multidimensional summary reports.

Corporate Analysis and Risk Management
Data mining is used in the following fields of the corporate sector −
• Finance Planning and Asset Evaluation − It involves cash flow analysis and prediction, and contingent claim analysis to evaluate assets.

• Resource Planning − It involves summarizing and comparing resources and spending.
• Competition − It involves monitoring competitors and market directions.

Fraud Detection
Data mining is also used in credit card services and telecommunications to detect fraud. For fraudulent telephone calls, it helps to examine the destination of the call, its duration, and the time of day or week. It also analyzes patterns that deviate from expected norms.

Data Mining Functionalities
Data mining functionalities specify the kinds of patterns to be found in data mining tasks. Data mining tasks can be classified into two categories: descriptive and predictive.
• Descriptive mining tasks characterize the general properties of the data in the database.
• Predictive mining tasks perform inference on the current data in order to make predictions.

Concept/Class Description: Characterization and Discrimination
Data can be associated with classes or concepts. For example, in an electronics store, classes of items for sale include computers and printers, and concepts of customers include bigSpenders and budgetSpenders.

Data characterization
Data characterization is a summarization of the general characteristics or features of a target class of data.

Data discrimination
Data discrimination is a comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes.

Mining Frequent Patterns, Associations, and Correlations
Frequent patterns, as the name suggests, are patterns that occur frequently in data. There are many kinds of frequent patterns, including itemsets, subsequences, and substructures.
• Frequent Item Set − A set of items that frequently appear together, for example, milk and bread.
• Frequent Subsequence − A sequence of events that occurs frequently, such as the purchase of a camera followed by the purchase of a memory card.
• Frequent Substructure − A substructure refers to different structural forms, such as graphs, trees, or lattices, which may be combined with itemsets or subsequences.

Association analysis

Suppose, as a marketing manager, you would like to determine which items are frequently purchased together within the same transactions. An example of such a rule is:

buys(X, "computer") ⇒ buys(X, "software") [support = 1%, confidence = 50%]

where X is a variable representing a customer. A confidence of 50% means that if a customer buys a computer, there is a 50% chance that she will buy software as well. A support of 1% means that 1% of all the transactions under analysis showed computer and software purchased together.

Classification and Prediction
Classification is the process of finding a model that describes and distinguishes data classes, so that the model can be used to predict the class of objects whose class label is unknown.

"How is the derived model presented?" The derived model may be represented in various forms, such as classification (IF-THEN) rules, decision trees, mathematical formulae, or neural networks. A decision tree is a flowchart-like tree structure, where each node denotes a test on an attribute value, each branch represents an outcome of the test, and tree leaves represent classes or class distributions. A neural network, when used for classification, is typically a collection of neuron-like processing units with weighted connections between the units.
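
Returning to the computer/software rule from the association analysis discussion, the two measures can be checked by simple counting. The sketch below is only an illustration: the function name support_confidence and the transaction data are made up for this example and are not taken from the notes. The data is chosen so that the counts reproduce the 1% support and 50% confidence quoted for the rule.

```python
def support_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent => consequent.

    support    = fraction of all transactions containing both item sets
    confidence = fraction of antecedent transactions that also contain
                 the consequent
    """
    if not transactions:
        raise ValueError("need at least one transaction")
    antecedent, consequent = set(antecedent), set(consequent)
    n_total = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / n_total
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical data: 200 transactions, 4 contain a computer,
# and 2 of those 4 also contain software.
transactions = (
    [{"computer", "software"}] * 2
    + [{"computer", "printer"}] * 2
    + [{"milk", "bread"}] * 196
)

support, confidence = support_confidence(transactions, {"computer"}, {"software"})
print(f"support={support:.0%}, confidence={confidence:.0%}")
# prints: support=1%, confidence=50%
```

Here support is 2/200 and confidence is 2/4. Note that support is measured against all transactions, while confidence is measured only against the transactions that contain the antecedent.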

Figure: Neural network

There are many other methods for constructing classification models, such as Bayesian classification, support vector machines, and k-nearest neighbor classification.

Whereas classification predicts categorical (discrete, unordered) labels, prediction models continuous-valued functions. That is, it is used to predict missing or unavailable numerical data values rather than class labels. The term prediction may refer to both numeric prediction and class label prediction.

Cluster Analysis
Unlike classification and prediction, which analyze class-labeled data objects, clustering analyzes data objects without consulting a known class label. The objects are grouped based on the principle of maximizing the intraclass similarity and minimizing the interclass similarity. That is, clusters of objects are formed so that objects within a cluster have high similarity to one another, but are very dissimilar to objects in other clusters.

Outlier Analysis
A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are outliers. Most data mining methods discard outliers as noise or exceptions. The analysis of outlier data is referred to as outlier mining.
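
As a rough illustration of flagging objects that do not comply with the general behavior of the data, the sketch below applies one simple rule, the modified z-score based on the median absolute deviation (MAD). This rule is not prescribed by the notes; it is just one common choice. The function name mad_outliers, the threshold of 3.5, and the call-duration values are assumptions made for the example.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold.

    modified z-score = 0.6745 * |x - median| / MAD, where MAD is the
    median of the absolute deviations from the median.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median, so this rule cannot rank deviations
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical call durations in minutes; one call is far longer than the rest.
call_minutes = [3, 5, 4, 6, 5, 4, 7, 5, 240]
print(mad_outliers(call_minutes))
# prints: [240]
```

The median is used instead of the mean because a single extreme value can shift the mean and inflate the standard deviation enough to hide itself, especially in small samples. Whether a flagged value is noise to discard or a case worth investigating, as in fraud detection, depends on the application.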
