Note for Data Mining And Data Warehousing - DMDW By Anurag Kumar

Anurag Kumar, Asst. Prof., CSE Dept, Dr. APJ Abdul Kalam UIT Jhabua (M.P.)

Suppose, as a marketing manager, you would like to determine which items are frequently purchased together within the same transactions. An example of such an association rule is

buys(X, “computer”) ⇒ buys(X, “software”) [support = 1%, confidence = 50%]

where X is a variable representing a customer. Confidence = 50% means that if a customer buys a computer, there is a 50% chance that she will buy software as well. Support = 1% means that 1% of all the transactions under analysis show that computer and software were purchased together.

Classification and Prediction

Classification is the process of finding a model that describes and distinguishes data classes, so that the model can be used to predict the class of objects whose class label is unknown.

“How is the derived model presented?” The derived model may be represented in various forms, such as classification (IF-THEN) rules, decision trees, mathematical formulae, or neural networks. A decision tree is a flowchart-like tree structure in which each internal node denotes a test on an attribute value, each branch represents an outcome of the test, and the leaves represent classes or class distributions. A neural network, when used for classification, is typically a collection of neuron-like processing units with weighted connections between the units.
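The support and confidence of a rule such as buys(X, “computer”) ⇒ buys(X, “software”) can be computed directly from a list of transactions. The sketch below is a minimal Python illustration under assumed inputs. Each transaction is modelled as a set of item names, and the eight transactions are invented for the example. They are not the data behind the 1% / 50% figures in the notes. The helper names `support` and `confidence` are likewise hypothetical.

```python
from typing import Iterable, List, Set


def support(transactions: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of all transactions that contain every item in `itemset`."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= t)  # subset test
    return hits / len(transactions)


def confidence(transactions: List[Set[str]], antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Share of transactions containing the antecedent that also contain the consequent."""
    lhs = set(antecedent)
    base = support(transactions, lhs)
    if base == 0:
        return 0.0  # antecedent never occurs; report 0 instead of dividing by zero
    return support(transactions, lhs | set(consequent)) / base


# Hypothetical toy data: each set is the items bought in one transaction.
transactions = [
    {"computer", "software"},
    {"computer"},
    {"printer", "paper"},
    {"computer", "software", "printer"},
    {"paper"},
    {"computer", "mouse"},
    {"software"},
    {"printer"},
]

print(support(transactions, {"computer", "software"}))       # 0.25
print(confidence(transactions, {"computer"}, {"software"}))  # 0.5
```

On this toy data, computer and software appear together in 2 of the 8 transactions, giving 25% support. Of the 4 transactions containing a computer, 2 also contain software, giving 50% confidence.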

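The text notes that one derived model can be presented either as a decision tree or as IF-THEN rules. The minimal Python sketch below shows both views of the same model.

The tree is hand-built, not learned from data. Its attributes (`age`, `student`, `credit_rating`) and the class name `buys_computer` are hypothetical, chosen only for illustration.

- `classify` walks the tree from the root to a leaf, following the branch that matches each test outcome.
- `to_rules` produces one IF-THEN rule per root-to-leaf path.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    label: str  # class predicted at this leaf


@dataclass
class Node:
    attribute: str  # attribute tested at this internal node
    branches: Dict[str, "Tree"] = field(default_factory=dict)  # outcome -> subtree


Tree = Union[Node, Leaf]


def classify(tree: Tree, record: Dict[str, str]) -> str:
    """Walk from the root, following the branch that matches each test outcome."""
    while isinstance(tree, Node):
        tree = tree.branches[record[tree.attribute]]
    return tree.label


def to_rules(tree: Tree, target: str = "buys_computer",
             conditions: Tuple[Tuple[str, str], ...] = ()) -> List[str]:
    """Emit one IF-THEN rule per root-to-leaf path."""
    if isinstance(tree, Leaf):
        lhs = " AND ".join(f"{attr} = {val}" for attr, val in conditions) or "TRUE"
        return [f"IF {lhs} THEN {target} = {tree.label}"]
    rules: List[str] = []
    for value, subtree in tree.branches.items():
        rules += to_rules(subtree, target, conditions + ((tree.attribute, value),))
    return rules


# Hypothetical hand-built tree (not learned from data).
tree = Node("age", {
    "youth": Node("student", {"no": Leaf("no"), "yes": Leaf("yes")}),
    "middle_aged": Leaf("yes"),
    "senior": Node("credit_rating", {"fair": Leaf("yes"), "excellent": Leaf("no")}),
})

print(classify(tree, {"age": "youth", "student": "yes"}))  # yes
for rule in to_rules(tree):
    print(rule)
```

Because every leaf yields exactly one rule, this tree has five leaves and converts into five mutually exclusive IF-THEN rules.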

[Figure: Neural Network]

There are many other methods for constructing classification models, such as Bayesian classification, support vector machines, and k-nearest neighbor classification.

Whereas classification predicts categorical (discrete, unordered) labels, prediction models continuous-valued functions. That is, it is used to predict missing or unavailable numerical data values rather than class labels. Although the term prediction may refer to both numeric prediction and class label prediction, it is used here primarily to mean numeric prediction.

Cluster Analysis

Unlike classification and prediction, which analyze class-labeled data objects, clustering analyzes data objects without consulting a known class label. The objects are grouped based on the principle of maximizing the intraclass similarity and minimizing the interclass similarity. That is, clusters of objects are formed so that objects within a cluster have high similarity to one another but are very dissimilar to objects in other clusters.

Outlier Analysis

A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are outliers. Most data mining methods discard outliers as noise or exceptions. The analysis of outlier data is referred to as outlier mining.
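The notes do not prescribe a particular outlier test. As one simple statistical illustration, the sketch below flags values outside Tukey's interquartile-range fences, a common rule of thumb. This is an assumed, swapped-in technique, not the author's method. The function name `iqr_outliers`, the 1.5 multiplier, and the sample amounts are all hypothetical.

```python
import statistics
from typing import List


def iqr_outliers(values: List[float], k: float = 1.5) -> List[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles; Python 3.8+
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical purchase amounts; 250 does not fit the general behavior of the data.
amounts = [52, 48, 50, 47, 53, 49, 51, 250]
print(iqr_outliers(amounts))  # [250]
```

In the spirit of outlier mining, the flagged value is returned for inspection rather than silently dropped as noise.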

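The cluster-analysis principle above (high similarity within a cluster, low similarity between clusters) can be illustrated with k-means, one widely used partitioning method. The notes do not name a clustering algorithm, so k-means is a substitute chosen for this example. It alternates between two steps:

- assign each point to its nearest centroid;
- move each centroid to the mean of its members.

The 2-D points, k = 2, and the random seed are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Plain k-means: returns (centroids, cluster label for each point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, labels


# Hypothetical data: two well-separated groups of objects.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

On this data, the three points near (1, 1) end up in one cluster and the three near (8.5, 8.5) in the other. Which cluster receives label 0 depends on the random start. k-means needs k in advance and is sensitive to initialization, so it is common to run several random restarts and keep the best result.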

Lecture II: Data Mining System Categorization and Its Issues

Data Mining Systems

There is a large variety of data mining systems available. Data mining systems may integrate techniques from the following −
• Spatial Data Analysis
• Information Retrieval
• Pattern Recognition
• Image Analysis
• Signal Processing
• Computer Graphics
• Web Technology
• Business
• Bioinformatics

Data Mining System Classification

A data mining system can be classified according to the disciplines it draws on −
• Database Technology
• Statistics
• Machine Learning
• Information Science
• Visualization
• Other Disciplines

Apart from these, a data mining system can also be classified based on the kind of (a) databases mined, (b) knowledge mined, (c) techniques utilized, and (d) applications adapted.

Text from page-6

Classification Based on the Databases Mined

We can classify a data mining system according to the kind of databases mined. Database systems can be classified according to different criteria, such as data models or types of data, and the data mining system can be classified accordingly. For example, if we classify a database according to the data model, then we may have a relational, transactional, object-relational, or data warehouse mining system.

Classification Based on the Kind of Knowledge Mined

We can classify a data mining system according to the kind of knowledge mined, that is, on the basis of data mining functionalities such as −
• Characterization
• Discrimination
• Association and Correlation Analysis
• Classification
• Prediction
• Outlier Analysis
• Evolution Analysis

Classification Based on the Techniques Utilized

We can classify a data mining system according to the kind of techniques used. These techniques can be described according to the degree of user interaction involved or the methods of analysis employed.

Classification Based on the Applications Adapted

We can classify a data mining system according to the applications adapted. These applications are as follows −
• Finance
• Telecommunications
• DNA
• Stock Markets
• E-mail

Integrating a Data Mining System with a DB/DW System

If a data mining system is not integrated with a database or a data warehouse system, then there is no database or warehouse system for it to communicate with. This scheme is known as the no-coupling scheme. In this scheme, the main focus is on data mining design and on developing efficient and effective algorithms for mining the available data sets.

The integration schemes are as follows −
• No Coupling − In this scheme, the data mining system does not use any of the database or data warehouse functions. It fetches the data from a particular source, processes that data using some data mining algorithms, and stores the data mining result in another file.