
Data Mining And Data Warehousing

by Jntu Heroes
Type: Note
Institute: JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY

SYLLABUS:

Module – I
Data Mining overview, Data Warehouse and OLAP Technology, Data Warehouse Architecture, Steps for the Design and Construction of Data Warehouses, A Three-Tier Data Warehouse Architecture, OLAP, OLAP queries, metadata repository, Data Preprocessing – Data Integration and Transformation, Data Reduction, Data Mining Primitives: What Defines a Data Mining Task? Task-Relevant Data, The Kind of Knowledge to be Mined, KDD

Module – II
Mining Association Rules in Large Databases, Association Rule Mining, Market Basket Analysis: Mining A Road Map, The Apriori Algorithm: Finding Frequent Itemsets Using Candidate Generation, Generating Association Rules from Frequent Itemsets, Improving the Efficiency of Apriori, Mining Frequent Itemsets without Candidate Generation, Multilevel Association Rules, Approaches to Mining Multilevel Association Rules, Mining Multidimensional Association Rules from Relational Databases and Data Warehouses, Multidimensional Association Rules, Mining Quantitative Association Rules, Mining Distance-Based Association Rules, From Association Mining to Correlation Analysis

Module – III
What Is Classification? What Is Prediction? Issues Regarding Classification and Prediction, Classification by Decision Tree Induction, Bayesian Classification, Bayes Theorem, Naïve Bayesian Classification, Classification by Backpropagation, A Multilayer Feed-Forward Neural Network, Defining a Network Topology, Classification Based on Concepts from Association Rule Mining, Other Classification Methods, k-Nearest Neighbor Classifiers, Genetic Algorithms, Rough Set Approach, Fuzzy Set Approaches, Prediction, Linear and Multiple Regression, Nonlinear Regression, Other Regression Models, Classifier Accuracy

Module – IV
What Is Cluster Analysis, Types of Data in Cluster Analysis, A Categorization of Major Clustering Methods, Classical Partitioning Methods: k-Means and k-Medoids, Partitioning Methods in Large Databases: From k-Medoids to CLARANS, Hierarchical Methods, Agglomerative and Divisive Hierarchical Clustering, Density-Based Methods, WaveCluster: Clustering Using Wavelet Transformation, CLIQUE: Clustering High-Dimensional Space, Model-Based Clustering Methods, Statistical Approach, Neural Network Approach.

DEPT OF CSE & IT
Chapter-1

1.1 What Is Data Mining?

Data mining refers to extracting or "mining" knowledge from large amounts of data. The term is actually a misnomer, since what is extracted is knowledge rather than data. Data mining would therefore have been more appropriately named knowledge mining, a name that emphasizes mining knowledge from large amounts of data.

It is the computational process of discovering patterns in large data sets, using methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.

The key properties of data mining are:
Automatic discovery of patterns
Prediction of likely outcomes
Creation of actionable information
Focus on large datasets and databases

1.2 The Scope of Data Mining

Data mining derives its name from the similarities between searching for valuable business information in a large database (for example, finding linked products in gigabytes of store scanner data) and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material or intelligently probing it to find exactly where the value resides. Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing these capabilities:
Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases. Questions that traditionally required extensive hands-on analysis can now be answered quickly and directly from the data. A typical example of a predictive problem is targeted marketing: data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events.

Automated discovery of previously unknown patterns. Data mining tools sweep through databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors.

1.3 Tasks of Data Mining

Data mining involves six common classes of tasks:

Anomaly detection (outlier/change/deviation detection) – The identification of unusual data records that might be interesting, or of data errors that require further investigation.

Association rule learning (dependency modelling) – Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.

Clustering – The task of discovering groups and structures in the data that are in some way "similar", without using known structures in the data.

Classification – The task of generalizing known structure to apply to new data. For example, an e-mail program might attempt to classify an e-mail as "legitimate" or as "spam".

Regression – Attempts to find a function that models the data with the least error.
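As a rough illustration of the association rule learning task described above, the sketch below counts how often pairs of products appear in the same basket and estimates how often one product accompanies another. The transactions and the helper names `pair_support` and `confidence` are hypothetical, invented for this example; this is a brute-force count under those assumptions, not the Apriori algorithm covered later in the syllabus.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def pair_support(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

pairs = pair_support(transactions)
print(pairs.most_common(2))
print(confidence(transactions, "butter", "bread"))
```

In this toy data every basket with butter also contains bread, which is the kind of "frequently bought together" relationship a supermarket might act on.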
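The regression task, finding a function that models the data with the least error, can be sketched for the simplest case: a straight line fitted by ordinary least squares. The function name `fit_line` and the sample points are assumptions made for this example, and least squares is only one possible definition of "least error".

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical sample points lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With noisy data the fitted line would no longer pass through every point, but it would still minimize the total squared vertical distance to the points.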
