


International Journal of Innovations in Engineering and Technology (IJIET)

Data Mining System, Functionalities and Applications: A Radical Review

Dr. Poonam Chaudhary
System Programmer, Kurukshetra University, Kurukshetra

Abstract: Data mining is the process of locating potentially useful, interesting and previously unknown patterns in a large volume of data. It plays an important role in result-oriented work. Data mining can be applied in almost every aspect of life, and it is significant in areas including sales/marketing, revenue services, sports, health care and insurance. This paper gives a general overview of the data mining system, its functionalities and its applications.

Keywords: Applications, Data Mining Architecture, Data Mining Challenges, Functionalities.

I. INTRODUCTION

Data mining involves the use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets [1]. Data mining tools predict future trends and behaviors, helping organizations to make proactive, knowledge-driven decisions [2]. Questions that were traditionally tedious to settle can be settled with data mining tools. Data mining is also known as Knowledge Discovery in Databases (KDD), the nontrivial extraction of implicit, previously unknown and potentially useful information from data in databases. Although data mining and knowledge discovery in databases are frequently treated as synonyms, data mining is actually one part of the knowledge discovery process [3,4,5]. Figure 1 shows the steps of the knowledge discovery process.

Fig. 1: Data Mining is the core of Knowledge Discovery Process [4]

II. DATA MINING ARCHITECTURE

The architecture of a data mining system is formed of several elements, namely the Data Mining Engine, Pattern Evaluation, the Data Warehouse Server, the User Interface and the Knowledge Base.
The architecture of the data mining system is presented in Fig. 2.

2.1 Knowledge Base: The knowledge base is a centralized store that is used to collect information and to evaluate patterns.

2.2 Data Mining Engine: The data mining engine is an essential element of the data mining system. It consists of functional modules that perform tasks such as clustering, classification, prediction, association and correlation analysis, and characterization.

Volume 5 Issue 2 April 2015 449 ISSN: 2319 – 1058


2.3 Pattern Evaluation Module: This module applies interestingness measures and communicates with the data mining engine to find interesting patterns.

[Figure 2 labels, in order: User Interface; Pattern Evaluation; Data Mining Engine; Knowledge Base; Database; Data Cleaning, Consolidation and Selection; World Wide Web; Database; Data Warehouse]

Fig. 2: Data mining system architecture

2.4 User Interface: The user interface module mediates between the user and the data mining system. It allows the user to interact with the system by specifying a query, to provide information that helps focus the search, and to carry out exploratory data mining based on intermediate data mining results.

III. DATA MINING SYSTEM CLASSIFICATION:

Data mining systems can be classified according to the following criteria:

1. Visualization
2. Database Technology
3. Machine Learning
4. Information Science
5. Other Disciplines


[Figure labels: Data Mining; Visualization; Database Technology; Statistics; Machine Learning; Other Disciplines; Information Science]

3.1 Some Other Classification Criteria: Data mining systems can also be divided on the basis of the other criteria listed below.

3.1.1 Classification according to the type of data sources mined: This classification depends on the type of data used, such as text data, multimedia data, World Wide Web data, spatial data and time series data.

3.1.2 Classification according to the kind of databases mined: This classification is based on the kind of database being mined, such as a relational database, transactional database, data warehouse or object-oriented database.

3.1.3 Classification according to the kind of knowledge mined: This classification is based on the kind of knowledge discovered, that is, on data mining functionalities such as clustering, prediction, association and correlation analysis, discrimination, outlier analysis and characterization.

3.1.4 Classification based on the type of techniques used: This classification is based on the techniques utilized, such as genetic algorithms, machine learning, neural networks, database-oriented or data warehouse-oriented techniques, statistics and visualization.

IV. DATA MINING FUNCTIONALITIES:

Data mining functionalities specify the type of patterns to be found in data mining tasks. Data mining tasks can be divided into two categories.

4.1 Descriptive Tasks: These tasks present the general properties of the data stored in a database. Descriptive tasks are used to find patterns in data, such as clusters, correlations, trends and anomalies.
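A descriptive task of the kind named in Section 4.1 can be illustrated with a minimal sketch that summarizes one attribute and measures the correlation between two attributes. The data, the attribute names and the helper functions `describe` and `pearson` are hypothetical and chosen only for illustration; the paper itself does not prescribe any implementation.

```python
from statistics import mean, pstdev


def describe(values):
    """Summarize one numeric attribute: count, mean, population standard deviation."""
    return {"count": len(values), "mean": mean(values), "stdev": pstdev(values)}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric attributes."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: monthly advertising spend and units sold.
ad_spend = [10, 20, 30, 40, 50]
units_sold = [12, 25, 31, 38, 52]

summary = describe(units_sold)   # general properties of one attribute
r = pearson(ad_spend, units_sold)  # strength of the linear relationship
```

A correlation close to 1 or -1 describes a strong linear relationship in the stored data; it describes the data as it is and does not by itself predict new values.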
4.2 Predictive Tasks: Predictive data mining tasks predict the value of one attribute on the basis of the values of other attributes. The attribute being predicted is known as the target or dependent variable, and the attributes used for making the prediction are known as independent variables.
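The relationship between a target variable and an independent variable can be sketched with a simple least-squares line. This is only one possible predictive technique, swapped in here for illustration; the data and the function names `fit_line` and `predict` are assumptions, not part of the paper.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict the target value for a new value of the independent variable."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data: years of experience (independent variable)
# and salary in thousands (target, or dependent, variable).
years = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]

model = fit_line(years, salary)
predicted = predict(model, 6)  # estimate for an unseen value
```

Here `salary` plays the role of the dependent variable and `years` the role of the independent variable; with several independent variables, a multiple regression or another model would be needed.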


Data mining functionalities are described as follows:

4.3 Prediction: A predictive model determines a future outcome rather than present behavior. The predicted attribute of a predictive model can be numeric or categorical. Prediction involves finding the set of characteristics relevant to the attribute of interest and predicting the value distribution based on the set of data similar to the selected object(s). For example, one may predict the kind of disease based on the symptoms of a patient.

4.4 Classification: Classification builds a model from data with predefined classes; the model is then used to classify new instances whose class is not known. The instances used to create the model are known as training data. A decision tree or a set of classification rules is a typical form of such a model and can be used to classify future data. For example, one may classify an employee's potential salary on the basis of the salary classification of similar employees in the company.

4.5 Clustering: Clustering is the process of partitioning a set of objects or data into groups called clusters, such that the objects in one cluster are more similar (in some sense or another) to each other than to those in other clusters. Clustering is used in many fields, including machine learning, pattern recognition, bioinformatics, image analysis and information retrieval.

4.6 Mining Frequent Patterns, Associations and Correlations: A frequent pattern is a pattern (a set of items, a subsequence, a substructure, etc.) that appears frequently in data. A frequent item set is a set of items that frequently occur together in a transaction data set, for example a table and a chair. A subsequence is an ordered series of events, for example buying a computer system first, then a UPS, and thereafter a printer.
When such a subsequence appears frequently in a shopping history database, it is called a frequent sequential pattern. A substructure can refer to structural forms such as subgraphs or subtrees. If a substructure appears frequently, it is called a frequent structural pattern. Discovering such frequent patterns plays an important role in mining associations and correlations, in clustering and in other data mining tasks.

4.7 Outlier Analysis: An outlier is an object in the database that is significantly different from the rest of the data. "An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism" [6]. Outliers are also referred to as deviants, abnormalities, discordants and anomalies in the data mining and statistics literature. Outliers can be detected with the help of statistical tests that assume a probability model for the data.

V. APPLICATIONS OF DATA MINING:

5.1 Data mining applications in sales/marketing: Data mining extracts previously unknown patterns from databases, which helps in planning, organizing, managing and launching in new markets in a cost-effective way. Data mining plays an important role in Market Basket Analysis. It gives information about the item sets that are purchased together, the sequence in which they are purchased and when they were bought. This information helps to promote the business and make it more profitable.

5.2 Data mining applications in banking/finance: Data mining can be used in numerous areas of the financial and banking sector, such as credit analysis, detection of fraudulent transactions, customer segmentation and profitability, optimizing stock portfolios, predicting payment
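The frequent item sets of Section 4.6, which underlie the Market Basket Analysis of Section 5.1, can be sketched by counting how often pairs of items occur together. This is a brute-force illustration under stated assumptions, not the Apriori algorithm or any method taken from the paper; the transactions, the threshold and the function name `frequent_itemsets` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, size=2):
    """Return item sets of the given size whose support is at least min_support.

    Support is the fraction of transactions that contain every item in the set.
    """
    counts = Counter()
    for basket in transactions:
        # Sort and de-duplicate so each item set has one canonical tuple form.
        for itemset in combinations(sorted(set(basket)), size):
            counts[itemset] += 1
    n = len(transactions)
    return {items: c / n for items, c in counts.items() if c / n >= min_support}


# Hypothetical shopping baskets.
transactions = [
    ["table", "chair", "lamp"],
    ["table", "chair"],
    ["chair", "lamp"],
    ["table", "chair", "rug"],
]

pairs = frequent_itemsets(transactions, min_support=0.5)
```

With these baskets, the pair of chair and table appears in three of four transactions, so it passes a support threshold of 0.5. Enumerating every combination is only practical for small baskets; real systems prune the candidate sets.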
