Note for Data Mining - DM by Sonali Dash

International Journal of Innovations in Engineering and Technology (IJIET), Volume 5 Issue 2, April 2015, ISSN: 2319 – 1058

2.3 Pattern Evaluation Module: This module applies interestingness measures and interacts with the data mining engine to find interesting patterns.

[Fig. 2: Data mining system architecture. Components shown: User Interface; Pattern Evaluation; Data Mining Engine; Knowledge Base; Database; data cleaning, consolidation and selection; World Wide Web, database and data warehouse.]

2.4 User Interface: The user interface module mediates between the user and the data mining system. It lets the user interact with the system by specifying a query, supplying information that helps focus the search, and carrying out exploratory data mining based on intermediate data mining results.

III. DATA MINING SYSTEM CLASSIFICATION: The following criteria are used to classify data mining systems:
1. Visualization
2. Database Technology
3. Machine Learning
4. Information Science
5. Other Disciplines

[Figure: Data Mining and its related disciplines: Visualization, Database Technology, Statistics, Machine Learning, Information Science and Other Disciplines.]

3.1 Some Other Classification Criteria: Data mining systems can also be classified on the basis of the criteria below.

3.1.1 Classification according to the type of data sources mined: This classification depends on the type of data handled, such as text data, multimedia data, World Wide Web data, spatial data and time-series data.

3.1.2 Classification according to the kind of databases mined: This classification is based on the kind of database being mined, such as a relational database, transactional database, data warehouse or object-oriented database.

3.1.3 Classification according to the kind of knowledge mined: This classification is based on the kind of knowledge discovered, that is, on data mining functionalities such as characterization, discrimination, association and correlation analysis, prediction, clustering and outlier analysis.

3.1.4 Classification based on the type of techniques used: This categorization is based on the techniques utilized, such as genetic algorithms, machine learning, neural networks, database-oriented or data warehouse-oriented techniques, statistics and visualization.

IV. DATA MINING FUNCTIONALITIES: Data mining functionalities specify the kinds of patterns to be found in data mining tasks. Data mining tasks can be divided into two categories.

4.1 Descriptive Tasks: These tasks characterize the general properties of the data stored in a database. Descriptive tasks are used to find patterns in the data, such as clusters, correlations, trends and anomalies.
4.2 Predictive Tasks: Predictive data mining tasks predict the value of one attribute on the basis of the values of other attributes. The attribute being predicted is known as the target or dependent variable, and the attributes used for making the prediction are known as independent variables.
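The relationship between independent and dependent variables described in 4.2 can be illustrated with a minimal sketch. It assumes a single numeric independent variable and fits a straight line by ordinary least squares, which is only one of many possible predictive models. The function names and the experience/salary figures are hypothetical and do not come from the paper.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Predict the dependent variable for a new value of the independent one."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data (illustrative values only).
experience_years = [1, 2, 3, 4, 5]       # independent variable
salary_thousands = [30, 35, 40, 45, 50]  # target (dependent) variable

model = fit_line(experience_years, salary_thousands)
print(predict(model, 6))  # prints 55.0
```

Here the model is learned from known pairs and then applied to an unseen value of the independent variable, which is what distinguishes a predictive task from a purely descriptive summary of the existing data.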

Data mining functionalities are described as follows:

4.3 Prediction: A predictive model determines a future outcome rather than present behavior. The predicted attribute of a predictive model can be numeric or categorical. Prediction involves finding the set of attributes relevant to the attribute of interest and predicting the value distribution based on the set of data similar to the selected object(s). For example, one may predict the kind of disease based on the symptoms of a patient.

4.4 Classification: Classification builds a model from data with predefined classes; the model is then used to classify new instances whose class is not known. The instances used to create the model are known as training data. The model can take the form of a decision tree or a set of classification rules, which can then be used to classify future data. For example, one may classify an employee's potential salary on the basis of the salary classification of similar employees in the company.

4.5 Clustering: Clustering is the process of partitioning a set of objects into groups called clusters, such that the objects in one cluster are more similar (in some sense or another) to each other than to those in other clusters. Clustering is used in many fields, including machine learning, pattern recognition, bioinformatics, image analysis and information retrieval.

4.6 Mining Frequent Patterns, Associations and Correlations: A frequent pattern is a pattern (a set of items, a subsequence, a substructure, etc.) that appears frequently in the data. A frequent itemset is a set of items that frequently occur together in a transaction data set, for example a table and a chair. A subsequence is an ordered series of events, such as first buying a computer system, then a UPS, and thereafter a printer.
If such a subsequence appears frequently in a shopping history database, it is called a frequent sequential pattern. A substructure refers to a particular structural form, such as a subgraph or a subtree. If a substructure appears frequently, it is called a frequent structural pattern. Discovering such frequent patterns plays an important role in mining associations and correlations, in clustering and in other data mining tasks.

4.7 Outlier Analysis: An outlier is an object in the database that is significantly different from the rest of the data. “An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism”[6]. Outliers are also referred to as deviants, abnormalities, discordants and anomalies in the data mining and statistics literature. Outliers can be detected with statistical tests that assume a probability model for the data.

V. APPLICATIONS OF DATA MINING:

5.1 Data mining applications in sales/marketing: Data mining extracts previously unknown patterns from databases, which helps in planning, organizing, managing and launching in new markets in a cost-effective way. Data mining plays an important role in market basket analysis. It gives information about the item sets that are purchased together, the sequence in which they are purchased and when they were bought. This information helps to promote the business and make it more profitable.

5.2 Data mining applications in banking/finance: Data mining can be used in many areas of the financial and banking sector, including credit analysis, detection of fraudulent transactions, customer segmentation and profitability, optimizing stock portfolios, predicting payment default, ranking investments, marketing, identifying high-risk loan applicants, cash management and forecasting operations, identifying the most profitable credit card customers, and cross-selling.

5.3 Data mining applications in health care and insurance: The growth of the insurance industry depends entirely on the ability to transform data into information about customers, competitors and the market. The insurance industry has implemented data mining successfully and has achieved tremendous competitive advantages. Data mining is applied in claims analysis, for example to identify medical procedures that are claimed together. It enables insurers to forecast which potential customers will buy new schemes, to detect risky customer behavior patterns and to detect fraudulent behavior.

5.4 Data mining for the retail industry: The retail industry collects huge amounts of data on sales and customers' shopping history. Retail data mining helps in analyzing customer behavior, shopping patterns and trends. This improves the quality of customer service, improves goods consumption ratios, supports more effective goods transportation and distribution policies, achieves better customer retention and satisfaction, and minimizes the cost of business.

5.5 Data mining for the telecommunications industry: Telecommunication companies generally generate and store large amounts of high-quality data, have very large customer bases, and operate in a rapidly changing and highly competitive environment. They use data mining to enhance their marketing efforts, to detect fraud and to improve their telecommunication networks.
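Fraud detection recurs in sections 5.2, 5.3 and 5.5, and section 4.7 notes that outliers can be diagnosed with statistical tests that assume a probability model for the data. The following is a minimal sketch of one such test, a z-score check that assumes the values are roughly normally distributed. The function name, the threshold of 2.5 and the call-minute figures are hypothetical and chosen only for illustration; the paper does not prescribe this method.

```python
from statistics import mean, pstdev


def flag_outliers(values, threshold=2.5):
    """Return (index, value) pairs whose z-score magnitude exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


# Hypothetical daily call minutes for one subscriber (illustrative values only).
call_minutes = [10, 12, 11, 9, 10, 11, 12, 10, 9, 95]

print(flag_outliers(call_minutes))  # prints [(9, 95)]
```

With the population standard deviation, the largest z-score attainable in a sample of n values is sqrt(n - 1), which is 3 for these ten points, so the threshold has to sit below that bound to flag anything. On real data a median-based test may be more robust, because a single extreme value inflates both the mean and the standard deviation.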
5.6 Data mining applications in higher education: Data mining can be used effectively to address student and alumni challenges. It enables institutions to use their current reporting capabilities to uncover and understand hidden patterns in huge databases. These patterns are then built into data mining models and used to predict individual behavior accurately. As a result of this insight, institutions are able to allocate resources and staff efficiently. Data mining can give an institution the information necessary to take action before a student drops out, or to allocate resources efficiently with an accurate estimate of how many students will take a particular course.

5.7 Data mining for intrusion detection: An intrusion is a set of actions that threatens the availability and integrity of a network resource. Network intrusion detection has been considered one of the most promising methods for defending against complex and dynamic intrusion behaviors. Intrusion detection techniques using data mining have attracted increasing interest in recent years. Data mining techniques used for intrusion detection include frequent pattern mining, classification, clustering and mining data streams. Areas where data mining technology can be applied to intrusion detection include the development of data mining algorithms for intrusion detection, aggregation to help select and build discriminating attributes, association and correlation analysis, analysis of stream data, visualization, distributed data mining and querying tools.

VI. CHALLENGES IN DATA MINING: At present, data mining research is “too ad hoc”, and there are many challenges in unifying different data mining tasks. Some of the challenges in the area are as follows:

6.1. Scalability: