According to this view, data mining is only one step in the knowledge discovery
process. However, in industry, in media, and in the database research milieu, the term
data mining is becoming more popular than the longer term knowledge discovery
from data. Therefore, in this book, we choose to use the term data mining.
Based on this view, the architecture of a typical data mining system may have the
following major components.
Database, Data Warehouse, World Wide Web, or Other Information Repository:
This is a single database or a set of databases, data warehouses, spreadsheets,
or other kinds of information repositories. Data cleaning and data integration
techniques may be performed on the data.
Database or Data Warehouse Server: The database or data warehouse server is
responsible for fetching the relevant data, based on the user’s data mining
request.
Knowledge Base: This is the domain knowledge that is used to guide the search
or evaluate the interestingness of resulting patterns. It can be stored simply
as a set of rules. Such knowledge can include concept hierarchies, used to
organize attributes or attribute values into different levels of abstraction.
Data Mining Engine: This is essential to the data mining system and ideally
consists of a set of functional modules for tasks such as characterization,
association and correlation analysis, classification, prediction, cluster analysis,
outlier analysis, and evolution analysis.
Pattern Evaluation Module: This component typically employs interestingness
measures and interacts with the data mining modules so as to focus the search
toward interesting patterns. It may use interestingness thresholds to filter out
uninteresting patterns.
User Interface: This module communicates between users and the data mining
system, allowing the user to interact with the system by specifying a data mining
query or task. In addition, this component allows the user to browse database
and data warehouse schemas or data structures, evaluate mined patterns, and
visualize the patterns in different forms.
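
To make the division of labor among these components concrete, the following is a minimal sketch in Python, not a real system. Everything in it is an assumption made for illustration: the in-memory REPOSITORY, the CONCEPT_HIERARCHY mapping, the function names, and the choice of pair support as the interestingness measure. Each function stands in for one component described above.

```python
from itertools import combinations

# Hypothetical information repository: each transaction is a set of items.
REPOSITORY = [
    {"laptop", "mouse"},
    {"laptop", "mouse", "printer"},
    {"laptop", "printer"},
    {"mouse", "keyboard"},
]

# Hypothetical knowledge base: a one-level concept hierarchy that maps
# attribute values to a higher level of abstraction.
CONCEPT_HIERARCHY = {
    "laptop": "computer",
    "mouse": "accessory",
    "keyboard": "accessory",
    "printer": "peripheral",
}


def fetch_relevant_data(repository, required_item):
    """Server role: return only the transactions relevant to the request."""
    return [t for t in repository if required_item in t]


def generalize(transactions, hierarchy):
    """Knowledge base role: lift each item to its higher-level concept."""
    return [{hierarchy.get(item, item) for item in t} for t in transactions]


def mine_pairs(transactions):
    """Mining engine role: compute the support of every co-occurring pair."""
    counts = {}
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] = counts.get(pair, 0) + 1
    n = len(transactions)
    return {pair: count / n for pair, count in counts.items()}


def evaluate(patterns, min_support):
    """Pattern evaluation role: keep patterns that meet the threshold."""
    return {p: s for p, s in patterns.items() if s >= min_support}


def run_query(required_item, min_support):
    """User interface role: accept a request and return evaluated patterns."""
    data = fetch_relevant_data(REPOSITORY, required_item)
    data = generalize(data, CONCEPT_HIERARCHY)
    return evaluate(mine_pairs(data), min_support)


result = run_query("laptop", 0.6)
for pair, support in sorted(result.items()):
    print(pair, round(support, 2))
```

In this sketch the user's request is reduced to two parameters, an item of interest and a minimum support. A real system would accept a much richer data mining query, and the evaluation step would typically be interleaved with mining rather than applied afterward.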