Subject: Data and Web Mining
Subject Code: PECS5409
Semester: 8th
Applicable Branch(es): CSE
Question Code: J237
Year of Examination: 2015
Exam Type (Special/Regular): Regular
Solution Prepared by:
Sl. No.   Faculty Name   Email ID
1
2
NATIONAL INSTITUTE OF SCIENCE & TECHNOLOGY
PALUR HILLS, BERHAMPUR, ORISSA – 761008, INDIA
1. a. Concept hierarchies are among the most important forms of background knowledge in data mining. Incorporating concept hierarchies into attribute-oriented induction (AOI) has made AOI one of the most successful techniques in data mining. Concept hierarchies are used in various algorithms, such as characteristic rule mining, multiple-level association mining, classification, and prediction. The complexity of a concept hierarchy is defined in terms of the number of its interior nodes and the depth and height of each of these interior nodes. This complexity is then used to measure the interestingness of the discovered knowledge rules.
b.
c. Predictive: Predict the value of a specific attribute (the target or dependent variable) based on the values of other attributes (the explanatory variables). Example: judging whether a patient has a specific disease based on his or her medical test results.
Descriptive: Derive patterns (correlations, trends, trajectories) that summarize the underlying relationships in the data. Example: identifying web pages that are accessed together (a human-interpretable pattern).
d. Partitional methods: Given a database of n objects or data tuples, a partitioning method constructs k partitions of the data, where each partition represents a cluster and k ≤ n. That is, it classifies the data into k groups, which together satisfy the following requirements: (1) each group must contain at least one object, and (2) each object must belong to exactly one group.
Hierarchical methods: A hierarchical method creates a hierarchical decomposition of the given set of data objects. A hierarchical method can be classified as being either agglomerative or divisive, based on how the hierarchical decomposition is formed. The agglomerative approach, also called the bottom-up approach, starts with each object forming a separate group.
It successively merges the objects or groups that are close to one another, until all of the groups are merged into one (the topmost level of the hierarchy), or until a termination condition holds. The divisive approach, also called the top-down approach, starts with all of the objects in the same cluster. In each successive iteration, a cluster is split up into smaller clusters, until eventually each object is in one cluster, or until a termination condition holds.
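The agglomerative (bottom-up) approach described above can be sketched as follows. This is only a minimal illustration and not part of the original solution: it assumes 1-D numeric points and single-link distance (the smallest gap between two groups), and the names single_link_distance, agglomerative, and target_clusters are hypothetical. The target_clusters parameter plays the role of the termination condition.

```python
from itertools import combinations


def single_link_distance(a, b):
    """Smallest pairwise distance between two clusters of 1-D points."""
    return min(abs(x - y) for x in a for y in b)


def agglomerative(points, target_clusters=1):
    """Bottom-up clustering sketch (assumed single-link, 1-D data)."""
    # Start with each object forming its own separate group.
    clusters = [[p] for p in points]
    merges = []  # record of the groups created by each merge, in order
    # Merge the two closest groups until the termination condition holds.
    while len(clusters) > target_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_link_distance(
                clusters[pair[0]], clusters[pair[1]]
            ),
        )
        merged = sorted(clusters[i] + clusters[j])
        merges.append(merged)
        # Drop the two merged groups and add their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges


clusters, merges = agglomerative([1.0, 1.5, 5.0, 5.2, 9.0], target_clusters=3)
print(sorted(clusters))
print(merges)
```

With target_clusters=1 the loop keeps merging until a single group remains, which corresponds to the topmost level of the hierarchy. A divisive method would run in the opposite direction, starting from one group and splitting it.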
Density-based methods: The general idea is to continue growing the given cluster as long as the density (number of objects or data points) in the “neighborhood” exceeds some threshold.
Grid-based methods: Grid-based methods quantize the object space into a finite number of cells that form a grid structure. All of the clustering operations are performed on the grid structure.
e. Decision tree induction is the learning of decision trees from class-labeled training tuples. A decision tree is a flowchart-like tree structure, where each internal node (nonleaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. The topmost node in a tree is the root node. Internal nodes are denoted by rectangles, and leaf nodes are denoted by ovals. Some decision tree algorithms produce only binary trees (where each internal node branches to exactly two other nodes), whereas others can produce nonbinary trees.
“How are decision trees used for classification?” Given a tuple, X, for which the associated class label is unknown, the attribute values of the tuple are tested against the decision tree. A path is traced from the root to a leaf node, which holds the class prediction for that tuple. Decision trees can easily be converted to classification rules.
f. 1. Predicting the outcome of tossing a pair of dice is not a data mining task.
2. Predicting the future stock price of a company using historical records is a data mining task.
g. Web content mining targets knowledge discovery in which the main objects are the traditional collections of multimedia documents, such as images, video, and audio, that are embedded in or linked to web pages. It is quite different from data mining because Web data are mainly semi-structured and/or unstructured, while data mining deals primarily with structured data.
Web content mining is also different from text mining because of the semi-structured nature of the Web, while text mining focuses on unstructured texts. Web content mining thus requires creative applications of data mining and/or text mining techniques, as well as its own unique approaches.
Web usage mining focuses on techniques that can predict the behavior of users while they interact with the WWW. It discovers user navigation patterns from web data, extracting useful information from the secondary data derived from users' interactions while surfing the Web. Web usage mining collects data from Web log records to discover user access patterns of web pages. Several research projects and commercial tools analyze those patterns for different purposes. The resulting knowledge can be used in personalization, system improvement, site modification, business intelligence, and usage characterization.
h. PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.
j.
Opinion spam refers to "illegal" activities (e.g., writing fake reviews, also called shilling) that try to mislead readers or automated opinion mining and sentiment analysis systems. It does so by giving undeserved positive opinions to some target entities in order to promote them, and/or by giving false negative opinions to other entities in order to damage their reputations. Opinion spam has many forms, e.g., fake reviews (also called bogus reviews), fake comments, fake blogs, fake social network postings, deceptions, and deceptive messages.
2.a