Big Data Endterm Practice Questions

Have a holly and jolly end semester exam

Unit-1:

1. What are the characteristics of data?
2. Classify digital data with suitable examples.
3. Outline the challenges associated with unstructured data.
4. Define big data with an example.
5. Write down the names of the 5 V's of big data.
6. Write down four applications of big data analytics.
7. Explain the different types of analytics used in big data.
8. Identify the roles of the stakeholders involved in a data analytics project.
9. Differentiate between Business Intelligence and Big Data analytics.
10. Discuss the different phases of the data analytics life cycle.
11. What is distributed computing? Explain the working of a distributed computing environment.
12. What are the top challenges faced in Big Data, and what kinds of technology would you recommend to mitigate them?
13. What is shared-nothing architecture, and how is it related to shared-disk and shared-memory architectures?
14. What are the advantages of shared-nothing architecture?
15. Explain the CAP theorem and prove it.
16. What is the Data Analytics Life Cycle, and what are the different phases/stages associated with it?
17. Discuss the roles and responsibilities of the different stakeholders in a data analytics project.
18. What is the Big Data Analytics Life Cycle, and what are the different phases/stages associated with it?
19. How is the Data Analytics Life Cycle different from the Big Data Analytics Life Cycle?
20. What is the role of the Analytics Sandbox, and in which phase of the data analytics life cycle is it used?
21. Discuss the similarities and differences between ELT and ETL.
22. Discuss the differences between a parallel system and a distributed system.
23. Explain the following:
    a. Traditional Analytics Architecture
    b. Modern In-Database Analytics Architecture
    c. MPP Database Analytics Architecture
    d. In-Memory Computing
24. Differentiate between business intelligence and business analytics.
25. Identify the basic characteristics of Big Data used in social networking.
26. Place each of the following in the relevant structured, unstructured, or semi-structured basket: images, e-mail, CSV files, JSON data, chat conversations, web logs.
27. List the differences between reporting and analysis.
28. In what ways does analyzing Big Data help organizations prevent fraud?

Unit-2:

1. Explain the conceptual data model, logical data model, and physical data model with suitable examples.
2. List the major functions of the Big Data architecture model.
3. List the components of the Big Data architecture.
4. Explain the functioning of the ingestion layer in the Big Data architecture.
5. Discuss the key building blocks of the Hadoop platform management layer.
6. What is the role of the analytical engine in the Big Data environment? Describe the different types of engines used to analyze Big Data.
7. Explain data streams with suitable examples.

8. Discuss the similarities and differences between SQL and NoSQL.
9. Explain the rule-based and learning-based approaches with suitable examples.
10. What are the characteristics of a Big Data streaming system?
11. Explain the difference between data-at-rest and data-in-motion with suitable examples.
12. What is stream computing, and how is it different from traditional computing?
13. Explain the Bloom filter algorithm with a suitable example.
14. Discuss Bloom filter performance.
15. An empty Bloom filter has size 11 and 4 hash functions:
    a. h1(x) = (3x + 3) mod 6
    b. h2(x) = (2x + 9) mod 2
    c. h3(x) = (3x + 7) mod 8
    d. h4(x) = (2x + 3) mod 5
    Illustrate Bloom filter insertion of 7 and then 8. Then perform a Bloom filter lookup/membership test for 10 and 48.
16. Calculate the optimal number of hash functions for a 10-bit Bloom filter with 3 input elements. Plot the graph and discuss what happens as the number of hash functions increases.
17. Calculate the probability of false positives for a table of size 10 when 3 items are inserted.
18. Calculate the probability that a slot is set to 1 after insertion of 5 elements into a 15-bit Bloom filter.
19. Calculate the probability that a slot is not set to 1 after insertion of 5 elements into a 15-bit Bloom filter.
20. Calculate the probability that a slot is hashed with 5 hash functions for a 15-bit Bloom filter.
21. Explain the algorithm for counting distinct elements in a stream with a suitable example.
22. List the use cases of the Bloom filter.
23. Explain the algorithm for detecting false positives in a Bloom filter.
24. Explain the difference between a false positive and a false negative with suitable examples.
25. Design the Flajolet-Martin algorithm to count distinct elements in a stream. Show a step-by-step process to count the distinct elements in the data stream {16, 8, 24, 69, 3, 4, 9, 12, 12, 14, 18, 27, 8, 13, 90, 190, 112} with the hash function h(x) = (5x + 1) mod 6 of size 13.
26. Discuss probabilistic and non-probabilistic data sampling methods using suitable examples.
27. What is the role of the hypervisor? Discuss the different types of hypervisor with suitable examples.
28. What is virtualization, and what are the different types/classes of virtualization?
29. What are the benefits of virtualization?
30. Illustrate how Cloud and Big Data are related to each other.
31. What is a data stream management system (DSMS)?

Unit-3:

1. What is NoSQL, and why is it used?
2. Discuss different use cases of NoSQL.
3. Explain the different types of NoSQL databases with suitable examples.

4. What are the differences between column-oriented and row-oriented databases? Explain with a pictorial depiction.
5. What are the differences between SQL and NoSQL?
6. Describe each component of the Hadoop ecosystem.
7. Explain the Hadoop 2.0 architecture with a pictorial depiction. Explain the concept of blocks in the HDFS architecture.
8. Explain the Hadoop 2.0 HDFS daemons with a pictorial depiction, and explain their roles.
9. What do you understand by rack awareness and replication? Explain with a suitable example.
10. Explain the anatomy of a file read and a file write in Hadoop 2.0 HDFS with a pictorial depiction.
11. Why was 128 MB chosen as the default block size in Hadoop 2.x? What was the size in Hadoop 1.x?
12. How does MapReduce work? Explain with a suitable example.
13. In what circumstances is MapReduce useful? In which cases is it not suitable?
14. Draw the MapReduce process to count the number of words for the input: Dog Cat Rat Rat Bat Monkey Car Car Rat Car Dog car Rat Monkey Rat Rat Rat Car Car Bat
15. Explain data locality with suitable examples. Explain the difference between moving computation and moving data in a cluster.
16. What is YARN, and what is it used for?
17. Explain the YARN architecture and discuss the roles of its components.
18. How is an application submitted to YARN successfully executed?
19. What is the need for Apache Pig, and how is it different from MapReduce?
20. Explain the Apache Pig architecture and discuss the roles of its components.
21. Discuss the Apache Pig execution modes and execution mechanisms.
22. Discuss different operators of Apache Pig with suitable examples.
23. Discuss Hive DDL (creation of databases, tables, etc.) with suitable examples.
24. Discuss different operators of Apache Hive with suitable examples.
25. Discuss HiveQL (SELECT-WHERE, SELECT-ORDER BY, SELECT-GROUP BY, SELECT-JOIN) with suitable examples.
26. Explain Hive partitioning with a suitable example.
27. How does Sqoop work? Discuss its import and export utilities.
28. What is HBase? Mention the differences between HBase and RDBMS.
29. Explain the HBase architecture and discuss the concept of regions. Discuss its storage mechanism.

Unit-4:

1. A cashier has currency notes of denominations 10, 50, and 100. If the amount to be withdrawn is input through the keyboard in hundreds, write an R script to find the total number of currency notes of each denomination the cashier will have to give to the withdrawer.
2. Ramesh's basic salary is input through the keyboard. His dearness allowance is 40% of his basic salary, and his house rent allowance is 20% of his basic salary. Write an R script to calculate his gross salary.

3. Write an R script to check whether an integer is an Armstrong number. If the sum of the cubes of the digits of a number is equal to the number itself, the number is called an Armstrong number. For example, 153 = (1 * 1 * 1) + (5 * 5 * 5) + (3 * 3 * 3).
4. Write an R script to reverse a number.
5. Write an R script to sum the series S = 1 + (1 + 2) + (1 + 2 + 3) + ... + (1 + 2 + 3 + ... + n).
6. Write an R script to evaluate the sum of the series 1 + 2 + 3 + ... + N using a recursive function.
7. Write an R script to convert a decimal number into binary using a recursive function.
8. Write an R script to find the factorial of a number using a recursive function.
9. Write an R script with a function that receives 5 numbers and displays their sum, average, and standard deviation.
10. Write an R script to input data for a matrix and check whether the given matrix is symmetric.
11. The nth triangular number is given by n * (n + 1) / 2. Create a sequence of the first 20 triangular numbers. R has a built-in constant, `letters`, that contains the lowercase letters of the Roman alphabet. Name the elements of the vector you just created with the first 20 letters of the alphabet. Select the triangular numbers whose name is a vowel.
12. A cricket team has the following table of batting figures from a series of test matches:

    | Player's Name | Runs | Innings | Times not out |
    |---|---|---|---|
    | Sachin | 8430 | 150 | 18 |
    | Rahul | 4235 | 158 | 9 |
    | Saurabh | 6789 | 168 | 11 |
    | Virat | 9898 | 200 | 13 |

    And so on…

    Write an R script to read the figures set out in the above form, calculate the batting average, and print the complete table including the averages.
13. Write an R script to print a table of values of the function y = e^(-x) for x varying from 0 to 10 in steps of 0.1.
14. An electricity board charges the following rates to domestic users to discourage large consumption of energy: for the first 100 units, Rs 30 per unit; for the next 200 units, Rs 80 per unit; beyond 300 units, Rs 90 per unit. All users are charged a minimum of Rs 500.00. If the total amount is more than Rs 3000.00, an additional charge of 15% is added. Write an R script to read the names of users and the number of units consumed, and then print out the charges with the names.
15. Write an R script to represent a vector (a series of floating-point values) with functions for:
    a. Creating the vector
    b. Modifying the value of a given element
    c. Multiplying by a scalar value
    d. Displaying the vector
16. Discuss, with proper examples, the following R functions: apply(), rbind(), order(), skip().
17. Create a data frame with the following variables: `Died.At <- c(22,40,72,41)` `Writer.At <- c(16, 18, 36, 36)`
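For Unit-2, questions 15 to 20, a minimal Python sketch of a Bloom filter can help check the hand calculations. It uses the four hash functions exactly as given in question 15 and a plain list as the bit array. The names `BloomFilter`, `prob_bit_still_zero`, `prob_bit_set`, `false_positive_rate`, and `optimal_k` are illustrative and not from the question bank. The probability helpers use the usual textbook formulas, which assume each hash picks a slot uniformly and independently; they take the number of hash functions k as a parameter because questions 17 to 20 do not state it.

```python
import math

# The four hash functions exactly as given in Unit-2, question 15.
HASHES = [
    lambda x: (3 * x + 3) % 6,
    lambda x: (2 * x + 9) % 2,
    lambda x: (3 * x + 7) % 8,
    lambda x: (2 * x + 3) % 5,
]


class BloomFilter:
    """Bit-array Bloom filter using caller-supplied hash functions (illustrative)."""

    def __init__(self, size, hashes):
        self.bits = [0] * size
        self.hashes = hashes

    def positions(self, x):
        # Wrap each hash into the table; the question's hashes already fit in 11 bits.
        return [h(x) % len(self.bits) for h in self.hashes]

    def insert(self, x):
        for pos in self.positions(x):
            self.bits[pos] = 1

    def might_contain(self, x):
        # True means "possibly present" (could be a false positive);
        # False means "definitely not present" (no false negatives).
        return all(self.bits[pos] for pos in self.positions(x))


def prob_bit_still_zero(m, n, k):
    """Probability a given bit is still 0 after n inserts with k hashes: (1 - 1/m)^(kn)."""
    return (1 - 1 / m) ** (k * n)


def prob_bit_set(m, n, k):
    """Complement of the above: probability a given bit has been set to 1."""
    return 1 - prob_bit_still_zero(m, n, k)


def false_positive_rate(m, n, k):
    """Standard approximation of the false-positive rate: (1 - e^(-kn/m))^k."""
    return (1 - math.exp(-k * n / m)) ** k


def optimal_k(m, n):
    """Real-valued k minimising the approximate false-positive rate: (m/n) ln 2."""
    return (m / n) * math.log(2)


if __name__ == "__main__":
    bf = BloomFilter(11, HASHES)
    for item in (7, 8):
        bf.insert(item)
        print(f"insert {item}: positions {bf.positions(item)} -> bits {bf.bits}")
    for query in (10, 48):
        verdict = "possibly present" if bf.might_contain(query) else "definitely absent"
        print(f"lookup {query}: positions {bf.positions(query)} -> {verdict}")
    print("optimal k for m=10, n=3:", round(optimal_k(10, 3), 2))
```

Traced by hand, 7 maps to bits 0, 1, 4, 2 and 8 maps to bits 3, 1, 7, 4, so the filter holds 1s at positions 0, 1, 2, 3, 4 and 7. The lookup for 10 hits bit 5 through h3, which is still 0, so 10 is definitely absent. Every position for 48 (3, 1, 7, 4) is already set, so the filter answers "possibly present" even though 48 was never inserted: a false positive. Note that h2 always returns 1 because 2x + 9 is odd, and no hash can reach bits 8 to 10. For question 16, (10/3) ln 2 ≈ 2.31, so k = 2 is the nearest integer, and the approximate false-positive rate is about 0.204 at k = 2 versus about 0.209 at k = 3.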

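For Unit-2, question 25, the step-by-step Flajolet-Martin trace can be reproduced with a short Python sketch. It hashes each element with h(x) = (5x + 1) mod 6, counts the trailing zero bits r of each hash value, keeps the maximum R, and reports 2^R as the distinct-count estimate. This is a single-hash version for illustration only. The function names are hypothetical, and treating a hash value of 0 as having r = 0 is an assumption, since the question does not fix that case. The sketch does not model the "of size 13" detail of the question.

```python
# Stream and hash function as given in Unit-2, question 25.
STREAM = [16, 8, 24, 69, 3, 4, 9, 12, 12, 14, 18, 27, 8, 13, 90, 190, 112]


def h(x):
    return (5 * x + 1) % 6


def trailing_zeros(v):
    """Number of trailing 0 bits in v.

    ASSUMPTION: a hash value of 0 is treated as having 0 trailing zeros;
    other conventions (e.g. counting every bit of a fixed-width word) give
    a different R for the element 13 in this stream.
    """
    if v == 0:
        return 0
    count = 0
    while v % 2 == 0:
        v //= 2
        count += 1
    return count


def flajolet_martin(stream, hash_fn):
    """Single-hash Flajolet-Martin estimate 2**R, where R = max trailing zeros."""
    max_r = 0
    for x in stream:
        hv = hash_fn(x)
        r = trailing_zeros(hv)
        print(f"x={x:>3}  h(x)={hv}  binary={hv:03b}  r={r}")
        max_r = max(max_r, r)
    return 2 ** max_r


if __name__ == "__main__":
    estimate = flajolet_martin(STREAM, h)
    print("estimate:", estimate, "| true distinct count:", len(set(STREAM)))
```

Under this convention the largest r is 2 (elements 69, 3, 9 and 27 all hash to 4, which is 100 in binary), so the estimate is 2^2 = 4, while the stream actually holds 15 distinct values. The gap shows why a hash with only six possible outputs is too coarse for Flajolet-Martin; practical versions use wide hash values and combine the estimates from several hash functions.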