
Note for Internet of Things - IOT by SHIVA PRASAD DAS

  • Internet of Things - IOT
  • Note
  • Silicon Institute of Technology - SIT

Text from page-2

Outline
  • Overview of Hadoop ecosystem
  • MapReduce architecture
  • MapReduce job execution flow
  • MapReduce schedulers

Book website: http://www.internet-of-things-book.com
Bahga & Madisetti, © 2015

Text from page-3

Hadoop Ecosystem
  • Apache Hadoop is an open source framework for distributed batch processing of big data.
  • Hadoop Ecosystem includes:
      • Hadoop MapReduce
      • HDFS
      • YARN
      • HBase
      • Zookeeper
      • Pig
      • Hive
      • Mahout
      • Chukwa
      • Cassandra
      • Avro
      • Oozie
      • Flume
      • Sqoop

Text from page-4

Apache Hadoop
  • A Hadoop cluster comprises a master node, a backup node, and a number of slave nodes.
  • The master node runs the NameNode and JobTracker processes, and the slave nodes run the DataNode and TaskTracker components of Hadoop.
  • The backup node runs the Secondary NameNode process.
  • NameNode
      • The NameNode keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept. It does not store the data of these files itself.
      • Client applications talk to the NameNode whenever they wish to locate a file, or when they want to add, copy, move, or delete a file.
  • Secondary NameNode
      • The NameNode is a single point of failure for the HDFS cluster. An optional Secondary NameNode, hosted on a separate machine, creates checkpoints of the namespace.
  • JobTracker
      • The JobTracker is the service within Hadoop that distributes MapReduce tasks to specific nodes in the cluster, ideally the nodes that have the data, or at least nodes in the same rack.
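
The NameNode and JobTracker roles described above can be illustrated with a small toy model. This is only a sketch and not Hadoop's actual implementation: the names `ToyNameNode`, `pick_node`, and `rack_of`, as well as the node names, rack names, and file path, are hypothetical and chosen for illustration. It shows that the NameNode answers with block locations only, and that a task is placed on a node holding the data when possible, otherwise on a node in the same rack.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Toy stand-in for a NameNode: it holds metadata only, never file contents."""
    file_blocks: dict = field(default_factory=dict)      # path -> list of block ids
    block_locations: dict = field(default_factory=dict)  # block id -> node names

    def add_file(self, path, blocks):
        # 'blocks' maps each block id to the nodes that store a replica of it.
        self.file_blocks[path] = list(blocks)
        self.block_locations.update(blocks)

    def locate(self, path):
        # Return (block id, replica nodes) pairs; no file data passes through here.
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


def pick_node(replica_nodes, free_nodes, rack_of):
    """Prefer a free node that has the data, then one in the same rack, then any."""
    for node in free_nodes:
        if node in replica_nodes:
            return node, "data-local"
    replica_racks = {rack_of[n] for n in replica_nodes}
    for node in free_nodes:
        if rack_of[node] in replica_racks:
            return node, "rack-local"
    return (free_nodes[0], "off-rack") if free_nodes else (None, "none")


# Hypothetical four-node cluster spread over two racks.
rack_of = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2"}

namenode = ToyNameNode()
namenode.add_file("/logs/a.txt", {"blk_1": ["n1", "n3"], "blk_2": ["n2", "n3"]})

# Only n4 and n2 currently have capacity for a new task.
locations = namenode.locate("/logs/a.txt")
placements = [pick_node(nodes, ["n4", "n2"], rack_of) for _, nodes in locations]
print(placements)  # [('n4', 'rack-local'), ('n2', 'data-local')]
```

In this run, `blk_1` has no free node holding a replica, so the task falls back to `n4`, which shares a rack with the replica on `n3`; `blk_2` is placed directly on `n2`, which holds the data.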

Text from page-5

Apache Hadoop
  • TaskTracker
      • A TaskTracker is a node in a Hadoop cluster that accepts Map, Reduce, and Shuffle tasks from the JobTracker.
      • Each TaskTracker has a defined number of slots, which indicates the number of tasks that it can accept.
  • DataNode
      • A DataNode stores data in an HDFS file system.
      • A functional HDFS file system has more than one DataNode, with data replicated across them.
      • DataNodes respond to requests from the NameNode for file system operations.
      • Client applications can talk directly to a DataNode once the NameNode has provided the location of the data.
      • Similarly, MapReduce operations assigned to TaskTracker instances near a DataNode talk directly to that DataNode to access the files.
      • TaskTracker instances can be deployed on the same servers that host DataNode instances, so that MapReduce operations are performed close to the data.
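
The slot limit and the direct client-to-DataNode read path described above can be sketched in the same toy style. Everything here is an illustrative assumption rather than Hadoop's API: the classes `ToyDataNode` and `ToyTaskTracker`, the helpers `write_replicated` and `client_read`, the replication factor of 2, and the plain dictionary that stands in for the NameNode's answer.

```python
class ToyDataNode:
    """Toy stand-in for a DataNode: it stores the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        return self.blocks[block_id]


class ToyTaskTracker:
    """Toy stand-in for a TaskTracker with a fixed number of task slots."""

    def __init__(self, name, slots):
        self.name = name
        self.slots = slots
        self.running = []

    def accept(self, task):
        # Refuse new work once every slot is occupied.
        if len(self.running) >= self.slots:
            return False
        self.running.append(task)
        return True


def write_replicated(datanodes, block_id, data, replication=2):
    # Store the same block on several DataNodes (replication=2 is an assumption).
    targets = datanodes[:replication]
    for dn in targets:
        dn.store(block_id, data)
    return [dn.name for dn in targets]


def client_read(block_locations, datanodes_by_name, block_id):
    # 'block_locations' plays the role of the NameNode's reply; the data itself
    # is fetched directly from the first reachable DataNode holding a replica.
    for name in block_locations[block_id]:
        dn = datanodes_by_name.get(name)
        if dn is not None and block_id in dn.blocks:
            return dn.read(block_id)
    raise KeyError(block_id)


datanodes = [ToyDataNode("dn1"), ToyDataNode("dn2"), ToyDataNode("dn3")]
locations = {"blk_1": write_replicated(datanodes, "blk_1", b"sensor,42")}

by_name = {dn.name: dn for dn in datanodes}
del by_name["dn1"]  # simulate one DataNode being unreachable
data = client_read(locations, by_name, "blk_1")

tracker = ToyTaskTracker("tt1", slots=2)
accepted = [tracker.accept(t) for t in ("map-0", "map-1", "reduce-0")]
print(data, accepted)  # b'sensor,42' [True, True, False]
```

Because the block is replicated on `dn1` and `dn2`, the read still succeeds after `dn1` is dropped, and the third task is rejected because the tracker has only two slots.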
