Properties

  • Block size determines the size of the chunks (blocks) that each file is split into.
  • Replication factor sets the cluster-wide default number of copies of each block that are kept across the data nodes.
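
As a rough illustration of how these two properties interact, the sketch below computes how many blocks a file is split into and how much raw storage it consumes once replicated. The function names and the example values (a 300 MiB file, 128 MiB blocks, replication factor 3) are assumptions chosen for illustration, not values taken from these notes.

```python
MIB = 1024 * 1024


def block_count(file_size_bytes: int, block_size_bytes: int) -> int:
    """Number of blocks a file is split into (the last block may be partial)."""
    # Integer ceiling division: round up so a partial last block is counted.
    return -(-file_size_bytes // block_size_bytes)


def raw_storage_bytes(file_size_bytes: int, replication_factor: int) -> int:
    """Total bytes used across the cluster once every block is replicated."""
    return file_size_bytes * replication_factor


# Assumed example values, for illustration only.
file_size = 300 * MIB
block_size = 128 * MIB
replication = 3

print(block_count(file_size, block_size))                # 3
print(raw_storage_bytes(file_size, replication) // MIB)  # 900
```

A smaller block size means more blocks per file, and therefore more metadata for the name node to track; a higher replication factor multiplies the raw storage used.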

Name Node

  • stores metadata of the file system
  • is aware of where the chunks of a file are stored
  • acts like a master node.
  • is aware of data node size
  • performs health checks on data nodes
      ↳ takes care of machines that fail and re-replicates data as necessary
  • can be aware of data centre geography
      * e.g. which rack a data node is in, which is useful when you don’t want to place a replica of a chunk in the same rack
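
The health-check and rack-awareness points above can be sketched as a small selection function: given the nodes that already hold a chunk, pick a live node for the next replica, preferring a rack that is not yet used. This is a simplified illustration and not the actual HDFS placement policy; the `DataNode` class, the `pick_replica_target` function, and the node and rack names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNode:
    name: str
    rack: str
    alive: bool = True  # result of the name node's latest health check


def pick_replica_target(nodes, current_holders):
    """Pick a live node that does not hold the chunk, preferring an unused rack."""
    used_racks = {n.rack for n in current_holders}
    holder_names = {n.name for n in current_holders}
    candidates = [n for n in nodes if n.alive and n.name not in holder_names]
    if not candidates:
        return None
    # False sorts before True, so nodes on racks without a replica come first.
    candidates.sort(key=lambda n: (n.rack in used_racks, n.name))
    return candidates[0]


# Hypothetical cluster: two nodes on rack-a, one on rack-b.
nodes = [
    DataNode("dn1", "rack-a"),
    DataNode("dn2", "rack-a"),
    DataNode("dn3", "rack-b"),
]

target = pick_replica_target(nodes, current_holders=[nodes[0]])
print(target.name)  # dn3
```

If the node on the other rack has failed its health check, the function falls back to a live node on the same rack rather than leaving the chunk under-replicated.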

General flow of file insertion

  1. Client sends a request to the name node to store a file
  2. Name node tells the client which data node to store the file on
  3. Client stores the file at Data Node 1
  4. Data Node 1 replicates the data to other data nodes (e.g. Data Node 2) upon instruction from the name node
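
The steps above can be sketched as a toy in-memory simulation. This is a minimal sketch under stated assumptions, not how a real cluster is implemented: `SimNameNode`, `SimDataNode`, and `client_put` are hypothetical names, the name node simply picks the first few data nodes, and the list of replica targets it chooses is handed along with the write so that each data node forwards the data to the next.

```python
class SimDataNode:
    def __init__(self, name):
        self.name = name
        self.files = {}  # file name -> data held on this node

    def store(self, filename, data, remaining_targets):
        self.files[filename] = data
        # Step 4: forward the data to the next replica target, if any.
        if remaining_targets:
            remaining_targets[0].store(filename, data, remaining_targets[1:])


class SimNameNode:
    def __init__(self, data_nodes, replication_factor):
        self.data_nodes = data_nodes
        self.replication_factor = replication_factor
        self.locations = {}  # metadata: file name -> names of nodes holding it

    def allocate(self, filename):
        # Steps 1-2: choose target nodes and record where the file will live.
        targets = self.data_nodes[: self.replication_factor]
        self.locations[filename] = [dn.name for dn in targets]
        return targets


def client_put(name_node, filename, data):
    targets = name_node.allocate(filename)          # steps 1-2
    targets[0].store(filename, data, targets[1:])   # steps 3-4


data_nodes = [SimDataNode("dn1"), SimDataNode("dn2"), SimDataNode("dn3")]
name_node = SimNameNode(data_nodes, replication_factor=2)
client_put(name_node, "a.txt", b"hello")

print(name_node.locations["a.txt"])        # ['dn1', 'dn2']
print("a.txt" in data_nodes[2].files)      # False
```

Note that the client only ever writes to the first data node; the name node holds no file data, only the record of where each file is stored.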