Properties
- Block size determines the size of the blocks (chunks) that each file is split into.
- Replication factor sets the global replication factor for the entire cluster, i.e. how many copies of each block are kept across the data nodes.
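A rough sketch of how the two properties interact, assuming a file is cut into fixed-size blocks and the last block only holds the remainder. The function name `block_layout` and the 300 MB / 128 MB / 3 values are illustrative assumptions, not taken from any real cluster.

```python
def block_layout(file_size_bytes, block_size_bytes, replication_factor):
    """Return (number of blocks, size of the last block, total raw bytes stored)."""
    if file_size_bytes == 0:
        return 0, 0, 0
    # Integer ceiling division: a partial block still counts as one block.
    num_blocks = (file_size_bytes + block_size_bytes - 1) // block_size_bytes
    # The last block only holds whatever is left over.
    last_block = file_size_bytes - (num_blocks - 1) * block_size_bytes
    # Every block is stored replication_factor times across the data nodes.
    raw_bytes = file_size_bytes * replication_factor
    return num_blocks, last_block, raw_bytes


MB = 1024 * 1024
# Illustrative values: a 300 MB file, 128 MB blocks, 3 replicas.
print(block_layout(300 * MB, 128 * MB, 3))
```

With these values the file needs 3 blocks (128 MB, 128 MB and 44 MB) and takes up 900 MB of raw storage across the cluster.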
Name Node
- stores metadata of the file system
- is aware of where the chunks of a file are stored
- acts like a master node.
- is aware of data node size
- performs health checks on data nodes
* handles machines that fail and re-replicates their data as necessary
- can be aware of data centre geography
* e.g. which rack a data node is in, which is useful when you don't want to replicate a chunk within the same rack
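A minimal sketch of what rack awareness could look like when choosing where the copies of a chunk go. This is a simplified illustration, not the placement policy of any real name node: the function `choose_replica_nodes`, the node names and the rack names are all hypothetical.

```python
def choose_replica_nodes(node_to_rack, replication_factor):
    """Pick data nodes for one chunk, preferring racks that are not used yet."""
    chosen = []
    used_racks = set()
    # First pass: take at most one node per rack.
    for node, rack in sorted(node_to_rack.items()):
        if len(chosen) == replication_factor:
            break
        if rack not in used_racks:
            chosen.append(node)
            used_racks.add(rack)
    # Second pass: if there are fewer racks than replicas, reuse racks.
    for node in sorted(node_to_rack):
        if len(chosen) == replication_factor:
            break
        if node not in chosen:
            chosen.append(node)
    return chosen


# Hypothetical topology: dn1 and dn2 share a rack.
topology = {"dn1": "rack-a", "dn2": "rack-a", "dn3": "rack-b", "dn4": "rack-c"}
print(choose_replica_nodes(topology, 3))
```

Here dn2 is skipped because dn1 already covers rack-a, so the three copies end up on three different racks and losing one rack does not lose the chunk.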
General flow of file insertion
- Client asks the name node to store a file
- Name node tells the client which data node to store the file on
- Client stores the file at Data Node 1
- Data Node 1 replicates the data to another data node (e.g. Data Node 2) upon instruction from the name node
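The insertion flow can be sketched as a toy in-memory simulation. It assumes that the name node is the component that picks the target data nodes and that the first data node forwards copies to the others. The classes `NameNode` and `DataNode`, the function `put_file` and the chunk id are hypothetical names for illustration, not a real client API.

```python
class DataNode:
    def __init__(self, name):
        self.name = name
        self.chunks = {}  # chunk_id -> bytes

    def store(self, chunk_id, data):
        self.chunks[chunk_id] = data

    def replicate_to(self, chunk_id, target):
        # Forward a chunk this node already holds to another data node.
        target.store(chunk_id, self.chunks[chunk_id])


class NameNode:
    def __init__(self, data_nodes, replication_factor):
        self.data_nodes = data_nodes
        self.replication_factor = replication_factor
        self.locations = {}  # metadata: chunk_id -> list of data node names

    def allocate(self, chunk_id):
        # Simplification: always pick the first N data nodes.
        targets = self.data_nodes[: self.replication_factor]
        self.locations[chunk_id] = [node.name for node in targets]
        return targets


def put_file(name_node, chunk_id, data):
    # Steps 1-2: client asks the name node, which answers with target nodes.
    targets = name_node.allocate(chunk_id)
    # Step 3: client stores the data at the first data node.
    primary = targets[0]
    primary.store(chunk_id, data)
    # Step 4: the first data node copies the data to the remaining targets.
    for replica in targets[1:]:
        primary.replicate_to(chunk_id, replica)
    return name_node.locations[chunk_id]


nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
name_node = NameNode(nodes, replication_factor=2)
print(put_file(name_node, "file.txt#0", b"hello"))
```

Note that the name node only records where the chunk lives; the data itself never passes through it.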