Wednesday, February 3, 2016

Hadoop / HDFS Commands


HDFS Command Syntax Overview:
hadoop fs [command] : Ex.: hadoop fs -ls
hadoop version : verify that Hadoop is installed properly
HELP:
-help [cmd] : display usage for the given command (or for all commands if none is given)
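
For example, to confirm the installation and look up a command's usage:
hadoop version
hadoop fs -help ls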


Inspect files:
-ls/-lsr : list files in a directory; -lsr lists recursively (Ex.: hadoop fs -ls /)
-cat : print a file's contents on stdout
-tail [-f] : output the last kilobyte of a file (-f keeps following as the file grows)

-test -[ezd] : check whether a path exists (-e), is zero length (-z), or is a directory (-d)
-touchz : create a new empty file of size 0
-du/-dus : show space utilization; -dus shows an aggregate summary

-count : number of directories, files, and bytes under a path
-setrep [-R] : change the replication factor of a file (with -R, recursively for a directory)
-stat : print information about the specified path
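
A quick walkthrough of a few of these (the path /user/demo is just an example):
hadoop fs -mkdir /user/demo
hadoop fs -touchz /user/demo/empty.txt
hadoop fs -ls /user/demo
hadoop fs -count /user/demo
hadoop fs -test -e /user/demo/empty.txt && echo "exists"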

Create/remove files:
-mkdir : create a directory
-mv : move (rename) files
-cp : copy files
-rm/-rmr : remove files; -rmr removes directories recursively
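
For example, continuing with the hypothetical /user/demo:
hadoop fs -mkdir /user/demo/tmp
hadoop fs -cp /user/demo/empty.txt /user/demo/tmp/copy.txt
hadoop fs -mv /user/demo/tmp/copy.txt /user/demo/tmp/renamed.txt
hadoop fs -rmr /user/demo/tmp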


Copy/Put files between the local machine and the HADOOP cluster:
-copyFromLocal : copy a local file to HDFS
-copyToLocal : copy a file from HDFS to the local disk

-cp : copies one or more files within HDFS
-get : copies files from HDFS to the local file system
-put : copies files from the local file system into HDFS
-mv : moves one or more files within HDFS
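
A typical round trip (file names are illustrative):
hadoop fs -put ./report.txt /user/demo/report.txt
hadoop fs -cat /user/demo/report.txt
hadoop fs -get /user/demo/report.txt ./report_copy.txt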

Hadoop Namenode Commands:
hadoop namenode -format: Format HDFS filesystem from Namenode
hadoop namenode -upgrade: Upgrade the NameNode
start-dfs.sh : Start HDFS daemons
stop-dfs.sh : Stop HDFS daemons
start-mapred.sh : Start MapReduce daemons
stop-mapred.sh : Stop MapReduce daemons
hadoop namenode -recover -force: Recover namenode metadata after a cluster failure (may lose data) 
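
Putting these together, a typical first-time startup on a fresh (Hadoop 1.x-style) cluster looks like this; note that -format erases all existing HDFS metadata, so run it only once:
hadoop namenode -format
start-dfs.sh
start-mapred.sh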


Hadoop Configuration Files:
core-site.xml : Parameters for entire Hadoop cluster
hdfs-site.xml : Parameters for HDFS and its clients
mapred-site.xml : Parameters for MapReduce and its clients

yarn-site.xml : Parameters for the NodeManager and ResourceManager
masters : Host machines for secondary Namenode
slaves : List of slave hosts

hadoop-env.sh : Sets environment variables for Hadoop; the Windows equivalent is hadoop-env.cmd, e.g.:
set JAVA_HOME=%JAVA_HOME%
set HADOOP_PREFIX=D:\Hadoop
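
As an illustration, a minimal core-site.xml for a single-node setup might contain just the default file system; the host and port below are assumptions for a local pseudo-distributed install:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>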


Hadoop Job Commands
hadoop job -submit : Submit the job
hadoop job -status : Print job status and completion percentage
hadoop job -list all : List all jobs
hadoop job -list-active-trackers : List all available TaskTrackers
hadoop job -set-priority : Set priority for a job. Valid priorities : VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW
hadoop job -kill-task : Kill a task
hadoop job -history : Display job history including job details, failed and killed jobs
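
For example, with a hypothetical job ID job_201602030001_0001 and task attempt ID attempt_201602030001_0001_m_000000_0:
hadoop job -status job_201602030001_0001
hadoop job -set-priority job_201602030001_0001 HIGH
hadoop job -kill-task attempt_201602030001_0001_m_000000_0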


Hadoop mradmin Commands
hadoop mradmin -safemode get : Check whether the JobTracker is in safe mode
hadoop mradmin -refreshQueues : Reload mapreduce configuration
hadoop mradmin -refreshNodes : Reload active TaskTrackers
hadoop mradmin -refreshServiceAcl : Force Jobtracker to reload service ACL
hadoop mradmin -refreshUserToGroupsMappings : Force jobtracker to reload user group mappings


Hadoop fsck Commands
hadoop fsck / : Filesystem check on HDFS
hadoop fsck / -files : Display files during check
hadoop fsck / -files -blocks : Display files and blocks during check
hadoop fsck / -files -blocks -locations : Display files, blocks, and their locations
hadoop fsck / -files -blocks -locations -racks : Display network topology for data-node locations
hadoop fsck / -delete : Delete corrupted files
hadoop fsck / -move : Move corrupted files to the /lost+found directory
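
fsck can also be pointed at a subtree instead of the whole file system, e.g.:
hadoop fsck /user/demo -files -blocks -locations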


Hadoop Balancer Commands
start-balancer.sh : Balance the cluster
hadoop dfsadmin -setBalancerBandwidth <bytes per second> : Adjust the maximum bandwidth used by the balancer
hadoop balancer -threshold 20 : Balance until each DataNode's disk usage is within 20% of the cluster average
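
For example, to cap balancer traffic at roughly 10 MB/s (10485760 bytes/s) before starting a balancing pass:
hadoop dfsadmin -setBalancerBandwidth 10485760
start-balancer.sh -threshold 10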


Hadoop Safe Mode (Maintenance Mode) Commands
The following dfsadmin commands let the cluster enter or leave safe mode, which is also called maintenance mode.
In this mode, the Namenode does not accept any changes to the namespace and does not replicate or delete blocks.
hadoop dfsadmin -safemode enter : Enter safe mode
hadoop dfsadmin -safemode leave : Leave safe mode
hadoop dfsadmin -safemode get : Get the current safe mode status
hadoop dfsadmin -safemode wait : Wait until HDFS finishes data block replication
hadoop dfsadmin -report : Report basic statistics and total usage for the cluster
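
A common maintenance sequence looks like:
hadoop dfsadmin -safemode enter
hadoop dfsadmin -safemode get
hadoop dfsadmin -safemode leave
hadoop dfsadmin -report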


Launching Hadoop Jobs:
hadoop jar <jarFile> [mainClass] args... : Launch a job via a jar file
hadoop jar <jarFile> com.twitter.scalding.Tool [mainClass] args : Launch a Scalding job
mapred job -kill <jobId> : Kill a running map-reduce job
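
For instance, running the word-count example that ships with Hadoop (the jar name varies by version, so treat it as an assumption):
hadoop jar hadoop-mapreduce-examples.jar wordcount /user/demo/input /user/demo/output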

Commonly Used Administration Commands:
Format the namenode: hadoop namenode -format
Starting secondary namenode: hadoop secondarynamenode
Run namenode : hadoop namenode
Run data node: hadoop datanode
Cluster Balancing: hadoop balancer
Run MapReduce job tracker node: hadoop jobtracker
Run MapReduce task tracker node: hadoop tasktracker


Start/Stop YARN (starts the resourcemanager and nodemanager) and DFS (starts the namenode and datanode) from the sbin directory:
start-yarn.sh, stop-yarn.sh
start-dfs.sh, stop-dfs.sh


Start and stop ALL daemons from the sbin directory:
start-all.sh, stop-all.sh


Check that all 5 daemons (Namenode, Secondary Namenode, JobTracker, DataNode, TaskTracker) are up using:
jps
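
On a healthy single-node cluster the output lists one line per daemon; the PIDs below are illustrative:
4825 NameNode
4932 SecondaryNameNode
5061 DataNode
5210 JobTracker
5347 TaskTracker
5598 Jps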

Hadoop2x-Eclipse-plugin :
Download => https://github.com/winghc/hadoop2x-eclipse-plugin/tree/master/release