HDFS Command Syntax Overview:
hadoop fs -<command> [args] : general form of an HDFS shell command
Ex.: hadoop fs -ls
hadoop version : print the installed Hadoop version (confirms Hadoop is installed properly)
HELP:
-help [cmd] : show usage for all commands, or only for cmd (hadoop fs -help ls)
Inspect files:
-ls/-lsr : list all files in a path; -lsr is recursive (hadoop fs -ls /)
-cat : print a file on stdout
-tail [-f] : output the last part of the file
-test -[ezd] : check a path (exists, zero length, is directory); the result is the exit code
-touchz : create a new empty file of size 0
-du/-dus : show space utilization
-count : no. of directories, files, and bytes
-setrep [-R] : change the replication factor of a file/directory (-R for recursive)
-stat : info about the specified path
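A minimal sketch of how the inspection commands can be combined in a script. It assumes the hadoop client is on PATH and configured for the cluster; hdfs_summary is a hypothetical helper name, not part of Hadoop.

```shell
# Print a short summary of an HDFS path, or report that it is missing.
# hdfs_summary is a hypothetical helper; only the hadoop fs calls are real.
hdfs_summary() {
    # -test -e exits 0 when the path exists, non-zero otherwise
    if hadoop fs -test -e "$1"; then
        hadoop fs -ls "$1"       # entries under the path
        hadoop fs -du "$1"       # space used by each entry
        hadoop fs -count "$1"    # directory count, file count, bytes
    else
        echo "missing: $1"
        return 1
    fi
}
```

Usage: hdfs_summary /user (any HDFS path works; /user is only an illustration).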
Create/remove files:
-mkdir : create a directory
-mv : move (rename) files
-cp : copy files
-rm/-rmr : remove files (-rmr removes recursively)
Copy files between the local machine and the Hadoop cluster:
-copyFromLocal : copy a local file to the HDFS
-copyToLocal : copy a file on the HDFS to the local disk
-cp : copies one or more files
-get : copies files to the local file system
-put : copies files from the local file system
-mv : moves one or more files
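The copy commands above can be sketched as a round trip: create a directory, upload a local file, then fetch it back. hdfs_roundtrip and its three arguments are hypothetical names chosen for illustration.

```shell
# Upload a local file into a new HDFS directory, then copy it back out.
# $1 = local source file, $2 = HDFS directory, $3 = local destination
hdfs_roundtrip() {
    # Plain -mkdir fails if the directory already exists;
    # Hadoop 2.x also accepts -mkdir -p to create missing parents.
    hadoop fs -mkdir "$2" &&
    hadoop fs -put "$1" "$2/" &&
    hadoop fs -get "$2/$(basename "$1")" "$3"
}
```

Usage: hdfs_roundtrip ./a.txt /tmp/demo ./a.copy (all three paths are placeholders). The && chain stops at the first command that fails.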
Hadoop Namenode Commands:
hadoop namenode -format: Format HDFS filesystem from Namenode
hadoop namenode -upgrade: Upgrade the NameNode
start-dfs.sh : Start HDFS daemons
stop-dfs.sh : Stop HDFS daemons
start-mapred.sh : Start MapReduce daemons
stop-mapred.sh : Stop MapReduce daemons
hadoop namenode -recover -force: Recover namenode metadata after a cluster failure (may lose data)
Hadoop Configuration Files:
core-site.xml : Parameters for entire Hadoop cluster
hdfs-site.xml : Parameters for HDFS and its clients
mapred-site.xml : Parameters for MapReduce and its clients
yarn-site.xml : Parameters for NodeManager and ResourceManager
masters : Host machines for secondary Namenode
slaves : List of slave hosts
hadoop-env.sh : Sets environment variables for Hadoop; on Windows the file is hadoop-env.cmd, e.g.:
set JAVA_HOME=%JAVA_HOME%
set HADOOP_PREFIX=D:\Hadoop
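As an illustration of what goes into core-site.xml, the sketch below writes a minimal file with a single property. write_core_site is a hypothetical helper, and the NameNode address in the usage line is an assumption; fs.defaultFS is the Hadoop 2.x property name (Hadoop 1.x used fs.default.name).

```shell
# Write a minimal core-site.xml that points clients at one NameNode.
# $1 = NameNode URI, $2 = output file
write_core_site() {
    cat > "$2" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>$1</value>
  </property>
</configuration>
EOF
}
```

Usage: write_core_site hdfs://localhost:9000 ./core-site.xml (host and port are placeholders for your own NameNode).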
Hadoop Job Commands
hadoop job -submit <job-file> : Submit the job
hadoop job -status <job-id> : Print job status and completion percentage
hadoop job -list all : List all jobs
hadoop job -list-active-trackers : List all available TaskTrackers
hadoop job -set-priority <job-id> <priority> : Set priority for a job. Valid priorities: VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW
hadoop job -kill-task <task-id> : Kill a task
hadoop job -history <output-dir> : Display job history including job details, failed and killed jobs
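A small sketch that guards -set-priority against a mistyped priority before calling Hadoop. set_job_priority is a hypothetical wrapper; the list of priorities is the one given above.

```shell
# Validate the priority locally, then hand off to hadoop job.
# $1 = job id, $2 = priority
set_job_priority() {
    case "$2" in
        VERY_HIGH|HIGH|NORMAL|LOW|VERY_LOW)
            hadoop job -set-priority "$1" "$2" ;;
        *)
            echo "invalid priority: $2" >&2
            return 2 ;;
    esac
}
```

Usage: set_job_priority <job-id> HIGH, with the job id taken from hadoop job -list all.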
Hadoop mradmin Commands
hadoop mradmin -safemode get : Check JobTracker safe mode status
hadoop mradmin -refreshQueues : Reload the queue configuration (ACLs and states)
hadoop mradmin -refreshNodes : Reload the list of active TaskTrackers
hadoop mradmin -refreshServiceAcl : Force JobTracker to reload service ACL
hadoop mradmin -refreshUserToGroupsMappings : Force JobTracker to reload user group mappings
Hadoop fsck Commands
hadoop fsck / : Filesystem check on HDFS
hadoop fsck / -files : Display files during check
hadoop fsck / -files -blocks : Display files and blocks during check
hadoop fsck / -files -blocks -locations : Display files, blocks and their locations
hadoop fsck / -files -blocks -locations -racks : Display network topology for data-node locations
hadoop fsck / -delete : Delete corrupted files
hadoop fsck / -move : Move corrupted files to /lost+found directory
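The reporting flags above build on one another, which the sketch below expresses as a detail level. fsck_report and its numeric level are hypothetical conveniences, not fsck options.

```shell
# Run fsck with progressively more detail.
# $1 = HDFS path, $2 = level: 0 (summary), 1 (+files), 2 (+blocks), 3 (+locations)
fsck_report() {
    level="${2:-0}"
    flags=""
    [ "$level" -ge 1 ] && flags="$flags -files"
    [ "$level" -ge 2 ] && flags="$flags -blocks"
    [ "$level" -ge 3 ] && flags="$flags -locations"
    # $flags is left unquoted on purpose so each option is a separate argument
    hadoop fsck "$1" $flags
}
```

Usage: fsck_report / 2 runs hadoop fsck / -files -blocks.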
Hadoop Balancer Commands
start-balancer.sh : Balance the cluster
hadoop dfsadmin -setBalancerBandwidth <bytes-per-second> : Adjust bandwidth used by the balancer
hadoop balancer -threshold 20 : Balance until each DataNode's disk utilization is within 20 percentage points of the cluster average
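The threshold is a tolerance around the cluster-wide utilization: a DataNode counts as balanced when its own utilization differs from the average by no more than the threshold. A worked sketch of that arithmetic, using whole percentages (within_threshold is a hypothetical helper, not a Hadoop command):

```shell
# Exit 0 when a node is within the threshold of the cluster average.
# $1 = node utilization %, $2 = cluster average %, $3 = threshold
within_threshold() {
    diff=$(( $1 - $2 ))
    # take the absolute value of the difference
    [ "$diff" -lt 0 ] && diff=$(( 0 - diff ))
    [ "$diff" -le "$3" ]
}
```

With a cluster average of 40% and a threshold of 20, a node at 55% is balanced (difference 15), while a node at 70% is not (difference 30).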
Hadoop Safe Mode (Maintenance Mode) Commands
The following dfsadmin commands make the cluster enter or leave safe mode, also called maintenance mode.
In this mode the NameNode does not accept any changes to the namespace, and it does not replicate or delete blocks.
hadoop dfsadmin -safemode enter : Enter safe mode
hadoop dfsadmin -safemode leave : Leave safe mode
hadoop dfsadmin -safemode get : Get the status of mode
hadoop dfsadmin -safemode wait : Block until the NameNode leaves safe mode
hadoop dfsadmin -report : Report capacity, usage, and DataNode status for the cluster
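One possible maintenance pattern, sketched under the assumption that the caller has HDFS superuser rights: enter safe mode, run a single command, and leave safe mode again even if that command fails. with_safemode is a hypothetical wrapper name.

```shell
# Run one command while the NameNode is in safe mode.
# All arguments are the command to run.
with_safemode() {
    hadoop dfsadmin -safemode enter || return 1
    "$@"
    rc=$?
    # always leave safe mode, then report the wrapped command's exit code
    hadoop dfsadmin -safemode leave
    return "$rc"
}
```

Usage: with_safemode hadoop dfsadmin -report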
Launching Hadoop Jobs:
hadoop jar <jar-file> [mainClass] args... : Launch a job from a jar file
hadoop jar <jar-file> com.twitter.scalding.Tool [mainClass] args : Launch a Scalding job
mapred job -kill <job-id> : Kill a MapReduce job
Commonly Used Administration Commands:
Format the namenode: hadoop namenode -format
Run secondary namenode: hadoop secondarynamenode
Run namenode : hadoop namenode
Run data node: hadoop datanode
Cluster Balancing: hadoop balancer
Run MapReduce job tracker node: hadoop jobtracker
Run MapReduce task tracker node: hadoop tasktracker
Start/stop YARN (ResourceManager and NodeManager) and DFS (NameNode and DataNode) from the sbin directory (scripts end in .sh on Linux, .cmd on Windows):
start-yarn, stop-yarn
start-dfs, stop-dfs
Start and stop all daemons from the sbin directory:
start-all, stop-all
Check that all 5 daemons (NameNode, Secondary NameNode, JobTracker, DataNode, TaskTracker) are up using:
jps
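The jps check can be scripted. The sketch below reads jps output on stdin and prints every expected daemon that is absent. missing_daemons is a hypothetical helper, and the daemon list is the MapReduce v1 set named above; on a YARN cluster the list would presumably hold ResourceManager and NodeManager instead of JobTracker and TaskTracker.

```shell
# Read jps output on stdin; print each expected daemon that is not running.
missing_daemons() {
    jps_output="$(cat)"
    for daemon in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        # jps prints "<pid> <main class>", one JVM per line;
        # match the class name exactly so NameNode does not match SecondaryNameNode
        echo "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            echo "$daemon"
    done
}
```

Usage: jps | missing_daemons (no output means all five daemons are up).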
Hadoop2x--Eclipse-plugin :
Download => https://github.com/winghc/hadoop2x-eclipse-plugin/tree/master/release