BIPIN GUPTA: Hadoop / HDFS Commands

HDFS Command Syntax Overview:
hadoop fs -<cmd> [args] : Ex.: hadoop fs -ls /
hadoop version : check that Hadoop is installed properly (prints the installed version)

HELP:
-help [cmd] : show usage information for a command (hadoop fs -help ls), or for all commands if none is given

Inspect files:
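-ls/lsr <path> : list all files in <path> (hadoop fs -ls /); -lsr lists recursively
-cat <file> : print <file> on stdout
-tail [-f] <file> : output the last part of <file>
-test -[ezd] <path> : check whether <path> exists (-e), has zero length (-z) or is a directory (-d); the result is the exit status
-touchz <file> : create a new empty file of size 0
-du/dus <path> : show disk space utilization (-dus prints a summary)
-count <path> : no. of directories, files, and bytes under <path>
-setrep [-R] <rep> <path> : change the replication factor of a file, or recursively (-R) of a directory
-stat <path> : info about the specified path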
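
As a rough sketch of how the inspection commands above can be combined, the script below walks through a few of them for one file. The path /user/demo/data.txt and the DRY_RUN and TARGET variables are assumptions made for illustration. By default the commands are only printed; set DRY_RUN=0 on a machine with Hadoop installed to run them.

```shell
#!/bin/sh
# Sketch: inspect one file on HDFS.
# With DRY_RUN=1 (the default) each command is printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"
TARGET="${TARGET:-/user/demo/data.txt}"   # hypothetical path

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

inspect_file() {
    run hadoop fs -ls "$1"      # listing entry for the path
    run hadoop fs -du "$1"      # space used
    run hadoop fs -stat "$1"    # info about the path
    run hadoop fs -tail "$1"    # last part of the file
}

inspect_file "$TARGET"
```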

Create/remove files:
-mkdir : create a directory
-mv <src> <dst> : move (rename) files
-cp <src> <dst> : copy files
-rm/rmr <path> : remove files (-rmr removes recursively)

Copy files between the local machine and the Hadoop cluster:
-copyFromLocal <localsrc> <dst> : copy a local file to HDFS
-copyToLocal <src> <localdst> : copy a file on HDFS to the local disk
-cp <src> <dst> : copies one or more files within HDFS
-get <src> <localdst> : copies files from HDFS to the local file system
-put <localsrc> <dst> : copies files from the local file system to HDFS
-mv <src> <dst> : moves one or more files within HDFS
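
A minimal sketch of a round trip with -put and -get: upload a local file, download it again into a temporary directory, and compare the two copies. The function name roundtrip and its two arguments are assumptions for illustration. It expects the target HDFS directory to exist and not to contain a file of the same name already.

```shell
#!/bin/sh
# Sketch: upload a file to HDFS, fetch it back, and verify the copy is identical.
# Usage: roundtrip <local-file> <hdfs-dir>
roundtrip() {
    local_file=$1
    hdfs_dir=$2
    name=$(basename "$local_file")
    tmp=$(mktemp -d) || return 1

    hadoop fs -put "$local_file" "$hdfs_dir/" &&
        hadoop fs -get "$hdfs_dir/$name" "$tmp/" &&
        cmp -s "$local_file" "$tmp/$name"
    status=$?

    rm -rf "$tmp"
    return "$status"
}

# Only runs when the script is called with both arguments.
if [ "$#" -eq 2 ]; then
    if roundtrip "$1" "$2"; then
        echo "round trip OK"
    else
        echo "round trip FAILED"
    fi
fi
```

The function returns a non-zero status if either copy fails or if the downloaded file differs from the original.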

Hadoop Namenode Commands:
hadoop namenode -format : Format the HDFS filesystem from the NameNode
hadoop namenode -upgrade : Upgrade the NameNode
start-dfs.sh : Start HDFS daemons
stop-dfs.sh : Stop HDFS daemons
start-mapred.sh : Start MapReduce daemons
stop-mapred.sh : Stop MapReduce daemons
hadoop namenode -recover -force : Recover NameNode metadata after a cluster failure (may lose data)

Hadoop Configuration Files:
core-site.xml : Parameters for entire Hadoop cluster
hdfs-site.xml : Parameters for HDFS and its clients
mapred-site.xml : Parameters for MapReduce and its clients

yarn-site.xml : Parameters for nodemanager and resource manager

masters : Host machines for secondary Namenode
slaves : List of slave hosts

hadoop-env.sh : Sets environment variables for Hadoop (hadoop-env.cmd on Windows), e.g.:
set JAVA_HOME=%JAVA_HOME%
set HADOOP_PREFIX=D:\Hadoop

Hadoop Job Commands
hadoop job -submit <job-file> : Submit the job
hadoop job -status <job-id> : Print the job's completion percentage and status
hadoop job -list all : List all jobs
hadoop job -list-active-trackers : List all available TaskTrackers
hadoop job -set-priority <job-id> <priority> : Set priority for a job. Valid priorities: VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW
hadoop job -kill-task <task-id> : Kill a task
hadoop job -history <jobOutputDir> : Display job history including job details, failed and killed jobs
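
Since -set-priority only accepts the five values listed above, a wrapper can check the value before calling Hadoop. The sketch below only builds and prints the command line. The function names and the job id are hypothetical.

```shell
#!/bin/sh
# Sketch: validate a priority before building a "hadoop job -set-priority" call.
is_valid_priority() {
    case "$1" in
        VERY_HIGH|HIGH|NORMAL|LOW|VERY_LOW) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints the command line; pipe it to sh or copy it to actually run it.
set_priority_cmd() {
    job_id=$1
    priority=$2
    if is_valid_priority "$priority"; then
        echo "hadoop job -set-priority $job_id $priority"
    else
        echo "invalid priority: $priority" >&2
        return 1
    fi
}

# Hypothetical job id, for illustration only.
set_priority_cmd job_201602030000_0001 HIGH
```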

Hadoop mradmin Commands
hadoop mradmin -safemode get : Check whether the JobTracker is in safe mode
hadoop mradmin -refreshQueues : Reload the MapReduce queue configuration
hadoop mradmin -refreshNodes : Reload the list of active TaskTrackers
hadoop mradmin -refreshServiceAcl : Force the JobTracker to reload the service ACL
hadoop mradmin -refreshUserToGroupsMappings : Force the JobTracker to reload user-to-group mappings

Hadoop fsck Commands
hadoop fsck / : Filesystem check on HDFS
hadoop fsck / -files : Display files during check
hadoop fsck / -files -blocks : Display files and blocks during check
hadoop fsck / -files -blocks -locations : Display files, blocks and their locations
hadoop fsck / -files -blocks -locations -racks : Display network topology for data-node locations
hadoop fsck / -delete : Delete corrupted files
hadoop fsck / -move : Move corrupted files to the /lost+found directory
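
fsck ends its report with a line saying whether the filesystem under the checked path is HEALTHY or CORRUPT, so a script can reduce the report to a single word. A minimal sketch, assuming that summary line is present. The function name fsck_status and the sample text are made up for illustration, not captured from a real cluster.

```shell
#!/bin/sh
# Sketch: read fsck output on stdin and print HEALTHY or NOT HEALTHY.
fsck_status() {
    if grep -q "is HEALTHY"; then
        echo "HEALTHY"
    else
        echo "NOT HEALTHY"
    fi
}

# Illustrative sample of the final summary line.
sample="The filesystem under path '/' is HEALTHY"
printf '%s\n' "$sample" | fsck_status
```

On a live cluster the same function would be used as: hadoop fsck / | fsck_status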

Hadoop Balancer Commands
start-balancer.sh : Balance the cluster
hadoop dfsadmin -setBalancerBandwidth <bytes-per-second> : Adjust the network bandwidth each DataNode may use for balancing
hadoop balancer -threshold 20 : Balance until every DataNode's disk utilization is within 20 percentage points of the cluster average
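
The -threshold value is a percentage of disk capacity: a DataNode counts as balanced when its utilization differs from the cluster-wide average by no more than that many percentage points. The arithmetic can be sketched as below. The function name and the utilization figures are invented for illustration, and whole-number percentages are assumed.

```shell
#!/bin/sh
# Sketch: is a node's utilization within <threshold> points of the cluster average?
# Usage: within_threshold <node-pct> <cluster-avg-pct> <threshold>
within_threshold() {
    node=$1
    avg=$2
    threshold=$3
    if [ "$node" -ge "$avg" ]; then
        diff=$((node - avg))
    else
        diff=$((avg - node))
    fi
    [ "$diff" -le "$threshold" ]
}

# Illustrative figures: cluster average 50%, threshold 20.
for node_pct in 45 75; do
    if within_threshold "$node_pct" 50 20; then
        echo "$node_pct%: balanced"
    else
        echo "$node_pct%: needs balancing"
    fi
done
```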

Hadoop Safe Mode (Maintenance Mode) Commands
The following dfsadmin commands make the cluster enter or leave safe mode, also called maintenance mode.
In this mode, the NameNode does not accept any changes to the namespace, and it does not replicate or delete blocks.
hadoop dfsadmin -safemode enter : Enter safe mode
hadoop dfsadmin -safemode leave : Leave safe mode
hadoop dfsadmin -safemode get : Get the current safe mode status
hadoop dfsadmin -safemode wait : Block until the NameNode leaves safe mode
hadoop dfsadmin -report : Report capacity and usage for the cluster and each DataNode
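
The -safemode get command answers with a line containing "Safe mode is ON" or "Safe mode is OFF", which a script can turn into a simple state. A minimal sketch, assuming that wording. The function name safemode_state is made up for illustration, and the example feeds it a literal string rather than real command output.

```shell
#!/bin/sh
# Sketch: map the text printed by "hadoop dfsadmin -safemode get" to on/off/unknown.
safemode_state() {
    case "$1" in
        *"Safe mode is ON"*)  echo "on" ;;
        *"Safe mode is OFF"*) echo "off" ;;
        *)                    echo "unknown" ;;
    esac
}

# Literal sample input, for illustration only.
safemode_state "Safe mode is OFF"
```

On a live cluster: safemode_state "$(hadoop dfsadmin -safemode get)"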

Launching Hadoop Jobs:
hadoop jar <jar-file> [mainClass] args... : Launch a job via a jar file
hadoop jar <jar-file> com.twitter.scalding.Tool [mainClass] args... : Launch a Scalding job
mapred job -kill <job-id> : Kill a MapReduce job

Commonly Used Administration Commands:
Format the namenode: hadoop namenode -format
Start secondary namenode: hadoop secondarynamenode
Run namenode : hadoop namenode
Run data node: hadoop datanode
Cluster Balancing: hadoop balancer
Run MapReduce job tracker node: hadoop jobtracker
Run MapReduce task tracker node: hadoop tasktracker

Start/Stop YARN (starts resourcemanager and nodemanager) and DFS (starts namenode and datanode) from the sbin directory:
start-yarn, stop-yarn
start-dfs, stop-dfs

Start and stop ALL daemons from the sbin directory:
start-all, stop-all

Check that all 5 daemons (NameNode, Secondary NameNode, JobTracker, DataNode, TaskTracker) are up using:
jps
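
jps prints one line per running JVM in the form "<pid> <class name>", so the check can be scripted by looking for each expected daemon name in the second column. A minimal sketch. The function name missing_daemons and the sample output are invented for illustration, and the daemon list should be adjusted to the setup (for example ResourceManager and NodeManager under YARN).

```shell
#!/bin/sh
# Sketch: print every expected daemon that does not appear in jps output.
# Usage: missing_daemons "<jps output>"
missing_daemons() {
    for daemon in NameNode SecondaryNameNode JobTracker DataNode TaskTracker; do
        # Match the class name exactly in the second column.
        printf '%s\n' "$1" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            echo "$daemon"
    done
}

# Invented sample output with the TaskTracker missing.
sample="2101 NameNode
2215 DataNode
2340 SecondaryNameNode
2466 JobTracker
2890 Jps"
missing_daemons "$sample"
```

On a live node: missing_daemons "$(jps)"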

Hadoop2x--Eclipse-plugin :
Download => https://github.com/winghc/hadoop2x-eclipse-plugin/tree/master/release

Wednesday, February 3, 2016
