HDFS Basic Operations
-
Reference: "Expert Hadoop Administration: Managing, Tuning, and Securing Spark, YARN, and HDFS"
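List the root directory:
$ hdfs dfs -ls /
List the contents of one or more directories:
$ hdfs dfs -ls /user/hadoop/testdir1 /user/hadoop/testdir2
List only the directory entry itself, not its contents:
$ hdfs dfs -ls -d /user/alapati
Show the details of a file or path (a full URI also works):
$ hdfs dfs -ls /user/hadoop/testdir1/test1.txt
$ hdfs dfs -ls hdfs://<hostname>:9000/user/hadoop/dir1/
Show file metadata with a format string: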
$ hdfs dfs -stat "%n" /user/alapati/messages
The format string accepts these specifiers:
%b  Size of file in bytes
%F  Returns "file", "directory", or "symlink" depending on the type of inode
%g  Group name
%n  Filename
%o  HDFS block size in bytes (128 MB by default)
%r  Replication factor
%u  Username of owner
%y  Formatted mtime of inode
%Y  UNIX epoch mtime of inode
Create a directory:
$ hdfs dfs -mkdir hdfs://nn1.example.com/user/hadoop/dir
Add -p to create any missing parent directories along the way:
$ hdfs dfs -mkdir -p /user/hadoop/dir1
Delete a directory:
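$ hdfs dfs -rm -R /user/alapati
After deletion, the directory is kept in the trash for a period of time (provided trash is enabled):
$ hdfs dfs -ls /user/sam/.Trash
Empty the trash:
$ hdfs dfs -expunge
Change ownership: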
$ hdfs dfs -chown sam:produsers /data/customers/names.txt
Change the group:
$ sudo -u hdfs hdfs dfs -chgrp marketing /users/sales/markets.txt
Change permissions: see the chmod command in the permission control section.
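For the trash step earlier in this section, it can help to know where a deleted path is likely to end up. The sketch below assumes the default trash layout, /user/<user>/.Trash/Current/<original path>, which may not hold in every setup (encryption zones, for instance, keep their own trash directories). The function name trash_path is made up for this illustration.

```shell
# Print where a deleted HDFS path is expected to land in a user's trash.
# Assumption: the default layout /user/<user>/.Trash/Current/<original path>.
# trash_path is a hypothetical helper, not part of the hdfs CLI.
trash_path() {
    user=$1   # user who ran the delete
    path=$2   # absolute HDFS path that was deleted
    printf '/user/%s/.Trash/Current%s\n' "$user" "$path"
}
```

A possible use, before the trash is emptied, is to list the deleted directory or move it back with something like: hdfs dfs -mv "$(trash_path sam /user/alapati)" /user/alapati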
-
Query cluster information
$ hdfs dfsadmin -report
Configured Capacity: 2068027170816000 (1.84 PB) #A
Present Capacity: 2068027170816000 (1.84 PB)
DFS Remaining: 562576619120381 (511.66 TB) #A
DFS Used: 1505450551695619 (1.34 PB) #B
DFS Used%: 72.80% #B
Under replicated blocks: 1 #C
Blocks with corrupt replicas: 0
Missing blocks: 1
Missing blocks (with replication factor 1): 9 #C
Live datanodes (54): #D
Name: 10.192.0.78:50010 (hadoop02.localhost) #E
Hostname: hadoop02.localhost.com
Rack: /rack3 #E
Decommission Status : Normal #F
Configured Capacity: 46015524438016 (41.85 TB) #G
DFS Used: 33107988033048 (30.11 TB)
Non DFS Used: 0 (0 B)
DFS Remaining: 12907536404968 (11.74 TB)
DFS Used%: 71.95%
DFS Remaining%: 28.05% #G
Configured Cache Capacity: 4294967296 (4 GB) #H
Cache Used: 0 (0 B)
Cache Remaining: 4294967296 (4 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00% #H
Xceivers: 71
Last contact: Fri May 01 15:15:59 CDT 2015
#A Configured capacity for HDFS in this cluster
#B HDFS used storage statistics
#C Shows if there are any under-replicated, corrupt or missing blocks
#D Shows how many DataNodes in the cluster are alive and available
#E The hostname and rack name
#F Status of the DataNode (decommissioned or not)
#G Configured and used capacity for this DataNode
#H Cache usage statistics (if configured)
Refresh node information:
$ hdfs dfsadmin -refreshNodes
Dump more detailed NameNode information to a file:
$ hdfs dfsadmin -metasave <filename>
-
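As a quick sanity check on the dfsadmin -report figures shown earlier, the "DFS Used%" values can be recomputed from the raw byte counts as used / configured capacity. The helper below is only a sketch. The name used_percent is invented here, and it assumes awk is available on the machine.

```shell
# Percentage of capacity in use, rounded to two decimals.
# Arguments: used bytes, configured capacity in bytes.
# used_percent is a hypothetical helper for checking report output.
used_percent() {
    awk -v used="$1" -v cap="$2" 'BEGIN { printf "%.2f%%\n", used / cap * 100 }'
}
```

With the cluster totals from the report, used_percent 1505450551695619 2068027170816000 prints 72.80%, and with the DataNode figures, used_percent 33107988033048 46015524438016 prints 71.95%, matching the report lines.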
Permission control
Permission checking is switched on in hdfs-site.xml:
<property>
  <name>dfs.permissions.enabled</name>
  <value>true</value>
</property>
HDFS uses a symbolic notation (r, w) to denote the read and write permissions, just as a Linux operating system does.
When a client accesses a directory, if the client is the same as the directory’s owner, Hadoop tests the owner’s permissions.
If the client is not the owner but belongs to the directory’s group, Hadoop tests the group permissions.
If neither the owner nor the group names match, Hadoop tests the “other” permission of the directory.
If the permission check that applies does not succeed, the client’s request is denied.
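The order of checks described above can be sketched as two small shell functions. This is an illustration of the owner, group, other sequence only, not how the NameNode implements it: it ignores the superuser bypass and ACLs, and the function names permission_class and is_allowed are invented for the example.

```shell
# Decide which permission class applies to a client:
# owner first, then group, then other.
# Arguments: client user, space-separated client groups, path owner, path group.
permission_class() {
    client=$1; client_groups=$2; owner=$3; group=$4
    if [ "$client" = "$owner" ]; then
        echo owner
        return
    fi
    for g in $client_groups; do
        if [ "$g" = "$group" ]; then
            echo group
            return
        fi
    done
    echo other
}

# Check one access type (r, w or x) against a 9-character mode string
# such as rwxr-x---. Returns 0 when access is allowed, 1 otherwise.
# Arguments: the four arguments of permission_class, then mode, then access type.
is_allowed() {
    class=$(permission_class "$1" "$2" "$3" "$4")
    mode=$5; want=$6
    case $class in
        owner) bits=$(printf '%s' "$mode" | cut -c1-3) ;;
        group) bits=$(printf '%s' "$mode" | cut -c4-6) ;;
        *)     bits=$(printf '%s' "$mode" | cut -c7-9) ;;
    esac
    case $bits in
        *"$want"*) return 0 ;;
        *)         return 1 ;;
    esac
}
```

For example, is_allowed sam "analysts" alapati analysts rwxr-x--- w fails because only the group bits (r-x) are consulted for sam, even though the owner bits would allow writing.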
Change permissions:
$ hdfs dfs -chmod -R 755 /user
HDFS itself has no concept of users and groups. It takes them from one of two modes:
1. Simple authentication mode, which relies on the operating system's users and groups
2. Kerberos mode
Add a user:
$ groupadd analysts
$ useradd -g analysts alapati
$ passwd alapati
core-site.xml needs the following setting:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
</property>
$ hdfs dfs -chmod -R 777 /tmp/hadoop-alapati
$ hdfs dfs -mkdir /user/alapati
$ su hdfs
$ hdfs dfs -chown -R alapati:analysts /user/alapati
$ hdfs dfs -ls /user/
drwxr-xr-x - alapati analysts 0 2016-04-27 12:40 /user/alapati
$ hdfs dfsadmin -refreshUserToGroupMappings
$ hdfs dfsadmin -setSpaceQuota 30g /user/alapati
-
Query file system size:
# hdfs dfs -df
Filesystem          Size              Used              Available        Use%
hdfs://hadoop01-ns  2068027170816000  1591361508626924  476665662189076  77%
Query disk usage:
$ hdfs dfs -du URI
Add new data directories in hdfs-site.xml (dfs.data.dir is the older name of dfs.datanode.data.dir):
<property>
  <name>dfs.data.dir</name>
  <value>/u01/hadoop/data,/u02/hadoop/data,/u03/hadoop/data</value>
</property>
-
Check whether a path exists (the exit status is 0 when it does):
$ hdfs dfs -test -e /users/alapati/test
Create an empty file:
$ hdfs dfs -touchz /user/alapati/test3.txt
-
Set quotas:
Space quotas: Allow you to set a ceiling on the amount of space used for an individual directory
$ hdfs dfsadmin -setSpaceQuota <N> <dirname>...<dirname>
Clear the space quota:
$ hdfs dfsadmin -clrSpaceQuota /user/alapati
Name quotas: Let you specify the maximum number of file and directory names in the tree rooted at a directory
Set the maximum number of files and directories:
$ hdfs dfsadmin -setQuota <max_number> <directory>
Query quotas:
$ hdfs dfs -count -q <directory>
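To make the quota query easier to read, the sketch below summarises one line of count -q output. It assumes the usual column order (QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME) and a path without spaces. The function name quota_summary is made up for this illustration. The space quota is charged for every replica, so the space consumed is roughly the content size multiplied by the replication factor.

```shell
# Summarise one line of `hdfs dfs -count -q` output read from stdin.
# Assumed column order: QUOTA REMAINING_QUOTA SPACE_QUOTA
# REMAINING_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
# quota_summary is a hypothetical helper, not part of the hdfs CLI.
quota_summary() {
    awk '{
        if ($3 == "none") {
            # No space quota has been set on this directory.
            printf "%s: no space quota set\n", $8
        } else {
            used = $3 - $4   # bytes charged against the space quota
            printf "%s: %.1f%% of space quota used\n", $8, used / $3 * 100
        }
    }'
}
```

In practice the input would come from something like hdfs dfs -count -q /user/alapati | quota_summary. With illustrative, made-up numbers (a 30 GB quota, 2.5 GB of content at replication factor 3), the line "none inf 32212254720 24159191040 3 12 2684354560 /user/alapati" is reported as 25.0% of the space quota used.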