    Basic HDFS operations

      anneng (last edited by anneng)

      《Expert Hadoop Administration: Managing, Tuning, and Securing Spark, YARN, and HDFS》
      List the root directory

      $ hdfs dfs -ls /
      

      List the contents of one or more directories

      $ hdfs dfs -ls /user/hadoop/testdir1 /user/hadoop/testdir2
      

      List the directory entry itself instead of its contents

      $ hdfs dfs -ls -d /user/alapati
      

      Show details of a file, or list a directory by full URI

      $ hdfs dfs -ls /user/hadoop/testdir1/test1.txt
      $ hdfs dfs -ls hdfs://<hostname>:9000/user/hadoop/dir1/
      

      Show file status with a format string

      $ hdfs dfs -stat "%n" /user/alapati/messages
      %b Size of file in bytes
      %F Will return "file", "directory", or "symlink" depending on the type of inode
      %g Group name
      %n Filename
      %o HDFS Block size in bytes ( 128MB by default )
      %r Replication factor
      %u Username of owner
      %y Formatted mtime of inode
      %Y UNIX Epoch mtime of inode
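Several specifiers can be combined in one format string. A minimal sketch, assuming the `hdfs` client is on the PATH; `describe_path` is a hypothetical helper name, not an HDFS command:

```shell
# describe_path is a hypothetical helper name, not part of HDFS.
# It prints name, size in bytes, replication factor and formatted mtime
# of one path, using the specifiers listed above.
describe_path() {
  hdfs dfs -stat "%n %b %r %y" "$1"
}
```

Usage: `describe_path /user/alapati/messages`.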
      

      Create a directory

      $ hdfs dfs -mkdir hdfs://nn1.example.com/user/hadoop/dir
      -p creates any missing parent directories
      $ hdfs dfs -mkdir -p /user/hadoop/dir1
      

      Delete a directory

      $ hdfs dfs -rm -R /user/alapati
      

      When trash is enabled, a deleted directory is kept in the user's trash for a while:

      $ hdfs dfs -ls /user/sam/.Trash
      

      Empty the trash

      $ hdfs dfs -expunge
      

      Change the owner and group

      $ hdfs dfs -chown sam:produsers /data/customers/names.txt
      

      Change the group

      $ sudo -u hdfs hdfs dfs -chgrp marketing /users/sales/markets.txt
      

      Change permissions

        anneng (last edited by anneng)

        Show cluster information
        $ hdfs dfsadmin -report

        Configured Capacity: 2068027170816000 (1.84 PB) #A
        Present Capacity: 2068027170816000 (1.84 PB)
        DFS Remaining: 562576619120381 (511.66 TB) #A
        DFS Used: 1505450551695619 (1.34 PB) #B
        DFS Used%: 72.80% #B
        Under replicated blocks: 1 #C
        Blocks with corrupt replicas: 0
        Missing blocks: 1
        Missing blocks (with replication factor 1): 9 #C


        Live datanodes (54): #D

        Name: 10.192.0.78:50010 (hadoop02.localhost) #E
        Hostname: hadoop02.localhost.com
        Rack: /rack3 #E
        Decommission Status : Normal #F
        Configured Capacity: 46015524438016 (41.85 TB) #G
        DFS Used: 33107988033048 (30.11 TB)
        Non DFS Used: 0 (0 B)
        DFS Remaining: 12907536404968 (11.74 TB)
        DFS Used%: 71.95%
        DFS Remaining%: 28.05% #G
        Configured Cache Capacity: 4294967296 (4 GB) #H
        Cache Used: 0 (0 B)
        Cache Remaining: 4294967296 (4 GB)
        Cache Used%: 0.00%
        Cache Remaining%: 100.00% #H
        Xceivers: 71
        Last contact: Fri May 01 15:15:59 CDT 2015
        #A Configured capacity for HDFS in this cluster
        #B HDFS used storage statistics
        #C Shows if there are any under-replicated, corrupt or missing blocks
        #D Shows how many DataNodes in the cluster are alive and available
        #E The hostname and rack name
        #F Status of the DataNode (decommissioned or not)
        #G Configured and used capacity for this DataNode
        #H Cache usage statistics (if configured)
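The cluster-wide DFS Used% can be recomputed from the two summary lines of the report. A rough sketch, assuming the report has the layout shown above; `cluster_used_pct` is a hypothetical helper name:

```shell
# cluster_used_pct is a hypothetical helper, not an HDFS command.
# It reads `hdfs dfsadmin -report` output on stdin and prints the cluster-wide
# DFS Used% with two decimals. Only the first "Configured Capacity:" and
# "DFS Used:" lines (the cluster summary) are used; the per-DataNode
# sections further down repeat these labels and are ignored.
cluster_used_pct() {
  awk -F': ' '
    $1 == "Configured Capacity" && !seen_cap  { split($2, f, " "); cap  = f[1] + 0; seen_cap  = 1 }
    $1 == "DFS Used"            && !seen_used { split($2, f, " "); used = f[1] + 0; seen_used = 1 }
    END { if (cap > 0) printf "%.2f\n", used * 100 / cap }
  '
}
```

Usage: `hdfs dfsadmin -report | cluster_used_pct`. With the figures in the report above (1505450551695619 used out of 2068027170816000) it prints 72.80, which matches the reported DFS Used%.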

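        Refresh node information (re-read the include and exclude host files)
        $ hdfs dfsadmin -refreshNodes
        Save more detailed NameNode information to a file in the Hadoop log directory
        $ hdfs dfsadmin -metasave <filename>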

          anneng (last edited)

          Permission checking
          Enable it in hdfs-site.xml:

          <property>
          <name>dfs.permissions.enabled</name>
          <value>true</value>
          </property>
          

          HDFS uses a symbolic notation (r, w) to denote the read and write permissions, just as a Linux operating system does.

          When a client accesses a directory, if the client is the same as the directory’s owner, Hadoop tests the owner’s permissions.

          If the group matches the directory’s group, then Hadoop tests the user’s group permissions.

          If neither the owner nor the group names match, Hadoop tests the “other” permission of the directory.

          If none of the permission checks succeed, the client’s request is denied.
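The selection order described above can be sketched as a small shell function. This is only an illustration of the owner, group, other order, not HDFS code; `applicable_perms` and its argument layout are assumptions made for the example:

```shell
# applicable_perms is a hypothetical helper that mirrors the order described
# above: owner first, then group, then other. Exactly one triplet is chosen.
# Arguments: mode (nine characters, e.g. rwxr-x---), directory owner,
# directory group, client user, client groups (space separated).
applicable_perms() {
  mode=$1; owner=$2; group=$3; user=$4; user_groups=$5
  if [ "$user" = "$owner" ]; then
    printf '%s\n' "$mode" | cut -c1-3      # owner permissions
  else
    case " $user_groups " in
      *" $group "*) printf '%s\n' "$mode" | cut -c4-6 ;;  # group permissions
      *)            printf '%s\n' "$mode" | cut -c7-9 ;;  # "other" permissions
    esac
  fi
}
```

For a directory with mode rwxr-x--- owned by alapati:analysts, the owner gets rwx, a member of analysts gets r-x, and everyone else gets ---, so their requests are denied.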

          Change permissions
          $ hdfs dfs -chmod -R 755 /user

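          HDFS has no user or group accounts of its own. Identity comes from the authentication mode:
          1. Simple authentication mode: relies on the operating system's users and groups
          2. Kerberos mode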

          Add a user

          $ groupadd analysts
          $ useradd -g analysts alapati
          $ passwd alapati
          
          core-site.xml needs the following setting:
          <property>
            <name>hadoop.tmp.dir</name>
            <value>/tmp/hadoop-${user.name}</value>
          </property>
          
          $ hdfs dfs -chmod -R 777 /tmp/hadoop-alapati
          
          $ hdfs dfs -mkdir /user/alapati
          
          $ su hdfs
          $ hdfs dfs -chown -R alapati:analysts /user/alapati
          $ hdfs dfs -ls /user/
          drwxr-xr-x   - alapati   analysts      0 2016-04-27 12:40 /user/alapati
          
          $ hdfs dfsadmin -refreshUserToGroupMappings
          
          $ hdfs dfsadmin -setSpaceQuota 30g /user/alapati
          
            anneng (last edited)

            Show file system capacity

            # hdfs dfs -df
            Filesystem                     Size             Used        Available Use%
            hdfs://hadoop01-ns 2068027170816000 1591361508626924  476665662189076  77%
            #
            

            Show space usage
            $ hdfs dfs -du URI

            Add new data directories in hdfs-site.xml (dfs.data.dir is the older name of dfs.datanode.data.dir)

            <property>
            <name>dfs.data.dir</name>
            <value>/u01/hadoop/data,/u02/hadoop/data,/u03/hadoop/data</value>
            </property>
            
              anneng (last edited)

              Check whether a path exists (exit status 0 if it does):

              $ hdfs dfs -test -e /users/alapati/test
              

              Create an empty file

              $ hdfs dfs -touchz /user/alapati/test3.txt
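Because `-test -e` reports its result only through the exit status, it is easy to use in a script. A minimal sketch that combines it with `-touchz`; `ensure_file` is a hypothetical helper name:

```shell
# ensure_file is a hypothetical helper, not an HDFS command.
# It creates a zero-length file only when the path does not exist yet.
ensure_file() {
  if hdfs dfs -test -e "$1"; then
    echo "exists: $1"
  else
    # -test -e returned non-zero, so the path is missing: create it.
    hdfs dfs -touchz "$1" && echo "created: $1"
  fi
}
```

Usage: `ensure_file /user/alapati/test3.txt`.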
              
                anneng (last edited)

                Set quotas:
                Space quotas: Allow you to set a ceiling on the amount of space used for an individual directory

                $ hdfs dfsadmin -setSpaceQuota <N> <dirname>...<dirname>
                Clear the space quota
                $ hdfs dfsadmin -clrSpaceQuota /user/alapati
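A space quota is charged for every replica of a block, so the amount of file data that fits is smaller than the quota. A back-of-the-envelope sketch, assuming the `g` suffix is a binary prefix and a replication factor of 3 (an assumption, not stated in this thread); `usable_bytes` is a hypothetical helper name:

```shell
# usable_bytes is a hypothetical helper: space quota in bytes divided by the
# replication factor gives the amount of file data that fits under the quota.
usable_bytes() {
  echo $(( $1 / $2 ))
}

# A 30g quota with an assumed replication factor of 3.
usable_bytes $(( 30 * 1024 * 1024 * 1024 )) 3   # prints 10737418240 (10 GB)
```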
                

                Name quotas: Let you specify the maximum number of file and directory names in the tree rooted at a directory
                Set the maximum number of names

                $ hdfs dfsadmin -setQuota <max_number> <directory>
                

                Show quotas:
                $ hdfs dfs -count -q <directory>
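With `-q`, each output line has the columns QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME. A small sketch that pulls out the two "remaining" columns; `quota_left` is a hypothetical helper name:

```shell
# quota_left is a hypothetical helper, not an HDFS command.
# Column 2 is the remaining name quota and column 4 the remaining space quota;
# directories without a quota show "none" and "inf" in those columns.
quota_left() {
  hdfs dfs -count -q "$1" |
    awk '{ print "names left: " $2 ", space left (bytes): " $4 }'
}
```

Usage: `quota_left /user/alapati`.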
