暗能星系


    HBase Basics

    Big Data
    • anneng (last edited by anneng)

      1. HBase table design
      In HBase, you will find two different types of tables: the system tables and the user
      tables. System tables are used internally by HBase to keep track of meta information
      like the table’s access control lists (ACLs), metadata for the tables and regions,
      namespaces, and so on. There should be no need for you to look at those tables. User
      tables are what you will create for your use cases. They will belong to the default
      namespace unless you create and use a specific one.
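The namespace rule can be sketched in a few lines of Python. HBase writes fully qualified table names as `namespace:table`, and its system tables live in the `hbase` namespace (for example `hbase:meta`). The helper names `qualify` and `is_system_table` are hypothetical and not part of any HBase client API.

```python
DEFAULT_NAMESPACE = "default"  # namespace used when none is given
SYSTEM_NAMESPACE = "hbase"     # namespace holding HBase's internal tables


def qualify(table_name: str) -> str:
    """Return 'namespace:table', falling back to the default namespace."""
    if ":" in table_name:
        return table_name
    return f"{DEFAULT_NAMESPACE}:{table_name}"


def is_system_table(table_name: str) -> bool:
    """True when the table belongs to the system namespace."""
    namespace, _, _ = qualify(table_name).partition(":")
    return namespace == SYSTEM_NAMESPACE
```

A table created as `customers` would therefore be addressed as `default:customers`, while `hbase:meta` is recognized as a system table.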

      883300c1-458a-4a46-9b9e-64ef5e802dcd-image.png
      A concrete example:
      7f9d6d17-d183-4dd3-a08b-41b69aa19e88-image.png
      https://www.tutorialspoint.com/hbase/hbase_create_data.htm

      Only columns where there is a value are stored in the underlying filesystem.
      fd740118-2eb7-40fc-b10f-c5a0fdd7c8ae-image.png
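A minimal Python sketch of this sparse storage, using nested dictionaries as a stand-in for a table. The row keys, column names, and values are made up for illustration; the point is only that a column with no value produces no stored cell at all (there is no NULL placeholder).

```python
# Each row maps "family:qualifier" -> value. A column without a value
# is simply absent from the row.
rows = {
    "row1": {"info:name": "alice", "info:city": "paris", "work:title": "engineer"},
    "row2": {"info:name": "bob"},  # no city, no title: nothing is stored for them
}


def stored_cells(table):
    """List every (row key, column, value) cell that actually exists."""
    return [
        (row_key, column, value)
        for row_key, columns in sorted(table.items())
        for column, value in sorted(columns.items())
    ]
```

Here `stored_cells(rows)` yields four cells, not six, even though the table has two rows and three distinct columns.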
      Tables are split into regions, where each region stores a specific range
      of data. The regions are assigned to RegionServers, which serve each region’s content.
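The range lookup can be sketched with a binary search over sorted region start keys. The start keys and the server names below are hypothetical, and the sketch compares Python strings, whereas HBase compares row keys as raw bytes.

```python
from bisect import bisect_right

# Sorted start key of each region; the first region starts at the empty key.
region_start_keys = ["", "g", "n", "t"]
# Hypothetical assignment of each region to a RegionServer.
region_servers = ["rs1", "rs2", "rs1", "rs3"]


def locate_region(row_key: str) -> int:
    """Index of the region whose [start, next start) range contains row_key."""
    return bisect_right(region_start_keys, row_key) - 1


def locate_server(row_key: str) -> str:
    """Name of the server currently serving the region for row_key."""
    return region_servers[locate_region(row_key)]
```

With these keys, a row key of `apple` falls in the first region and `zebra` in the last one.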
      A column family is an HBase-specific concept that you will not find in RDBMS applications. For the same region, different column families will store the data into different files and can be configured differently. Data with the same access pattern and the same format should be grouped into the same column family.

      As an example regarding the format, if you need to store a lot of textual metadata information for customer profiles in addition to image files for each customer’s profile photo, you might want to store them into two different column families: one compressed (where all the textual information will be stored), and one not compressed (where the image files will be stored).

      As an example regarding the access pattern, if some information is mostly read and almost never written, and some is mostly written and almost never read, you might want to separate them into two different column families. If the different columns you want to store have a similar format and access pattern, group them within the same column family.
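The format argument can be illustrated with a short Python sketch that compares how well two kinds of data compress. The sample text is made up, `zlib` stands in for whatever codec a column family would be configured with, and random bytes are used as an assumed stand-in for image files that are already compressed.

```python
import os
import zlib

# Repetitive textual metadata (hypothetical sample).
text_metadata = ("name=Jane Doe;city=Paris;plan=premium;" * 200).encode("utf-8")
# Random bytes approximate already-compressed image data (an assumption).
image_like = os.urandom(len(text_metadata))


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; lower means better compression."""
    return len(zlib.compress(data)) / len(data)
```

The text compresses to a small fraction of its size, while the image-like bytes barely shrink, which is why compressing the second family would mostly cost CPU for little gain.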

      Stores
      There is one store per column family. A store object groups one memstore and zero or more store files (called HFiles). This is the entity that will store all the information written into the table and will also be used when data needs to be read from the table.

      HFiles
      HFiles are created when the memstores are full and must be flushed to disk. HFiles are eventually compacted together over time into bigger files. They are the HBase file format used to store table data. HFiles are composed of different types of blocks (e.g., index blocks and data blocks). HFiles are stored in HDFS, so they benefit from Hadoop persistence and replication.
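The relationship between a store, its memstore, and its HFiles can be sketched as a toy Python class. This is a simplified model under stated assumptions, not HBase's implementation: the flush threshold counts cells rather than bytes, files are in-memory lists, and the class and method names are hypothetical.

```python
class Store:
    """Toy model of one store: one memstore plus zero or more immutable HFiles."""

    def __init__(self, flush_threshold: int = 3):
        self.flush_threshold = flush_threshold  # hypothetical limit, in cells
        self.memstore = {}
        self.hfiles = []  # oldest first; each is a sorted list of (key, value)

    def put(self, key, value):
        """Write to the memstore and flush once it is full."""
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Turn the memstore into a new sorted, immutable HFile."""
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        """Read from the memstore first, then from the newest HFile to the oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for k, v in hfile:
                if k == key:
                    return v
        return None

    def compact(self):
        """Merge all HFiles into one bigger file; newer values win."""
        merged = {}
        for hfile in self.hfiles:  # oldest first, so later files overwrite
            merged.update(hfile)
        self.hfiles = [sorted(merged.items())] if merged else []
```

Reads have to consult both the memstore and every HFile, so compaction reduces the number of files a read must check.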

      Blocks
      HFiles are composed of blocks. Those blocks should not be confused with HDFS blocks. One HDFS block might contain multiple HFile blocks. HFile blocks are usually between 8 KB and 1 MB, but the default size is 64 KB. However, if compression is configured for a given table, HBase will still generate 64 KB blocks but will then compress them. The size of the compressed block on the disk might vary based on the data and the compression format.

      Larger blocks will create a smaller number of index values and are good for sequential table access, while smaller blocks will create more index values and are better for random read accesses.
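The trade-off between block size and index size comes down to simple arithmetic, sketched below in Python. It assumes roughly one index entry per data block and blocks that are cut exactly at the configured size; real HFiles deviate from both, so treat the numbers as orders of magnitude. The function name is hypothetical.

```python
import math

KB = 1024
MB = 1024 * KB


def data_block_count(hfile_size: int, block_size: int = 64 * KB) -> int:
    """Approximate number of data blocks (and index entries) in an HFile."""
    return math.ceil(hfile_size / block_size)
```

For a 128 MB HFile, this gives 2,048 blocks at the default 64 KB, 16,384 blocks at 8 KB, and 128 blocks at 1 MB. A random read has to load and scan one whole block, which is why small blocks favor random reads and large blocks favor scans.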

      3e677184-4834-4fa7-be4b-c1e3165f2318-image.png
      Each row is stored in a specific format. Figure 2-4 represents the format of an individual HBase cell.
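Since the figure is not reproduced here, the following Python sketch encodes one cell using the layout commonly documented for HBase's KeyValue: key length, value length, then a key made of row length, row, family length, family, qualifier, timestamp, and key type, followed by the value. The field widths and the Put type code are assumptions to check against the HBase version in use, and `encode_cell` is a hypothetical helper.

```python
import struct

PUT_TYPE = 4  # assumed type code for a Put


def encode_cell(row: bytes, family: bytes, qualifier: bytes,
                timestamp: int, value: bytes) -> bytes:
    """Serialize one cell in a KeyValue-like big-endian layout."""
    key = (
        struct.pack(">H", len(row)) + row          # 2-byte row length + row key
        + struct.pack(">B", len(family)) + family  # 1-byte family length + family
        + qualifier                                # qualifier (length is implied)
        + struct.pack(">q", timestamp)             # 8-byte timestamp
        + struct.pack(">B", PUT_TYPE)              # 1-byte key type
    )
    # 4-byte key length, 4-byte value length, then key and value.
    return struct.pack(">II", len(key), len(value)) + key + value
```

Because the row key, family, and qualifier are repeated in every cell, a one-byte value under this layout still occupies 28 bytes for a four-byte row key, which is one reason short family and qualifier names are usually preferred.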

      Node roles:
      2a977524-fe14-4eb2-a706-89f5f8f051c8-image.png
      Master Server
      • Region assignment
      • Load balancing
      • RegionServer recovery
      • Region split completion monitoring
      • Tracking active and dead servers
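Two of these duties, region assignment and RegionServer recovery, can be sketched in Python. This is a deliberately naive model (round-robin assignment, least-loaded recovery), not HBase's actual balancer, and the function names are hypothetical.

```python
def assign_regions(regions, servers):
    """Spread regions evenly over the live servers (simple round-robin)."""
    assignment = {server: [] for server in servers}
    for i, region in enumerate(regions):
        assignment[servers[i % len(servers)]].append(region)
    return assignment


def recover(assignment, dead_server):
    """Move a dead server's regions to the least-loaded survivors.

    Modifies and returns the given assignment.
    """
    orphaned = assignment.pop(dead_server)
    for region in orphaned:
        target = min(assignment, key=lambda s: len(assignment[s]))
        assignment[target].append(region)
    return assignment
```

With six regions on three servers, each server starts with two regions; if one server dies, its two regions are split between the two survivors.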

      Unlike HBase RegionServers, the HBase Master doesn’t have much workload and can be installed on servers with less memory and fewer cores. Even so, building HBase Masters (and other master services like NameNodes, ZooKeeper, etc.) on robust hardware with the OS on RAID drives, dual power supplies, etc. is highly recommended.

      RegionServer
      A RegionServer (RS) is the application hosting and serving the HBase regions and therefore the HBase data. Even if it is technically doable to host more than one RegionServer on a physical host, it is recommended to run only one server per host and to give it the resources you would have shared between the two servers.

      • anneng

        Table (HBase table)
        Region (Regions for the table)
        Store (Store per ColumnFamily for each Region for the table)
        MemStore (MemStore for each Store for each Region for the table)
        StoreFile (StoreFiles for each Store for each Region for the table)
        Block (Blocks within a StoreFile within a Store for each Region for the table)
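The hierarchy above implies some simple multiplication, sketched here in Python. It assumes every store holds the same number of store files, which is a simplification for illustration; the function name is hypothetical.

```python
def count_components(regions: int, column_families: int,
                     store_files_per_store: int) -> dict:
    """Count stores, memstores, and store files for one table."""
    stores = regions * column_families  # one store per column family per region
    return {
        "regions": regions,
        "stores": stores,
        "memstores": stores,            # one memstore per store
        "store_files": stores * store_files_per_store,
    }
```

A table with 10 regions and 2 column families therefore has 20 stores and 20 memstores, so every extra column family multiplies the number of memstores and files to manage.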

        • anneng (last edited by anneng)

          Master and regions
          1a880af4-a3ec-461f-bfce-3267dffeac4c-image.png
          https://dwgeek.com/hbase-architecture-components.html/

          07d6e61f-911b-4b6f-a5c2-c68a7b3f678e-image.png
          https://www.dummies.com/programming/big-data/hadoop/regionservers-in-hbase/

          • anneng

            HBase architecture
            https://data-flair.training/blogs/hbase-architecture/

            • anneng (last edited by anneng)

              HBase reliability
              2ba0910e-6f34-4d99-bffe-1990c44160ef-image.png
              https://www.simplilearn.com/tutorials/hadoop-tutorial/hbase
