暗能星系


    HBase bulk data import

    Big Data
    This topic has been deleted. Only users with topic management privileges can view it.
    • anneng

      java.lang.Exception: java.lang.IllegalArgumentException: TsvParser only supports single-byte separators
      at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
      at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:552)
      Caused by: java.lang.IllegalArgumentException: TsvParser only supports single-byte separators
      at org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:142)
      at org.apache.hadoop.hbase.mapreduce.ImportTsv$TsvParser.<init>(ImportTsv.java:161)
      at org.apache.hadoop.hbase.mapreduce.TsvImporterMapper.setup(TsvImporterMapper.java:108)
      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
      at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
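The exception comes from a precondition in the constructor of ImportTsv's TsvParser (ImportTsv.java:161 in the trace): the separator must be exactly one byte once encoded. The class below is a hypothetical, JDK-only sketch of that rule, assuming UTF-8 encoding (what HBase's Bytes.toBytes uses); it is not HBase's code. A tab or a comma passes, while a two-character string such as a backslash followed by t, or a multi-byte character such as the full-width comma ，, fails.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper mirroring the single-byte separator rule enforced by ImportTsv's TsvParser. */
public class SeparatorCheck {
    /** Returns true if the separator encodes to exactly one byte in UTF-8. */
    public static boolean isSingleByte(String separator) {
        return separator.getBytes(StandardCharsets.UTF_8).length == 1;
    }

    /** Makes tabs and non-ASCII characters readable in the printout. */
    private static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '\t') {
                sb.append("<TAB>");
            } else if (c < 0x20 || c > 0x7E) {
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Tab, comma, backslash + 't' (two characters), full-width comma (three UTF-8 bytes).
        String[] candidates = {"\t", ",", "\\t", "\uFF0C"};
        for (String s : candidates) {
            System.out.printf("%-8s bytes=%d ok=%b%n",
                    describe(s), s.getBytes(StandardCharsets.UTF_8).length, isSingleByte(s));
        }
    }
}
```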

      • anneng

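        The data exported by blastdbcmd contains many kinds of characters (the sequence titles, for example, contain spaces), so to meet importtsv's format requirements the fields must be separated by tabs. There are two ways to type the tab characters:
        1. Save the command below in a .sh file, where a tab is easy to type:
        blastdbcmd -db nt -entry all -out newnt1.fa -outfmt '%a %t %T %s'
        (%a is the accession, %t the title, %T the taxid and %s the sequence; the gaps between them must be literal tab characters, not spaces)

        2. On the command line, press Ctrl+V first and then Tab; this inserts a literal tab into the command.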
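A small illustration of why spaces cannot serve as the separator here: the title field itself contains spaces, so splitting on spaces yields too many columns, while splitting on tabs keeps one column per -outfmt field. The record and class name are hypothetical. In bash, ANSI-C quoting such as -outfmt $'%a\t%t\t%T\t%s' is a third way to get literal tabs into the format string.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

/** Shows why a field that contains spaces (the sequence title) needs a tab separator. */
public class TabVsSpace {
    /** Number of columns a line splits into on the given literal separator (empty fields kept). */
    public static int columns(String line, String separator) {
        return line.split(Pattern.quote(separator), -1).length;
    }

    public static void main(String[] args) {
        // Hypothetical record: accession, title, taxid, sequence (the title contains spaces).
        String[] fields = {"XX000001.1", "Example organism chromosome 1 sequence", "9606", "ACGTACGT"};

        String spaceJoined = String.join(" ", fields);
        String tabJoined = String.join("\t", fields);

        // Splitting on spaces breaks the title apart; splitting on tabs keeps 4 columns.
        System.out.println("space-separated columns: " + columns(spaceJoined, " "));
        System.out.println("tab-separated columns:   " + columns(tabJoined, "\t"));
        System.out.println(Arrays.toString(tabJoined.split("\t", -1)));
    }
}
```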

        • anneng

          HBase limits the maximum size of a cell (10 MB by default), so the limit has to be disabled; a value of 0 turns the check off:
          <property>
          <name>hbase.client.keyvalue.maxsize</name>
          <value>0</value>
          </property>

          After changing the configuration, sync it to every node and then restart HBase:
          stop-hbase.sh
          start-hbase.sh
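A sketch of the documented semantics of hbase.client.keyvalue.maxsize: the limit applies to the whole KeyValue (key plus value), defaults to 10485760 bytes, and a value of 0 or less disables the check. The helper below is hypothetical and only models that rule; HBase performs the real check itself.

```java
/** Hypothetical model of the documented hbase.client.keyvalue.maxsize rule (not HBase's code). */
public class CellSizeLimit {
    /** Documented default of hbase.client.keyvalue.maxsize: 10485760 bytes (10 MB). */
    public static final long DEFAULT_MAX_SIZE = 10L * 1024 * 1024;

    /** A limit of 0 or less disables the check; otherwise the cell may not exceed the limit. */
    public static boolean isAllowed(long cellSizeBytes, long maxSizeBytes) {
        return maxSizeBytes <= 0 || cellSizeBytes <= maxSizeBytes;
    }

    public static void main(String[] args) {
        long bigCell = 50L * 1024 * 1024; // e.g. a 50 MB sequence stored in one cell
        System.out.println("default limit: " + isAllowed(bigCell, DEFAULT_MAX_SIZE)); // false
        System.out.println("limit = 0:     " + isAllowed(bigCell, 0));                // true
    }
}
```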

          • anneng

            https://stackoverflow.com/questions/14326308/how-to-include-hbase-site-xml-in-the-classpath

            How to add hbase-site.xml to the classpath so that its settings take effect:

            export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/hadoop/hbase-2.3.4/lib/:/opt/hadoop/hadoop-3.2.2/lib/native/:/opt/hadoop/hbase-2.3.4/conf

            /opt/hadoop/hbase-2.3.4/conf must be included so that the client can read hbase-site.xml.
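HBase clients load hbase-site.xml as a classpath resource, which is why the conf directory has to be on HADOOP_CLASSPATH. One quick way to confirm the file is visible is to ask the class loader for it. This JDK-only class is a hypothetical diagnostic, not part of HBase, meant to be run with the same classpath as the job.

```java
import java.net.URL;

/** Reports whether a resource such as hbase-site.xml is visible on the current classpath. */
public class ClasspathResourceCheck {
    /** Returns the URL the resource resolves to, or null if no classpath entry contains it. */
    public static URL find(String resourceName) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClasspathResourceCheck.class.getClassLoader();
        }
        return cl.getResource(resourceName);
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "hbase-site.xml";
        URL url = find(name);
        System.out.println(url == null
                ? name + " is NOT on the classpath"
                : name + " found at " + url);
    }
}
```

For example, java -cp "/opt/hadoop/hbase-2.3.4/conf:." ClasspathResourceCheck should report the file as found, while the same command without the conf entry should not.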

            • anneng

              2021-04-07 17:37:09,260 INFO [htable-pool3866-t1] client.AsyncRequestFutureImpl (AsyncRequestFutureImpl.java:resubmit(763)) - id=2899, table=nt, attempt=12/16, failureCount=1ops, last exception=org.apache.hadoop.hbase.RegionTooBusyException: org.apache.hadoop.hbase.RegionTooBusyException: Over memstore limit=512.0 M, regionName=1efaf86c7220c641629a386589e1a8ef, server=anneng01,16020,1617700403892
              at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:4535)
              at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4101)
              at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4041)
              at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:1081)
              at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchOp(RSRpcServices.java:1013)
              at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:978)
              at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2828)
              at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:44870)
              at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:393)
              at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
              at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
              at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
              on anneng01,16020,1617700403892, tracking started null, retrying after=20194ms, operationsToReplay=1

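              Writes are blocked once a region's memstore exceeds hbase.hregion.memstore.flush.size × hbase.hregion.memstore.block.multiplier; the defaults (128 MB × 4) give the 512 M limit in the log above. Raising the flush size to 1024 MB lifts the limit to 4096 MB (in hbase-site.xml the flush size is given in bytes):
              hbase.hregion.memstore.flush.size=1073741824
              hbase.hregion.memstore.block.multiplier=4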
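The arithmetic behind the RegionTooBusyException, as a small sketch: the blocking limit is the flush size multiplied by the block multiplier. The class and method names are illustrative only.

```java
/** Worked arithmetic for the memstore blocking limit behind RegionTooBusyException. */
public class MemstoreLimit {
    static final long MB = 1024L * 1024;

    /** Blocking limit = hbase.hregion.memstore.flush.size * hbase.hregion.memstore.block.multiplier. */
    public static long blockingLimit(long flushSizeBytes, long multiplier) {
        return flushSizeBytes * multiplier;
    }

    public static void main(String[] args) {
        // Defaults: 128 MB flush size (134217728 bytes), multiplier 4.
        System.out.println("default: " + blockingLimit(128 * MB, 4) / MB + " MB"); // 512 MB, as in the log
        // Values from this thread: 1024 MB flush size, multiplier 4.
        System.out.println("tuned:   " + blockingLimit(1024 * MB, 4) / MB + " MB"); // 4096 MB
    }
}
```

A higher limit lets more data accumulate in a region's memstore before writes block, so it also means more RegionServer heap can be held in memstores.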
              
              • anneng

                HBase supports two ways of invoking these tools: calling the class directly, or going through Hadoop's driver mechanism.
                Explicit Classname
                $ bin/hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles hdfs://storefileoutput <tablename>
                Driver
                HADOOP_CLASSPATH=$(${HBASE_HOME}/bin/hbase classpath) ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-mapreduce-VERSION.jar completebulkload hdfs://storefileoutput <tablename>

                https://www.oreilly.com/library/view/learning-hbase/9781783985944/ch06s05.html
                Through Hadoop's driver mechanism, the following HBase functions can also be invoked:
                completebulkload: This is for a bulk data load
                copytable: This is to export a table's data from the local cluster to a peer cluster
                export: This is to export data from an HBase table to HDFS as a sequence file
                import: This is to import data written by export
                importtsv: This is to import data in TSV format to HBase
                rowcounter: This is to count rows in an HBase table using MapReduce
                verifyrep: This is to compare the data from tables of different clusters
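The driver mechanism is essentially a lookup from a program name (the first argument after the jar) to a class whose main method is then invoked; the stack trace further down in this thread shows HBase's Driver.main going through Hadoop's ProgramDriver. The class below is a simplified, hypothetical stand-in using only the JDK, not Hadoop's implementation; its handlers only print what would run.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Simplified stand-in for name-to-program dispatch in the style of Hadoop's ProgramDriver. */
public class MiniDriver {
    private final Map<String, Consumer<String[]>> programs = new LinkedHashMap<>();

    /** Registers a program under a short name such as "completebulkload". */
    public void add(String name, Consumer<String[]> program) {
        programs.put(name, program);
    }

    /** Looks up args[0] and runs it with the remaining arguments; returns false if unknown. */
    public boolean run(String[] args) {
        if (args.length == 0 || !programs.containsKey(args[0])) {
            System.out.println("Valid program names are: " + programs.keySet());
            return false;
        }
        programs.get(args[0]).accept(Arrays.copyOfRange(args, 1, args.length));
        return true;
    }

    public static void main(String[] args) {
        MiniDriver driver = new MiniDriver();
        String[] names = {"completebulkload", "copytable", "export", "import",
                          "importtsv", "rowcounter", "verifyrep"};
        for (String name : names) {
            driver.add(name, a -> System.out.println("would run " + name + " with args " + Arrays.toString(a)));
        }
        driver.run(args.length > 0 ? args : new String[] {"completebulkload", "hdfs://storefileoutput", "nt"});
    }
}
```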

                • anneng

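                  Once I mistyped the parameter below; it should have been importtsv.bulk.output:
                  -Dimporttst.bulk.output=hdfs://192.168.1.2:7000/nt/hfiles/

                  With this wrong parameter importtsv still ran successfully, but it was not clear where it had written the results. Most likely they went straight into the table: without importtsv.bulk.output, importtsv loads data directly into HBase instead of generating HFiles for a bulk load.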
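Hadoop's generic option parsing accepts any -D key, so a misspelled property is stored without complaint and simply never read. A hypothetical pre-flight check that compares keys against a few documented ImportTsv options by edit distance could catch this kind of typo; the option list is deliberately partial and the class is not part of HBase.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical pre-flight check: flag -D keys that look like misspelled importtsv options. */
public class ImportTsvKeyCheck {
    // A few documented ImportTsv options (not a complete list).
    static final List<String> KNOWN = Arrays.asList(
            "importtsv.columns", "importtsv.separator", "importtsv.bulk.output",
            "importtsv.skip.bad.lines", "importtsv.timestamp");

    /** Classic Levenshtein edit distance using two rolling rows. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the known key this one is probably a typo of, or null if it is exact or unrelated. */
    public static String suggest(String key) {
        if (KNOWN.contains(key)) return null;
        for (String k : KNOWN) {
            if (distance(key, k) <= 2) return k;
        }
        return null;
    }

    public static void main(String[] args) {
        String arg = args.length > 0 ? args[0] : "-Dimporttst.bulk.output=hdfs://192.168.1.2:7000/nt/hfiles/";
        String key = arg.replaceFirst("^-D", "").split("=", 2)[0];
        String hint = suggest(key);
        System.out.println(hint == null ? key + ": ok" : key + ": did you mean " + hint + "?");
    }
}
```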

                  • anneng

                    java.lang.reflect.InvocationTargetException
                    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
                    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                    at java.lang.reflect.Method.invoke(Method.java:498)
                    at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:64)
                    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
                    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                    at java.lang.reflect.Method.invoke(Method.java:498)
                    at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
                    at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
                    Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;
                    at org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.updateInputStreamStatistics(FSDataInputStreamWrapper.java:253)
                    at org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.close(FSDataInputStreamWrapper.java:300)
                    at org.apache.hadoop.hbase.io.hfile.HFile.isHFileFormat(HFile.java:590)
                    at org.apache.hadoop.hbase.io.hfile.HFile.isHFileFormat(HFile.java:571)
                    at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.visitBulkHFiles(LoadIncrementalHFiles.java:1072)
                    at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.discoverLoadQueue(LoadIncrementalHFiles.java:988)
                    at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.prepareHFileQueue(LoadIncrementalHFiles.java:249)
                    at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:356)
                    at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:1216)
                    at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:1229)
                    at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:1264)
                    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
                    at org.apache.hadoop.hbase.tool.BulkLoadHFilesTool.main(BulkLoadHFilesTool.java:66)
                    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
                    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                    at java.lang.reflect.Method.invoke(Method.java:498)
                    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
                    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
                    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
                    ... 11 more
                    Command exited with non-zero status 1

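                    I don't know the exact cause, but running it as follows works. The missing method's signature refers to DFSInputStream$ReadStatistics, which suggests the HBase classes were compiled against a different Hadoop version than the HDFS client that hadoop jar puts on the classpath; the command below does not include Hadoop 3.2.2's HDFS jars, so the HDFS client classes presumably come from HBase's own lib directory.
                    java -cp /opt/hadoop/hbase-2.3.4/lib/hbase-mapreduce-2.3.4.jar:/opt/hadoop/hbase-2.3.4/lib/hbase-server-2.3.4.jar:/opt/hadoop/hbase-2.3.4/lib/*:/opt/hadoop/hadoop-3.2.2/share/hadoop/common/*:/opt/hadoop/hadoop-3.2.2/share/hadoop/common/lib/* org.apache.hadoop.hbase.tool.BulkLoadHFilesTool hdfs://192.168.1.2:7000/nt/hfiles/ nt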
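If the cause is a classpath conflict, it helps to see which jar actually supplies a class at run time, for example whether HdfsDataInputStream comes from Hadoop 3.2.2's jars or from Hadoop jars bundled under HBase's lib directory. The JDK-only diagnostic below can be run with the same -cp as the failing or the working command. It is a generic sketch, not an HBase or Hadoop tool; the two class names in main are simply taken from the stack trace.

```java
import java.security.CodeSource;
import java.security.ProtectionDomain;

/** Prints which jar or directory each named class is loaded from (a generic classpath diagnostic). */
public class WhichJar {
    /** Returns the code source location of the class, or a short note if it is a JDK class or missing. */
    public static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            ProtectionDomain pd = c.getProtectionDomain();
            CodeSource src = pd == null ? null : pd.getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "bootstrap/JDK class";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not found";
        } catch (LinkageError e) {
            return "failed to load: " + e;
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args : new String[] {
                "org.apache.hadoop.hdfs.client.HdfsDataInputStream",
                "org.apache.hadoop.hbase.io.FSDataInputStreamWrapper"};
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

For example, append :. to the -cp value of the command above and replace the tool class with WhichJar.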

                    • anneng

                      You can download the files in advance and then run krakenuniq-download; it automatically detects the files that have already been downloaded.

                      • anneng

                        java -cp /opt/hadoop/hbase-2.3.4/lib/hbase-mapreduce-2.3.4.jar:/opt/hadoop/hbase-2.3.4/lib/hbase-server-2.3.4.jar:/opt/hadoop/hbase-2.3.4/lib/*:/opt/hadoop/hadoop-3.2.2/share/hadoop/common/*:/opt/hadoop/hadoop-3.2.2/share/hadoop/common/lib/* org.apache.hadoop.hbase.tool.BulkLoadHFilesTool hdfs://192.168.1.2:7000/nt/hfiles/ nt
                        

                        This loads the files generated in the previous step. It runs quickly, so its time is negligible.

                        • anneng

                          2021-04-09 18:38:33,008 ERROR [main] tool.LoadIncrementalHFiles (LoadIncrementalHFiles.java:checkHFilesCountPerRegionPerFamily(610)) - Trying to load more than 32 hfiles to family seq of region with start key
                          2021-04-09 18:38:33,019 INFO [main] client.ConnectionImplementation (ConnectionImplementation.java:closeMasterService(1898)) - Closing master protocol: MasterService
                          Exception in thread "main" java.io.IOException: Trying to load more than 32 hfiles to one family of one region
                          at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.performBulkLoad(LoadIncrementalHFiles.java:455)
                          at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:367)
                          at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:1216)
                          at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:1229)
                          at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:1264)
                          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
                          at org.apache.hadoop.hbase.tool.BulkLoadHFilesTool.main(BulkLoadHFilesTool.java:66)

                          The limit (32 by default, per the error above) can be raised with -Dhbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily:
                          java -cp /opt/hadoop/hbase-2.3.4/lib/hbase-mapreduce-2.3.4.jar:/opt/hadoop/hbase-2.3.4/lib/hbase-server-2.3.4.jar:/opt/hadoop/hbase-2.3.4/lib/*:/opt/hadoop/hadoop-3.2.2/share/hadoop/common/*:/opt/hadoop/hadoop-3.2.2/share/hadoop/common/lib/* org.apache.hadoop.hbase.tool.BulkLoadHFilesTool -Dhbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily=1024 hdfs://192.168.1.2:7000/nt/hfiles2/ nt
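The bulk output directory has one subdirectory per column family, and the loader refuses to put more than the configured number of HFiles into one family of one region. The class below is a hypothetical pre-check that counts files per family from a listing (for example the output of hdfs dfs -ls -R on the output directory). If no family exceeds the limit, no single region can; a higher count only means some region might, since the real check is per region and family. The family name info in the sample is made up; seq comes from the error above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical pre-check for bulk-load output laid out as <output>/<family>/<hfile>. */
public class HFileCountCheck {
    /** Default of hbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily, per the error above. */
    public static final int DEFAULT_LIMIT = 32;

    /** Counts files per column family, taking the family from the parent directory name. */
    public static Map<String, Integer> countPerFamily(List<String> hfilePaths) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String p : hfilePaths) {
            String[] parts = p.split("/");
            if (parts.length < 2) continue;                            // not in <family>/<file> form
            String file = parts[parts.length - 1];
            if (file.startsWith("_") || file.startsWith(".")) continue; // skip _SUCCESS and hidden files
            counts.merge(parts[parts.length - 2], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical listing of a bulk output directory.
        List<String> paths = Arrays.asList(
                "/nt/hfiles2/_SUCCESS", "/nt/hfiles2/seq/f1", "/nt/hfiles2/seq/f2", "/nt/hfiles2/info/f3");
        countPerFamily(paths).forEach((family, n) -> System.out.println(
                family + ": " + n + (n > DEFAULT_LIMIT ? " (some region may exceed the limit)" : " (within the limit)")));
    }
}
```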
