暗能星系

    Sugon environment verification

    • anneng

      First node
      ssh root@10.8.150.53 (username: root; password redacted)

      Architecture: x86_64
      CPU op-mode(s): 32-bit, 64-bit
      Byte Order: Little Endian
      CPU(s): 32
      On-line CPU(s) list: 0-31
      Thread(s) per core: 1
      Core(s) per socket: 32
      Socket(s): 1
      NUMA node(s): 4
      Vendor ID: HygonGenuine
      CPU family: 24
      Model: 0
      Model name: Hygon C86 7185 32-core Processor
      Stepping: 1
      CPU MHz: 2000.000
      CPU max MHz: 2000.0000
      CPU min MHz: 1200.0000
      BogoMIPS: 4000.15
      Virtualization: AMD-V
      L1d cache: 32K
      L1i cache: 64K
      L2 cache: 512K
      L3 cache: 8192K
      NUMA node0 CPU(s): 0-7
      NUMA node1 CPU(s): 8-15
      NUMA node2 CPU(s): 16-23
      NUMA node3 CPU(s): 24-31
      Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf eagerfpu pni monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 hw_pstate retpoline_amd ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca

      Memory block size: 128M
      Total online memory: 128G
      Total offline memory: 0B

      • anneng

        AMD's entire GPU software stack is built on ROCm.

        The programming model layer is HIP. Existing CUDA code can be converted to HIP code with hipify.
        The Heterogeneous Computing Interface for Portability (HIP) is a vendor-neutral C++ programming model for implementing highly tuned workload for GPUs. HIP (like CUDA) is a dialect of C++ supporting templates, classes, lambdas, and other C++ constructs.

        A “hipify” tool is provided to ease conversion of CUDA codes to HIP, enabling code compilation for either AMD or NVIDIA GPU (CUDA) environments. The ROCm™ HIP compiler is based on Clang, the LLVM compiler infrastructure, and the “libc++” C++ standard library.

        =============
        cannot open file /mnt/repodata/repomd.xml
        This error is caused by the repo below. Disable it by setting enabled=0 in:
        /etc/yum.repos.d/CentOS-Media.repo
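
The enabled=0 change can also be scripted. The following is a minimal sketch, assuming the repo file already contains an `enabled=` line; `disable_repo_file` is a hypothetical helper name, not a yum command. It keeps a `.bak` copy of the file before rewriting it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set enabled=0 on every "enabled=" line of a yum .repo file.
disable_repo_file() {
    local repo_file="$1"
    [ -f "$repo_file" ] || { echo "no such file: $repo_file" >&2; return 1; }
    # -i.bak keeps a backup copy; the pattern tolerates spaces around "=".
    sed -i.bak -E 's/^[[:space:]]*enabled[[:space:]]*=.*/enabled=0/' "$repo_file"
}

# Example (run as root):
#   disable_repo_file /etc/yum.repos.d/CentOS-Media.repo
```

If a section has no `enabled=` line at all, this sketch leaves it unchanged, so the line would have to be added by hand.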

        • anneng

          Check the number of GPUs
          sudo lshw -class display
          *-display
          description: Display controller
          product: Pre-Wukong DCU
          vendor: Pre-Wukong DCU
          physical id: 0
          bus info: pci@0000:63:00.0
          version: 04
          width: 64 bits
          clock: 33MHz
          capabilities: pm pciexpress msi bus_master cap_list rom
          configuration: driver=amdgpu latency=0
          resources: iomemory:880-87f iomemory:8c0-8bf irq:188 memory:8800000000-8bffffffff memory:8c00000000-8c001fffff memory:e4c00000-e4c7ffff memory:e4c80000-e4c9ffff

          The Sugon machine has 4 cards.
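
Rather than counting entries by eye, the card count can be read off the lshw output. This is a small sketch that assumes each device starts with a `*-display` line, as in the output above (with several cards lshw usually labels them `*-display:0`, `*-display:1`, and so on). `count_display_devices` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "*-display" entries in lshw output read from stdin.
count_display_devices() {
    # grep -c prints the number of matching lines; it exits 1 when that number
    # is 0, so "|| true" keeps the function usable under "set -e".
    grep -c -E '^[[:space:]]*\*-display' || true
}

# Example:
#   sudo lshw -class display | count_display_devices
```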

          • anneng

            oneTBB depends on CMake 3
            https://gist.github.com/1duo/38af1abd68a2c7fe5087532ab968574e
            wget https://cmake.org/files/v3.21/cmake-3.21.3.tar.gz
            tar zxvf cmake-3.*
            cd cmake-3.*
            ./bootstrap --prefix=/usr
            make -j$(nproc)
            make install
            cmake --version

            cmake version 3.21.3
            CMake suite maintained and supported by Kitware (kitware.com/cmake).

            Build TBB
            cmake -DCMAKE_CXX_FLAGS=-DTBB_ALLOCATOR_TRAITS_BROKEN ..
            make -j$(nproc)
            make install

            • anneng

              https://rocmdocs.amd.com/en/latest/Programming_Guides/HIP-porting-guide.html#porting-a-new-cuda-project

              Use this guide to convert the SegAlign code.

              hipexamine-perl.sh .
              hipify-perl --inplace
              hipconvertinplace-perl.sh .

              export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/lib

              CMakeLists.txt

              CXX=/opt/rocm/bin/hipcc cmake ..

              • anneng

                @anneng https://cmake.org/cmake/help/latest/command/enable_language.html CMake officially added HIP language support in version 3.21. Use this version.
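
Since HIP language support needs CMake 3.21 or newer, it may help to check the installed version before configuring. A minimal sketch, assuming GNU `sort` with `-V` is available; `cmake_at_least` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $2 is at least version $1.
cmake_at_least() {
    local required="$1" current="$2"
    # sort -V orders version strings numerically; if the smaller of the two
    # is the required version, then current >= required.
    [ "$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n1)" = "$required" ]
}

# Example: the first line of "cmake --version" reads "cmake version X.Y.Z".
#   cmake_at_least 3.21 "$(cmake --version | awk 'NR==1 {print $3}')" \
#       || echo "CMake is too old for HIP language support"
```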

                • anneng

                  After the Sugon upgrade, hipcc reports errors:
                  [root@h01r1n08 ~]# /opt/rocm/bin/hipcc --version
                  Can't exec "/opt/rocm-4.0.1/llvm/bin/clang++": No such file or directory at /opt/rocm-4.0.1/hip/bin/hipconfig line 141.
                  Use of uninitialized value $HIP_CLANG_VERSION in pattern match (m//) at /opt/rocm-4.0.1/hip/bin/hipconfig line 142.
                  Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/rocm-4.0.1/hip/bin/hipconfig line 145.
                  Can't exec "/opt/rocm-4.0.1/llvm/bin/clang++": No such file or directory at /opt/rocm-4.0.1/hip/bin/hipconfig line 141.
                  Use of uninitialized value $HIP_CLANG_VERSION in pattern match (m//) at /opt/rocm-4.0.1/hip/bin/hipconfig line 142.
                  Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/rocm-4.0.1/hip/bin/hipconfig line 145.
                  Can't exec "/opt/rocm-4.0.1/llvm/bin/clang++": No such file or directory at /opt/rocm-4.0.1/hip/bin/hipconfig line 141.
                  Use of uninitialized value $HIP_CLANG_VERSION in pattern match (m//) at /opt/rocm-4.0.1/hip/bin/hipconfig line 142.
                  Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/rocm-4.0.1/hip/bin/hipconfig line 145.
                  Can't exec "/opt/rocm-4.0.1/llvm/bin/clang++": No such file or directory at /opt/rocm-4.0.1/hip/bin/hipconfig line 141.
                  Use of uninitialized value $HIP_CLANG_VERSION in pattern match (m//) at /opt/rocm-4.0.1/hip/bin/hipconfig line 142.
                  Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/rocm-4.0.1/hip/bin/hipconfig line 145.
                  Can't exec "/opt/rocm-4.0.1/llvm/bin/clang": No such file or directory at /opt/rocm/bin/hipcc line 203.
                  Use of uninitialized value $HIP_CLANG_VERSION in pattern match (m//) at /opt/rocm/bin/hipcc line 204.
                  Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/rocm/bin/hipcc line 208.
                  Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/rocm/bin/hipcc line 846.

                  Fix: the LLVM package was missing. Install it:
                  yum install llvm-amdgpu
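
The errors above come from hipcc and hipconfig failing to execute clang and clang++ under the ROCm llvm directory. A quick pre-check along these lines can confirm whether the package is actually in place. This is a sketch only; `check_rocm_clang` is a hypothetical name and the default root /opt/rocm is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that clang and clang++ exist and are executable
# under <rocm_root>/llvm/bin. Returns non-zero if either one is missing.
check_rocm_clang() {
    local rocm_root="${1:-/opt/rocm}"
    local tool missing=0
    for tool in clang clang++; do
        if [ ! -x "$rocm_root/llvm/bin/$tool" ]; then
            echo "missing: $rocm_root/llvm/bin/$tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_rocm_clang /opt/rocm-4.0.1 || yum install llvm-amdgpu
```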

                  • anneng

                    Installation: see the attached document rocm裸金属rocm4.0.1安装.docx (bare-metal ROCm 4.0.1 installation).

                    • anneng

                      [root@h01r1n08 build]# cmake -DCMAKE_BUILD_TYPE=Release -DTBB_ROOT=${PWD}/../submodules/TBB -DCMAKE_PREFIX_PATH=${PWD}/../submodules/TBB/cmake ..
                      -- The CXX compiler identification is GNU 7.3.1
                      -- The HIP compiler identification is Clang 12.0.0
                      CMake Error at /usr/local/share/cmake-3.22/Modules/CMakeDetermineHIPCompiler.cmake:105 (message):
                      The ROCm root directory:

                      /opt/rocm-4.0.1

                      does not contain the HIP runtime CMake package, expected at:

                      /opt/rocm-4.0.1/lib/cmake/hip-lang/hip-lang-config.cmake

                      Call Stack (most recent call first):
                      CMakeLists.txt:3 (project)

                      -- Configuring incomplete, errors occurred!
                      See also "/home/anneng/SegAlign/build/CMakeFiles/CMakeOutput.log".

                      ======= rocm-hip-sdk is available in ROCm 4.5 =======
                      The ROCm previously installed for you was 4.0.1, which does not support rocm-hip-sdk.

                      == A few similar packages also need to be installed ==========
                      yum install rocm-language-runtime
                      yum install rocm-hip-runtime
                      yum install rocm-hip-runtime-devel
                      yum install rocm-hip-library
                      yum install rocm-hip-libraries
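
After installing the packages, one way to confirm that the CMake error should go away is to look for the file CMake named in its message. A minimal sketch; `has_hip_lang_config` is a hypothetical helper, and the relative path is copied from the error text above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the HIP runtime CMake package that
# CMakeDetermineHIPCompiler.cmake looks for exists under the given ROCm root.
has_hip_lang_config() {
    local rocm_root="${1:-/opt/rocm}"
    [ -f "$rocm_root/lib/cmake/hip-lang/hip-lang-config.cmake" ]
}

# Example:
#   has_hip_lang_config /opt/rocm-4.5.0 && echo "hip-lang package found"
```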

                      • anneng

                        /home/anneng/SegAlign/submodules/TBB/./src/tbbmalloc/proxy.cpp:299:68: error: alias must point to a defined variable or function
                        void *aligned_alloc(size_t alignment, size_t size) __attribute__ ((alias ("memalign")));
                        ^
                        /home/anneng/SegAlign/submodules/TBB/./src/tbbmalloc/proxy.cpp:311:62: error: alias must point to a defined variable or function
                        void *__libc_calloc(size_t num, size_t size) __attribute__ ((alias ("calloc")));
                        ^
                        /home/anneng/SegAlign/submodules/TBB/./src/tbbmalloc/proxy.cpp:312:70: error: alias must point to a defined variable or function
                        void *__libc_memalign(size_t alignment, size_t size) __attribute__ ((alias ("memalign")));
                        ^
                        /home/anneng/SegAlign/submodules/TBB/./src/tbbmalloc/proxy.cpp:313:51: error: alias must point to a defined variable or function
                        void *__libc_pvalloc(size_t size) __attribute__ ((alias ("pvalloc")));
                        ^
                        /home/anneng/SegAlign/submodules/TBB/./src/tbbmalloc/proxy.cpp:314:50: error: alias must point to a defined variable or function
                        void *__libc_valloc(size_t size) __attribute__ ((alias ("valloc")));

                        • anneng

                          clang was not found when building TBB. Add the ROCm LLVM bin directory to PATH:
                          export PATH=$PATH:/opt/rocm-4.5.0/llvm/bin/
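
If the export ends up in a shell profile, a guarded variant avoids adding a nonexistent directory or stacking duplicate entries. This is a sketch, not a required step; `append_to_path` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a directory to PATH only if it exists
# and is not already listed.
append_to_path() {
    local dir="${1%/}"                 # drop one trailing slash, if any
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    case ":$PATH:" in
        *":$dir:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$dir"; export PATH ;;
    esac
}

# Example:
#   append_to_path /opt/rocm-4.5.0/llvm/bin/
```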
