暗能星系


    Sugon (曙光) environment verification

    Data Center IDC HPC
    • A
      anneng (last edited by anneng)

      AMD's entire GPU software stack is built on ROCm.
      95c3025c-b949-4c85-8484-6e3096271df0-image.png

      The programming-model layer of the stack is HIP. Existing CUDA code can be converted to HIP code with hipify.
      The Heterogeneous Computing Interface for Portability (HIP) is a vendor-neutral C++ programming model for implementing highly tuned workload for GPUs. HIP (like CUDA) is a dialect of C++ supporting templates, classes, lambdas, and other C++ constructs.

      A “hipify” tool is provided to ease conversion of CUDA codes to HIP, enabling code compilation for either AMD or NVIDIA GPU (CUDA) environments. The ROCm™ HIP compiler is based on Clang, the LLVM compiler infrastructure, and the “libc++” C++ standard library.

      d003df19-6c67-4adc-bfb7-ba15587782be-image.png

      =============
      cannot open file /mnt/repodata/repomd.xml
      This error is caused by the repository below. Disable the repo by setting enabled=0 in:
      /etc/yum.repos.d/CentOS-Media.repo
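The repo change above can be scripted. This is a minimal sketch, assuming GNU sed and that the repo file currently contains a line reading `enabled=1`. The helper name `disable_repo` is hypothetical and not part of yum.

```shell
# Hypothetical helper: flip every "enabled=1" line to "enabled=0" in a yum .repo file.
# Assumes GNU sed (for -i) and write permission on the file.
disable_repo() {
    repo_file="$1"
    [ -f "$repo_file" ] || { echo "no such repo file: $repo_file" >&2; return 1; }
    sed -i 's/^[[:space:]]*enabled[[:space:]]*=[[:space:]]*1[[:space:]]*$/enabled=0/' "$repo_file"
}

# Usage on the machine from this thread (run as root):
#   disable_repo /etc/yum.repos.d/CentOS-Media.repo
```

Editing the file by hand has the same effect. The function only saves typing when several nodes need the same change.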

      • A
        anneng

        Check the number of GPUs:
        sudo lshw -class display
        *-display
        description: Display controller
        product: Pre-Wukong DCU
        vendor: Pre-Wukong DCU
        physical id: 0
        bus info: pci@0000:63:00.0
        version: 04
        width: 64 bits
        clock: 33MHz
        capabilities: pm pciexpress msi bus_master cap_list rom
        configuration: driver=amdgpu latency=0
        resources: iomemory:880-87f iomemory:8c0-8bf irq:188 memory:8800000000-8bffffffff memory:8c00000000-8c001fffff memory:e4c00000-e4c7ffff memory:e4c80000-e4c9ffff

        The Sugon machine has 4 cards.
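To get the card count as a number instead of reading the lshw listing by eye, the section headers can be counted. This is a sketch that assumes each device section starts with a `*-display` header, as in the output pasted above. The prefix match would also cover suffixed headers such as `*-display:1`, if lshw emits them. The function name `count_display_devices` is hypothetical.

```shell
# Count "*-display" section headers in lshw output read from stdin.
count_display_devices() {
    grep -c '^[[:space:]]*\*-display'
}

# Usage on a real machine:
#   sudo lshw -class display | count_display_devices
```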

        d3b07699-7d2a-45ae-bdc5-34cca393abfe-image.png

        • A
          anneng (last edited by anneng)

          oneTBB depends on CMake 3. Build and install it from source:
          https://gist.github.com/1duo/38af1abd68a2c7fe5087532ab968574e
          wget https://cmake.org/files/v3.21/cmake-3.21.3.tar.gz
          tar zxvf cmake-3.*
          cd cmake-3.*
          ./bootstrap --prefix=/usr
          make -j$(nproc)
          make install
          cmake --version

          cmake version ..*
          CMake suite maintained and supported by Kitware (kitware.com/cmake).

          Build TBB:
          cmake -DCMAKE_CXX_FLAGS=-DTBB_ALLOCATOR_TRAITS_BROKEN ..
          make -j
          make install

          • A
            anneng (last edited by anneng)

            https://rocmdocs.amd.com/en/latest/Programming_Guides/HIP-porting-guide.html#porting-a-new-cuda-project

            Use this guide to convert the SegAlign code:

            hipexamine-perl.sh .
            hipify-perl --inplace
            hipconvertinplace-perl.sh .

            export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/lib

            CMakeLists.txt

            CXX=/opt/rocm/bin/hipcc cmake ..

            • A
              anneng @anneng

              @anneng https://cmake.org/cmake/help/latest/command/enable_language.html CMake added official HIP language support in version 3.21, so use that version.
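Before configuring a HIP project it may help to confirm that the installed CMake meets the 3.21 requirement. This is a sketch that assumes GNU coreutils `sort -V` is available. Both helper names are hypothetical.

```shell
# Return 0 if version $1 is greater than or equal to version $2.
# Assumes GNU coreutils sort, which provides -V (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Extract the version number from the first line of `cmake --version`,
# which has the form "cmake version X.Y.Z".
cmake_version_from_banner() {
    printf '%s\n' "$1" | head -n 1 | sed 's/^cmake version //'
}

# Usage:
#   ver=$(cmake_version_from_banner "$(cmake --version)")
#   version_ge "$ver" 3.21 || echo "CMake $ver is too old for HIP language support" >&2
```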

              • A
                anneng

                After the Sugon system was upgraded, hipcc fails with errors:
                [root@h01r1n08 ~]# /opt/rocm/bin/hipcc --version
                Can't exec "/opt/rocm-4.0.1/llvm/bin/clang++": No such file or directory at /opt/rocm-4.0.1/hip/bin/hipconfig line 141.
                Use of uninitialized value $HIP_CLANG_VERSION in pattern match (m//) at /opt/rocm-4.0.1/hip/bin/hipconfig line 142.
                Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/rocm-4.0.1/hip/bin/hipconfig line 145.
                Can't exec "/opt/rocm-4.0.1/llvm/bin/clang++": No such file or directory at /opt/rocm-4.0.1/hip/bin/hipconfig line 141.
                Use of uninitialized value $HIP_CLANG_VERSION in pattern match (m//) at /opt/rocm-4.0.1/hip/bin/hipconfig line 142.
                Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/rocm-4.0.1/hip/bin/hipconfig line 145.
                Can't exec "/opt/rocm-4.0.1/llvm/bin/clang++": No such file or directory at /opt/rocm-4.0.1/hip/bin/hipconfig line 141.
                Use of uninitialized value $HIP_CLANG_VERSION in pattern match (m//) at /opt/rocm-4.0.1/hip/bin/hipconfig line 142.
                Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/rocm-4.0.1/hip/bin/hipconfig line 145.
                Can't exec "/opt/rocm-4.0.1/llvm/bin/clang++": No such file or directory at /opt/rocm-4.0.1/hip/bin/hipconfig line 141.
                Use of uninitialized value $HIP_CLANG_VERSION in pattern match (m//) at /opt/rocm-4.0.1/hip/bin/hipconfig line 142.
                Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/rocm-4.0.1/hip/bin/hipconfig line 145.
                Can't exec "/opt/rocm-4.0.1/llvm/bin/clang": No such file or directory at /opt/rocm/bin/hipcc line 203.
                Use of uninitialized value $HIP_CLANG_VERSION in pattern match (m//) at /opt/rocm/bin/hipcc line 204.
                Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/rocm/bin/hipcc line 208.
                Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/rocm/bin/hipcc line 846.

                Fix: the LLVM package was missing. Install it:
                yum install llvm-amdgpu

                • A
                  anneng

                  Install ROCm: rocm裸金属rocm4.0.1安装.docx (bare-metal ROCm 4.0.1 installation guide)

                  • A
                    anneng (last edited by anneng)

                    [root@h01r1n08 build]# cmake -DCMAKE_BUILD_TYPE=Release -DTBB_ROOT=${PWD}/../submodules/TBB -DCMAKE_PREFIX_PATH=${PWD}/../submodules/TBB/cmake ..
                    -- The CXX compiler identification is GNU 7.3.1
                    -- The HIP compiler identification is Clang 12.0.0
                    CMake Error at /usr/local/share/cmake-3.22/Modules/CMakeDetermineHIPCompiler.cmake:105 (message):
                    The ROCm root directory:

                    /opt/rocm-4.0.1

                    does not contain the HIP runtime CMake package, expected at:

                    /opt/rocm-4.0.1/lib/cmake/hip-lang/hip-lang-config.cmake

                    Call Stack (most recent call first):
                    CMakeLists.txt:3 (project)

                    -- Configuring incomplete, errors occurred!
                    See also "/home/anneng/SegAlign/build/CMakeFiles/CMakeOutput.log".


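                    ======= rocm-hip-sdk is available in ROCm 4.5 =======
                    The ROCm installed for you earlier was 4.0.1, which does not support rocm-hip-sdk.

                    == Several similar packages also need to be installed ==========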
                    yum install rocm-language-runtime
                    yum install rocm-hip-runtime
                    yum install rocm-hip-runtime-devel
                    yum install rocm-hip-library
                    yum install rocm-hip-libraries
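After installing packages, a quick preflight can confirm that the two files named in this thread's error messages are now present. This is a sketch only. The function name `check_rocm_root` is hypothetical, and the two relative paths are copied from the hipcc and CMake errors above rather than from any official layout specification.

```shell
# Print what is missing under a ROCm root and return non-zero if anything is.
# The two paths checked are the ones named in the error messages in this thread.
check_rocm_root() {
    rocm_root="$1"
    missing=0
    if [ ! -f "$rocm_root/llvm/bin/clang++" ]; then
        echo "missing: $rocm_root/llvm/bin/clang++"
        missing=1
    fi
    if [ ! -f "$rocm_root/lib/cmake/hip-lang/hip-lang-config.cmake" ]; then
        echo "missing: $rocm_root/lib/cmake/hip-lang/hip-lang-config.cmake"
        missing=1
    fi
    return "$missing"
}

# Usage:
#   check_rocm_root /opt/rocm-4.0.1 || echo "ROCm install is incomplete for CMake HIP builds" >&2
```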

                    • A
                      anneng

                      /home/anneng/SegAlign/submodules/TBB/./src/tbbmalloc/proxy.cpp:299:68: error: alias must point to a defined variable or function
                      void *aligned_alloc(size_t alignment, size_t size) __attribute__ ((alias ("memalign")));
                      ^
                      /home/anneng/SegAlign/submodules/TBB/./src/tbbmalloc/proxy.cpp:311:62: error: alias must point to a defined variable or function
                      void *__libc_calloc(size_t num, size_t size) __attribute__ ((alias ("calloc")));
                      ^
                      /home/anneng/SegAlign/submodules/TBB/./src/tbbmalloc/proxy.cpp:312:70: error: alias must point to a defined variable or function
                      void *__libc_memalign(size_t alignment, size_t size) __attribute__ ((alias ("memalign")));
                      ^
                      /home/anneng/SegAlign/submodules/TBB/./src/tbbmalloc/proxy.cpp:313:51: error: alias must point to a defined variable or function
                      void *__libc_pvalloc(size_t size) __attribute__ ((alias ("pvalloc")));
                      ^
                      /home/anneng/SegAlign/submodules/TBB/./src/tbbmalloc/proxy.cpp:314:50: error: alias must point to a defined variable or function
                      void *__libc_valloc(size_t size) __attribute__ ((alias ("valloc")));

                      • A
                        anneng

                        clang cannot be found when building TBB. Add the ROCm LLVM directory to PATH:
                        export PATH=$PATH:/opt/rocm-4.5.0/llvm/bin/
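If the export line goes into a shell startup file, PATH grows by one duplicate entry every time the file is sourced. A small guard avoids that. This is a sketch, and the helper name `path_append` is hypothetical.

```shell
# Append directory $1 to PATH unless it is already listed.
path_append() {
    dir="${1%/}"                      # drop one trailing slash for a stable comparison
    case ":$PATH:" in
        *":$dir:"*|*":$dir/:"*) ;;    # already present, nothing to do
        *) PATH="$PATH:$dir" ;;
    esac
    export PATH
}

# Usage for the ROCm 4.5.0 layout mentioned above:
#   path_append /opt/rocm-4.5.0/llvm/bin/
```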
