Sugon environment verification
-
First node
ssh root@10.8.150.53
Username/password: root/TanzU2020_vip
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 1
Core(s) per socket: 32
Socket(s): 1
NUMA node(s): 4
Vendor ID: HygonGenuine
CPU family: 24
Model: 0
Model name: Hygon C86 7185 32-core Processor
Stepping: 1
CPU MHz: 2000.000
CPU max MHz: 2000.0000
CPU min MHz: 1200.0000
BogoMIPS: 4000.15
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
NUMA node2 CPU(s): 16-23
NUMA node3 CPU(s): 24-31
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf eagerfpu pni monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 hw_pstate retpoline_amd ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
Memory block size: 128M
Total online memory: 128G
Total offline memory: 0B
-
AMD's entire GPU software stack is built on ROCm.

The programming-model layer is HIP. Existing CUDA code can be converted to HIP code with hipify.
The Heterogeneous Computing Interface for Portability (HIP) is a vendor-neutral C++ programming model for implementing highly tuned workload for GPUs. HIP (like CUDA) is a dialect of C++ supporting templates, classes, lambdas, and other C++ constructs. A “hipify” tool is provided to ease conversion of CUDA codes to HIP, enabling code compilation for either AMD or NVIDIA GPU (CUDA) environments. The ROCm HIP compiler is based on Clang, the LLVM compiler infrastructure, and the “libc++” C++ standard library.
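To make the idea of "hipifying" concrete, here is a toy bash sketch of the kind of source-level renaming involved. `toy_hipify` is a hypothetical helper written for this note, not part of ROCm; it rewrites only three CUDA runtime calls and one header, and assumes GNU sed (for `\b`). The real hipify tools cover far more of the API.

```shell
# Toy illustration only: rename a handful of CUDA runtime identifiers to
# their HIP equivalents. Reads source text on stdin, writes to stdout.
# toy_hipify is a hypothetical name; use the real hipify tools for actual ports.
toy_hipify() {
  sed -e 's/\bcudaMalloc\b/hipMalloc/g' \
      -e 's/\bcudaMemcpy\b/hipMemcpy/g' \
      -e 's/\bcudaFree\b/hipFree/g' \
      -e 's|#include <cuda_runtime\.h>|#include <hip/hip_runtime.h>|'
}
```

For example, `toy_hipify < kernel.cu > kernel.hip.cpp` would rewrite a file named `kernel.cu` (a placeholder name) without touching the original.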
=============
Error: cannot open file /mnt/repodata/repomd.xml
This is caused by the repo file below. Disable the repo by setting enabled=0 in it:
/etc/yum.repos.d/CentOS-Media.repo
-
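A minimal bash sketch of the enabled=0 fix for the CentOS-Media repo, assuming GNU sed. `disable_repo_file` is a hypothetical helper name; it rewrites every `enabled=` line in the given file, so it disables all repo sections in that file, and it leaves a `.bak` copy next to it (yum only reads files ending in `.repo`).

```shell
# Set enabled=0 on every "enabled=..." line of a yum repo file.
# disable_repo_file is a hypothetical helper, not a yum command.
disable_repo_file() {
  local repo_file="$1"
  if [ ! -f "$repo_file" ]; then
    echo "no such file: $repo_file" >&2
    return 1
  fi
  # -i.bak edits in place and keeps the original as <file>.bak
  sed -i.bak -E 's/^[[:space:]]*enabled[[:space:]]*=.*/enabled=0/' "$repo_file"
}
```

Usage on this machine would be `disable_repo_file /etc/yum.repos.d/CentOS-Media.repo`, run as root.
-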
Check the number of GPUs
sudo lshw -class display
*-display
description: Display controller
product: Pre-Wukong DCU
vendor: Pre-Wukong DCU
physical id: 0
bus info: pci@0000:63:00.0
version: 04
width: 64 bits
clock: 33MHz
capabilities: pm pciexpress msi bus_master cap_list rom
configuration: driver=amdgpu latency=0
resources: iomemory:880-87f iomemory:8c0-8bf irq:188 memory:8800000000-8bffffffff memory:8c00000000-8c001fffff memory:e4c00000-e4c7ffff memory:e4c80000-e4c9ffff
The Sugon machine has 4 cards.
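Rather than counting `*-display` entries by eye, the lshw output can be filtered. The sketch below assumes each accelerator shows a `configuration:` line containing `driver=amdgpu`, as in the output above. `count_amdgpu_devices` is a hypothetical helper name.

```shell
# Count display-class devices bound to the amdgpu driver.
# Reads `lshw -class display` output on stdin and prints the count.
# count_amdgpu_devices is a hypothetical helper name.
count_amdgpu_devices() {
  # grep -c exits non-zero when the count is 0; "|| true" keeps the
  # function's exit status clean while still printing "0".
  grep -c 'configuration:.*driver=amdgpu' || true
}
```

Typical use: `sudo lshw -class display | count_amdgpu_devices`. Devices bound to other drivers (for example an onboard VGA controller) are not counted.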

-
oneTBB requires CMake 3
https://gist.github.com/1duo/38af1abd68a2c7fe5087532ab968574e
wget https://cmake.org/files/v3.21/cmake-3.21.3.tar.gz
tar zxvf cmake-3.*
cd cmake-3.*
./bootstrap --prefix=/usr
make -j$(nproc)
make install
cmake --version
cmake version ..*
CMake suite maintained and supported by Kitware (kitware.com/cmake).
Build TBB
cmake -DCMAKE_CXX_FLAGS=-DTBB_ALLOCATOR_TRAITS_BROKEN ..
make -j
make install
-
Use this document to convert the SegAlign code
hipexamine-perl.sh .
hipify-perl --inplace
hipconvertinplace-perl.sh .
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/lib
CXX=/opt/rocm/bin/hipcc cmake ..
-
@anneng https://cmake.org/cmake/help/latest/command/enable_language.html CMake officially added support for HIP as a language in version 3.21. Use this version.
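Since HIP language support depends on the CMake version, a quick guard before configuring can save a confusing failure. This bash sketch assumes GNU coreutils (`sort -V`); `version_at_least` and `cmake_version` are hypothetical helper names.

```shell
# Return success if version $1 is >= version $2 (dotted numeric versions).
# Relies on GNU `sort -V`; version_at_least is a hypothetical helper name.
version_at_least() {
  local have="$1" want="$2"
  # If the smaller of the two versions is $want, then $have >= $want.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n1)" = "$want" ]
}

# Print the installed CMake version, taken from the first line of
# `cmake --version` ("cmake version X.Y.Z").
cmake_version() {
  cmake --version | awk 'NR==1 {print $3}'
}
```

A configure script could then run `version_at_least "$(cmake_version)" 3.21 || echo "CMake too old for HIP language support" >&2`.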
-
After the Sugon machine was upgraded, hipcc reports errors:
[root@h01r1n08 ~]# /opt/rocm/bin/hipcc --version
Can't exec "/opt/rocm-4.0.1/llvm/bin/clang++": No such file or directory at /opt/rocm-4.0.1/hip/bin/hipconfig line 141.
Use of uninitialized value $HIP_CLANG_VERSION in pattern match (m//) at /opt/rocm-4.0.1/hip/bin/hipconfig line 142.
Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/rocm-4.0.1/hip/bin/hipconfig line 145.
Can't exec "/opt/rocm-4.0.1/llvm/bin/clang++": No such file or directory at /opt/rocm-4.0.1/hip/bin/hipconfig line 141.
Use of uninitialized value $HIP_CLANG_VERSION in pattern match (m//) at /opt/rocm-4.0.1/hip/bin/hipconfig line 142.
Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/rocm-4.0.1/hip/bin/hipconfig line 145.
Can't exec "/opt/rocm-4.0.1/llvm/bin/clang++": No such file or directory at /opt/rocm-4.0.1/hip/bin/hipconfig line 141.
Use of uninitialized value $HIP_CLANG_VERSION in pattern match (m//) at /opt/rocm-4.0.1/hip/bin/hipconfig line 142.
Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/rocm-4.0.1/hip/bin/hipconfig line 145.
Can't exec "/opt/rocm-4.0.1/llvm/bin/clang++": No such file or directory at /opt/rocm-4.0.1/hip/bin/hipconfig line 141.
Use of uninitialized value $HIP_CLANG_VERSION in pattern match (m//) at /opt/rocm-4.0.1/hip/bin/hipconfig line 142.
Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/rocm-4.0.1/hip/bin/hipconfig line 145.
Can't exec "/opt/rocm-4.0.1/llvm/bin/clang": No such file or directory at /opt/rocm/bin/hipcc line 203.
Use of uninitialized value $HIP_CLANG_VERSION in pattern match (m//) at /opt/rocm/bin/hipcc line 204.
Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/rocm/bin/hipcc line 208.
Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/rocm/bin/hipcc line 846.
Fix: the LLVM package is missing
yum install llvm-amdgpu
-
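The hipcc errors above all trace back to one missing file, `llvm/bin/clang++` under the ROCm root. A small bash check like the following can confirm whether it is present before and after installing the package. `check_hip_clang` is a hypothetical helper name, and the default root `/opt/rocm` is an assumption; pass the versioned directory explicitly if needed.

```shell
# Report whether the clang++ that hipcc/hipconfig try to exec is present.
# check_hip_clang is a hypothetical helper; $1 is the ROCm root directory.
check_hip_clang() {
  local rocm_root="${1:-/opt/rocm}"
  local clang="$rocm_root/llvm/bin/clang++"
  if [ -x "$clang" ]; then
    echo "found: $clang"
  else
    echo "missing: $clang" >&2
    return 1
  fi
}
```

For the log above, the call would be `check_hip_clang /opt/rocm-4.0.1`.
-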
Installing ROCm: rocm裸金属rocm4.0.1安装.docx (bare-metal ROCm 4.0.1 installation guide)
-
[root@h01r1n08 build]# cmake -DCMAKE_BUILD_TYPE=Release -DTBB_ROOT=${PWD}/../submodules/TBB -DCMAKE_PREFIX_PATH=${PWD}/../submodules/TBB/cmake ..
-- The CXX compiler identification is GNU 7.3.1
-- The HIP compiler identification is Clang 12.0.0
CMake Error at /usr/local/share/cmake-3.22/Modules/CMakeDetermineHIPCompiler.cmake:105 (message):
The ROCm root directory:/opt/rocm-4.0.1
does not contain the HIP runtime CMake package, expected at:
/opt/rocm-4.0.1/lib/cmake/hip-lang/hip-lang-config.cmake
Call Stack (most recent call first):
CMakeLists.txt:3 (project)
-- Configuring incomplete, errors occurred!
See also "/home/anneng/SegAlign/build/CMakeFiles/CMakeOutput.log".
===========================
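======= rocm-hip-sdk is available in 4.5 =======
The ROCm installed for you earlier was 4.0.1, which does not provide rocm-hip-sdk. == A few similar packages also need to be installed ==========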
yum install rocm-language-runtime
yum install rocm-hip-runtime
yum install rocm-hip-runtime-devel
yum install rocm-hip-library
yum install rocm-hip-libraries
-
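The CMake error above names the exact file it expects, `lib/cmake/hip-lang/hip-lang-config.cmake` under the ROCm root. After installing the packages, a check along these lines can tell whether CMake's HIP detection has a chance of succeeding. `has_hip_lang_config` is a hypothetical helper name and the `/opt/rocm` default is an assumption.

```shell
# Succeed if the HIP runtime CMake package that CMake's HIP compiler
# detection looks for exists under the given ROCm root.
# has_hip_lang_config is a hypothetical helper name.
has_hip_lang_config() {
  local rocm_root="${1:-/opt/rocm}"
  [ -f "$rocm_root/lib/cmake/hip-lang/hip-lang-config.cmake" ]
}
```

Example: `has_hip_lang_config /opt/rocm-4.0.1 || echo "hip-lang CMake package not installed" >&2`.
-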
/home/anneng/SegAlign/submodules/TBB/./src/tbbmalloc/proxy.cpp:299:68: error: alias must point to a defined variable or function
void *aligned_alloc(size_t alignment, size_t size) __attribute__ ((alias ("memalign")));
^
/home/anneng/SegAlign/submodules/TBB/./src/tbbmalloc/proxy.cpp:311:62: error: alias must point to a defined variable or function
void *__libc_calloc(size_t num, size_t size) __attribute__ ((alias ("calloc")));
^
/home/anneng/SegAlign/submodules/TBB/./src/tbbmalloc/proxy.cpp:312:70: error: alias must point to a defined variable or function
void *__libc_memalign(size_t alignment, size_t size) __attribute__ ((alias ("memalign")));
^
/home/anneng/SegAlign/submodules/TBB/./src/tbbmalloc/proxy.cpp:313:51: error: alias must point to a defined variable or function
void *__libc_pvalloc(size_t size) __attribute__ ((alias ("pvalloc")));
^
/home/anneng/SegAlign/submodules/TBB/./src/tbbmalloc/proxy.cpp:314:50: error: alias must point to a defined variable or function
void *__libc_valloc(size_t size) __attribute__ ((alias ("valloc")));
-
clang cannot be found when compiling TBB. Add the ROCm LLVM bin directory to PATH:
export PATH=$PATH:/opt/rocm-4.5.0/llvm/bin/
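If the export above goes into a shell profile, re-sourcing the profile appends the same directory again each time. A small bash sketch that only appends when the directory is not already on PATH; `path_append` is a hypothetical helper name.

```shell
# Append a directory to PATH only if it is not already present.
# A trailing slash on the argument is dropped before comparing.
# path_append is a hypothetical helper name.
path_append() {
  local dir="${1%/}"
  case ":$PATH:" in
    *":$dir:"*) ;;            # already on PATH, nothing to do
    *) PATH="$PATH:$dir" ;;
  esac
}
```

Usage: `path_append /opt/rocm-4.5.0/llvm/bin/ && export PATH`.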