<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Issue Log and Solutions]]></title><description><![CDATA[Problems encountered during development]]></description><link>http://an.forum.genostack.com/category/24</link><generator>RSS for Node</generator><lastBuildDate>Sat, 13 Jun 2026 11:36:01 GMT</lastBuildDate><atom:link href="http://an.forum.genostack.com/category/24.rss" rel="self" type="application/rss+xml"/><pubDate>Sat, 16 Aug 2025 02:04:24 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Installing Slurm on Ubuntu]]></title><description><![CDATA[<p dir="auto"><a href="https://www.cnblogs.com/yanshier/p/18670428" rel="nofollow ugc">https://www.cnblogs.com/yanshier/p/18670428</a><br />
<a href="https://blog.csdn.net/qq_46264842/article/details/149001898" rel="nofollow ugc">https://blog.csdn.net/qq_46264842/article/details/149001898</a></p>
]]></description><link>http://an.forum.genostack.com/topic/1128/slurm-ubuntu环境安装</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1128/slurm-ubuntu环境安装</guid><dc:creator><![CDATA[zhanglu]]></dc:creator><pubDate>Sat, 16 Aug 2025 02:04:24 GMT</pubDate></item><item><title><![CDATA[Upgrading GCC 4.8 to GCC 4.9 on Ubuntu 18.04]]></title><description><![CDATA[<p dir="auto"><a href="https://freedium.cfd/https://medium.com/@orhanakal/install-gcc-4-9-and-g-4-9-on-ubuntu-18-04-6888b92e5bab" rel="nofollow ugc">https://freedium.cfd/https://medium.com/@orhanakal/install-gcc-4-9-and-g-4-9-on-ubuntu-18-04-6888b92e5bab</a></p>
]]></description><link>http://an.forum.genostack.com/topic/1126/ubuntu-18-04-升级gcc4-8到gcc4-9</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1126/ubuntu-18-04-升级gcc4-8到gcc4-9</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 05 Jun 2025 07:55:12 GMT</pubDate></item><item><title><![CDATA[Jupyter network policy configuration: controlling whether the jupyter-user pod can access other pods]]></title><description><![CDATA[<p dir="auto">In values.yaml of the Jupyter installation files, networkPolicy is enabled (true) by default:</p>
<pre><code>networkPolicy:
  enabled: true
</code></pre>
<p dir="auto">With this setting, pods are blocked from accessing each other.<br />
When pods need to communicate with each other, set it to:</p>
<pre><code>networkPolicy:
  enabled: false
</code></pre>
<p dir="auto">How it works:<br />
This setting maps to Kubernetes NetworkPolicy resources:</p>
<pre><code>(base) root@node1:/opt/app/genostack_v3_service/jupyter# kubectl get netpol -n jhub
NAME         POD-SELECTOR                                              AGE
hub          app=jupyterhub,component=hub,release=jhub                 19m
proxy        app=jupyterhub,component=proxy,release=jhub               19m
singleuser   app=jupyterhub,component=singleuser-server,release=jhub   19m
</code></pre>
<pre><code>apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-pod-access
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: restricted-app
  policyTypes:
    - Egress
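  # An empty egress list denies all outbound traffic from the selected pods.
  # (A rule with empty "to" and "ports" would instead match every destination.)
  egress: []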
</code></pre>
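<p dir="auto">As a rough illustration of the difference between denying and allowing egress, the sketch below builds a NetworkPolicy manifest as a Python dict and prints it as JSON, which kubectl apply -f also accepts. The helper name egress_policy and its parameters are hypothetical; the policy name and labels are taken from the example above.</p>
<pre><code>import json

def egress_policy(name, namespace, pod_labels, allow_all):
    """Build a NetworkPolicy manifest as a plain dict (illustrative helper).

    allow_all=True gives one empty egress rule, which matches every destination.
    allow_all=False gives an empty egress list, which denies all outbound traffic.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": pod_labels},
            "policyTypes": ["Egress"],
            "egress": [{}] if allow_all else [],
        },
    }

# Deny all egress for pods labelled app=restricted-app.
deny = egress_policy("restrict-pod-access", "default", {"app": "restricted-app"}, allow_all=False)
print(json.dumps(deny, indent=2))
</code></pre>
<p dir="auto">Saving the printed JSON to a file and applying it with kubectl apply -f is one way to try it out in a test namespace.</p>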
]]></description><link>http://an.forum.genostack.com/topic/1124/jupyter中的网络策略配置-控制jupyter-user-pod-是否能访问别的pod</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1124/jupyter中的网络策略配置-控制jupyter-user-pod-是否能访问别的pod</guid><dc:creator><![CDATA[zhanglu]]></dc:creator><pubDate>Wed, 09 Apr 2025 05:56:50 GMT</pubDate></item><item><title><![CDATA[Ceph OSD fails to start, reporting that the clock is out of sync]]></title><description><![CDATA[<p dir="auto">Problem: the Ceph OSDs fail to start and report that the time is incorrect.<br />
Solution:<br />
Synchronize the clocks of all cluster nodes. The cluster has no internet access, so an external time server cannot be used. The office Windows machine can temporarily go online through a hotspot and its clock is synchronized, so we use it as the NTP server (IP address 192.168.10.11) and have the cluster's Ubuntu nodes sync against it as clients.<br />
1. Configure Windows as an NTP server<br />
<a href="https://support.industry.siemens.com/cs/document/22144502/how-do-you-configure-your-pc-as-ntp-server-?dti=0&amp;lc=en-AE" rel="nofollow ugc">https://support.industry.siemens.com/cs/document/22144502/how-do-you-configure-your-pc-as-ntp-server-?dti=0&amp;lc=en-AE</a><br />
Stop the "Windows Time" service via Start &gt; Control Panel &gt; System and Security &gt; Administrative Tools &gt; Services. (On Windows 11: Start &gt; Control Panel &gt; System and Security &gt; Windows Tools &gt; Services.)</p>
<p dir="auto">Open the Registry Editor via Start &gt; Run... &gt; regedit.<br />
Under "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\W32Time\TimeProviders\NtpServer",<br />
set "Enabled" to 1.<br />
Under "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\W32Time\Config",<br />
set "AnnounceFlags" to 5. You can search for this key directly, since the path may differ between Windows versions.<br />
Start the Windows Time service again and set its startup type to Automatic.<br />
2. Ubuntu uses systemd-timesyncd (managed through timedatectl) for clock synchronization. Edit:<br />
/etc/systemd/timesyncd.conf<br />
[Time]<br />
NTP=192.168.10.11    (the Windows machine's IP)<br />
RootDistanceMaxSec=50<br />
Then restart the service: systemctl restart systemd-timesyncd<br />
3. The OSDs return to the Running state automatically.</p>
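<p dir="auto">When several nodes need the same client settings, the [Time] section can be generated instead of typed by hand. The following is only a sketch: the function name render_timesyncd is hypothetical, and the values are the ones used in this post.</p>
<pre><code>import configparser
import io

def render_timesyncd(ntp_server, root_distance_max_sec=50):
    """Render the [Time] section for /etc/systemd/timesyncd.conf as text."""
    cp = configparser.ConfigParser()
    cp.optionxform = str  # keep the key names exactly as written
    cp["Time"] = {
        "NTP": ntp_server,
        "RootDistanceMaxSec": str(root_distance_max_sec),
    }
    buf = io.StringIO()
    cp.write(buf, space_around_delimiters=False)  # emit KEY=VALUE without spaces
    return buf.getvalue()

print(render_timesyncd("192.168.10.11"))
</code></pre>
<p dir="auto">The printed text can be written to /etc/systemd/timesyncd.conf on each node before restarting systemd-timesyncd.</p>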
]]></description><link>http://an.forum.genostack.com/topic/1123/ceph-osd无法启动-报时间不同步</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1123/ceph-osd无法启动-报时间不同步</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 03 Apr 2025 14:56:22 GMT</pubDate></item><item><title><![CDATA[Ceph: setting the 85% threshold]]></title><description><![CDATA[<p dir="auto">ceph osd crush reweight osd.10 9.0</p>
]]></description><link>http://an.forum.genostack.com/topic/1122/ceph-设置85-阈值</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1122/ceph-设置85-阈值</guid><dc:creator><![CDATA[zhanglu]]></dc:creator><pubDate>Thu, 03 Apr 2025 03:54:30 GMT</pubDate></item><item><title><![CDATA[PostgreSQL fails to start after a machine shutdown and reboot because of inconsistent state]]></title><description><![CDATA[<p dir="auto">Error log:</p>
<pre><code>PostgreSQL Database directory appears to contain a database; Skipping initialization
2025-04-02 13:24:07.154 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2025-04-02 13:24:07.154 UTC [1] LOG: listening on IPv6 address "::", port 5432
2025-04-02 13:24:07.157 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2025-04-02 13:24:07.411 UTC [26] LOG: database system was shut down at 2025-04-02 11:47:48 UTC
2025-04-02 13:24:07.411 UTC [26] LOG: invalid record length at 8/DBE56CD8: wanted 24, got 0
2025-04-02 13:24:07.411 UTC [26] LOG: invalid primary checkpoint record
2025-04-02 13:24:07.411 UTC [26] LOG: invalid resource manager ID in secondary checkpoint record
2025-04-02 13:24:07.411 UTC [26] PANIC: could not locate a valid checkpoint record
2025-04-02 13:24:08.323 UTC [1] LOG: startup process (PID 26) was terminated by signal 6: Aborted
2025-04-02 13:24:08.323 UTC [1] LOG: aborting startup due to startup process failure
2025-04-02 13:24:08.338 UTC [1] LOG: database system is shut down
</code></pre>
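<p dir="auto">Fix: reset the write-ahead log with pg_resetwal. This can lose recently committed transactions, so back up the data directory first.<br />
<a href="https://stackoverflow.com/questions/71258094/invalid-resource-manager-id-in-primary-checkpoint-record" rel="nofollow ugc">https://stackoverflow.com/questions/71258094/invalid-resource-manager-id-in-primary-checkpoint-record</a></p>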
<pre><code>pg_resetwal /var/lib/postgresql/data/pgdata
</code></pre>
]]></description><link>http://an.forum.genostack.com/topic/1121/postgres数据库在电脑关机重启后-状态不一致导致无法启动</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1121/postgres数据库在电脑关机重启后-状态不一致导致无法启动</guid><dc:creator><![CDATA[zhanglu]]></dc:creator><pubDate>Thu, 03 Apr 2025 01:37:52 GMT</pubDate></item><item><title><![CDATA[Open WebUI: configuring the title &#x2F; default language]]></title><description><![CDATA[<h3>1. Default title configuration</h3>
<p dir="auto">The backend code contains a config.py file; open it and add this line:</p>
<pre><code>WEBUI_NAME = _get_env_by_type("WEBUI_NAME", dft="灵岸")
</code></pre>
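<p dir="auto">After this change, the WEBUI_NAME value set in .env takes effect.</p>
<h3>2. Language configuration</h3>
<p dir="auto">The Docker container maps a data directory; find config.json in that directory and set <code>"default_locale": "zh-CN",</code> as shown:</p>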
<pre><code>
{
    "version": 0,
    "ui": {
        "default_locale": "zh-CN",
        "prompt_suggestions": [
            {
                "title": [
                    "Help me study",
                    "vocabulary for a college entrance exam"
                ],
                "content": "Help me study vocabulary: write a sentence for me to fill in the blank, and I'll try to pick the correct option."
            }
        ]
    }
}
</code></pre>
<p dir="auto">Finally, restart the 灵岸 Docker service.</p>
]]></description><link>http://an.forum.genostack.com/topic/1120/open-web-ui-配置title-默认语言</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1120/open-web-ui-配置title-默认语言</guid><dc:creator><![CDATA[zhangfanglin]]></dc:creator><pubDate>Sat, 22 Mar 2025 13:47:24 GMT</pubDate></item><item><title><![CDATA[三黍 zip streaming download: configuration change log]]></title><description><![CDATA[<h3>Initial troubleshooting (day 1):</h3>
<p dir="auto">1. Errors<br />
Nginx inside Docker:</p>
<pre><code>2025/03/12 18:43:34 [warn] 29#29: *218120 an upstream response is buffered to a temporary file /tmp/fastcgi_temp/0000006707 while reading upstream, client: 10.233.90.0, server: _, request: "GET /api/dow_report/c656a052-9ea1-8de2-efd0-11c7a051eb6c?type=raw HTTP/1.0", upstream: "fastcgi://127.0.0.1:8991", host: "omic.sanshugroup.com"
10.233.90.0 - - [12/Mar/2025:18:47:51 +0800] "GET /api/dow_report/c656a052-9ea1-8de2-efd0-11c7a051eb6c?type=raw HTTP/1.0" 200 52940341857 "-" "Wget/1.19.4 (linux-gnu)" "103.114.101.5" 265.672 265.672 . -
</code></pre>
<p dir="auto">Nginx on the server:</p>
<pre><code>2025/03/12 18:43:20 [error] 2052141#0: *166617 readv() failed (104: Connection reset by peer) while reading upstream, client: 103.114.101.5, server: omic.sanshugroup.com, request: "GET /api/dow_report/c656a052-9ea1-8de2-efd0-11c7a051eb6c?type=raw HTTP/1.1", upstream: "http://192.168.30.202:30000/api/dow_report/c656a052-9ea1-8de2-efd0-11c7a051eb6c?type=raw", host: "omic.sanshugroup.com"
2025/03/12 18:53:32 [error] 2052141#0: *166073 readv() failed (104: Connection reset by peer) while reading upstream, client: 113.132.179.7, server: omic.sanshugroup.com, request: "GET /api/dow_report/c656a052-9ea1-8de2-efd0-11c7a051eb6c?type=raw HTTP/1.1", upstream: "http://192.168.30.202:30000/api/dow_report/c656a052-9ea1-8de2-efd0-11c7a051eb6c?type=raw", host: "omic.sanshugroup.com", referrer: "https://omic.sanshugroup.com/"

</code></pre>
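<p dir="auto">2. Diagnosis<br />
Based on the errors above, the buffers are too small, so the upstream response is buffered to a temporary file.<br />
Updated the Nginx configuration:</p>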
<pre><code>	proxy_buffer_size   32m;
	proxy_buffers       12 16m;
	proxy_busy_buffers_size   64m;
        fastcgi_connect_timeout 24h;
        fastcgi_send_timeout 24h;
        fastcgi_read_timeout 24h;
</code></pre>
<h3>Initial troubleshooting (day 2):</h3>
<p dir="auto">1. Every download disconnects for no apparent reason at 49.30G; reproduced over several consecutive tests.</p>
<pre><code>secure_download?key=54d4e2deca0598059daa524255a9c685     [                                                                                                                  &lt;=&gt; ]  49.30G  4.57MB/s    in 3h 21m 
</code></pre>
<p dir="auto">2. Changed the PHP configuration,<br />
although the observed memory usage was not actually high.</p>
<pre><code>ini_set('memory_limit', '100G');
</code></pre>
<p dir="auto">Testing gave the same result as before.</p>
<h3>Initial troubleshooting (day 3):</h3>
<p dir="auto">Located the external Nginx configuration and changed it as follows:</p>
<pre><code>        client_header_buffer_size 20m;
	large_client_header_buffers 4 18m;
	client_max_body_size 100g;
	proxy_max_temp_file_size 1001024m;
</code></pre>
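<h3>Two simultaneous downloads in the evening</h3>
<p dir="auto">One from the browser and one from the command line; both completed successfully.</p>
<h3>Summary:</h3>
<p dir="auto">The errors and disconnects had two causes:<br />
1. The buffer settings were too small. Configuration used:</p>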
<pre><code>	proxy_buffer_size   32m;
	proxy_buffers       12 16m;
	proxy_busy_buffers_size   64m;
</code></pre>
<p dir="auto">2. Downloads consistently disconnected at 49.30G. Configuration used:</p>
<pre><code>	client_max_body_size 100g;
	proxy_max_temp_file_size 1001024m;
</code></pre>
]]></description><link>http://an.forum.genostack.com/topic/1119/三黍zip流下载配置更新记录</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1119/三黍zip流下载配置更新记录</guid><dc:creator><![CDATA[zhangfanglin]]></dc:creator><pubDate>Fri, 14 Mar 2025 03:02:25 GMT</pubDate></item><item><title><![CDATA[三黍 operations notes]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="http://an.forum.genostack.com/uid/4">@zhanglu</a> # Docker no longer starts automatically at boot; after a reboot, mount the disk and start Docker manually:<br />
mount /dev/sdg1 /public<br />
systemctl start docker</p>
]]></description><link>http://an.forum.genostack.com/topic/1117/三黍运维文档</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1117/三黍运维文档</guid><dc:creator><![CDATA[zhanglu]]></dc:creator><pubDate>Thu, 02 Jan 2025 10:27:24 GMT</pubDate></item><item><title><![CDATA[晶能 local environment: operations log]]></title><description><![CDATA[<p dir="auto">海云 interpretation system, HDFS: host 192.168.0.210, path /opt/app/hadoop-2.8.5</p>
]]></description><link>http://an.forum.genostack.com/topic/1116/晶能-local环境运维记录</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1116/晶能-local环境运维记录</guid><dc:creator><![CDATA[zhanglu]]></dc:creator><pubDate>Wed, 25 Dec 2024 09:05:14 GMT</pubDate></item><item><title><![CDATA[rook-ceph issue log (吉凯)]]></title><description><![CDATA[<p dir="auto"><a href="/assets/uploads/files/1733822436483-%E6%9C%AC%E5%9C%B0%E6%8C%82%E8%BD%BD%E7%A3%81%E7%9B%98%E7%AA%81%E7%84%B6%E5%8D%A1%E9%A1%BF%E7%8E%B0%E8%B1%A1-ceph.pptx">本地挂载磁盘突然卡顿现象-ceph.pptx</a></p>
<p dir="auto">In the 吉凯 cluster (node2, node3, node4), disk sdb on node2 had come loose, so its OSD could not start. Reseating the disk and rebooting resolved it.</p>
]]></description><link>http://an.forum.genostack.com/topic/1115/rook-ceph问题记录-吉凯</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1115/rook-ceph问题记录-吉凯</guid><dc:creator><![CDATA[zhanglu]]></dc:creator><pubDate>Tue, 10 Dec 2024 09:22:03 GMT</pubDate></item><item><title><![CDATA[Kubernetes issue log]]></title><description><![CDATA[<p dir="auto">In the 晶能 environment (Kubernetes v1.28.6), the three kube-controller-scheduler pods use too much CPU and eventually consume all remaining CPU, so a CPU limit has to be set manually:<br />
kubectl edit  DaemonSet kube-controller-scheduler -n kube-system</p>
<pre><code>    resources:
      limits:
        cpu: 100m
</code></pre>
]]></description><link>http://an.forum.genostack.com/topic/1114/k8s问题记录</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1114/k8s问题记录</guid><dc:creator><![CDATA[zhanglu]]></dc:creator><pubDate>Thu, 05 Dec 2024 11:25:27 GMT</pubDate></item><item><title><![CDATA[Kubernetes GPU configuration]]></title><description><![CDATA[<p dir="auto"><a href="https://www.cnblogs.com/shook/p/17836015.html" rel="nofollow ugc">https://www.cnblogs.com/shook/p/17836015.html</a></p>
]]></description><link>http://an.forum.genostack.com/topic/1110/k8s-gpu-配置</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1110/k8s-gpu-配置</guid><dc:creator><![CDATA[zhanglu]]></dc:creator><pubDate>Thu, 21 Nov 2024 02:26:49 GMT</pubDate></item><item><title><![CDATA[Kubernetes GPU configuration]]></title><description><![CDATA[<p dir="auto"><a href="https://www.cnblogs.com/shook/p/17836015.html" rel="nofollow ugc">https://www.cnblogs.com/shook/p/17836015.html</a></p>
]]></description><link>http://an.forum.genostack.com/topic/1109/k8s-gpu-配置</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1109/k8s-gpu-配置</guid><dc:creator><![CDATA[zhanglu]]></dc:creator><pubDate>Thu, 21 Nov 2024 02:06:51 GMT</pubDate></item><item><title><![CDATA[Resuming interrupted SFTP transfers]]></title><description><![CDATA[<p dir="auto">lftp -u BN240928NJ01S02N1,benagen0928 -p 29 sftp://111.46.78.239  -e 'get --continue /home/Drive/BC2024090921/BC2024090921-ONT-DNA-1samples/8-764-三 代/pass.fq.gz /cephfs_data/data' &amp;</p>
]]></description><link>http://an.forum.genostack.com/topic/1106/sftp断点续传</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1106/sftp断点续传</guid><dc:creator><![CDATA[zhanglu]]></dc:creator><pubDate>Tue, 29 Oct 2024 03:32:29 GMT</pubDate></item><item><title><![CDATA[Pulling Docker images through a proxy mirror]]></title><description><![CDATA[<p dir="auto">When images cannot be pulled from Docker Hub, pull them through a proxy mirror instead (each mirror command below is followed by the direct pull it replaces):<br />
docker pull <a href="http://daocloud.io/docker.io/dyrnq/kube-scheduler:v1.27.7" rel="nofollow ugc">daocloud.io/docker.io/dyrnq/kube-scheduler:v1.27.7</a><br />
docker pull dyrnq/kube-scheduler:v1.27.7<br />
docker pull <a href="http://dockerproxy.net/dyrnq/kube-scheduler:v1.23.9" rel="nofollow ugc">dockerproxy.net/dyrnq/kube-scheduler:v1.23.9</a><br />
docker pull dyrnq/kube-scheduler:v1.23.9<br />
docker pull <a href="http://dockerproxy.net/library/ubuntu:24.04" rel="nofollow ugc">dockerproxy.net/library/ubuntu:24.04</a><br />
docker pull ubuntu:24.04</p>
<p dir="auto">Reference: <a href="https://github.com/DaoCloud/public-image-mirror" rel="nofollow ugc">https://github.com/DaoCloud/public-image-mirror</a></p>
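<p dir="auto">The rewriting pattern in the commands above can be sketched as a small helper. This is an illustration only: the function name mirror_ref is hypothetical, dockerproxy.net is the mirror host used in this post, and the helper handles only plain Docker Hub references (no registry host of their own).</p>
<pre><code>def mirror_ref(image, mirror="dockerproxy.net"):
    """Prefix a Docker Hub image reference with a mirror host.

    Official images such as ubuntu:24.04 live under the library/ namespace,
    so that namespace is added when the reference contains no slash.
    """
    if "/" not in image:
        image = "library/" + image
    return mirror + "/" + image

# Print the mirror pull command for the two images used in this post.
for ref in ["dyrnq/kube-scheduler:v1.23.9", "ubuntu:24.04"]:
    print("docker pull", mirror_ref(ref))
</code></pre>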
]]></description><link>http://an.forum.genostack.com/topic/1102/docker下载镜像代理方式</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1102/docker下载镜像代理方式</guid><dc:creator><![CDATA[zhanglu]]></dc:creator><pubDate>Mon, 21 Oct 2024 09:51:34 GMT</pubDate></item><item><title><![CDATA[三黍 disk formatting and system repair]]></title><description><![CDATA[<p dir="auto">acc8cf65-4c4d-4692-9723-94bbc521e27b-image.png</p>
]]></description><link>http://an.forum.genostack.com/topic/1101/三黍磁盘格式化-系统修复</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1101/三黍磁盘格式化-系统修复</guid><dc:creator><![CDATA[zhanglu]]></dc:creator><pubDate>Thu, 17 Oct 2024 04:04:32 GMT</pubDate></item><item><title><![CDATA[rook-ceph: log of removing and re-adding a disk]]></title><description><![CDATA[<p dir="auto">Initial state<br />
<img src="/assets/uploads/files/1729045569925-3af004ff-6466-4943-97c2-c1acb224ec5b-image.png" alt="3af004ff-6466-4943-97c2-c1acb224ec5b-image.png" class=" img-responsive img-markdown" /><br />
<img src="/assets/uploads/files/1729045585495-133f500d-3dc7-4107-8f44-d81852edea35-image.png" alt="133f500d-3dc7-4107-8f44-d81852edea35-image.png" class=" img-responsive img-markdown" /><br />
<img src="/assets/uploads/files/1729045639858-23c927fe-48f1-45b8-b617-bde2b7c5c35c-image.png" alt="23c927fe-48f1-45b8-b617-bde2b7c5c35c-image.png" class=" img-responsive img-markdown" /></p>
<p dir="auto">Remove sdc on node3<br />
<img src="/assets/uploads/files/1729045711943-8a177d8c-7a1e-4477-ac01-d3b326443de2-image.png" alt="8a177d8c-7a1e-4477-ac01-d3b326443de2-image.png" class=" img-responsive img-markdown" /><br />
The status only updates after the OSD pod for sdc is deleted:<br />
kubectl delete pod rook-ceph-osd-4-64b4f4f9b4-lpgts -n rook-ceph<br />
<img src="/assets/uploads/files/1729045894022-ace84ded-88c1-41bf-a83a-253526b9dc02-image.png" alt="ace84ded-88c1-41bf-a83a-253526b9dc02-image.png" class=" img-responsive img-markdown" /></p>
<p dir="auto"><img src="/assets/uploads/files/1729045993304-71bed423-3a01-497c-84cb-28da22895f70-image.png" alt="71bed423-3a01-497c-84cb-28da22895f70-image.png" class=" img-responsive img-markdown" /></p>
<p dir="auto">Remove osd.4:</p>
<pre><code>ceph osd out osd.4
ceph osd purge 4 --yes-i-really-mean-it
ceph osd crush remove osd.4
ceph auth rm osd.4
ceph osd rm osd.4


</code></pre>
<p dir="auto">kubectl delete deploy rook-ceph-osd-4 -n rook-ceph</p>
<p dir="auto"><img src="/assets/uploads/files/1729046578473-363b3e8a-32ee-4775-9021-623c4ee57a01-image.png" alt="363b3e8a-32ee-4775-9021-623c4ee57a01-image.png" class=" img-responsive img-markdown" /></p>
]]></description><link>http://an.forum.genostack.com/topic/1100/rook-ceph磁盘删除重新添加操作记录</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1100/rook-ceph磁盘删除重新添加操作记录</guid><dc:creator><![CDATA[zhanglu]]></dc:creator><pubDate>Wed, 16 Oct 2024 08:12:30 GMT</pubDate></item><item><title><![CDATA[rook-ceph: removing and re-adding a disk]]></title><description><![CDATA[<p dir="auto">When a Rook Ceph OSD is faulty, wipe the OSD disk and add it back.</p>
<h3>1. Enter the Rook Ceph toolbox pod</h3>
<pre><code>kubectl exec -it rook-ceph-tools-68754fc9dd-sj9h6 -n rook-ceph -- bash
</code></pre>
<h3>2. Use ceph commands to find and remove the OSD</h3>
<p dir="auto"># Check the status and find the ID of the OSD to remove</p>
<pre><code>ceph osd status
# Mark the OSD out, then remove it
ceph osd out osd.1
ceph osd purge 1 --yes-i-really-mean-it
ceph osd crush remove osd.1
ceph auth rm osd.1
ceph osd rm osd.1
</code></pre>
<h3>3. Delete the deployment of the corresponding OSD</h3>
<pre><code>kubectl delete deploy rook-ceph-osd-1 -n rook-ceph
</code></pre>
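<h3>4. Log in to the server that hosted the removed OSD and wipe the OSD disk</h3>
<pre><code># Check the disk path
fdisk -l
# Delete the disk's partition table
DISK="/dev/sdb"
sgdisk --zap-all $DISK
# Wipe the disk data (use dd for an HDD or blkdiscard for an SSD; pick one)
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync
blkdiscard $DISK
# Remove the old OSD's LVM mappings (if the node hosts several OSDs, do not use the * wildcard; look up the exact LV mapping with lsblk -f and remove only that one, as in step 5)
ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %
rm -rf /dev/ceph-*
# Reboot; sgdisk --zap-all only takes effect after a reboot
reboot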
</code></pre>
<p dir="auto">Error: cannot open /dev/sdb: Device or resource busy</p>
<h3>5. Manually inspect and remove the LVM metadata created by the old OSD</h3>
<p dir="auto">(Optional, depending on the outcome of step 4.) If the mappings are left in place, wiping the disk fails with cannot open /dev/sdb: Device or resource busy.</p>
<p dir="auto"># List the device-mapper (LVM) entries</p>
<pre><code>dmsetup ls
# Remove the Ceph OSD LVM mapping
dmsetup remove ceph--5a4cb4bb--70b3--40bd--9da7--09d4f264a513-osd-xxxxxxxxx
# Remove the LV
lvremove /dev/mapper/ceph--5a4cb4bb--70b3--40bd--9da7--09d4f264a513-osd-xxxxxxxxx
# Delete the related files
rm -rf /dev/ceph--5a4cb4bb--70b3--40bd--9da7--09d4f264a513
</code></pre>
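<p dir="auto">The doubled hyphens in the mapping names come from device-mapper's naming rule: the volume group and logical volume names are joined with a single hyphen, and any hyphen inside either name is doubled. A minimal sketch of that rule follows; the function name dm_name and the LV name osd-block-example are hypothetical, and real names should be taken from lsblk -f or lvs.</p>
<pre><code>def dm_name(vg, lv):
    """Return the /dev/mapper entry name for a volume group and logical volume.

    Device-mapper joins the two names with one hyphen, so hyphens inside
    either name are doubled to keep the result unambiguous.
    """
    return vg.replace("-", "--") + "-" + lv.replace("-", "--")

# Hypothetical LV name for illustration only.
print(dm_name("ceph-5a4cb4bb-70b3-40bd-9da7-09d4f264a513", "osd-block-example"))
</code></pre>
<p dir="auto">Computing the name this way helps target exactly one mapping with dmsetup remove on nodes that host several OSDs.</p>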
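<h3>6. Restart the Ceph operator so that it detects the wiped OSD disk; once the OSD starts, the Ceph cluster rebalances data automatically</h3>
<p dir="auto">Note: if the new OSD pod fails to start, check the osd-prepare logs to find the problem:</p>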
<pre><code>kubectl -n rook-ceph logs rook-ceph-osd-prepare-node1-fvmrp provision
</code></pre>
]]></description><link>http://an.forum.genostack.com/topic/1093/rook-ceph磁盘删除重新添加</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1093/rook-ceph磁盘删除重新添加</guid><dc:creator><![CDATA[zhanglu]]></dc:creator><pubDate>Tue, 15 Oct 2024 08:05:34 GMT</pubDate></item><item><title><![CDATA[Ceph fault log]]></title><description><![CDATA[<p dir="auto">Running ceph status in the Ceph tools pod hangs; restarting the tools pod fixes it.<br />
This may be related to the mgr being in a bad state.</p>
]]></description><link>http://an.forum.genostack.com/topic/1082/ceph故障问题记录</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1082/ceph故障问题记录</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Tue, 25 Jun 2024 03:07:07 GMT</pubDate></item><item><title><![CDATA[Many pods cannot mount volumes when Kubernetes starts]]></title><description><![CDATA[<p dir="auto">Three nodes were added to the existing cluster. During debugging, one node suddenly crashed, leaving the cluster's resources in an inconsistent state.</p>
<p dir="auto">Delete the crashed node first with the delete command, then restart Kubernetes.</p>
]]></description><link>http://an.forum.genostack.com/topic/1080/k8s-启动时-很多pod无法挂载volume</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1080/k8s-启动时-很多pod无法挂载volume</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sun, 23 Jun 2024 17:56:15 GMT</pubDate></item><item><title><![CDATA[吉凯 servers: cluster recovery after a reboot]]></title><description><![CDATA[<p dir="auto">After node 142 reboots, the firewall has to be stopped:<br />
systemctl stop firewalld</p>
]]></description><link>http://an.forum.genostack.com/topic/1073/吉凯服务器重启后集群恢复问题记录</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1073/吉凯服务器重启后集群恢复问题记录</guid><dc:creator><![CDATA[zhanglu]]></dc:creator><pubDate>Tue, 11 Jun 2024 06:44:58 GMT</pubDate></item><item><title><![CDATA[Failed to sync kube-dns config directory &#x2F;etc&#x2F;kube-dns, err: lstat &#x2F;etc&#x2F;kube-dns: no such file or directory]]></title><description><![CDATA[<p dir="auto">Failed to sync kube-dns config directory /etc/kube-dns, err: lstat /etc/kube-dns: no such file or directory</p>
<p dir="auto">/etc/resolv.conf<br />
nameserver 210.22.84.3</p>
]]></description><link>http://an.forum.genostack.com/topic/1072/failed-to-sync-kube-dns-config-directory-etc-kube-dns-err-lstat-etc-kube-dns-no-such-file-or-directory</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1072/failed-to-sync-kube-dns-config-directory-etc-kube-dns-err-lstat-etc-kube-dns-no-such-file-or-directory</guid><dc:creator><![CDATA[zhanglu]]></dc:creator><pubDate>Wed, 05 Jun 2024 10:47:53 GMT</pubDate></item><item><title><![CDATA[Slurm REST API: submitted job ID differs from the squeue ID by 1]]></title><description><![CDATA[<p dir="auto">Endpoint: <a href="http://192.168.0.135:6820/slurm/v0.0.40/job/submit" rel="nofollow ugc">http://192.168.0.135:6820/slurm/v0.0.40/job/submit</a></p>
<pre><code>{
    "job": {
        "name": "slurm_test220240428_13_55_34",
        "current_working_directory": "/cephfs_data/slurm_hpc/genostack_v3/genostack_core/genostack_php/slurm_run/c1/09c/42a-c99c-37ed-0e3f-2b2c8ddf3860",
        "standard_output": "/cephfs_data/slurm_hpc/genostack_v3/genostack_core/genostack_php/slurm_run/c1/09c/42a-c99c-37ed-0e3f-2b2c8ddf3860/output.out",
        "standard_error": "/cephfs_data/slurm_hpc/genostack_v3/genostack_core/genostack_php/slurm_run/c1/09c/42a-c99c-37ed-0e3f-2b2c8ddf3860/error.out",
        "environment": {
            "PATH": "/bin:/usr/bin/:/usr/local/bin/",
            "LD_LIBRARY_PATH": "/lib/:/lib64/:/usr/local/lib"
        },
        "partition": "debug",
        "cpus_per_task": 1,
        "memory_per_node":100,
        "required_nodes":["node3"]
    },
    "script": "#!/bin/bash \nsbatch test.sh"
}
</code></pre>
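<p dir="auto">Root cause: submitting a job through the Slurm REST API is effectively a wrapper around sbatch, so the script field should contain only the commands the job runs. It must not use sbatch to submit a shell script.<br />
The mistaken approach above is equivalent to writing sbatch sleep 20s into test.sh and then running sbatch test.sh, so each submission creates a nested second job, which would explain the off-by-one job ID.</p>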
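<p dir="auto">A corrected request body can be sketched as follows. This is an illustration, not the definitive payload: the helper name build_job and the /tmp working directory are hypothetical, and the field names simply mirror the request shown above.</p>
<pre><code>import json

def build_job(name, workdir, commands, partition="debug"):
    """Build a job-submit payload whose script runs the commands directly."""
    script = "#!/bin/bash\n" + "\n".join(commands)
    if "sbatch" in script:
        # The REST API already submits the script, so nesting sbatch adds a second job.
        raise ValueError("script must not call sbatch")
    return {
        "job": {
            "name": name,
            "current_working_directory": workdir,
            "environment": {"PATH": "/bin:/usr/bin/:/usr/local/bin/"},
            "partition": partition,
        },
        "script": script,
    }

# The job body is the command itself rather than "sbatch test.sh".
payload = build_job("slurm_test", "/tmp", ["sleep 20s"])
print(json.dumps(payload, indent=2))
</code></pre>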
]]></description><link>http://an.forum.genostack.com/topic/1065/slurm-rest-api-提交任务job-id和系统squene差1</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1065/slurm-rest-api-提交任务job-id和系统squene差1</guid><dc:creator><![CDATA[zhanglu]]></dc:creator><pubDate>Mon, 06 May 2024 06:21:09 GMT</pubDate></item><item><title><![CDATA[R740 automatic boot]]></title><description><![CDATA[<p dir="auto">While recovering the R740, GRUB could not boot the operating system automatically. Run the following at the GRUB command line:<br />
Grub&gt;set root=(hd1,gpt7)<br />
Grub&gt;linux (hd1,gpt5)/vmlinuz-4.15.0-197-generic root=/dev/sdb7<br />
Grub&gt;initrd (hd1,gpt5)/initrd.img-4.15.0-197-generic<br />
Grub&gt;boot<br />
After the system boots normally,<br />
run the following at the Linux command line:<br />
update-grub<br />
grub-install /dev/sdb</p>
]]></description><link>http://an.forum.genostack.com/topic/1060/r740自动引导</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1060/r740自动引导</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Mon, 01 Apr 2024 10:35:33 GMT</pubDate></item><item><title><![CDATA[matplotlib plots do not display Chinese correctly]]></title><description><![CDATA[<p dir="auto">Solution</p>
<ol>
<li>
<p dir="auto">Download the SimHei font (<a href="https://github.com/zhangsheng377/stats_stock/blob/master/simhei.ttf" rel="nofollow ugc">https://github.com/zhangsheng377/stats_stock/blob/master/simhei.ttf</a>) and save it under /usr/share/fonts; creating a subfolder there is fine.</p>
</li>
<li>
<p dir="auto">Then refresh the font cache:</p>
</li>
</ol>
<pre><code>sudo fc-cache -f -v
</code></pre>
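<p dir="auto">Check the output to confirm that the new font file was loaded.</p>
<ol start="3">
<li>Then delete the matplotlib font cache:</li>
</ol>
<pre><code>rm -rf /home/xxx/.cache/matplotlib
</code></pre>
<blockquote>
<p dir="auto">Replace xxx with your own username.</p>
</blockquote>
<ol start="4">
<li>Then set the font in Python, and Chinese text displays correctly:</li>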
</ol>
<pre><code>import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']
</code></pre>
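<p dir="auto">If Chinese text still fails to render, it can help to confirm that the font file from step 1 really is where the font cache looks. The sketch below uses only the standard library; the function name find_font is hypothetical and the default root is the directory used in step 1.</p>
<pre><code>from pathlib import Path

def find_font(filename, root="/usr/share/fonts"):
    """Return every file under root whose name matches filename, ignoring case."""
    base = Path(root)
    if not base.is_dir():
        return []
    wanted = filename.lower()
    return sorted(p for p in base.rglob("*") if p.is_file() and p.name.lower() == wanted)

matches = find_font("simhei.ttf")
print(matches if matches else "simhei.ttf not found; repeat step 1")
</code></pre>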
]]></description><link>http://an.forum.genostack.com/topic/1059/matplotlib画图不能正确显示中文</link><guid isPermaLink="true">http://an.forum.genostack.com/topic/1059/matplotlib画图不能正确显示中文</guid><dc:creator><![CDATA[mengpf]]></dc:creator><pubDate>Thu, 21 Mar 2024 08:13:26 GMT</pubDate></item></channel></rss>