<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Slurm compute node is down after a reboot]]></title><description><![CDATA[<p dir="auto">Problem: after a reboot, some compute nodes show up as down in <code>sinfo</code></p>
<pre><code>[root@fda-0d01-ai01-cv4 slurm]# sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST 
debug*       up   infinite     17   idle wd[12-15,21,23-25,31-34,75],we[14-15,21,63] 
debug*       up   infinite      2   down wd22,we85
</code></pre>
<p dir="auto">Check the node details; the <code>Reason</code> field shows why Slurm set each node to DOWN</p>
<pre><code>scontrol show node

NodeName=wd13 Arch=x86_64 CoresPerSocket=18 
   CPUAlloc=0 CPUTot=36 CPULoad=8.01
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=wd13 NodeHostName=wd13 Version=20.02.7
   OS=Linux 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 
   RealMemory=1 AllocMem=0 FreeMem=253045 Sockets=2 Boards=1
   State=DOWN ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=debug 
   BootTime=2021-07-20T14:52:43 SlurmdStartTime=2021-07-29T11:58:56
   CfgTRES=cpu=36,mem=1M,billing=36
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=Node unexpectedly rebooted [slurm@2021-07-20T14:54:26]

NodeName=wd22 Arch=x86_64 CoresPerSocket=18 
   CPUAlloc=0 CPUTot=36 CPULoad=8.01
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=wd22 NodeHostName=wd22 Version=20.02.7
   OS=Linux 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 
   RealMemory=1 AllocMem=0 FreeMem=252883 Sockets=2 Boards=1
   State=DOWN ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=debug 
   BootTime=2021-07-20T15:19:00 SlurmdStartTime=2021-07-20T15:20:24
   CfgTRES=cpu=36,mem=1M,billing=36
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=Node unexpectedly rebooted [slurm@2021-07-20T15:20:40]
</code></pre>
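<p dir="auto">The full dump is long, and the field that explains the DOWN state is <code>Reason</code>. A minimal sketch for condensing it, assuming a POSIX shell with awk; the function name <code>summarize_nodes</code> is a hypothetical helper, not part of Slurm. Slurm's own <code>sinfo -R</code> gives a similar per-reason listing without any scripting.</p>
<pre><code># Hypothetical helper (not part of Slurm): condense the default multi-line
# "scontrol show node" output into one line per node with the node name,
# its state and the reason Slurm recorded. Reads that output on stdin.
summarize_nodes() {
  awk '
    # Print the node collected so far, if any, then reset the fields.
    function flush(   line) {
      if (node != "") {
        line = node " " state
        if (reason != "") line = line " " reason
        print line
      }
      node = ""; state = ""; reason = ""
    }
    # Every record starts with a NodeName= line.
    /^NodeName=/ { flush(); node = substr($1, 10) }
    # State= sits on a continuation line; pad with a space so the match
    # cannot hit a longer field name that merely ends in State=.
    {
      s = " " $0
      if (match(s, / State=[^ ]+/)) state = substr(s, RSTART + 7, RLENGTH - 7)
    }
    # Reason= runs to the end of its line and may contain spaces.
    /^ *Reason=/ { reason = $0; sub(/^ *Reason=/, "", reason) }
    END { flush() }
  '
}
</code></pre>
<p dir="auto">Usage: <code>scontrol show node | summarize_nodes</code>. For the two nodes above it prints <code>wd13 DOWN Node unexpectedly rebooted [slurm@2021-07-20T14:54:26]</code> and <code>wd22 DOWN Node unexpectedly rebooted [slurm@2021-07-20T15:20:40]</code>.</p>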
<p dir="auto">Solution: return the node to service manually (as root)</p>
<pre><code>scontrol update NodeName=wd22 State=RESUME
</code></pre>
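<p dir="auto">When several nodes are affected (<code>sinfo</code> above lists both <code>wd22</code> and <code>we85</code> as down), they can be resumed in one call. A minimal sketch, assuming a POSIX shell with the Slurm client commands in PATH and permission to run <code>scontrol update</code>; the helper names <code>rebooted_down_nodes</code> and <code>resume_rebooted_nodes</code> are hypothetical. It only touches nodes whose reason is "Node unexpectedly rebooted", so nodes that are down for any other cause stay down.</p>
<pre><code># Hypothetical helpers (not part of Slurm). Assumes the Slurm client
# commands are in PATH and the caller may run "scontrol update".

# Print the names of DOWN nodes whose Reason is "Node unexpectedly rebooted".
# Expects "scontrol -o show node" output (one node per line) on stdin.
rebooted_down_nodes() {
  awk '
    # Pad with a space so the match cannot hit a longer field name
    # that merely ends in State=.
    (" " $0) ~ / State=DOWN/ {
      if ($0 ~ /Reason=Node unexpectedly rebooted/)
        if ($1 ~ /^NodeName=/) print substr($1, 10)   # strip "NodeName="
    }'
}

# Resume all matching nodes with a single scontrol call.
resume_rebooted_nodes() {
  nodes=$(scontrol -o show node | rebooted_down_nodes | paste -sd, -)
  if [ -z "$nodes" ]; then
    echo "no DOWN nodes with reason: Node unexpectedly rebooted"
    return 0
  fi
  # NodeName= accepts a comma-separated list of node names.
  scontrol update NodeName="$nodes" State=RESUME
}
</code></pre>
<p dir="auto">Usage: source the file as root and run <code>resume_rebooted_nodes</code>. To avoid the manual step after future reboots, setting <code>ReturnToService=2</code> in <code>slurm.conf</code> makes a DOWN node available again as soon as it registers with a valid configuration, whatever the reason it was set DOWN; with <code>ReturnToService=1</code>, a node that was set DOWN because of an unexpected reboot stays DOWN until an administrator resumes it.</p>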
]]></description><link>http://an.forum.genostack.com/topic/365/slurm计算节点重启后状态为down</link><generator>RSS for Node</generator><lastBuildDate>Sat, 13 Jun 2026 09:38:43 GMT</lastBuildDate><atom:link href="http://an.forum.genostack.com/topic/365.rss" rel="self" type="application/rss+xml"/><pubDate>Thu, 29 Jul 2021 05:48:24 GMT</pubDate><ttl>60</ttl></channel></rss>