tmp
-
ceph status
  cluster:
    id:     807d820b-5c5b-451c-9f52-41b93d5d905a
    health: HEALTH_ERR
            1 MDSs report oversized cache
            1 MDSs report slow metadata IOs
            2 MDSs behind on trimming
            mon bh is low on available space
            10 backfillfull osd(s)
            1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
            full ratio(s) out of order
            Low space hindering backfill (add storage if this doesn't resolve itself): 22 pgs backfill_toofull
            Degraded data redundancy: 715/452014992 objects degraded (0.000%), 150 pgs degraded, 2 pgs undersized
            206 pgs not deep-scrubbed in time
            128 pgs not scrubbed in time
            4 pool(s) backfillfull
-
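The "full ratio(s) out of order" warning means the OSD fullness thresholds are not ascending: Ceph expects nearfull, then backfillfull, then full, then failsafe_full. Below is a minimal sketch of that check, assuming the ratios have already been read from the cluster (for example from the `*_ratio` lines of `ceph osd dump`). The helper name and the sample values are illustrative; the defaults shown (0.85 / 0.90 / 0.95 / 0.97) are the usual Ceph defaults, not this cluster's settings.

```python
from typing import Dict, List

# Ordered from lowest to highest threshold, as Ceph expects them.
ORDER = ["nearfull", "backfillfull", "full", "failsafe_full"]


def out_of_order_ratios(ratios: Dict[str, float]) -> List[str]:
    """Return a message for each adjacent pair where the higher threshold
    is set below the lower one (the condition behind "full ratio(s) out of order")."""
    problems = []
    for lower, upper in zip(ORDER, ORDER[1:]):
        if ratios[upper] < ratios[lower]:
            problems.append(f"{upper} ({ratios[upper]}) < {lower} ({ratios[lower]})")
    return problems


if __name__ == "__main__":
    # Common Ceph defaults (assumed here, not read from this cluster).
    defaults = {"nearfull": 0.85, "backfillfull": 0.90, "full": 0.95, "failsafe_full": 0.97}
    print(out_of_order_ratios(defaults))  # []
    # Hypothetical misconfiguration: backfillfull raised above full.
    print(out_of_order_ratios(dict(defaults, backfillfull=0.96)))  # ['full (0.95) < backfillfull (0.96)']
```

If a pair is flagged, the thresholds can be put back in order with `ceph osd set-nearfull-ratio`, `ceph osd set-backfillfull-ratio` and `ceph osd set-full-ratio`. Raising them only buys time while backfill_toofull persists; adding capacity is the real fix.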
[WARN] [09/06/2025 06:54:11.355] [cromwell-system-akka.actor.default-dispatcher-3] [cromwell-system/Pool(shared->http://tesk-api.default.svc.cluster.local:8080)] [1 (WaitingForResponseEntitySubscription)] Response entity was not subscribed after 1 second. Make sure to read the response entity body or call discardBytes() on it. POST /ga4gh/tes/v1/tasks/task-68f2dfc5:cancel Empty -> 400 Bad Request Chunked
[WARN] [09/06/2025 06:54:16.926] [cromwell-system-akka.actor.default-dispatcher-24] [cromwell-system/Pool(shared->http://tesk-api.default.svc.cluster.local:8080)] [0 (WaitingForResponseEntitySubscription)] Response entity was not subscribed after 1 second. Make sure to read the response entity body or call discardBytes() on it. POST /ga4gh/tes/v1/tasks/task-3a62617f:cancel Empty -> 400 Bad Request Chunked
[WARN] [09/06/2025 06:54:17.945] [cromwell-system-akka.actor.default-dispatcher-26] [cromwell-system/Pool(shared->http://tesk-api.default.svc.cluster.local:8080)] [1 (WaitingForResponseEntitySubscription)] Response entity was not subscribed after 1 second. Make sure to read the response entity body or call discardBytes() on it. POST /ga4gh/tes/v1/tasks/task-d38cd4a6:cancel Empty -> 400 Bad Request Chunked
[WARN] [09/06/2025 06:54:18.095] [cromwell-system-akka.actor.default-dispatcher-24] [cromwell-system/Pool(shared->http://tesk-api.default.svc.cluster.local:8080)] [3 (WaitingForResponseEntitySubscription)] Response entity was not subscribed after 1 second. Make sure to read the response entity body or call discardBytes() on it. POST /ga4gh/tes/v1/tasks/task-d38cd4a6:cancel Empty -> 400 Bad Request Chunked
[WARN] [09/06/2025 06:54:22.325] [cromwell-system-akka.actor.default-dispatcher-4] [cromwell-system/Pool(shared->http://tesk-api.default.svc.cluster.local:8080)] [2 (WaitingForResponseEntitySubscription)] Response entity was not subscribed after 1 second. Make sure to read the response entity body or call discardBytes() on it. POST /ga4gh/tes/v1/tasks/task-3299d1e5:cancel Empty -> 400 Bad Request Chunked
[WARN] [09/06/2025 06:54:25.545] [cromwell-system-akka.actor.default-dispatcher-2] [cromwell-system/Pool(shared->http://tesk-api.default.svc.cluster.local:8080)] [1 (WaitingForResponseEntitySubscription)] Response entity was not subscribed after 1 second. Make sure to read the response entity body or call discardBytes() on it. POST /ga4gh/tes/v1/tasks/task-68f2dfc5:cancel Empty -> 400 Bad Request Chunked
[WARN] [09/06/2025 06:54:25.575] [cromwell-system-akka.actor.default-dispatcher-3] [cromwell-system/Pool(shared->http://tesk-api.default.svc.cluster.local:8080)] [0 (WaitingForResponseEntitySubscription)] Response entity was not subscribed after 1 second. Make sure to read the response entity body or call discardBytes() on it. POST /ga4gh/tes/v1/tasks/task-3299d1e5:cancel Empty -> 400 Bad Request Chunked
[WARN] [09/06/2025 06:54:30.316] [cromwell-system-akka.actor.default-dispatcher-4] [cromwell-system/Pool(shared->http://tesk-api.default.svc.cluster.local:8080)] [3 (WaitingForResponseEntitySubscription)] Response entity was not subscribed after 1 second. Make sure to read the response entity body or call discardBytes() on it. POST /ga4gh/tes/v1/tasks/task-3a62617f:cancel Empty -> 400 Bad Request Chunked
[WARN] [09/06/2025 06:54:32.715] [cromwell-system-akka.actor.default-dispatcher-4] [cromwell-system/Pool(shared->http://tesk-api.default.svc.cluster.local:8080)] [0 (WaitingForResponseEntitySubscription)] Response entity was not subscribed after 1 second. Make sure to read the response entity body or call discardBytes() on it. POST /ga4gh/tes/v1/tasks/task-d38cd4a6:cancel Empty -> 400 Bad Request Chunked
[WARN] [09/06/2025 06:54:32.865] [cromwell-system-akka.actor.default-dispatcher-24] [cromwell-system/Pool(shared->http://tesk-api.default.svc.cluster.local:8080)] [1 (WaitingForResponseEntitySubscription)] Response entity was not subscribed after 1 second. Make sure to read the response entity body or call discardBytes() on it. POST /ga4gh/tes/v1/tasks/task-d38cd4a6:cancel Empty -> 400 Bad Request Chunked
[WARN] [09/06/2025 06:54:37.295] [cromwell-system-akka.actor.default-dispatcher-2] [cromwell-system/Pool(shared->http://tesk-api.default.svc.cluster.local:8080)] [3 (WaitingForResponseEntitySubscription)] Response entity was not subscribed after 1 second. Make sure to read the response entity body or call discardBytes() on it. POST /ga4gh/tes/v1/tasks/task-3299d1e5:cancel Empty -> 400 Bad Request Chunked
-
import os
import re
import time


def follow_log(file_path):
    """Follow new content appended to a log file in real time, like `tail -f`."""
    # Open the file and seek to the end so only newly written lines are reported
    with open(file_path, 'r') as f:
        f.seek(0, os.SEEK_END)
        while True:
            line = f.readline()
            if not line:
                # No new content yet: sleep briefly before polling again
                time.sleep(0.1)
                continue
            yield line


def extract_task_id(line):
    """Extract the task ID from a log line."""
    # Match IDs of the form task-<hex digits>
    pattern = r'task-[0-9a-f]+'
    match = re.search(pattern, line)
    if match:
        return match.group()
    return None


def main(log_file_path):
    print(f"Monitoring log file: {log_file_path}")
    print("Extracting task IDs... (press Ctrl+C to stop)")
    try:
        for line in follow_log(log_file_path):
            # Only look at lines carrying the TES cancel POST request
            if 'POST /ga4gh/tes/v1/tasks/' in line and ':cancel' in line:
                task_id = extract_task_id(line)
                if task_id:
                    print(f"Extracted task ID: {task_id}")
    except KeyboardInterrupt:
        print("\nStopped")
    except FileNotFoundError:
        print(f"Error: log file not found: {log_file_path}")
    except Exception as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    # Path to the log file; adjust as needed
    log_file = "nohup.out"
    main(log_file)
-
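The follower script prints IDs one by one as they appear. When the log already holds many repeated cancel attempts (like the 400 Bad Request responses in the WARN lines), a one-shot pass that counts attempts per task and HTTP status can be easier to read. A minimal sketch, assuming the same `nohup.out` path and the cancel-line format shown in those logs; `summarize_cancels` is a hypothetical helper, not part of Cromwell or TESK.

```python
import os
import re
from collections import Counter
from typing import Iterable

# Matches lines such as:
#   ... POST /ga4gh/tes/v1/tasks/task-68f2dfc5:cancel Empty -> 400 Bad Request Chunked
CANCEL_RE = re.compile(r"POST /ga4gh/tes/v1/tasks/(task-[0-9a-f]+):cancel\b.*?-> (\d{3})")


def summarize_cancels(lines: Iterable[str]) -> Counter:
    """Count cancel attempts per (task ID, HTTP status code) pair."""
    counts = Counter()
    for line in lines:
        m = CANCEL_RE.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts


if __name__ == "__main__":
    path = "nohup.out"  # assumed log location, same as the follower script
    if os.path.exists(path):
        with open(path, encoding="utf-8", errors="replace") as f:
            for (task_id, status), n in sorted(summarize_cancels(f).items()):
                print(f"{task_id} HTTP {status}: {n} attempt(s)")
```

A task that keeps showing up with status 400 is one that TESK refuses to cancel, which is usually worth checking in TESK itself before retrying.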
delete from "JOB_KEY_VALUE_ENTRY" where "STORE_VALUE" = '195'
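Before running a bulk delete like this against the Cromwell database, it can help to count the matching rows first and to run the delete inside a transaction with the value bound as a parameter. A minimal sketch using Python's built-in `sqlite3` as a stand-in: the two-column table is a simplified assumption, not Cromwell's real schema, and against the actual database backing Cromwell you would use that driver's own placeholder style instead of `?`. The helper names are hypothetical.

```python
import sqlite3


def count_store_value(conn: sqlite3.Connection, value: str) -> int:
    """Count rows whose STORE_VALUE equals `value` (preview before deleting)."""
    row = conn.execute(
        'SELECT COUNT(*) FROM "JOB_KEY_VALUE_ENTRY" WHERE "STORE_VALUE" = ?', (value,)
    ).fetchone()
    return row[0]


def delete_store_value(conn: sqlite3.Connection, value: str) -> int:
    """Delete rows whose STORE_VALUE equals `value`; return how many were removed."""
    with conn:  # commits on success, rolls back if the statement raises
        cur = conn.execute(
            'DELETE FROM "JOB_KEY_VALUE_ENTRY" WHERE "STORE_VALUE" = ?', (value,)
        )
    return cur.rowcount


if __name__ == "__main__":
    # In-memory stand-in table; columns are a simplified assumption.
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "JOB_KEY_VALUE_ENTRY" ("STORE_KEY" TEXT, "STORE_VALUE" TEXT)')
    conn.executemany(
        'INSERT INTO "JOB_KEY_VALUE_ENTRY" VALUES (?, ?)',
        [("k1", "195"), ("k2", "195"), ("k3", "7")],
    )
    print(count_store_value(conn, "195"))   # 2
    print(delete_store_value(conn, "195"))  # 2
    print(count_store_value(conn, "195"))   # 0
```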
-
cde89041-fe8a-4712-8385-0cb8afc0efcc, 7cf42b54-9888-486e-9460-9d5b607c02fd, 3178dfab-75f8-4ab3-95a4-408af8e3d7c1, 170c1ac2-e068-456b-b5ea-c948f432e87d, fbafa7ff-408b-4869-b9c5-c66a45d9843f, 9c35008e-0f1a-433d-a974-266d0ac5038d, 0caaa4b6-6a32-4973-8e4f-9f65fb0ece2b, 1b3b877d-cd72-4694-bfad-07fde451e6ff, cd08e0b4-a0b7-4ce3-ae67-79650a6a6bb2, 601d6df9-ab99-4a65-b790-0a2b6cc5c191, 971df97e-e631-4f4a-9d78-4704eb0a7591, c712c011-afd1-4e6c-b1d1-ef6532ff48f4, a66c9c15-22c8-4bb3-920b-9354b408862b, 58d69c6a-df71-4780-8b18-b3020532997f, 1cc9b767-573a-447d-b257-968f2c91df17, dab528d2-1eda-46e3-a232-622cb188a7c5, 2026b96c-019b-4a6c-bc13-e4cc3bb78e98, edabc69e-72d1-48df-942e-528cc60b835d, 846a9eb7-5c60-4171-98db-c7d4b370cf27, d33d18c7-293c-4d0b-ab3b-38f049a305c2, 351735e1-31cd-4577-82d6-e3a4d10d679c, 93acc752-31bf-4538-b3ce-59c0762d00f6, b0599234-046e-4914-9941-00aadf0a292e, 524ddb2b-e921-4483-b6c1-bc449047b6a4, f09dd720-5851-47d5-92d0-a36aa7d2be72, 35d9125b-cee8-4cea-abd9-717e7ba75ce2, 81df380e-50a9-434d-ba6e-c74500038d8d, 2f853868-6d6c-4b51-88b7-506dffd0e4ae, c1eb19f2-9ce4-4ccd-986e-a8abb7c5b737, 3e21be2c-b074-4a4c-9802-173be8ea45eb, 0cf1a30b-4031-4789-819e-f5a3e6f58b50, 042e5576-988e-4bdc-b782-2e212ad88e8c, 4766114f-efee-4200-a3fa-dd109a4ee9df, 3f84c54c-0e03-4ea3-a258-ab71691cee60, 41278df2-f761-4463-8df9-0c9223585348, f47e22e1-a046-4f93-ac9d-65e2801ab9ac, a1611293-04c2-4b81-847a-6a53defa7399, a164dc0d-388f-4b3e-ba7a-5f9543b34185, 98b1d8ba-757d-4599-ad01-b703d58b5d0d, 1ac95950-d62c-4fe8-ba7d-7df645dcebbf, bb3fad51-a03e-4a47-95b3-f74711577c64, beb7ba8d-0909-4f04-9223-1ec72ae7fac4, 057ed284-0b60-4ede-ae87-d7db476c1882, c9c98451-9773-4245-a636-e7dcb3c19282, d7b55685-cdd6-483b-8ea9-38131ad5ea92, 486875ec-76a0-45c5-a734-fd63bc599af3, fcd11583-245f-42a9-94c9-1c4524217923, ecd92665-0550-48d6-99b0-0588b3367469, 071e9c51-931b-4c05-b3e8-a227449f3b9e
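Before turning a pasted list of workflow IDs like this into abort requests, it is worth checking that every entry is a well-formed UUID and that none is repeated. A minimal sketch using the standard `uuid` module; `parse_workflow_ids` is a hypothetical helper.

```python
import uuid
from typing import List


def parse_workflow_ids(text: str) -> List[str]:
    """Split a comma/whitespace-separated list of workflow IDs, validate each
    as a UUID, and drop duplicates while keeping the original order."""
    seen = set()
    ids = []
    for token in text.replace(",", " ").split():
        canonical = str(uuid.UUID(token))  # raises ValueError on a malformed ID
        if canonical not in seen:
            seen.add(canonical)
            ids.append(canonical)
    return ids
```

Feeding it the full list above should return 49 IDs; a `ValueError` points at a copy/paste mistake in the list.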
-
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/edabc69e-72d1-48df-942e-528cc60b835d/abort" -H "accept: application/json"
-
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/cde89041-fe8a-4712-8385-0cb8afc0efcc/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/7cf42b54-9888-486e-9460-9d5b607c02fd/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/3178dfab-75f8-4ab3-95a4-408af8e3d7c1/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/170c1ac2-e068-456b-b5ea-c948f432e87d/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/fbafa7ff-408b-4869-b9c5-c66a45d9843f/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/9c35008e-0f1a-433d-a974-266d0ac5038d/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/0caaa4b6-6a32-4973-8e4f-9f65fb0ece2b/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/1b3b877d-cd72-4694-bfad-07fde451e6ff/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/cd08e0b4-a0b7-4ce3-ae67-79650a6a6bb2/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/601d6df9-ab99-4a65-b790-0a2b6cc5c191/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/971df97e-e631-4f4a-9d78-4704eb0a7591/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/c712c011-afd1-4e6c-b1d1-ef6532ff48f4/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/a66c9c15-22c8-4bb3-920b-9354b408862b/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/58d69c6a-df71-4780-8b18-b3020532997f/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/1cc9b767-573a-447d-b257-968f2c91df17/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/dab528d2-1eda-46e3-a232-622cb188a7c5/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/2026b96c-019b-4a6c-bc13-e4cc3bb78e98/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/edabc69e-72d1-48df-942e-528cc60b835d/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/846a9eb7-5c60-4171-98db-c7d4b370cf27/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/d33d18c7-293c-4d0b-ab3b-38f049a305c2/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/351735e1-31cd-4577-82d6-e3a4d10d679c/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/93acc752-31bf-4538-b3ce-59c0762d00f6/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/b0599234-046e-4914-9941-00aadf0a292e/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/524ddb2b-e921-4483-b6c1-bc449047b6a4/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/f09dd720-5851-47d5-92d0-a36aa7d2be72/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/35d9125b-cee8-4cea-abd9-717e7ba75ce2/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/81df380e-50a9-434d-ba6e-c74500038d8d/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/2f853868-6d6c-4b51-88b7-506dffd0e4ae/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/c1eb19f2-9ce4-4ccd-986e-a8abb7c5b737/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/3e21be2c-b074-4a4c-9802-173be8ea45eb/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/0cf1a30b-4031-4789-819e-f5a3e6f58b50/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/042e5576-988e-4bdc-b782-2e212ad88e8c/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/4766114f-efee-4200-a3fa-dd109a4ee9df/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/3f84c54c-0e03-4ea3-a258-ab71691cee60/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/41278df2-f761-4463-8df9-0c9223585348/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/f47e22e1-a046-4f93-ac9d-65e2801ab9ac/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/a1611293-04c2-4b81-847a-6a53defa7399/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/a164dc0d-388f-4b3e-ba7a-5f9543b34185/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/98b1d8ba-757d-4599-ad01-b703d58b5d0d/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/1ac95950-d62c-4fe8-ba7d-7df645dcebbf/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/bb3fad51-a03e-4a47-95b3-f74711577c64/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/beb7ba8d-0909-4f04-9223-1ec72ae7fac4/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/057ed284-0b60-4ede-ae87-d7db476c1882/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/c9c98451-9773-4245-a636-e7dcb3c19282/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/d7b55685-cdd6-483b-8ea9-38131ad5ea92/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/486875ec-76a0-45c5-a734-fd63bc599af3/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/fcd11583-245f-42a9-94c9-1c4524217923/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/ecd92665-0550-48d6-99b0-0588b3367469/abort" -H "accept: application/json"
curl -X POST "http://192.168.30.202:31237/api/workflows/v1/071e9c51-931b-4c05-b3e8-a227449f3b9e/abort" -H "accept: application/json"
-
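The 49 curl lines differ only in the workflow ID, so they can be generated from the ID list instead of written by hand. A minimal sketch using only the standard library; it defaults to a dry run that just reports the URLs it would hit. The host, port, path and `accept` header mirror the curl commands. `abort_url` and `abort_all` are hypothetical helpers, and reading a `status` field from the JSON reply is an assumption about Cromwell's abort response.

```python
import json
import urllib.error
import urllib.request
from typing import Iterable, List, Tuple

CROMWELL = "http://192.168.30.202:31237"  # host/port taken from the curl commands


def abort_url(base: str, workflow_id: str) -> str:
    return f"{base.rstrip('/')}/api/workflows/v1/{workflow_id}/abort"


def abort_all(base: str, workflow_ids: Iterable[str], dry_run: bool = True,
              timeout: float = 10.0) -> List[Tuple[str, str]]:
    """POST an abort for each workflow ID; return (id, outcome) pairs."""
    results = []
    for wf_id in workflow_ids:
        url = abort_url(base, wf_id)
        if dry_run:
            results.append((wf_id, f"would POST {url}"))
            continue
        req = urllib.request.Request(url, method="POST",
                                     headers={"accept": "application/json"})
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                # Assumed: Cromwell replies with JSON containing a "status" field
                body = json.loads(resp.read().decode() or "{}")
                results.append((wf_id, str(body.get("status", resp.status))))
        except urllib.error.HTTPError as e:
            # Non-2xx, e.g. an unknown or already-finished workflow (assumed)
            results.append((wf_id, f"HTTP {e.code}"))
        except urllib.error.URLError as e:
            results.append((wf_id, f"unreachable: {e.reason}"))
    return results
```

For example, `abort_all(CROMWELL, ids, dry_run=False)` with `ids` holding the workflow IDs sends the requests and returns one outcome per workflow, which is easier to review than 49 separate curl outputs.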
################# retry : Some(9999) #################
################# retry : Some(9998) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9998) #################
[INFO] [09/10/2025 13:24:08.451] [cromwell-system-akka.dispatchers.backend-dispatcher-195] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-830dc691-9767-475f-a4c2-e65543225903/WorkflowExecutionActor-830dc691-9767-475f-a4c2-e65543225903/830dc691-9767-475f-a4c2-e65543225903-EngineJobExecutionActor-meta_workflow.Bar:NA:1/830dc691-9767-475f-a4c2-e65543225903-BackendJobExecutionActor-meta_workflow.Bar:NA:1/TesAsyncBackendJobExecutionActor] TesAsyncBackendJobExecutionActor [UUID(830dc691)meta_workflow.Bar:NA:1]: Status change from - to Running
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9998) #################
################# retry : Some(9998) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
[INFO] [09/10/2025 13:24:09.672] [cromwell-system-akka.dispatchers.backend-dispatcher-195] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-830dc691-9767-475f-a4c2-e65543225903/WorkflowExecutionActor-830dc691-9767-475f-a4c2-e65543225903/830dc691-9767-475f-a4c2-e65543225903-EngineJobExecutionActor-meta_workflow.metacor:NA:1/830dc691-9767-475f-a4c2-e65543225903-BackendJobExecutionActor-meta_workflow.metacor:NA:1/TesAsyncBackendJobExecutionActor] TesAsyncBackendJobExecutionActor [UUID(830dc691)meta_workflow.metacor:NA:1]: Status change from - to Running
################# retry : Some(9998) #################
################# retry : Some(9998) #################
################# retry : Some(9998) #################
################# retry : Some(9998) #################
################# retry : Some(9999) #################
################# retry : Some(9998) #################
################# retry : Some(9999) #################
################# retry : Some(9998) #################
################# retry : Some(9997) #################
################# retry : Some(9998) #################
################# retry : Some(9997) #################
################# retry : Some(9998) #################
################# retry : Some(9998) #################
[INFO] [09/10/2025 13:24:09.730] [cromwell-system-akka.dispatchers.backend-dispatcher-195] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-efce8835-18f0-4b86-8783-d9080d75b68f/WorkflowExecutionActor-efce8835-18f0-4b86-8783-d9080d75b68f/efce8835-18f0-4b86-8783-d9080d75b68f-EngineJobExecutionActor-meta_workflow.TICstdredeal:NA:1/efce8835-18f0-4b86-8783-d9080d75b68f-BackendJobExecutionActor-meta_workflow.TICstdredeal:NA:1/TesAsyncBackendJobExecutionActor] TesAsyncBackendJobExecutionActor [UUID(efce8835)meta_workflow.TICstdredeal:NA:1]: Status change from - to Running
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9998) #################
################# retry : Some(9998) #################
################# retry : Some(9998) #################
################# retry : Some(9998) #################
################# retry : Some(9998) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
[INFO] [09/10/2025 13:24:12.450] [cromwell-system-akka.dispatchers.backend-dispatcher-188] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-830dc691-9767-475f-a4c2-e65543225903/WorkflowExecutionActor-830dc691-9767-475f-a4c2-e65543225903/830dc691-9767-475f-a4c2-e65543225903-EngineJobExecutionActor-meta_workflow.roplsplsda:NA:1/830dc691-9767-475f-a4c2-e65543225903-BackendJobExecutionActor-meta_workflow.roplsplsda:NA:1/TesAsyncBackendJobExecutionActor] TesAsyncBackendJobExecutionActor [UUID(830dc691)meta_workflow.roplsplsda:NA:1]: Status change from - to Running
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
################# retry : Some(9999) #################
[INFO] [09/10/2025 13:24:15.939] [cromwell-system-akka.dispatchers.backend-dispatcher-211] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-830dc691-9767-475f-a4c2-e65543225903/WorkflowExecutionActor-830dc691-9767-475f-a4c2-e65543225903/830dc691-9767-475f-a4c2-e65543225903-EngineJobExecutionActor-meta_workflow.TICsampleredeal:NA:1/830dc691-9767-475f-a4c2-e65543225903-BackendJobExecutionActor-meta_workflow.TICsampleredeal:NA:1/TesAsyncBackendJobExecutionActor] TesAsyncBackendJobExecutionActor [UUID(830dc691)meta_workflow.TICsampleredeal:NA:1]: Status change from - to Running
################# retry : Some(9998) #################
################# retry : Some(9998) #################
################# retry : Some(9998) #################
################# retry : Some(9997) #################
################# retry : Some(9998) #################
################# retry : Some(9999) #################
################# retry : Some(9998) #################
################# retry : Some(9999) #################
################# retry : Some(9998) #################
################# retry : Some(9998) #################
################# retry : Some(9998) #################
[INFO] [09/10/2025 13:24:17.438] [cromwell-system-akka.dispatchers.backend-dispatcher-211] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-830dc691-9767-475f-a4c2-e65543225903/WorkflowExecutionActor-830dc691-9767-475f-a4c2-e65543225903/830dc691-9767-475f-a4c2-e65543225903-EngineJobExecutionActor-meta_workflow.kmeans:NA:1/830dc691-9767-475f-a4c2-e65543225903-BackendJobExecutionActor-meta_workflow.kmeans:NA:1/TesAsyncBackendJobExecutionActor] TesAsyncBackendJobExecutionActor [UUID(830dc691)meta_workflow.kmeans:NA:1]: Status change from - to Running
################# retry : Some(9998) #################
################# retry : Some(9998) #################
################# retry : Some(9997) #################
################# retry : Some(9998) #################
################# retry : Some(9998) #################
[INFO] [09/10/2025 13:24:17.632] [cromwell-system-akka.dispatchers.backend-dispatcher-211] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-830dc691-9767-475f-a4c2-e65543225903/WorkflowExecutionActor-830dc691-9767-475f-a4c2-e65543225903/830dc691-9767-475f-a4c2-e65543225903-EngineJobExecutionActor-meta_workflow.all_sample_map:NA:1/830dc691-9767-475f-a4c2-e65543225903-BackendJobExecutionActor-meta_workflow.all_sample_map:NA:1/TesAsyncBackendJobExecutionActor] TesAsyncBackendJobExecutionActor [UUID(830dc691)meta_workflow.all_sample_map:NA:1]: Status change from - to Running
################# retry : Some(9997) #################
################# retry : Some(9998) #################
################# retry : Some(9998) #################
################# retry : Some(9998) #################
################# retry : Some(9998) #################
################# retry : Some(9997) #################
################# retry : Some(9998) #################
[INFO] [09/10/2025 13:24:19.638] [cromwell-system-akka.dispatchers.backend-dispatcher-211] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-830dc691-9767-475f-a4c2-e65543225903/WorkflowExecutionActor-830dc691-9767-475f-a4c2-e65543225903/830dc691-9767-475f-a4c2-e65543225903-EngineJobExecutionActor-meta_workflow.KEGG:NA:1/830dc691-9767-475f-a4c2-e65543225903-BackendJobExecutionActor-meta_workflow.KEGG:NA:1/TesAsyncBackendJobExecutionActor] TesAsyncBackendJobExecutionActor [UUID(830dc691)meta_workflow.KEGG:NA:1]: Status change from - to Running
################# retry : Some(9996) #################
################# retry : Some(9997) #################
################# retry : Some(9997) #################
################# retry : Some(9998) #################
################# retry : Some(9997) #################
[INFO] [09/10/2025 13:24:22.193] [cromwell-system-akka.dispatchers.backend-dispatcher-211] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-efce8835-18f0-4b86-8783-d9080d75b68f/WorkflowExecutionActor-efce8835-18f0-4b86-8783-d9080d75b68f/efce8835-18f0-4b86-8783-d9080d75b68f-EngineJobExecutionActor-meta_workflow.TICsampleredeal:NA:1/efce8835-18f0-4b86-8783-d9080d75b68f-BackendJobExecutionActor-meta_workflow.TICsampleredeal:NA:1/TesAsyncBackendJobExecutionActor] TesAsyncBackendJobExecutionActor [UUID(efce8835)meta_workflow.TICsampleredeal:NA:1]: Status change from - to Running
################# retry : Some(9999) #################
################# retry : Some(9997) #################
################# retry : Some(9997) #################
################# retry : Some(9997) #################
################# retry : Some(9998) #################
################# retry : Some(9998) #################
################# retry : Some(9997) #################
[INFO] [09/10/2025 13:24:25.386] [cromwell-system-akka.dispatchers.backend-dispatcher-211] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-830dc691-9767-475f-a4c2-e65543225903/WorkflowExecutionActor-830dc691-9767-475f-a4c2-e65543225903/830dc691-9767-475f-a4c2-e65543225903-EngineJobExecutionActor-meta_workflow.heatmap:NA:1/830dc691-9767-475f-a4c2-e65543225903-BackendJobExecutionActor-meta_workflow.heatmap:NA:1/TesAsyncBackendJobExecutionActor] TesAsyncBackendJobExecutionActor [UUID(830dc691)meta_workflow.heatmap:NA:1]: Status change from - to Running
################# retry : Some(9997) #################
################# retry : Some(9997) #################
################# retry : Some(9997) #################
################# retry : Some(9997) #################
################# retry : Some(9998) #################
################# retry : Some(9998) #################
################# retry : Some(9997) #################
################# retry : Some(9998) #################
################# retry : Some(9996) #################
################# retry : Some(9998) #################
################# retry : Some(9999) #################
################# retry : Some(9998) #################
################# retry : Some(9998) #################
################# retry : Some(9998) #################
################# retry : Some(9998) #################
[INFO] [09/10/2025 13:24:25.501] [cromwell-system-akka.dispatchers.backend-dispatcher-211] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-830dc691-9767-475f-a4c2-e65543225903/WorkflowExecutionActor-830dc691-9767-475f-a4c2-e65543225903/830dc691-9767-475f-a4c2-e65543225903-EngineJobExecutionActor-meta_workflow.TICstdredeal:NA:1/830dc691-9767-475f-a4c2-e65543225903-BackendJobExecutionActor-meta_workflow.TICstdredeal:NA:1/TesAsyncBackendJobExecutionActor] TesAsyncBackendJobExecutionActor [UUID(830dc691)meta_workflow.TICstdredeal:NA:1]: Status change from - to Running
################# retry : Some(9998) #################
################# retry : Some(9998) #################
################# retry : Some(9998) #################
################# retry : Some(9997) #################
################# retry : Some(9998) #################
[INFO] [09/10/2025 13:24:26.411] [cromwell-system-akka.dispatchers.backend-dispatcher-211] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-830dc691-9767-475f-a4c2-e65543225903/WorkflowExecutionActor-830dc691-9767-475f-a4c2-e65543225903/830dc691-9767-475f-a4c2-e65543225903-EngineJobExecutionActor-meta_workflow.roplsoplsda:NA:1/830dc691-9767-475f-a4c2-e65543225903-BackendJobExecutionActor-meta_workflow.roplsoplsda:NA:1/TesAsyncBackendJobExecutionActor] TesAsyncBackendJobExecutionActor [UUID(830dc691)meta_workflow.roplsoplsda:NA:1]: Status change from - to Running
################# retry : Some(9998) #################
################# retry : Some(9997) #################
################# retry : Some(9997) #################
################# retry : Some(9998) #################
################# retry : Some(9996) #################
################# retry : Some(9997) #################
################# retry : Some(9998) #################
[INFO] [09/10/2025 13:24:27.693] [cromwell-system-akka.dispatchers.backend-dispatcher-194] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-efce8835-18f0-4b86-8783-d9080d75b68f/WorkflowExecutionActor-efce8835-18f0-4b86-8783-d9080d75b68f/efce8835-18f0-4b86-8783-d9080d75b68f-EngineJobExecutionActor-meta_workflow.roplspca:NA:1/efce8835-18f0-4b86-8783-d9080d75b68f-BackendJobExecutionActor-meta_workflow.roplspca:NA:1/TesAsyncBackendJobExecutionActor] TesAsyncBackendJobExecutionActor [UUID(efce8835)meta_workflow.roplspca:NA:1]: job id: task-9d37d05a
################# retry : Some(9997) #################
################# retry : Some(9997) #################
################# retry : Some(9997) #################
################# retry : Some(9997) #################
################# retry : Some(9997) #################
[INFO] [09/10/2025 13:24:27.693] [cromwell-system-akka.dispatchers.backend-dispatcher-211] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-efce8835-18f0-4b86-8783-d9080d75b68f/WorkflowExecutionActor-efce8835-18f0-4b86-8783-d9080d75b68f/efce8835-18f0-4b86-8783-d9080d75b68f-EngineJobExecutionActor-meta_workflow.all_sample_map:NA:1/efce8835-18f0-4b86-8783-d9080d75b68f-BackendJobExecutionActor-meta_workflow.all_sample_map:NA:1/TesAsyncBackendJobExecutionActor] TesAsyncBackendJobExecutionActor [UUID(efce8835)meta_workflow.all_sample_map:NA:1]: job id: task-893eb7df
################# retry : Some(9997) #################
################# retry : Some(9997) #################
################# retry : Some(9997) #################
################# retry : Some(9997) #################
################# retry : Some(9996) #################
################# retry : Some(9997) #################
################# retry : Some(9997) #################
[INFO] [09/10/2025 13:24:27.693] [cromwell-system-akka.dispatchers.backend-dispatcher-189] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-efce8835-18f0-4b86-8783-d9080d75b68f/WorkflowExecutionActor-efce8835-18f0-4b86-8783-d9080d75b68f/efce8835-18f0-4b86-8783-d9080d75b68f-EngineJobExecutionActor-meta_workflow.roplsoplsda:NA:1/efce8835-18f0-4b86-8783-d9080d75b68f-BackendJobExecutionActor-meta_workflow.roplsoplsda:NA:1/TesAsyncBackendJobExecutionActor] TesAsyncBackendJobExecutionActor [UUID(efce8835)meta_workflow.roplsoplsda:NA:1]: job id: task-7a6a121a
################# retry : Some(9996) #################
################# retry : Some(9997) #################
################# retry : Some(9996) #################
################# retry : Some(9996) #################
################# retry : Some(9998) #################
[INFO] [09/10/2025 13:24:27.693] [cromwell-system-akka.dispatchers.backend-dispatcher-229] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-efce8835-18f0-4b86-8783-d9080d75b68f/WorkflowExecutionActor-efce8835-18f0-4b86-8783-d9080d75b68f/efce8835-18f0-4b86-8783-d9080d75b68f-EngineJobExecutionActor-meta_workflow.kmeans:NA:1/efce8835-18f0-4b86-8783-d9080d75b68f-BackendJobExecutionActor-meta_workflow.kmeans:NA:1/TesAsyncBackendJobExecutionActor] TesAsyncBackendJobExecutionActor [UUID(efce8835)meta_workflow.kmeans:NA:1]: job id: task-e853a7d3
################# retry : Some(9997) #################
################# retry : Some(9995) #################
################# retry : Some(9996) #################
################# retry : Some(9997) #################
################# retry : Some(9996) #################
################# retry : Some(9997) #################
################# retry : Some(9996) #################
################# retry : Some(9997) #################
[INFO] [09/10/2025 13:24:27.693] [cromwell-system-akka.dispatchers.backend-dispatcher-204] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-efce8835-18f0-4b86-8783-d9080d75b68f/WorkflowExecutionActor-efce8835-18f0-4b86-8783-d9080d75b68f/efce8835-18f0-4b86-8783-d9080d75b68f-EngineJobExecutionActor-meta_workflow.roplsplsda:NA:1/efce8835-18f0-4b86-8783-d9080d75b68f-BackendJobExecutionActor-meta_workflow.roplsplsda:NA:1/TesAsyncBackendJobExecutionActor] TesAsyncBackendJobExecutionActor [UUID(efce8835)meta_workflow.roplsplsda:NA:1]: job id: task-fdfbad9a
################# retry : Some(9997) #################
################# retry : Some(9996) #################
################# retry : Some(9996) #################
################# retry : Some(9997) #################
################# retry : Some(9996) #################
[INFO] [09/10/2025 13:24:27.693] [cromwell-system-akka.dispatchers.backend-dispatcher-192] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-efce8835-18f0-4b86-8783-d9080d75b68f/WorkflowExecutionActor-efce8835-18f0-4b86-8783-d9080d75b68f/efce8835-18f0-4b86-8783-d9080d75b68f-EngineJobExecutionActor-meta_workflow.heatmap:NA:1/efce8835-18f0-4b86-8783-d9080d75b68f-BackendJobExecutionActor-meta_workflow.heatmap:NA:1/TesAsyncBackendJobExecutionActor] TesAsyncBackendJobExecutionActor [UUID(efce8835)meta_workflow.heatmap:NA:1]: job id: task-5c817f97
################# retry : Some(9996) #################
################# retry : Some(9997) #################
################# retry : Some(9996) #################
################# retry : Some(9996) #################
################# retry : Some(9995) #################
################# retry : Some(9997) #################
################# retry : Some(9997) #################
[INFO] [09/10/2025 13:24:27.693] [cromwell-system-akka.dispatchers.backend-dispatcher-195] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-efce8835-18f0-4b86-8783-d9080d75b68f/WorkflowExecutionActor-efce8835-18f0-4b86-8783-d9080d75b68f/efce8835-18f0-4b86-8783-d9080d75b68f-EngineJobExecutionActor-meta_workflow.metacor:NA:1/efce8835-18f0-4b86-8783-d9080d75b68f-BackendJobExecutionActor-meta_workflow.metacor:NA:1/TesAsyncBackendJobExecutionActor] TesAsyncBackendJobExecutionActor [UUID(efce8835)meta_workflow.metacor:NA:1]: job id: task-2a5ebaf9
################# retry : Some(9997) ################################## retry : Some(9997) ################################## retry : Some(9996) ################################## retry : Some(9997) ################################## retry : Some(9998) #################[INFO] [09/10/2025 13:24:27.693] [cromwell-system-akka.dispatchers.backend-dispatcher-233] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-efce8835-18f0-4b86-8783-d9080d75b68f/WorkflowExecutionActor-efce8835-18f0-4b86-8783-d9080d75b68f/efce8835-18f0-4b86-8783-d9080d75b68f-EngineJobExecutionActor-meta_workflow.KEGG:NA:1/efce8835-18f0-4b86-8783-d9080d75b68f-BackendJobExecutionActor-meta_workflow.KEGG:NA:1/TesAsyncBackendJobExecutionActor] TesAsyncBackendJobExecutionActor [UUID(efce8835)meta_workflow.KEGG:NA:1]: job id: task-0b91a00d
################# retry : Some(9996) ################################## retry : Some(9996) ################################## retry : Some(9997) ################################## retry : Some(9997) ############## -
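Each `job id` line above links a Cromwell call (for example `meta_workflow.all_sample_map:NA:1`) to a TES task id such as `task-893eb7df`. Task ids of this `task-…` form are what TESK requests refer to. The sketch below pulls that mapping out of a saved log. It assumes the `[UUID(xxxxxxxx)call:shard:attempt]: job id: task-…` layout shown here. `tes_job_ids` is a hypothetical helper name.

```shell
#!/bin/bash
# tes_job_ids: read Cromwell log text on stdin and print
# "<short workflow id> <call name> <TES task id>" for every
# TesAsyncBackendJobExecutionActor "job id" line.
# The retry banners fused in front of the [INFO] text are skipped by the leading .*
tes_job_ids() {
  sed -nE 's/.*\[UUID\(([0-9a-fA-F]{8})\)([^]]+)]: job id: (task-[0-9a-fA-F]+).*/\1 \2 \3/p'
}
```

Usage: `tes_job_ids < ./nohup.out | sort -u`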
[INFO] [09/11/2025 06:46:32.437] [cromwell-system-akka.dispatchers.engine-dispatcher-30] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-2f372e06-2cd8-424f-8e6c-062e0b506e40/WorkflowExecutionActor-2f372e06-2cd8-424f-8e6c-062e0b506e40] WorkflowExecutionActor-2f372e06-2cd8-424f-8e6c-062e0b506e40 [UUID(2f372e06)]: Restarting blood_meta.check_file, blood_meta.predeal
[INFO] [09/11/2025 06:46:32.438] [cromwell-system-akka.dispatchers.engine-dispatcher-27] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-a5686468-95da-46ca-8498-50187928d6d6/WorkflowExecutionActor-a5686468-95da-46ca-8498-50187928d6d6] WorkflowExecutionActor-a5686468-95da-46ca-8498-50187928d6d6 [UUID(a5686468)]: Restarting metage_megahit.kneaddata
[INFO] [09/11/2025 06:46:32.438] [cromwell-system-akka.dispatchers.engine-dispatcher-6] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-f95d7ecd-be71-428b-8195-9c121ad27007/WorkflowExecutionActor-f95d7ecd-be71-428b-8195-9c121ad27007] WorkflowExecutionActor-f95d7ecd-be71-428b-8195-9c121ad27007 [UUID(f95d7ecd)]: Restarting RNASeq_eukaryon.predeal, RNASeq_eukaryon.getkeggtype -
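#!/bin/bash
# 1. Log file path (defaults to nohup.out in the current directory; change it to match your setup)
LOG_FILE="./nohup.out"
# 2. Check that the log file exists
if [ ! -f "$LOG_FILE" ]; then
  echo "Error: log file $LOG_FILE does not exist! Please check that the path is correct." >&2
  exit 1
fi
# 3. Follow the log in real time and extract the target workflow IDs (UUIDs)
echo "=== Watching $LOG_FILE; extracting workflow IDs from lines containing Restarting ==="
echo "=== Press Ctrl+C to stop watching ==="
echo "=========================="
# Core logic:
# - tail -f: follow new lines as they are appended to the log
# - grep --line-buffered "Restarting": keep only lines containing "Restarting"
# - sed -u -nE (GNU sed; -u = unbuffered): extract the 36-character UUID after "WorkflowActor-" (8-4-4-4-12 hex digits)
# - awk '!seen[$0]++': de-duplicate while streaming, so a workflow restarted several times is printed once.
#   (sort -u cannot be used here: it waits for end of input, which tail -f never sends, so nothing would be printed.)
tail -f "$LOG_FILE" |
  grep --line-buffered "Restarting" |
  sed -u -nE 's/.*WorkflowActor-([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}).*/\1/p' |
  awk '!seen[$0]++ { print; fflush() }'
-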
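For a log that has already been written, a one-shot variant without `tail -f` can use plain `sort -u`, because its input does end. The sketch below is a minimal version of that. It assumes the same `WorkflowActor-<UUID>` layout as the watcher above. `extract_restarting_uuids` is a hypothetical function name.

```shell
#!/bin/bash
# extract_restarting_uuids: read Cromwell log text on stdin and print each
# workflow UUID that appears on a "Restarting" line, once, in sorted order.
extract_restarting_uuids() {
  grep "Restarting" |
    sed -nE 's/.*WorkflowActor-([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}).*/\1/p' |
    sort -u
}
```

Usage: `extract_restarting_uuids < ./nohup.out`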
... Restarting micro_dy_gro.upstream
[INFO] [09/23/2025 06:46:37.396] [cromwell-system-akka.dispatchers.engine-dispatcher-9] [akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-fa3a52b6-19db-4435-ac0f-a5c1fbeec385/WorkflowExecutionActor-fa3a52b6-19db-4435-ac0f-a5c1fbeec385] WorkflowExecutionActor-fa3a52b6-19db-4435-ac0f-a5c1fbeec385 [UUID(fa3a52b6)]: Restarting blood_meta.jsonFile, blood_meta.reportNoFile, blood_meta.resFile
################# retry : Some(9999) ################################## retry : Some(9999) ################################## retry : Some(9999) ################################## retry : Some(9999) ################################## retry : Some(9999) ################################## retry : Some(9999) ################################## retry : Some(9999) ################################## retry : Some(9998) ################################## retry : Some(9998) ################################## retry : Some(9999) ################################## retry : Some(9999) ################################## retry : Some(9999) ############### -
Type    Reason                   Age                   From     Message
----    ------                   ----                  ----     -------
Normal  NodeReady                47m (x11 over 6h56m)  kubelet  Node node1 status is now: NodeReady
Normal  NodeNotReady             44m (x12 over 7h3m)   kubelet  Node node1 status is now: NodeNotReady
Normal  Starting                 37m                   kubelet  Starting kubelet.
Normal  NodeHasSufficientMemory  37m                   kubelet  Node node1 status is now: NodeHasSufficientMemory
Normal  NodeHasNoDiskPressure    37m                   kubelet  Node node1 status is now: NodeHasNoDiskPressure
Normal  NodeHasSufficientPID     37m                   kubelet  Node node1 status is now: NodeHasSufficientPID
Normal  NodeAllocatableEnforced  37m                   kubelet  Updated Node Allocatable limit across pods
Normal  NodeReady                37m                   kubelet  Node node1 status is now: NodeReady
Normal  NodeNotReady             34m                   kubelet  Node node1 status is now: NodeNotReady
Normal  Starting                 31m                   kubelet  Starting kubelet.
Normal  NodeHasSufficientMemory  31m                   kubelet  Node node1 status is now: NodeHasSufficientMemory
Normal  NodeHasNoDiskPressure    31m                   kubelet  Node node1 status is now: NodeHasNoDiskPressure
Normal  NodeHasSufficientPID     31m                   kubelet  Node node1 status is now: NodeHasSufficientPID
Normal  NodeAllocatableEnforced  31m                   kubelet  Updated Node Allocatable limit across pods
Normal  NodeReady                9m29s (x2 over 31m)   kubelet  Node node1 status is now: NodeReady
Normal  NodeNotReady             6m28s (x2 over 28m)   kubelet  Node node1 status is now: NodeNotReady
-
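The events above show node1 flapping between NodeReady and NodeNotReady. An `(xN over ...)` note in the Age column means the event was seen N times. The sketch below roughly tallies those transitions from saved `kubectl describe node node1` output. It assumes the column layout shown here, with Reason as the second column. `count_ready_flaps` is a hypothetical helper name.

```shell
#!/bin/bash
# count_ready_flaps: read `kubectl describe node` event lines on stdin and
# print how many NodeReady and NodeNotReady events they represent.
count_ready_flaps() {
  awk '
    $2 == "NodeReady" || $2 == "NodeNotReady" {
      n = 1
      # A field like "(x11" means the event repeated 11 times.
      for (i = 3; i <= NF; i++) {
        if ($i ~ /^\(x[0-9]+$/) { n = substr($i, 3) + 0; break }
      }
      total[$2] += n
    }
    END {
      printf "NodeReady %d\nNodeNotReady %d\n", total["NodeReady"], total["NodeNotReady"]
    }
  '
}
```

Usage: `kubectl describe node node1 | count_ready_flaps`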
# Install the NVIDIA repository configuration package (for CentOS 8)
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
sudo dnf install -y nvidia-container-toolkit
-
nvidia-ctk runtime configure --runtime=docker
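`nvidia-ctk runtime configure --runtime=docker` registers an `nvidia` runtime in Docker's `/etc/docker/daemon.json`. The device-plugin log below, however, asks whether `nvidia` is the *default* runtime, which Docker reads from the `"default-runtime"` key. The sketch below is a quick check for that key. It is a plain grep, not a JSON parser, so it assumes the key and value sit on one line. `check_default_runtime` is a hypothetical helper name.

```shell
#!/bin/bash
# check_default_runtime [FILE]
# Succeeds if FILE (default: /etc/docker/daemon.json) contains
# "default-runtime": "nvidia"; fails otherwise, including when FILE is missing.
check_default_runtime() {
  local file="${1:-/etc/docker/daemon.json}"
  if grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "$file" 2>/dev/null; then
    echo "OK: $file sets nvidia as the default runtime"
  else
    echo "MISSING: $file does not set \"default-runtime\": \"nvidia\"" >&2
    return 1
  fi
}
```

If the check fails, the Docker daemon needs that key set and a restart before the device plugin can load NVML through the default runtime.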
-
kubectl get pods --all-namespaces -o json | jq -r '.items[] | select(any(.spec.containers[]; .resources.limits["<gpu-resource-name>"] != null)) | .metadata.namespace + " " + .metadata.name'
-
2025/10/13 02:31:43 Starting FS watcher.
2025/10/13 02:31:43 Starting OS watcher.
2025/10/13 02:31:43 Starting Plugins.
2025/10/13 02:31:43 Loading configuration.
2025/10/13 02:31:43 Initializing NVML.
2025/10/13 02:31:43 Failed to initialize NVML: could not load NVML library.
2025/10/13 02:31:43 If this is a GPU node, did you set the docker default runtime to `nvidia`?
2025/10/13 02:31:43 You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
2025/10/13 02:31:43 You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start
2025/10/13 02:31:43 If this is not a GPU node, you should set up a toleration or nodeSelector to only deploy this plugin on GPU nodes -
-
2025/10/13 07:23:31 Failed to initialize NVML: could not load NVML library.
2025/10/13 07:23:31 If this is a GPU node, did you set the docker default runtime to `nvidia`? -
-