Upgrade to py 3.11.14; ROS2 Humble 0.7; unilabos 0.10.16

Workbench example, adjust log level, and ci check (#220)

* TestLatency Return Value Example & gitignore update

* Adjust log level & Add workbench virtual example & Add not action decorator & Add check_mode &

* Add CI Check

Fix/workstation yb revision (#217)

* Revert log change & update registry

* Revert opcua client & move electrolyte node

Workstation yb merge dev ready 260113 (#216)

* feat(bioyond): 添加计算实验设计功能,支持化合物配比和滴定比例参数

* feat(bioyond): 添加测量小瓶功能,支持基本参数配置

* feat(bioyond): 添加测量小瓶配置,支持新设备参数

* feat(bioyond): 更新仓库布局和尺寸,支持竖向排列的测量小瓶和试剂存放堆栈

* feat(bioyond): 优化任务创建流程,确保无论成功与否都清理任务队列以避免重复累积

* feat(bioyond): 添加设置反应器温度功能,支持温度范围和异常处理

* feat(bioyond): 调整反应器位置配置,统一坐标格式

* feat(bioyond): 添加调度器启动功能,支持任务队列执行并处理异常

* feat(bioyond): 优化调度器启动功能,添加异常处理并更新相关配置

* feat(opcua): 增强节点ID解析兼容性和数据类型处理

改进节点ID解析逻辑以支持多种格式,包括字符串和数字标识符
添加数据类型转换处理,确保写入值时类型匹配
优化错误提示信息,便于调试节点连接问题

* feat(registry): 新增后处理站的设备配置文件

添加后处理站的YAML配置文件,包含动作映射、状态类型和设备描述

* 添加调度器启动功能,合并物料参数配置,优化物料参数处理逻辑

* 添加从 Bioyond 系统自动同步工作流序列的功能,并更新相关配置

* fix:兼容 BioyondReactionStation 中 workflow_sequence 被重写为 property

* fix:同步工作流序列

* feat: remove commented workflow synchronization from `reaction_station.py`.

* 添加时间约束功能及相关配置

* fix:自动更新物料缓存功能,添加物料时更新缓存并在删除时移除缓存项

* fix:在添加物料时处理字符串和字典返回值,确保正确更新缓存

* fix:更新奔曜错误处理报送为物料变更报送,调整日志记录和响应消息

* feat:添加实验报告简化功能,去除冗余信息并保留关键信息

* feat: 添加任务状态事件发布功能,监控并报告任务运行、超时、完成和错误状态

* fix: 修复添加物料时数据格式错误

* Refactor bioyond_dispensing_station and reaction_station_bioyond YAML configurations

- Removed redundant action value mappings from bioyond_dispensing_station.
- Updated goal properties in bioyond_dispensing_station to use enums for target_stack and other parameters.
- Changed data types for end_point and start_point in reaction_station_bioyond to use string enums (Start, End).
- Simplified descriptions and updated measurement units from μL to mL where applicable.
- Removed unused commands from reaction_station_bioyond to streamline the configuration.

* fix:Change the material unit from μL to mL

* fix:refresh_material_cache

* feat: 动态获取工作流步骤ID,优化工作流配置

* feat: 添加清空服务端所有非核心工作流功能

* fix:修复Bottle类的序列化和反序列化方法

* feat:增强材料缓存更新逻辑,支持处理返回数据中的详细信息

* Add debug log

* feat(workstation): update bioyond config migration and coin cell material search logic

- Migrate bioyond_cell config to JSON structure and remove global variable dependencies
- Implement material search confirmation dialog auto-handling
- Add documentation: 20260113_物料搜寻确认弹窗自动处理功能.md and 20260113_配置迁移修改总结.md

* Refactor module paths for Bioyond devices in YAML configuration files

- Updated the module path for BioyondDispensingStation in bioyond_dispensing_station.yaml to reflect the new directory structure.
- Updated the module path for BioyondReactionStation and BioyondReactor in reaction_station_bioyond.yaml to align with the revised organization of the codebase.

* fix: WareHouse 的不可哈希类型错误,优化父节点去重逻辑

* refactor: Move config from module to instance initialization

* fix: 修正 reaction_station 目录名拼写错误

* feat: Integrate material search logic and cleanup deprecated files

- Update coin_cell_assembly.py with material search dialog handling
- Update YB_warehouses.py with latest warehouse configurations
- Remove outdated documentation and test data files

* Refactor: Use instance attributes for action names and workflow step IDs

* refactor: Split tipbox storage into left and right warehouses

* refactor: Merge tipbox storage left and right into single warehouse

---------

Co-authored-by: ZiWei <131428629+ZiWei09@users.noreply.github.com>
Co-authored-by: Andy6M <xieqiming1132@qq.com>

fix: WareHouse 的不可哈希类型错误,优化父节点去重逻辑

fix parent_uuid fetch when bind_parent_id == node_name

物料更新也是用父节点进行报送

Add None conversion for tube rack etc.

Add set_liquid example.

Add create_resource and test_resource example.

Add restart.
Temp allow action message.

Add no_update_feedback option.

Create session_id by edge.

bump version to 0.10.15

temp cancel update req
This commit is contained in:
Xuwznln
2026-01-08 15:26:31 +08:00
parent 8580b84167
commit 2a5ddd611d
106 changed files with 17805 additions and 25458 deletions

View File

@@ -7,7 +7,6 @@ import sys
import threading
import time
from typing import Dict, Any, List
import networkx as nx
import yaml
@@ -17,9 +16,15 @@ unilabos_dir = os.path.dirname(os.path.dirname(current_dir))
if unilabos_dir not in sys.path:
sys.path.append(unilabos_dir)
from unilabos.app.utils import cleanup_for_restart
from unilabos.utils.banner_print import print_status, print_unilab_banner
from unilabos.config.config import load_config, BasicConfig, HTTPConfig
# Global restart flags (used by ws_client and web/server)
_restart_requested: bool = False
_restart_reason: str = ""
def load_config_from_file(config_path):
if config_path is None:
config_path = os.environ.get("UNILABOS_BASICCONFIG_CONFIG_PATH", None)
@@ -41,7 +46,7 @@ def convert_argv_dashes_to_underscores(args: argparse.ArgumentParser):
for i, arg in enumerate(sys.argv):
for option_string in option_strings:
if arg.startswith(option_string):
new_arg = arg[:2] + arg[2:len(option_string)].replace("-", "_") + arg[len(option_string):]
new_arg = arg[:2] + arg[2 : len(option_string)].replace("-", "_") + arg[len(option_string) :]
sys.argv[i] = new_arg
break
@@ -49,6 +54,8 @@ def convert_argv_dashes_to_underscores(args: argparse.ArgumentParser):
def parse_args():
"""解析命令行参数"""
parser = argparse.ArgumentParser(description="Start Uni-Lab Edge server.")
subparsers = parser.add_subparsers(title="Valid subcommands", dest="command")
parser.add_argument("-g", "--graph", help="Physical setup graph file path.")
parser.add_argument("-c", "--controllers", default=None, help="Controllers config file path.")
parser.add_argument(
@@ -153,6 +160,50 @@ def parse_args():
default=False,
help="Complete registry information",
)
parser.add_argument(
"--check_mode",
action="store_true",
default=False,
help="Run in check mode for CI: validates registry imports and ensures no file changes",
)
parser.add_argument(
"--no_update_feedback",
action="store_true",
help="Disable sending update feedback to server",
)
# workflow upload subcommand
workflow_parser = subparsers.add_parser(
"workflow_upload",
aliases=["wf"],
help="Upload workflow from xdl/json/python files",
)
workflow_parser.add_argument(
"-f",
"--workflow_file",
type=str,
required=True,
help="Path to the workflow file (JSON format)",
)
workflow_parser.add_argument(
"-n",
"--workflow_name",
type=str,
default=None,
help="Workflow name, if not provided will use the name from file or filename",
)
workflow_parser.add_argument(
"--tags",
type=str,
nargs="*",
default=[],
help="Tags for the workflow (space-separated)",
)
workflow_parser.add_argument(
"--published",
action="store_true",
default=False,
help="Whether to publish the workflow (default: False)",
)
return parser
@@ -165,10 +216,12 @@ def main():
args_dict = vars(args)
# 环境检查 - 检查并自动安装必需的包 (可选)
if not args_dict.get("skip_env_check", False):
skip_env_check = args_dict.get("skip_env_check", False)
check_mode = args_dict.get("check_mode", False)
if not skip_env_check:
from unilabos.utils.environment_check import check_environment
print_status("正在进行环境依赖检查...", "info")
if not check_environment(auto_install=True):
print_status("环境检查失败,程序退出", "error")
os._exit(1)
@@ -177,7 +230,21 @@ def main():
# 加载配置文件优先加载config然后从env读取
config_path = args_dict.get("config")
if os.getcwd().endswith("unilabos_data"):
if check_mode:
args_dict["working_dir"] = os.path.abspath(os.getcwd())
# 当 skip_env_check 时,默认使用当前目录作为 working_dir
if skip_env_check and not args_dict.get("working_dir") and not config_path:
working_dir = os.path.abspath(os.getcwd())
print_status(f"跳过环境检查模式:使用当前目录作为工作目录 {working_dir}", "info")
# 检查当前目录是否有 local_config.py
local_config_in_cwd = os.path.join(working_dir, "local_config.py")
if os.path.exists(local_config_in_cwd):
config_path = local_config_in_cwd
print_status(f"发现本地配置文件: {config_path}", "info")
else:
print_status(f"未指定config路径可通过 --config 传入 local_config.py 文件路径", "info")
elif os.getcwd().endswith("unilabos_data"):
working_dir = os.path.abspath(os.getcwd())
else:
working_dir = os.path.abspath(os.path.join(os.getcwd(), "unilabos_data"))
@@ -196,7 +263,7 @@ def main():
working_dir = os.path.dirname(config_path)
elif os.path.exists(working_dir) and os.path.exists(os.path.join(working_dir, "local_config.py")):
config_path = os.path.join(working_dir, "local_config.py")
elif not config_path and (
elif not skip_env_check and not config_path and (
not os.path.exists(working_dir) or not os.path.exists(os.path.join(working_dir, "local_config.py"))
):
print_status(f"未指定config路径可通过 --config 传入 local_config.py 文件路径", "info")
@@ -210,9 +277,11 @@ def main():
print_status(f"已创建 local_config.py 路径: {config_path}", "info")
else:
os._exit(1)
# 加载配置文件
# 加载配置文件 (check_mode 跳过)
print_status(f"当前工作目录为 {working_dir}", "info")
load_config_from_file(config_path)
if not check_mode:
load_config_from_file(config_path)
# 根据配置重新设置日志级别
from unilabos.utils.log import configure_logger, logger
@@ -241,9 +310,12 @@ def main():
if args_dict.get("sk", ""):
BasicConfig.sk = args_dict.get("sk", "")
print_status("传入了sk参数优先采用传入参数", "info")
BasicConfig.working_dir = working_dir
workflow_upload = args_dict.get("command") in ("workflow_upload", "wf")
# 使用远程资源启动
if args_dict["use_remote_resource"]:
if not workflow_upload and args_dict["use_remote_resource"]:
print_status("使用远程资源启动", "info")
from unilabos.app.web import http_client
@@ -256,15 +328,16 @@ def main():
BasicConfig.port = args_dict["port"] if args_dict["port"] else BasicConfig.port
BasicConfig.disable_browser = args_dict["disable_browser"] or BasicConfig.disable_browser
BasicConfig.working_dir = working_dir
BasicConfig.is_host_mode = not args_dict.get("is_slave", False)
BasicConfig.slave_no_host = args_dict.get("slave_no_host", False)
BasicConfig.upload_registry = args_dict.get("upload_registry", False)
BasicConfig.no_update_feedback = args_dict.get("no_update_feedback", False)
BasicConfig.communication_protocol = "websocket"
machine_name = os.popen("hostname").read().strip()
machine_name = "".join([c if c.isalnum() or c == "_" else "_" for c in machine_name])
BasicConfig.machine_name = machine_name
BasicConfig.vis_2d_enable = args_dict["2d_vis"]
BasicConfig.check_mode = check_mode
from unilabos.resources.graphio import (
read_node_link_json,
@@ -283,10 +356,36 @@ def main():
# 显示启动横幅
print_unilab_banner(args_dict)
# 注册表
lab_registry = build_registry(
args_dict["registry_path"], args_dict.get("complete_registry", False), args_dict["upload_registry"]
)
# 注册表 - check_mode 时强制启用 complete_registry
complete_registry = args_dict.get("complete_registry", False) or check_mode
lab_registry = build_registry(args_dict["registry_path"], complete_registry, BasicConfig.upload_registry)
# Check mode: complete_registry 完成后直接退出git diff 检测由 CI workflow 执行
if check_mode:
print_status("Check mode: complete_registry 完成,退出", "info")
os._exit(0)
if BasicConfig.upload_registry:
# 设备注册到服务端 - 需要 ak 和 sk
if BasicConfig.ak and BasicConfig.sk:
print_status("开始注册设备到服务端...", "info")
try:
register_devices_and_resources(lab_registry)
print_status("设备注册完成", "info")
except Exception as e:
print_status(f"设备注册失败: {e}", "error")
else:
print_status("未提供 ak 和 sk跳过设备注册", "info")
else:
print_status("本次启动注册表不报送云端,如果您需要联网调试,请在启动命令增加--upload_registry", "warning")
# 处理 workflow_upload 子命令
if workflow_upload:
from unilabos.workflow.wf_utils import handle_workflow_upload_command
handle_workflow_upload_command(args_dict)
print_status("工作流上传完成,程序退出", "info")
os._exit(0)
if not BasicConfig.ak or not BasicConfig.sk:
print_status("后续运行必须拥有一个实验室,请前往 https://uni-lab.bohrium.com 注册实验室!", "warning")
@@ -368,20 +467,6 @@ def main():
args_dict["devices_config"] = resource_tree_set
args_dict["graph"] = graph_res.physical_setup_graph
if BasicConfig.upload_registry:
# 设备注册到服务端 - 需要 ak 和 sk
if BasicConfig.ak and BasicConfig.sk:
print_status("开始注册设备到服务端...", "info")
try:
register_devices_and_resources(lab_registry)
print_status("设备注册完成", "info")
except Exception as e:
print_status(f"设备注册失败: {e}", "error")
else:
print_status("未提供 ak 和 sk跳过设备注册", "info")
else:
print_status("本次启动注册表不报送云端,如果您需要联网调试,请在启动命令增加--upload_registry", "warning")
if args_dict["controllers"] is not None:
args_dict["controllers_config"] = yaml.safe_load(open(args_dict["controllers"], encoding="utf-8"))
else:
@@ -396,6 +481,7 @@ def main():
comm_client = get_communication_client()
if "websocket" in args_dict["app_bridges"]:
args_dict["bridges"].append(comm_client)
def _exit(signum, frame):
comm_client.stop()
sys.exit(0)
@@ -437,16 +523,13 @@ def main():
resource_visualization.start()
except OSError as e:
if "AMENT_PREFIX_PATH" in str(e):
print_status(
f"ROS 2环境未正确设置跳过3D可视化启动。错误详情: {e}",
"warning"
)
print_status(f"ROS 2环境未正确设置跳过3D可视化启动。错误详情: {e}", "warning")
print_status(
"建议解决方案:\n"
"1. 激活Conda环境: conda activate unilab\n"
"2. 或使用 --backend simple 参数\n"
"3. 或使用 --visual disable 参数禁用可视化",
"info"
"info",
)
else:
raise
@@ -454,13 +537,19 @@ def main():
time.sleep(1)
else:
start_backend(**args_dict)
start_server(
restart_requested = start_server(
open_browser=not args_dict["disable_browser"],
port=BasicConfig.port,
)
if restart_requested:
print_status("[Main] Restart requested, cleaning up...", "info")
cleanup_for_restart()
return
else:
start_backend(**args_dict)
start_server(
# 启动服务器默认支持WebSocket触发重启
restart_requested = start_server(
open_browser=not args_dict["disable_browser"],
port=BasicConfig.port,
)

176
unilabos/app/utils.py Normal file
View File

@@ -0,0 +1,176 @@
"""
UniLabOS 应用工具函数
提供清理、重启等工具函数
"""
import glob
import os
import shutil
import sys
def patch_rclpy_dll_windows():
"""在 Windows + conda 环境下为 rclpy 打 DLL 加载补丁"""
if sys.platform != "win32" or not os.environ.get("CONDA_PREFIX"):
return
try:
import rclpy
return
except ImportError as e:
if not str(e).startswith("DLL load failed"):
return
cp = os.environ["CONDA_PREFIX"]
impl = os.path.join(cp, "Lib", "site-packages", "rclpy", "impl", "implementation_singleton.py")
pyd = glob.glob(os.path.join(cp, "Lib", "site-packages", "rclpy", "_rclpy_pybind11*.pyd"))
if not os.path.exists(impl) or not pyd:
return
with open(impl, "r", encoding="utf-8") as f:
content = f.read()
lib_bin = os.path.join(cp, "Library", "bin").replace("\\", "/")
patch = f'# UniLabOS DLL Patch\nimport os,ctypes\nos.add_dll_directory("{lib_bin}") if hasattr(os,"add_dll_directory") else None\ntry: ctypes.CDLL("{pyd[0].replace(chr(92),"/")}")\nexcept: pass\n# End Patch\n'
shutil.copy2(impl, impl + ".bak")
with open(impl, "w", encoding="utf-8") as f:
f.write(patch + content)
patch_rclpy_dll_windows()
import gc
import threading
import time
from unilabos.utils.banner_print import print_status
def cleanup_for_restart() -> bool:
"""
Clean up all resources for restart without exiting the process.
This function prepares the system for re-initialization by:
1. Stopping all communication clients
2. Destroying ROS nodes
3. Resetting singletons
4. Waiting for threads to finish
Returns:
bool: True if cleanup was successful, False otherwise
"""
print_status("[Restart] Starting cleanup for restart...", "info")
# Step 1: Stop WebSocket communication client
print_status("[Restart] Step 1: Stopping WebSocket client...", "info")
try:
from unilabos.app.communication import get_communication_client
comm_client = get_communication_client()
if comm_client is not None:
comm_client.stop()
print_status("[Restart] WebSocket client stopped", "info")
except Exception as e:
print_status(f"[Restart] Error stopping WebSocket: {e}", "warning")
# Step 2: Get HostNode and cleanup ROS
print_status("[Restart] Step 2: Cleaning up ROS nodes...", "info")
try:
from unilabos.ros.nodes.presets.host_node import HostNode
import rclpy
from rclpy.timer import Timer
host_instance = HostNode.get_instance(timeout=5)
if host_instance is not None:
print_status(f"[Restart] Found HostNode: {host_instance.device_id}", "info")
# Gracefully shutdown background threads
print_status("[Restart] Shutting down background threads...", "info")
HostNode.shutdown_background_threads(timeout=5.0)
print_status("[Restart] Background threads shutdown complete", "info")
# Stop discovery timer
if hasattr(host_instance, "_discovery_timer") and isinstance(host_instance._discovery_timer, Timer):
host_instance._discovery_timer.cancel()
print_status("[Restart] Discovery timer cancelled", "info")
# Destroy device nodes
device_count = len(host_instance.devices_instances)
print_status(f"[Restart] Destroying {device_count} device instances...", "info")
for device_id, device_node in list(host_instance.devices_instances.items()):
try:
if hasattr(device_node, "ros_node_instance") and device_node.ros_node_instance is not None:
device_node.ros_node_instance.destroy_node()
print_status(f"[Restart] Device {device_id} destroyed", "info")
except Exception as e:
print_status(f"[Restart] Error destroying device {device_id}: {e}", "warning")
# Clear devices instances
host_instance.devices_instances.clear()
host_instance.devices_names.clear()
# Destroy host node
try:
host_instance.destroy_node()
print_status("[Restart] HostNode destroyed", "info")
except Exception as e:
print_status(f"[Restart] Error destroying HostNode: {e}", "warning")
# Reset HostNode state
HostNode.reset_state()
print_status("[Restart] HostNode state reset", "info")
# Shutdown executor first (to stop executor.spin() gracefully)
if hasattr(rclpy, "__executor") and rclpy.__executor is not None:
try:
rclpy.__executor.shutdown()
rclpy.__executor = None # Clear for restart
print_status("[Restart] ROS executor shutdown complete", "info")
except Exception as e:
print_status(f"[Restart] Error shutting down executor: {e}", "warning")
# Shutdown rclpy
if rclpy.ok():
rclpy.shutdown()
print_status("[Restart] rclpy shutdown complete", "info")
except ImportError as e:
print_status(f"[Restart] ROS modules not available: {e}", "warning")
except Exception as e:
print_status(f"[Restart] Error in ROS cleanup: {e}", "warning")
return False
# Step 3: Reset communication client singleton
print_status("[Restart] Step 3: Resetting singletons...", "info")
try:
from unilabos.app import communication
if hasattr(communication, "_communication_client"):
communication._communication_client = None
print_status("[Restart] Communication client singleton reset", "info")
except Exception as e:
print_status(f"[Restart] Error resetting communication singleton: {e}", "warning")
# Step 4: Wait for threads to finish
print_status("[Restart] Step 4: Waiting for threads to finish...", "info")
time.sleep(3) # Give threads time to finish
# Check remaining threads
remaining_threads = []
for t in threading.enumerate():
if t.name != "MainThread" and t.is_alive():
remaining_threads.append(t.name)
if remaining_threads:
print_status(
f"[Restart] Warning: {len(remaining_threads)} threads still running: {remaining_threads}", "warning"
)
else:
print_status("[Restart] All threads stopped", "info")
# Step 5: Force garbage collection
print_status("[Restart] Step 5: Running garbage collection...", "info")
gc.collect()
gc.collect() # Run twice for weak references
print_status("[Restart] Garbage collection complete", "info")
print_status("[Restart] Cleanup complete. Ready for re-initialization.", "info")
return True

View File

@@ -58,14 +58,14 @@ class JobResultStore:
feedback=feedback or {},
timestamp=time.time(),
)
logger.debug(f"[JobResultStore] Stored result for job {job_id[:8]}, status={status}")
logger.trace(f"[JobResultStore] Stored result for job {job_id[:8]}, status={status}")
def get_and_remove(self, job_id: str) -> Optional[JobResult]:
"""获取并删除任务结果"""
with self._results_lock:
result = self._results.pop(job_id, None)
if result:
logger.debug(f"[JobResultStore] Retrieved and removed result for job {job_id[:8]}")
logger.trace(f"[JobResultStore] Retrieved and removed result for job {job_id[:8]}")
return result
def get_result(self, job_id: str) -> Optional[JobResult]:

View File

@@ -6,7 +6,6 @@ Web服务器模块
import webbrowser
import uvicorn
from fastapi import FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware
from starlette.responses import Response
@@ -96,7 +95,7 @@ def setup_server() -> FastAPI:
return app
def start_server(host: str = "0.0.0.0", port: int = 8002, open_browser: bool = True) -> None:
def start_server(host: str = "0.0.0.0", port: int = 8002, open_browser: bool = True) -> bool:
"""
启动服务器
@@ -104,7 +103,14 @@ def start_server(host: str = "0.0.0.0", port: int = 8002, open_browser: bool = T
host: 服务器主机
port: 服务器端口
open_browser: 是否自动打开浏览器
Returns:
bool: True if restart was requested, False otherwise
"""
import threading
import time
from uvicorn import Config, Server
# 设置服务器
setup_server()
@@ -123,7 +129,37 @@ def start_server(host: str = "0.0.0.0", port: int = 8002, open_browser: bool = T
# 启动服务器
info(f"[Web] 启动FastAPI服务器: {host}:{port}")
uvicorn.run(app, host=host, port=port, log_config=log_config)
# 使用支持重启的模式
config = Config(app=app, host=host, port=port, log_config=log_config)
server = Server(config)
# 启动服务器线程
server_thread = threading.Thread(target=server.run, daemon=True, name="uvicorn_server")
server_thread.start()
info("[Web] Server started, monitoring for restart requests...")
# 监控重启标志
import unilabos.app.main as main_module
while server_thread.is_alive():
if hasattr(main_module, "_restart_requested") and main_module._restart_requested:
info(
f"[Web] Restart requested via WebSocket, reason: {getattr(main_module, '_restart_reason', 'unknown')}"
)
main_module._restart_requested = False
# 停止服务器
server.should_exit = True
server_thread.join(timeout=5)
info("[Web] Server stopped, ready for restart")
return True
time.sleep(1)
return False
# 当脚本直接运行时启动服务器

View File

@@ -23,7 +23,7 @@ from typing import Optional, Dict, Any, List
from urllib.parse import urlparse
from enum import Enum
from jedi.inference.gradual.typing import TypedDict
from typing_extensions import TypedDict
from unilabos.app.model import JobAddReq
from unilabos.ros.nodes.presets.host_node import HostNode
@@ -154,7 +154,7 @@ class DeviceActionManager:
job_info.set_ready_timeout(10) # 设置10秒超时
self.active_jobs[device_key] = job_info
job_log = format_job_log(job_info.job_id, job_info.task_id, job_info.device_id, job_info.action_name)
logger.info(f"[DeviceActionManager] Job {job_log} can start immediately for {device_key}")
logger.trace(f"[DeviceActionManager] Job {job_log} can start immediately for {device_key}")
return True
def start_job(self, job_id: str) -> bool:
@@ -210,8 +210,9 @@ class DeviceActionManager:
job_info.update_timestamp()
# 从all_jobs中移除已结束的job
del self.all_jobs[job_id]
job_log = format_job_log(job_info.job_id, job_info.task_id, job_info.device_id, job_info.action_name)
logger.info(f"[DeviceActionManager] Job {job_log} ended for {device_key}")
# job_log = format_job_log(job_info.job_id, job_info.task_id, job_info.device_id, job_info.action_name)
# logger.debug(f"[DeviceActionManager] Job {job_log} ended for {device_key}")
pass
else:
job_log = format_job_log(job_info.job_id, job_info.task_id, job_info.device_id, job_info.action_name)
logger.warning(f"[DeviceActionManager] Job {job_log} was not active for {device_key}")
@@ -227,7 +228,7 @@ class DeviceActionManager:
next_job_log = format_job_log(
next_job.job_id, next_job.task_id, next_job.device_id, next_job.action_name
)
logger.info(f"[DeviceActionManager] Next job {next_job_log} can start for {device_key}")
logger.trace(f"[DeviceActionManager] Next job {next_job_log} can start for {device_key}")
return next_job
return None
@@ -268,7 +269,7 @@ class DeviceActionManager:
# 从all_jobs中移除
del self.all_jobs[job_id]
job_log = format_job_log(job_info.job_id, job_info.task_id, job_info.device_id, job_info.action_name)
logger.info(f"[DeviceActionManager] Active job {job_log} cancelled for {device_key}")
logger.trace(f"[DeviceActionManager] Active job {job_log} cancelled for {device_key}")
# 启动下一个任务
if device_key in self.device_queues and self.device_queues[device_key]:
@@ -281,7 +282,7 @@ class DeviceActionManager:
next_job_log = format_job_log(
next_job.job_id, next_job.task_id, next_job.device_id, next_job.action_name
)
logger.info(f"[DeviceActionManager] Next job {next_job_log} can start after cancel")
logger.trace(f"[DeviceActionManager] Next job {next_job_log} can start after cancel")
return True
# 如果是排队中的任务
@@ -295,7 +296,7 @@ class DeviceActionManager:
job_log = format_job_log(
job_info.job_id, job_info.task_id, job_info.device_id, job_info.action_name
)
logger.info(f"[DeviceActionManager] Queued job {job_log} cancelled for {device_key}")
logger.trace(f"[DeviceActionManager] Queued job {job_log} cancelled for {device_key}")
return True
job_log = format_job_log(job_info.job_id, job_info.task_id, job_info.device_id, job_info.action_name)
@@ -359,7 +360,7 @@ class MessageProcessor:
self.device_manager = device_manager
self.queue_processor = None # 延迟设置
self.websocket_client = None # 延迟设置
self.session_id = ""
self.session_id = str(uuid.uuid4())[:6] # 产生一个随机的session_id
# WebSocket连接
self.websocket = None
@@ -488,7 +489,20 @@ class MessageProcessor:
async for message in self.websocket:
try:
data = json.loads(message)
await self._process_message(data)
message_type = data.get("action", "")
message_data = data.get("data")
if self.session_id and self.session_id == data.get("edge_session"):
await self._process_message(message_type, message_data)
else:
if message_type.endswith("_material"):
logger.trace(
f"[MessageProcessor] 收到一条归属 {data.get('edge_session')} 的旧消息:{data}"
)
logger.debug(
f"[MessageProcessor] 跳过了一条归属 {data.get('edge_session')} 的旧消息: {data.get('action')}"
)
else:
await self._process_message(message_type, message_data)
except json.JSONDecodeError:
logger.error(f"[MessageProcessor] Invalid JSON received: {message}")
except Exception as e:
@@ -554,12 +568,9 @@ class MessageProcessor:
finally:
logger.debug("[MessageProcessor] Send handler stopped")
async def _process_message(self, data: Dict[str, Any]):
async def _process_message(self, message_type: str, message_data: Dict[str, Any]):
"""处理收到的消息"""
message_type = data.get("action", "")
message_data = data.get("data")
logger.debug(f"[MessageProcessor] Processing message: {message_type}")
logger.trace(f"[MessageProcessor] Processing message: {message_type}")
try:
if message_type == "pong":
@@ -571,16 +582,19 @@ class MessageProcessor:
elif message_type == "cancel_action" or message_type == "cancel_task":
await self._handle_cancel_action(message_data)
elif message_type == "add_material":
# noinspection PyTypeChecker
await self._handle_resource_tree_update(message_data, "add")
elif message_type == "update_material":
# noinspection PyTypeChecker
await self._handle_resource_tree_update(message_data, "update")
elif message_type == "remove_material":
# noinspection PyTypeChecker
await self._handle_resource_tree_update(message_data, "remove")
elif message_type == "session_id":
self.session_id = message_data.get("session_id")
logger.info(f"[MessageProcessor] Session ID: {self.session_id}")
elif message_type == "request_reload":
await self._handle_request_reload(message_data)
# elif message_type == "session_id":
# self.session_id = message_data.get("session_id")
# logger.info(f"[MessageProcessor] Session ID: {self.session_id}")
elif message_type == "request_restart":
await self._handle_request_restart(message_data)
else:
logger.debug(f"[MessageProcessor] Unknown message type: {message_type}")
@@ -628,13 +642,13 @@ class MessageProcessor:
await self._send_action_state_response(
device_id, action_name, task_id, job_id, "query_action_status", True, 0
)
logger.info(f"[MessageProcessor] Job {job_log} can start immediately")
logger.trace(f"[MessageProcessor] Job {job_log} can start immediately")
else:
# 需要排队
await self._send_action_state_response(
device_id, action_name, task_id, job_id, "query_action_status", False, 10
)
logger.info(f"[MessageProcessor] Job {job_log} queued")
logger.trace(f"[MessageProcessor] Job {job_log} queued")
# 通知QueueProcessor有新的队列更新
if self.queue_processor:
@@ -838,9 +852,7 @@ class MessageProcessor:
device_action_groups[key_add] = []
device_action_groups[key_add].append(item["uuid"])
logger.info(
f"[MessageProcessor] Resource migrated: {item['uuid'][:8]} from {device_old_id} to {device_id}"
)
logger.info(f"[资源同步] 跨站Transfer: {item['uuid'][:8]} from {device_old_id} to {device_id}")
else:
# 正常update
key = (device_id, "update")
@@ -854,11 +866,13 @@ class MessageProcessor:
device_action_groups[key] = []
device_action_groups[key].append(item["uuid"])
logger.info(f"触发物料更新 {action} 分组数量: {len(device_action_groups)}, 总数量: {len(resource_uuid_list)}")
logger.trace(
f"[资源同步] 动作 {action} 分组数量: {len(device_action_groups)}, 总数量: {len(resource_uuid_list)}"
)
# 为每个(device_id, action)创建独立的更新线程
for (device_id, actual_action), items in device_action_groups.items():
logger.info(f"设备 {device_id} 物料更新 {actual_action} 数量: {len(items)}")
logger.trace(f"[资源同步] {device_id} 物料动作 {actual_action} 数量: {len(items)}")
def _notify_resource_tree(dev_id, act, item_list):
try:
@@ -890,19 +904,50 @@ class MessageProcessor:
)
thread.start()
async def _handle_request_reload(self, data: Dict[str, Any]):
async def _handle_request_restart(self, data: Dict[str, Any]):
"""
处理重请求
当LabGo发送request_reload时重新发送设备注册信息
处理重请求
当LabGo发送request_restart时执行清理并触发重启
"""
reason = data.get("reason", "unknown")
logger.info(f"[MessageProcessor] Received reload request, reason: {reason}")
# 重新发送host_node_ready信息
delay = data.get("delay", 2) # 默认延迟2秒
logger.info(f"[MessageProcessor] Received restart request, reason: {reason}, delay: {delay}s")
# 发送确认消息
if self.websocket_client:
self.websocket_client.publish_host_ready()
logger.info("[MessageProcessor] Re-sent host_node_ready after reload request")
await self.websocket_client.send_message(
{"action": "restart_acknowledged", "data": {"reason": reason, "delay": delay}}
)
# 设置全局重启标志
import unilabos.app.main as main_module
main_module._restart_requested = True
main_module._restart_reason = reason
# 延迟后执行清理
await asyncio.sleep(delay)
# 在新线程中执行清理,避免阻塞当前事件循环
def do_cleanup():
import time
time.sleep(0.5) # 给当前消息处理完成的时间
logger.info(f"[MessageProcessor] Starting cleanup for restart, reason: {reason}")
try:
from unilabos.app.utils import cleanup_for_restart
if cleanup_for_restart():
logger.info("[MessageProcessor] Cleanup successful, main() will restart")
else:
logger.error("[MessageProcessor] Cleanup failed")
except Exception as e:
logger.error(f"[MessageProcessor] Error during cleanup: {e}")
cleanup_thread = threading.Thread(target=do_cleanup, name="RestartCleanupThread", daemon=True)
cleanup_thread.start()
logger.info(f"[MessageProcessor] Restart cleanup scheduled")
async def _send_action_state_response(
self, device_id: str, action_name: str, task_id: str, job_id: str, typ: str, free: bool, need_more: int
@@ -1090,7 +1135,7 @@ class QueueProcessor:
success = self.message_processor.send_message(message)
job_log = format_job_log(job_info.job_id, job_info.task_id, job_info.device_id, job_info.action_name)
if success:
logger.debug(f"[QueueProcessor] Sent busy/need_more for queued job {job_log}")
logger.trace(f"[QueueProcessor] Sent busy/need_more for queued job {job_log}")
else:
logger.warning(f"[QueueProcessor] Failed to send busy status for job {job_log}")
@@ -1113,7 +1158,7 @@ class QueueProcessor:
job_info.action_name,
)
logger.info(f"[QueueProcessor] Job {job_log} completed with status: {status}")
logger.trace(f"[QueueProcessor] Job {job_log} completed with status: {status}")
# 结束任务,获取下一个可执行的任务
next_job = self.device_manager.end_job(job_id)
@@ -1133,8 +1178,8 @@ class QueueProcessor:
},
}
self.message_processor.send_message(message)
next_job_log = format_job_log(next_job.job_id, next_job.task_id, next_job.device_id, next_job.action_name)
logger.info(f"[QueueProcessor] Notified next job {next_job_log} can start")
# next_job_log = format_job_log(next_job.job_id, next_job.task_id, next_job.device_id, next_job.action_name)
# logger.debug(f"[QueueProcessor] Notified next job {next_job_log} can start")
# 立即触发下一轮状态检查
self.notify_queue_update()
@@ -1279,7 +1324,7 @@ class WebSocketClient(BaseCommunicationClient):
except (KeyError, AttributeError):
logger.warning(f"[WebSocketClient] Failed to remove job {item.job_id} from HostNode status")
logger.info(f"[WebSocketClient] Intercepting final status for job_id: {item.job_id} - {status}")
# logger.debug(f"[WebSocketClient] Intercepting final status for job_id: {item.job_id} - {status}")
# 通知队列处理器job完成包括timeout的job
self.queue_processor.handle_job_completed(item.job_id, status)
@@ -1340,15 +1385,17 @@ class WebSocketClient(BaseCommunicationClient):
# 收集设备信息
devices = []
machine_name = BasicConfig.machine_name
try:
host_node = HostNode.get_instance(0)
if host_node:
# 获取设备信息
for device_id, namespace in host_node.devices_names.items():
device_key = f"{namespace}/{device_id}" if namespace.startswith("/") else f"/{namespace}/{device_id}"
device_key = (
f"{namespace}/{device_id}" if namespace.startswith("/") else f"/{namespace}/{device_id}"
)
is_online = device_key in host_node._online_devices
# 获取设备的动作信息
actions = {}
for action_id, client in host_node._action_clients.items():
@@ -1359,16 +1406,18 @@ class WebSocketClient(BaseCommunicationClient):
"action_path": action_id,
"action_type": str(type(client).__name__),
}
devices.append({
"device_id": device_id,
"namespace": namespace,
"device_key": device_key,
"is_online": is_online,
"machine_name": host_node.device_machine_names.get(device_id, machine_name),
"actions": actions,
})
devices.append(
{
"device_id": device_id,
"namespace": namespace,
"device_key": device_key,
"is_online": is_online,
"machine_name": host_node.device_machine_names.get(device_id, machine_name),
"actions": actions,
}
)
logger.info(f"[WebSocketClient] Collected {len(devices)} devices for host_ready")
except Exception as e:
logger.warning(f"[WebSocketClient] Error collecting device info: {e}")