
[Week 37, 2025] Three Papers a Week

news 2026/6/17 9:49:20

Paper 1: Graph Neural Network for Decentralized Multi-Robot Goal Assignment

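Uses a graph neural network (GNN) to solve the Linear Sum Assignment Problem (LSAP) under communication constraints, i.e., assigning robots to goals one-to-one while minimizing the total cost.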
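
For reference, the standard LSAP formulation can be written as below. The symbols are notation chosen here for illustration and are not necessarily the paper's: $c_{ij}$ is the cost of assigning robot $i$ to goal $j$, and $x_{ij}$ is the binary assignment variable.

```latex
\begin{aligned}
\min_{x} \quad & \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij}\, x_{ij} \\
\text{s.t.} \quad & \sum_{j=1}^{n} x_{ij} = 1 \quad \forall i, \\
& \sum_{i=1}^{n} x_{ij} = 1 \quad \forall j, \\
& x_{ij} \in \{0, 1\}.
\end{aligned}
```

The two equality constraints encode the one-to-one requirement: every robot gets exactly one goal and every goal gets exactly one robot.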

Paper information

Title: Graph Neural Network for Decentralized Multi-Robot Goal Assignment
Authors / Affiliation: Manohari Goarin, Giuseppe Loianno / Tandon School of Engineering, New York University, Brooklyn, NY 11201 USA
Source: IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 9, NO. 5, MAY 2024
Link: https://ieeexplore.ieee.org/document/10452797

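Background and contributions

Approaches to LSAP are either centralized or decentralized.
Decentralized methods include:
- Optimization-based methods: e.g., the distributed Hungarian algorithm
- Market-based methods: e.g., auction algorithms
- Learning-based methods: e.g., GNNs
Main contribution: handles the case where the communication topology is constrained.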
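
To make the market-based family concrete, here is a minimal sketch of a sequential auction on a square cost matrix, in the spirit of Bertsekas-style auction algorithms. It runs in one process for readability, so it is not a decentralized implementation and not the method of the paper. The name `auction_assignment`, the bid increment `eps`, and the example costs are all choices made for this illustration.

```python
def auction_assignment(cost, eps=0.1):
    """Assign each robot to one goal by repeated bidding (cost minimization)."""
    n = len(cost)
    prices = [0.0] * n          # current price of each goal
    owner = [None] * n          # owner[j] = robot currently holding goal j
    assigned = [None] * n       # assigned[i] = goal currently held by robot i
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        # Net value of each goal for robot i: negative cost minus price.
        values = [-cost[i][j] - prices[j] for j in range(n)]
        best = max(range(n), key=values.__getitem__)
        others = [v for j, v in enumerate(values) if j != best]
        second = max(others) if others else values[best]
        # Raise the price by the bidding margin plus eps.
        prices[best] += values[best] - second + eps
        previous = owner[best]
        if previous is not None:
            # The outbid robot loses its goal and must bid again.
            assigned[previous] = None
            unassigned.append(previous)
        owner[best] = i
        assigned[i] = best
    return assigned


cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
assignment = auction_assignment(cost)
total_cost = sum(cost[i][assignment[i]] for i in range(len(cost)))
```

With integer costs and `eps` below `1 / n`, this kind of auction is known to end at an optimal assignment; a larger `eps` trades optimality for fewer bidding rounds.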

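Method

The network input is a heterogeneous graph whose nodes are robots r and goals g; its edges encode the communication topology constraints and the assignment costs.
The network output is the probability that a robot is assigned to a goal.
Training is supervised: the network imitates the optimal LSAP solution computed by the centralized Hungarian algorithm.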
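
The input/label construction described above might look roughly like the sketch below. Several things are assumptions here: the dictionary layout of the graph, the fact that every robot has a cost edge to every goal, and the helper names `optimal_assignment` and `build_training_sample`. The optimal label is found by brute force over permutations as a stand-in for the Hungarian algorithm, which only works for very small problems.

```python
from itertools import permutations


def optimal_assignment(cost):
    """Brute-force the minimum-cost one-to-one assignment (tiny n only)."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))
    return list(best)


def build_training_sample(cost, comm_links):
    """Return a heterogeneous graph and one-hot assignment labels."""
    n = len(cost)
    graph = {
        "robot_nodes": list(range(n)),
        "goal_nodes": list(range(n)),
        # Robot-robot edges: which robots can communicate.
        "comm_edges": sorted(comm_links),
        # Robot-goal edges: each carries the assignment cost.
        "cost_edges": {(i, j): cost[i][j] for i in range(n) for j in range(n)},
    }
    target = optimal_assignment(cost)
    # labels[i][j] == 1 means robot i should take goal j.
    labels = [[1 if target[i] == j else 0 for j in range(n)] for i in range(n)]
    return graph, labels


cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
graph, labels = build_training_sample(cost, comm_links={(0, 1), (1, 2)})
```

A network trained on such pairs would be asked to reproduce `labels` from `graph`, for example with a cross-entropy loss per robot.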

Results and evaluation

GNN-based methods offer one way to model and represent the relationship between tasks and agents.

Paper 2: Dynamic Coalition Formation and Routing for Multirobot Task Allocation via Reinforcement Learning

Uses an attention network to solve task allocation for a homogeneous robot swarm (ST-MR-TA).

Paper information

Title: Dynamic Coalition Formation and Routing for Multirobot Task Allocation via Reinforcement Learning
Authors / Affiliation: Weiheng Dai, Aditya Bidwai, Guillaume Sartoretti / Department of Mechanical Engineering, College of Design and Engineering, National University of Singapore
Source: 2024 IEEE International Conference on Robotics and Automation (ICRA)
Link: https://ieeexplore.ieee.org/document/10611244/

Background and contributions

The work falls under the ST-MR-TA category: each robot can perform only one task at a time (ST), each task can require the cooperation of multiple robots (MR), and tasks are allocated continuously over time (time-extended assignment, TA).
Agents learn to reason about their position, the status of all tasks, as well as the position and short-term intent of other agents, to make reactive movement decisions (i.e., which task to travel to and complete next).
Provides a leader-follower trick that reduces the number of decision variables actually used during training.

Method

Observation (task state, agent info, task_already_done_flag) --[Linear Projection]--> embeddings --[Multi-Head Attention Encoder]--> contents --[Decoder]--> probability distribution over tasks
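
The pipeline above can be sketched in plain Python as follows. This is a simplified, single-head illustration with random weights, not the paper's architecture: real attention layers use separate learned query/key/value projections and several heads, and the feature layout (`x, y, robots required`), the dimensions, and all function names are assumptions made here.

```python
import math
import random


def linear(x, weight):
    """Project vector x (length d_in) with a d_in x d_out weight matrix."""
    return [sum(x[i] * weight[i][k] for i in range(len(x)))
            for k in range(len(weight[0]))]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(rows):
    """Single-head scaled dot-product self-attention with a residual add."""
    scale = math.sqrt(len(rows[0]))
    out = []
    for query in rows:
        weights = softmax([dot(query, key) / scale for key in rows])
        mixed = [sum(w * row[c] for w, row in zip(weights, rows))
                 for c in range(len(query))]
        out.append([q + m for q, m in zip(query, mixed)])
    return out


def task_distribution(agent_obs, task_obs, done_flags, w_agent, w_task):
    if all(done_flags):
        raise ValueError("no open task left to choose from")
    task_emb = [linear(t, w_task) for t in task_obs]   # linear projection
    encoded = self_attention(task_emb)                 # encoder
    query = linear(agent_obs, w_agent)                 # decoder query
    scale = math.sqrt(len(query))
    scores = [dot(query, enc) / scale for enc in encoded]
    # Finished tasks are masked so they receive zero probability.
    scores = [-math.inf if done else s for s, done in zip(scores, done_flags)]
    return softmax(scores)


random.seed(0)
dim = 4
w_task = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
w_agent = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(2)]
tasks = [[0.1, 0.9, 2.0], [0.5, 0.2, 1.0], [0.8, 0.8, 3.0]]  # x, y, robots required
probs = task_distribution([0.3, 0.4], tasks, [False, True, False], w_agent, w_task)
```

Here `probs` is a distribution over the three tasks in which the already finished task (index 1) has probability zero; an agent would sample from it during training and take the argmax for a greedy rollout.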

Training: REINFORCE [28] algorithm with greedy rollout baseline
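
As a rough illustration of the training rule, the sketch below applies one REINFORCE step to the logits of a softmax policy for a single decision, using the return of the greedy choice as the baseline. The returns, learning rate, and function names are hypothetical. In the attention-model literature the greedy rollout typically comes from a frozen copy of the policy; reusing the current logits here is a simplification.

```python
import math


def softmax(logits):
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, action, sampled_return, baseline_return, lr=0.5):
    """One gradient-ascent step on advantage * log pi(action)."""
    advantage = sampled_return - baseline_return
    probs = softmax(logits)
    # d log pi(action) / d logit_k = 1[k == action] - pi(k)
    grad_log_prob = [(1.0 if k == action else 0.0) - p
                     for k, p in enumerate(probs)]
    return [l + lr * advantage * g for l, g in zip(logits, grad_log_prob)]


returns = [1.0, 3.0, 2.0]     # hypothetical return of choosing each task
logits = [0.0, 0.0, 0.0]

# Greedy rollout baseline: the return obtained by the argmax action.
greedy_action = max(range(len(logits)), key=logits.__getitem__)
baseline = returns[greedy_action]

sampled_action = 1            # stands in for an action sampled from the policy
updated = reinforce_step(logits, sampled_action,
                         returns[sampled_action], baseline)
```

Because the sampled action beat the greedy one, its logit goes up and the others go down; a sample that merely ties the greedy rollout has zero advantage and leaves the policy unchanged.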

Results and evaluation

Provides an example of building a planner with an attention network.
The agent information used is global, not locally perceived.

Paper 3: Learning Policies for Dynamic Coalition Formation in Multi-Robot Task Allocation

One-sentence summary

Paper information

Title: Learning Policies for Dynamic Coalition Formation in Multi-Robot Task Allocation
Authors / Affiliation: Lucas C. D. Bezerra, Ataíde M. G. dos Santos, and Shinkyu Park / Electrical and Computer Engineering, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia; Department of Electrical Engineering, Federal University of Sergipe (UFS), São Cristóvão, Sergipe, 49107-230, Brazil.
Source: IEEE Robotics and Automation Letters (Volume: 10, Issue: 9, September 2025)
Link: https://ieeexplore.ieee.org/document/11091462

Background and contributions

Multi-Robot Task Allocation (MRTA): Single-Task robots, Multi-Robot tasks, Time-extended Assignment (ST-MR-TA).
The problem of decentralized dynamic coalition formation under partial observability has not been previously addressed.
The focus is on developing policies for a team of robots that perform tasks requiring coalitions in dynamic environments.
The policy is an end-to-end convolutional neural network based on the U-Net architecture.

Method

In this framework, the policy selects a task at each time step, while a motion planner handles the low-level actuator control that navigates the robot to the task location. This abstraction relieves the learned policy of low-level control, allowing it to concentrate on long-term planning.
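
A toy version of this two-level loop is sketched below for a single robot on a grid. Everything in it is hypothetical: `select_task` is a nearest-task rule standing in for the learned policy, `planner_step` is a one-cell-per-step stand-in for the motion planner, and coalition requirements are ignored.

```python
def select_task(robot, tasks):
    """Stand-in for the learned policy: pick the nearest unfinished task."""
    open_tasks = [t for t in tasks if not t["done"]]
    if not open_tasks:
        return None
    return min(open_tasks,
               key=lambda t: abs(t["pos"][0] - robot[0]) + abs(t["pos"][1] - robot[1]))


def planner_step(robot, target):
    """Stand-in for the motion planner: move one grid cell toward the target."""
    x, y = robot
    tx, ty = target
    if x != tx:
        x += 1 if tx > x else -1
    elif y != ty:
        y += 1 if ty > y else -1
    return (x, y)


def run(robot, tasks, max_steps=50):
    """Alternate high-level task selection and low-level motion."""
    for step in range(max_steps):
        task = select_task(robot, tasks)      # high-level decision
        if task is None:
            return step                       # all tasks finished
        robot = planner_step(robot, task["pos"])  # low-level execution
        if robot == task["pos"]:
            task["done"] = True
    return max_steps


tasks = [{"pos": (2, 1), "done": False}, {"pos": (0, 3), "done": False}]
steps_used = run((0, 0), tasks)
```

The policy is queried at every step, so it can change its mind as the situation evolves, while it never has to reason about individual moves.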
Model the problem as a Decentralized Partially-Observable Markov Decision Process (Dec-POMDP)
Adopts MAPPO, an actor-critic algorithm for MARL that follows the centralized training with decentralized execution (CTDE) paradigm.
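
MAPPO trains each actor with the PPO clipped surrogate objective, with advantages estimated from a centralized critic. The sketch below computes only that objective for a small batch. The function name, the `clip_eps=0.2` default, and the input values are assumptions for illustration, not the paper's settings.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (to be maximized)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)     # pi_new(a|o) / pi_old(a|o)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the clipped and unclipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# The ratio is 1.5 in both cases; only the sign of the advantage differs.
gain_when_good = ppo_clip_objective([math.log(1.5)], [0.0], [1.0])
gain_when_bad = ppo_clip_objective([math.log(1.5)], [0.0], [-1.0])
```

With a positive advantage the gain is capped at `1 + clip_eps`, which limits how far one update can push the policy; with a negative advantage the unclipped, more pessimistic term is kept.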