ISSN 0253-2778

CN 34-1054/N

Open Access | JUSTC | Information Science and Intelligence Technology | 02 May 2023

Learning attention-based strategies to cooperate for multi-agent path finding

Cite this: https://doi.org/10.52396/JUSTC-2022-0048
More Information
  • Author Bio:

    Jinchao Ma received the B.S. degree in Mathematics from Shandong University in 2019. He is currently pursuing the M.E. degree at the School of Computer Science and Technology, University of Science and Technology of China. His current research interests include reinforcement learning and path finding algorithms.

    Defu Lian received his Ph.D. degree in Computer Science from the University of Science and Technology of China (USTC) in 2014. He is currently a Professor in the School of Data Science, USTC. His current research interests include spatial data mining, recommender systems, and learning to hash.

  • Corresponding author: E-mail: liandefu@ustc.edu.cn
  • Received Date: 11 March 2022
  • Accepted Date: 04 July 2022
  • Available Online: 02 May 2023
  • Multi-agent path finding (MAPF) is a challenging multi-agent systems problem in which all agents are required to reach their goals concurrently and efficiently while avoiding collisions with each other and with obstacles. In MAPF, it is challenging to effectively represent each agent's observation, utilize historical information, and communicate with neighboring agents. To tackle these issues, we propose a carefully designed model that takes the local states of nearby agents as input and outputs an optimal action for each agent to execute. We build the local observation encoder with a residual attention CNN to extract local observations and use the Transformer architecture to build an interaction layer that combines the local observations of agents. To overcome the deficiency of the success rate metric, we also design a new evaluation index, the extra time rate (ETR). The experimental results show that our model is superior to most previous models in terms of success rate and ETR. In addition, we complete an ablation study of the model and demonstrate the effectiveness of each of its components.
    To solve the multi-agent path finding (MAPF) problem, this work proposes a new deep reinforcement learning model with local attention cooperation.
    • We build the local observation encoder with a residual attention CNN to extract local observations and use the Transformer architecture to build an interaction layer that combines the local observations of agents.
    • To overcome the deficiency of the success rate metric, we also design a new evaluation index, the extra time rate (ETR); an illustrative sketch of this metric follows these highlights.
    • The experimental results show that our model is superior to most previous models in terms of success rate and extra time rate.
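    The formal definition of ETR is given in the full text and is not reproduced on this page. As a rough illustration only (an assumption, not the authors' exact formula, which may differ, e.g. in how failed agents are counted), an extra-time-rate style metric compares the steps each agent actually takes with its single-agent shortest path:

```python
def extra_time_rate(actual_steps, optimal_steps):
    """Hedged sketch of an "extra time rate" style metric.

    actual_steps:  steps each agent actually used to reach its goal
    optimal_steps: single-agent shortest-path lengths (ignoring other agents)

    Returns the mean relative overhead; the paper's exact definition may differ.
    """
    ratios = [
        (a - o) / o
        for a, o in zip(actual_steps, optimal_steps)
        if o > 0
    ]
    return sum(ratios) / len(ratios) if ratios else 0.0


# Example: agents needing 12 and 20 steps where 10 and 20 would be optimal -> 0.1
print(extra_time_rate([12, 20], [10, 20]))
```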

  • References:
    [1]
    Stern R, Sturtevant N, Felner A, et al. Multi-agent pathfinding: Definitions, variants, and benchmarks. arXiv: 1906.08291, 2019.
    [2]
    Hönig W, Kiesel S, Tinka A, et al. Persistent and robust execution of MAPF schedules in warehouses. IEEE Robotics and Automation Letters, 2019, 4 (2): 1125–1131. doi: 10.1109/LRA.2019.2894217
    [3]
    Wurman P R, D’Andrea R, Mountz M. Coordinating hundreds of cooperative, autonomous vehicles in warehouses. AI Magazine, 2008, 29: 9–19. doi: 10.1609/aimag.v29i1.2082
    [4]
    Balakrishnan H, Jung Y. A framework for coordinated surface operations planning at Dallas-Fort Worth international airport. In: AIAA Guidance, Navigation and Control Conference and Exhibit. Hilton Head, USA: AIAA, 2007: 6553.
    [5]
    Baxter J L, Burke E, Garibaldi J, et al. Multi-robot search and rescue: A potential field based approach. Studies in Computational Intelligence, 2007, 76: 9–16. doi: 10.1007/978-3-540-73424-6_2
    [6]
    Zhang Y, Qian Y, Yao Y, et al. Learning to cooperate: Application of deep reinforcement learning for online AGV path finding. In: AAMAS '20: Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems. Auckland, New Zealand: ACM, 2020: 2077–2079.
    [7]
    Corrales P A, Gregori F A. Swarm AGV optimization using deep reinforcement learning. In: MLMI '20: Proceedings of the 2020 3rd International Conference on Machine Learning and Machine Intelligence. New York: ACM, 2020: 65–69.
    [8]
    Yu J, LaValle S. Structure and intractability of optimal multi-robot path planning on graphs. In: AAAI'13: Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence. New York: ACM, 2013: 1443–1449.
    [9]
    Panait L, Luke S. Cooperative multi-agent learning: The state of the art. Autonomous Agents and Multi-Agent Systems, 2005, 11: 387–434. doi: 10.1007/s10458-005-2631-2
    [10]
    Silver D. Cooperative pathfinding. In: First Artificial Intelligence and Interactive Digital Entertainment Conference. Washington, DC, USA: AAAI, 2005: 117–122.
    [11]
    Sharon G, Stern R, Felner A, et al. Conflict-based search for optimal multi-agent pathfinding. Artificial Intelligence, 2015, 219: 40–66. doi: 10.1016/j.artint.2014.11.006
    [12]
    Sharon G, Stern R, Felner A, et al. Conflict-based search for optimal multi-agent path finding. In: Proceedings of the International Symposium on Combinatorial Search. Palo Alto, USA: AAAI, 2012: 190.
    [13]
    Barer M, Sharon G, Stern R, et al. Suboptimal variants of the conflict-based search algorithm for the multi-agent pathfinding problem. In: Proceedings of the Seventh Annual Symposium on Combinatorial Search. Prague, Czech Republic: AAAI, 2014, 5: 19–27.
    [14]
    Boyarski E, Felner A, Stern R, et al. ICBS: Improved conflict-based search algorithm for multi-agent pathfinding. In: IJCAI'15: Proceedings of the 24th International Conference on Artificial Intelligence. New York: ACM, 2015: 740–746.
    [15]
    Sartoretti G, Kerr J, Shi Y, et al. PRIMAL: Pathfinding via reinforcement and imitation multi-agent learning. IEEE Robotics and Automation Letters, 2019, 4: 2378–2385. doi: 10.1109/LRA.2019.2903261
    [16]
    Damani M, Luo Z, Wenzel E, et al. PRIMAL2: Pathfinding via reinforcement and imitation multi-agent learning-lifelong. IEEE Robotics and Automation Letters, 2021, 6: 2666–2673. doi: 10.1109/LRA.2021.3062803
    [17]
    Everett M, Chen Y F, How J P. Motion planning among dynamic, decision-making agents with deep reinforcement learning. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Madrid, Spain: IEEE, 2018: 3052–3059.
    [18]
    Li Q, Gama F, Ribeiro A, et al. Graph neural networks for decentralized multi-robot path planning. arXiv: 1912.06095, 2019.
    [19]
    Li Q, Lin W, Liu Z, et al. Message-aware graph attention networks for large-scale multi-robot path planning. IEEE Robotics and Automation Letters, 2021, 6 (3): 5533–5540. doi: 10.1109/LRA.2021.3077863
    [20]
    Liu Z, Chen B, Zhou H, et al. MAPPER: Multi-agent path planning with evolutionary reinforcement learning in mixed dynamic environments. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Las Vegas, USA: IEEE, 2020: 11748–11754.
    [21]
    Jansen R, Sturtevant N. A new approach to cooperative pathfinding. In: AAMAS '08: Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems. New York: ACM, 2008: 1401–1404.
    [22]
    Ryan M R K. Exploiting subgraph structure in multi-robot path planning. Journal of Artificial Intelligence Research, 2008, 31: 497–542. doi: 10.1613/jair.2408
    [23]
    Wang K H C, Botea A. Fast and memory-efficient multi-agent pathfinding. In: ICAPS'08: Proceedings of the Eighteenth International Conference on International Conference on Automated Planning and Scheduling. New York: ACM, 2008: 380–387.
    [24]
    Aljalaud F, Sturtevant N R. Finding bounded suboptimal multi-agent path planning solutions using increasing cost tree search. In: Proceedings of the Sixth International Symposium on Combinatorial Search. Washington, DC, USA: AAAI, 2013: 203.
    [25]
    Sharon G, Stern R, Goldenberg M, et al. The increasing cost tree search for optimal multi-agent pathfinding. Artificial Intelligence, 2013, 195: 470–495. doi: 10.1016/j.artint.2012.11.006
    [26]
    Walker T T, Sturtevant N R, Felner A. Extended increasing cost tree search for non-unit cost domains. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence. California: IJCAI, 2018: 534–540.
    [27]
    Surynek P. On propositional encodings of cooperative path-finding. In: 2012 IEEE 24th International Conference on Tools with Artificial Intelligence. Athens, Greece: IEEE, 2013: 524–531.
    [28]
    Surynek P. Compact representations of cooperative path-finding as SAT based on matchings in bipartite graphs. In: 2014 IEEE 26th International Conference on Tools with Artificial Intelligence. Limassol, Cyprus: IEEE, 2014: 875–882.
    [29]
    Surynek P. Reduced time-expansion graphs and goal decomposition for solving cooperative path finding sub-optimally. In: IJCAI'15: Proceedings of the 24th International Conference on Artificial Intelligence. New York: ACM, 2015: 1916–1922.
    [30]
    Surynek P, Felner A, Stern R, et al. Efficient SAT approach to multi-agent path finding under the sum of costs objective. In: ECAI'16: Proceedings of the Twenty-Second European Conference on Artificial Intelligence. New York: ACM, 2016: 810–818.
    [31]
    Yu J, LaValle S M. Multi-agent path planning and network flow. In: Frazzoli E, Lozano-Perez T, Roy N, et al., editors. Algorithmic Foundations of Robotics X. Berlin: Springer, 2013: 157–173.
    [32]
    Yu J, LaValle S M. Optimal multirobot path planning on graphs: Complete algorithms and effective heuristics. IEEE Transactions on Robotics, 2016, 32 (5): 1163–1177. doi: 10.1109/TRO.2016.2593448
    [33]
    Lam E, Le Bodic P, Harabor D, et al. Branch-and-cut-and-price for multi-agent pathfinding. Computers & Operations Research, 2022, 144: 105809. doi: 10.1016/j.cor.2022.105809
    [34]
    Erdem E, Kisa D, Oztok U, et al. A general formal framework for pathfinding problems with multiple agents. Proceedings of the AAAI Conference on Artificial Intelligence, 2013, 27 (1): 290–296. doi: 10.1609/aaai.v27i1.8592
    [35]
    Chen Y F, Liu M, Everett M, et al. Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning. In: 2017 IEEE International Conference on Robotics and Automation (ICRA). Singapore: IEEE, 2017: 285–292.
    [36]
    Wang B, Liu Z, Li Q, et al. Mobile robot path planning in dynamic environments through globally guided reinforcement learning. IEEE Robotics and Automation Letters, 2020, 5 (4): 6932–6939. doi: 10.1109/LRA.2020.3026638
    [37]
    Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv: 2010.11929, 2020.
    [38]
    Haarnoja T, Zhou A, Abbeel P, et al. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv: 1801.01290, 2018.

Figures

    Figure  1.  The structure of each agent's state (shown here for the red agent). Agents are drawn as colored squares and their goals as solid circles of the same color. For the current (red) agent, the local observation contains four channels of information: the positions of obstacles, the positions of nearby agents, the goal positions of those nearby agents, and the position of its own goal (if it lies within the FOV). In addition to the current (red) agent's state, the states of other nearby agents (here, the yellow and blue agents, for example) are also required.
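    As a hedged illustration of Fig. 1 only (the channel ordering, the FOV size of 9, and the `world` interface below are assumptions for this sketch, not details taken from the paper), such a four-channel local observation could be assembled as follows:

```python
import numpy as np

def build_local_observation(world, agent_id, fov=9):
    """Sketch: stack a 4-channel fov x fov observation for one agent.

    `world` is assumed (hypothetically) to expose, in global grid coordinates:
      world.obstacles        - set of (x, y) blocked cells
      world.agent_positions  - dict agent_id -> (x, y)
      world.goal_positions   - dict agent_id -> (x, y)
    """
    cx, cy = world.agent_positions[agent_id]
    half = fov // 2
    obs = np.zeros((4, fov, fov), dtype=np.float32)

    # Channel 0: obstacles inside the field of view.
    for dx in range(-half, half + 1):
        for dy in range(-half, half + 1):
            if (cx + dx, cy + dy) in world.obstacles:
                obs[0, dx + half, dy + half] = 1.0

    # Channels 1 and 2: nearby agents and their goals (when inside the FOV).
    for other, (x, y) in world.agent_positions.items():
        if other == agent_id:
            continue
        if abs(x - cx) <= half and abs(y - cy) <= half:
            obs[1, x - cx + half, y - cy + half] = 1.0
            gx, gy = world.goal_positions[other]
            if abs(gx - cx) <= half and abs(gy - cy) <= half:
                obs[2, gx - cx + half, gy - cy + half] = 1.0

    # Channel 3: the agent's own goal, if it lies within the FOV.
    gx, gy = world.goal_positions[agent_id]
    if abs(gx - cx) <= half and abs(gy - cy) <= half:
        obs[3, gx - cx + half, gy - cy + half] = 1.0
    return obs
```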

    Figure  2.  Model overview. The input consists of the local observations of the target agent to be planned (the red agent) and of the other agents near it (the blue agent, yellow agent, etc.). The deep reinforcement learning framework has three components: an observation encoder, a communication block, and a decision block. Finally, the estimated optimal action from the policy network is taken as the output of the model.
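    A minimal PyTorch-style sketch of this three-stage pipeline is shown below; the placeholder layers, feature sizes, and the five-action space (four moves plus wait) are assumptions, and the encoder and communication stand-ins are expanded in the sketches accompanying Figs. 3 and 4.

```python
import torch
import torch.nn as nn

class LACRLPolicySketch(nn.Module):
    """Illustrative encoder -> communication -> decision pipeline (not the authors' code)."""

    def __init__(self, feat_dim=128, n_actions=5):
        super().__init__()
        # Stand-in for the LA-Encoder of Fig. 3.
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # Stand-in for the TC-Block of Fig. 4.
        comm_layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4,
                                                batch_first=True)
        self.comm = nn.TransformerEncoder(comm_layer, num_layers=2)
        # Decision block: action logits for the target agent.
        self.policy = nn.Linear(feat_dim, n_actions)

    def forward(self, local_obs):
        # local_obs: (n_agents, 4, fov, fov); the target agent is assumed at index 0.
        feats = self.encoder(local_obs)         # (n_agents, feat_dim)
        feats = self.comm(feats.unsqueeze(0))   # agents treated as one sequence
        return self.policy(feats[0, 0])         # (n_actions,) logits for the target agent

# Example: a target agent and two neighbours with 9x9 local observations.
logits = LACRLPolicySketch()(torch.zeros(3, 4, 9, 9))
```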

    Figure  3.  The structure of the LA-Encoder. First, the local observation $ L^i_t $ is convolved three times with 1×1 convolution kernels to obtain the K, Q, and V tensors. Then, an attention map is computed for each channel and used as a weight to fully extract the local information in that channel. Finally, the local feature extracted by the attention mechanism is combined with the local guidance vector $\boldsymbol{v}^i_t$ to obtain the intermediate representation.
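    A hedged sketch of this kind of 1×1-convolution channel attention follows; the tensor shapes, the softmax normalization of the attention map, and the way the guidance vector $\boldsymbol{v}^i_t$ is fused are assumptions rather than the authors' exact design.

```python
import torch
import torch.nn as nn

class LAEncoderSketch(nn.Module):
    """Illustrative LA-Encoder: K/Q/V via 1x1 convolutions, a per-channel spatial
    attention map used as a weight on V, then fusion with a guidance vector."""

    def __init__(self, channels=4, fov=9, guide_dim=3, out_dim=128):
        super().__init__()
        self.to_k = nn.Conv2d(channels, channels, kernel_size=1)
        self.to_q = nn.Conv2d(channels, channels, kernel_size=1)
        self.to_v = nn.Conv2d(channels, channels, kernel_size=1)
        self.proj = nn.Linear(channels * fov * fov + guide_dim, out_dim)

    def forward(self, local_obs, guide_vec):
        # local_obs: (B, C, H, W) local observation; guide_vec: (B, guide_dim) guidance vector
        k, q, v = self.to_k(local_obs), self.to_q(local_obs), self.to_v(local_obs)
        attn = torch.softmax((q * k).flatten(2), dim=-1)  # per-channel spatial attention map
        attended = attn * v.flatten(2)                    # weight V to extract local information
        feat = torch.cat([attended.flatten(1), guide_vec], dim=-1)
        return self.proj(feat)                            # intermediate representation

# Example: one 4-channel 9x9 observation plus a 3-dim guidance vector -> (1, 128) feature.
out = LAEncoderSketch()(torch.zeros(1, 4, 9, 9), torch.zeros(1, 3))
```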

    Figure  4.  The structure of the Transformer-based communication block (TC-Block). The TC-Block takes the features produced by the LA-Encoder as input and treats them as a sequence. The position of each feature tensor also needs to be embedded; we use relative coordinates as the position information. Inspired by Ref. [37], which applies the Transformer to image classification, we embed the position information from the X and Y coordinates.
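    As an illustrative sketch under the same caveat (embedding the relative X and Y coordinates with a learned linear layer and using PyTorch's standard Transformer encoder are assumptions), the communication step might look like:

```python
import torch
import torch.nn as nn

class TCBlockSketch(nn.Module):
    """Illustrative TC-Block: agent features are treated as a sequence and their
    relative coordinates are embedded as position information before attention."""

    def __init__(self, feat_dim=128, n_heads=4, n_layers=2):
        super().__init__()
        self.pos_embed = nn.Linear(2, feat_dim)  # embeds relative (x, y) coordinates
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, agent_feats, rel_coords):
        # agent_feats: (B, n_agents, feat_dim) outputs of the LA-Encoder
        # rel_coords:  (B, n_agents, 2) coordinates relative to the target agent
        x = agent_feats + self.pos_embed(rel_coords.float())
        return self.encoder(x)  # communicated features, same shape as the input

# Example: a batch of 1 with 3 agents.
out = TCBlockSketch()(torch.zeros(1, 3, 128), torch.zeros(1, 3, 2))
```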

    Figure  5.  Visual results for success rate. Each subplot lists the success rates of our LACRL and other methods under different obstacle densities; the higher the bar in the histogram, the better the performance of the corresponding model.

    Figure  6.  Visual results for extra time rate. Each subplot lists the extra time rates of our LACRL and other methods under different environment settings; in each line graph, the lower the line, the better the performance of the corresponding model.

