Tutorial

Exploring Reinforcement Learning for 6G Networks: Fundamentals, Applications to the MAC Layer, and Future Directions

Topics:

  • Foundation of Single-Agent Reinforcement Learning (RL)
    • Definitions: State, agent, environment, and rewards.
    • Comparison: RL vs supervised, unsupervised, and imitation learning.
    • Key Components: Model, policy, value function.
    • Types of RL Agents: Model-based and model-free.
    • Markov Decision Process (MDP): Formal framework for decision-making.
    • Value Functions and Optimal Policies: Deterministic and stochastic policies.
    • Algorithms: Q-learning, Deep Q-Networks (DQN), and policy gradient methods.
  • Overview of Multi-Agent Reinforcement Learning (MARL)
    • Extension of RL to Multi-Agent Settings: Cooperative, competitive, and mixed environments.
    • Challenges: Non-stationarity, scalability, and partial observability.
    • Formal Framework: Multi-agent Partially Observable MDP (MPOMDP).
    • Learning Methods: Fully centralized learning (FCL), independent learning (IL), and centralized training with decentralized execution (CTDE).
    • MARL model-free algorithms: actor-critic and policy gradient approaches.
  • Applications of MARL in the 6G MAC Layer
    • 6G and AI: AI for Networks (AI4NET) and Networks for AI (NET4AI).
    • Motivations: Optimizing wireless channel access and MAC signaling.
    • MARL frameworks for MAC Protocols:
      • System model and problem formulation.
      • Performance analysis: convergence time and communication efficiency.
      • Challenges: adapting to dynamic conditions and training overhead.
  • Advances in MARL for the 6G MAC Layer
    • Generalization of MAC Protocols:
      • Abstraction and abstracted MPOMDP (AMPOMDP).
      • Autoencoder-based methods for providing the abstracted observation space.
    • Intrinsic Rewards for faster MAC protocol learning via MARL:
      • Definition of intrinsic reward and lifetime.
      • Combining intrinsic and extrinsic rewards for efficient training.
    • Current Challenges and Future Directions: Scalability, robustness, and real-time adaptation for evolving 6G networks.
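To make the single-agent fundamentals listed above concrete, the following minimal sketch shows the tabular Q-learning update (temporal-difference, off-policy) that the tutorial covers. The toy chain MDP, its size, and all hyperparameters are assumptions chosen purely for illustration, not material from the tutorial itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP (illustrative assumption): a 5-state chain in which action 1
# moves right and action 0 moves left; reaching the last state yields
# reward 1 and ends the episode.
N_STATES, N_ACTIONS = 5, 2

def step(state, action):
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

# Tabular Q-learning with an epsilon-greedy behavior policy.
Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, epsilon = 0.1, 0.9, 0.3

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection (exploration vs exploitation)
        a = int(rng.integers(N_ACTIONS)) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        # Off-policy TD update: bootstrap from the greedy (max) action
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

# After training, the greedy policy moves right in every non-terminal state.
policy = np.argmax(Q, axis=1)
print(policy[:-1])
```

The same update rule, with the table replaced by a neural network and the targets stabilized via replay buffers and target networks, is the basis of the DQN family discussed in Part 1.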

Summary:

Sixth-generation (6G) networks are expected to provide ultra-low latency, extremely high data rates, and high-performance connectivity for a massive number of devices. In addition to new wireless technologies, artificial intelligence (AI) is expected to play a defining role in the end-to-end development of 6G networks, spanning their design, deployment, and operational phases. Several studies have highlighted the potential of AI in enhancing the physical layer; this tutorial explores the cutting-edge possibilities of reinforcement learning (RL) applied to the wireless medium access control (MAC) layer. Firstly, the RL framework for single-agent domains and the Markov decision process (MDP) are introduced, followed by an overview of the most widely adopted model-free algorithms. Secondly, we extend the single-agent setting to the multi-agent RL (MARL) domain, focusing on the related issues (such as non-stationarity, scalability, and partial observability) and learning methods, e.g., centralized training with decentralized execution (CTDE). Thirdly, we present a literature review of the application of MARL to learning emerging wireless MAC protocols, focusing on some drawbacks, such as limited adaptation to changing environments and traffic patterns, and slow re-training. Finally, we provide attendees with insights into advanced techniques to address these issues, based on the concepts of state abstraction and intrinsic reward learning. To conclude, we will shed light on several up-to-date challenges and potential research directions. Participants need basic knowledge of machine learning and deep learning.

Motivation:

Wireless communication systems are becoming increasingly complex, especially in the context of sixth-generation (6G) networks, which aim to support eXtreme ultra-reliable low-latency communication (xURLLC) with massive connectivity. To deal with this complexity, the convergence of communication networks and artificial intelligence (AI) is rapidly evolving, opening up the potential for groundbreaking advancements in communication technology. Among the plethora of AI techniques, reinforcement learning (RL) is gaining attention as a way to fully explore the potential of 6G networks, because it allows solutions to be developed without the need to create expensive, single-configuration datasets. Moreover, given the nature of communication scenarios, with several entities interacting with each other, the extension from RL to multi-agent RL (MARL) opens up new opportunities for innovative solutions in areas such as spectrum management, resource allocation, network optimization, and autonomous networking. This tutorial aims to help the audience understand the basic principles of single-agent RL and MARL frameworks, particularly within the context of communication networks. Furthermore, it endeavors to provide attendees with insights into the application of MARL techniques to wireless medium access control (MAC) protocol learning.

The motivation behind this tutorial is to showcase how RL-based approaches can offer innovative solutions to enhance the performance metrics of communication networks, including throughput, latency, fairness, and energy efficiency. Additionally, it presents a fresh look at innovative solutions that adapt RL frameworks to changing network conditions and traffic patterns.

Presenters:

Luciano Miuccio
University of Catania, Italy

Salvatore Riolo
University of Catania, Italy

Target Audience:

This topic is timely and aligns perfectly with the target audience of IEEE EUROCON. While many researchers work on machine learning for communication and networking within the context of 6G, few have expertise in RL, especially in MARL. A tutorial dedicated to providing basic knowledge on RL and MARL, covering theory to applications within the specific domain of communication networks, along with insights into advanced technologies, would make a significant contribution to the attendees. The EUROCON audience, concerned with wireless transmissions and networks, can learn how to approach RL and MARL frameworks in the context of wireless communication for 6G.

Given the nature of the tutorial, the intended audience is diverse, encompassing individuals with varying levels of expertise in wireless communications, artificial intelligence, and networking. The audience may include: 1) individuals involved in research or academic study in the fields of AI, MARL, wireless communications, and specifically the MAC layer; 2) professionals working in the telecommunications industry, particularly those focused on developing and implementing MAC layer protocols for next-generation wireless networks; 3) professionals interested in exploring cutting-edge techniques and applications in 6G technology, including how RL and MARL can optimize MAC layer performance. The expected number of attendees is between ten and twenty.

Outline of Tutorial:

The tutorial proposal is for a half-day (3 hours) format and is organized into five parts, including a short break.

  1. Single-agent RL frameworks. [45 min.] Introduction to RL. RL vs supervised learning, unsupervised learning, and imitation learning. Definition of state, agent, environment, and rewards. Components of an RL agent: model, policy, and value function. Model-based vs model-free RL agents. Markov decision process (MDP). Deterministic and stochastic policies. State-value function and action-value function. Bellman Expectation Equation. Optimal value functions and optimal policies. Temporal-Difference Learning. On-policy and off-policy learning. Model-free RL algorithms: Q-learning, deep Q-networks (DQN), and policy gradient methods.
  2. Multi-agent RL frameworks. [30 min.] Extension from single-agent RL to MARL. Different types of multi-agent environments (cooperative, competitive/adversarial, mixed). Formulation of MARL with centralized rewards. Non-stationarity, scalability, and partial observability. Multi-agent partially observable MDP (MPOMDP). Learning methods [1]: fully centralized learning (FCL), independent learning (IL), and centralized training with decentralized execution (CTDE). Adaptation of model-free single-agent RL algorithms to MARL. Actor-critic approach.
  Break [15 min.]
  3. Applications of MARL to the 6G MAC layer. [45 min.] 6G use cases & scenarios. AI for Networks (AI4NET) and Networks for AI (NET4AI). Motivations for applying RL to the MAC layer. Learning optimal MAC signaling and wireless channel access through MARL [2]: system model; problem formulation; methodology; convergence time and communication performance; limits in adaptation to changing network conditions and traffic patterns, and in training time. The emergence of wireless MAC protocols with MARL [3]: methodology, results, and limits.
  4. Advances in MARL on 6G MAC. [30 min.] Learning generalized wireless MAC communication protocols via abstraction [4], [5]: the problem of generalization outside of the training distribution; learning in abstracted space invariant across tasks; abstract formulation; abstracted MPOMDP; observation abstraction and abstracted policies; abstraction via autoencoder architecture; results. Learning Intrinsic Rewards for Faster MARL-based MAC Protocol Design in 6G Wireless Networks [6]: Intrinsic reward learning, extrinsic reward and lifetime; overall reward function; episodic overall return, lifetime extrinsic return; training algorithm overview; results. Up-to-date challenges and potential research directions.
  5. Recap and Q&A. [15 min.]
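The reward combination covered in Part 4 can be sketched in miniature. The snippet below is not the algorithm of [6]; it only illustrates the generic idea of adding a decaying intrinsic novelty bonus to the extrinsic task reward, a common device for speeding up exploration in (MA)RL. The count-based bonus, the class name, and the weight beta are all assumptions made for illustration.

```python
from collections import defaultdict
import math

class RewardShaper:
    """Combines an extrinsic task reward with an intrinsic novelty bonus.

    Illustrative sketch only: the count-based bonus 1/sqrt(n) and the
    mixing weight beta are hypothetical choices, not the tutorial's method.
    """

    def __init__(self, beta=0.5):
        self.beta = beta
        self.visit_counts = defaultdict(int)  # per-observation visit counts

    def intrinsic(self, observation):
        """Novelty bonus that decays as an observation is revisited."""
        self.visit_counts[observation] += 1
        return 1.0 / math.sqrt(self.visit_counts[observation])

    def overall(self, observation, extrinsic_reward):
        """Overall reward = extrinsic + beta * intrinsic."""
        return extrinsic_reward + self.beta * self.intrinsic(observation)

shaper = RewardShaper(beta=0.5)
print(shaper.overall("obs_a", 0.0))  # first visit: 0.0 + 0.5 * 1.0 = 0.5
print(shaper.overall("obs_a", 0.0))  # second visit: bonus decays to 0.5/sqrt(2)
```

Early in training the intrinsic term dominates and drives exploration; as observations are revisited it fades, so the agent's objective gracefully reduces to the extrinsic (task) return over its lifetime.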

Additional comments:

The tutorial speakers have previously delivered a tutorial titled "Improving Wireless Next-Generation Industrial IoT (IIoT) Networks with Reinforcement Learning" at the IEEE RTSI 2024 conference. This tutorial can be accessed at the following link: https://2024.ieee-rtsi.org/tutorial-3/. In that session, the speakers addressed the enhancement of wireless next-generation IIoT networks, which are known for their highly heterogeneous nature, including diverse types of traffic, varying numbers and types of devices, distinct service areas, different mobility patterns, and other complex factors. To tackle these challenges, the speakers introduced a 3GPP-based non-public network (NPN) architecture, empowered by the (multi-agent) reinforcement learning paradigm. This architecture enables industrial networks to dynamically learn from their environment and autonomously adjust to the specific requirements of industrial applications, such as Quality of Service (QoS), without the need for constant human supervision or large pre-labeled datasets. During the tutorial, two case studies were presented to demonstrate the practical effectiveness of this approach. These case studies offered real-world insights into how (multi-agent) reinforcement learning can optimize network performance and meet the complex demands of modern IIoT applications.

This past tutorial experience showcases the speakers’ proficiency in delivering advanced technical content related to wireless communication networks and the application of reinforcement learning, making them well-qualified to present the current tutorial proposal.

References:

[1] L. Miuccio, S. Riolo, M. Bennis, and D. Panno, “Design of a feasible wireless MAC communication protocol via multi-agent reinforcement learning,” in 2024 IEEE International Conference on Machine Learning for Communication and Networking (ICMLCN), 2024, pp. 94–100.
[2] A. Valcarce and J. Hoydis, “Toward joint learning of optimal MAC signaling and wireless channel access,” IEEE Transactions on Cognitive Communications and Networking, vol. 7, no. 4, pp. 1233–1243, Dec. 2021.
[3] M. P. Mota, A. Valcarce, J.-M. Gorce, and J. Hoydis, “The emergence of wireless MAC protocols with multi-agent reinforcement learning,” in Proceedings of the 2021 IEEE Globecom Workshops (GC Wkshps), Madrid, Spain, Dec. 2021, pp. 1–6.
[4] L. Miuccio, S. Riolo, S. Samarakoon, D. Panno, and M. Bennis, “Learning generalized wireless MAC communication protocols via abstraction,” in Proceedings of the GLOBECOM 2022 – 2022 IEEE Global Communications Conference, Rio de Janeiro, Brazil, Dec. 2022, pp. 2322–2327.
[5] L. Miuccio, S. Riolo, S. Samarakoon, M. Bennis, and D. Panno, “On learning generalized wireless MAC communication protocols via a feasible multi-agent reinforcement learning framework,” IEEE Transactions on Machine Learning in Communications and Networking, vol. 2, pp. 298–317, 2024.
[6] L. Miuccio, S. Riolo, M. Bennis, and D. Panno, “On learning intrinsic rewards for faster multi-agent reinforcement learning based MAC protocol design in 6G wireless networks,” in Proceedings of the ICC 2023 – IEEE International Conference on Communications, Rome, Italy, May 2023, pp. 466–471.