Page 87 - 2025S

80                                                                UEC Int’l Mini-Conference No.54



                    Proximal Policy Optimization for Efficient D2D-assisted Computation
                    Offloading and Resource Allocation in Multi-Access Edge Computing

                    Chen Zhang, Celimuge Wu
                    Department of Computer and Network Engineering
                    The University of Electro-Communications
                    Tokyo, Japan

1. Introduction
This research presents a Proximal Policy Optimization (PPO)-based computation offloading and resource allocation scheme for D2D-assisted Mobile Edge Computing (MEC) networks. As illustrated in Fig. 1, a realistic MEC environment is constructed using ad-hoc communication and heterogeneous devices, enabling User Equipment (UE) to dynamically choose among local, edge, D2D, and migration computing modes. To address the complexity and variability of MEC systems, we formulate a Markov Decision Process (MDP) incorporating key features such as CPU utilization, transmission delay, task execution time, and energy consumption. The PPO algorithm is employed to optimize the offloading strategy, aiming to minimize latency and energy cost. Experimental results show that the proposed scheme outperforms baseline methods in delay, energy efficiency, and convergence speed.

Fig. 1 Multi-access edge computing scenarios.

2. Research Objectives
This work aims to improve task offloading efficiency in dynamic D2D-assisted MEC networks using reinforcement learning. We (i) construct a realistic MEC environment with physical devices, (ii) model the offloading problem as an MDP, (iii) propose a PPO-based strategy to minimize latency and energy under four offloading modes (Local, Edge, D2D, Migration), and (iv) evaluate performance against DQN, A2C, and random baselines in terms of convergence, cost, and stability.

3. Methodology
3.1 Problem Formulation
We formulate the computation offloading and resource allocation problem as an MDP to capture the dynamic nature of MEC systems with D2D collaboration.
State Space (S): CPU utilization, task execution time, number of tasks, energy consumption (transmission + computation).
Action Space (a ∈ A): {Local computing, D2D computing, Edge computing, Migration}.
Reward Function (r): r = −(α ⋅ Delay + β ⋅ Energy)
The goal is to minimize system delay and energy consumption, where α and β are weight factors.

3.2 PPO-Based Optimization
We adopt the PPO algorithm to solve the MDP due to its stability and effectiveness in large-scale, uncertain environments. The left table summarizes the key environment settings used in our real MEC deployment, while the right table lists the hyperparameters configured for PPO and baseline algorithms during training.

4. Performance Evaluation
To validate the proposed PPO-based offloading scheme, we conducted experiments in a real MEC testbed composed of Raspberry Pi UEs and laptop-based MEC servers, as shown in Fig. 2. The system was evaluated under a YOLOv5 object detection task with varying computational loads.

Fig. 2 Experimental environment.

Fig. 3 (left) Comparison of average reward values across different schemes; (right) Comparison of average costs across different schemes.

Fig. 4 (left) Comparison of time consumption across different schemes; (right) Comparison of energy consumption across different schemes.

5. Future Work
Future research will focus on integrating privacy-preserving mechanisms into D2D offloading, improving scalability for large-scale MEC networks, and exploring multi-agent reinforcement learning for decentralized decision-making in dynamic environments.
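
As a minimal sketch of the reward defined in Section 3.1 and of the clipped surrogate that PPO (Section 3.2) maximizes, the Python below may help make the delay/energy trade-off concrete. The weights alpha and beta, the per-mode delay and energy figures, the clipping range eps = 0.2, and all function and variable names are illustrative assumptions, not values or code from the experiments. The clipped objective is the standard PPO form, not something specific to this work.

```python
from enum import Enum


class Mode(Enum):
    """The four offloading modes of the action space in Section 3.1."""
    LOCAL = 0
    D2D = 1
    EDGE = 2
    MIGRATION = 3


def reward(delay: float, energy: float, alpha: float = 0.5, beta: float = 0.5) -> float:
    """r = -(alpha * Delay + beta * Energy); alpha and beta are assumed weights."""
    return -(alpha * delay + beta * energy)


def clipped_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Standard PPO per-sample objective:
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


# Hypothetical (delay in seconds, energy in joules) per mode, for illustration only.
example_costs = {
    Mode.LOCAL: (2.0, 4.0),
    Mode.D2D: (1.5, 2.0),
    Mode.EDGE: (1.0, 3.0),
    Mode.MIGRATION: (2.5, 2.5),
}

# Score each mode with the reward and pick the one with the highest value.
rewards = {mode: reward(d, e) for mode, (d, e) in example_costs.items()}
best_mode = max(rewards, key=rewards.get)
```

With these made-up numbers, D2D yields the highest reward (-1.75). The clipping keeps a single update from moving the policy too far: for a probability ratio of 1.5 and an advantage of 2.0, the objective is capped at 1.2 × 2.0 = 2.4 rather than 3.0.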