Proximal Policy Optimization for Efficient D2D-assisted Computation
Offloading and Resource Allocation in Multi-Access Edge Computing
Chen Zhang, Celimuge Wu
Department of Computer and Network Engineering
The University of Electro-Communications
Tokyo, Japan
1. Introduction
This research presents a PPO-based computation offloading and resource allocation scheme for D2D-assisted Mobile Edge Computing (MEC) networks. As illustrated in Fig. 1, a realistic MEC environment is constructed using ad-hoc communication and heterogeneous devices, enabling User Equipment (UE) to dynamically choose among local, edge, D2D, and migration computing modes. To address the complexity and variability of MEC systems, we formulate a Markov Decision Process (MDP) incorporating key features such as CPU utilization, transmission delay, task execution time, and energy consumption. The PPO algorithm is employed to optimize the offloading strategy, aiming to minimize latency and energy cost. Experimental results show that the proposed scheme outperforms baseline methods in delay, energy efficiency, and convergence speed.
Fig. 1 Multi-access edge computing scenarios.
2. Research Objectives
This work aims to improve task offloading efficiency in dynamic D2D-assisted MEC networks using reinforcement learning. We (i) construct a realistic MEC environment with physical devices, (ii) model the offloading problem as a Markov Decision Process (MDP), (iii) propose a PPO-based strategy to minimize latency and energy under four offloading modes (Local, Edge, D2D, Migration), and (iv) evaluate performance against DQN, A2C, and random baselines in terms of convergence, cost, and stability.
3. Methodology
3.1 Problem Formulation
We formulate the computation offloading and resource
allocation problem as a Markov Decision Process (MDP)
to capture the dynamic nature of MEC systems with D2D
collaboration.
State Space (s ∈ S): CPU utilization, task execution time, number of tasks, and energy consumption (transmission + computation).
Action Space (a ∈ A): {Local computing, D2D computing, Edge computing, Migration}.
Reward Function (r): r = −(α ⋅ Delay + β ⋅ Energy)
The goal is to minimize system delay and energy consumption, where α and β are weight factors.
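To make the formulation concrete, the sketch below casts the above MDP as a Gym-style environment. It is an illustrative prototype only: the normalized state bounds, the task dynamics, the delay/energy estimators, and the weights ALPHA and BETA are placeholder assumptions, not the measured models used in our testbed.

import numpy as np
import gymnasium as gym
from gymnasium import spaces

ALPHA, BETA = 0.5, 0.5  # assumed weight factors (α, β); not our tuned values

class OffloadingEnv(gym.Env):
    """Toy D2D-assisted MEC offloading MDP (illustrative only)."""
    # Actions: 0 = Local, 1 = Edge, 2 = D2D, 3 = Migration
    def __init__(self):
        # State (Sec. 3.1, normalized): CPU utilization, task execution time,
        # number of tasks, energy consumption (transmission + computation)
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(4)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.state = self.np_random.uniform(0.0, 1.0, size=4).astype(np.float32)
        return self.state, {}

    def step(self, action):
        # Placeholder cost model: each mode trades delay against energy;
        # the real testbed uses live device measurements here.
        delay = float(self.np_random.uniform(0.1, 1.0))   # stand-in for measured delay
        energy = float(self.np_random.uniform(0.1, 1.0))  # stand-in for measured energy
        reward = -(ALPHA * delay + BETA * energy)         # r = −(α ⋅ Delay + β ⋅ Energy)
        self.state = self.np_random.uniform(0.0, 1.0, size=4).astype(np.float32)
        return self.state, reward, False, False, {}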
3.2 PPO-Based Optimization
We adopt the Proximal Policy Optimization (PPO) algorithm to solve the MDP due to its stability and effectiveness in large-scale, uncertain environments. The left table summarizes the key environment settings used in our real MEC deployment, while the right table lists the hyperparameters configured for PPO and baseline algorithms during training.
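PPO's stability stems from its clipped surrogate objective, L_CLIP(θ) = E_t[ min( r_t(θ) ⋅ Â_t, clip(r_t(θ), 1−ε, 1+ε) ⋅ Â_t ) ], where r_t(θ) = π_θ(a_t|s_t)/π_θold(a_t|s_t) is the policy probability ratio and Â_t the advantage estimate. The snippet below is a minimal training sketch using the open-source Stable-Baselines3 PPO implementation; the hyperparameter values shown are illustrative defaults, not the ones from our configuration tables.

from stable_baselines3 import PPO

env = OffloadingEnv()          # the MDP sketch from Sec. 3.1
model = PPO(
    "MlpPolicy",
    env,
    learning_rate=3e-4,        # assumed; tune per deployment
    n_steps=2048,
    batch_size=64,
    gamma=0.99,                # discount factor
    clip_range=0.2,            # ε in the clipped objective
    verbose=1,
)
model.learn(total_timesteps=100_000)

# Greedy rollout with the learned offloading policy
obs, _ = env.reset()
action, _ = model.predict(obs, deterministic=True)  # 0=Local, 1=Edge, 2=D2D, 3=Migration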
4. Performance Evaluation
To validate the proposed PPO-based offloading scheme, we conducted experiments in a real MEC testbed composed of Raspberry Pi UEs and laptop-based MEC servers, as shown in Fig. 2. The system was evaluated under a YOLOv5 object detection task with varying computational loads.
Fig. 2 Experimental environment.
Fig. 3 (left) Comparison of average reward values across different schemes; (right) Comparison of average costs across different schemes.
Fig. 4 (left) Comparison of time consumption across different schemes; (right) Comparison of energy consumption across different schemes.
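As a schematic of how such comparisons can be scripted, the loop below averages the per-step cost (negative reward) of a trained policy against a random baseline on the environment sketch above; it is a stand-in for the testbed harness behind Figs. 3 and 4, not our actual measurement code.

def average_cost(env, policy, episodes=100, steps=50):
    # Mean per-step cost over several rollouts (cost = -reward).
    costs = []
    for _ in range(episodes):
        obs, _ = env.reset()
        total, n = 0.0, 0
        for _ in range(steps):
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            total += -reward
            n += 1
            if terminated or truncated:
                break
        costs.append(total / n)
    return sum(costs) / len(costs)

ppo_cost = average_cost(env, lambda o: model.predict(o, deterministic=True)[0])
rand_cost = average_cost(env, lambda o: env.action_space.sample())
print(f"PPO avg cost: {ppo_cost:.3f} vs random: {rand_cost:.3f}")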
5. Future Work
Future research will focus on integrating privacy-preserving mechanisms into D2D offloading, improving scalability for large-scale MEC networks, and exploring multi-agent reinforcement learning for decentralized decision-making in dynamic environments.