
UEC Int’l Mini-Conference No.54

                    Proximal Policy Optimization for Efficient D2D-assisted Computation
                    Offloading and Resource Allocation in Multi-Access Edge Computing


                                                 Chen Zhang, Celimuge Wu
                                         Department of Computer and Network Engineering
                                            The University of Electro-Communications
                                                       Tokyo, Japan


             Keywords: Multi-access edge computing (MEC); 5G networks; Device-to-Device (D2D); Proximal Policy Optimization
             (PPO); Markov Decision Process (MDP); computation offloading; collaborative offloading; resource allocation.


                                                        Abstract

                      In advanced 5G and beyond networks, Multi-access Edge Computing (MEC) is increasingly
                    recognized as a promising technology, offering the dual advantage of reducing energy consumption
                    in cloud data centers while meeting the reliability and real-time responsiveness requirements of
                    end devices. However, the inherent complexity and variability of MEC networks pose significant
                    challenges for computation offloading decisions. To address this, we propose a Proximal Policy
                    Optimization (PPO)-based Device-to-Device (D2D)-assisted computation offloading and resource
                    allocation scheme. We construct a realistic MEC network environment and formulate a Markov
                    Decision Process (MDP) model whose objective is to minimize latency and energy consumption. The
                    integration of a D2D communication-based offloading framework enables collaborative task
                    offloading between end devices and MEC servers, enhancing both resource utilization and
                    computational efficiency. The MDP is solved with the PPO deep reinforcement learning algorithm to
                    derive an optimal offloading and resource allocation policy. Comparative analysis against three
                    baseline approaches shows that our scheme achieves superior performance in terms of latency,
                    energy consumption, and convergence, demonstrating its potential to improve MEC network
                    operation in emerging 5G systems.
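
                    To make the MDP formulation described above concrete, the following is a minimal, self-contained
                    Python sketch of a single-device offloading environment with local, D2D, and MEC actions and a
                    reward defined as the negative weighted sum of latency and energy. It is not the authors'
                    implementation: the reward weights, channel rates, CPU frequencies, cycles-per-bit factor, and the
                    simplified energy models are all illustrative assumptions.

# Minimal sketch (not the authors' implementation) of an MDP environment for
# D2D-assisted computation offloading. All parameters below are illustrative.
import random

# Offloading actions available to an end device.
LOCAL, D2D, MEC = 0, 1, 2

class OffloadingMDP:
    """Toy single-device offloading environment.

    State : (task_size_bits, local_cpu_hz, d2d_rate_bps, mec_rate_bps)
    Action: execute locally, offload to a D2D peer, or offload to the MEC server.
    Reward: negative weighted sum of task latency and device energy consumption,
            matching the minimization objective described in the abstract.
    """

    def __init__(self, w_latency=0.5, w_energy=0.5, seed=0):
        self.w_latency = w_latency          # assumed trade-off weights
        self.w_energy = w_energy
        self.rng = random.Random(seed)

    def reset(self):
        # Randomly generated task and channel conditions (assumed ranges).
        self.state = (
            self.rng.uniform(0.5e6, 5e6),    # task size in bits
            self.rng.uniform(0.5e9, 1.5e9),  # local CPU frequency (Hz)
            self.rng.uniform(1e6, 20e6),     # D2D link rate (bit/s)
            self.rng.uniform(5e6, 50e6),     # device-to-MEC uplink rate (bit/s)
        )
        return self.state

    def step(self, action):
        bits, f_local, r_d2d, r_mec = self.state
        cycles = bits * 1000                 # assumed 1000 CPU cycles per bit

        if action == LOCAL:
            latency = cycles / f_local
            energy = 1e-27 * f_local ** 2 * cycles  # kappa * f^2 per-cycle model
        elif action == D2D:
            latency = bits / r_d2d + cycles / 2e9   # peer CPU assumed 2 GHz
            energy = 0.1 * (bits / r_d2d)           # 0.1 W transmit power assumed
        else:  # MEC
            latency = bits / r_mec + cycles / 10e9  # MEC CPU assumed 10 GHz
            energy = 0.2 * (bits / r_mec)           # 0.2 W transmit power assumed

        reward = -(self.w_latency * latency + self.w_energy * energy)
        next_state = self.reset()            # tasks arrive i.i.d.; no terminal state
        return next_state, reward, False, {}

if __name__ == "__main__":
    env = OffloadingMDP()
    env.reset()
    _, r, _, _ = env.step(MEC)
    print("reward for offloading to MEC:", r)

                    In the full scheme, a PPO agent (for example, from a standard deep reinforcement learning
                    library) would be trained on such an environment, extended to multiple devices and to the
                    resource allocation dimension; the state and action definitions here are deliberately simplified.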






































                    *The author is supported by a (fee-exempted) MEXT Scholarship.