Page 19 - 2024S
12 UEC Int’l Mini-Conference No.52
Table 1: State set for the proposed algorithm.
State Link Order
S1 q1 ≥ q2 ≥ q3 ≥ q4
S2 q1 ≥ q2 ≥ q4 ≥ q3
S3 q1 ≥ q3 ≥ q2 ≥ q4
S4 q1 ≥ q4 ≥ q2 ≥ q3
S5 q1 ≥ q3 ≥ q4 ≥ q2
S6 q1 ≥ q4 ≥ q3 ≥ q2
S7 q2 ≥ q1 ≥ q3 ≥ q4
S8 q2 ≥ q1 ≥ q4 ≥ q3
S9 q3 ≥ q1 ≥ q2 ≥ q4
S10 q4 ≥ q1 ≥ q2 ≥ q3
S11 q3 ≥ q1 ≥ q4 ≥ q2
S12 q4 ≥ q1 ≥ q3 ≥ q2
S13 q2 ≥ q3 ≥ q1 ≥ q4
S14 q2 ≥ q4 ≥ q1 ≥ q3
S15 q3 ≥ q2 ≥ q1 ≥ q4
S16 q4 ≥ q2 ≥ q1 ≥ q3
S17 q3 ≥ q4 ≥ q1 ≥ q2
S18 q4 ≥ q3 ≥ q1 ≥ q2
S19 q2 ≥ q3 ≥ q4 ≥ q1
S20 q2 ≥ q4 ≥ q3 ≥ q1
S21 q3 ≥ q2 ≥ q4 ≥ q1
S22 q4 ≥ q2 ≥ q3 ≥ q1
S23 q3 ≥ q4 ≥ q2 ≥ q1
S24 q4 ≥ q3 ≥ q2 ≥ q1
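Table 1 enumerates all 24 permutations of the four link queue lengths. As an illustrative sketch (not from the paper), the mapping from an observed queue-length vector to such a state can be written as follows; the helper name `state_of` and the use of lexicographic permutation indices are assumptions, so the indices here need not match the S1–S24 numbering in Table 1:

```python
from itertools import permutations

# All 24 descending orderings of the four link queues (cf. Table 1).
# Generated in lexicographic order, which is an implementation choice,
# not necessarily the paper's S1-S24 numbering.
STATES = list(permutations(["q1", "q2", "q3", "q4"]))

def state_of(queues):
    """Map queue lengths {link: length} to the index of their
    descending-order permutation in STATES."""
    order = tuple(sorted(queues, key=queues.get, reverse=True))
    return STATES.index(order)

# Example: q1 > q2 > q3 > q4 maps to the first ordering.
idx = state_of({"q1": 8, "q2": 5, "q3": 3, "q4": 2})
```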
2.2 Action Representation

Actions in the proposed algorithm involve adjusting signal timings based on current traffic conditions. Two primary actions can be chosen: extending the green time by 1 second or not extending it. These actions allow the algorithm to respond dynamically to real-time traffic conditions and optimize traffic flow.

The action space is represented as follows:

• Extend Green Time: Increase the green signal duration for the current phase by 1 second.

• Do Not Extend Green Time: Maintain the current green signal duration.

2.3 Reward Function

The reward function is designed to encourage actions that reduce traffic congestion and improve safety. It integrates data from loop detectors and CAVs to improve accuracy. The queue lengths detected by the two methods are combined to calculate the reward, with their weights set by the MPR.

The reward function R(s, a) is defined as follows:

R(s, a) = (1 − ϵ)q + ϵq′ (1)

where:

• q is the queue length estimated by loop detectors.

• q′ is the queue length detected by CAVs.

• ϵ is the current Market Penetration Rate (MPR) of CAVs.

This weighted approach allows the algorithm to balance the accuracy of data from different sources and make informed decisions that optimize traffic flow and safety.