4    Discussion

4.1   Q-Learning Algorithm

The Q-learning algorithm used in this study follows a reinforcement learning approach, in which the algorithm learns to optimize traffic signal timings through trial and error. The key steps of the algorithm are as follows (a minimal code sketch is given after the list):

  • Observe Current State: Analyze the current traffic conditions at the intersection.

  • Choose an Action: Select an action (e.g., change the traffic signal) based on an exploration-exploitation strategy.

  • Execute the Action: Implement the chosen action and observe the result.

  • Update Q-Value: Update the Q-value based on the observed reward. If the result is positive (reduced congestion), the Q-value is increased; if the result is negative (increased congestion), the Q-value is adjusted accordingly.

  • Repeat until Convergence: Continue this process until the Q-values stabilize, indicating that the algorithm has learned the optimal signal timings.
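
To make these steps concrete, the following is a minimal sketch of a tabular Q-learning loop for signal control. The state/action encoding, the reward definition, and the environment interface (get_state, apply_action, get_reward) are illustrative assumptions rather than the authors' implementation; the update in q_update corresponds to Eq. (2) below.

# Sketch of the tabular Q-learning loop described above (illustrative only).
# The environment object `env` is a hypothetical simulator interface, not
# part of the original study.
import random
from collections import defaultdict

ALPHA = 0.1    # learning rate (alpha)
GAMMA = 0.9    # discount factor (gamma)
EPSILON = 0.1  # exploration probability

ACTIONS = [0, 1]          # e.g., 0 = keep current phase, 1 = switch phase
Q = defaultdict(float)    # Q[(state, action)] -> estimated value

def choose_action(state):
    # Epsilon-greedy exploration-exploitation strategy.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)                     # explore
    return max(ACTIONS, key=lambda a: Q[(state, a)])      # exploit

def q_update(state, action, reward, next_state):
    # Q-learning update of Eq. (2).
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

def train(env, episodes=100, steps_per_episode=3600):
    for _ in range(episodes):
        state = env.get_state()                  # observe current state
        for _ in range(steps_per_episode):
            action = choose_action(state)        # choose an action
            env.apply_action(action)             # execute the action
            reward = env.get_reward()            # e.g., negative total queue/delay
            next_state = env.get_state()
            q_update(state, action, reward, next_state)   # update Q-value
            state = next_state                   # repeat until convergence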

The Q-learning formula used in this study is:

    Q(s, a) ← Q(s, a) + α [ r + γ max_{a′} Q(s′, a′) − Q(s, a) ]        (2)
where:

  • Q(s, a) is the current Q-value for state s and action a.

  • α is the learning rate.

  • r is the reward.

  • γ is the discount factor.

  • max_{a′} Q(s′, a′) is the maximum estimated future reward for the next state s′.
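
As a purely illustrative numerical check (the values below are chosen for exposition and are not taken from the study): with α = 0.1, γ = 0.9, a current value Q(s, a) = 2.0, an observed reward r = 1, and max_{a′} Q(s′, a′) = 3.0, Eq. (2) gives Q(s, a) ← 2.0 + 0.1 × (1 + 0.9 × 3.0 − 2.0) = 2.17; that is, the estimate moves a fraction α of the way toward the bootstrapped target r + γ max_{a′} Q(s′, a′).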
4.2   Robustness Analysis

The robustness of the proposed algorithm was tested by increasing traffic demand to simulate peak hours. The results indicated that the proposed algorithm outperformed FT and TAC even under higher traffic demands.

5    Conclusions

The proposed Q-learning-based adaptive traffic signal control algorithm significantly improves the operational and safety performance of intersections, particularly under high MPRs. The algorithm provides a robust and effective solution for managing traffic flow in urban environments, offering a foundation for future smart city traffic management systems.
  Future work will focus on further refining the algorithm and testing it in more complex urban environments. Integrating additional data sources and enhancing the learning mechanism are potential areas for improvement.

6    Acknowledgments

The authors gratefully acknowledge the funding support from the JASSO Scholarship and the assistance provided by the Advanced Wireless Communication Research Center (AWCC) at the University of Electro-Communications.