Reduction of Number of Empty-Truck Trips in Inter-Terminal Transportation Using Multi-Agent Q-Learning
Abstract. In a port consisting of multiple container terminals, the demand for transporting containers and cargo among port facilities is high. Most transshipment containers bound for a vessel are moved from one terminal to another within a short period, a process known as inter-terminal transportation (ITT). Adequate ITT planning is required to reduce ITT-related costs, and minimizing the number of empty-truck trips has gained particular attention because such trips drive a large share of those costs. A single-agent Q-learning technique developed in a previous study to minimize the number of empty-truck trips required a high computational time when learning from a considerable amount of order data. This paper proposes multi-agent Q-learning to improve on the performance of the previous single-agent model. Our results show that multi-agent Q-learning outperforms the single-agent alternative in terms of computation time and, consequently, the quality of its results.
Keywords: Inter-Terminal Transportation, Empty-Truck Trips, Multi-agent Q-Learning
1. Introduction
The immense growth of global trade has greatly increased the number of containerized shipments, obliging most major ports to develop more terminals to satisfy the swelling container transport demand. Adding more terminals, however, increases the demand for transporting containers between different terminals in a port, a process known as inter-terminal transportation (ITT). A shipping liner typically has an exclusive contract with a trucking company to transport a container from one terminal to another. This transportation process can produce empty-truck trips if the trucking company does not plan adequately. According to [1], [2], one of the objectives of efficient ITT is the minimization of empty-truck trips. Appropriate planning is crucial to achieving such objectives in support of a port's competitiveness. [3] states that empty-truck trips at container terminals remain a critical problem, since trucks are still the primary mode of freight transportation in most terminals and the trucking company has to pay the costs incurred whether the truck's cargo space is filled or empty.
Busan New Port is the new port of Busan, developed by the South Korean government to alleviate cargo congestion. It operates five container terminals with 23 berths (of 45 planned). According to data collected in 2013, approximately 2,600 containers are moved between the terminals each day [4]. The ITT evaluation project conducted by the Busan Port Authority (BPA) in 2014 [5] indicated how critical efficient ITT operations are to the competitiveness of many large seaports. In our previous study [6], we attempted to reduce empty-truck trips using single-agent Q-learning. Q-learning is a Reinforcement Learning (RL) method that can be applied to a broad spectrum of complex real-world problems, such as robotics and manufacturing. Despite its attractive properties, Q-learning requires a long training time to learn an acceptable policy. In our previous study, the performance of single-agent Q-learning was acceptable when learning from a small amount of data, but it dropped as the data grew significantly in size.
In order to solve, or at least mitigate, this problem, [8] and [9] propose multi-agent models to speed up the learning process and obtain better results. In the present study, therefore, multi-agent Q-learning was employed to overcome the performance limitation of the previous single-agent Q-learning.
This paper is organized as follows. Section 2 discusses the related work, Section 3 provides the problem description, Section 4 outlines the proposed multi-agent Q-learning, and Section 5 presents the experimental results. Finally, Section 6 draws conclusions and looks forward to future work.
2. Related Work
In the literature, several studies on reducing the number of empty-truck trips can be found. [10] conducted a comprehensive review of the empty-truck problem at container terminals; the author identified its causes, benefits, and constraints, and proposed two collaborative approaches, namely a collaborative logistics network (CLN) and shared transportation, to overcome the problem. [11] proposed a collaborative Truck Appointment System (TAS) to reduce empty-truck trips and evaluated the approach with a simulation model based on a case study with real-world data from San Antonio, Chile. The results showed that a collaborative TAS can be a useful tool for reducing the number of empty-truck trips. [12] proposed a mathematical approach that combines multiple trips, such as import, export, and inland trips, in a specific port environment; that study considered a scheme for reducing the total number of empty-truck trips whereby two 20 ft containers are carried simultaneously on the same truck as a single load unit. [13] used a modified Q-learning algorithm to find improved travel plans for an on-demand bus system. Most of the studies mentioned above tackled problems with characteristics similar to those of ITT-related problems, such as the transportation mode used, the transportation demand, the transportation capacity, the order time window, and the transportation trip plans. The use of RL approaches to solve ITT-related problems, meanwhile, remains limited. As emphasized in [9], improving the performance of single-agent RL remains an interesting topic of discussion as well as an opportunity for further research.
3. Problem Description
ITT refers to container transportation between separate port facilities such as container terminals, empty-container depots, repair stations, logistics facilities, dedicated transport terminals, and administrative facilities. The essential information that must be provided for ITT operations includes the ITT demand, origin, destination, delivery time windows, available modes, and available connections [14]. In our case, all of this essential information is known in advance. A solution to the ITT problem has to satisfy the following feasibility restrictions: each transport task has to be performed, and by only one truck, and each transport task has to be performed within its given time window. At the beginning, the starting location of the truck is terminal 1; after the truck has served a task, its location becomes the destination of the task it has just performed. The objective in our case is to produce a trip plan that entails the minimum number of empty-truck trips for a given task list. RL is used to determine which task should be served next under given conditions so as to minimize the number of empty-truck trips.
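To make the objective concrete, the short sketch below counts the empty-truck trips implied by a candidate trip plan. It is a minimal illustration, assuming each task is a record with an origin, a destination, and a time window, and that the truck starts at terminal 1 as stated above; the field and function names are illustrative, not taken from the original data set.

```python
from dataclasses import dataclass

@dataclass
class Task:
    task_id: int
    origin: int        # terminal where the container is picked up
    destination: int   # terminal where the container is delivered
    tw_start: int      # start of the delivery time window
    tw_end: int        # end of the delivery time window

def count_empty_trips(plan, start_terminal=1):
    """Count the trips a truck drives without a container for a given trip plan.

    An empty-truck trip occurs whenever the truck's current location differs
    from the origin of the next task it serves.
    """
    location = start_terminal
    empty_trips = 0
    for task in plan:
        if task.origin != location:
            empty_trips += 1           # empty repositioning trip
        location = task.destination   # the loaded trip ends at the destination
    return empty_trips

# Example: serving task 2 directly after task 1 requires one empty repositioning trip.
plan = [Task(1, origin=1, destination=3, tw_start=0, tw_end=60),
        Task(2, origin=2, destination=4, tw_start=30, tw_end=120)]
print(count_empty_trips(plan))  # -> 1
```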
In our previous study [6], single-agent Q-learning was utilized to learn from different numbers of container transportation tasks (10, 20, 40, 80, 180), each consisting of a task id, origin, destination, and the start and end of a time window. Q-learning required 59 seconds to learn from the 180 tasks, and the learning time increased drastically for more than 180 tasks. This performance issue becomes a significant problem when Q-learning has to learn from real ITT data that may contain thousands of tasks.
4. Proposed Multi-Agent Q-Learning
This section presents single-agent Q-learning along with the proposed multi-agent Q-learning. Fig. 1 provides an overview of the RL components, which consist of the state (s), action (a), and reward (r).
An agent interacts with the environment by executing an action in the current state; the agent then receives a reward from the environment and moves to the next state. The reward acts as feedback indicating how good the action chosen by the agent in a given state was.
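As one possible reading of how these components map onto the ITT setting, the sketch below outlines a minimal environment in which the state is the truck's current terminal together with the set of unserved tasks, an action selects the next task to serve, and the reward penalizes an empty repositioning trip. The reward scheme and class names are illustrative assumptions, not the exact environment used in the experiments; it expects task objects with origin and destination attributes, as in the earlier sketch.

```python
class ITTEnv:
    """Minimal ITT environment: the agent chooses which remaining task to serve next."""

    def __init__(self, tasks, start_terminal=1):
        self.tasks = tasks                    # task objects with .origin and .destination
        self.start_terminal = start_terminal
        self.reset()

    def reset(self):
        self.location = self.start_terminal
        self.remaining = list(range(len(self.tasks)))
        return (self.location, tuple(self.remaining))        # state

    def step(self, action):
        """Serve the task at index `action`; reward is -1 for an empty trip, 0 otherwise."""
        task = self.tasks[action]
        reward = -1 if task.origin != self.location else 0   # empty repositioning penalty
        self.location = task.destination
        self.remaining.remove(action)
        done = not self.remaining
        return (self.location, tuple(self.remaining)), reward, done
```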
Q-learning, the most popular of the RL algorithms, is categorized as a model-free algorithm, since it does not require any knowledge of the agent's environment. In single-agent Q-learning, the environment is mapped onto a finite number of states, and in any state an agent can choose an action according to a given policy. The agent learns the optimal value for each state-action pair based on the principle of Dynamic Programming (DP), realized through the Bellman Equation (BE) [15], and attempts to determine the optimal policy that maximizes the sum of discounted expected rewards [16]. The single-agent Q-value is updated using the following equation (1) [17]:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right], (1)

where \alpha denotes the learning rate and \gamma the discount factor.
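In code, this tabular update might look like the following sketch, with the Q-table held in a plain dictionary keyed by state-action pairs; the function and variable names are illustrative.

```python
from collections import defaultdict

def q_update(Q, state, action, reward, next_state, next_actions, alpha=0.01, gamma=0.9):
    """One tabular update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max((Q.get((next_state, a), 0.0) for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])

Q = defaultdict(float)  # Q-table with all values initialized to zero
```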
Fig. 2 shows the learning architecture of multi-agent Q-learning. The design of the proposed multi-agent Q-learning was adapted from [9], which implements a partitioning technique known as domain decomposition. This technique divides the whole data set to be learned by RL into several groups, and each group of data is assigned to an identical agent. In other words, the learning process of each agent, operating over a different portion of the Q-table, focuses on one subgroup of states. Each agent maintains its own Q-table while learning; at the same time, at the end of every learning episode, the agent must store its Q-values in the global Q-table according to a specific update rule. The structures of the global and local Q-tables are the same: the local Q-table is used by an agent to store the Q-values it learns from its segment of the data, while the global Q-table is used to save the optimal learning results from all of the agents.
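A minimal sketch of this partitioning is given below, assuming a simple contiguous split of the task list into equally sized chunks; the actual grouping criterion used in the paper may differ, and the helper names are illustrative.

```python
from collections import defaultdict

def decompose(tasks, n_agents):
    """Split the task list into contiguous chunks, one chunk per agent."""
    chunk = (len(tasks) + n_agents - 1) // n_agents
    return [tasks[i * chunk:(i + 1) * chunk] for i in range(n_agents)]

def make_agents(tasks, n_agents):
    """Give each agent its own data subgroup and its own local Q-table."""
    return [{"tasks": group, "Q": defaultdict(float)} for group in decompose(tasks, n_agents)]
```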
With the architecture shown in Fig. 2, synchronization between the global and local Q-tables becomes a challenging issue. A specific rule must be applied to prevent race conditions and unexpected overwriting of values in the global Q-table. The following rule is applied when updating a Q-value in the global Q-table:
Q_{global}(s, a) \leftarrow \max\left( Q_{global}(s, a),\, Q_{local}(s, a) \right), (2)

so that a local value replaces the corresponding global value only when it improves on it.
The agent updates the global Q-table after finishing one episode of learning; notably, this process runs only one way, from the local Q-table to the global Q-table.
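The sketch below illustrates this one-way local-to-global merge, assuming the rule keeps the larger of the local and global values as in equation (2) above; the function name is illustrative.

```python
def sync_to_global(global_Q, local_Q):
    """One-way merge after an episode: a local value is copied only if it improves the global one."""
    for key, value in local_Q.items():
        if value > global_Q.get(key, float("-inf")):
            global_Q[key] = value
```

If the agents run concurrently, the merge would additionally need a lock around the global table to avoid the race condition mentioned above.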
5. Experimental Results
The algorithm was implemented in Python and run on a PC equipped with an Intel® Xeon® E3-1230 v5 CPU running at 3.40 GHz with 16 GB of memory. To assess the proposed method, we considered three scenarios for the number of tasks (250, 500, and 1000 tasks, respectively) and three scenarios for the number of agents (1, 2, and 4 agents). For the Q-learning configuration, the γ value was set to 0.9 and α was set to 0.01; the algorithm ran for 100 episodes.
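For reference, the scenario and hyperparameter setup described above could be organized as in the sketch below; run_fn stands for whatever training routine is used and is a placeholder, not part of the original implementation.

```python
import time

# Hyperparameters and scenarios as described in the text.
GAMMA, ALPHA, EPISODES = 0.9, 0.01, 100
TASK_SCENARIOS = [250, 500, 1000]   # number of tasks per scenario
AGENT_SCENARIOS = [1, 2, 4]         # number of agents per scenario

def timed_run(run_fn, n_tasks, n_agents):
    """Run one scenario and return its result and wall-clock learning time in seconds."""
    start = time.perf_counter()
    result = run_fn(n_tasks=n_tasks, n_agents=n_agents,
                    gamma=GAMMA, alpha=ALPHA, episodes=EPISODES)
    return result, time.perf_counter() - start
```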
TABLE 1 compares the computational time and the number of empty-truck trips for each scenario. The single agent was used as the baseline for the comparison, in order to determine whether the multi-agent configurations obtain a better result or not. In terms of computational time, all of the scenarios showed a slight decrement, within the range of 1 to 3 seconds, when the number of agents was increased. However, increasing the number of agents did not always lead to the minimum number of empty-truck trips. In both respects, scenario 3 obtained better results than did scenarios 1 or 2.
TABLE 2. Computational time and speedup for each scenario (speedup is the reduction in computational time relative to the single-agent baseline).

Num. of agent(s) | 250 Tasks: Time / Speedup | 500 Tasks: Time / Speedup | 1000 Tasks: Time / Speedup
1 | 564.976 s / ---     | 955.672 s / ---     | 1404.323 s / ---
2 | 562.137 s / 2.839 s | 953.385 s / 2.287 s | 1401.285 s / 3.038 s
4 | 564.027 s / 0.949 s | 954.154 s / 1.518 s | 1403.836 s / 0.487 s
TABLE 2 shows the speedup in computational time for each scenario. Overall, increasing the number of agents decreased the computational time by up to 3 seconds. Using two agents obtained a better speedup across all of the task scenarios (an average decrement of 2.7213 seconds) than using four agents (an average decrement of 0.984 seconds).
Figure 3. Rewards comparison between single-agent RL, two agents, and four agents for scenario 3
Fig. 3 shows the rewards comparison between the single-agent and multi-agent configurations for scenario 3. In the final episodes, single-agent RL obtained better rewards than did multi-agent RL. Nonetheless, overall, multi-agent RL outperformed single-agent RL, since in most episodes, such as episodes 20 to 75, it gained better rewards than did single-agent RL.
6. Conclusion
In this paper, we have presented a multi-agent Q-learning implementation. The multi-agent architecture adopts the concept of domain decomposition, which divides the data into several groups and assigns each group to an identical RL agent to be learned. The results obtained show that, overall, multi-agent RL has the potential to speed up the computation when a proper number of agents is used and, thereby, to achieve a better result. The critical challenge when dealing with multiple agents is the cost of synchronization between the global Q-table and each agent's local Q-table, which can affect the overall computational time. Notwithstanding this, domain decomposition enables the multi-agent approach to handle larger problems. The applicability of the proposed approach, however, remains limited by the following factors: the number of container terminals is fixed (i.e., five), the number of tasks is static (i.e., known in advance), and the trip plan produced by this approach considers only one available truck and only one container size. The study of ITT is drawing increasing attention, simply because container port competitiveness is so critical. Designing RL methods to solve more complex and more realistic ITT problems, such as multi-objective inter-terminal truck routing, collaborative inter-terminal transportation, and dynamic trip planning, is the goal of and opportunity for future research.
Acknowledgement.
This research was a part of the project titled ‘Development of IoT Infrastructure Technology for Smart
Port’, funded by the Ministry of Oceans and Fisheries, Korea.
7. References
1. Duinkerken, M., Dekker, R., Kurstjens, S., Ottjes, J. and Dellaert, N.: Comparing transportation systems for
inter-terminal transport at the Maasvlakte container terminals, OR Spectrum. vol. 28. no. 4. pp. 469-493 (2006)
2. Tierney, K., Voß, S., and Stahlbock, R.: A mathematical model of inter-terminal transportation. European Journal of Operational Research. vol. 235, no. 2. pp. 448–460 (2014)
3. Islam, S.: Empty truck trips problem at container terminals. Business Process Management Journal. vol. 23. no.
2. pp. 248-274 (2017)
4. Jin, X., and Kim, K.: Collaborative Inter-Terminal Transportation of Containers. Industrial Engineering &
Management Systems. vol. 17. no. 3. pp. 407-416 (2018)
5. Kopfer, H., Jang, D., and Vornhusen, B.: Scenarios for Collaborative Planning of Inter-Terminal
Transportation. Lecture Notes in Computer Science. pp. 116-130 (2016)
6. Adi, T., Iskandar, Y., Bae, H.: Q-Learning-based Technique for Reduction of Number of Empty-Truck Trips in Inter-Terminal Transportation. 14th International Conference on Innovative Computing, Information and Control (ICICIC2019), August 26-29, 2019, Seoul, South Korea (2019)
7. Zhou, T., Hong, B., Shi, C. H. Z.: Cooperative behaviour acquisition based modular Q-Learning in multi-agent system. Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, Guangzhou, 18-21 August (2005)
8. Busoniu, L., Babuska, R., and De Schutter, B.: A Comprehensive Survey of Multiagent Reinforcement
Learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews). vol. 38.
no. 2. pp. 156-172 (2008)
9. Printista, A., Errecalde, M., and Montoya, C.: A parallel implementation of Q-learning based on
communication with cache. Journal of Computer Science & Technology. vol. 1, no. 6 (2002)
10. Islam, S.: Empty truck trips problem at container terminals. Business Process Management Journal. vol. 23, no.
2. pp. 248-274 (2017)
11. Schulte, F., González, R. and Voß, S.: Reducing Port-Related Truck Emissions: Coordinated Truck
Appointments to Reduce Empty Truck Trips. Lecture Notes in Computer Science. pp. 495-509 (2015)
12. Caballini, C., Rebecchi, I. and Sacone, S.: Combining Multiple Trips in a Port Environment for Empty
Movements Minimization. Transportation Research Procedia. vol. 10. pp. 694-703 (2015)
13. Mukai, N., Watanabe, T. and Feng, J.: Route Optimization Using Q-Learning for On-Demand Bus Systems.
Lecture Notes in Computer Science. pp. 567-574 (2008)
14. Heilig, L., Lalla-Ruiz, E. and Voß, S.: port-IO: an integrative mobile cloud platform for real-time inter-terminal
truck routing optimization. Flexible Services and Manufacturing Journal. vol. 29, no. 3-4, pp. 504-534, (2017)
15. Bellman, R.E.: Dynamic programming, Proc. Natl. Acad. Sci. USA 42 (10) (1957)
16. Pashenkova, E., Rish, I., Dechter, R.: Value iteration and policy iteration algorithms for Markov decision problem. In: AAAI-96 Workshop on Structural Issues in Planning and Temporal Reasoning (1996)
17. Watkins, C.J., Dayan, P.: Technical note: Q-learning. Machine Learning. vol. 8, no. 3-4 (1992)