RailGoerl24: Görlitz Rail Test Center CV Dataset 2024

Rustam Tagiew1, Ilkay Wunderlich2, Mark Sastuba1, Kilian Göller3 and Steffen Seitz3

1Rustam Tagiew and Mark Sastuba are with the German Centre for Rail Traffic Research at the Federal Railway Authority (DZSF), Dresden, Germany. This is not an official statement, guideline or directive of the German Federal Railway Authority.
2Ilkay Wunderlich is with EYYES GmbH, Gedersdorf, Austria.
3Steffen Seitz and Kilian Göller are with the Chair of Fundamentals of Electrical Engineering of Dresden University of Technology, Dresden, Germany. Steffen Seitz is also with the Conrad Zuse School of Embedded Composite AI (SECAI).

©2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. DOI: 10.1109/ERAS63351.2025.11135724
Abstract

Driverless train operation for open tracks on urban guided transport and mainline railways requires, among other things, automatic detection of actual and potential obstacles, especially humans, in the danger zone of the train’s path. Machine learning algorithms have proven to be powerful state-of-the-art tools for this task. However, these algorithms require large amounts of high-quality annotated training data containing human beings in railway-specific environments. Unfortunately, the number of publicly available datasets is not yet sufficient and lags significantly behind the road domain. Therefore, this paper presents RailGoerl24, an on-board visible-light Full HD camera dataset of 12 205 frames recorded at a railway test center of TÜV SÜD Rail in Görlitz, Germany. Its main purpose is to support the development of driverless train operation for guided transport. RailGoerl24 also includes a terrestrial LiDAR scan covering parts of the area used to acquire the RGB data. In addition to the raw data, the dataset contains 33 556 bounding-box annotations in total for the object class ‘person’. The faces of recorded actors are not blurred or altered in any other way. RailGoerl24, available at data.fid-move.de/dataset/railgoerl24, can also be used for tasks beyond collision prediction.

I INTRODUCTION

Figure 1: Examples of recorded and annotated scenarios from RailGoerl24. The top left and center corners show several persons without warning jackets. The top right corner shows several persons wearing warning jackets. In the bottom left corner is a person in a kangaroo costume. The bottom center presents a person lying on the tracks and the bottom right corner shows a person with a child. Annotating rectangles are orange.

In automatic train operation (ATO), technical systems replace the functions of the train driver and other operating staff. ATO includes different grades of automation (GoA) [1]. The higher the GoA, the more functions are replaced, up to GoA4 with no staff on-board. At least in the European context, regulations and technical standards differ significantly for mainline railways and urban guided transport like subways, light rail, and trams. In the case of GoA2, this leads to the parallel development of similar systems such as the European Train Control System (ETCS) for mainline railways and Communication-Based Train Control (CBTC) for urban systems. However, for the person detection function from GoA3 onwards, it is expected that related system components will be developed by applying machine learning (ML) to shared datasets.

Monitoring the danger zone of the train’s path is one of the main tasks of the train driver, who predicts collisions and acts accordingly. This task is the most challenging one, as it impedes an upgrade to GoA3 and GoA4 (GoA3+), where no driver is required [2]. In this regard, GoA3+ operation of mainline trains differs from fully automated metros, such as the Nuremberg U-Bahn. Such metros are also known as “horizontal lifts”: closed systems with traffic running in isolated and mostly enclosed environments, such as tunnels. As a result, no person detection is needed and, consequently, no computer vision (CV) systems are required on-board. In contrast, open mainline and open urban rail systems are subject to environmental interference, e.g., at level crossings or on unfenced tracks.

The CV dataset RailGoerl24[3], presented in this paper, is proposed to facilitate the ML-based development of onboard person detection systems, as well as stationary automated security surveillance and occupational safety systems. Additionally, the few currently available related real-world datasets including objects in the railway environment are listed in Sec.II.

II Existing Datasets

To the best of the authors’ knowledge, the datasets listed in Tab.I are the CV datasets that were recorded by real-world frontal on-board sensors of a mainline or urban railway train, include object annotations and have explicitly been published for free use by the research community. Purely synthetic datasets and datasets with artificially injected objects, like RailFOD23[4] and SRLC[5], are not included in Tab.I. The listed datasets contain annotated single RGB-camera frames from video sequences, LiDAR recordings and, rarely, recordings from other sensors.

The datasets RailSem19[6], RAWPED[7] and rail-hp8ij[8] contain single RGB camera images of mainline or urban railway scenes with polygonal annotations of persons. RailSem19[6] additionally provides dense pixel-wise semantic segmentation of persons. RailEye3D[9] is a stereo RGB HD dataset containing polygonal annotations of persons on platforms, mainly for the task of safe door operation. ESRORAD[10] is a multi-sensor dataset recorded by a car driving on paved tram rails. RailVID[11] is an open polygon-annotated infrared dataset for urban railways. The first open annotated multi-sensor dataset for mainline railways is OSDaR23[2]. RailEye3D[9], ESRORAD[10], RailVID[11] and OSDaR23[2] contain cuboid and polygonal annotations of persons.

In summary, there are only 7 relatively small annotated open CV datasets for the class ‘person’ from the years 2019–2023. Their number is growing slowly, and the growth has not accelerated significantly in recent years.

III Motivation for RailGoerl24

The danger zone of a train can be entered with or without suicidal intent. Suicidal intent is excluded from the safety assessment by EN 62267 [12] and CSM-RA[13]; only accidents are considered. It is therefore not mentioned in the safety requirements. However, it should be noted that suicidal intent can only be determined after the incident, if at all. Therefore, this difference is irrelevant for developers of person detection systems. RailGoerl24 was designed to also contain scenes with certain suicidal appearances, some of which are shown in Fig.1.

IV Choice of Scenarios

One of the main goals in the choice of scenarios for RailGoerl24 was to achieve a representative diversity of recorded human beings for a railway operational design domain in Germany. The diversity is defined in multiple dimensions. The human amateur actors were recorded with and without the obligatory warning jackets. Male and female actors were involved, and their ages ranged from children to the elderly. Variations of medical conditions included walking disability and pregnancy. Clothing styles, body positions and crowd sizes were varied as well. This diversity makes the dataset well suited for evaluating trained ML models, since train-related edge cases (e.g. a person lying on the tracks) are not represented in previous open datasets, yet they could still occur in practice.

The most important concern was the safety of the human actors. To prevent accidents, the train was always moved away from human actors, never towards them. All recordings took place on April 24th, 2024.

Additionally, the human actors signed an agreement allowing their faces to remain unaltered in the final dataset publication, i.e. without the alterations otherwise applied for data protection reasons. This avoids introducing biases into the data that could be exploited by “Clever Hans” predictors. Such a “Clever Hans” predictor would learn to detect the altered areas in the frames instead of human features, since every picture containing humans would also contain altered areas. Explainable artificial intelligence (XAI) methods would detect such a predictor. They keep humans in the loop by providing saliency maps that enable human experts to verify ML algorithms [14].
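As a hedged illustration of how a saliency map could expose a “Clever Hans” predictor, the following minimal sketch measures which share of the total saliency falls inside an annotated person box. The function name `saliency_share_inside`, the box layout `(x_min, y_min, x_max, y_max)` with exclusive maxima and the toy saliency values are assumptions made for illustration only; they are not part of RailGoerl24 or of the method in [14].

```python
def saliency_share_inside(saliency, box):
    """Return the fraction of total saliency that lies inside a box.

    saliency: 2D list of non-negative values, indexed as saliency[row][col].
    box: (x_min, y_min, x_max, y_max) in pixels, maxima exclusive
         (an assumed layout, not the dataset's annotation schema).
    """
    x_min, y_min, x_max, y_max = box
    total = sum(sum(row) for row in saliency)
    if total <= 0:
        return 0.0  # no saliency at all: nothing to attribute
    inside = sum(sum(row[x_min:x_max]) for row in saliency[y_min:y_max])
    return inside / total


# Toy 4x4 saliency map: most of the mass sits on the "person" in the centre,
# a smaller part on an unrelated corner pixel.
toy_saliency = [
    [0, 0, 0, 0],
    [0, 3, 1, 0],
    [0, 2, 2, 0],
    [0, 0, 0, 2],
]
share = saliency_share_inside(toy_saliency, (1, 1, 3, 3))
print(f"share of saliency inside the box: {share:.2f}")  # prints 0.80
```

A consistently low share across many frames would suggest that the model relies on image regions other than the annotated person, which is the kind of behavior a human expert would then inspect more closely.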

TABLE I: Existing real-world sensor data in open datasets for mainline and urban GoA3+
Dataset | Pub. year | Size (frs. = frames, pts. = 3D points) | Sensors | Data format | Annotation | Class ‘person’
RailSem19[6] | 2019 | 8 500 frs. | RGB | variable | polygon, 2D semantic, splines | yes
FRSign[15] | 2020 | 105 352 frs. | RGB | 2 048×1 536, 1 920×1 200 | polygon | -
RAWPED[7] | 2020 | 26 000 frs. | RGB | variable | polygon | yes
Catenary Arch[16] | 2021 | 55×10^6 pts. | LiDAR | - | 3D semantic | -
RailEye3D[9] | 2021 | 11 867 frs. | 2×RGB | 1 088×1 920 | polygon | yes
Rail-DB[17] | 2022 | 7 432 frs. | RGB | 800×288 | polyline | -
*RailSet[18] | 2022 | 6 600 frs. | RGB | variable | polygon, polyline | -
*ESRORAD[10] | 2022 | **100 000 frs. | 2×RGB, LiDAR | 1 920×1 080 | cuboid | yes
RailVID[11] | 2022 | 1 071 frs. | Infrared | 640×512 | 2D semantic | yes
rail-hp8ij[8] | 2022 | 578 frs. | RGB | variable | polygon | yes
GERALD[19] | 2023 | 5 000 frs. | RGB | 1 920×1 080, 1 280×720 | polygon | -
OSDaR23[2] | 2023 | 1 534 m-frs. | 6×RGB, 3×Infrared, Radar, LiDAR | 4 112×2 504, 2 464×1 600, 640×480, 2 856×1 428 | polygon, cuboid, polyline | yes
WHU-Railway3D[20] | 2023 | 4 600×10^6 pts. | LiDAR | - | 3D semantic | -
Rail3D[21] | 2024 | 288×10^6 pts. | LiDAR | - | 3D semantic | -
RailPC[22] | 2024 | 3 000×10^6 pts. | LiDAR | - | 3D semantic | -
RailCloud-HdF[23] | 2024 | 8 060×10^6 pts. | LiDAR | - | 3D semantic | -
*These datasets additionally contain artificially generated data.
**Not all frames depict urban street-running rails; recorded from a road car driving on the pavement over the rails, not from a railway vehicle.

V Recording Area and Sensor System

Acquiring access to railway infrastructure to record CV data as close as possible to realistic hazardous scenarios is much more complicated than for road-based cases. For RailGoerl24, this has been achieved for mainline railways. Fig.2 shows the map of the recorded area near Görlitz, Germany, where trains are operated at GoA0.

Figure 2: The red arrow in the top left map shows the location of the TÜV SÜD Rail test center in Görlitz, Germany [openstreetmap.org]. The area of the RGB recordings is marked in orange, the area of the LiDAR recordings by a red rectangle.

In Annex I of CSM-RA[13], points 2.1.4(b), 2.4.2(b) and 2.4.3(b) allow a simplified approach for safety approval of CV systems for driverless trains if they have “similar functions and interfaces” to the replaced human driver. According to DIN SPEC 91516 [24], an RGB camera as an interface is more similar to a human eye than a LiDAR sensor. Fig.3 shows the on-board sensor setup with a single RGB 1 920×1 080 camera for recording RailGoerl24. It is the model Axis P3925-R with a 3.6 mm lens covering an 85.7° horizontal and a 46.0° vertical field of view. The camera is tilted slightly downwards. The bottom edge of the image corresponds to the first visible point, approx. 3 m in front of the vehicle.

Figure 3: Full HD camera, marked red, on a V 22 locomotive (SN:0262.6.620), mounted on the front or rear at 2.45 m above the track surface.
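The stated camera parameters allow a rough geometric cross-check. The sketch below assumes an ideal pinhole camera without lens distortion, flat ground, and that the approx. 3 m to the first visible point is measured horizontally from the camera; none of these assumptions is confirmed by the dataset description, and the function names and the 1.75 m person height are hypothetical. Under these assumptions it estimates the focal length in pixels, the downward tilt and the approximate pixel height of a standing person at a given distance.

```python
import math

# Values stated in the text and in the caption of Fig. 3.
IMAGE_WIDTH_PX = 1920
IMAGE_HEIGHT_PX = 1080
HFOV_DEG = 85.7
VFOV_DEG = 46.0
CAMERA_HEIGHT_M = 2.45
NEAREST_GROUND_M = 3.0  # assumed to be a horizontal distance from the camera


def focal_length_px(size_px: int, fov_deg: float) -> float:
    """Pinhole focal length in pixels for one image axis."""
    return (size_px / 2) / math.tan(math.radians(fov_deg) / 2)


def downward_tilt_deg(height_m: float, nearest_ground_m: float, vfov_deg: float) -> float:
    """Tilt of the optical axis below the horizon, from the bottom image ray."""
    bottom_ray_deg = math.degrees(math.atan2(height_m, nearest_ground_m))
    return bottom_ray_deg - vfov_deg / 2


def person_height_px(distance_m: float, person_height_m: float = 1.75) -> float:
    """Approximate image height of a standing person (small-angle estimate)."""
    return focal_length_px(IMAGE_HEIGHT_PX, VFOV_DEG) * person_height_m / distance_m


print(f"f_x ~ {focal_length_px(IMAGE_WIDTH_PX, HFOV_DEG):.0f} px")
print(f"f_y ~ {focal_length_px(IMAGE_HEIGHT_PX, VFOV_DEG):.0f} px")
print(f"tilt ~ {downward_tilt_deg(CAMERA_HEIGHT_M, NEAREST_GROUND_M, VFOV_DEG):.1f} deg")
print(f"person at 50 m ~ {person_height_px(50.0):.0f} px tall")
```

The horizontal and vertical focal lengths obtained this way differ noticeably, which indicates that the wide-angle lens deviates from the ideal pinhole model; the resulting numbers, such as a tilt of about 16°, should therefore be read as order-of-magnitude estimates only.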

In addition to the RGB data, LiDAR data were acquired for part of the surrounding area. The accompanying 3D data were captured using a Leica RTC360 terrestrial laser scanner and serve multiple purposes: they can be used to mimic a train driver’s route knowledge, create a digital map, employ background subtraction algorithms or simply give a better picture of the geometric conditions in the recording area. The recorded colored point cloud, depicted in Fig.5, consists of 383 922 305 points and was merged from 40 individual scan positions.
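A point cloud of this size is unwieldy for quick experiments such as map building or background subtraction. The following sketch, given only as an illustration, estimates the raw in-memory size under an assumed storage layout (32-bit floats for x, y, z plus 8-bit r, g, b; the actual file format is not described here) and thins a cloud with a simple voxel grid that keeps the first point seen in each voxel. The function names are hypothetical.

```python
import math

NUM_POINTS = 383_922_305     # stated size of the merged point cloud
BYTES_PER_POINT = 3 * 4 + 3  # assumption: float32 x, y, z + uint8 r, g, b


def raw_size_gib(num_points: int = NUM_POINTS,
                 bytes_per_point: int = BYTES_PER_POINT) -> float:
    """Uncompressed size in GiB under the assumed per-point layout."""
    return num_points * bytes_per_point / 2**30


def voxel_thin(points, voxel_size_m: float):
    """Keep the first point encountered in every cubic voxel.

    points: iterable of tuples (x, y, z, ...) with coordinates in metres;
    any extra fields (e.g. colour) are carried along unchanged.
    """
    kept = {}
    for point in points:
        x, y, z = point[:3]
        key = (math.floor(x / voxel_size_m),
               math.floor(y / voxel_size_m),
               math.floor(z / voxel_size_m))
        kept.setdefault(key, point)
    return list(kept.values())


print(f"raw size ~ {raw_size_gib():.2f} GiB")
sample = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (1.5, 0.1, 0.1)]
print(voxel_thin(sample, voxel_size_m=1.0))  # two of the three points remain
```

Keeping the first point per voxel is the simplest possible choice; averaging the points in each voxel would be an equally valid alternative.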

In addition to the raw sensor data, RailGoerl24 provides semi-automatically generated annotations in the form of rectangular bounding boxes enclosing all visible persons (Fig.1). The annotations were initially prelabeled by an algorithm developed by EYYES GmbH. Later, they were reviewed and refined manually. There is no annotation for the point cloud data.
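One common way to quantify how far a manually refined box deviates from its prelabel, or how well a detector's output matches an annotation, is the intersection over union (IoU) of two rectangles. The sketch below is a generic illustration; the `Box` layout with corner coordinates in pixels is an assumption and does not reflect the annotation schema actually used in RailGoerl24.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates (assumed layout)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    return max(0.0, box.x_max - box.x_min) * max(0.0, box.y_max - box.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, in the range [0, 1]."""
    overlap = Box(max(a.x_min, b.x_min), max(a.y_min, b.y_min),
                  min(a.x_max, b.x_max), min(a.y_max, b.y_max))
    intersection = area(overlap)
    union = area(a) + area(b) - intersection
    return intersection / union if union > 0 else 0.0


prelabel = Box(0, 0, 10, 10)
refined = Box(5, 0, 15, 10)
print(f"IoU = {iou(prelabel, refined):.3f}")  # prints IoU = 0.333
```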

Figure 4: Scatter plots of annotations. The left plot shows the distribution of object positions in the camera image, the right plot the distributions of height H and width W.

VI RailGoerl24 Statistics

Figure 5: LiDAR point cloud of the recorded area, the red rectangle in Fig.2.

The annotated dataset comprises 61 video sequences. These sequences include 12 205 frames with 33 556 bounding box annotations of persons. The video sequences were recorded at a frame rate of 25 fps, and every 15th frame, starting from frame 0, was extracted for annotation. Scatter plots of the annotations are depicted in Fig.4, showing a broad range of person sizes and positions. There were 6 scenes with people lying down: 2 fell, 2 between the rails, 1 next to the rails and 1 across the rails.
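The figures above imply a few derived quantities that are useful when planning experiments. The short sketch below computes them and shows the frame sampling rule; it assumes that the frame rate of 25 is given in frames per second, and the helper `sampled_indices` is a hypothetical name rather than part of any tooling shipped with the dataset.

```python
# Figures quoted in the text above.
NUM_SEQUENCES = 61
NUM_FRAMES = 12_205
NUM_BOXES = 33_556
FRAME_RATE_FPS = 25  # assumption: the stated frame rate is in frames per second
FRAME_STEP = 15      # every 15th frame was extracted, starting from frame 0


def sampled_indices(num_video_frames: int, step: int = FRAME_STEP) -> list[int]:
    """Indices of the video frames kept for annotation."""
    return list(range(0, num_video_frames, step))


seconds_between_samples = FRAME_STEP / FRAME_RATE_FPS  # 0.6 s
boxes_per_frame = NUM_BOXES / NUM_FRAMES               # about 2.75
frames_per_sequence = NUM_FRAMES / NUM_SEQUENCES       # about 200

print(f"time between annotated frames: {seconds_between_samples:.1f} s")
print(f"mean boxes per annotated frame: {boxes_per_frame:.2f}")
print(f"mean annotated frames per sequence: {frames_per_sequence:.0f}")
print(sampled_indices(100))  # [0, 15, 30, 45, 60, 75, 90]
```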

VII Limitations

In contrast to human vision, RailGoerl24 is not a stereo dataset, nor does it contain multiple on-board sensor types. Additionally, it cannot be used to train a sequence-based CV model for a forward-moving train scenario since, for safety reasons, actors were recorded from a train moving away from them. However, it can be used to evaluate the safety of models for hazard detection using XAI. Further, night-time recordings are not included. The fact that the recording area was a German rail test center limits the dataset's applicability to more general settings.

VIII Conclusion and Future Work

As trespassing on railways is life-threatening and disrupts train operations, person detection is critical. RailGoerl24 is one of the 5 open annotated datasets recorded on-board for mainline railways, which can also be used for urban railways. Further datasets similar to RailGoerl24 will be required to develop CV systems that enable GoA3+. They can include additional sensors and a larger quantity of data. RailGoerl24 can serve as a reference as well as a basis for extensions, which will fill the gaps described in Sec.VII.

Developers of CV systems that work in related fields of research like automated security surveillance performed on-board and off-board might also take advantage of RailGoerl24 and contribute to the development of further datasets. Similarly, developers of data generation systems [25] for railways might also improve the performance of their system by using RailGoerl24. All stakeholders in the rail sector and beyond are invited to participate in this effort and, if possible, publish new datasets to achieve a broad research and development community.

ACKNOWLEDGMENT

For their support in the project, the authors thank their DZSF colleagues K. Hofmann, K. Mühl, all amateur actors and manual data annotation correctors. This work was funded by the DZSF as part of the XRAISE project, in-house research within the BMDV Network of Experts and also partly by the SECAI (BMBF Project Nr. 57616814).

References

  • [1] DIN DKE SPEC 99002:2025-03, Terminology – AI in railway applications.
  • [2] R. Tagiew, P. Klasek, R. Tilly, M. Köppel, P. Denzler, P. Neumaier, T. Klockau, M. Boekhoff, and K. Schwalbe, “OSDaR23: Open Sensor Data for Rail 2023,” in ICRAE, 2023, pp. 270–276.
  • [3] R. Tagiew, I. Wunderlich, P. Zanitzer, M. Sastuba, C. Knoll, K. Göller, H. Amjad, and S. Seitz, “Görlitz Rail Test Center CV Dataset 2024 (RailGoerl24),” 2025, TIB, DOI:10.57806/4d2fpj1y.
  • [4] Z. Chen, J. Yang, Z. Feng, and H. Zhu, “RailFOD23: A dataset for foreign object detection on railroad transmission lines,” Scientific Data, vol. 11, no. 1, p. 72, 2024.
  • [5] G. Uggla and M. Horemuz, “Towards synthesized training data for semantic segmentation of mobile laser scanning point clouds: Generating level crossings from real and synthetic point cloud samples,” Automation in Construction, vol. 130, 2021.
  • [6] O. Zendel, M. Murschitz, M. Zeilinger, D. Steininger, S. Abbasi, and C. Beleznai, “RailSem19: A Dataset for Semantic Rail Scene Understanding,” in CVPRW, 2019, pp. 1221–1229.
  • [7] T. Toprak, B. Belenlioglu, B. Aydın, C. Guzelis, and M. A. Selver, “Conditional Weighted Ensemble of Transferred Models for Camera Based Onboard Pedestrian Detection in Railway Driver Support Systems,” IEEE TVT, vol. 69, no. 5, pp. 5041–5054, 2020.
  • [8] “Rail computer vision project.” [Online]. Available: universe.roboflow.com/rail-psseq/rail-hp8ij
  • [9] M. Wallner, D. Steininger, V. Widhalm, M. Schoerghuber, and C. Beleznai, “RGB-D Railway Platform Monitoring and Scene Understanding for Enhanced Passenger Safety,” in Pattern Recognition. ICPR International Workshops and Challenges, 2021.
  • [10] R. Khemmar, A. Mauri, C. Dulompont, J. Gajula, V. Vauchey, M. Haddad, and R. Boutteau, “Road and Railway Smart Mobility: A High-Definition Ground Truth Hybrid Dataset,” Sensors, vol. 22, no. 10, p. 3922, 2022.
  • [11] H. Yuan, Z. Mei, Y. Chen, W. Niu, and C. Wu, “Railvid: A dataset for rail environment semantic,” ICONS, vol. 2022, p. 17th, 2022.
  • [12] EN 62267:2009, Railway applications – Automated urban guided transport (AUGT) – Safety requirements.
  • [13] IMPLEMENTING REGULATION (EU) No 402/2013, Common Safety Method for Risk evaluation and Assessment (CSM-RA).
  • [14] R. Müller, M. Dürschmidt, J. Ullrich, C. Knoll, S. Weber, and S. Seitz, “Do humans and convolutional neural networks attend to similar areas during scene classification: Effects of task and image type,” Applied Sciences, vol. 14, no. 6, 2024.
  • [15] J. Harb, N. Rébéna, R. Chosidow, G. Roblin, R. Potarusov, and H. Hajri, “FRSign: A Large-Scale Traffic Light Dataset for Autonomous Trains,” CoRR, vol. 2002.05665, 2020.
  • [16] B. Ton, F. Ahmed, and J. Linssen, “Semantic Segmentation of Terrestrial Laser Scans of Railway Catenary Arches: A Use Case Perspective,” Sensors, vol. 23, no. 1, 2023.
  • [17] X. Li and X. Peng, “Rail Detection: An Efficient Row-Based Network and a New Benchmark,” in ACM MM, 2022, pp. 6455–6463.
  • [18] A. Zouaoui, A. Mahtani, M. A. Hadded, S. Ambellouis, J. Boonaert, and H. Wannous, “RailSet: A Unique Dataset for Railway Anomaly Detection,” in IEEE IPAS, vol. Five, 2022, pp. 1–6.
  • [19] P. Leibner, F. Hampel, and C. Schindler, “GERALD: A novel dataset for the detection of German mainline railway signals,” Journal of Rail and Rapid Transit, 2023.
  • [20] B. Qiu, Y. Zhou, L. Dai, B. Wang, J. Li, Z. Dong, C. Wen, Z. Ma, and B. Yang, “WHU-Railway3D: A Diverse Dataset and Benchmark for Railway Point Cloud Semantic Segmentation,” IEEE Transactions on Intelligent Transportation Systems, vol. 25, no. 12, 2024.
  • [21] A. Kharroubi, Z. Ballouch, R. Hajji, A. Yarroudh, and R. Billen, “Multi-Context Point Cloud Dataset and Machine Learning for Railway Semantic Segmentation,” Infrastructures, vol. 9, no. 4, 2024.
  • [22] D. Ai, S. Qin, S. Gao, H. Yuan, and Y. Liu, “Key technology for digital twins in the architecture, engineering, and construction industry: new advances in point cloud semantic segmentation algorithms for buildings,” Journal of Electronic Imaging, vol. 33, no. 5, 2024.
  • [23] M. Abid, M. Teixeira, A. Mahtani, and T. Laurent, “RailCloud-HdF: A Large-Scale Point Cloud Dataset for Railway Scene Semantic Segmentation,” in VISIGRAPP, vol. 159, 2024, p. 170.
  • [24] DIN SPEC 91516, Human performance regarding the dynamic driving task for the specification of AI in ATO.
  • [25] G. D’Amico, M. Marinoni, F. Nesti, G. Rossolini, G. Buttazzo, S. Sabina, and G. Lauro, “TrainSim: A Railway Simulation Framework for LiDAR and Camera Dataset Generation,” IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 12, 2023.