All aboard - Data Science of Public Transport Punctuality figures

Authors: Dieter De Witte, Jolien Vanaelst, Ben Ooghe & Kurt Buhler

The dashboard can be found at the following link: https://public.tableau.com/shared/WJKXK45FN?:display_count=yes

On January 28, an article was published in De Standaard in which De Lijn stated that it is currently not possible to publish a report on its punctuality figures.

Source: http://www.standaard.be/cnt/dmf20190127_04134369

Can we analyse and visualize punctuality figures of the Belgian public transport service, De Lijn?

At Ordina, we were surprised by this statement, all the more so because De Lijn offers an Open Data portal (https://data.delijn.be/) that makes a wealth of information about our bus and tram network available.

Several data scientists from Ordina's VisionWorks business unit decided to put this to the test and use the available data to build such a report. The result is the dashboard linked above, which provides an overview of the delays in the Mechelen region for approximately 27 bus lines on February 5.

What insights can we derive from this visual dashboard?

At the top of the dashboard you can find the total number of routes, stops and rides. About 19% of the stops have buses arriving more than 5 minutes late, the average lateness is 2 minutes and 40 seconds, and the maximum lateness during the selected period was nearly 25 minutes.
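The headline figures are simple aggregates over individual arrivals. The sketch below is a minimal illustration, not the team's actual pipeline. It assumes the data has already been reshaped into one hypothetical `StopEvent` record per bus arrival at a stop, with scheduled and actual times in seconds since midnight. It also reads the share of late stops as the share of arrivals more than 5 minutes late. Early arrivals count as negative delay in the average, which is another assumption.

```python
from dataclasses import dataclass
from statistics import mean


# Hypothetical record: one bus arriving at one stop during one ride.
# Times are seconds since midnight; this schema is an assumption, not De Lijn's.
@dataclass
class StopEvent:
    line: str
    stop_id: str
    scheduled_s: int
    actual_s: int

    @property
    def delay_s(self) -> int:
        # Positive means late, negative means early.
        return self.actual_s - self.scheduled_s


def summary(events: list, late_threshold_s: int = 5 * 60) -> dict:
    """Share of arrivals later than the threshold, plus mean and max delay in seconds."""
    delays = [e.delay_s for e in events]
    late = sum(1 for d in delays if d > late_threshold_s)
    return {
        "share_late": late / len(delays),
        "mean_delay_s": mean(delays),
        "max_delay_s": max(delays),
    }


# Toy data for illustration only (not measured figures).
events = [
    StopEvent("L1", "S1", 8 * 3600, 8 * 3600 + 60),
    StopEvent("L1", "S2", 8 * 3600 + 600, 8 * 3600 + 600 + 420),
    StopEvent("L2", "S3", 17 * 3600, 17 * 3600 + 30),
    StopEvent("L2", "S4", 17 * 3600 + 300, 17 * 3600 + 300 + 90),
]
print(summary(events))
```

Clipping early arrivals to zero (`max(d, 0)`) before averaging would be an equally defensible definition of lateness. The article does not say which definition the dashboard uses.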

In the middle of the dashboard is a map showing each bus stop, with lateness represented by circle size and color: the more orange a circle, the more late arrivals at that stop. Next to the map, a second graph shows the average lateness per hour over the period investigated. The entire dashboard can be filtered by highlighting or selecting a specific bus line on the left side of the screen.
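Both the map and the hourly graph are group-by averages over the same arrivals. The following is a minimal sketch under assumed names. The hypothetical `StopEvent` record is defined so the block runs on its own, and `mean_delay_by` is an illustrative helper. Grouping by stop yields a value that could drive circle size and color. Grouping by the hour of the scheduled time yields the hourly average. The optional `line` argument mimics selecting a bus line in the dashboard filter.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Optional


# Hypothetical record: times in seconds since midnight (assumed schema).
@dataclass
class StopEvent:
    line: str
    stop_id: str
    scheduled_s: int
    actual_s: int

    @property
    def delay_s(self) -> int:
        return self.actual_s - self.scheduled_s


def mean_delay_by(events: list, key: Callable[[StopEvent], object],
                  line: Optional[str] = None) -> dict:
    """Average delay per group; if `line` is given, only that line's events are used."""
    groups = defaultdict(list)
    for e in events:
        if line is None or e.line == line:
            groups[key(e)].append(e.delay_s)
    return {k: mean(v) for k, v in sorted(groups.items())}


# Toy data for illustration only.
events = [
    StopEvent("L1", "S1", 8 * 3600, 8 * 3600 + 60),
    StopEvent("L1", "S1", 17 * 3600, 17 * 3600 + 600),
    StopEvent("L2", "S1", 17 * 3600 + 300, 17 * 3600 + 300 + 120),
    StopEvent("L2", "S2", 8 * 3600 + 600, 8 * 3600 + 600),
]

per_stop = mean_delay_by(events, key=lambda e: e.stop_id)              # map circles
per_hour = mean_delay_by(events, key=lambda e: e.scheduled_s // 3600)  # hourly graph
per_hour_l1 = mean_delay_by(events, key=lambda e: e.scheduled_s // 3600, line="L1")
print(per_stop, per_hour, per_hour_l1)
```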

On the map, we see that most delays occur in the region of Aarschot, southeast of Mechelen, and in the city of Mechelen itself. As one might expect, the greatest delays occurred around 17:00, a time known for heavier traffic as the workday ends. Line 532 had significant delays between 17:00 and 18:00. One might speculate that this route suffers most from rush-hour traffic. This kind of insight could inform scheduling changes that mitigate these delays and result in less waiting (and more satisfied) passengers.
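An evening peak such as the one on line 532 can be located systematically. First average delays per line and per hour, then keep each line's worst hour. This is a sketch only. The `StopEvent` record, the `ev` helper and the data are hypothetical, and hours are taken from the scheduled arrival time.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


# Hypothetical record: times in seconds since midnight (assumed schema).
@dataclass
class StopEvent:
    line: str
    stop_id: str
    scheduled_s: int
    actual_s: int

    @property
    def delay_s(self) -> int:
        return self.actual_s - self.scheduled_s


def peak_hour_per_line(events: list) -> dict:
    """Map each line to (hour, mean delay in seconds) for its worst hour."""
    by_line_hour = defaultdict(list)
    for e in events:
        by_line_hour[(e.line, e.scheduled_s // 3600)].append(e.delay_s)
    peaks = {}
    for (line, hour), delays in sorted(by_line_hour.items()):
        avg = mean(delays)
        # Keep the hour with the highest average; on ties the earlier hour wins.
        if line not in peaks or avg > peaks[line][1]:
            peaks[line] = (hour, avg)
    return peaks


def ev(line: str, hour: int, delay_s: int) -> StopEvent:
    """Build a toy event scheduled on the hour (hypothetical helper)."""
    return StopEvent(line, "S1", hour * 3600, hour * 3600 + delay_s)


# Toy data for illustration only.
events = [ev("L1", 8, 60), ev("L1", 8, 120), ev("L1", 17, 600), ev("L1", 17, 480),
          ev("L2", 8, 400), ev("L2", 17, 100)]
print(peak_hour_per_line(events))
```

Sorting the groups first makes the tie-breaking deterministic.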

Below, the dashboard lists the lines with the greatest and the smallest delays, and lets you show the top 1 to 5 in each category. Line 558 "Mechelen - Waver Scholen" performs worst in terms of punctuality: 80% of its stops were served more than 5 minutes late.
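The best and worst lists can be approximated by computing, per line, the share of arrivals more than 5 minutes late and sorting on it. This is a minimal sketch with hypothetical records and toy data. Ranking on this share (rather than on, say, mean delay) is an assumption, and `n` stands in for the dashboard's top 1-5 selector.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record: times in seconds since midnight (assumed schema).
@dataclass
class StopEvent:
    line: str
    stop_id: str
    scheduled_s: int
    actual_s: int

    @property
    def delay_s(self) -> int:
        return self.actual_s - self.scheduled_s


def rank_lines(events: list, n: int = 5, late_threshold_s: int = 5 * 60):
    """Return (worst, best): up to n (line, share_late) pairs each."""
    late = defaultdict(int)
    total = defaultdict(int)
    for e in events:
        total[e.line] += 1
        if e.delay_s > late_threshold_s:
            late[e.line] += 1
    shares = [(line, late[line] / total[line]) for line in total]
    worst = sorted(shares, key=lambda p: p[1], reverse=True)[:n]
    best = sorted(shares, key=lambda p: p[1])[:n]
    return worst, best


def ev(line: str, delay_s: int) -> StopEvent:
    """Toy event at 08:00 with the given delay (hypothetical helper)."""
    return StopEvent(line, "S1", 8 * 3600, 8 * 3600 + delay_s)


# Toy data for illustration only.
events = [ev("L1", 400), ev("L1", 600), ev("L1", 30), ev("L1", 10),
          ev("L2", 0), ev("L2", 10),
          ev("L3", 500), ev("L3", 700), ev("L3", 900)]
worst, best = rank_lines(events, n=2)
print(worst, best)
```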

 

Conclusion?

Several interesting and valuable insights can be derived from this analysis. The data was easy to collect, understand and visualize. Combined with other available data such as traffic density during the day and weather data, it opens up a number of innovative possibilities that De Lijn could pursue to optimize its routes. The Open Data that De Lijn makes available is accurate and easily retrievable, which makes continued innovation possible. To put it in Ordina terms: De Lijn is ready to become an 'Intelligent Data-driven Organization'!