Big Data and Differential Privacy: Analysis Strategies for Railway Track Engineering: An interview with author Nii O. Attoh-Okine

Featuring a practical introduction to state-of-the-art data analysis for railway track engineering, Big Data and Differential Privacy: Analysis Strategies for Railway Track Engineering addresses common issues with the implementation of big data applications while exploring the limitations, advantages, and disadvantages of more conventional methods. In addition, the book provides a unifying approach to analyzing large volumes of data in railway track engineering using an array of proven methods and software technologies.

Dr. Attoh-Okine considers some of today’s most notable applications and implementations and highlights when a particular method or algorithm is most appropriate. Throughout, the book presents numerous real-world examples to illustrate the latest railway engineering big data applications of predictive analytics, such as the Union Pacific Railroad’s use of big data to reduce train derailments, increase the velocity of shipments, and reduce emissions.

In addition to providing an overview of the latest software tools used to analyze the large amount of data obtained by railways, Big Data and Differential Privacy: Analysis Strategies for Railway Track Engineering:

• Features a unified framework for handling large volumes of data in railway track engineering using predictive analytics, machine learning, and data mining

• Explores issues of big data and differential privacy and discusses the various advantages and disadvantages of more conventional data analysis techniques

• Shows how to implement big data applications that address common issues in railway track maintenance

• Explores the advantages and pitfalls of data analysis software such as R and Spark, as well as the Apache™ Hadoop® framework for distributed storage and processing and its MapReduce programming model

Big Data and Differential Privacy is a valuable resource for researchers and professionals in transportation science, railway track engineering, design engineering, operations research, and railway planning and management. The book is also appropriate for graduate courses on data analysis and data mining, transportation science, operations research, and infrastructure management.

Alison Oliver talks to Dr. Attoh-Okine about writing the book and his own career.


1. Congratulations on the recent publication of your book, Big Data and Differential Privacy: Analysis Strategies for Railway Track Engineering, which is described as ‘a comprehensive introduction to the theory and practice of contemporary data science analysis for railway track engineering.’ How did the writing process begin?

I wrote Theory and Practice of Contemporary Analysis for Railway Track Engineering, and after that I observed that railway track engineering generates huge amounts of data that were not being analysed with emerging data science techniques. Meanwhile, other fields were using data science techniques to address important problems and obtaining excellent results. That led me to write this book.

2. What were your main objectives during the writing process?

I had three main objectives in writing the book. First, to introduce railway track engineers and graduate students in railway engineering programs to big data and data science. Second, to show how these data science paradigms apply across railway track engineering. And third, to develop something like a “handbook” for applying the correct data science techniques to different railway track problems. So, the idea was to put all this in book format so that a railway engineer confronted with the question “How can I use data science or big data?” has a small handbook, with data, to refer to and then apply to the problem.

3. Throughout the book, you address common issues with the implementation of big data applications while exploring the limitations, advantages, and disadvantages of more conventional methods. Was it always your intention to write the book with this approach showing both sides of the story?

I wanted to be clear and honest about the application of big data and data science in railway engineering. Therefore, I made an effort to highlight some of the limitations and other roadblocks that can be encountered when using these techniques. As a researcher, you have to show both sides of a project and of the analytical techniques you use: what are the advantages, disadvantages, and limitations? I think this is the only way one can advise practitioners and advance knowledge in the field.

4. You also provide a unifying approach to analysing the big data sets that can arise in railway engineering, identifying useful tools, using real-life examples, and explaining how current software can help. Could you please describe one of the real-life examples that you use and the tools involved?

Forecasting is generally one of the most common problems in railway track engineering. For example, to avoid derailments, we collect a lot of geometry and defect data using various sensors. With this massive amount of data, one can use the big data paradigm to identify anomalies in the data and, if an anomaly is traced to a specific issue, correct it before there is any loss of life or equipment. With so much data collected, we have to get an idea of what type of information we can extract from it; in some cases, we are dealing with a stream of data. We need to make decisions as soon as possible; we can’t wait six months, or even three, to do this. Sometimes we need to do it on demand. Issues such as track geometry safety or railroad defects are not always easy to observe. Furthermore, the data collected over time give us some idea of how defects are evolving, which enables us to update our solutions and avert disasters or specific failures.
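As a rough illustration of the kind of streaming anomaly check described here, the following is a minimal Python sketch. It is not the method used in the book: it assumes a simple rolling z-score detector, and the function name `stream_anomalies`, the window size, the threshold, and the synthetic gauge readings (in millimetres, around the 1435 mm standard gauge) are all hypothetical choices for illustration.

```python
from collections import deque
from statistics import mean, pstdev


def stream_anomalies(readings, window=20, threshold=3.0):
    """Yield (index, value, z) for readings far from the recent rolling mean.

    Hypothetical sketch: `window` (baseline length) and `threshold` (|z| cut-off)
    are assumed values, not parameters taken from the book.
    """
    history = deque(maxlen=window)  # only the most recent `window` readings are kept
    for i, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0:  # skip the check when the baseline is perfectly flat
                z = (value - mu) / sigma
                if abs(z) > threshold:
                    yield i, value, z
        # Every reading, flagged or not, joins the baseline in this simple version.
        history.append(value)


if __name__ == "__main__":
    # Synthetic, made-up readings: a small periodic wobble around 1435 mm
    # with one injected spike at index 30.
    readings = [1435.0 + 0.5 * ((i % 5) - 2) for i in range(40)]
    readings[30] = 1450.0
    for i, value, z in stream_anomalies(readings):
        print(f"reading {i}: {value} mm (z = {z:.1f})")  # prints: reading 30: 1450.0 mm (z = 21.2)
```

A rolling baseline keeps memory constant, so each reading can be checked as it arrives rather than waiting for a periodic batch. A real system would likely need more robust statistics and engineering review of every flagged reading.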

5. If there is one piece of information or advice that you would want your reader to take away and remember after reading your book, what would that be?

It’s going to be very difficult for one person to understand every chapter in the book, but I believe that every railway track engineer should be able to understand and carry out what is covered in the initial chapters, for example “Data Exploration, Data Analysis”, where one looks at the data distribution and at basic statistics of the data such as the mean, the standard deviation, and the range. Understanding other sections of the book depends on the individual’s background; some readers may need specialized training to understand and apply the material. In general, the first few chapters, like “Exploratory Data Analysis”, cover work I expect most engineers to be able to do. Going forward, engineers need to work with data scientists who understand at least basic track engineering so that they can apply the techniques.
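The basic exploratory statistics mentioned here can be computed with Python’s standard library alone. The sketch below is an illustration only, not code from the book; the function names `summarize` and `histogram` and the sample gauge deviations (in millimetres) are hypothetical.

```python
from statistics import mean, median, stdev


def summarize(values):
    """Return basic descriptive statistics for a list of numeric measurements."""
    if len(values) < 2:
        raise ValueError("need at least two measurements for a sample standard deviation")
    lo, hi = min(values), max(values)
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "std_dev": stdev(values),  # sample standard deviation (n - 1 denominator)
        "min": lo,
        "max": hi,
        "range": hi - lo,
    }


def histogram(values, bins=5):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to width 1 if all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value would index one past the end, so clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical gauge deviations from nominal, in millimetres (made-up data).
    deviations = [-2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 6.0]
    for name, value in summarize(deviations).items():
        print(f"{name:>8}: {value:.2f}")
    print("histogram (4 bins):", histogram(deviations, bins=4))
```

Equal-width bins give a quick first look at the distribution; in this made-up sample the single large deviation lands alone in the last bin, which is the kind of outlier an exploratory pass is meant to surface.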

Professor Attoh-Okine


6. Who should read the book and why?

The book should be read by railway track engineers, railway engineering graduate students and general civil engineering students, data scientists, and computer scientists who are interested in working in railway engineering.

7. Why is this book of particular interest now?

This is the first book in the field to address the implementation of big data in railway track engineering. I think this book is going to be a classic. In the coming years there may well be more books, and bigger volumes, on the topic, but this book opened the door by showing how this is going to work.

8. Were there areas of the book that you found more challenging to write, and if so, why?

I’m a civil engineer who started in transportation engineering, specifically highway engineering, but I’ve always had a strong background in probability and statistics, and I tried to merge the two. My background in highway engineering led me to railway track engineering, where I realized that, compared with other fields such as electrical engineering and computer engineering, there is what I’ll call lagged research: railway engineering tends to do production-type research. If we are to train new students for the future, we need to bring the science in railway engineering to a different level. That’s what propels me to work in this area and to take a new direction in the railway industry.

9. What will be your next book-length undertaking?

Railway track engineering is part of an emerging technology paradigm: smart cities. Smart cities are gaining momentum all over the world; smart transportation is part of them, and so are smart railway systems. So, one of the things I’m looking at in the future is how railway track engineering fits into the overall framework of the smart cities paradigm now spreading around the world. One has to look at how data is going to be shared and how it will interact within that overall smart cities umbrella. This is something I’m hoping to work on, maybe in a couple of years. It has also led me to the use of blockchain technology in railway track engineering.

10. What introduced you to statistics as a discipline, and what led you to pursue engineering as a career?

Growing up, I was always very interested in statistics. I decided to study civil engineering because my initial idea was to be a practicing engineer rather than an academic. But as I became more involved, I realized the importance of statistics in all engineering disciplines. For example, I was working on neural networks in the 1990s, when only a handful of people knew what a neural network was, and I tried to introduce them into transportation infrastructure. So, I’m very curious about how statistics can change civil infrastructure modelling and decision-making. I don’t think engineers generally have a very strong background in the area, so I like to make it accessible so that young and experienced engineers alike will appreciate the use of statistics in everyday engineering and decision-making.

I’ve worked with various countries that are implementing this work, and I’ve given seminars in almost every part of the world.

11. What is it about this area of engineering that fascinates you?

One of the things that fascinates me in engineering is the deep thinking and the different analytics you need to use to arrive at optimal decisions; that’s my passion in engineering. For me, with engineering, you should be able to work with the data and then come up with very appropriate decisions that save people’s lives and benefit society. Transforming data into a “story” in engineering is what fascinates me a great deal.