The main objective of this paper is to address our research question: will a particular flight arrive at its destination late or not? Based on an initial review of the dataset, a supervised learning approach will be used to answer this question. This means we will split our dataset into training and testing components.
The model we develop will be trained on the Airline On-Time Performance dataset, which includes all commercial flight arrival and departure details in the USA between October 1987 and February 2019. This is a relatively large dataset with almost 186 million records. Given the timeline and scale of the Capstone project, we will select a subset of the dataset for our analysis.
Logistic regression, Naïve Bayes and SVM models are some of the techniques we will consider for training and testing our model in the R programming language.
The following questions will also be considered using descriptive statistical methods:
 Best day of week/time of year to fly to minimize delays?
 Which carrier suffers from more delays?
 How well do departure delays predict arrival delays?
The dataset is publicly hosted at Stat Computing and is originally sourced from RITA (the Research and Innovative Technology Administration), the U.S. Department of Transportation (DOT) agency that housed the Bureau of Transportation Statistics (BTS).
The inconvenience resulting from flight delays has been a long-standing challenge for passengers, airports and airlines. A 2010 study by the U.S. Federal Aviation Administration (FAA) analyzed data from 2007 to quantify the economic impact of flight delays. It found that American passengers and airlines bore a cost of 32.9 billion USD. You can review the study here.
Over the years, numerous papers and studies have been devoted to addressing this challenge. The purpose of this paper is to use the dataset that is thoroughly explained here to train and test a machine learning model that predicts flight arrival delays from the features most relevant to the topic. These features will be chosen based on descriptive statistical analysis of the data.
The aim is to predict whether a flight will arrive at the destination with delay or not, given the circumstances.
A preliminary review of the literature on the topic, drawn from various publications and papers, is summarized below:
1. Predicting Airline Delays^{1} (Bandyopadhyay & Guerrero, 2012)
The paper uses a dataset originally sourced from the Bureau of Transportation Statistics. The objective is to analyze and predict flight departure delays for a sample of flights in the USA, the main goals being:
 Identify the most influencing factors causing flight delays
 Predict if a specific flight will be delayed or not,
 Estimate the magnitude and impact in case of a delay.
Linear regression is used to identify the most influencing factors causing flight delays. Next, an SVM (Support Vector Machine) classifier is used to predict if there will be any delays. Finally, a nonparametric quadratic regression algorithm is proposed to estimate the magnitude of delays.
2. Estimating Flight Departure Delay Distributions^{2} (Tu, Ball, & Jank, 2006)
The paper aims to develop a strategic departure delay prediction model for estimating flight departure delays, required as part of air traffic congestion prediction models based on the identification of major factors influencing flight departure delays. The model employs nonparametric methods for daily and seasonal trends, and uses a mixture distribution to estimate the residual error. To overcome problems with local optima in the mixture distribution, a global optimization version of the Expectation Maximization algorithm borrowing ideas from Genetic Algorithms is developed. The model demonstrates reasonable goodness of fit, robustness and predictive capabilities. Flight data from Denver International Airport in the year 2000 was used.
3. Predicting Departure Delays of US Domestic Flights^{3} (Cole & Donoghue, 2017)
This project trains a logistic regression model to predict flight delays of more than 15 minutes, based on statistics of past flights. Features of the flights known at the time of booking, such as the airline, month, week, and hour of departure, were used to train the model. The best approach trained separate models for each airport and achieved a score of 0.689, measured as the area under the receiver operating characteristic curve (AUC) rather than as classification accuracy.
4. Characterization and Prediction of Air Traffic Delays^{4} (Rebollo & Balakrishnan, 2014)
This paper proposes a new set of models for predicting flight delays over a 2-year period (2007 and 2008) in the USA, using the 100 most-delayed links in the system. The primary objective is to predict departure delays on a specific link (origin-destination pair) or at a particular airport, some time in the future. The models include temporal and spatial delay states as explanatory variables. Random Forest algorithms were used to predict departure delays between 2 and 24 hours ahead. In addition to local delay variables, the paper proposes incorporating new network delay variables, which characterize the global delay state of the entire National Airspace System at the time of prediction. The performance of the proposed models is analyzed both for classifying delays as above or below a certain threshold and for predicting delay values. For a 2-hour forecast, the average test error across the 100 links is 19% when classifying delays as above or below a 60-minute threshold.
5. Predictive Modeling of Aircraft Flight Delay^{5} (Kalliguddi & Leboulluec, 2017)
This paper investigates the significant factors responsible for flight delays in 2016 based on data extracted from the Bureau of Transportation Statistics (BTS) comprising one million instances across 8 attributes. Machine learning techniques and statistical models such as Decision Trees, Random Forest and Multiple Linear Regressions were used to develop a predictive model in order to identify delays in advance. By identifying critical parameters responsible for flight delay, the model attempts to put forth a solution to the delay losses incurred by the airline industry.
6. Analysis of Aircraft Arrival Delay and Airport On-Time Performance^{6} (Bai, 2006)
This research paper develops statistical models of airport delay and single flight arrival delay, using data sourced from the Federal Aviation Administration (FAA). Multivariate regression, ANOVA, neural networks and logistic regression were used to detect the pattern of airport delay, aircraft arrival delay and schedule performance. These models are then integrated in the form of a system for aircraft delay analysis and airport delay assessment. The assessment of an airport’s schedule performance is discussed.
7. Multi-Factor Model for Predicting Delays at U.S. Airports^{7} (Xu, Sherry, & Laskey, 2008)
This project uses multi-factor models to predict airport delays in 15-minute periods across thirty-four U.S. airports. The models are developed with piecewise linear regressions and Multivariate Adaptive Regression Splines (MARS) for generated delays and absorbed delays at each airport. The models were generated based on historic data for each airport. After application of several test datasets, accuracy evaluation shows a mean absolute prediction error of 5.3 minutes for generated delay and 2.2 minutes for absorbed delay across all the airports. A summary of the factors that influence the performance of each airport is provided and the implications of each are discussed.
8. Flight Delay Prediction^{8} (Martinez, 2012)
The project proposes estimating the probability distribution of flight delays using kernel density estimation models. It does not try to model the underlying processes; rather, it only analyzes past observations. The models, of increasing complexity, have been implemented, optimized and evaluated on a large scale, using several years of US domestic flight delay records. As part of the evaluation, the ability of some of the models to predict delay distributions is analyzed.
9. Modeling Flight Delays^{9} (Sauvestre, Duperier & Leaf, 2016)
Using publicly available flight information and weather data, the paper aims to predict whether a flight will be delayed by more than 15 minutes across the 40 largest airports in the United States. A flight’s delay can arise as a result of a previous flight’s delay, hence features to capture these second-order behaviors were incorporated in the analysis. Data was classified using Random Forest, Gaussian Naive Bayes, Logistic Regression, and Neural Networks, and achieved a best overall F1-score of 82% using a Random Forest classifier.
10. Application of ML Algorithms to Predict Flight Arrival Delays^{10} (Kuhn & Jamadagni, 2017)
Recognizing the harmful economic and environmental impact of growth in the aviation industry, this paper applies machine learning algorithms such as decision tree, logistic regression and neural network classifiers to predict whether a given flight’s arrival will be delayed or not. It simplifies the analysis and predicts with a test accuracy of approximately 91% for all three classifiers, using only 3 critical attributes from a selection of attributes such as departure date, departure delay, distance between the two airports and scheduled arrival time. A comparison of the decision tree classifier with logistic regression and a simple neural network for various figures of merit is also provided.
As mentioned in the abstract, the Airline On-Time Performance dataset includes all commercial flight arrival and departure details in the USA between October 1987 and February 2019. This is a relatively large dataset with almost 186 million records. Given the timeline and scale of the Capstone project, we have elected to use 2 years’ worth of data, from 2007 and 2008.
The dataset is publicly hosted at Stat Computing, is originally sourced from RITA, and comprises the following 29 features^{11} with 14,462,943 observations prior to data cleaning.
| S. No | Name | Description |
|---|---|---|
| 1 | Year | 2007–2008 |
| 2 | Month | 1–12 |
| 3 | DayofMonth | 1–31 |
| 4 | DayOfWeek | 1 (Monday) – 7 (Sunday) |
| 5 | DepTime | actual departure time (local, hhmm) |
| 6 | CRSDepTime | scheduled departure time (local, hhmm) |
| 7 | ArrTime | actual arrival time (local, hhmm) |
| 8 | CRSArrTime | scheduled arrival time (local, hhmm) |
| 9 | UniqueCarrier | unique carrier code |
| 10 | FlightNum | flight number |
| 11 | TailNum | plane tail number |
| 12 | ActualElapsedTime | in minutes |
| 13 | CRSElapsedTime | in minutes |
| 14 | AirTime | in minutes |
| 15 | ArrDelay | arrival delay, in minutes |
| 16 | DepDelay | departure delay, in minutes |
| 17 | Origin | origin IATA airport code |
| 18 | Dest | destination IATA airport code |
| 19 | Distance | in miles |
| 20 | TaxiIn | taxi in time, in minutes |
| 21 | TaxiOut | taxi out time, in minutes |
| 22 | Cancelled | was the flight cancelled? |
| 23 | CancellationCode | reason for cancellation (A = carrier, B = weather, C = NAS, D = security) |
| 24 | Diverted | 1 = yes, 0 = no |
| 25 | CarrierDelay | in minutes |
| 26 | WeatherDelay | in minutes |
| 27 | NASDelay | in minutes |
| 28 | SecurityDelay | in minutes |
| 29 | LateAircraftDelay | in minutes |

Table 1 – Dataset Variables Description
Our approach will follow a simple five-step process, with each stage building upon the one before.
 In the first step we will examine and clean the data.
 Next, we will perform descriptive analysis to better understand the salient features of the data and answer our research questions.
 Our feature selection for data modelling will depend on the most influential attributes.
 Our next aim will be to test three different algorithms to identify the best performing model.
 Finally, we will train, tune and test our model for the chosen machine learning algorithm.
We will utilize R, Excel and Tableau as the main tools to help with our analytics.
The focus of this step is to remove the ‘noise’ in the data and make it ready for our analysis:
 Features were converted to the most appropriate and relevant format.
 Records containing NAs were removed.
 Delays with negative values were converted to zero.
 “Arrival delay” and “Departure delay” were transformed to binary variables making it easier to identify if an aircraft was delayed or not.
R code for the data cleaning process can be accessed here.
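The cleaning rules listed above can be sketched in a few lines. This is an illustrative Python sketch, not the authors' R code: the function name `clean_record` is hypothetical, and treating any delay greater than zero minutes as "delayed" is an assumption, since the text does not state the threshold used for the binary flags.

```python
from typing import Optional

def clean_record(arr_delay: Optional[float], dep_delay: Optional[float]):
    """Return (arr_minutes, dep_minutes, arr_delayed, dep_delayed) or None.

    Follows the listed steps: drop records with missing values, clamp
    negative delays (early flights) to zero, then derive 0/1 delay flags.
    """
    if arr_delay is None or dep_delay is None:
        return None  # record contains an NA: drop it
    arr = max(arr_delay, 0.0)
    dep = max(dep_delay, 0.0)
    # Assumption: "delayed" means more than zero minutes late.
    return arr, dep, int(arr > 0), int(dep > 0)

# Hypothetical toy rows of (ArrDelay, DepDelay) in minutes.
raw = [(12.0, 5.0), (-8.0, -3.0), (None, 4.0), (0.0, 20.0)]
cleaned = [r for r in (clean_record(a, d) for a, d in raw) if r is not None]
```

The third toy row is dropped because of its missing value, and the early flight in the second row ends up with zero delay and both flags set to 0.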
After cleaning the data and removing features that are not relevant and records with “NA” values, the dataset is reduced to 18 features and 14,130,317 records with the following data types:
A more detailed and thorough description of the other remaining steps in our approach and our findings will follow in the Final Project Report.
Step 2 – Descriptive Analytics
As part of our exploratory data analysis, we performed numerous data visualizations to understand the most influencing factors impacting arrival delays and identify any hidden patterns. In addition, this analysis will further help answer the following research questions introduced in our abstract:
 Best day of week/time of year to fly to minimize delays?
 Which carrier suffers from more delays?
 How well do departure delays predict arrival delays?
R code for the descriptive analytics process can be accessed here.
The Boruta package in R was used to perform the feature selection. Boruta is an all-relevant feature selection wrapper algorithm, capable of working with any classification method that outputs a variable importance measure (VIM); by default, Boruta uses Random Forest. The method performs a top-down search for relevant features by comparing original attributes’ importance with importance achievable at random, estimated using their permuted copies, and progressively eliminating irrelevant features to stabilize that test.
In other words, the algorithm iteratively compares the importance of attributes with the importance of shadow attributes, created by shuffling the original ones. Attributes that have significantly worse importance than the shadow ones are successively dropped. Attributes that are significantly better than the shadows, on the other hand, are marked as confirmed.
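The shadow-attribute idea can be illustrated with a deliberately simplified Python sketch. It is not the Boruta implementation: absolute Pearson correlation stands in for the Random Forest importance measure, the statistical test on hit counts is omitted, and the names `abs_corr`, `shadow_hits` and the toy data are all hypothetical.

```python
import math
import random

def abs_corr(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return abs(cov) / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def shadow_hits(features, target, rounds=20, seed=0):
    """Count, per feature, the rounds in which it beats the best shadow."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(rounds):
        best_shadow = 0.0
        for values in features.values():
            shadow = values[:]      # permuted copy of the original attribute
            rng.shuffle(shadow)
            best_shadow = max(best_shadow, abs_corr(shadow, target))
        for name, values in features.items():
            if abs_corr(values, target) > best_shadow:
                hits[name] += 1
    return hits

target = [i % 2 for i in range(200)]                  # toy 0/1 outcome
features = {
    "informative": target[:],                         # identical to the outcome
    "unrelated": [(i // 2) % 2 for i in range(200)],  # zero correlation by construction
}
hits = shadow_hits(features, target)
```

A feature that beats the best shadow in every round is a candidate for confirmation, while one that never does is a candidate for rejection. Boruta itself makes that decision with a significance test over many Random Forest runs rather than a raw count.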
Out of 9 initial features, 2 were rejected and 7 were confirmed. The most influencing factors on “Arrival delays” are:
 Departure Delay
 Arrival Time
 Departure Time
 Distance
 Carrier
 Month
 Destination
R code for the feature selection process can be accessed here.
The primary objective of this research project is to classify future flight arrivals as either “Arrive on-time” or “Arrive with delay”. To that end, 3 different algorithms were tested and the corresponding findings documented for:
 Logistic Regression,
 Naïve Bayes, and
 Support Vector Machine (SVM)
SVM turns out to be the highest performing model with an accuracy of 76.7%.
R code for the model selection process can be accessed here (in progress)
Step 5 – Tuning, Training, Testing
Our final step of the approach involved performing a tuning of the SVM model to decide the best performing parameters of gamma and cost. Testing the model on different sizes of the test datasets varying between 12,500 and 125,000 did not have a significant impact on the accuracy of the model. Introduction of new variables and adjustment of cost improved the accuracy significantly.
A summary of the different tests and trainings executed using SVM and the results is available here.
Step 2 – Descriptive Analysis
R code for descriptive analytics can be accessed here. The overall delay time (in minutes) for arriving and departing flights improved by 24% and 18% respectively in 2008 compared with 2007.
The number of flights in 2008 fell 6% from 2007, while total delays fell by 21%, so the improvement is not explained by reduced traffic alone. The average delay per flight dropped from 21.5 minutes in 2007 to 18.1 minutes in 2008.
Figure 4 – Annual Number of Flights
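As a quick consistency check on the figures quoted above, the 2008 per-flight average can be derived from the 2007 average and the two percentage changes. This is a short Python sketch using only numbers stated in the text; the variable names are illustrative.

```python
flights_ratio = 1 - 0.06   # 2008 flights relative to 2007 (6% fewer)
delay_ratio = 1 - 0.21     # 2008 total delay minutes relative to 2007 (21% fewer)
avg_2007 = 21.5            # minutes of delay per flight in 2007, from the text

# Average = total delay / number of flights, so the ratios divide.
implied_avg_2008 = avg_2007 * delay_ratio / flights_ratio
```

The implied value rounds to the 18.1 minutes reported for 2008, so the three figures are mutually consistent.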
Most arrival delays appear to happen on Fridays, Thursdays, Mondays and Sundays. It is therefore likely that passengers face fewer delays on Tuesdays, Wednesdays and Saturdays.
The total frequency of flights per weekday follows the same pattern as delays (in mins). Looking at flight arrivals and departures, flights on average are delayed 20.5 minutes on Mondays, 21.9 on Thursdays, 24.9 on Fridays and 21.7 on Sundays.
Figure 6 – Daily Number of Flights
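The per-weekday averages discussed above come from a simple group-and-average step. The Python sketch below shows that aggregation on hypothetical toy rows; the function name and the data are illustrative and are not the authors' R code or results.

```python
from collections import defaultdict

def mean_delay_by_day(records):
    """records: iterable of (day_of_week, delay_minutes); returns {day: mean}."""
    totals = defaultdict(lambda: [0.0, 0])
    for day, minutes in records:
        totals[day][0] += minutes
        totals[day][1] += 1
    return {day: total / count for day, (total, count) in totals.items()}

# Hypothetical toy rows; 1 = Monday ... 7 = Sunday, as in Table 1.
toy = [(1, 20.0), (1, 22.0), (2, 10.0), (5, 30.0), (5, 20.0), (2, 14.0)]
by_day = mean_delay_by_day(toy)
best_day = min(by_day, key=by_day.get)   # day with the lowest mean delay
```

The same pattern applies to months or carriers by changing the grouping key.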
December, June, July, February and August experience the longest delays. These are primarily high travel seasons with the highest number of flights.
July, August, May and March saw the highest frequency of flights, which is a bit out of sync in comparison with the total monthly delayed minutes.
Figure 8 – Monthly Number of Flights
Based on the above analysis, we can now address the first research question, “When is the best day of week/time of year to fly to minimize delays?”
Passengers who travel on Tuesdays, Wednesdays and Saturdays are less likely to experience flight delays than on other days of the week. The same applies to months: passengers travelling in April, May, September, October and November are less likely to experience flight delays than in other months of the year.
To address the second research question, “Which carrier suffers from more delays?”, we first look at the top 10 airlines with the greatest number of delayed arrival and departure minutes. We then separately look at the overall outlook of the top 10 carriers with the most delayed minutes. AA (American Airlines) and WN (Southwest Airlines) are the carriers at the top of this list.
Figure 9 – Top 10 Carriers Delays
Looking at the number of flights operated by each carrier, we note that WN (Southwest Airlines), AA (American Airlines) and OO (SkyWest Airlines) stand out.
Figure 10 – Total Number of Flights per Carrier
Atlanta, Orlando and Dallas are among the top 3 high traffic airport destinations as illustrated by the number of flights in Figure 11.
Figure 11 – Highest Traffic Destinations
Phoenix, Atlanta and Kentucky are the top 3 airports with the greatest number of on-time flights as illustrated in Figure 12.
Figure 12 – Airports with the highest number of on-time flights
An illustration of the 2007 arrival and departure delays per airport indicates that Orlando, Atlanta, Dallas and Newark are the most congested airports.
Figure 13 – Arrival and Departure Delay per Airport
Airports with the highest number of delays in 2007.
 Colors represent total departure delay.
 Size of the bubbles represent total arrival delay.
Analyzing Airlines, Airports and Arrival Delays simultaneously, we can see from Figure 14 that EV (ExpressJet Airlines) experiences its longest delays at Atlanta, AA (American Airlines) and MQ (Envoy Air) at Orlando and Dallas. XE (ExpressJet) experiences its longest delays at Newark.
Figure 14 – Arrival Delay per Airport
 Colors represent carriers.
 Size of bubbles represents total arrival delay per destination.
The number of arrival and departure delays in excess of 15 minutes decreased by ~16% in 2008.
Figure 15 – Sum of Annual Delays exceeding 15 minutes
Such delays mostly tend to happen on Fridays, Thursdays and Mondays.
Figure 16 – Sum of Weekday Delays exceeding 15 minutes
From a monthly perspective, passengers traveling in December, June and July are more likely to face arrival and departure delays over 15 minutes.
Figure 17 – Sum of Monthly Delays exceeding 15 minutes
WN (Southwest Airlines), AA (American Airlines), MQ (Envoy Air), UA (United Airlines) and OO (SkyWest Airlines) are carriers with the greatest number of arrival delays exceeding 15 minutes.
Figure 18 – Airline Carriers with most no. of delays exceeding 15 minutes
Flights with no delays in 2008 amounted to 24,576 which is a 17% decline from 29,620 flights in 2007.
Figure 19 – Annual On-Time Flights
WN (Southwest Airlines), YV (Mesa Airlines) and OH (Comair) are the top 3 carriers with no delays.
Figure 20 – Airlines with most On-Time Flights
Figure 21 illustrates that the Late Aircraft delay category is the biggest contributor to all delays, followed by National Airspace System (NAS), Carrier, Weather and Security delays. A Late Aircraft delay occurs when the aircraft’s previous flight was itself delayed.
Figure 21 – Most Influential Delay Causes
An analysis of arrival and departure times reveals that it is better to avoid flights departing between 0600 and 2000, as these are more likely to arrive with delays, with peak delays occurring for flights departing at 1700. Figure 22b shows that the number of departing flights with arrival delays increases between 0600 and 1700. This makes sense, as most of the air traffic for departing flights also occurs between these hours, as illustrated in Figure 22a. The best hours to fly, with the lowest probability of running into arrival delays, are between 2000 and 0400.
Figure 22 a – Arrivals and Departures by Hour
Figure 22 b – Best hours of the day to fly
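Because departure times in the dataset are stored as local hhmm integers, an hour-of-day analysis like the one above first needs the hour extracted. The Python sketch below is illustrative only: the function names are hypothetical, the 2000 to 0400 window is treated as including 2000 and excluding 0400 (an assumption, since the text does not say), and a value of 2400 is mapped to hour 0 as a defensive measure.

```python
def departure_hour(hhmm: int) -> int:
    """Extract the hour from a local time stored as hhmm (e.g. 1745 -> 17)."""
    return (hhmm // 100) % 24   # maps a possible 2400 to hour 0

def in_low_delay_window(hhmm: int) -> bool:
    """True if the departure falls in the 2000-0400 window named in the text."""
    hour = departure_hour(hhmm)
    return hour >= 20 or hour < 4
```

For example, `in_low_delay_window(2130)` is true and `in_low_delay_window(1700)` is false.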
Our last research question asks how well departure delays predict arrival delays. To address this, we draw out the different combinations of “Arrival delays” and “Departure delays”.
The pie chart in Figure 23 below shows that in almost 77% of the cases, either there is a departure delay (DepDelL = ‘1’) and an arrival delay (ArrDelL = ‘1’), or the departure is on time (DepDelL = ‘0’) and there is no arrival delay (ArrDelL = ‘0’).
Figure 23 – Arrival and Departure Delays Dependency
Interpreting the Phi Coefficient
To better answer our last research question, we calculated the phi coefficient, first introduced by Karl Pearson.
The phi coefficient is a measure of the degree of association between two binary variables, in our case: Arrival delays and Departure delays. This measure is similar to the correlation coefficient in its interpretation.
The phi coefficient is a symmetrical statistic, which means the independent variable (Departure delays) and dependent variable (Arrival delays) are interchangeable.
The interpretation of the phi coefficient is similar to that of the Pearson Correlation Coefficient. The range is from -1 to 1, where:
 0 is no relationship.
 1 is a perfect positive relationship: most of our data falls along the diagonal cells.
 -1 is a perfect negative relationship: most of our data is not on the diagonal.
In this case our phi coefficient is 0.53, which shows a weak positive association between Arrival delays and Departure delays. As such, we can say that Departure delays are one of the positive influencing factors on Arrival delays.
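For reference, the phi coefficient of a 2x2 table is the difference between the products of the diagonal and off-diagonal cells, divided by the square root of the product of the four marginal totals. The Python sketch below computes it together with the share of flights on the diagonal; the function names are illustrative and the counts are hypothetical, not the paper's data.

```python
import math

def phi_coefficient(n11, n10, n01, n00):
    """Phi for a 2x2 table; n11 = both delayed, n00 = both on time."""
    num = n11 * n00 - n10 * n01
    den = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return num / den if den else 0.0

def agreement_rate(n11, n10, n01, n00):
    """Share of flights on the diagonal (departure and arrival flags match)."""
    return (n11 + n00) / (n11 + n10 + n01 + n00)

# Hypothetical counts chosen for easy arithmetic.
phi = phi_coefficient(40, 10, 10, 40)    # (1600 - 100) / 2500
agree = agreement_rate(40, 10, 10, 40)   # 80 of 100 flights agree
```

A table with all counts on the diagonal gives 1, and one with all counts off the diagonal gives -1, matching the interpretation listed above.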
The first step towards building our model is to start with feature selection. As mentioned before, Boruta package in R was used to perform the feature selection. Please refer to Step 3 – Feature Selection, for more details on the Boruta Package. R code for feature selection is available here.
In order to predict Arrival Delays, we consider the following attributes in feature selection, out of which “DayofWeek” and “Origin” appear to be the least influential on delays. These are therefore excluded from the model.
The charts below illustrate and summarize the Boruta feature selection results. Figure 24a demonstrates all the features we ran the feature selection test on. “Departure delay” by far has the greatest impact on “Arrival Delay”. To have better visibility over all other attributes and their level of importance, Figure 24b shows a zoomed in version of Figure 24a. The green boxplots are our confirmed features and the red ones are the rejected features. “Day of week” and “Origin” are not significantly affecting “Arrival delay”.
Figure 24a – Boruta Result Plot
Figure 24b – Boruta Result Plot (Zoomed In)
At this stage of our analysis and model selection, we aim to predict whether an aircraft will be delayed or not at the destination airport given the selected features:
 Departure Delay
 Arrival Time
 Departure Time
 Distance
 Carrier
 Month
 Destination
Put differently, we want to know if an aircraft will be delayed on “Arrival” or not. Since this is a classification problem, we have chosen the following models:
 Logistic Regression
 Naïve Bayes
 SVM (Support Vector Machines)
Step 4 – Model Selection
R code for model selection is available here.
Logistic Regression
Running logistic regression on our dataset with 7 features, using training and testing sets of 12,500 records each, gives around 50% accuracy, which is low. Before tuning and testing different features, we try the Naïve Bayes and SVM models as well.
| Method | Accuracy | Sensitivity | Specificity |
|---|---|---|---|
| Logistic Regression | 50.09% | 57.64% | 41.06% |

Table 2 – Logit Regression Model Results
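The three metrics reported in the results tables are all derived from a confusion matrix. The Python sketch below shows the standard definitions; the function name and the counts are hypothetical, and which class counts as "positive" is an assumption, since the text does not state it.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,      # share of all predictions that are right
        "sensitivity": tp / (tp + fn),      # true positive rate
        "specificity": tn / (tn + fp),      # true negative rate
    }

# Hypothetical counts chosen for easy arithmetic, not the paper's results.
m = classification_metrics(tp=60, fn=40, fp=30, tn=70)
```

Reporting sensitivity and specificity alongside accuracy shows whether a model is better at recognizing one class than the other.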
Naïve Bayes
Running Naïve Bayes on the same dataset resulted in a much lower accuracy of around 37%, which is not good enough.
| Method | Accuracy | Sensitivity | Specificity |
|---|---|---|---|
| Naïve Bayes | 36.99% | 38.05% | 35.73% |

Table 3 – Naïve Bayes Model Results
Support Vector Machines (SVMs)
Support-vector machines (SVMs) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier.
An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap (the margin around the separating hyperplane) that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on.
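The "which side of the gap" rule can be made concrete for the simplest, linear case. The Python sketch below is illustrative only: the weights `w` and offset `b` are hypothetical values rather than a trained model, and a kernel SVM such as the radial one selected later in this report replaces the dot product with a kernel function.

```python
import math

def svm_decision(w, b, x):
    """Signed score w.x + b; its sign says which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def svm_predict(w, b, x):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if svm_decision(w, b, x) >= 0 else -1

def margin_width(w):
    """Width of the gap between the two margin boundaries, 2 / ||w||."""
    return 2 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical 2-D separator: the line x1 + x2 = 3.
w, b = [1.0, 1.0], -3.0
label_a = svm_predict(w, b, [3.0, 2.0])   # above the line
label_b = svm_predict(w, b, [0.5, 1.0])   # below the line
```

Training an SVM amounts to choosing `w` and `b` so that this margin is as wide as possible while keeping the training examples on the correct sides.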
The first step is to find the best gamma and cost values for our SVM model. To do so, we run the tune.svm command on the selected parameters with 10-fold cross-validation on a sample of 12,500 data points. To select the best kernel, we ran SVM with all 4 kernel options, and “Radial” proved to be the best.
Based on the results, the best gamma and cost values are both equal to 0.01, which gives a best performance value of 0.2018586 (tune.svm reports this as a cross-validated error, so lower is better). The SVM model is now run with the best selected parameters.
Refer to Figure 25 for details of the output results above.
Figure 25 – SVM Performance on Parameter Selection Plot (10-fold Cross Validation)
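The tuning procedure described above, a grid over gamma and cost scored by k-fold cross-validation, can be sketched generically. This Python sketch is not the `tune.svm` implementation: the function names are hypothetical, the model fitting is left to a caller-supplied `cv_error` function, and the `toy_error` function is a placeholder used only to exercise the loop.

```python
import itertools

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def grid_search(n, k, gammas, costs, cv_error):
    """Return (best_mean_error, gamma, cost) over the parameter grid.

    cv_error(gamma, cost, train_idx, test_idx) must fit a model on
    train_idx and return its error on test_idx.
    """
    folds = k_fold_indices(n, k)
    best = None
    for gamma, cost in itertools.product(gammas, costs):
        errors = []
        for i, test_idx in enumerate(folds):
            train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
            errors.append(cv_error(gamma, cost, train_idx, test_idx))
        mean_error = sum(errors) / k
        if best is None or mean_error < best[0]:
            best = (mean_error, gamma, cost)
    return best

def toy_error(gamma, cost, train_idx, test_idx):
    """Placeholder error surface with its minimum at gamma = cost = 0.01."""
    return abs(gamma - 0.01) + abs(cost - 0.01)

best = grid_search(100, 10, [0.001, 0.01, 0.1], [0.01, 0.1, 1.0], toy_error)
```

In a real run, `cv_error` would train the SVM on the training folds and return the misclassification rate on the held-out fold, so the first element of `best` corresponds to the cross-validated error that the tuning step reports.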
Figure 25 above shows the result of training our SVM model on a training set of 12,500 records and testing it on 12,500 test data points. The training and testing sets are sampled independently and there is no overlap between the two sets.
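One way to guarantee that training and testing samples do not overlap is to draw all the required row indices in a single sample without replacement and then split them. The Python sketch below illustrates this; the function name and seed are hypothetical, and this is not necessarily how the authors' R code draws its samples.

```python
import random

def sample_disjoint(n_rows, train_size, test_size, seed=0):
    """Draw non-overlapping train and test row indices without replacement."""
    rng = random.Random(seed)
    picked = rng.sample(range(n_rows), train_size + test_size)
    return picked[:train_size], picked[train_size:]

# 14,130,317 is the cleaned record count stated earlier in the report.
train_idx, test_idx = sample_disjoint(14_130_317, 12_500, 12_500)
```

Because every index is drawn once, no record can appear in both sets, which keeps the reported test accuracy from being inflated by records the model has already seen.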
SVM results in an accuracy of 76.68%, i.e. the model correctly predicts whether an aircraft arrives on time or with delay in about 77% of the cases.
| Method | Accuracy | Sensitivity | Specificity |
|---|---|---|---|
| SVM | 76.68% | 83.21% | 68.74% |
Our prediction accuracy could potentially improve if we include other influencing factors such as “weather”. This influencing factor has been studied in the “Predicting Airline Delays” and “Multi-Factor Model for Predicting Delays at U.S. Airports” papers, as mentioned in the literature review section.
Step 5 – Tuning, Training and Testing
Comparing the accuracy of the three methods, SVM proves to be the leading model with an accuracy of 76.68%. This is higher than the accuracy obtained for the Logistic Regression and Naïve Bayes models, making SVM our preferred classification model. The results summarized below are all based on training and testing sets of 12,500 records each.
| Method | Accuracy | Sensitivity | Specificity |
|---|---|---|---|
| SVM | 76.68% | 83.21% | 68.74% |
| Logistic Regression | 50.09% | 57.64% | 41.06% |
| Naïve Bayes | 36.99% | 38.05% | 35.73% |

Table 5 – Overall Model Performance Comparison
Testing the model on different sizes of the test datasets varying between 12,500 and 125,000 does not have a significant impact on the accuracy of the model. Table 6 summarizes the different tests and trainings that were run using SVM. Introduction of new variables and adjustment of cost improved the accuracy significantly.
Table 6 – Summary of SVM Tests
Based on our assessment of the descriptive analytics performed, we can conclude the following:
Tuesdays, Wednesdays and Saturdays are the best days to take a flight.
April, May, September, October and November are months that experience a significantly smaller number of flight delays compared to other months.
Flights departing between 2000 and 0400 are less likely to arrive with delay.
Most of the above time factors are influenced by air traffic, so if the air traffic patterns shift, there is a likelihood these timings could be affected.
AA (American Airlines), WN (Southwest Airlines) and MQ (Envoy Air) are the airline carriers with the most delayed minutes.
These airlines also carry the most traffic and have the highest frequency of flights.
In almost 77% of the cases, either there is a departure delay and an arrival delay, or the departure is on time and there is no arrival delay. The phi coefficient is 0.53, which shows a weak positive association between Arrival delays and Departure delays. As such, we can say that Departure delays are one of the positive influencing factors on Arrival delays.
After assessing the 3 classification models, SVM is the preferred method. The SVM model was run on training and testing sets of 12,500 records each, resulting in an accuracy of 76.68%, i.e. the model correctly predicts whether an aircraft arrives on time or with delay in about 77% of the cases. The model parameters gamma and cost are both 0.01 and the kernel used is radial.
Our prediction accuracy could potentially improve if we include other strong influencing factors such as “weather” in our model.
[1] Predicting Airline Delays (Bandyopadhyay & Guerrero, 2012)
http://cs229.stanford.edu/proj2012/BandyopadhyayGuerreroPredictingFlightDelays.pdf
[2] Estimating Flight Departure Delay Distributions (Tu, Ball, & Jank, 2006)
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.132.1147&rep=rep1&type=pdf
[3] Predicting Departure Delays of US Domestic Flights (Cole & Donoghue, 2017)
https://srcole.github.io/assets/flight_delay/report.pdf
[4] Characterization and Prediction of Air Traffic Delays (Rebollo & Balakrishnan, 2014)
http://www.mit.edu/~hamsa/pubs/RebolloBalakrishnanTRC2014.pdf
[5] Predictive Modeling of Aircraft Flight Delay (Kalliguddi & Leboulluec, 2017)
http://www.hrpub.org/download/20171130/UJM312110417.pdf
[6] Analysis of Aircraft Arrival Delay and Airport On-Time Performance (Bai, 2006)
http://etd.fcla.edu/CF/CFE0001049/Bai_Yuqiong_200605_MS.pdf
[7] Multi-Factor Model for Predicting Delays at U.S. Airports (Xu, Sherry, & Laskey, 2008)
http://catsr.ite.gmu.edu/pubs/XuMultiFactorModelAirportDelaysTRBv6.pdf
[8] Flight Delay Prediction (Martinez, 2012)
https://www.researchcollection.ethz.ch/bitstream/handle/20.500.11850/153312/eth540401.pdf
[9] Modeling Flight Delays (Sauvestre, Duperier & Leaf, 2016)
http://cs229.stanford.edu/proj2016/report/DuperierSauvestreLeafModelingFlightDelaysreport.pdf
[10] Application of ML Algorithms to Predict Flight Arrival Delays (Kuhn & Jamadagni, 2017)
http://cs229.stanford.edu/proj2017/finalreports/5243248.pdf
[11] http://statcomputing.org/dataexpo/2009/thedata.html