Consumer vehicles have been demonstrated to be insecure; the addition of electronics to monitor and control vehicle functions has introduced complexity, resulting in basic security vulnerabilities. Although academic research demonstrated vulnerabilities in consumer automobiles years ago, the general public has only recently become aware of them. Modern automobiles contain more than 70 electronic control units (ECUs).
This paper proposes to use machine learning to support domain experts by sparing them from sifting through irrelevant data and instead pointing them to the important parts of the recordings. The basic idea is to learn the typical behavior from the available timing analysis and then to independently identify unexpected deviations and report them as anomalies. The main goal of our proposed model is to find a better model architecture and hyper-parameters. We used an LSTM autoencoder technique with varied hyper-parameters to find sophisticated anomalies.
Keywords: Anomaly Detection, SAE J1939, Heavy Vehicle Security
Vehicles are an integral part of our lives, and automobile technology has evolved over the past century to address our growing needs. Earlier, a driver had to manually control various functions in a vehicle, but many of these tasks have now been delegated to microcontrollers and electronic chips attached to the vehicle. Modern vehicles are a collection of Electronic Control Units (ECUs), sensors, and actuators. The ECUs take input from various sensors and perform mechanical actions through actuators. The CAN bus is a broadcast bus: each connected ECU pushes broadcast messages onto it. These broadcast CAN messages carry no explicit information about which ECU generated them, and any message available on the network is considered 'trusted' by default. As a result, any malicious message introduced into the network, whether by a malicious ECU or an attacker, will also be considered valid and can cause abnormal behavior. Apart from performing their own functions, ECUs must communicate with each other in order to work efficiently.
Past studies analyzed multiple attack vectors in vehicles and showed that Electronic Control Units (ECUs) could be compromised. A compromised ECU can be used by an attacker to inject malicious messages into the in-vehicle network through a physical connection. Heavy vehicles use a standardized protocol known as SAE J1939, implemented over a Controller Area Network (CAN) bus, through which the ECUs communicate with each other. The use of standardized protocols makes heavy vehicles susceptible to attacks.
Two different countermeasures have been introduced against these attacks: proactive and reactive. Proactive mechanisms focus on improving the protocols; they are not foolproof but can be remarkably effective. Techniques have been proposed to add message authentication to the protocol. Reactive mechanisms detect an attack, or an impending attack, reduce its impact on the victim's vehicle as early as possible, and provide a response mechanism to either block the attack or alert other systems.
The use of SAE J1939 makes it possible to convert the raw messages transmitted on the CAN bus into specific parameters of the vehicle. Thus, we define a machine learning model based on low-level vehicular parameters. While each message contains information about the current state of the vehicle, it gives no information about previous states. To overcome this limitation, we append the history of previous values to each parameter value to support the learning model. In addition, some statistical derivative features have been added to give even deeper clues to the model.
A vehicle's parameters are categorized into groups in SAE J1939 based on, for example, frequency and sender. Thus, we created multiple models, one per parameter group, referred to as a Parameter Group Number (PGN) in the standard. The learning algorithms create a behavioral profile for each PGN that is later compared with the current behavior to detect any deviation from the regular pattern. We used a wide range of learning algorithms to train models and studied their performance.
The proposed approach integrates four modules to detect anomalies. BusSniffer connects to the bus and sniffs the messages. MessageDecoder takes messages from BusSniffer and converts them into raw values that characterize the vehicle's parameters. AttackDetector compares the current state with the appropriate trained model and triggers the AlarmGenerator if a threat exists. Based on these modules, we can generate real-time alarms and thereby provide security on top of the protocol.
Fig 1: Example SPN layout for the "Engine Temperature" PGN 
The rest of the paper is organized as follows:
Section 2 – Background: the CAN and SAE J1939 protocols and existing defense mechanisms.
Section 3 – Adversary Threat Model of modern attacks.
Section 4 – Features that we use.
Section 5 – Detection Mechanism Architecture.
Section 6 – Building of Machine Learning Model.
Section 7 – Conclusion and Future work.
In this section, we discuss the CAN and SAE J1939 protocols, how they were introduced and evolved, and the defense mechanisms built on top of them.
Controller Area Network (CAN) is a serial network technology that was originally designed for the automotive industry, particularly for European cars, but has also become a popular bus in industrial automation and other applications. The CAN bus is mainly used in embedded systems and, as its name suggests, is a network technology that provides fast communication among microcontrollers, up to real-time requirements. CAN 1.0 was introduced at a time when neither the Internet nor computer viruses existed, so security was not a design concern. As a result, the CAN protocol does not address security.
SAE J1939 Standard: SAE J1939 is the open standard for networking and communication in the commercial vehicle sector. A number of standards are derived from SAE J1939; these standards use the basic description of J1939 and often differ only in their data definitions and adaptations of the physical layer. SAE J1939 defines five layers within the seven-layer OSI network model, including the CAN ISO 11898 specification, and uses only extended frames with a 29-bit identifier for the physical and data-link layers. Each PDU in the SAE J1939 protocol consists of seven fields: priority (P), extended data page, data page (DP), PDU format (PF), PDU specific (PS) (which can be a destination address, group extension, or proprietary), source address (SA), and data field. There is also a one-bit reserved field (R) for future use.
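As an illustration, the field layout above can be sketched in Python. This is a minimal sketch (not part of the proposed system): it splits a 29-bit J1939 identifier into its PDU fields and derives the PGN, treating PF values below 240 as PDU1 (destination-specific) and 240 or above as PDU2 (broadcast):

```python
def parse_j1939_id(can_id):
    """Split a 29-bit J1939 identifier into its PDU fields and derive the PGN."""
    priority = (can_id >> 26) & 0x7   # 3-bit priority
    edp      = (can_id >> 25) & 0x1   # extended data page
    dp       = (can_id >> 24) & 0x1   # data page
    pf       = (can_id >> 16) & 0xFF  # PDU format
    ps       = (can_id >> 8)  & 0xFF  # PDU specific (dest. addr. or group ext.)
    sa       = can_id & 0xFF          # source address
    if pf < 240:   # PDU1: PS is a destination address, not part of the PGN
        pgn = (edp << 17) | (dp << 16) | (pf << 8)
    else:          # PDU2: PS is a group extension and belongs to the PGN
        pgn = (edp << 17) | (dp << 16) | (pf << 8) | ps
    return {"priority": priority, "pgn": pgn, "sa": sa}
```

For example, the identifier 0x18FEEE00 decodes to priority 6, source address 0, and PGN 65262 (0xFEEE), the "Engine Temperature" group shown in Fig 1.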
Fig 2: Architecture of CAN and SAE J1939 protocols.
Proactive mechanisms: Proactive mechanisms focus on improving the protocols, but the CAN and SAE J1939 protocols do not support any authentication, which opens the door to a wide range of attacks. Even with an authentication mechanism on the CAN bus, the maximum payload length is just 8 bytes, so the space for a MAC (Message Authentication Code) is very limited.
Reactive mechanisms: Reactive mechanisms detect an attack, or an impending attack, reduce its effect on the victim's vehicle as early as possible, and provide a response mechanism to either block the attack or alert other systems.
Different machine learning algorithms have been used to detect anomalies in packets and packet sequences. Long Short-Term Memory (LSTM) networks came into consideration because they operate on sequences of inputs from the datasets. One LSTM layer has as many cells as there are time steps. The objective of the autoencoder network is to reconstruct the input and classify poorly reconstructed samples as rare events.
3. THREAT MODEL
Attackers can easily compromise ECUs and thereby exploit new vulnerabilities. There is more incentive for an adversary to attack the heavy vehicle industry due to the size of the vehicles and the assortment of goods they carry. Our adversary can be anybody who stands to profit from controlling vehicles, be it by hijacking their merchandise, adversely controlling a competitor's fleet, extorting fleet owners and drivers, or selling their tools and services on the black market. Another sort of adversary we consider is one who wishes to cause as much damage and harm as possible, such as a terrorist. We assume that our adversary has the ability to transmit arbitrary messages on the vehicle's J1939 bus. This is most readily accomplished with physical access to the vehicle through the OBD port. We assume that the adversary can receive messages on the CAN bus and can generate SAE J1939-compatible messages with a chosen frequency, data, and priority. Attackers can take control of the message priority and can block the lowest-priority messages on the bus. This affects the functions and integrity of the system during exploitation.
On the other hand, a more sophisticated attacker could inject malware into other ECUs. These attacks occur at the CAN level and apply to both regular and heavy vehicles. The most common attack against the CAN network is a DoS attack. In this attack, the adversary sends unauthorized messages with the highest priority and frequency to dominate the bus. Thus, sending or receiving messages becomes delayed or even impossible. In a different attack, an adversary may monitor the CAN bus and target a specific activity of the vehicle. Whenever the adversary sees a message related to that activity, it sends a counter message to make the previous action ineffective. In this case, an attacker can either dominate the original engine ECUs with a higher-priority message or can send an incorrect value for a particular parameter after seeing it on the bus.
No attack data is freely available to use as a benchmark. So, we simulated modern attack messages and injected them into the logged file to check whether our detection component could find them. During our proposed attack, we maliciously changed the vehicle's parameters (such as the current speed) multiple times.
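A minimal sketch of such an injection step, under the assumption that the log has already been decoded into records with `pgn` and `value` fields (the record layout and function name are illustrative, not the exact tooling we used):

```python
import random

def inject_attack(log, target_pgn, tampered_value, n_injections, seed=0):
    """Overwrite n_injections records of target_pgn in a decoded log
    with a malicious parameter value, labeling them as ground truth."""
    rng = random.Random(seed)
    attacked = [dict(rec) for rec in log]   # copy; keep the clean log intact
    candidates = [i for i, rec in enumerate(attacked) if rec["pgn"] == target_pgn]
    for i in rng.sample(candidates, n_injections):
        attacked[i]["value"] = tampered_value
        attacked[i]["label"] = "attack"     # ground truth for evaluation
    return attacked
```

The labeled copy can then be fed to the detector, while the clean log serves as the normal baseline.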
4. DEFENSE MECHANISM
In our proposed model, the performance relies on the choice of features and how they are implemented. We define three feature types in this paper: SPN values, history of values, and derivative features.
SPN Values: SPN-value features are obtained by decoding messages on the CAN bus. We convert the raw messages into SPN values.
History of values: The value of each SPN depends on both the current vehicle's parameters and their past values. The classifier needs past samples to make a more accurate decision. To this end, we append the previous SPN values to each vector. As such, each vector holds the values of the current state and also includes the last reported values for each SPN.
Derivative Features: To give more detailed insight, we add multiple derivative features to the vector: the average, standard deviation, and slope over the last n values. We add history for these features as well. The new derivative features help the classifiers make more precise predictions.
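A sketch of how such a feature vector could be assembled with NumPy (the window length n and the function name are illustrative):

```python
import numpy as np

def feature_vector(values, n):
    """Last n SPN values plus derivative features: mean, standard
    deviation, and least-squares slope over the window."""
    window = np.asarray(values[-n:], dtype=float)
    slope = np.polyfit(np.arange(n), window, 1)[0]  # fitted per-step change
    return np.concatenate([window, [window.mean(), window.std(), slope]])
```

For a steadily rising SPN such as [1, 2, 3, 4], the appended derivatives are mean 2.5, standard deviation ≈ 1.118, and slope 1.0.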
Fig 3: SPN and PGN Data bit fields
5. Detection Mechanism Architecture
The proposed architecture consists of four separate modules: BusSniffer, Message Decoder, AttackDetector, and AlarmGenerator.
BusSniffer interfaces with the CAN bus through an access point such as the OBD-II port. This port connects directly to the CAN bus and captures all messages transmitted on it.
MessageDecoder uses the SAE J1939 standard to convert the raw messages into SPN values, thereby creating an initial vector of the vehicle's parameters. This module adds other metadata fields, including timestamp, length of the data field, source address, destination address, and the previously defined features such as derivative features and history of feature values.
AttackDetector consists of two phases: training and detection. The training phase requires preparing a dataset of regular and abnormal messages for every PGN. Multiple classifiers can be trained on the dataset, and the classifier that performs best is used. The training phase may take a long time, but the trained classifiers can be used countless times without retraining.
In the detection phase, whenever a new vector comes in, the AttackDetector fetches the PGN value from the vector and sends it to the designated classifier object. The classifier then tests whether it is a normal vector. If the classifier detects an abnormal message, it triggers the AlarmGenerator module. AlarmGenerator is responsible for preparing alarm messages using SAE J1939 and transmitting them over the CAN bus. The message is generated as a broadcast message, so all connected nodes become aware of the abnormal situation. This can also include turning on a warning light on the dashboard to notify the driver.
Fig 4: Architecture of proposed detection mechanism
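One simple way to realize the AttackDetector's decision step, assuming an autoencoder-style model, is to threshold the per-vector reconstruction error. The mean-plus-k-sigma rule below is an illustrative choice, not necessarily what a deployment would use:

```python
import numpy as np

def pick_threshold(normal_errors, k=3.0):
    """Set the alarm threshold from reconstruction errors measured on
    normal traffic: mean + k standard deviations."""
    e = np.asarray(normal_errors, dtype=float)
    return e.mean() + k * e.std()

def detect(errors, threshold):
    """Flag every vector whose reconstruction error exceeds the threshold."""
    return np.asarray(errors) > threshold
```

Every flagged vector would then be handed to the AlarmGenerator for broadcast.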
6. Building Machine Learning Model
Our experiment takes place in four different phases to get the desired outcome. They are:
- Gathering the Datasets
- Data Pre-processing
- Building the Machine Learning Model
- Building the Architecture
6.1 Gathering the Datasets
We used several CAN bus message logs that were generated previously. We required a lot of data, with many messages in each log, and we also require PGNs. A Parameter Group Number (PGN) is part of the 29-bit identifier sent with each log message. The PGN is a combination of the reserved bit, the data page bit, the PDU Format (PF), and the PDU Specific (PS). We also need SPNs in our project. A Suspect Parameter Number (SPN) is a number assigned to a specific parameter within a parameter group. SPNs with similar characteristics are grouped into PGNs. Since the log messages contained many PGNs, we had ample instances for the training and testing phases. Machine learning techniques can thus be used to develop behavioral profiles of the PGNs generated by the ECUs.
6.2 Data Pre-processing
Data pre-processing produces three different data splits: the training set, the validation set, and the test set.
Training dataset: The training dataset is the sample of data that we use to fit the model.
Validation set: We use the validation set to fine-tune the model's hyper-parameters.
Test dataset: The test dataset is the final data used to evaluate the model after it has been fit on the training set and tuned on the validation set.
Fig 5: Training set, Validation set, Test set Process
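For time-ordered bus traffic, a chronological split (rather than a random shuffle) keeps future messages out of the training set. A minimal sketch, with illustrative 70/15/15 fractions:

```python
def chronological_split(records, train_frac=0.70, val_frac=0.15):
    """Split time-ordered records into training, validation, and test
    sets without shuffling, so no future traffic leaks into training."""
    n = len(records)
    i = int(n * train_frac)
    j = int(n * (train_frac + val_frac))
    return records[:i], records[i:j], records[j:]
```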
6.3 Building the Machine Learning Model
To build our model we use LSTMs. Long Short-Term Memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in deep learning that is capable of learning long-term dependencies. Unlike standard feedforward networks, an LSTM has feedback connections. LSTMs are used on image, speech, and video sequence data. Our proposed model is a sequential model, so we chose an LSTM. LSTMs can remember information for long durations. They have the capacity to remove or add information to the cell state, carefully regulated by structures called gates.
Our proposed model works on sequential data, so we used an encoder-decoder LSTM architecture. Our method uses a multi-layered Long Short-Term Memory (LSTM) network to encode the input into a vector and then a deep LSTM to decode the output from that vector. The core of our experiments involved training a large, deep LSTM autoencoder. The LSTM is capable of resolving long-term dependencies, and it works more efficiently when the source sequence is reversed. The LSTM autoencoder first compresses the input data and then uses a repeat-vector layer; the final output layer reconstructs the input data. LSTMs trained on reversed source data performed much better on long sequences than LSTMs trained on the raw data. We found that LSTM models are easy to train and give effective results.
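A hedged sketch of such an encoder-decoder LSTM autoencoder in Keras (the layer sizes, window length, and feature count are placeholders, not the tuned values from our experiments):

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

timesteps, n_features = 10, 7   # illustrative history window and feature count

model = Sequential([
    Input(shape=(timesteps, n_features)),
    LSTM(64),                            # encoder: compress window to a vector
    RepeatVector(timesteps),             # feed that vector to every decoder step
    LSTM(64, return_sequences=True),     # decoder: unroll back into a sequence
    TimeDistributed(Dense(n_features)),  # reconstruct each input time step
])
model.compile(optimizer="adam", loss="mse")
```

The model is then fit on normal traffic only, so abnormal windows reconstruct poorly and stand out by their reconstruction error.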
Our LSTM first decides what information to throw away from the cell state. This decision is made by a sigmoid layer called the "forget gate layer." It looks at h_{t−1} and x_t, and outputs a number between 0 and 1 for each number in the cell state C_{t−1}.
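A numerical illustration of that gate (the weights here are toy values, not learned ones): the forget activation f_t = sigmoid(W_f · [h_{t−1}, x_t] + b_f) yields one value in (0, 1) per cell-state entry.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(W_f, b_f, h_prev, x_t):
    """f_t = sigmoid(W_f . [h_{t-1}, x_t] + b_f): how much of each
    entry of C_{t-1} to keep (near 1) or forget (near 0)."""
    return sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)
```

With all-zero weights and biases the pre-activation is zero, so every cell-state entry is kept at exactly half strength, sigmoid(0) = 0.5.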
6.4 Building the Architecture
In our architecture we use three LSTM layers, one input layer, and one output layer. We use the sigmoid function inside the LSTMs specifically because it serves as the gating function for the three gates (input, output, forget): since its outputs are always between 0 and 1, it can either block the flow of information or let it pass completely through the gates. The activation function we use elsewhere is the ReLU (rectified linear unit) activation function.
Fig 6: LSTM Layers and their functions.
Mathematically, it is defined as y = max(0, x). We use this activation function because it is computationally cheap and allows our model to train quickly.
Fig 7: Activation Function of ReLU
For compilation, we use a loss function and an optimizer. The loss function used is Mean Squared Error (MSE). The functions that are minimized are called "loss functions." A loss function is a measure of how well a prediction model is able to predict the expected outcome. The choice depends on a number of factors, including the presence of outliers, the choice of machine learning algorithm, the time efficiency of gradient descent, the ease of finding the derivatives, and the confidence of predictions. MSE is the mean of the squared distances between our target variable and the predicted values.
Fig 8: Loss function of MSE
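The loss itself is a one-liner; a NumPy sketch matching the definition above:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the mean of the squared distances between
    the target values and the predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))
```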
The optimizer we used in the model is Adam. Adam is an adaptive learning rate method; it computes individual learning rates for different parameters. Adam uses estimates of the first and second moments of the gradient to adapt the learning rate for each weight of the neural network. Adam is an optimization algorithm that can be used to update network weights during training. Using Adam allows the model to produce results quickly and effectively.
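A single Adam update can be sketched as follows (the coefficients are the algorithm's usual defaults; the function shape is illustrative, not a framework API):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: bias-corrected first/second moment estimates
    give each weight its own effective learning rate."""
    m = b1 * m + (1 - b1) * grad          # running mean of gradients
    v = b2 * v + (1 - b2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - b1 ** t)             # bias correction for the mean
    v_hat = v / (1 - b2 ** t)             # bias correction for the variance
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

Note that on the very first step the bias corrections cancel the gradient's scale, so the weight moves by roughly the base learning rate.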
In this phase, we start fitting the collected data. The challenge is that we must not overfit the model, so we use hyper-parameter tuning. Hyper-parameter tuning means setting the values that control the learning process before training begins. Hyper-parameters are passed as arguments to the constructors of the model classes; the values of the other parameters are then learned. Hyper-parameter tuning finds a tuple of hyper-parameters that yields an optimal model, minimizing a predefined loss function on given independent data. Too many epochs can lead to overfitting on the training dataset, so we used an early-stopping function.
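The early-stopping rule can be sketched as a plain loop (the min_delta and patience values are illustrative):

```python
def early_stopping_epoch(val_losses, min_delta=1e-4, patience=3):
    """Return the epoch at which training stops: the first epoch after
    the validation loss has failed to improve by at least min_delta
    for `patience` consecutive epochs."""
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if best - loss > min_delta:   # meaningful improvement
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1        # never triggered: train to the end
```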
The number of epochs and the batch size determine the accuracy and performance of the model, so we carefully adjusted both. We used a limited number of epochs with carefully chosen batch sizes. In our model, we also set a min_delta value for early stopping because of the overfitting problem.
For each dataset, which includes several PGNs, we trained a separate model for each PGN. With the help of the LSTM autoencoder, it is easy to remember past data, which saves a lot of evaluation time and also increases accuracy. Considering the data from various aspects made it easy to find minor anomalies, and security breaches are covered properly. The rate of false positives is significantly low in our model, which benefits accuracy and consistency. The LSTM autoencoder can test many kinds of datasets and parameters, which contributes to the present machine learning scenario.
7. CONCLUSION AND FUTURE WORK
In this work, we showed that a large, deep LSTM autoencoder trained on limited datasets can outperform a standard baseline system whose results are much more diverse and approximate. The success of our simple LSTM-based approach on sequential data provides evidence that it can produce good results on other sequence learning problems, provided there is enough data to train with.
Learning the normal behavior of devices is an important step in finding anomalies in heavy vehicles. Given our results, many modifications could still be made to obtain better results, by training and testing different kinds of datasets in all kinds of aspects. We should experiment with different possibilities and with fine-tuning of different hyper-parameters. The use of LSTMs took our experiment to a different level, and there is still much further work to be done.
It is sensible to expect that, given more time, many adversaries could craft even more sophisticated attacks. With Bluetooth, cellular, and Wi-Fi, modern trucks are becoming much more connected to the outside world, which introduces new attack vectors. We therefore suggest that these ideas be implemented effectively in order to prevent large-scale attacks on heavy-vehicle security.
REFERENCES
Sutskever, I., Vinyals, O., & Le, Q. V. (n.d.). Sequence to Sequence Learning with Neural Networks.
Cho, K., van Merriënboer, B., et al. (n.d.). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. (p. 11).
Macher, G., et al. (n.d.). Integrated Safety and Security Development in the Automotive Domain.
SAE J1939, Digital Annex, 201. (n.d.).
Narayanan, S. N., et al. (n.d.). OBD_SecureAlert: An Anomaly Detection System for Vehicles.
Shirazi, H. (n.d.). Using Machine Learning to Detect Anomalies in Embedded Networks in Heavy Vehicles.
Theissler, A. (2014). Anomaly detection in recordings from in-vehicle networks.
Burakova, Y., et al. (n.d.). Truck Hacking: An Experimental Analysis of the SAE J1939 Standard.
Zhang, M., et al. (2017). SafeDrive: Online Driving Anomaly Detection From Large-Scale Vehicle Data.