PREDICTING SEQUENTIAL DATA USING
AN LSTM NEURAL NETWORK
Bùi Quốc Khánh*
*Hanoi University
Summary: Neural networks are now widely used across scientific fields to improve productivity and quality of life. One common application is predicting the outcome of a production process from data collected in earlier periods. This paper uses monthly milk production data from a dairy plant to forecast milk production over a future period. The data are split into two parts, one for training (training data) and one for testing (testing data); because the data are ordered in time, the testing data cover time points that come after those in the training data. The author then uses an LSTM neural network, supported by the scikit-learn and Keras packages (well-known toolkits for machine learning algorithms), to predict the production values of the testing data. Prediction quality is evaluated with the root mean squared error (RMSE): the lower the error, the more accurate the prediction. To make the comparison easy to see, the author plots the actual and predicted values on a chart. The results show that the actual and predicted values are approximately equal, indicating that LSTM is effective for time-ordered sequential data and is a good foundation for developing similar problems with greater practical applicability.
Abstract: This paper examines the application of the Long Short-Term Memory (LSTM) neural network to predicting temporal data using Keras. The performance of the prediction is evaluated with the Root Mean Squared Error (RMSE), and a visualization of the results is presented.
Keywords: Artificial Neural Networks (ANN), Sequential Data, Long Short-Term Memory
(LSTM), Keras
SEQUENCE PREDICTION USING LONG SHORT-TERM MEMORY NEURAL NETWORK
I. INTRODUCTION
With the development of computer science over the last two decades, Artificial Neural Network (ANN) models have been widely used in various areas of science and engineering because of the simplicity of their model structure. Researchers have applied various neural network techniques, either alone or in combination with process-based models, to reduce errors and improve prediction accuracy. Doganis et al. [1] proposed a hybrid model combining a radial basis function (RBF) neural network architecture with a specially designed genetic algorithm, and showed that this AI-based approach achieved significant progress in modeling and forecasting fresh milk sales data provided by a major dairy manufacturer. This suggests the potential of ANNs for modeling and forecasting milk production from time series data.
Recently, owing to breakthroughs in computational science, deep learning or deep neural network (DNN) methods based on ANNs have received growing interest from scientists in both academia and practice [2]. In particular, the Long Short-Term Memory (LSTM) neural network, one of the state-of-the-art DNN architectures, has been successfully applied in various fields, especially to time-sequence problems, such as speech recognition [3], machine translation [4] [5], language modeling [6], tourism [7] [8], stock prediction [9], and rainfall-runoff simulation [10] [11]. These studies suggest that LSTM-based models may also be applicable to milk production forecasting.
II. METHODOLOGY
A. Dataset
The dataset used in this paper is the record of monthly milk production in the UK from January 1962 to December 1975. It has two features: Month, in yyyy-mm format, and Production, measured in pounds. The dataset is split into a training set and a test set, used for the training and test phases, respectively. The data contain 168 observations in total (14 years × 12 months) and are clean, which makes them suitable for research purposes.
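As a minimal sketch of reading data in the format described above, the hypothetical helper below parses a two-column CSV with a Month column in yyyy-mm form and a Production column in pounds. The column names, the function name `load_monthly_production`, and the two placeholder rows are assumptions for illustration; they are not taken from the actual dataset file.

```python
import csv
import io
from datetime import date


def load_monthly_production(lines):
    """Parse a CSV with a Month (yyyy-mm) column and a Production (pounds) column.

    Hypothetical helper: the header names are assumed from the dataset description.
    `lines` can be an open text file or an io.StringIO object.
    """
    months, production = [], []
    for row in csv.DictReader(lines):
        year, month = row["Month"].split("-")
        months.append(date(int(year), int(month), 1))  # first day of that month
        production.append(float(row["Production"]))     # pounds
    return months, production


# Placeholder rows only, not values from the real dataset.
sample = io.StringIO("Month,Production\n1962-01,100\n1962-02,110\n")
months, production = load_monthly_production(sample)
print(months, production)
```

With the real file, the same function would return 168 months and 168 production values.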
B. Technology
The model has been implemented using TensorFlow, Google's framework for distributed machine learning, with Keras running on top of TensorFlow. Keras was chosen because it is designed for fast prototyping and experimentation with a simple API. It allows neural networks to be configured in a modular way by combining different layers, activation functions, loss functions, optimizers, and so on. Keras provides out-of-the-box solutions for most standard deep learning building blocks. However, for a custom or novel implementation the Keras API can be quite limited, and lower-level TensorFlow may be a better choice. For evaluation, several metrics were used, partly taken from the scikit-learn library for Python.
C. Long Short-Term Memory (LSTM) Neural Networks
A typical LSTM network is composed of memory blocks called cells. Two states are passed on to the next cell: the cell state and the hidden state. The cell state is the main chain of data flow, which allows data to flow forward essentially unchanged, although some linear transformations may occur. Data can be added to or removed from the cell state via sigmoid gates. A gate is similar to a layer or a series of matrix operations with its own individual weights. LSTMs are designed to mitigate the long-term dependency problem (the difficulty plain recurrent networks have in retaining information over many time steps) because they use gates to control the memorizing process.
Figure 1. The structure of the Long Short-Term Memory (LSTM) neural network.
Reproduced from Yan [12]
The first step in an LSTM cell is to identify information that is no longer required and should be removed from the cell state. This decision is made by a sigmoid function, which takes the output of the previous LSTM unit (ht−1) at time t−1 and the current input (Xt) at time t, and determines which parts of the old cell state should be discarded. This gate is called the forget gate (ft), where ft is a vector with values between 0 and 1, one for each number in the cell state Ct−1; a value of 0 means "forget completely" and 1 means "keep completely".
$$f_t = \sigma(W_f [h_{t-1}, X_t] + b_f)$$
Herein, σ is the sigmoid function, and Wf and bf are the weight matrices and bias,
respectively, of the forget gate.
The next step decides which information from the new input (Xt) to store in the cell state, and updates the cell state. This step has two parts: a sigmoid layer (the input gate, it) and a tanh layer. The sigmoid layer decides how much of the new information should be updated or ignored (values between 0 and 1), and the tanh layer creates candidate values (Nt) between −1 and 1, weighting their level of importance. The two are multiplied element-wise, and this new memory is added to the old memory Ct−1, scaled by the forget gate, resulting in Ct.
$$i_t = \sigma(W_i [h_{t-1}, X_t] + b_i)$$
$$N_t = \tanh(W_n [h_{t-1}, X_t] + b_n)$$
$$C_t = C_{t-1} \odot f_t + N_t \odot i_t$$
Here, Ct−1 and Ct are the cell states at times t−1 and t, while Wi, bi and Wn, bn are the weight matrices and biases of the input gate and the tanh (candidate) layer, respectively.
In the final step, the output (ht) is a filtered version of the cell state. First, a sigmoid layer (the output gate, Ot) decides which parts of the cell state make it to the output. Next, the cell state Ct is passed through tanh, which maps it to values between −1 and 1, and the result is multiplied by Ot.
$$O_t = \sigma(W_o [h_{t-1}, X_t] + b_o)$$
$$h_t = O_t \odot \tanh(C_t)$$
Here, Wo and bo are the weight matrices and bias, respectively, of the output gate.
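The gate equations above can be traced in a small NumPy sketch of one forward step through a single cell. It is illustrative only, not the Keras implementation used later: each W is assumed to act on the concatenation [ht−1, Xt], the weights are random and untrained, and the `params` dictionary layout and the function name `lstm_step` are hypothetical. Keras stores equivalent weights split into an input kernel and a recurrent kernel, but the arithmetic is the same.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """Compute one LSTM time step from the forget, input, candidate and output equations.

    params maps "f", "i", "n", "o" to (W, b); each W has shape
    (hidden, hidden + n_inputs) and multiplies the concatenation [h_{t-1}, X_t].
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, X_t]
    W_f, b_f = params["f"]
    W_i, b_i = params["i"]
    W_n, b_n = params["n"]
    W_o, b_o = params["o"]
    f_t = sigmoid(W_f @ z + b_f)             # forget gate, values in (0, 1)
    i_t = sigmoid(W_i @ z + b_i)             # input gate, values in (0, 1)
    n_t = np.tanh(W_n @ z + b_n)             # candidate values in (-1, 1)
    c_t = c_prev * f_t + n_t * i_t           # C_t = C_{t-1} * f_t + N_t * i_t (element-wise)
    o_t = sigmoid(W_o @ z + b_o)             # output gate
    h_t = o_t * np.tanh(c_t)                 # h_t = O_t * tanh(C_t)
    return h_t, c_t


# Illustration: 1 input feature and 4 hidden units, with random (untrained) weights.
rng = np.random.default_rng(0)
hidden, n_inputs = 4, 1
params = {g: (rng.normal(size=(hidden, hidden + n_inputs)), np.zeros(hidden))
          for g in ("f", "i", "n", "o")}
h, c = np.zeros(hidden), np.zeros(hidden)
for x in [0.1, 0.5, 0.9]:                    # a short, already-scaled input sequence
    h, c = lstm_step(np.array([x]), h, c, params)
print(h, c)
```

Because every entry of ht is a sigmoid output times a tanh output, it always lies strictly between −1 and 1, while the cell state Ct is not bounded in the same way.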
D. Model Evaluation Criteria
To evaluate the performance of forecasting models, the Nash-Sutcliffe efficiency (NSE) and the root mean squared error (RMSE) are statistical measures often used to compare predicted values with observed values. The NSE measures the ability to predict values that differ from the mean and gives the proportion of the initial variance accounted for by the model. The RMSE measures how closely the predicted values match the observed values, in the same units as the data.
$$\mathrm{NSE} = \left(1 - \frac{\sum_{i=1}^{n} (O_i - P_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2}\right) \times 100$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (O_i - P_i)^2}$$
In the above equations, Oi and Pi are the observed and predicted values at time step i, respectively; Ō is the mean of the observed values; and n is the total number of observations. The NSE ranges from −∞ to 1 (from −∞ to 100 when expressed as a percentage, as in the equation above), and an RMSE of zero implies a perfect fit. The LSTM model produces reliable results when the RMSE values are small and the NSE values are close to 1 (100%).
IMPLEMENTATION
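The two evaluation criteria defined in the previous subsection are straightforward to compute. The NumPy sketch below is one way to write them; the function names `rmse` and `nse` are hypothetical, and the `percent` flag only switches between the ×100 form of the NSE equation and the plain scale whose maximum is 1. For RMSE alone, taking the square root of scikit-learn's `mean_squared_error` gives the same value.

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the data."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((o - p) ** 2)))


def nse(observed, predicted, percent=True):
    """Nash-Sutcliffe efficiency; 1 (or 100 when percent=True) is a perfect fit."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    value = 1.0 - np.sum((o - p) ** 2) / np.sum((o - o.mean()) ** 2)
    return float(value * 100 if percent else value)


obs = [10.0, 20.0, 30.0]
print(rmse(obs, obs), nse(obs, obs))              # a perfect prediction
print(rmse(obs, [20.0, 20.0, 20.0]))              # always predicting the mean
print(nse(obs, [20.0, 20.0, 20.0], percent=False))
```

Always predicting the mean of the observations gives an NSE of exactly 0, which is why NSE values near 1 (100%) indicate a model that does clearly better than that baseline.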
We define a dataset-creation function that takes two arguments: dataset, the NumPy array we want to convert into a supervised-learning dataset, and look_back, the number of previous time steps used as input variables to predict the next time period, which defaults to 1.
With this default, the function creates a dataset where X is the production at a given time (t) and Y is the production at the next time (t + 1).
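A minimal NumPy sketch of such a function follows. The name `create_dataset` and the exact loop bounds are assumptions, since the paper does not list its code. Here the loop runs to len(dataset) - look_back so that the last observation still appears as a target; some circulated versions of this helper stop one step earlier and silently drop that final pair.

```python
import numpy as np


def create_dataset(dataset, look_back=1):
    """Turn an (n, 1) series into supervised (X, y) pairs.

    X[k] holds the values at steps k .. k + look_back - 1 and y[k] is the value
    at step k + look_back; with look_back=1, X is the value at t and y at t + 1.
    """
    X, y = [], []
    for k in range(len(dataset) - look_back):
        X.append(dataset[k:k + look_back, 0])
        y.append(dataset[k + look_back, 0])
    return np.array(X), np.array(y)


series = np.arange(5, dtype=float).reshape(-1, 1)  # placeholder for the scaled series
X, y = create_dataset(series, look_back=1)
print(X.ravel(), y)
```

A series of n values yields n - look_back training pairs.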
LSTMs are sensitive to the scale of the input data, especially when the sigmoid or tanh activation functions are used. We therefore rescale the data to the range 0 to 1, which is also called normalization, using the MinMaxScaler preprocessing class from the scikit-learn library.
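In scikit-learn this step is typically written as `MinMaxScaler(feature_range=(0, 1))` followed by `fit_transform`, with `inverse_transform` used later to undo the scaling. To show what that computes without requiring scikit-learn, the hypothetical class `MinMax01` below reimplements the same 0-to-1 rescaling in NumPy; it is a stand-in for illustration, not the library's code.

```python
import numpy as np


class MinMax01:
    """Minimal stand-in for scikit-learn's MinMaxScaler(feature_range=(0, 1))."""

    def fit(self, data):
        data = np.asarray(data, dtype=float)
        self.min_ = data.min(axis=0)
        data_range = data.max(axis=0) - self.min_
        # Avoid division by zero for constant columns (they map to 0).
        self.range_ = np.where(data_range == 0, 1.0, data_range)
        return self

    def transform(self, data):
        return (np.asarray(data, dtype=float) - self.min_) / self.range_

    def inverse_transform(self, scaled):
        return np.asarray(scaled, dtype=float) * self.range_ + self.min_


values = np.array([[2.0], [4.0], [6.0]])   # placeholder production values
scaler = MinMax01().fit(values)
scaled = scaler.transform(values)
print(scaled.ravel(), scaler.inverse_transform(scaled).ravel())
```

Because predictions are later converted back to pounds, the fitted minimum and range must be kept. One design choice worth considering is fitting the scaler on the training portion only, so that the test set's minimum and maximum do not leak into training; the paper does not state which variant it used.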
After modeling the data and estimating the model's accuracy on the training dataset, we need an idea of the model's skill on new, unseen data. For an ordinary classification or regression problem, we would do this with cross-validation. With time series data, however, the order of the values matters, so we use a simple method: splitting the ordered dataset into training and test sets. We calculate the index of the split point and separate the data into a training set containing 67% of the observations, which we use to train the model, and a test set containing the remaining 33%. The LSTM network expects the input data (X) in a specific array structure of the form [samples, time steps, features]. Currently, our data is in the form [samples, features], and we are framing the problem as one time step for each sample. We can transform the prepared training and test input data into the expected structure using numpy.reshape().
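The chronological 67/33 split and the reshape to [samples, time steps, features] can be sketched as follows. The helper names `split_ordered` and `to_lstm_input` are hypothetical, and the 168-element array is a placeholder standing in for the scaled production series (it holds index values, not real data).

```python
import numpy as np


def split_ordered(dataset, train_fraction=0.67):
    """Chronological split: the first ~67% for training, the rest for testing (no shuffling)."""
    train_size = int(len(dataset) * train_fraction)
    return dataset[:train_size], dataset[train_size:]


def to_lstm_input(X):
    """Reshape [samples, features] into [samples, time steps, features] with one time step."""
    return np.reshape(X, (X.shape[0], 1, X.shape[1]))


series = np.arange(168, dtype=float).reshape(-1, 1)  # 168 monthly observations (placeholder)
train, test = split_ordered(series)
X_train = to_lstm_input(train[:-1])                  # inputs at time t for a look_back of 1
print(len(train), len(test), X_train.shape)
```

With 168 observations, int(168 × 0.67) gives 112 training and 56 test observations. No shuffling is done, so every test month lies after every training month.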
We build the LSTM network with a visible layer that takes one input, a hidden layer with four LSTM blocks (neurons), and an output layer that makes a single-value prediction. Once the model is fit, we estimate its performance on the training and test datasets. Before calculating error scores, we invert the scaling of the predictions so that performance is reported in the same units as the original data (pounds). The RMSE is then calculated for the model on both datasets.
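A Keras sketch of the network just described (one input per time step, a hidden LSTM layer with four units, and a one-unit output layer) is shown below. The loss function, optimizer, number of epochs, and batch size are not stated in the paper; mean squared error, Adam, 100 epochs, and a batch size of 1 are assumptions, and the function name `build_model` is hypothetical. It requires TensorFlow with its bundled Keras.

```python
from tensorflow import keras


def build_model(look_back=1, units=4):
    """One input per time step, a hidden LSTM layer with 4 units, one output value."""
    model = keras.Sequential([
        keras.Input(shape=(1, look_back)),   # [time steps, features] after the reshape
        keras.layers.LSTM(units),            # hidden layer of LSTM blocks
        keras.layers.Dense(1),               # single-value prediction
    ])
    # Assumed loss and optimizer; the paper does not specify them.
    model.compile(loss="mean_squared_error", optimizer="adam")
    return model


model = build_model()
model.summary()
# Training would then look like this (epochs and batch size are assumptions):
# model.fit(X_train, y_train, epochs=100, batch_size=1, verbose=2)
```

For a 4-unit LSTM with one input feature, `model.summary()` should report 101 trainable parameters: 4 × (4 × (1 + 4) + 4) = 96 in the LSTM layer and 4 + 1 = 5 in the dense layer.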
The training and test RMSE scores are similar, which suggests that the model generalizes reasonably well to unseen data rather than overfitting the training set. To visualize the predicted values against the original data, we use the matplotlib library: the original dataset is shown in blue, the predictions for the training dataset in orange, and the predictions on the unseen test dataset in green.
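Because the training and test predictions each start look_back steps into their own split, they have to be shifted onto the original time axis before being plotted against the full series. The hypothetical helper `align_for_plot` below does this with NaN padding, which matplotlib leaves as gaps. It assumes each split was windowed separately, so that the k-th prediction targets position k + look_back within its split.

```python
import numpy as np


def align_for_plot(n_total, n_train, look_back, train_pred, test_pred):
    """Place predictions on the original time axis, with NaN everywhere else."""
    train_plot = np.full(n_total, np.nan)
    test_plot = np.full(n_total, np.nan)
    # k-th training prediction targets index k + look_back of the full series.
    train_plot[look_back:look_back + len(train_pred)] = train_pred
    # Test windows start at the split point, so shift by n_train as well.
    start = n_train + look_back
    test_plot[start:start + len(test_pred)] = test_pred
    return train_plot, test_plot


# Toy example: 10 points, the first 6 used for training, look_back of 1.
train_plot, test_plot = align_for_plot(10, 6, 1, np.arange(5.0), np.arange(3.0))
print(train_plot)
print(test_plot)
```

Passing the original series, `train_plot`, and `test_plot` to three successive `plt.plot` calls reproduces the blue, orange, and green layout described above, since those are the first three colors of matplotlib's default color cycle.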
Figure 2. The comparison of predicted data versus the original and training ones
III. CONCLUSION AND FUTURE WORK
The experimental results indicate that LSTM is effective for predicting sequential data. More challenging tasks for LSTM networks, such as anomaly detection and text prediction, are of further interest and may require LSTM variants to achieve optimal results. In future work, hybrid neural network models will be explored to gain better insight, with the aim of applying these models to real-world problems.
REFERENCES
[1] P. Doganis, A. Alexandridis, P. Patrinos, H. Sarimveis, "Time series sales forecasting for short shelf-life food products based on artificial neural networks and evolutionary computing," Journal of Food Engineering, vol. 75, no. 2, pp. 196-204, 2006.
[2] Y. LeCun, Y. Bengio, G. Hinton, "Deep Learning," Nature, vol. 521, no. 7553, pp. 436-444, 2015.
[3] A. Graves, A. Mohamed, G. Hinton, "Speech recognition with deep recurrent neural networks," in IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013.
[4] K. Cho, B. van Merrienboer, Ç. Gülçehre, F. Bougares, H. Schwenk, Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," arXiv:1406.1078, 2014.
[5] I. Sutskever, O. Vinyals, Q. V. Le, "Sequence to sequence learning with neural networks," in 27th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014.
[6] T. Mikolov, A. Joulin, S. Chopra, M. Mathieu, M. A. Ranzato, "Learning longer memory in recurrent neural networks," arXiv:1412.7753, 2014.
[7] Y. Li, H. Cao, "Prediction for tourism flow based on LSTM neural network," Procedia Computer Science, vol. 129, pp. 277–283, 2018.
[8] Y. Duan, Y. Lv, F.-Y. Wang, "Travel time prediction with LSTM neural network," in IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), Rio de Janeiro, Brazil, 1–4 November 2016.
[9] D. M. Q. Nelson, A. C. M. Pereira, R. A. de Oliveira, "Stock market's price movement prediction with LSTM neural networks," in International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017.
[10] C. Hu, Q. Wu, H. Li, S. Jian, N. Li, Z. Lou, "Deep learning with a long short-term memory networks approach for rainfall-runoff simulation," Water, vol. 10, p. 1543, 2018.
[11] X.-H. Le, V. H. Ho, G. Lee, S. Jung, "A deep neural network application for forecasting the inflow into the Hoa Binh reservoir in Vietnam," in 11th International Symposium on Lowland Technology (ISLT 2018), Hanoi, Vietnam, 26–28 September 2018.
[12] S. Yan, "Understanding LSTM and Its Diagrams," 14 March 2016. [Online]. Available: https://medium.com/mlreview/understanding-lstm-and-its-diagrams-37e2f46f1714. [Accessed 21 March 2020].
[13] R. J. Frank, N. Davey, S. P. Hunt, "Time Series Prediction and Neural Networks," Journal of Intelligent & Robotic Systems, vol. 31, no. 1, pp. 99-103, 2001.
[14] A.-r. Mohamed, G. E. Dahl, G. Hinton, "Acoustic Modeling using Deep Belief Networks," IEEE Transactions on Audio, Speech, and Language Processing, 2012.
[15] L. Rabiner, B. Juang, "An Introduction to Hidden Markov Models," IEEE ASSP Magazine, vol. 3, no. 1, pp. 4-16, 1986.
[16] P. J. Werbos, "Backpropagation Through Time: What It Does and How to Do It," Proceedings of the IEEE, 1990.
[17] S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber, "Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies," 2001. [Online]. Available:
[18] Y. Bengio, P. Simard, P. Frasconi, "Learning Long-Term Dependencies with Gradient Descent is Difficult," IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157-166, 1994.
[19] R. Pascanu, T. Mikolov, Y. Bengio, "On the Difficulty of Training Recurrent Neural Networks," ICML, vol. 28, no. 3, pp. 1310–1318, 2013.
[20] F. A. Gers, J. Schmidhuber, F. Cummins, "Learning to forget: Continual Prediction with LSTM," Neural Computation, pp. 2451-2471, 2000.
[21] Y. LeCun, Y. Bengio, G. Hinton, "Deep learning," Nature, vol. 521, p. 436, 2015.