SEQUENCE PREDICTION USING LONG SHORT-TERM MEMORY NEURAL NETWORK
Bùi Quốc Khánh*
*Hanoi University
Abstract: Neural networks are now widely applied across the sciences to improve productivity and quality of life. One common application is predicting the outcome of a production process from data collected in an earlier period. This paper uses monthly milk production data from a dairy factory and, based on it, predicts milk production over a future period. The data are divided into two parts: one used for training (training data) and one used for testing (testing data); since the data are ordered in time, the testing data covers time points that lie in the future relative to the training data. The author then uses an LSTM neural network, supported by the sklearn package and Keras, a toolkit well known for its support of machine learning algorithms, to predict the production values of the testing data. The quality of the prediction is evaluated with the RMSE error measure: the lower the error, the more accurate the prediction. To make the comparison easy to read, the author plots the results as a chart comparing the actual and predicted values. The results show that the actual and predicted values are approximately equal, demonstrating that using LSTM for time-ordered sequential data is effective and provides a good foundation for developing similar problems of greater practical applicability.
Keywords: Artificial Neural Networks (ANN), Sequential Data, Long Short-Term Memory (LSTM), Keras
I. INTRODUCTION
With the development of computer science over the last two decades, Artificial Neural Network (ANN) models have been widely used in various areas of science and engineering because of the simplicity of their model structure. Researchers have applied various neural network techniques, using them alone or in combination with process-based models, to reduce errors and improve prediction accuracy. Doganis et al. [1] applied a hybrid model of the radial basis function (RBF) neural network architecture and a specially designed genetic algorithm, showing significant progress in forecasting and modeling sales data of fresh milk provided by a major manufacturer of dairy products. That work also points to the potential of ANNs for modeling and forecasting milk production from time-series data.
Recently, owing to breakthroughs in the field of computational science, deep learning or deep neural network (DNN) methods based on ANNs have received growing interest, both academic and practical, from scientists [2]. Moreover, the Long Short-Term Memory (LSTM) neural network, one of the state-of-the-art applications of DNNs, has been successfully applied in various fields, especially to time-sequence problems, such as speech recognition [3], machine translation [4] [5], language modeling [6], tourism [7] [8], stock prediction [9], and rainfall-runoff simulation [10] [11]. The LSTM studies listed above suggest that LSTM-based models have been used successfully in many fields and can be applied to milk production forecasting.
II. METHODOLOGY
A. Dataset
The dataset used in this paper is a report of monthly milk production from January 1962 to December 1975 in the UK. There are two fields: Month, given in yyyy-mm form, and Production, measured in pounds. The dataset is split into a training set and a test set to serve the training and testing phases, respectively. The data comprise 168 observations in total and are clean, which makes them suitable for research purposes.
B. Technology
The network has been implemented using TensorFlow, Google's framework for distributed machine learning, with Keras on top of TensorFlow. Keras was chosen because it is designed for fast prototyping and experimentation with a simple API. It allows NNs to be configured in a modular way by combining different layers, activation functions, loss functions, optimizers, and so on. Keras provides out-of-the-box solutions for most of the standard deep learning building blocks. However, for a custom or novel implementation, the Keras API can be quite limiting, and libraries like TensorFlow are a better choice. For evaluation, different metrics have been used, taken in part from the scikit-learn library for Python.
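As a brief illustration of this modular style, the following minimal sketch (our own example, not the model used later in this paper) combines layers, an activation function, a loss function, and an optimizer:

```python
# A minimal, illustrative Keras model: layers, an activation, a loss
# function, and an optimizer are combined as interchangeable modules.
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(8, activation="relu", input_shape=(1,)))  # one hidden layer
model.add(Dense(1))                                       # linear output
model.compile(optimizer="adam", loss="mean_squared_error")
model.summary()  # prints the layer-by-layer structure
```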
C. Long Short-Term Memory (LSTM) Neural Networks
A typical LSTM network is composed of memory blocks called cells. Two states are transferred to the next cell: the cell state and the hidden state. The cell state is the main chain of data flow, which allows the data to flow forward essentially unchanged, although some linear transformations may occur. Data can be added to or removed from the cell state via sigmoid gates. A gate is similar to a layer, a series of matrix operations with its own individual weights. LSTMs are designed to avoid the long-term dependency problem because they use these gates to control the memorizing process.
Figure 1. The structure of the Long Short-Term Memory (LSTM) neural network.
Reproduced from Yan [12]
The first step in constructing an LSTM network is to identify the information that is not required and will be omitted from the cell in that step. This process of identifying and excluding data is decided by the sigmoid function, which takes the output of the last LSTM unit, $h_{t-1}$, at time t−1 and the current input, $X_t$, at time t. The sigmoid function determines which part of the old output should be eliminated. This gate is called the forget gate, $f_t$, a vector with values ranging from 0 to 1, one for each number in the cell state $C_{t-1}$.
$f_t = \sigma(W_f \cdot [h_{t-1}, X_t] + b_f)$

Herein, $\sigma$ is the sigmoid function, and $W_f$ and $b_f$ are the weight matrix and bias, respectively, of the forget gate.
The following step is to decide which information from the new input $X_t$ to store in the cell state, and to update the cell state. This step consists of two parts: first a sigmoid layer and then a tanh layer. The sigmoid layer decides whether the new information should be updated or ignored (0 or 1), and the tanh function weights the values that pass through, deciding their level of importance (−1 to 1). The two values are multiplied to update the new cell state. This new memory is then added to the old memory $C_{t-1}$, gated by $f_t$, resulting in $C_t$.
$i_t = \sigma(W_i \cdot [h_{t-1}, X_t] + b_i)$

$N_t = \tanh(W_n \cdot [h_{t-1}, X_t] + b_n)$

$C_t = C_{t-1} \odot f_t + N_t \odot i_t$

Here, $C_{t-1}$ and $C_t$ are the cell states at times t−1 and t, while $W$ and $b$ are the weight matrices and biases, respectively, of the cell state.
In the final step, the output $h_t$ is based on the cell state $C_t$, but in a filtered form. First, a sigmoid layer decides which parts of the cell state make it to the output, producing $O_t$. Then, the cell state is passed through the tanh function, mapping its values to the range between −1 and 1, and multiplied by the output of the sigmoid gate $O_t$.

$O_t = \sigma(W_o \cdot [h_{t-1}, X_t] + b_o)$

$h_t = O_t \odot \tanh(C_t)$

Here, $W_o$ and $b_o$ are the weight matrix and bias, respectively, of the output gate.
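To make the three steps concrete, the following NumPy sketch writes out one forward step of an LSTM cell following the equations above; the weight layout (each $W$ as a single matrix applied to the concatenation $[h_{t-1}, X_t]$) and all names are our assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One forward step of an LSTM cell, following the equations above.

    W maps gate names ("f", "i", "n", "o") to matrices of shape
    (hidden_dim, hidden_dim + input_dim); b maps them to bias vectors.
    """
    z = np.concatenate([h_prev, x_t])   # the concatenation [h_{t-1}, X_t]
    f_t = sigmoid(W["f"] @ z + b["f"])  # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])  # input gate
    n_t = np.tanh(W["n"] @ z + b["n"])  # candidate memory N_t
    c_t = c_prev * f_t + n_t * i_t      # new cell state C_t
    o_t = sigmoid(W["o"] @ z + b["o"])  # output gate
    h_t = o_t * np.tanh(c_t)            # new hidden state h_t
    return h_t, c_t
```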
D. Model Evaluation Criteria
To evaluate the performance of forecasting models, the Nash-Sutcliffe efficiency (NSE) and the RMSE are statistical measures often used to compare predicted values with observed values. The NSE measures the ability to predict variables different from the mean and gives the proportion of the initial variance accounted for by the model. The RMSE is frequently used to evaluate how closely the predicted values match the observed values, based on the relative range of the data.
$NSE = \left( 1 - \frac{\sum_{i=1}^{n} (O_i - P_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2} \right) \times 100$
$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (O_i - P_i)^2}$
In the above equations, $O_i$ and $P_i$ are the observed and predicted values at time step i, respectively; $\bar{O}$ is the mean of the observed values; and n is the total number of observations. The NSE values range from −∞ to 1, and an RMSE equal to zero implies a perfect fit. The LSTM model produces reliable results when the RMSE values are small and the NSE values are approximately 1.
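As a sketch, the two criteria can be transcribed directly into code; here observed and predicted are assumed to be equal-length one-dimensional NumPy arrays:

```python
import numpy as np

def nse(observed, predicted):
    """Nash-Sutcliffe efficiency in percent, as defined above."""
    residual = np.sum((observed - predicted) ** 2)
    variance = np.sum((observed - observed.mean()) ** 2)
    return (1.0 - residual / variance) * 100.0

def rmse(observed, predicted):
    """Root mean squared error, as defined above."""
    return np.sqrt(np.mean((observed - predicted) ** 2))
```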
E. Implementation
We create a dataset function that takes two arguments: the dataset, a NumPy array that we want to convert into supervised-learning samples, and look_back, the number of previous time steps to use as input variables to predict the next time period, defaulted to 1 in this case. This default creates a dataset where X is the quantity of the item at a given time t and Y is the quantity of the item at the next time t + 1, as in the sketch below.
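A minimal sketch of such a function, with the conventional name create_dataset assumed:

```python
import numpy as np

def create_dataset(dataset, look_back=1):
    """Convert an array of values into X/Y pairs, where X holds the
    previous look_back time steps and Y the value at the next step."""
    data_x, data_y = [], []
    for i in range(len(dataset) - look_back - 1):
        data_x.append(dataset[i:i + look_back, 0])  # inputs at time t
        data_y.append(dataset[i + look_back, 0])    # target at time t + 1
    return np.array(data_x), np.array(data_y)
```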
LSTMs are sensitive to the scale of the input data, specifically when the sigmoid or tanh activation functions are used. We therefore rescale the data to the range of 0 to 1, which is also called normalizing. We normalize the dataset using the MinMaxScaler preprocessing class from the scikit-learn library, as sketched below.
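A sketch of the normalization step, assuming dataset is a float NumPy array of shape (168, 1) holding the monthly production values:

```python
from sklearn.preprocessing import MinMaxScaler

# Fit the scaler on the data and map all values into [0, 1]
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
```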
After we model our data and estimate the accuracy of our model on the training dataset, we need to get an idea of its skill on new, unseen data. For a normal classification or regression problem, we would do this using cross-validation. With time series data, however, the sequence of values is important, so we use a simple method instead: splitting the ordered dataset into train and test datasets. We calculate the index of the split point and separate the data into a training dataset with 67% of the observations, which we use to train our model, leaving the remaining 33% for testing it. The LSTM network expects the input data X to be provided in a specific array structure of the form [samples, time steps, features]. Currently, our data is in the form [samples, features], and we are framing the problem as one time step per sample. We can transform the prepared train and test input data into the expected structure using numpy.reshape(), as shown below.
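A sketch of the chronological split and reshape, reusing the create_dataset sketch above; the 67/33 proportions follow the text, while the variable names are our assumptions:

```python
import numpy as np

# Chronological split: the first 67% for training, the last 33% for testing
train_size = int(len(dataset) * 0.67)
train, test = dataset[:train_size, :], dataset[train_size:, :]

look_back = 1
train_x, train_y = create_dataset(train, look_back)
test_x, test_y = create_dataset(test, look_back)

# Reshape inputs from [samples, features] to [samples, time steps, features]
train_x = np.reshape(train_x, (train_x.shape[0], 1, train_x.shape[1]))
test_x = np.reshape(test_x, (test_x.shape[0], 1, test_x.shape[1]))
```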
We build the LSTM network by creating one visible layer with one input, one hidden layer with four LSTM blocks (neurons), and an output layer that makes a single-value prediction. Once the model is fit, we can estimate its performance on the train and test datasets. After that, we invert the predictions before calculating error scores, to ensure that performance is reported in the same units as the original data. The error score, RMSE, is then calculated for the model, as in the sketch below.
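A sketch of this network and its evaluation; the topology follows the text, while training settings such as the number of epochs and the batch size are assumptions rather than the paper's reported values:

```python
import math
from keras.models import Sequential
from keras.layers import LSTM, Dense
from sklearn.metrics import mean_squared_error

# One input, a hidden layer of 4 LSTM blocks, and a single-value output
model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back)))
model.add(Dense(1))
model.compile(loss="mean_squared_error", optimizer="adam")
model.fit(train_x, train_y, epochs=100, batch_size=1, verbose=2)

# Predict, then invert the 0-1 scaling so errors are in the original units
train_pred = scaler.inverse_transform(model.predict(train_x))
test_pred = scaler.inverse_transform(model.predict(test_x))
train_y_inv = scaler.inverse_transform(train_y.reshape(-1, 1))
test_y_inv = scaler.inverse_transform(test_y.reshape(-1, 1))

train_score = math.sqrt(mean_squared_error(train_y_inv, train_pred))
test_score = math.sqrt(mean_squared_error(test_y_inv, test_pred))
print("Train RMSE: %.2f" % train_score)
print("Test RMSE: %.2f" % test_score)
```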
We can see from the result that the train score and the test score have approximately the same error, which means that our model is quite good. To visualize the predicted values against the trained ones, we use the matplotlib library to plot the comparison, with the original dataset in blue, the predictions for the training dataset in orange, and the predictions on the unseen test dataset in green, as sketched below.
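A sketch of the comparison plot; the shifts by look_back align each prediction with its position in the original series, and blue, orange, and green are matplotlib's default colors for the three curves:

```python
import matplotlib.pyplot as plt
import numpy as np

# Shift the training predictions to their positions in the original series
train_plot = np.empty_like(dataset)
train_plot[:] = np.nan
train_plot[look_back:len(train_pred) + look_back, :] = train_pred

# Shift the test predictions to the tail of the series
test_plot = np.empty_like(dataset)
test_plot[:] = np.nan
test_plot[len(train_pred) + (look_back * 2) + 1:len(dataset) - 1, :] = test_pred

plt.plot(scaler.inverse_transform(dataset))  # original data (blue)
plt.plot(train_plot)                         # training predictions (orange)
plt.plot(test_plot)                          # test predictions (green)
plt.show()
```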
Figure 2. Comparison of the predicted data versus the original and training data
III. CONCLUSION AND FUTURE WORK
The results of the experiment demonstrate the efficiency of LSTM in predicting sequential data. More challenging tasks for LSTM networks, such as anomaly detection and text prediction, are of further interest and require variations of the LSTM for optimal results. In future work, hybrids of neural networks will be introduced to gain better insight, and we hope the research models can be applied to empirical problems.
REFERENCES
[1] P. Doganis, A. Alexandridis, P. Patrinos, and H. Sarimveis, "Time series sales forecasting for short shelf-life food products based on artificial neural networks and evolutionary computing," Journal of Food Engineering, vol. 75, no. 2, pp. 196-204, 2006.
[2] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436-444, 2015.
[3] A. Graves, A. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," in IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26-31 May 2013.
[4] K. Cho, B. van Merrienboer, Ç. Gülçehre, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," arXiv:1406.1078, 2014.
[5] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in 27th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8-13 December 2014.
[6] T. Mikolov, A. Joulin, S. Chopra, M. Mathieu, and M. A. Ranzato, "Learning longer memory in recurrent neural networks," arXiv:1412.7753, 2014.
[7] Y. Li and H. Cao, "Prediction for tourism flow based on LSTM neural network," Procedia Computer Science, vol. 129, pp. 277-283, 2018.
[8] Y. Duan, Y. Lv, and F.-Y. Wang, "Travel time prediction with LSTM neural network," in IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), Rio de Janeiro, Brazil, 1-4 November 2016.
[9] D. M. Q. Nelson, A. C. M. Pereira, and R. A. de Oliveira, "Stock market's price movement prediction with LSTM neural networks," in International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14-19 May 2017.
[10] C. Hu, Q. Wu, H. Li, S. Jian, N. Li, and Z. Lou, "Deep learning with a long short-term memory networks approach for rainfall-runoff simulation," Water, vol. 10, p. 1543, 2018.
[11] X.-H. Le, V. H. Ho, G. Lee, and S. Jung, "A deep neural network application for forecasting the inflow into the Hoa Binh reservoir in Vietnam," in 11th International Symposium on Lowland Technology (ISLT 2018), Hanoi, Vietnam, 26-28 September 2018.
[12] S. Yan, "Understanding LSTM and Its Diagrams," 14 March 2016. [Online]. Available: https://medium.com/mlreview/understanding-lstm-and-its-diagrams-37e2f46f1714. [Accessed 21 March 2020].
[13] R. J. Frank, N. Davey, and S. P. Hunt, "Time Series Prediction and Neural Networks," Journal of Intelligent & Robotic Systems, vol. 31, no. 1, pp. 99-103, 2001.
[14] A.-r. Mohamed, G. E. Dahl, and G. Hinton, "Acoustic Modeling Using Deep Belief Networks," IEEE Transactions on Audio, Speech, and Language Processing, 2012.
[15] L. Rabiner and B. Juang, "An Introduction to Hidden Markov Models," IEEE ASSP Magazine, vol. 3, no. 1, pp. 4-16, 1986.
[16] P. J. Werbos, "Backpropagation Through Time: What It Does and How to Do It," Proceedings of the IEEE, 1990.
[17] S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber, "Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies," 2001. [Online].
[18] Y. Bengio, P. Simard, and P. Frasconi, "Learning Long-Term Dependencies with Gradient Descent Is Difficult," IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157-166, 1994.
[19] R. Pascanu, T. Mikolov, and Y. Bengio, "On the Difficulty of Training Recurrent Neural Networks," ICML, vol. 28, no. 3, pp. 1310-1318, 2013.
[20] F. A. Gers, J. Schmidhuber, and F. Cummins, "Learning to Forget: Continual Prediction with LSTM," Neural Computation, pp. 2451-2471, 2000.
[21] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, p. 436, 2015.