Document Type: Original Research Paper

Authors

Artificial Intelligence Department, Faculty of Computer Engineering, Shahid Rajaee Teacher Training University, Tehran, Iran.

Abstract

Background and Objectives: Many real-world problems are time series forecasting (TSF) problems. Therefore, providing more accurate and flexible forecasting methods has always been of interest to researchers. An important issue in time series forecasting is the length of the prediction interval.
Methods: In this paper, a new method is proposed for time series forecasting that can make more accurate predictions over longer intervals than other existing methods. Neural networks are an effective tool for estimating time series because of their nonlinearity and their ability to handle different time series without requiring specific information about them. A variety of neural networks have been introduced so far, some of which have been used for time series forecasting. Encoder-decoder networks are one family of networks applicable to TSF: an encoder network encodes the input data according to a particular pattern, and a decoder network then decodes this encoded representation to produce the desired output. Because these networks capture the context of the sequence, they provide better performance. The Transformer is an example of this type of network. In this work, a Transformer neural network based on self-attention is presented that is particularly well suited to time series forecasting problems. A minimal illustrative sketch of this encoder-decoder idea is given after the abstract.
Results: The proposed model has been evaluated on two benchmark real-world TSF datasets from different domains. The experimental results show that, compared to other well-known methods, the model remains reliable over forecasting horizons up to eight times longer and improves estimation accuracy by about 20 percent. Computational complexity is also significantly reduced.
Conclusion: The proposed model outperforms or competes with other existing methods while offering lower computational complexity and longer estimation intervals. It was also found that, with better configuration of the network and finer tuning of the attention mechanism, more desirable results can be obtained for any specific problem.
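
As a rough illustration of the encoder-decoder idea described in the Methods section, the following minimal PyTorch sketch encodes a history window with a Transformer encoder and decodes a multi-step forecast horizon with self-attention. This is not the authors' exact architecture; the window lengths, model width, head count, and the use of nn.Transformer are illustrative assumptions only.

# Minimal sketch (illustrative assumptions, not the paper's exact model):
# a Transformer encoder-decoder applied to univariate time series forecasting.
import torch
import torch.nn as nn

class TimeSeriesTransformer(nn.Module):
    def __init__(self, d_model=64, nhead=4, num_layers=2, input_len=96, output_len=24):
        super().__init__()
        self.input_len, self.output_len = input_len, output_len
        self.input_proj = nn.Linear(1, d_model)      # embed each scalar observation
        # learned positional embeddings for encoder and decoder positions (assumption)
        self.pos_emb = nn.Parameter(torch.randn(input_len + output_len, d_model) * 0.02)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.head = nn.Linear(d_model, 1)            # project back to a scalar forecast

    def forward(self, history, decoder_input):
        # history: (batch, input_len, 1); decoder_input: (batch, output_len, 1)
        src = self.input_proj(history) + self.pos_emb[: self.input_len]
        tgt = self.input_proj(decoder_input) + self.pos_emb[self.input_len :]
        # causal mask so each forecast step attends only to earlier forecast steps
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(self.output_len)
        out = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.head(out)                        # (batch, output_len, 1)

# Usage: encode a 96-step history, decode a 24-step forecast horizon.
model = TimeSeriesTransformer()
history = torch.randn(8, 96, 1)
decoder_input = torch.randn(8, 24, 1)    # in practice: teacher forcing / last observed value
forecast = model(history, decoder_input)             # shape (8, 24, 1)

In practice the decoder input would be built from recent observations (teacher forcing during training, autoregressive generation at inference); that detail, like all hyperparameters above, is an assumption made for illustration rather than a description of the paper's configuration.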
 

Copyrights
©2021 The author(s). This is an open access article distributed under the terms of the Creative Commons Attribution (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, as long as the original authors and source are cited. No permission is required from the authors or the publishers.



