ISSN 0253-2778

CN 34-1054/N

Open Access JUSTC Article 13 December 2022

A new deep reinforcement learning model for dynamic portfolio optimization

Cite this: https://doi.org/10.52396/JUSTC-2022-0072
More Information
  • Author Bio:

    Weiwei Zhuang received her Ph.D. degree in Probability and Statistics from the University of Science and Technology of China (USTC) in 2006. She is an Associate Professor with the Department of Statistics and Finance, USTC. Her research interests include statistical dependence, stochastic comparisons, semiparametric models, and their applications.

    Guoxin Qiu received his Ph.D. degree in Statistics from the University of Science and Technology of China (USTC) in 2017. He is currently a Professor with the Business School, Anhui Xinhua University. His research interests include information theory, stochastic comparisons, semiparametric models, and their applications.

  • Corresponding author: E-mail: qiugx02@ustc.edu.cn
  • Received Date: 28 April 2022
  • Accepted Date: 22 May 2022
  • Available Online: 13 December 2022
  • There are many challenging problems in dynamic portfolio optimization with deep reinforcement learning, such as the high dimensionality of the environment and action spaces, as well as the extraction of useful information from a high-dimensional state space and noisy financial time-series data. To address these problems, we propose a new model structure that combines the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) method with multi-head attention reinforcement learning. This new model integrates data processing methods, a deep learning model, and a reinforcement learning model to improve the perception and decision-making abilities of investors. Empirical analysis shows that our proposed model structure has advantages in dynamic portfolio optimization. Moreover, during the experimental comparison we find another robust investment strategy, in which each stock in the portfolio is given the same capital and the structure is applied to each stock separately.
    Graphical abstract: The overall framework of our research.
    • The CEEMDAN-Multi-Att-RL structure advocated in this paper improves trading performance on financial time-series data compared with deep reinforcement learning applied to the raw, unprocessed series (illustrative sketches of the main components follow this list).
    • This paper explores two investment strategies for dynamic portfolio optimization using deep reinforcement learning; investors can choose between them according to their own risk preferences.
    • An appropriate deep learning network and reward settings are configured for the given stock market environment, which enhances the learning effectiveness of deep reinforcement learning.
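
As a rough illustration of the CEEMDAN preprocessing step named in the abstract, the sketch below decomposes a synthetic minute-level price series into intrinsic mode functions (IMFs). It assumes the open-source PyEMD package (installed as EMD-signal); the series and parameters are illustrative and not the data or settings used in the paper.

    import numpy as np
    from PyEMD import CEEMDAN  # pip install EMD-signal

    # Hypothetical minute-level closing prices for one trading day (240 minutes).
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 240)
    prices = 10.0 + 0.5 * np.sin(12 * np.pi * t) + 0.1 * rng.standard_normal(240)

    ceemdan = CEEMDAN()      # complete ensemble EMD with adaptive noise
    imfs = ceemdan(prices)   # shape: (n_imfs, 240); each row is one IMF
    residue = prices - imfs.sum(axis=0)

    print("number of IMFs:", imfs.shape[0])
    # Stacking the IMFs (plus the residue) turns the noisy raw series into a
    # multi-channel representation that can serve as the agent's state input.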
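The multi-head attention component can be sketched with PyTorch's built-in nn.MultiheadAttention. The layer sizes, lookback window, and pooling below are assumptions made for illustration, not the exact network configuration reported in the paper.

    import torch
    import torch.nn as nn

    class AttentionEncoder(nn.Module):
        """Self-attention over a lookback window of multi-channel (IMF) inputs."""
        def __init__(self, n_channels: int, d_model: int = 64, n_heads: int = 4):
            super().__init__()
            self.proj = nn.Linear(n_channels, d_model)  # embed the IMF channels
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.norm = nn.LayerNorm(d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, time, channels), e.g. stacked IMFs over a lookback window
            h = self.proj(x)
            attn_out, _ = self.attn(h, h, h)             # multi-head self-attention
            return self.norm(h + attn_out).mean(dim=1)   # pooled state feature

    # Example: 32 windows of 60 minutes with 8 IMF channels -> 64-dim state features
    features = AttentionEncoder(n_channels=8)(torch.randn(32, 60, 8))
    print(features.shape)  # torch.Size([32, 64])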
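Because the reinforcement learning part is built around a Q-network (see Figure 3) trained toward a Bellman-style target, a generic one-step deep Q-learning update might look like the following. The action set, reward, discount factor, and network sizes are illustrative assumptions rather than the paper's settings.

    import torch
    import torch.nn as nn

    N_ACTIONS = 3   # e.g. buy / hold / sell (assumed action set)
    FEAT_DIM = 64   # dimension of the state features from the attention encoder

    q_net = nn.Sequential(nn.Linear(FEAT_DIM, 128), nn.ReLU(), nn.Linear(128, N_ACTIONS))
    target_net = nn.Sequential(nn.Linear(FEAT_DIM, 128), nn.ReLU(), nn.Linear(128, N_ACTIONS))
    target_net.load_state_dict(q_net.state_dict())
    optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)
    gamma = 0.99    # discount factor (assumed)

    def dqn_update(state, action, reward, next_state, done):
        """One gradient step on the Bellman error for a batch of transitions."""
        q_pred = q_net(state).gather(1, action.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            q_next = target_net(next_state).max(dim=1).values
            q_target = reward + gamma * (1.0 - done) * q_next   # Bellman target
        loss = nn.functional.smooth_l1_loss(q_pred, q_target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Example with a random batch of 32 transitions:
    s, s2 = torch.randn(32, FEAT_DIM), torch.randn(32, FEAT_DIM)
    a = torch.randint(0, N_ACTIONS, (32,))
    r, d = torch.randn(32), torch.zeros(32)
    print(dqn_update(s, a, r, s2, d))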

  • [1]
    Neuneier R. Optimal asset allocation using adaptive dynamic programming. In: Proceedings of the 8th International Conference on Neural Information Processing Systems. New York: ACM, 1995: 952–958.
    [2]
    Nevmyvaka Y, Feng Y, Kearns M. Reinforcement learning for optimized trade execution. In: ICML '06: Proceedings of the 23rd International Conference on Machine Learning. New York: ACM Press, 2006: 673–680.
    [3]
    Meng T L, Khushi M. Reinforcement learning in financial markets. Data, 2019, 4: 110. doi: 10.3390/data4030110
    [4]
    Liu X, Xiong Z, Zhong S, et al. Practical deep reinforcement learning approach for stock trading. 2022. https://arxiv.org/abs/1811.07522. Accessed April 1, 2022.
    [5]
    Brim A. Deep reinforcement learning pairs trading with a double deep Q-network. In: 2020 10th Annual Computing and Communication Workshop and Conference (CCWC). IEEE, 2020: 222–227.
    [6]
    Gao Z, Gao Y, Hu Y, et al. Application of deep Q-network in portfolio management. In: 2020 5th IEEE International Conference on Big Data Analytics (ICBDA). IEEE, 2020: 268–275.
    [7]
    Lee J, Koh H, Choe H J. Learning to trade in financial time series using high-frequency through wavelet transformation and deep reinforcement learning. Applied Intelligence, 2021, 51: 6202–6223. doi: 10.1007/s10489-021-02218-4
    [8]
    Carta S, Corriga A, Ferreira A, et al. A multi-layer and multi-ensemble stock trader using deep learning and deep reinforcement learning. Applied Intelligence, 2021, 51: 889–905. doi: 10.1007/s10489-020-01839-5
    [9]
    Théate T, Ernst D. An application of deep reinforcement learning to algorithmic trading. Expert Systems with Applications, 2021, 173: 114632. doi: 10.1016/j.eswa.2021.114632
    [10]
    Lei K, Zhang B, Li Y, et al. Time-driven feature-aware jointly deep reinforcement learning for financial signal representation and algorithmic trading. Expert Systems with Applications, 2020, 140: 112872. doi: 10.1016/j.eswa.2019.112872
    [11]
    Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. n: Advances in Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000–6010.
    [12]
    Huang N E, Shen Z, Long S R, et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of London Series A: Mathematical, Physical and Engineering Sciences, 1998, 454: 903–995. doi: 10.1098/rspa.1998.0193
    [13]
    Torres M E, Colominas M A, Schlotthauer G, et al. A complete ensemble empirical mode decomposition with adaptive noise. In: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Prague, Czech Republic: IEEE, 2011: 4144–4147.
    [14]
    Sutton R S, Barto A G. Reinforcement Learning: An Introduction. Cambridge, Massachusetts: The MIT Press, 2018.
    [15]
    Bellman R. Dynamic Programming. Princeton: Princeton University Press, 1972.

Figures

    Figure  1.  Model structure.

    Figure  2.  Multi-head attention.

    Figure  3.  The structure diagram of the Q-network.

    Figure  4.  Principles of the two strategies.

    Figure  5.  Train set and test set.

    Figure  6.  The IMF components of stock code 600009: the left column for EMD and the right column for CEEMDAN.

    Figure  7.  The CEEMDAN and EMD methods applied to stock minute data.

    Figure  8.  Analysis of trading points.

    Figure  9.  Compound interest curve in 2018. The left subgraph represents the result of the first investment strategy, and the right subgraph represents the result of another investment strategy.

    Figure  10.  Compound interest curve in 2019. The left subgraph represents the result of the first investment strategy, and the right subgraph represents the result of another investment strategy.

    Figure  11.  Compound interest curve in 2020. The left subgraph represents the result of the first investment strategy, and the right subgraph represents the result of another investment strategy.

