ISSN 0253-2778

CN 34-1054/N

Open Access JUSTC Mathematics/Management 11 May 2022

Sparse assortment personalization in high dimensions

Cite this:
https://doi.org/10.52396/JUSTC-2021-0214
More Information
  • Author Bio:

    Jingyu Shao is currently a graduate student under the tutelage of Professor Zemin Zheng at the University of Science and Technology of China. His research interests focus on statistical learning and variable selection.

    Ruipeng Dong is currently a postdoctoral researcher at The Chinese University of Hong Kong, Shenzhen. He received his PhD in Statistics from the University of Science and Technology of China. His research interests focus on statistical learning and variable selection.

  • Corresponding author: E-mail: drp@mail.ustc.edu.cn
  • Received Date: 29 November 2021
  • Accepted Date: 06 January 2022
  • Available Online: 11 May 2022
  • The data-driven conditional multinomial logit choice model with customer features performs well in the assortment personalization problem when the low-rank structure of the parameter matrix is exploited. However, despite recent theoretical and algorithmic advances, parameter estimation in this choice model remains challenging, especially when there are more predictors than observations. For this reason, we propose a penalized likelihood approach based on a feature matrix to recover the sparse structure across populations and products in the assortment. Our method considers the low-rank and sparsity structures simultaneously, which further reduces model complexity and improves estimation and prediction accuracy. A new algorithm, sparse factored gradient descent (SFGD), is proposed to estimate the parameter matrix; it enjoys high interpretability and efficient computing performance. As a first-order method, SFGD works well in high-dimensional scenarios because it avoids computing the Hessian matrix. Simulation studies show that SFGD outperforms state-of-the-art methods in terms of estimation accuracy, sparsity recovery, and average regret. We also demonstrate the effectiveness of the proposed method through an analysis of advertising behavior data.
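    As background for the choice model referenced above, the conditional MNL choice probability has the standard form below; the notation (feature vector $x_i$ for customer type $i$, parameter matrix $\Theta$ whose $j$th row is $\theta_j$, offered assortment $S$) is illustrative and may differ from the paper's exact parameterization:

    $$ P(\text{choose } j \mid x_i, S) \;=\; \frac{\exp(\theta_j^{\top} x_i)}{1 + \sum_{k \in S} \exp(\theta_k^{\top} x_i)}, \qquad j \in S, $$

    where the 1 in the denominator corresponds to the no-purchase option. The low-rank structure posits $\Theta = U V^{\top}$ with a small number of latent factors, and sparsity zeroes out rows or columns so that irrelevant products or features drop out of the model.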

      A penalized likelihood approach is proposed in which the low-rank and sparsity structures are considered simultaneously. A new algorithm, sparse factored gradient descent (SFGD), is proposed to estimate the parameter matrix.
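      One schematic way to write the estimation problem described above, assuming the sparsity penalty is placed entrywise on both factors (the paper's exact penalty placement may differ), is

      $$ \min_{U \in \mathbb{R}^{m \times r},\; V \in \mathbb{R}^{d \times r}} \; -\frac{1}{n}\,\ell\big(U V^{\top}\big) \;+\; \lambda \big( \|U\|_{1} + \|V\|_{1} \big), $$

      where $\ell$ is the multinomial logit log-likelihood, the factorization $U V^{\top}$ with small rank $r$ encodes the low-rank structure, and the $\ell_1$ penalties encourage sparse factors.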


    • The data-driven conditional multinomial logit choice model with customer features performs well in the assortment personalization problem when a low-rank structure of the parameter matrix is considered.
    • Our proposed method considers both the low-rank and sparsity structures, which can further reduce model complexity and improve estimation and prediction accuracy.
    • A new algorithm, sparse factored gradient descent (SFGD), is proposed to estimate the parameter matrix; it enjoys high interpretability and efficient computing performance (a schematic sketch follows this list).
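    To make the SFGD idea above concrete, the following Python sketch illustrates factored gradient descent with a sparsity-inducing proximal (soft-thresholding) step for a conditional MNL likelihood. It is a minimal illustration under assumed choices (entrywise $\ell_1$ penalty on both factors, a fixed step size, alternating updates, and a toy data format), not the paper's exact SFGD procedure.

        import numpy as np

        def soft_threshold(A, tau):
            # Entrywise soft-thresholding: proximal operator of the l1 penalty.
            return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

        def nll_gradient(U, V, data):
            # Gradient of the MNL negative log-likelihood with respect to Theta = U V^T.
            # Each observation is (x, S, y): customer features x, offered assortment S
            # (a list of product indices), and the chosen index y (None = no purchase).
            Theta = U @ V.T                              # m x d parameter matrix
            G = np.zeros_like(Theta)
            for x, S, y in data:
                utils = Theta[S] @ x                     # utilities of offered products
                shift = utils.max()
                expu = np.exp(utils - shift)             # numerically stabilized exponentials
                denom = np.exp(-shift) + expu.sum()      # the "+1" term is the no-purchase option
                probs = expu / denom
                for idx, j in enumerate(S):
                    G[j] += (probs[idx] - (1.0 if y == j else 0.0)) * x
            return G / len(data)

        def sfgd(data, m, d, r, lam=0.05, step=0.2, n_iters=300, seed=0):
            # Alternating proximal gradient steps on the factors U (m x r) and V (d x r).
            rng = np.random.default_rng(seed)
            U = 0.1 * rng.standard_normal((m, r))
            V = 0.1 * rng.standard_normal((d, r))
            for _ in range(n_iters):
                G = nll_gradient(U, V, data)
                U = soft_threshold(U - step * (G @ V), step * lam)    # prox-gradient step in U
                G = nll_gradient(U, V, data)
                V = soft_threshold(V - step * (G.T @ U), step * lam)  # prox-gradient step in V
            return U, V

        if __name__ == "__main__":
            # Toy usage: 5 products, 4 customer features, rank 2, three synthetic choices.
            rng = np.random.default_rng(1)
            data = [(rng.standard_normal(4), [0, 2, 3], 2),
                    (rng.standard_normal(4), [1, 3, 4], None),
                    (rng.standard_normal(4), [0, 1, 2], 0)]
            U, V = sfgd(data, m=5, d=4, r=2)
            print(np.round(U @ V.T, 3))

    In practice the toy data would be replaced with real (feature, assortment, choice) triples, and $\lambda$, the rank $r$, and the step size would need tuning; the point of the sketch is only that each iteration uses gradients of the factored likelihood and never forms a Hessian.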

  • [1]
    Kallus N, Udell M. Dynamic assortment personalization in high dimensions. Operations Research, 2020, 68 (4): 1020–1037. doi: 10.1287/opre.2019.1948
    [2]
    Bernstein F, Kök A G, Xie L. Dynamic assortment customization with limited inventories. Manufacturing & Service Operations Management, 2015, 17 (4): 538–553. doi: 10.1287/msom.2015.0544
    [3]
    Golrezaei N, Nazerzadeh H, Rusmevichientong P. Real-time optimization of personalized assortments. Management Science, 2014, 60 (6): 1532–1551. doi: 10.1287/mnsc.2014.1939
    [4]
    Chen X, Owen Z, Pixton C, et al. A statistical learning approach to personalization in revenue management. Management Science, 2021, 68 (3): 1923–1937. doi: 10.1287/mnsc.2020.3772
    [5]
    Luo X, Andrews M, Fang Z, et al. Mobile targeting. Management Science, 2014, 60 (7): 1738–1756. doi: 10.1287/mnsc.2013.1836
    [6]
    Xue Z, Wang Z, Ettl M. Pricing personalized bundles: A new approach and an empirical study. Manufacturing & Service Operations Management, 2016, 18 (1): 51–68. doi: 10.1287/msom.2015.0563
    [7]
    Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 1996, 58 (1): 267–288. doi: 10.1111/j.2517-6161.1996.tb02080.x
    [8]
    Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2006, 68 (1): 49–67. doi: 10.1111/j.1467-9868.2005.00532.x
    [9]
    Meier L, Van De Geer S, Bühlmann P. The group lasso for logistic regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2008, 70 (1): 53–71. doi: 10.1111/j.1467-9868.2007.00627.x
    [10]
    Negahban S, Wainwright M J. Estimation of (near) low-rank matrices with noise and high-dimensional scaling. The Annals of Statistics, 2011, 39 (2): 1069–1097. doi: 10.1214/10-AOS850
    [11]
    Chen K, Chan K S, Stenseth N C. Reduced rank stochastic regression with a sparse singular value decomposition. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2012, 74 (2): 203–221. doi: 10.1111/j.1467-9868.2011.01002.x
    [12]
    Chen L, Huang J Z. Sparse reduced-rank regression for simultaneous dimension reduction and variable selection. Journal of the American Statistical Association, 2012, 107 (500): 1533–1545. doi: 10.1080/01621459.2012.734178
    [13]
    Chen K, Dong H, Chan K S. Reduced rank regression via adaptive nuclear norm penalization. Biometrika, 2013, 100 (4): 901–920. doi: 10.1093/biomet/ast036
    [14]
    Zhou K, Zha H, Song L. Learning social infectivity in sparse low-rank networks using multi-dimensional hawkes processes. In: Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics. PMLR, 2013: 641-649.
    [15]
    Wang Y X, Xu H, Leng C. Provable subspace clustering: When LRR meets SSC. In: Proceedings of the 26th International Conference on Neural Information Processing Systems: Volume 1. Red Hook, NY: Curran Associates Inc, 2013: 64–72.
    [16]
    Feng J, Lin Z, Xu H, et al. Robust subspace segmentation with block-diagonal prior. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2014: 3818–3825.
    [17]
    Chen J, Liu J, Ye J. Learning incoherent sparse and low-rank patterns from multiple tasks. ACM Transactions on Knowledge Discovery from Data (TKDD), 2012, 5 (4): 1–31. doi: 10.1145/2086737.2086742
    [18]
    Agarwal A, Negahban S, Wainwright M J. Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions. The Annals of Statistics, 2012, 40 (2): 1171–1197. doi: 10.1214/12-AOS1000
    [19]
    Chen J, Zhou J, Ye J. Integrating low-rank and group-sparse structures for robust multi-task learning. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: Association for Computing Machinery, 2011: 42–50.
    [20]
    Mishra A, Dey D K, Chen K. Sequential co-sparse factor regression. Journal of Computational and Graphical Statistics, 2017, 26 (4): 814–825. doi: 10.1080/10618600.2017.1340891
    [21]
    Mishra A, Dey D K, Chen Y, et al. Generalized co-sparse factor regression. Computational Statistics & Data Analysis, 2021, 157: 107127. doi: 10.1016/j.csda.2020.107127
    [22]
    Boyd S, Parikh N, Chu E. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 2011, 3 (1): 1–122. doi: 10.1561/2200000016
    [23]
    Zheng Z, Bahadori M T, Liu Y, et al. Scalable interpretable multi-response regression via SEED. Journal of Machine Learning Research, 2019, 20 (107): 1–34.
    [24]
    Jain P, Netrapalli P, Sanghavi S. Low-rank matrix completion using alternating minimization. In: Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing. New York: Association for Computing Machinery, 2013: 665–674.
    [25]
    Bhojanapalli S, Kyrillidis A, Sanghavi S. Dropping convexity for faster semi-definite optimization. In: 29th Annual Conference on Learning Theory. PMLR, 2016, 49: 530-582.
    [26]
    Golrezaei N, Nazerzadeh H, Rusmevichientong P. Real-time optimization of personalized assortments. Management Science, 2014, 60 (6): 1532–1551. doi: 10.1287/mnsc.2014.1939
    [27]
    Train K E. Discrete Choice Methods with Simulation. Cambridge, UK: Cambridge University Press, 2009.
    [28]
    Song Z, Woodruff D P, Zhong P. Low rank approximation with entrywise l1-norm error. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing. New York: Association for Computing Machinery, 2017: 688–701.
    [29]
    Klopp O, Lounici K, Tsybakov A B. Robust matrix completion. Probability Theory and Related Fields, 2017, 169 (1): 523–564. doi: 10.1007/s00440-016-0736-y
    [30]
    Gorski J, Pfeuffer F, Klamroth K. Biconvex sets and optimization with biconvex functions: A survey and extensions. Mathematical Methods of Operations Research, 2007, 66 (3): 373–407. doi: 10.1007/s00186-007-0161-1
    [31]
    Fan Y, Tang C Y. Tuning parameter selection in high dimensional penalized likelihood. Journal of the Royal Statistical Society: Series B: (Statistical Methodology), 2013, 75 (3): 531–552. doi: 10.1111/rssb.12001
    [32]
    Wang H, Li B, Leng C. Shrinkage tuning parameter selection with a diverging number of parameters. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2009, 71 (3): 671–683. doi: 10.1111/j.1467-9868.2008.00693.x
    [33]
    Chen X, Krishnamurthy A, Wang Y. Robust dynamic assortment optimization in the presence of outlier customers. https:// arxiv.org/abs/1910.04183.
    [34]
    Boerman S C, Kruikemeier S, Zuiderveen Borgesius F J. Online behavioral advertising: A literature review and research agenda. Journal of Advertising, 2017, 46 (3): 363–376. doi: 10.1080/00913367.2017.1339368
    [35]
    Mirsky L. A trace inequality of John von Neumann. Monatshefte für Mathematik, 1975, 79 (4): 303–306. doi: 10.1007/BF01647331


    Figure 1.  Comparison of average regret among our proposed methods (OFGD and SFGD) and the MLE. The time horizon $T$ ranges from 1000 to 20000. The constant of the baseline is chosen as $C = 0.12$.

    Figure  2.  Model performance by dataset size.


