The data-driven conditional multinomial logit choice model with customer features performs well in the assortment personalization problem when the low-rank structure of the parameter matrix is exploited. However, despite recent theoretical and algorithmic advances, parameter estimation in this choice model remains challenging, especially when there are more predictors than observations. We therefore propose a penalized likelihood approach based on the feature matrix that recovers the sparse structure of both customer populations and products in the assortment. The proposed method imposes low-rank and sparsity structures simultaneously, which further reduces model complexity and improves estimation and prediction accuracy. We develop a new algorithm, sparse factorial gradient descent (SFGD), to estimate the parameter matrix; it offers high interpretability and efficient computation. As a first-order method, SFGD scales well to high-dimensional settings because it never forms the Hessian matrix. Simulation studies show that SFGD outperforms state-of-the-art methods in estimation error, sparsity recovery, and average regret. We also demonstrate the effectiveness of the proposed method with an analysis of advertising behavior data.
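The abstract describes SFGD only at a high level, so the sketch below is merely one plausible reading of a first-order, Hessian-free update for a low-rank, sparse parameter matrix in a multinomial logit model: gradient steps on the factors of Theta = U V^T followed by entrywise soft-thresholding. The function names (soft_threshold, mnl_choice_prob, sfgd_step), the factorized parameterization, and the l1 proximal step are illustrative assumptions, not the authors' actual algorithm.

    import numpy as np

    def soft_threshold(A, tau):
        # Entrywise soft-thresholding: the proximal operator of the l1 penalty.
        return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

    def mnl_choice_prob(theta_i, assortment):
        # Multinomial logit choice probabilities for customer i over an offered
        # assortment; the no-purchase option is given utility 0 (the "+1" term).
        utilities = theta_i[assortment]
        expu = np.exp(utilities - utilities.max())
        return expu / (np.exp(-utilities.max()) + expu.sum())

    def sfgd_step(U, V, grad_theta, step, lam):
        # One illustrative first-order update on the factors of Theta = U @ V.T:
        # a gradient step on each factor followed by soft-thresholding to promote
        # sparsity. Only the gradient of the log-likelihood with respect to Theta
        # is required, so the Hessian is never formed.
        grad_U = grad_theta @ V        # chain rule: dL/dU = (dL/dTheta) V
        grad_V = grad_theta.T @ U      # dL/dV = (dL/dTheta)^T U
        U_new = soft_threshold(U - step * grad_U, step * lam)
        V_new = soft_threshold(V - step * grad_V, step * lam)
        return U_new, V_new

In this hypothetical sketch, alternating such proximal gradient steps on U and V would keep per-iteration cost linear in the matrix dimensions, which is the usual motivation for avoiding second-order (Hessian-based) updates in high-dimensional settings.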