ISSN 0253-2778

CN 34-1054/N

Open Access JUSTC Mathematics 18 August 2023

Inference of subgroup-level treatment effects via generic causal tree in observational studies

Cite this:
https://doi.org/10.52396/JUSTC-2022-0054
More Information
  • Author Bio:

    Caiwei Zhang is a graduate student under the supervision of Prof. Zemin Zheng at the University of Science and Technology of China. Her research focuses on causal inference based on machine learning.

    Zemin Zheng is a Full Professor in the Department of Management at the University of Science and Technology of China (USTC). He received his B.S. degree from USTC in 2010 and his Ph.D. degree from the University of Southern California in 2015. Since then, he has been working in the Department of Management, USTC. His research mainly focuses on high-dimensional statistical inference and big data problems. He has published articles in top statistics journals, including the Journal of the Royal Statistical Society Series B, The Annals of Statistics, Operations Research, and the Journal of Machine Learning Research. In addition, he has presided over several scientific research projects, including the Youth Project and General Projects of the National Natural Science Foundation of China (NSFC).

  • Corresponding author: E-mail: zhengzm@ustc.edu.cn
  • Received Date: 21 March 2022
  • Accepted Date: 25 May 2022
  • Available Online: 18 August 2023
  • Exploring heterogeneity in causal effects has wide applications in policy evaluation and decision-making. In recent years, researchers have begun employing machine learning methods to study causality, among which the most popular methods generally estimate heterogeneous treatment effects at the individual level. However, we argue that in large-sample settings, identifying heterogeneity at the subgroup level is more intuitive and intelligible from a decision-making perspective. In this paper, we provide a tree-based method, called the generic causal tree (GCT), to identify subgroup-level treatment effects in observational studies. The tree is designed to split by maximizing the disparity of treatment effects between subgroups, embedding a semiparametric framework to improve treatment effect estimation. To accomplish valid statistical inference for the tree-based estimators of treatment effects, we adopt honest estimation to separate the tree-building process from the inference process. In simulations, we show that the GCT algorithm has distinct advantages in subgroup identification and gives more accurate estimates than two benchmark methods. Additionally, we verify the effectiveness of statistical inference based on GCT.
    A highly interpretable tree-based algorithm for subgroup identification enables valid inference for tree-based estimators.
    • We provide a tree-based algorithm for subgroup identification that embeds a Robinson-style semiparametric model to estimate subgroup-level treatment effects (a minimal illustrative sketch follows these highlights).
    • The method is stepwise convex, computationally stable, efficient and scalable.
    • Both theoretical and simulation results verify the feasibility of our method.
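
The sketch below is a minimal illustration, not the authors' GCT implementation, of the two ingredients highlighted above: a Robinson-style residual-on-residual regression to estimate the treatment effect within one subgroup, and an honest split that reserves half of the data for estimation and inference within leaves built on the other half. The function name subgroup_effect, the gradient-boosting nuisance learners, and the simulated data are assumptions made purely for illustration.

```python
# Minimal sketch of Robinson-style subgroup-effect estimation with honest splitting.
# Not the authors' GCT code; names and learners are illustrative assumptions.
import numpy as np
from scipy import stats
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.model_selection import train_test_split


def subgroup_effect(X, W, Y, in_subgroup, alpha=0.05):
    """Robinson-style (partialling-out) estimate of the treatment effect
    within one subgroup, with a normal-approximation confidence interval."""
    # Nuisance estimates: outcome regression m(x) = E[Y | X] and propensity e(x) = P(W = 1 | X).
    # (A full implementation would cross-fit these; fitting in-sample keeps the sketch short.)
    m_hat = GradientBoostingRegressor().fit(X, Y).predict(X)
    e_hat = GradientBoostingClassifier().fit(X, W).predict_proba(X)[:, 1]

    # Residualize outcome and treatment, then restrict to the subgroup (leaf).
    ry = (Y - m_hat)[in_subgroup]
    rw = (W - e_hat)[in_subgroup]

    # No-intercept least squares of residualized outcome on residualized treatment.
    tau = np.sum(rw * ry) / np.sum(rw ** 2)
    resid = ry - tau * rw
    se = np.sqrt(np.sum(rw ** 2 * resid ** 2)) / np.sum(rw ** 2)  # heteroskedasticity-robust SE
    z = stats.norm.ppf(1 - alpha / 2)
    return tau, (tau - z * se, tau + z * se)


# Honest estimation: one half of the data grows the tree (splits chosen to maximize
# the disparity of effects between child subgroups); the other half is reserved for
# estimating effects and confidence intervals within the resulting leaves.
rng = np.random.default_rng(0)
n, p = 4000, 5
X = rng.normal(size=(n, p))
W = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))      # treatment depends on X1
tau_true = np.where(X[:, 1] > 0, 2.0, 0.5)           # subgroup-level heterogeneity in X2
Y = X[:, 0] + tau_true * W + rng.normal(size=n)

idx_build, idx_est = train_test_split(np.arange(n), test_size=0.5, random_state=0)
# Tree building on idx_build is omitted; suppose it produced the leaf "X2 > 0".
leaf = X[idx_est, 1] > 0
tau_hat, ci = subgroup_effect(X[idx_est], W[idx_est], Y[idx_est], leaf)
print(f"subgroup effect: {tau_hat:.2f}, 95% CI ({ci[0]:.2f}, {ci[1]:.2f})")
```

In the full algorithm, the splits themselves would be chosen on the building half to maximize the disparity of estimated effects between child subgroups; here that step is omitted and the leaf X2 > 0 is simply assumed.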


Figures

    Figure 1. The distribution of MSE over 1000 simulated datasets in the heterogeneous (left) and homogeneous (right) settings.

    Figure 2. The MSE curves of the three tree-based algorithms as the sample size increases from 1000 to 20000. The curves of CIT-DR-fitBe and CIT-DR-fitIn almost overlap in the homogeneous setting.

    Figure 3. The final tree model for TEDS-A (Maryland, 2015). The square brackets under each node give the confidence interval for that subgroup.

    Figure 4. The distribution of days waiting for OUD treatment in the four subgroups.

