| Title: | Propensity Score Predictive Inference for Generalizability |
|---|---|
| Description: | Provides a suite of Propensity Score Predictive Inference (PSPI) methods to generalize treatment effects in trials to target populations. The package includes an existing model Bayesian Causal Forest (BCF) and four PSPI models (BCF-PS, FullBART, SplineBART, DSplineBART). These methods leverage Bayesian Additive Regression Trees (BART) to adjust for high-dimensional covariates and nonlinear associations, while SplineBART and DSplineBART further use propensity score based splines to address covariate shift between trial data and target population. |
| Authors: | Jungang Zou [aut, cre], Qixuan Chen [aut], Joseph Schwartz [aut], Nathalie Moise [aut], Roderick Little [aut], Robert McCulloch [ctb], Rodney Sparapani [ctb], Charles Spanbauer [ctb], Robert Gramacy [ctb], Jean-Sebastien Roy [ctb] |
| Maintainer: | Jungang Zou <[email protected]> |
| License: | GPL-2 |
| Version: | 1.2 |
| Built: | 2026-06-01 06:47:46 UTC |
| Source: | https://github.com/zjg540066169/pspi |
PSPI provides Bayesian methods for generalizing treatment effects from clinical trials to target populations. It implements five models (BCF, BCF_P, FullBART, SplineBART, and DSplineBART), all built on Bayesian
Additive Regression Trees (BART). The spline-based variants (SplineBART and DSplineBART) transform the propensity scores and add spline terms in the transformed scores to handle covariate shift between the trial and the target population.
Core computations rely on efficient MCMC routines implemented in C++.
This package modifies and extends C++ code originally derived from the BART3 package, developed by Rodney Sparapani, which is licensed under the GNU General Public License version 2 (GPL-2).
The modified code is redistributed in accordance with the GPL-2 license. For more details on the modifications, see the package's documentation.
BART3 package: https://github.com/rsparapa/bnptools/tree/master, originally developed by Rodney Sparapani.
    sim <- sim_data(scenario = "linear", n_trial = 60)
    fit <- PSPI_generalizability(
      X = as.matrix(sim$trials[, paste0("X", 1:10)]),
      Y = sim$trials$Y,
      A = sim$trials$A,
      pi = sim$population$ps[sim$population$selected],
      X_pop = as.matrix(sim$population[, paste0("X", 1:10)]),
      pi_pop = sim$population$ps,
      model = "SplineBART",
      transformation = "InvGumbel",
      verbose = FALSE,
      nburn = 1, npost = 1  # one iteration each keeps the example fast; the defaults are 4000
    )
    str(fit)
This is the main function of the PSPI package. It fits Bayesian models that generalize findings from a clinical trial to a target population, estimating average treatment effects and potential outcomes. Propensity scores of trial participation play the central role in the generalizability analysis. When covariate shift is a concern, we recommend PSPI-SplineBART and PSPI-DSplineBART, which leverage Bayesian Additive Regression Trees (BART) to model high-dimensional covariates and propensity-score-based splines to extrapolate smoothly.
Users provide trial data (covariates, outcomes, treatment, and propensity scores) along with population-level covariates and propensity scores. The propensity scores can be the true values or estimates from a fitted model. The function then runs Markov chain Monte Carlo (MCMC) for posterior inference.
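When the true participation probabilities are unknown, one common way to obtain them is a logistic regression of a participation indicator on the covariates. The sketch below uses only base R on simulated data; the data frame `pop`, its columns, and the coefficients are hypothetical and are not part of the package. The resulting vectors play the roles of `pi_pop` and `pi`.

```r
# Hypothetical illustration in base R; not part of the pspi package.
set.seed(1)
n_pop <- 1000
pop <- data.frame(X1 = rnorm(n_pop), X2 = rbinom(n_pop, 1, 0.5))

# True (normally unknown) probability of trial participation
true_ps <- plogis(-2 + 0.8 * pop$X1 - 0.5 * pop$X2)
pop$selected <- rbinom(n_pop, 1, true_ps) == 1

# Logistic regression of the participation indicator on the covariates
ps_model <- glm(selected ~ X1 + X2, family = binomial(), data = pop)

# Estimated scores for the whole population and for the trial subset
pi_pop_hat <- as.numeric(predict(ps_model, type = "response"))
pi_hat <- pi_pop_hat[pop$selected]

summary(pi_pop_hat)
```

Subsetting the population scores by the participation indicator mirrors how the package examples build `pi` from `sim$population$ps[sim$population$selected]`.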
    PSPI_generalizability(
      X, Y, A, pi,
      X_pop, pi_pop,
      model,
      transformation = "InvGumbel",
      nburn = 4000,
      npost = 4000,
      n_knots_main = NULL,
      n_knots_inter = NULL,
      order_main = 3,
      order_inter = 3,
      ntrees_s = 200,
      verbose = FALSE,
      seed = NULL
    )
| Argument | Description |
|---|---|
| X | Matrix of covariates for the trial data. |
| Y | Numeric vector of observed outcomes in the trial. |
| A | Binary vector of treatment assignments (0 = control, 1 = intervention). |
| pi | Numeric vector of trial propensity scores (probability of trial participation). |
| X_pop | Matrix of covariates for the target population data. |
| pi_pop | Numeric vector of the target population propensity scores. |
| model | Character string specifying which PSPI model to use (see Details). |
| transformation | Character string indicating the transformation applied to the propensity scores. Options are "Identity", "Logit", "Cloglog", and "InvGumbel" (default); see Details. |
| nburn | Number of burn-in iterations (default = 4000). |
| npost | Number of posterior iterations saved after burn-in (default = 4000). |
| n_knots_main, n_knots_inter | Number of spline knots for main and interaction effects. If NULL, a default is chosen automatically (see Details). |
| order_main, order_inter | Order of spline basis functions (default = 3). |
| ntrees_s | Number of trees used for the BART component (default = 200). |
| verbose | Logical; if TRUE, prints progress messages. |
| seed | Optional random seed for reproducibility. |
Model choices
The model argument selects the type of PSPI model to be fitted:
"BCF" – Bayesian Causal Forests (Hahn et al., 2020).
"BCF_P" – BCF with the propensity score as an additional predictor.
"FullBART" – Uses three BARTs to estimate treatment effects.
"SplineBART" – Incorporates a natural cubic spline for heterogeneous treatment effects.
"DSplineBART" – Adds another natural cubic spline for the prognostic score.
Propensity score transformations
Because splines are sensitive to the scale of the predictor, a robust transformation of the propensity scores is needed.
The propensity scores (pi for trial, pi_pop for population) can be
optionally transformed before modeling using one of the following:
"Identity" – uses the raw propensity scores directly (no transformation).
"Logit" – applies the logit transform: log(pi / (1 - pi)).
"Cloglog" – complementary log–log transform: log(-log(1 - pi)).
"InvGumbel" – inverse Gumbel transform: -log(-log(pi)). Default choice.
Users can experiment with different transformations to assess model sensitivity.
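To see how the four options rescale the same scores, the following base R sketch applies each one to a few values. The helper `ps_transform` is hypothetical, and the formulas are the standard textbook definitions of these transforms (the inverse Gumbel being the quantile function of the standard Gumbel distribution); they are assumed, not confirmed, to match what the package computes internally.

```r
# Hypothetical helper; standard definitions, assumed to match the package.
ps_transform <- function(p, method = c("InvGumbel", "Logit", "Cloglog", "Identity")) {
  method <- match.arg(method)
  stopifnot(is.numeric(p), all(p > 0 & p < 1))
  switch(method,
    Identity  = p,                    # raw scores
    Logit     = log(p / (1 - p)),     # log odds
    Cloglog   = log(-log(1 - p)),     # complementary log-log
    InvGumbel = -log(-log(p))         # standard Gumbel quantile function
  )
}

p <- c(0.01, 0.1, 0.5, 0.9)
# One column per transformation, one row per score
sapply(c("Identity", "Logit", "Cloglog", "InvGumbel"),
       function(m) ps_transform(p, m))
```

The three non-identity options map scores in (0, 1) onto the whole real line, which spreads out values near 0 or 1 before the spline basis is built.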
Spline settings
Spline-based models ("SplineBART" and "DSplineBART") allow flexible
extrapolation to address covariate shift. The number of knots and the order of the spline basis can be
customized through the following parameters:
n_knots_inter, order_inter: number of knots and order of the spline basis for
treatment-interaction effects. Available for both SplineBART and
DSplineBART.
n_knots_main, order_main: number of knots and order of the spline basis for
main effects. Available only for DSplineBART.
If n_knots_main or n_knots_inter is left as NULL, a default is chosen automatically based
on the cube root of the sample size (ensuring a reasonable level of smoothness).
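As a rough illustration of a knot count tied to the cube root of the sample size, the sketch below builds a natural cubic spline basis on simulated transformed scores with `splines::ns`. The rounding rule `ceiling(n^(1/3))` and the quantile-based knot placement are assumptions made for this example; the exact rule used by the package is not stated here.

```r
library(splines)  # ships with R

set.seed(2)
n <- 200
# Simulated transformed propensity scores (hypothetical data)
z <- -log(-log(runif(n, 0.02, 0.98)))

# Assumed default: number of interior knots from the cube root of n
n_knots <- ceiling(n^(1/3))

# Place the interior knots at equally spaced quantiles of z
probs <- seq_len(n_knots) / (n_knots + 1)
knots <- quantile(z, probs = probs)

# Natural cubic spline basis: one row per observation,
# n_knots + 1 columns when no intercept column is included
B <- ns(z, knots = knots)
dim(B)
```

A natural cubic spline is constrained to be linear beyond its boundary knots, which is one reason such a basis behaves predictably when the population scores extend past the range seen in the trial.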
A list containing posterior samples and model summaries produced by the C++ sampler. Typical elements include:
Each row is a posterior draw for individual potential outcome under treatment
Each row is a posterior draw for individual potential outcome under control
Each row is a posterior draw for individual treatment effects
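Draw matrices of this shape can be reduced to a population average treatment effect by averaging over individuals within each draw. The sketch below uses simulated matrices as stand-ins, because the element names of the returned list are not given here; it also assumes that columns index individuals in the target population.

```r
# Simulated stand-ins for posterior draw matrices (hypothetical data):
# rows = posterior draws, columns = individuals in the target population
set.seed(3)
n_draws <- 500
n_pop <- 100
y1_draws <- matrix(rnorm(n_draws * n_pop, mean = 1.0), n_draws, n_pop)
y0_draws <- matrix(rnorm(n_draws * n_pop, mean = 0.2), n_draws, n_pop)

# Individual treatment effects, draw by draw
ite_draws <- y1_draws - y0_draws

# Average over individuals within each draw: one population ATE per draw
pate_draws <- rowMeans(ite_draws)

# Posterior mean and 95% credible interval
pate_mean <- mean(pate_draws)
pate_ci <- quantile(pate_draws, probs = c(0.025, 0.975))

pate_mean
pate_ci
```

Averaging within each draw before summarizing keeps the posterior uncertainty of the population-level effect, rather than summarizing each individual first.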
This function utilizes modified C++ code originally derived from the BART3 package (Bayesian Additive Regression Trees). The original package was developed by Rodney Sparapani and is licensed under GPL-2. Modifications were made by Jungang Zou, 2024. For more information about the original BART3 package, see: https://github.com/rsparapa/bnptools/tree/master/BART3
    # Example with simulated data
    sim <- sim_data(scenario = "linear", n_trial = 60)
    fit <- PSPI_generalizability(
      X = as.matrix(sim$trials[, paste0("X", 1:10)]),
      Y = sim$trials$Y,
      A = sim$trials$A,
      pi = sim$population$ps[sim$population$selected],
      X_pop = as.matrix(sim$population[, paste0("X", 1:10)]),
      pi_pop = sim$population$ps,
      model = "SplineBART",
      transformation = "InvGumbel",
      verbose = FALSE,
      nburn = 1, npost = 1  # one iteration each keeps the example fast; the defaults are 4000
    )
    str(fit)
Generates a finite population of size 1000 with seven continuous and three
binary covariates, constructs potential outcomes Y0 and Y1
according to the chosen scenario, simulates trial participation through a
logistic selection model calibrated to target n_trial = 200 or 60,
and returns both the target population and the randomized trial
(with treatment assigned at probability prop).
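The idea of a logistic selection model calibrated to a target trial size can be sketched in base R by solving for the intercept that makes the expected number of participants equal to `n_trial`. The covariates, the coefficients in `beta`, and the use of `uniroot` are assumptions for illustration only; this is not the package's own generating code.

```r
# Hypothetical illustration; not the data-generating code used by sim_data().
set.seed(4)
n_pop <- 1000
n_trial <- 200
X <- cbind(X1 = rnorm(n_pop), X2 = rbinom(n_pop, 1, 0.5))
beta <- c(0.8, -0.5)               # hypothetical selection coefficients
lin <- as.numeric(X %*% beta)      # linear predictor without intercept

# Solve for the intercept so the expected number selected equals n_trial
gap <- function(a) sum(plogis(a + lin)) - n_trial
alpha <- uniroot(gap, interval = c(-20, 20), tol = 1e-10)$root

# True propensity scores and realized trial participation
ps <- plogis(alpha + lin)
selected <- rbinom(n_pop, 1, ps) == 1

# Randomize treatment within the trial with probability prop
prop <- 0.5
A <- rbinom(sum(selected), 1, prop)

c(expected = sum(ps), realized = sum(selected))
```

In this sketch only the expected trial size is fixed; the realized size varies from one simulated sample to the next.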
    sim_data(n_trial = 200, scenario = "linear", seed = NULL, prop = 0.5)
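| Argument | Description |
|---|---|
| n_trial | Integer. Target trial size; must be 200 or 60 (default 200). |
| scenario | Character. Name of the data-generating scenario, for example "linear" (default), "nonlinear", or "linear+covariate shift". |
| seed | Optional integer seed for reproducibility (default NULL). |
| prop | Numeric. Probability of assignment to treatment in the trial (default 0.5). |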
A list with two data frames:
population: columns X1:X10, potential outcomes Y1 and Y0,
selected (logical), and ps (true propensity scores of trial participation).
trials: columns X1:X10, A, and observed Y.
    set.seed(2025)
    sim <- sim_data(n_trial = 200, scenario = "nonlinear", prop = 0.5)
    str(sim$population)
    table(sim$trials$A)            # treatment allocation
    mean(sim$population$selected)  # selection rate

    # A smaller trial size and linear scenario with covariate shift
    sim2 <- sim_data(n_trial = 60, scenario = "linear+covariate shift", seed = 1, prop = 0.6)
    nrow(sim2$trials)