Further survival analyses
a03_Further_survival_analyses.Rmd
Set up
Let’s first load the packages required.
We’ll create a cdm reference to use our example MGUS2 survival dataset. In practice you would use the CDMConnector package to connect to your data mapped to the OMOP CDM.
cdm <- CohortSurvival::mockMGUS2cdm()
The CohortSurvival package does not have implemented functionality to
do more complex survival analyses than Kaplar Meier curves, like Cox
Proportional Hazards modelling. However, the format the data has to be
in to be inputted to well-known modelling functions from packages like
survival
or cmprsk
can be retrieved from OMOP
data with some in-built functions in this package. Let’s see how to do
it in both single event and competing risk survival settings.
Further analysis with single event survival
To get the time
and status
information we
need for the coxph
function in the package
survival
, for instance, we only need to call
addCohortSurvival
. The stratification variables need to be
columns previously added to the cohort by the user.
input_survival_single <- cdm$mgus_diagnosis %>%
addCohortSurvival(
cdm = cdm,
outcomeCohortTable = "death_cohort",
outcomeCohortId = 1
)
input_survival_single %>%
glimpse()
#> Rows: ??
#> Columns: 13
#> Database: DuckDB v1.1.3-dev165 [unknown@Linux 6.5.0-1025-azure:R 4.4.2/:memory:]
#> $ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ subject_id <int> 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 14, 15, 16, 19, 2…
#> $ cohort_start_date <date> 1981-01-01, 1968-01-01, 1980-01-01, 1977-01-01, …
#> $ cohort_end_date <date> 1981-01-01, 1968-01-01, 1980-01-01, 1977-01-01, …
#> $ age <dbl> 88, 78, 94, 68, 90, 90, 89, 87, 79, 86, 80, 85, 9…
#> $ sex <fct> F, F, M, M, F, M, F, F, F, M, F, M, F, M, M, F, F…
#> $ hgb <dbl> 13.1, 11.5, 10.5, 15.2, 10.7, 12.9, 10.5, 12.3, 9…
#> $ creat <dbl> 1.3, 1.2, 1.5, 1.2, 0.8, 1.0, 0.9, 1.2, 1.1, 1.0,…
#> $ mspike <dbl> 0.5, 2.0, 2.6, 1.2, 1.0, 0.5, 1.3, 1.6, 2.3, 2.3,…
#> $ age_group <chr> ">=70", ">=70", ">=70", "<70", ">=70", ">=70", ">…
#> $ days_to_exit <int> 30, 25, 46, 92, 8, 4, 151, 2, 136, 2, 14, 18, 43,…
#> $ status <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ time <dbl> 30, 25, 46, 92, 8, 4, 151, 2, 136, 2, 14, 18, 43,…
This information should be enough to call any advanced function, like:
survival::coxph(survival::Surv(time, status) ~ age + sex, data = input_survival_single)
#> Call:
#> survival::coxph(formula = survival::Surv(time, status) ~ age +
#> sex, data = input_survival_single)
#>
#> coef exp(coef) se(coef) z p
#> age 0.061622 1.063561 0.003402 18.114 < 2e-16
#> sexM 0.358258 1.430835 0.065693 5.454 4.94e-08
#>
#> Likelihood ratio test=391.2 on 2 df, p=< 2.2e-16
#> n= 1384, number of events= 963
survival::survdiff(survival::Surv(time, status) ~ sex, data = input_survival_single)
#> Call:
#> survival::survdiff(formula = survival::Surv(time, status) ~ sex,
#> data = input_survival_single)
#>
#> N Observed Expected (O-E)^2/E (O-E)^2/V
#> sex=F 631 423 471 4.88 9.67
#> sex=M 753 540 492 4.67 9.67
#>
#> Chisq= 9.7 on 1 degrees of freedom, p= 0.002
Further analysis with competing risk survival
For competing risk, there is a similar function that adds
time
and status
information to the cohort of
interest. We only need to specify which are the outcome and competing
outcome of interest. We can also choose other options such as follow up
time, censoring on a specific date, washout periods, or others.
input_survival_cr <- cdm$mgus_diagnosis %>%
addCompetingRiskCohortSurvival(
cdm = cdm,
outcomeCohortTable = "progression",
outcomeCohortId = 1,
competingOutcomeCohortTable = "death_cohort",
competingOutcomeCohortId = 1
) %>%
glimpse()
#> Rows: 1,384
#> Columns: 13
#> $ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ subject_id <int> 56, 81, 124, 127, 147, 163, 165, 180, 186, 190, 1…
#> $ cohort_start_date <date> 1978-01-01, 1985-01-01, 1974-01-01, 1978-01-01, …
#> $ cohort_end_date <date> 1978-01-01, 1985-01-01, 1974-01-01, 1978-01-01, …
#> $ age <dbl> 78, 91, 73, 73, 58, 57, 80, 70, 76, 78, 54, 79, 6…
#> $ sex <fct> M, F, M, M, M, F, M, F, F, M, F, M, M, M, F, M, F…
#> $ hgb <dbl> 10.3, 5.9, 15.3, 12.4, 13.1, 12.2, 11.0, 14.3, 12…
#> $ creat <dbl> 3.0, 1.0, 1.2, 1.6, 1.1, 0.8, 1.4, 1.2, 0.9, 1.1,…
#> $ mspike <dbl> 1.9, 0.0, 1.7, 1.4, 0.8, 1.9, 2.0, 1.6, 1.9, 0.9,…
#> $ age_group <chr> ">=70", ">=70", ">=70", ">=70", "<70", "<70", ">=…
#> $ days_to_exit <int> 44, 21, 82, 60, 189, 260, 85, 107, 104, 101, 171,…
#> $ time <dbl> 29, 14, 80, 30, 188, 201, 76, 81, 51, 101, 153, 8…
#> $ status <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1…
We can use the package cmprsk
to fit a Fine and Gray
model to the competing risk data. We first change our sex
covariate to numeric.
input_survival_cr <- input_survival_cr %>%
dplyr::mutate(sex = dplyr::if_else(sex == "M", 0, 1))
covs <- data.frame(input_survival_cr$age, input_survival_cr$sex)
names(covs) <- c("age", "sex")
summary(cmprsk::crr(ftime = input_survival_cr$time,
fstatus = input_survival_cr$status,
cov1 = covs,
failcode = 1,
cencode = 0))
#> Competing Risks Regression
#>
#> Call:
#> cmprsk::crr(ftime = input_survival_cr$time, fstatus = input_survival_cr$status,
#> cov1 = covs, failcode = 1, cencode = 0)
#>
#> coef exp(coef) se(coef) z p-value
#> age -0.0192 0.981 0.00585 -3.28 0.001
#> sex 0.2871 1.333 0.19309 1.49 0.140
#>
#> exp(coef) exp(-coef) 2.5% 97.5%
#> age 0.981 1.02 0.970 0.992
#> sex 1.333 0.75 0.913 1.945
#>
#> Num. cases = 1384
#> Pseudo Log-likelihood = -726
#> Pseudo likelihood ratio test = 8.32 on 2 df,
Disconnect from the cdm database connection
cdm_disconnect(cdm)