Summarise cohort entries
Introduction
In this example we’re going to summarise the characteristics of individuals with an ankle sprain, ankle fracture, forearm fracture, or a hip fracture using the Eunomia synthetic data.
We’ll begin by creating our study cohorts.
library(duckdb)
library(CDMConnector)
library(dplyr, warn.conflicts = FALSE)
library(PatientProfiles)
library(CohortCharacteristics)
# Connect to the Eunomia synthetic database and create a cdm reference
con <- dbConnect(duckdb(), dbdir = eunomiaDir())
cdm <- cdmFromCon(
  con = con, cdmSchema = "main", writeSchema = "main", cdmName = "Eunomia"
)
# Create one cohort per concept, keeping all qualifying events
cdm <- generateConceptCohortSet(
  cdm = cdm,
  name = "injuries",
  conceptSet = list(
    "ankle_sprain" = 81151,
    "ankle_fracture" = 4059173,
    "forearm_fracture" = 4278672,
    "hip_fracture" = 4230399
  ),
  end = "event_end_date",
  limit = "all"
)
Summarising cohort counts
We can first quickly summarise and present the overall counts of our cohorts.
cohortCounts <- summariseCohortCount(cdm$injuries)
tableCohortCount(cohortCounts)
CDM name | Variable name | Estimate name | Cohort name: ankle_sprain | Cohort name: ankle_fracture | Cohort name: forearm_fracture | Cohort name: hip_fracture |
---|---|---|---|---|---|---|
Eunomia | Number records | N | 1,915 | 464 | 569 | 138 |
Eunomia | Number subjects | N | 1,357 | 427 | 510 | 132 |
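The difference between the two rows can be illustrated with a minimal base R sketch. The data below are hypothetical (not taken from Eunomia), and the code only mimics the idea of counting records versus distinct subjects; it is not how `summariseCohortCount()` is implemented.

```r
# Toy cohort table (hypothetical data, not taken from Eunomia)
cohort <- data.frame(
  cohort_name = c("ankle_sprain", "ankle_sprain", "ankle_sprain", "hip_fracture"),
  subject_id  = c(1, 1, 2, 3),
  stringsAsFactors = FALSE
)

# Number of records: one row per cohort entry
numberRecords <- tapply(cohort$subject_id, cohort$cohort_name, length)

# Number of subjects: distinct individuals per cohort
numberSubjects <- tapply(
  cohort$subject_id, cohort$cohort_name, function(x) length(unique(x))
)

counts <- data.frame(
  cohort_name     = names(numberRecords),
  number_records  = as.integer(numberRecords),
  number_subjects = as.integer(numberSubjects[names(numberRecords)])
)
print(counts)
```

A subject with repeated entries contributes several records but only one subject, which is why the number of records is never smaller than the number of subjects.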
Moreover, we can also easily stratify these counts. For example, here we add age groups and then stratify our counts by them.
cdm$injuries <- cdm$injuries |>
  addAge(
    ageGroup = list(c(0, 3), c(4, 17), c(18, Inf)),
    name = "injuries"
  )
cohortCounts <- summariseCohortCount(cdm[["injuries"]], strata = "age_group")
tableCohortCount(cohortCounts)
CDM name | Age group | Variable name | Estimate name | Cohort name: ankle_sprain | Cohort name: ankle_fracture | Cohort name: forearm_fracture | Cohort name: hip_fracture |
---|---|---|---|---|---|---|---|
Eunomia | overall | Number records | N | 1,915 | 464 | 569 | 138 |
Eunomia | overall | Number subjects | N | 1,357 | 427 | 510 | 132 |
Eunomia | 0 to 3 | Number records | N | 202 | 49 | 51 | 7 |
Eunomia | 0 to 3 | Number subjects | N | 196 | 49 | 51 | 7 |
Eunomia | 18 or above | Number records | N | 1,047 | 213 | 268 | 88 |
Eunomia | 18 or above | Number subjects | N | 847 | 204 | 249 | 83 |
Eunomia | 4 to 17 | Number records | N | 666 | 202 | 250 | 43 |
Eunomia | 4 to 17 | Number subjects | N | 597 | 195 | 239 | 43 |
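To make the age bands concrete, here is a small base R sketch that assigns hypothetical integer ages to the same three groups. It uses `cut()` purely for illustration and assumes ages are whole years; `addAge()` performs the equivalent grouping on the database side.

```r
# Hypothetical ages at cohort entry (whole years)
age <- c(0, 3, 4, 17, 18, 65)

# Bands 0-3, 4-17 and 18+, mirroring list(c(0, 3), c(4, 17), c(18, Inf))
ageGroup <- cut(
  age,
  breaks = c(0, 4, 18, Inf),
  right = FALSE, # intervals are [0, 4), [4, 18), [18, Inf)
  labels = c("0 to 3", "4 to 17", "18 or above")
)

# Number of individuals falling in each band
print(table(ageGroup))
```

The boundary ages 3, 4, 17 and 18 are included deliberately to show which band each edge falls into.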
We can also apply minimum cell count suppression to our cohort counts. In this case we will obscure any counts below 10.
cohortCounts <- cohortCounts |>
  suppress(minCellCount = 10)
tableCohortCount(cohortCounts)
CDM name | Age group | Variable name | Estimate name | Cohort name: ankle_sprain | Cohort name: ankle_fracture | Cohort name: forearm_fracture | Cohort name: hip_fracture |
---|---|---|---|---|---|---|---|
Eunomia | overall | Number records | N | 1,915 | 464 | 569 | 138 |
Eunomia | overall | Number subjects | N | 1,357 | 427 | 510 | 132 |
Eunomia | 0 to 3 | Number records | N | 202 | 49 | 51 | <10 |
Eunomia | 0 to 3 | Number subjects | N | 196 | 49 | 51 | <10 |
Eunomia | 18 or above | Number records | N | 1,047 | 213 | 268 | 88 |
Eunomia | 18 or above | Number subjects | N | 847 | 204 | 249 | 83 |
Eunomia | 4 to 17 | Number records | N | 666 | 202 | 250 | 43 |
Eunomia | 4 to 17 | Number subjects | N | 597 | 195 | 239 | 43 |
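The masking rule can be sketched in a few lines of base R, using the hip_fracture record counts from the stratified table. This is only an illustration of the idea; in particular, leaving zero counts unmasked is an assumption of this sketch rather than a statement about how `suppress()` works internally.

```r
# Record counts for hip_fracture by age group, taken from the stratified table
counts <- c(overall = 138, "0 to 3" = 7, "18 or above" = 88, "4 to 17" = 43)
minCellCount <- 10

# Assumption of this sketch: only non-zero counts below the threshold are masked
suppressed <- counts > 0 & counts < minCellCount

# Replace masked counts with a "<threshold" label for display
display <- ifelse(
  suppressed,
  paste0("<", minCellCount),
  format(counts, big.mark = ",", trim = TRUE)
)
print(display)
```

Only the 0 to 3 group falls below the threshold here, matching the single suppressed column in the table.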
Summarising cohort attrition
Say we specify two inclusion criteria. First, we keep only cohort entries after the year 2000. Second, we keep only cohort entries for those aged 18 or older. We can easily create plots summarising our cohort attrition.
cdm <- generateConceptCohortSet(
  cdm = cdm,
  name = "ankle_sprain",
  conceptSet = list("ankle_sprain" = 81151),
  end = "event_end_date",
  limit = "all"
)
# First criterion: keep entries starting in 2000 or later
cdm$ankle_sprain <- cdm$ankle_sprain |>
  filter(year(cohort_start_date) >= 2000) |>
  compute(temporary = FALSE, name = "ankle_sprain") |>
  recordCohortAttrition("Restrict to cohort_start_date >= 2000")
attritionSummary <- summariseCohortAttrition(cdm$ankle_sprain)
plotCohortAttrition(attritionSummary)
# Second criterion: keep entries for those aged 18 or older
cdm$ankle_sprain <- cdm$ankle_sprain |>
  addAge() |>
  filter(age >= 18) |>
  compute(temporary = FALSE, name = "ankle_sprain") |>
  recordCohortAttrition("Restrict to age >= 18")
attritionSummary <- summariseCohortAttrition(cdm$ankle_sprain)
plotCohortAttrition(attritionSummary)
We could, of course, have applied these requirements the other way around.
# Recreate the cohort so attrition starts again from the initial events
cdm <- generateConceptCohortSet(
  cdm = cdm,
  name = "ankle_sprain",
  conceptSet = list("ankle_sprain" = 81151),
  end = "event_end_date",
  limit = "all"
)
# This time apply the age criterion first
cdm$ankle_sprain <- cdm$ankle_sprain |>
  addAge() |>
  filter(age >= 18) |>
  compute(temporary = FALSE, name = "ankle_sprain") |>
  recordCohortAttrition("Restrict to age >= 18")
# Then apply the calendar year criterion
cdm$ankle_sprain <- cdm$ankle_sprain |>
  filter(year(cohort_start_date) >= 2000) |>
  compute(temporary = FALSE, name = "ankle_sprain") |>
  recordCohortAttrition("Restrict to cohort_start_date >= 2000")
attritionSummary <- summariseCohortAttrition(cdm$ankle_sprain)
plotCohortAttrition(attritionSummary)
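Swapping the order of the criteria changes how many records each step removes, but not which records remain at the end. The base R sketch below shows this on a handful of hypothetical cohort entries (not Eunomia data); the column names `age` and `start_year` are made up for the illustration.

```r
# Hypothetical cohort entries (not Eunomia data)
entries <- data.frame(
  age        = c(10, 12, 25, 40, 70),
  start_year = c(2001, 2005, 1998, 2005, 2012)
)

# Order A: calendar year first, then age
step1a <- entries[entries$start_year >= 2000, ]
step2a <- step1a[step1a$age >= 18, ]

# Order B: age first, then calendar year
step1b <- entries[entries$age >= 18, ]
step2b <- step1b[step1b$start_year >= 2000, ]

# Records excluded at each step depend on the order of the criteria
excludedA <- c(nrow(entries) - nrow(step1a), nrow(step1a) - nrow(step2a))
excludedB <- c(nrow(entries) - nrow(step1b), nrow(step1b) - nrow(step2b))
print(excludedA)
print(excludedB)

# The rows that survive both criteria are the same either way
sameFinalCohort <- identical(rownames(step2a), rownames(step2b))
print(sameFinalCohort)
```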
As well as plotting cohort attrition, we can also create a table of our results.
tableCohortAttrition(attritionSummary)
Reason | Variable name: number_records | Variable name: number_subjects | Variable name: excluded_records | Variable name: excluded_subjects |
---|---|---|---|---|
Eunomia; ankle_sprain | | | | |
Initial qualifying events | 1,915 | 1,357 | 0 | 0 |
Restrict to age >= 18 | 1,047 | 847 | 868 | 510 |
Restrict to cohort_start_date >= 2000 | 454 | 420 | 593 | 427 |
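The excluded columns are simply the drop in counts from one step to the next. As a quick check, this base R sketch recomputes them from the record and subject counts reported in the attrition table; it is a worked example of the arithmetic, not part of the package.

```r
# Record and subject counts after each step, taken from the attrition table
attrition <- data.frame(
  reason = c(
    "Initial qualifying events",
    "Restrict to age >= 18",
    "Restrict to cohort_start_date >= 2000"
  ),
  number_records  = c(1915, 1047, 454),
  number_subjects = c(1357, 847, 420)
)

# Exclusions are the drop from the previous step; the first step excludes nothing
attrition$excluded_records  <- c(0, -diff(attrition$number_records))
attrition$excluded_subjects <- c(0, -diff(attrition$number_subjects))
print(attrition)
```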