Summarise large scale characteristics
Source:vignettes/a07_summariseLargeScaleCharacteristics.Rmd
Introduction
In the previous vignette we saw how to use the CohortCharacteristics package to summarise a set of pre-specified characteristics of a study cohort. These characteristics included patient demographics like age and sex, as well as concept sets and cohorts that we defined. Another, often complementary, way to characterise a study cohort is to summarise all the clinical events we see for its members in some window around their index date (cohort entry).
To show how large scale characterisation works, we’ll first create a study cohort of first-ever ankle sprains using the Eunomia synthetic data.
library(CDMConnector)
library(dplyr)
library(ggplot2)
library(CohortCharacteristics)

# Connect to the Eunomia synthetic database
con <- DBI::dbConnect(duckdb::duckdb(),
  dbdir = CDMConnector::eunomia_dir()
)

# Create a reference to the OMOP CDM tables
cdm <- CDMConnector::cdm_from_con(con,
  cdm_schema = "main",
  write_schema = "main"
)

# Build the cohort: first ankle sprain record per person,
# with cohort exit at the end date of that event
cdm <- generateConceptCohortSet(
  cdm = cdm,
  name = "ankle_sprain",
  conceptSet = list("ankle_sprain" = 81151),
  end = "event_end_date",
  limit = "first",
  overwrite = TRUE
)
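To make the effect of `limit = "first"` and `end = "event_end_date"` concrete, here is a minimal sketch on toy data. It is not how the package builds the cohort internally (which also involves steps such as handling observation periods); the `records` table and its values are hypothetical and only illustrate keeping the earliest record per person.

```r
library(dplyr)

# Hypothetical condition records (not taken from Eunomia)
records <- tibble(
  person_id = c(1, 1, 2, 3, 3, 3),
  condition_start_date = as.Date(c(
    "2010-05-01", "2015-03-12", "2012-07-30",
    "2001-01-15", "2008-09-02", "2019-11-20"
  )),
  condition_end_date = as.Date(c(
    "2010-05-20", "2015-04-01", "2012-08-15",
    "2001-02-01", "2008-09-20", "2019-12-05"
  ))
)

# Keep the earliest record per person (analogous to limit = "first") and
# use that record's end date as cohort exit (analogous to end = "event_end_date")
first_event <- records |>
  group_by(person_id) |>
  slice_min(condition_start_date, n = 1, with_ties = FALSE) |>
  ungroup() |>
  transmute(
    subject_id = person_id,
    cohort_start_date = condition_start_date,
    cohort_end_date = condition_end_date
  )

first_event
```

Each person contributes exactly one row, so the cohort start date can serve as a single, unambiguous index date for the windows used later.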
Large scale characteristics of study cohorts
To summarise our cohort of individuals with an ankle sprain we will look at their records in three tables of the OMOP CDM (condition_occurrence, procedure_occurrence, and drug_exposure) over two time windows (any time prior to their index date, and on index date). For conditions and procedures we will identify whether someone had a new record starting in the time window. Meanwhile, for drug exposures we will consider whether they had a new or ongoing record in the period.
Lastly, and importantly, we will only return results for concepts for which at least 10% of the study cohort had a record.
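Before running the summary, the difference between a record starting in a window and a record that is new or ongoing in a window can be sketched on toy data. This is an illustration under stated assumptions, not the package's implementation: the `cohort` and `records` tables, the column names, and the overlap rule used for "ongoing" are all hypothetical.

```r
library(dplyr)

# Hypothetical cohort of four people with their index dates
cohort <- tibble(
  subject_id = c(1, 2, 3, 4),
  index_date = as.Date(c("2020-01-10", "2020-02-01", "2020-03-15", "2020-04-20"))
)

# Hypothetical records of a single concept (person 4 has none)
records <- tibble(
  subject_id = c(1, 2, 3),
  start_date = as.Date(c("2020-01-10", "2020-01-25", "2019-06-01")),
  end_date = as.Date(c("2020-01-12", "2020-02-05", "2019-06-10"))
)

# Window c(0, 0), i.e. the index date only
window <- c(0, 0)

flags <- cohort |>
  left_join(records, by = "subject_id") |>
  mutate(
    # Days from index date to record start and end
    start_day = as.integer(start_date - index_date),
    end_day = as.integer(end_date - index_date),
    # "Event": the record starts inside the window
    event_in_window = !is.na(start_day) &
      start_day >= window[1] & start_day <= window[2],
    # "Episode": the record overlaps the window (new or ongoing)
    episode_in_window = !is.na(start_day) &
      start_day <= window[2] & end_day >= window[1]
  )

# Count and percentage of the cohort with a record under each definition
summary <- flags |>
  summarise(
    n_event = sum(event_in_window),
    pct_event = 100 * mean(event_in_window),
    n_episode = sum(episode_in_window),
    pct_episode = 100 * mean(episode_in_window)
  )

# A concept is reported only if its proportion reaches the minimum frequency
minimum_frequency <- 0.1
summary$pct_event / 100 >= minimum_frequency
```

Person 2 has a record that started a week before index and was still ongoing on the index date, so it counts under the episode definition but not under the event definition.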
lsc <- cdm$ankle_sprain |>
  summariseLargeScaleCharacteristics(
    window = list(c(-Inf, -1), c(0, 0)),
    eventInWindow = c(
      "condition_occurrence",
      "procedure_occurrence"
    ),
    episodeInWindow = "drug_exposure",
    minimumFrequency = 0.1
  )
tableLargeScaleCharacteristics(lsc)
CDM name: Synthea synthetic health database
Variable name | Variable level (window_name) | Estimate name | Concept id | Cohort name: ankle_sprain |
---|---|---|---|---|
condition_occurrence; event; standard | ||||
Streptococcal sore throat | -inf to -1 | N(%) | 28060 | 499(36.77%) |
Sprain of wrist | -inf to -1 | N(%) | 78272 | 148(10.91%) |
Osteoarthritis | -inf to -1 | N(%) | 80180 | 283(20.85%) |
Chronic sinusitis | -inf to -1 | N(%) | 257012 | 162(11.94%) |
Acute bronchitis | -inf to -1 | N(%) | 260139 | 767(56.52%) |
Otitis media | -inf to -1 | N(%) | 372328 | 909(66.99%) |
Concussion with no loss of consciousness | -inf to -1 | N(%) | 378001 | 185(13.63%) |
Acute viral pharyngitis | -inf to -1 | N(%) | 4112343 | 845(62.27%) |
Whiplash injury to neck | -inf to -1 | N(%) | 4218389 | 137(10.10%) |
Sinusitis | -inf to -1 | N(%) | 4283893 | 166(12.23%) |
Acute bacterial sinusitis | -inf to -1 | N(%) | 4294548 | 168(12.38%) |
Viral sinusitis | -inf to -1 | N(%) | 40481087 | 981(72.29%) |
Sprain of ankle | 0 to 0 | N(%) | 81151 | 1,357(100.00%) |
procedure_occurrence; event; standard | ||||
Suture open wound | -inf to -1 | N(%) | 4125906 | 363(26.75%) |
Sputum examination | -inf to -1 | N(%) | 4151422 | 282(20.78%) |
Plain chest X-ray | -inf to -1 | N(%) | 4163872 | 137(10.10%) |
Bone immobilization | -inf to -1 | N(%) | 4170947 | 356(26.23%) |
drug_exposure; episode; standard | ||||
celecoxib | -inf to -1 | N(%) | 1118084 | 189(13.93%) |
Acetaminophen 160 MG Oral Tablet | -inf to -1 | N(%) | 1127078 | 559(41.19%) |
Acetaminophen 325 MG Oral Tablet | -inf to -1 | N(%) | 1127433 | 737(54.31%) |
Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet | -inf to -1 | N(%) | 1713671 | 499(36.77%) |
Penicillin G 375 MG/ML Injectable Solution | -inf to -1 | N(%) | 19006318 | 384(28.30%) |
Aspirin 81 MG Oral Tablet | -inf to -1 | N(%) | 19059056 | 842(62.05%) |
Ampicillin 100 MG/ML Injectable Solution | -inf to -1 | N(%) | 19129655 | 193(14.22%) |
Penicillin V Potassium 250 MG Oral Tablet | -inf to -1 | N(%) | 19133873 | 491(36.18%) |
poliovirus vaccine, inactivated | -inf to -1 | N(%) | 40213160 | 994(73.25%) |
tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use | -inf to -1 | N(%) | 40213227 | 288(21.22%) |
hepatitis B vaccine, adult dosage | -inf to -1 | N(%) | 40213306 | 226(16.65%) |
Haemophilus influenzae type b vaccine, PRP-OMP conjugate | -inf to -1 | N(%) | 40213314 | 210(15.48%) |
Acetaminophen 21.7 MG/ML / Dextromethorphan Hydrobromide 1 MG/ML / doxylamine succinate 0.417 MG/ML Oral Solution | -inf to -1 | N(%) | 40229134 | 296(21.81%) |
Doxycycline Monohydrate 50 MG Oral Tablet | -inf to -1 | N(%) | 46233988 | 172(12.68%) |
Acetaminophen 160 MG Oral Tablet | 0 to 0 | N(%) | 1127078 | 199(14.66%) |
Acetaminophen 325 MG Oral Tablet | 0 to 0 | N(%) | 1127433 | 330(24.32%) |
Aspirin 81 MG Oral Tablet | 0 to 0 | N(%) | 19059056 | 470(34.64%) |
Ibuprofen 200 MG Oral Tablet | 0 to 0 | N(%) | 19078461 | 192(14.15%) |
As we can see we have identified numerous concepts for which at least 10% of our study population had a record. Often with larger cohorts and real patient-level data we will obtain many times more results when running large scale characterisation. One option we have to help summarise our results is to pick out the most frequent concepts. Here, for example, we select the top 5 concepts.
tableLargeScaleCharacteristics(lsc,
  topConcepts = 5
)
CDM name: Synthea synthetic health database
Variable name | Variable level (window_name) | Estimate name | Concept id | Cohort name: ankle_sprain |
---|---|---|---|---|
condition_occurrence; event; standard | ||||
Otitis media | -inf to -1 | N(%) | 372328 | 909(66.99%) |
Acute viral pharyngitis | -inf to -1 | N(%) | 4112343 | 845(62.27%) |
Viral sinusitis | -inf to -1 | N(%) | 40481087 | 981(72.29%) |
Sprain of ankle | 0 to 0 | N(%) | 81151 | 1,357(100.00%) |
drug_exposure; episode; standard | ||||
poliovirus vaccine, inactivated | -inf to -1 | N(%) | 40213160 | 994(73.25%) |
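The selection shown here is consistent with ranking concept and window combinations by percentage and keeping the five highest. The sketch below reproduces that logic on a handful of rows copied from the full table earlier in this section; the ranking rule is an assumption inferred from the output, not a confirmed description of how `topConcepts` is implemented.

```r
library(dplyr)

# A subset of rows copied from the full results table (percentages as numbers)
results <- tibble(
  concept_name = c(
    "Acute bronchitis", "Otitis media", "Acute viral pharyngitis",
    "Viral sinusitis", "Sprain of ankle", "Aspirin 81 MG Oral Tablet",
    "poliovirus vaccine, inactivated"
  ),
  window = c(
    "-inf to -1", "-inf to -1", "-inf to -1",
    "-inf to -1", "0 to 0", "-inf to -1",
    "-inf to -1"
  ),
  percentage = c(56.52, 66.99, 62.27, 72.29, 100.00, 62.05, 73.25)
)

# Assumed rule: order by percentage (highest first) and keep the first five
top_concepts <- results |>
  arrange(desc(percentage)) |>
  slice_head(n = 5)

top_concepts
```

Under this rule, aspirin (62.05%) and acute bronchitis (56.52%) fall just outside the top five, which matches their absence from the shortened table.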
Stratified large scale characteristics
As when summarising pre-specified patient characteristics, we can also obtain stratified results when summarising large scale characteristics. Here, for example, large scale characteristics are stratified by sex (which we add as an additional column to our cohort table using the PatientProfiles package).
lsc <- cdm$ankle_sprain |>
  PatientProfiles::addSex() |>
  summariseLargeScaleCharacteristics(
    window = list(c(-Inf, -1), c(0, 0)),
    strata = list("sex"),
    eventInWindow = "drug_exposure",
    minimumFrequency = 0.1
  )
tableLargeScaleCharacteristics(lsc)
CDM name: Synthea synthetic health database; Cohort name: ankle_sprain
Variable name | Variable level (window_name) | Estimate name | Concept id | Sex: overall | Sex: Female | Sex: Male |
---|---|---|---|---|---|---|
drug_exposure; event; standard | ||||||
celecoxib | -inf to -1 | N(%) | 1118084 | 189(13.93%) | 92(13.47%) | 97(14.39%) |
Acetaminophen 160 MG Oral Tablet | -inf to -1 | N(%) | 1127078 | 559(41.19%) | 292(42.75%) | 267(39.61%) |
Acetaminophen 325 MG Oral Tablet | -inf to -1 | N(%) | 1127433 | 737(54.31%) | 374(54.76%) | 363(53.86%) |
Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet | -inf to -1 | N(%) | 1713671 | 499(36.77%) | 244(35.72%) | 255(37.83%) |
Penicillin G 375 MG/ML Injectable Solution | -inf to -1 | N(%) | 19006318 | 384(28.30%) | 169(24.74%) | 215(31.90%) |
Aspirin 81 MG Oral Tablet | -inf to -1 | N(%) | 19059056 | 842(62.05%) | 427(62.52%) | 415(61.57%) |
Ampicillin 100 MG/ML Injectable Solution | -inf to -1 | N(%) | 19129655 | 193(14.22%) | 98(14.35%) | 95(14.09%) |
Penicillin V Potassium 250 MG Oral Tablet | -inf to -1 | N(%) | 19133873 | 491(36.18%) | 256(37.48%) | 235(34.87%) |
poliovirus vaccine, inactivated | -inf to -1 | N(%) | 40213160 | 994(73.25%) | 501(73.35%) | 493(73.15%) |
tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use | -inf to -1 | N(%) | 40213227 | 288(21.22%) | 151(22.11%) | 137(20.33%) |
hepatitis B vaccine, adult dosage | -inf to -1 | N(%) | 40213306 | 226(16.65%) | 128(18.74%) | 98(14.54%) |
Haemophilus influenzae type b vaccine, PRP-OMP conjugate | -inf to -1 | N(%) | 40213314 | 210(15.48%) | 112(16.40%) | 98(14.54%) |
Acetaminophen 21.7 MG/ML / Dextromethorphan Hydrobromide 1 MG/ML / doxylamine succinate 0.417 MG/ML Oral Solution | -inf to -1 | N(%) | 40229134 | 296(21.81%) | 132(19.33%) | 164(24.33%) |
Doxycycline Monohydrate 50 MG Oral Tablet | -inf to -1 | N(%) | 46233988 | 172(12.68%) | 94(13.76%) | 78(11.57%) |
Acetaminophen 160 MG Oral Tablet | 0 to 0 | N(%) | 1127078 | 199(14.66%) | 97(14.20%) | 102(15.13%) |
Acetaminophen 325 MG Oral Tablet | 0 to 0 | N(%) | 1127433 | 330(24.32%) | 165(24.16%) | 165(24.48%) |
Aspirin 81 MG Oral Tablet | 0 to 0 | N(%) | 19059056 | 470(34.64%) | 245(35.87%) | 225(33.38%) |
Ibuprofen 200 MG Oral Tablet | 0 to 0 | N(%) | 19078461 | 192(14.15%) | 93(13.62%) | 99(14.69%) |
Nitrofurantoin 5 MG/ML Oral Suspension | -inf to -1 | N(%) | 920300 | - | 84(12.30%) | - |
{7 (Inert Ingredients 1 MG Oral Tablet) / 21 (Mestranol 0.05 MG / Norethindrone 1 MG Oral Tablet) } Pack [Norinyl 1+50 28 Day] | -inf to -1 | N(%) | 19128065 | - | 135(19.77%) | - |
Phenazopyridine hydrochloride 100 MG Oral Tablet | -inf to -1 | N(%) | 40236824 | - | 84(12.30%) | - |
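The last rows of the table show a value only for the Female stratum, with "-" elsewhere. One reading, sketched below on toy data, is that the minimum frequency is checked separately within each stratum, so a concept can pass the 10% threshold among females while falling below it overall and among males. This is an assumption consistent with the output, not a confirmed description of the package internals; the `cohort` table and its counts are hypothetical.

```r
library(dplyr)

# Hypothetical cohort: 20 females and 20 males; 3 females have a record
cohort <- tibble(
  sex = rep(c("Female", "Male"), each = 20),
  has_record = c(rep(TRUE, 3), rep(FALSE, 17), rep(FALSE, 20))
)

minimum_frequency <- 0.1

# Count and percentage within each sex stratum
by_sex <- cohort |>
  group_by(sex) |>
  summarise(n = sum(has_record), pct = 100 * mean(has_record))

# Count and percentage in the whole cohort
overall <- cohort |>
  summarise(sex = "overall", n = sum(has_record), pct = 100 * mean(has_record))

# Assumed rule: each stratum is compared with the threshold on its own
stratified <- bind_rows(overall, by_sex) |>
  mutate(reported = pct / 100 >= minimum_frequency)

stratified
```

In this toy cohort the concept reaches 15% among females but only 7.5% overall and 0% among males, so only the Female column would carry an estimate.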