Introduction

In the previous vignette we saw how the CohortCharacteristics package can summarise a set of pre-specified characteristics of a study cohort. These characteristics included patient demographics such as age and sex, as well as concept sets and cohorts that we defined. Another, often complementary, way to characterise a study cohort is to summarise all the clinical events recorded for its members in some window around their index date (cohort entry).

To show how large scale characterisation works, we’ll first create a study cohort of first-ever ankle sprains using the Eunomia synthetic data.

library(CDMConnector)
library(dplyr)
library(ggplot2)
library(CohortCharacteristics)

con <- DBI::dbConnect(duckdb::duckdb(),
  dbdir = CDMConnector::eunomia_dir()
)
cdm <- CDMConnector::cdm_from_con(con,
  cdm_schema = "main",
  write_schema = "main"
)

cdm <- generateConceptCohortSet(
  cdm = cdm,
  name = "ankle_sprain",
  conceptSet = list("ankle_sprain" = 81151),
  end = "event_end_date",
  limit = "first",
  overwrite = TRUE
)
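
To make the `limit = "first"` setting more concrete, the following is a minimal base R sketch of what keeping only each person's earliest record amounts to. It uses a made-up toy table rather than the Eunomia data; the data frame `records` and its column names are hypothetical and do not reflect how the package works internally.

```r
# Toy condition records (hypothetical data, not taken from Eunomia)
records <- data.frame(
  person_id  = c(1, 1, 2, 3, 3, 3),
  start_date = as.Date(c(
    "2010-05-01", "2015-03-12", "2012-07-30",
    "2001-01-15", "2008-11-02", "2019-06-21"
  ))
)

# Sort by person and date, then keep the first row seen for each person
ordered <- records[order(records$person_id, records$start_date), ]
first_ever <- ordered[!duplicated(ordered$person_id), ]
rownames(first_ever) <- NULL

print(first_ever)
```

Each person contributes exactly one row, so the resulting cohort has one entry per individual.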

Large scale characteristics of study cohorts

To summarise our cohort of individuals with an ankle sprain we will look at their records in three tables of the OMOP CDM (condition_occurrence, procedure_occurrence, and drug_exposure) over two time windows (any time prior to their index date, and on index date). For conditions and procedures we will identify whether someone had a new record starting in the time window. Meanwhile, for drug exposures we will consider whether they had a new or ongoing record in the period.
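
The difference between a record that starts in a window and one that is new or ongoing in it can be sketched in a few lines of base R. The helper functions `event_in_window()` and `episode_in_window()` below are hypothetical names introduced only for illustration, and the three records are invented; days are counted relative to the index date.

```r
# Three hypothetical records, with start and end in days relative to index date
recs <- data.frame(
  record    = c("A", "B", "C"),
  start_day = c(-30, -400, 0),
  end_day   = c(-20, 5, 0)
)

# Event logic (illustrative helper): the record must START inside the window
event_in_window <- function(start, window) {
  start >= window[1] & start <= window[2]
}

# Episode logic (illustrative helper): the interval [start, end] must OVERLAP the window
episode_in_window <- function(start, end, window) {
  start <= window[2] & end >= window[1]
}

prior    <- c(-Inf, -1) # any time prior to index date
on_index <- c(0, 0)     # index date itself

recs$event_prior   <- event_in_window(recs$start_day, prior)
recs$event_index   <- event_in_window(recs$start_day, on_index)
recs$episode_prior <- episode_in_window(recs$start_day, recs$end_day, prior)
recs$episode_index <- episode_in_window(recs$start_day, recs$end_day, on_index)

print(recs)
```

Record B starts long before the index date but is still ongoing on it, so it counts for the index-date window under the episode logic and not under the event logic.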

Lastly, and importantly, we will only return results for concepts for which at least 10% of the study cohort had a record.
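
As a rough illustration of this kind of frequency cut-off, the base R sketch below keeps only the concepts seen in at least 10% of a cohort. The cohort size, concept names, and counts are all invented, and the code only mirrors the idea described here; it is not the package's implementation.

```r
# Hypothetical cohort size
cohort_size <- 20

# Hypothetical number of cohort members with at least one record of each concept
counts <- c(concept_a = 12, concept_b = 3, concept_c = 1, concept_d = 5)

# Proportion of the cohort with a record of each concept
freq <- counts / cohort_size

# Keep concepts recorded for at least 10% of the cohort
kept <- freq[freq >= 0.1]

print(kept)
```

Here `concept_c`, recorded for 1 of 20 people (5%), falls below the threshold and is dropped.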

lsc <- cdm$ankle_sprain |>
  summariseLargeScaleCharacteristics(
    window = list(c(-Inf, -1), c(0, 0)),
    eventInWindow = c(
      "condition_occurrence",
      "procedure_occurrence"
    ),
    episodeInWindow = "drug_exposure",
    minimumFrequency = 0.1
  )

tableLargeScaleCharacteristics(lsc)
CDM name: Synthea synthetic health database
Cohort name: ankle_sprain

| Variable name | Variable level | Estimate name | Concept id | ankle_sprain |
|---|---|---|---|---|
| condition_occurrence; event; standard | | | | |
| Streptococcal sore throat | -inf to -1 | N(%) | 28060 | 499(36.77%) |
| Sprain of wrist | -inf to -1 | N(%) | 78272 | 148(10.91%) |
| Osteoarthritis | -inf to -1 | N(%) | 80180 | 283(20.85%) |
| Chronic sinusitis | -inf to -1 | N(%) | 257012 | 162(11.94%) |
| Acute bronchitis | -inf to -1 | N(%) | 260139 | 767(56.52%) |
| Otitis media | -inf to -1 | N(%) | 372328 | 909(66.99%) |
| Concussion with no loss of consciousness | -inf to -1 | N(%) | 378001 | 185(13.63%) |
| Acute viral pharyngitis | -inf to -1 | N(%) | 4112343 | 845(62.27%) |
| Whiplash injury to neck | -inf to -1 | N(%) | 4218389 | 137(10.10%) |
| Sinusitis | -inf to -1 | N(%) | 4283893 | 166(12.23%) |
| Acute bacterial sinusitis | -inf to -1 | N(%) | 4294548 | 168(12.38%) |
| Viral sinusitis | -inf to -1 | N(%) | 40481087 | 981(72.29%) |
| Sprain of ankle | 0 to 0 | N(%) | 81151 | 1,357(100.00%) |
| procedure_occurrence; event; standard | | | | |
| Suture open wound | -inf to -1 | N(%) | 4125906 | 363(26.75%) |
| Sputum examination | -inf to -1 | N(%) | 4151422 | 282(20.78%) |
| Plain chest X-ray | -inf to -1 | N(%) | 4163872 | 137(10.10%) |
| Bone immobilization | -inf to -1 | N(%) | 4170947 | 356(26.23%) |
| drug_exposure; episode; standard | | | | |
| celecoxib | -inf to -1 | N(%) | 1118084 | 189(13.93%) |
| Acetaminophen 160 MG Oral Tablet | -inf to -1 | N(%) | 1127078 | 559(41.19%) |
| Acetaminophen 325 MG Oral Tablet | -inf to -1 | N(%) | 1127433 | 737(54.31%) |
| Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet | -inf to -1 | N(%) | 1713671 | 499(36.77%) |
| Penicillin G 375 MG/ML Injectable Solution | -inf to -1 | N(%) | 19006318 | 384(28.30%) |
| Aspirin 81 MG Oral Tablet | -inf to -1 | N(%) | 19059056 | 842(62.05%) |
| Ampicillin 100 MG/ML Injectable Solution | -inf to -1 | N(%) | 19129655 | 193(14.22%) |
| Penicillin V Potassium 250 MG Oral Tablet | -inf to -1 | N(%) | 19133873 | 491(36.18%) |
| poliovirus vaccine, inactivated | -inf to -1 | N(%) | 40213160 | 994(73.25%) |
| tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use | -inf to -1 | N(%) | 40213227 | 288(21.22%) |
| hepatitis B vaccine, adult dosage | -inf to -1 | N(%) | 40213306 | 226(16.65%) |
| Haemophilus influenzae type b vaccine, PRP-OMP conjugate | -inf to -1 | N(%) | 40213314 | 210(15.48%) |
| Acetaminophen 21.7 MG/ML / Dextromethorphan Hydrobromide 1 MG/ML / doxylamine succinate 0.417 MG/ML Oral Solution | -inf to -1 | N(%) | 40229134 | 296(21.81%) |
| Doxycycline Monohydrate 50 MG Oral Tablet | -inf to -1 | N(%) | 46233988 | 172(12.68%) |
| Acetaminophen 160 MG Oral Tablet | 0 to 0 | N(%) | 1127078 | 199(14.66%) |
| Acetaminophen 325 MG Oral Tablet | 0 to 0 | N(%) | 1127433 | 330(24.32%) |
| Aspirin 81 MG Oral Tablet | 0 to 0 | N(%) | 19059056 | 470(34.64%) |
| Ibuprofen 200 MG Oral Tablet | 0 to 0 | N(%) | 19078461 | 192(14.15%) |

As we can see we have identified numerous concepts for which at least 10% of our study population had a record. Often with larger cohorts and real patient-level data we will obtain many times more results when running large scale characterisation. One option we have to help summarise our results is to pick out the most frequent concepts. Here, for example, we select the top 5 concepts.

tableLargeScaleCharacteristics(lsc,
  topConcepts = 5
)
CDM name: Synthea synthetic health database
Cohort name: ankle_sprain

| Variable name | Variable level | Estimate name | Concept id | ankle_sprain |
|---|---|---|---|---|
| condition_occurrence; event; standard | | | | |
| Otitis media | -inf to -1 | N(%) | 372328 | 909(66.99%) |
| Acute viral pharyngitis | -inf to -1 | N(%) | 4112343 | 845(62.27%) |
| Viral sinusitis | -inf to -1 | N(%) | 40481087 | 981(72.29%) |
| Sprain of ankle | 0 to 0 | N(%) | 81151 | 1,357(100.00%) |
| drug_exposure; episode; standard | | | | |
| poliovirus vaccine, inactivated | -inf to -1 | N(%) | 40213160 | 994(73.25%) |
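
The selection of the most frequent concepts can be reproduced by hand as a sanity check. The base R sketch below copies a handful of counts from the full results table earlier in this section and ranks them by the number of people with a record. It assumes that "most frequent" means the highest count, and it is only an illustration of the idea, not a description of how `topConcepts` is implemented.

```r
# A few rows copied from the full results table (number of people with a record)
results <- data.frame(
  concept = c(
    "Otitis media", "Acute viral pharyngitis", "Viral sinusitis",
    "Sprain of ankle", "poliovirus vaccine, inactivated",
    "Acute bronchitis", "Aspirin 81 MG Oral Tablet"
  ),
  count = c(909, 845, 981, 1357, 994, 767, 842),
  stringsAsFactors = FALSE
)

# Rank by count, largest first, and keep the five most frequent concepts
top5 <- head(results[order(results$count, decreasing = TRUE), ], 5)

print(top5)
```

The five concepts retained this way are the same five that appear in the top 5 table above, with Aspirin 81 MG Oral Tablet (842 people in the prior window) narrowly missing out.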

Stratified large scale characteristics

As when summarising pre-specified patient characteristics, we can also obtain stratified results when summarising large scale characteristics. Here, for example, large scale characteristics of drug exposures are stratified by sex (which we add as an additional column to our cohort table using the PatientProfiles package).

lsc <- cdm$ankle_sprain |>
  PatientProfiles::addSex() |>
  summariseLargeScaleCharacteristics(
    window = list(c(-Inf, -1), c(0, 0)),
    strata = list("sex"),
    eventInWindow = "drug_exposure",
    minimumFrequency = 0.1
  )

tableLargeScaleCharacteristics(lsc)
CDM name: Synthea synthetic health database
Cohort name: ankle_sprain
Columns by sex: overall, Female, Male

| Variable name | Variable level | Estimate name | Concept id | overall | Female | Male |
|---|---|---|---|---|---|---|
| drug_exposure; event; standard | | | | | | |
| celecoxib | -inf to -1 | N(%) | 1118084 | 189(13.93%) | 92(13.47%) | 97(14.39%) |
| Acetaminophen 160 MG Oral Tablet | -inf to -1 | N(%) | 1127078 | 559(41.19%) | 292(42.75%) | 267(39.61%) |
| Acetaminophen 325 MG Oral Tablet | -inf to -1 | N(%) | 1127433 | 737(54.31%) | 374(54.76%) | 363(53.86%) |
| Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet | -inf to -1 | N(%) | 1713671 | 499(36.77%) | 244(35.72%) | 255(37.83%) |
| Penicillin G 375 MG/ML Injectable Solution | -inf to -1 | N(%) | 19006318 | 384(28.30%) | 169(24.74%) | 215(31.90%) |
| Aspirin 81 MG Oral Tablet | -inf to -1 | N(%) | 19059056 | 842(62.05%) | 427(62.52%) | 415(61.57%) |
| Ampicillin 100 MG/ML Injectable Solution | -inf to -1 | N(%) | 19129655 | 193(14.22%) | 98(14.35%) | 95(14.09%) |
| Penicillin V Potassium 250 MG Oral Tablet | -inf to -1 | N(%) | 19133873 | 491(36.18%) | 256(37.48%) | 235(34.87%) |
| poliovirus vaccine, inactivated | -inf to -1 | N(%) | 40213160 | 994(73.25%) | 501(73.35%) | 493(73.15%) |
| tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use | -inf to -1 | N(%) | 40213227 | 288(21.22%) | 151(22.11%) | 137(20.33%) |
| hepatitis B vaccine, adult dosage | -inf to -1 | N(%) | 40213306 | 226(16.65%) | 128(18.74%) | 98(14.54%) |
| Haemophilus influenzae type b vaccine, PRP-OMP conjugate | -inf to -1 | N(%) | 40213314 | 210(15.48%) | 112(16.40%) | 98(14.54%) |
| Acetaminophen 21.7 MG/ML / Dextromethorphan Hydrobromide 1 MG/ML / doxylamine succinate 0.417 MG/ML Oral Solution | -inf to -1 | N(%) | 40229134 | 296(21.81%) | 132(19.33%) | 164(24.33%) |
| Doxycycline Monohydrate 50 MG Oral Tablet | -inf to -1 | N(%) | 46233988 | 172(12.68%) | 94(13.76%) | 78(11.57%) |
| Acetaminophen 160 MG Oral Tablet | 0 to 0 | N(%) | 1127078 | 199(14.66%) | 97(14.20%) | 102(15.13%) |
| Acetaminophen 325 MG Oral Tablet | 0 to 0 | N(%) | 1127433 | 330(24.32%) | 165(24.16%) | 165(24.48%) |
| Aspirin 81 MG Oral Tablet | 0 to 0 | N(%) | 19059056 | 470(34.64%) | 245(35.87%) | 225(33.38%) |
| Ibuprofen 200 MG Oral Tablet | 0 to 0 | N(%) | 19078461 | 192(14.15%) | 93(13.62%) | 99(14.69%) |
| Nitrofurantoin 5 MG/ML Oral Suspension | -inf to -1 | N(%) | 920300 | - | 84(12.30%) | - |
| {7 (Inert Ingredients 1 MG Oral Tablet) / 21 (Mestranol 0.05 MG / Norethindrone 1 MG Oral Tablet) } Pack [Norinyl 1+50 28 Day] | -inf to -1 | N(%) | 19128065 | - | 135(19.77%) | - |
| Phenazopyridine hydrochloride 100 MG Oral Tablet | -inf to -1 | N(%) | 40236824 | - | 84(12.30%) | - |
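
The dashes in the last three rows are consistent with the 10% threshold being applied separately to each stratum, so that a concept can be reported for one group while being suppressed for the others and for the cohort overall. This is an interpretation of the output rather than documented behaviour. The base R sketch below shows how such a pattern could arise, using an invented cohort of 20 women and 20 men and a single hypothetical concept.

```r
# Hypothetical cohort: 20 women and 20 men; had_drug flags a record of one concept
cohort <- data.frame(
  sex      = rep(c("Female", "Male"), each = 20),
  had_drug = c(rep(TRUE, 3), rep(FALSE, 17), rep(FALSE, 20))
)

# Proportion with a record overall and within each stratum
prop <- c(
  overall = mean(cohort$had_drug),
  tapply(cohort$had_drug, cohort$sex, mean)
)

# Apply the 10% threshold to each column separately (an assumption);
# NA marks a cell that would be shown as a dash
reported <- ifelse(prop >= 0.1, prop, NA)

print(reported)
```

In this toy example 15% of women have a record, which clears the threshold, whereas the overall figure of 7.5% and the male figure of 0% do not.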