Add drug related information to a Drug Utilisation cohort
Marti Catala, Mike Du, Yuchen Guo, Kim Lopez-Guell, Edward Burn, Xintong Li
2024-07-17
a04_addDrugInfo.Rmd
Introduction
The DrugUtilisation package includes a range of functions that add drug-related information to OMOP CDM tables and cohort tables. In this vignette, we explore these functions and provide some examples of their usage.
Create mock data first
library(DrugUtilisation)
library(CodelistGenerator) # provides getDrugIngredientCodes()
library(CDMConnector)
library(dplyr)
library(PatientProfiles)
cdm <- mockDrugUtilisation(numberIndividual = 200)
Create a drug utilisation cohort
We will use acetaminophen as our example drug to construct our drug utilisation cohort. To begin, we use the getDrugIngredientCodes() function from CodelistGenerator to generate a concept list associated with acetaminophen.
conceptList <- getDrugIngredientCodes(cdm, c("acetaminophen"))
conceptList
#>
#> - 161_acetaminophen (4 codes)
Next, we create a drug utilisation cohort by passing conceptList to the generateDrugUtilisationCohortSet() function. For a better understanding of the arguments and functionalities of generateDrugUtilisationCohortSet(), please refer to the "Use DrugUtilisation to create a cohort" vignette.
cdm <- generateDrugUtilisationCohortSet(
cdm = cdm,
name = "acetaminophen_example1",
conceptSet = conceptList
)
Adding routes with addRoute() function
The addRoute() function uses an internal CSV file containing all possible routes for the drug dose forms supported by the package. It adds this route information to your drug table for the supported dose forms. The example below shows how it works.
cdm[["drug_exposure"]] |>
addRoute()
#> # Source: SQL [?? x 8]
#> # Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1023-azure:R 4.4.1/:memory:]
#> drug_exposure_id person_id drug_concept_id drug_exposure_start_date
#> <int> <int> <dbl> <date>
#> 1 1 1 1503328 2022-01-11
#> 2 2 1 2905077 2021-10-26
#> 3 3 1 1539463 2021-08-10
#> 4 4 1 1516980 2021-12-20
#> 5 5 2 1516978 2013-05-09
#> 6 6 2 2905077 2013-01-20
#> 7 7 3 1125360 2013-05-25
#> 8 9 3 1516978 2010-07-16
#> 9 10 3 2905077 2013-01-26
#> 10 11 4 1125360 2003-12-11
#> # ℹ more rows
#> # ℹ 4 more variables: drug_exposure_end_date <date>,
#> # drug_type_concept_id <dbl>, quantity <dbl>, route <chr>
Generating patterns with patternTable() function
The patternTable() function in the DrugUtilisation package derives patterns from the drug strength table. It extracts the distinct patterns and associates them with a pattern_id and a formula_id. The resulting tibble provides the following data:
- number_concepts: the count of distinct concepts in the patterns.
- number_ingredients: the count of distinct ingredients involved.
- number_records: the overall count of records in the patterns.
Moreover, the tibble includes a validity column indicating potentially valid and invalid combinations.
patternTable(cdm)
#> # A tibble: 5 × 12
#> pattern_id formula_name validity number_concepts number_ingredients
#> <dbl> <chr> <chr> <dbl> <dbl>
#> 1 9 fixed amount formulati… pattern… 7 4
#> 2 18 concentration formulat… pattern… 1 1
#> 3 24 concentration formulat… pattern… 1 1
#> 4 40 concentration formulat… pattern… 1 1
#> 5 NA NA no patt… 4 4
#> # ℹ 7 more variables: number_records <dbl>, amount_numeric <dbl>,
#> # amount_unit_concept_id <dbl>, numerator_numeric <dbl>,
#> # numerator_unit_concept_id <dbl>, denominator_numeric <dbl>,
#> # denominator_unit_concept_id <dbl>
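As a rough illustration of what the three counts mean, the sketch below computes them in base R from a small made-up table. The data frame, its values, and the helper countDistinct() are hypothetical and only mimic a drug strength table; this is not how patternTable() is implemented.

```r
# Hypothetical records: one row per drug concept / ingredient pair,
# each assigned to a pattern (all values are invented for illustration)
strength <- data.frame(
  pattern_id            = c(9, 9, 9, 18),
  drug_concept_id       = c(101, 102, 102, 201),
  ingredient_concept_id = c(1, 1, 2, 3)
)

# Helper (hypothetical): number of distinct values in a vector
countDistinct <- function(x) length(unique(x))

# One row per pattern with the three counts described above
patternCounts <- data.frame(
  pattern_id = sort(unique(strength$pattern_id)),
  number_concepts = as.vector(
    tapply(strength$drug_concept_id, strength$pattern_id, countDistinct)
  ),
  number_ingredients = as.vector(
    tapply(strength$ingredient_concept_id, strength$pattern_id, countDistinct)
  ),
  number_records = as.vector(
    tapply(strength$pattern_id, strength$pattern_id, length)
  )
)
patternCounts
```

In this toy table, pattern 9 covers two concepts, two ingredients, and three records, while pattern 18 has one of each.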
For detailed information about the patterns, their associated formulas, and the combinations of amount_unit, numerator_unit, and denominator_unit, you can refer to the data:
patternsWithFormula
Get daily dose with addDailyDose() function
Now that we have all the supported patterns and formulas, daily doses can be computed with the addDailyDose() function. This function adds extra columns to the table; in the output below these are daily_dose and unit.
addDailyDose(
cdm$drug_exposure,
cdm = cdm,
ingredientConceptId = 1125315
)
#> # Source: table<og_009_1721204216> [?? x 9]
#> # Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1023-azure:R 4.4.1/:memory:]
#> drug_exposure_id person_id drug_concept_id drug_exposure_start_date
#> <int> <int> <dbl> <date>
#> 1 2 1 2905077 2021-10-26
#> 2 6 2 2905077 2013-01-20
#> 3 7 3 1125360 2013-05-25
#> 4 10 3 2905077 2013-01-26
#> 5 11 4 1125360 2003-12-11
#> 6 13 4 1125360 1990-09-04
#> 7 15 4 43135274 1995-04-04
#> 8 21 6 43135274 2020-05-19
#> 9 25 7 2905077 2005-11-18
#> 10 26 7 2905077 2008-10-22
#> # ℹ more rows
#> # ℹ 5 more variables: drug_exposure_end_date <date>,
#> # drug_type_concept_id <dbl>, quantity <dbl>, daily_dose <dbl>, unit <chr>
There is also a function, summariseDoseCoverage(), to check the coverage of the daily dose computation for chosen concept sets and ingredients.
summariseDoseCoverage(cdm, 1125315)
#> ℹ The following estimates will be computed:
#> • daily_dose: count_missing, percentage_missing, mean, sd, q25, median, q75
#> ! Table is collected to memory as not all requested estimates are supported on
#> the database side
#> → Start summary of data, at 2024-07-17 08:16:56.90615
#>
#> ✔ Summary finished, at 2024-07-17 08:16:57.176255
#> # A tibble: 56 × 13
#> result_id cdm_name group_name group_level strata_name strata_level
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 DUS MOCK ingredient_name acetaminophen overall overall
#> 2 1 DUS MOCK ingredient_name acetaminophen overall overall
#> 3 1 DUS MOCK ingredient_name acetaminophen overall overall
#> 4 1 DUS MOCK ingredient_name acetaminophen overall overall
#> 5 1 DUS MOCK ingredient_name acetaminophen overall overall
#> 6 1 DUS MOCK ingredient_name acetaminophen overall overall
#> 7 1 DUS MOCK ingredient_name acetaminophen overall overall
#> 8 1 DUS MOCK ingredient_name acetaminophen overall overall
#> 9 1 DUS MOCK ingredient_name acetaminophen unit milligram
#> 10 1 DUS MOCK ingredient_name acetaminophen unit milligram
#> # ℹ 46 more rows
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> # estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> # additional_name <chr>, additional_level <chr>
Adding Drug Usage Details to a Cohort with addDrugUse() function
Additional drug usage details, including duration, initial dose,
cumulative dose, etc., can be incorporated into a cohort using the
addDrugUse()
function.
cdm$acetaminophen_example1 |>
addDrugUse(ingredientConceptId = 1125315)
#> # Source: table<og_016_1721204227> [?? x 13]
#> # Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1023-azure:R 4.4.1/:memory:]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date duration
#> <int> <int> <date> <date> <dbl>
#> 1 1 3 2013-05-25 2015-11-12 902
#> 2 1 73 2021-04-16 2021-04-30 15
#> 3 1 9 1989-02-04 1992-11-02 1368
#> 4 1 52 2015-12-13 2016-01-04 23
#> 5 1 183 2011-05-07 2013-07-10 796
#> 6 1 19 2020-12-22 2021-11-11 325
#> 7 1 4 2013-04-21 2015-04-01 711
#> 8 1 26 1991-01-10 1991-05-19 130
#> 9 1 30 2007-04-18 2007-04-24 7
#> 10 1 184 2011-06-09 2012-05-08 335
#> # ℹ more rows
#> # ℹ 8 more variables: number_exposures <dbl>, cumulative_quantity <dbl>,
#> # initial_quantity <dbl>, impute_duration_percentage <dbl>,
#> # number_eras <dbl>, impute_daily_dose_percentage <dbl>,
#> # initial_daily_dose_milligram <dbl>, cumulative_dose_milligram <dbl>
duration parameter
The duration parameter is a boolean variable (TRUE/FALSE) determining whether to include duration-related columns, which correspond to:
- duration: duration is calculated as cohort_end_date - cohort_start_date + 1.
- impute_duration_percentage: if a drug exposure record does not have the duration of the exposure, or falls outside the specified duration range, duration will be imputed. The number of records that have been imputed or that would have been imputed (if we choose not to impute the duration) is recorded in this column.
To set the imputation method for duration, use the imputeDuration input, which can take values such as none (default), median, mode or a numerical value. Define the durationRange parameter as a numeric vector of length two, where the first value should be less than or equal to the second one. If set to NULL, no restrictions are applied.
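The duration formula and the idea behind range-based imputation can be sketched in base R. The two date pairs are the first two rows of the addDrugUse() output shown earlier; the record-level durations, the range, and the choice to take the median over the non-flagged records are assumptions made for illustration and may not match the package internals.

```r
# Duration of a cohort entry: both the start day and the end day count
cohortStart <- as.Date(c("2013-05-25", "2021-04-16"))
cohortEnd   <- as.Date(c("2015-11-12", "2021-04-30"))
duration <- as.numeric(cohortEnd - cohortStart) + 1
duration
# 902 15

# Hypothetical record-level durations in days; NA means missing
recordDuration <- c(30, NA, 28, 400, 30)
durationRange <- c(1, 365)

# Flag records that are missing or fall outside the allowed range
toImpute <- is.na(recordDuration) |
  recordDuration < durationRange[1] |
  recordDuration > durationRange[2]

# Assumption: the median is taken over the records that were not flagged
imputed <- ifelse(toImpute, median(recordDuration[!toImpute]), recordDuration)
imputed
# 30 30 28 30 30

# Share of flagged records, expressed as a percentage
imputedPercentage <- 100 * mean(toImpute)
imputedPercentage
# 40
```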
quantity parameter
The quantity parameter, another boolean variable (TRUE/FALSE), controls the inclusion of quantity-related columns. If set to TRUE (default), the following columns are added:
- cumulative_quantity: cumulative sum of the column quantity of the drug_exposure table during the drug exposure period.
- initial_quantity: quantity at drug_exposure_start_date.
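A minimal base R sketch of these two columns, assuming that the initial quantity is the quantity of the earliest record in the period. The exposure records below are invented for illustration.

```r
# Hypothetical drug exposure records of one subject within one cohort entry
exposures <- data.frame(
  drug_exposure_start_date = as.Date(c("2020-03-01", "2020-01-01", "2020-02-01")),
  quantity = c(20, 30, 10)
)

# Sort by start date so that the first row is the earliest exposure
exposures <- exposures[order(exposures$drug_exposure_start_date), ]

# Sum of quantity over all records in the period
cumulativeQuantity <- sum(exposures$quantity)
cumulativeQuantity
# 60

# Quantity of the earliest record (assumed meaning of initial_quantity)
initialQuantity <- exposures$quantity[1]
initialQuantity
# 30
```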
dose parameter
The dose parameter, also a boolean variable (TRUE/FALSE), governs the addition of daily dose-related columns. When set to TRUE, the following columns are added:
- initial_daily_dose_milligram: dose at drug_exposure_start_date.
- cumulative_dose_milligram: cumulative sum of the column dose of the drug_exposure table during the drug exposure period.
- impute_daily_dose_percentage: if daily dose is missing, or falls outside the imputation range, records will be imputed. This column shows the number of records that have been imputed or that would have been imputed (if we choose not to impute the daily dose).
Similar to duration imputation, use the imputeDose parameter to set the method for imputing daily dose, with options like none (default), median, mean, mode. Define the imputation range with the dailyDoseRange parameter, a numeric vector of length two, where the first value should be less than or equal to the second one. If set to NULL, no restrictions are applied.
These parameters offer flexibility in customizing the drug usage details added to the cohort. The next example uses acetaminophen_example1, the cohort created at the beginning of this vignette.
addDrugUse(
cohort = cdm[["acetaminophen_example1"]],
cdm = cdm,
ingredientConceptId = 1125315,
duration = TRUE,
quantity = TRUE,
dose = TRUE
)
#> # Source: table<og_021_1721204238> [?? x 13]
#> # Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1023-azure:R 4.4.1/:memory:]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date duration
#> <int> <int> <date> <date> <dbl>
#> 1 1 99 1990-02-04 1990-02-07 4
#> 2 1 164 2022-11-12 2022-11-15 4
#> 3 1 103 2004-03-29 2008-10-09 1656
#> 4 1 7 2005-11-18 2008-07-09 965
#> 5 1 197 2021-02-21 2021-02-22 2
#> 6 1 107 2019-12-11 2021-06-02 540
#> 7 1 69 2020-01-12 2020-10-26 289
#> 8 1 114 2022-06-24 2022-07-15 22
#> 9 1 160 2008-06-04 2015-11-19 2725
#> 10 1 87 2009-03-11 2011-11-20 985
#> # ℹ more rows
#> # ℹ 8 more variables: number_exposures <dbl>, cumulative_quantity <dbl>,
#> # initial_quantity <dbl>, impute_duration_percentage <dbl>,
#> # number_eras <dbl>, impute_daily_dose_percentage <dbl>,
#> # initial_daily_dose_milligram <dbl>, cumulative_dose_milligram <dbl>
If all these parameters are set to FALSE, only number_exposures and number_eras will be added.
Parameters for Joining Exposures
The way continuous exposures are joined can be configured by using different parameters. Let’s have a look at all the options we have.
gapEra parameter
This parameter sets the maximum number of days between two continuous exposures for them to be considered part of the same era. If the next exposure’s start date minus the previous exposure’s end date is less than or equal to the specified gapEra, these two exposures will be joined.
Let’s see an illustrative example.
First, let’s create a cohort with gapEra = 0
. For a
better understanding, we will observe only subject number 56.
cdm <- generateDrugUtilisationCohortSet(
cdm = cdm,
name = "acetaminophen_example2",
conceptSet = conceptList,
gapEra = 0
)
cdm$drug_exposure |>
filter(drug_concept_id %in% !!conceptList[["161_acetaminophen"]]) |>
filter(person_id == 56)
This subject has two different drug exposure periods separated by less than 6 months. Hence, it has two different cohort periods:
cdm[["acetaminophen_example2"]] |>
addDrugUse(
ingredientConceptId = 1125315,
gapEra = 0
) |>
filter(subject_id == 56)
#> # Source: SQL [2 x 13]
#> # Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1023-azure:R 4.4.1/:memory:]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date duration
#> <int> <int> <date> <date> <dbl>
#> 1 1 56 2021-08-08 2021-09-17 41
#> 2 1 56 2022-01-27 2022-03-04 37
#> # ℹ 8 more variables: number_exposures <dbl>, cumulative_quantity <dbl>,
#> # initial_quantity <dbl>, impute_duration_percentage <dbl>,
#> # number_eras <dbl>, impute_daily_dose_percentage <dbl>,
#> # initial_daily_dose_milligram <dbl>, cumulative_dose_milligram <dbl>
Now, we merge these two periods by modifying the gapEra input when creating the cohort. For a better understanding of the gapEra argument and its functionality, please see the "Use DrugUtilisation to create a cohort" vignette.
cdm <- generateDrugUtilisationCohortSet(
cdm = cdm,
name = "acetaminophen_example3",
conceptSet = conceptList,
gapEra = 180
)
cdm$acetaminophen_example3 |>
addDrugUse(
ingredientConceptId = 1125315,
gapEra = 180,
duration = TRUE,
quantity = FALSE,
dose = FALSE
) |>
filter(subject_id == 56)
#> # Source: SQL [1 x 8]
#> # Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1023-azure:R 4.4.1/:memory:]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date duration
#> <int> <int> <date> <date> <dbl>
#> 1 1 56 2021-08-08 2022-03-04 209
#> # ℹ 3 more variables: number_exposures <dbl>, impute_duration_percentage <dbl>,
#> # number_eras <dbl>
We now have only one record, containing two exposures, for subject number 56. Note that the number of eras is still 1, because we passed the same gapEra as when the cohort was created. However, it is possible to specify a gapEra different from the one defined when the cohort was created.
cdm$acetaminophen_example3 |>
addDrugUse(
ingredientConceptId = 1125315,
gapEra = 0
) |>
filter(subject_id == 56)
#> # Source: SQL [1 x 13]
#> # Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1023-azure:R 4.4.1/:memory:]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date duration
#> <int> <int> <date> <date> <dbl>
#> 1 1 56 2021-08-08 2022-03-04 209
#> # ℹ 8 more variables: number_exposures <dbl>, cumulative_quantity <dbl>,
#> # initial_quantity <dbl>, impute_duration_percentage <dbl>,
#> # number_eras <dbl>, impute_daily_dose_percentage <dbl>,
#> # initial_daily_dose_milligram <dbl>, cumulative_dose_milligram <dbl>
Notice that number_eras
now indicates that we have two
eras within the same record.
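The joining rule can be sketched in base R using the two periods of subject 56 shown above. The helper joinEras() is hypothetical, and it assumes the gap is measured as the next start date minus the previous end date; the package may count the gap slightly differently.

```r
# The two exposure periods of subject 56 (taken from the output above)
exposures <- data.frame(
  start = as.Date(c("2021-08-08", "2022-01-27")),
  end   = as.Date(c("2021-09-17", "2022-03-04"))
)

# Hypothetical helper: collapse exposures into eras for a given gapEra
joinEras <- function(x, gapEra) {
  x <- x[order(x$start), ]
  # Days between each exposure's start and the previous exposure's end
  gap <- as.numeric(x$start[-1] - x$end[-nrow(x)])
  # A new era begins whenever the gap is larger than gapEra
  eraId <- cumsum(c(TRUE, gap > gapEra))
  do.call(rbind, lapply(split(x, eraId), function(era) {
    data.frame(start = min(era$start), end = max(era$end))
  }))
}

# The gap here is 132 days, so gapEra = 0 keeps two separate eras
eras0 <- joinEras(exposures, gapEra = 0)
nrow(eras0)
# 2

# gapEra = 180 joins them into one era lasting 209 days
eras180 <- joinEras(exposures, gapEra = 180)
as.numeric(eras180$end - eras180$start) + 1
# 209
```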
eraJoinMode parameter
This parameter defines how two different continuous exposures are joined in an era. There are four options:
- eraJoinMode = "zero" (default option): Exposures are joined considering that the period between both continuous exposures means the subject is treated with a daily dose of zero. The time between both exposures contributes to the total exposed time.
- eraJoinMode = "join": Exposures are joined, considering that the period between both continuous exposures means the subject is treated with a daily dose of zero. The time between both exposures does not contribute to the total exposed time.
- eraJoinMode = "previous": Exposures are joined, considering that the period between both continuous exposures means the subject is treated with the daily dose of the previous subexposure. The time between both exposures contributes to the total exposed time.
- eraJoinMode = "subsequent": Exposures are joined, considering that the period between both continuous exposures means the subject is treated with the daily dose of the subsequent subexposure. The time between both exposures contributes to the total exposed time.
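To make the difference between the four modes concrete, here is a small base R sketch with invented numbers: a first exposure of 10 days at 500 mg/day, a gap of 5 days, and a second exposure of 10 days at 1000 mg/day. The helper joinTwoExposures() and all its arguments are hypothetical; it only mirrors the descriptions above and is not the package implementation.

```r
# Hypothetical helper: exposed days and cumulative dose for two exposures
# separated by a gap, under a given eraJoinMode
joinTwoExposures <- function(eraJoinMode,
                             daysFirst = 10, doseFirst = 500,
                             daysGap = 5,
                             daysSecond = 10, doseSecond = 1000) {
  # Daily dose assigned to the gap between the two exposures
  gapDose <- switch(eraJoinMode,
    zero = 0,
    join = 0,
    previous = doseFirst,
    subsequent = doseSecond,
    stop("unknown eraJoinMode: ", eraJoinMode)
  )
  # In "join" mode the gap does not count as exposed time
  gapDays <- if (eraJoinMode == "join") 0 else daysGap
  c(
    exposed_days = daysFirst + gapDays + daysSecond,
    cumulative_dose = daysFirst * doseFirst + daysGap * gapDose +
      daysSecond * doseSecond
  )
}

# One column per mode, one row per quantity
modes <- sapply(c("zero", "join", "previous", "subsequent"), joinTwoExposures)
modes
```

With these numbers, "zero" and "join" give the same cumulative dose (15000 mg) but 25 versus 20 exposed days, while "previous" and "subsequent" both give 25 exposed days with cumulative doses of 17500 mg and 20000 mg respectively.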
overlapMode parameter
This parameter defines how an overlap between two exposures that do not start on the same day is resolved inside a subexposure. There are five possible options:
- overlapMode = "sum" (default): The considered daily dose is the sum of all the exposures present in the subexposure.
- overlapMode = "minimum": The considered daily dose is the minimum of all the exposures in the subexposure.
- overlapMode = "maximum": The considered daily dose is the maximum of all the exposures in the subexposure.
- overlapMode = "previous": The considered daily dose is that of the earliest exposure.
- overlapMode = "subsequent": The considered daily dose is that of the latest exposure.
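The five options amount to different ways of reducing several daily doses to a single value. The base R sketch below illustrates this for two invented overlapping exposures; resolveOverlap() is a hypothetical helper written for this example, not a function of the package.

```r
# Hypothetical helper: pick the daily dose of a subexposure in which
# several exposures overlap, following the overlapMode descriptions above
resolveOverlap <- function(dailyDose, startDate, overlapMode = "sum") {
  switch(overlapMode,
    sum = sum(dailyDose),
    minimum = min(dailyDose),
    maximum = max(dailyDose),
    # "previous": dose of the exposure that started first
    previous = dailyDose[which.min(as.numeric(startDate))],
    # "subsequent": dose of the exposure that started last
    subsequent = dailyDose[which.max(as.numeric(startDate))],
    stop("unknown overlapMode: ", overlapMode)
  )
}

# Two invented exposures overlapping in the same subexposure
dailyDose <- c(500, 1000)
startDate <- as.Date(c("2020-01-01", "2020-01-10"))

overlapDoses <- sapply(
  c("sum", "minimum", "maximum", "previous", "subsequent"),
  function(mode) resolveOverlap(dailyDose, startDate, mode)
)
overlapDoses
```

Here the result is 1500 for "sum", 500 for "minimum" and "previous", and 1000 for "maximum" and "subsequent".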
sameIndexMode parameter
This parameter works similarly to overlapMode, but it customizes how an overlap between two exposures starting on the same date is resolved. It includes the options sum (default), minimum, and maximum described in overlapMode.
For instance, the following call sets a maximum gap of 30 days for exposures to be joined. It uses the daily dose of the previous subexposure when joining exposures, employs the minimum daily dose for exposures starting on the same day, and considers the minimum daily dose for exposures that overlap.
cdm[["acetaminophen_example1"]] |>
addDrugUse(ingredientConceptId = 1125315,
gapEra = 30,
eraJoinMode = "previous",
overlapMode = "minimum",
sameIndexMode = "minimum")
#> # Source: table<og_040_1721204283> [?? x 13]
#> # Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1023-azure:R 4.4.1/:memory:]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date duration
#> <int> <int> <date> <date> <dbl>
#> 1 1 6 2020-05-19 2020-11-23 189
#> 2 1 65 2017-03-11 2021-11-06 1702
#> 3 1 4 1990-09-04 2008-05-24 6473
#> 4 1 33 2018-09-30 2018-12-23 85
#> 5 1 72 2007-09-12 2007-10-12 31
#> 6 1 42 1998-05-15 1998-12-22 222
#> 7 1 119 2006-04-03 2017-07-18 4125
#> 8 1 114 2022-10-17 2022-11-13 28
#> 9 1 184 2009-04-22 2010-03-13 326
#> 10 1 49 2021-06-30 2021-11-11 135
#> # ℹ more rows
#> # ℹ 8 more variables: number_exposures <dbl>, cumulative_quantity <dbl>,
#> # initial_quantity <dbl>, impute_duration_percentage <dbl>,
#> # number_eras <dbl>, impute_daily_dose_percentage <dbl>,
#> # initial_daily_dose_milligram <dbl>, cumulative_dose_milligram <dbl>
Summarise drug usage information with summariseDrugUse() function
This function creates a tibble summarising the dose table across multiple cohorts. See an example below:
cdm[["acetaminophen_example1"]] <- cdm[["acetaminophen_example1"]] |>
addDrugUse(
cdm = cdm,
ingredientConceptId = 1125315
)
summariseDrugUse(cdm[["acetaminophen_example1"]])
#> # A tibble: 101 × 13
#> result_id cdm_name group_name group_level strata_name strata_level
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 DUS MOCK cohort_name 161_acetaminophen overall overall
#> 2 1 DUS MOCK cohort_name 161_acetaminophen overall overall
#> 3 1 DUS MOCK cohort_name 161_acetaminophen overall overall
#> 4 1 DUS MOCK cohort_name 161_acetaminophen overall overall
#> 5 1 DUS MOCK cohort_name 161_acetaminophen overall overall
#> 6 1 DUS MOCK cohort_name 161_acetaminophen overall overall
#> 7 1 DUS MOCK cohort_name 161_acetaminophen overall overall
#> 8 1 DUS MOCK cohort_name 161_acetaminophen overall overall
#> 9 1 DUS MOCK cohort_name 161_acetaminophen overall overall
#> 10 1 DUS MOCK cohort_name 161_acetaminophen overall overall
#> # ℹ 91 more rows
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> # estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> # additional_name <chr>, additional_level <chr>
strata parameter
We can also stratify our cohort and calculate the estimates within each stratum by using the strata parameter.
cdm[["acetaminophen_example1"]] <- cdm[["acetaminophen_example1"]] |>
addSex() # Function from PatientProfiles
summariseDrugUse(cdm[["acetaminophen_example1"]],
strata = list("sex" = "sex"))
#> # A tibble: 303 × 13
#> result_id cdm_name group_name group_level strata_name strata_level
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 DUS MOCK cohort_name 161_acetaminophen overall overall
#> 2 1 DUS MOCK cohort_name 161_acetaminophen overall overall
#> 3 1 DUS MOCK cohort_name 161_acetaminophen overall overall
#> 4 1 DUS MOCK cohort_name 161_acetaminophen overall overall
#> 5 1 DUS MOCK cohort_name 161_acetaminophen overall overall
#> 6 1 DUS MOCK cohort_name 161_acetaminophen overall overall
#> 7 1 DUS MOCK cohort_name 161_acetaminophen overall overall
#> 8 1 DUS MOCK cohort_name 161_acetaminophen overall overall
#> 9 1 DUS MOCK cohort_name 161_acetaminophen overall overall
#> 10 1 DUS MOCK cohort_name 161_acetaminophen overall overall
#> # ℹ 293 more rows
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> # estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> # additional_name <chr>, additional_level <chr>
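Conceptually, stratification repeats the same estimates within each level of the stratifying variable, in addition to the overall result. A minimal base R sketch of that idea, using an invented cohort table (the values and the helper estimate() are hypothetical):

```r
# Hypothetical cohort with a duration column and a sex column
cohort <- data.frame(
  sex = c("Female", "Male", "Female", "Male", "Female"),
  duration = c(10, 20, 30, 40, 80)
)

# Helper (hypothetical): the estimates computed for each group
estimate <- function(x) c(mean = mean(x), median = median(x))

# Estimates over the whole cohort ("overall")
overall <- estimate(cohort$duration)
overall

# The same estimates within each level of sex (one column per level)
bySex <- sapply(split(cohort$duration, cohort$sex), estimate)
bySex
```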
drugEstimates parameter
Customize the estimates to be calculated by using the drugEstimates parameter. By default, it will compute the minimum value, quantiles (5%, 25%, 50% - median, 75% and 95%), the maximum value, the mean, the standard deviation, and the number of missing values for each column added with addDrugUse().
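For reference, the default set of estimates listed above can be reproduced for a single numeric column in base R. The vector below is invented, and the quantile definition used by quantile() (its default type) is an assumption that may differ from what the package uses on the database side.

```r
# Hypothetical values of one column added by addDrugUse(); NA is a missing value
duration <- c(7, 15, 23, 130, NA, 325)

estimates <- c(
  min = min(duration, na.rm = TRUE),
  # 5%, 25%, 50% (median), 75% and 95% quantiles
  quantile(duration, probs = c(0.05, 0.25, 0.5, 0.75, 0.95), na.rm = TRUE),
  max = max(duration, na.rm = TRUE),
  mean = mean(duration, na.rm = TRUE),
  sd = sd(duration, na.rm = TRUE),
  count_missing = sum(is.na(duration))
)
estimates
```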