Add drug-related information to a Drug Utilisation cohort

Introduction

The DrugUtilisation package includes a range of functions that add drug-related information about subjects to OMOP CDM tables and cohort tables. In this vignette, we explore these functions and give examples of their usage.

Create mock data first

library(DrugUtilisation)
library(CodelistGenerator) # provides getDrugIngredientCodes(), used below
library(CDMConnector)
library(dplyr)
library(PatientProfiles)

cdm <- mockDrugUtilisation(numberIndividual = 200)

Create a drug utilisation cohort

We will use acetaminophen as our example drug to construct the drug utilisation cohort. To begin, we use the getDrugIngredientCodes() function from CodelistGenerator to generate a concept list associated with acetaminophen.

conceptList <- getDrugIngredientCodes(cdm, c("acetaminophen"))
conceptList
#> 
#> - 161_acetaminophen (4 codes)

Next, we create a drug utilisation cohort by passing conceptList to the generateDrugUtilisationCohortSet() function. For details on the arguments and functionality of generateDrugUtilisationCohortSet(), please refer to the "Use DrugUtilisation to create a cohort" vignette.

cdm <- generateDrugUtilisationCohortSet(
  cdm  = cdm,
  name = "acetaminophen_example1",
  conceptSet = conceptList
)

Adding routes with addRoute() function

The addRoute() function uses an internal CSV file that lists the possible routes for the drug dose forms supported by the package. It adds a route column to your drug table for the supported dose forms, as the example below shows.

cdm[["drug_exposure"]] |>
  addRoute() 
#> # Source:   SQL [?? x 8]
#> # Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1023-azure:R 4.4.1/:memory:]
#>    drug_exposure_id person_id drug_concept_id drug_exposure_start_date
#>               <int>     <int>           <dbl> <date>                  
#>  1                1         1         1503328 2022-01-11              
#>  2                2         1         2905077 2021-10-26              
#>  3                3         1         1539463 2021-08-10              
#>  4                4         1         1516980 2021-12-20              
#>  5                5         2         1516978 2013-05-09              
#>  6                6         2         2905077 2013-01-20              
#>  7                7         3         1125360 2013-05-25              
#>  8                9         3         1516978 2010-07-16              
#>  9               10         3         2905077 2013-01-26              
#> 10               11         4         1125360 2003-12-11              
#> # ℹ more rows
#> # ℹ 4 more variables: drug_exposure_end_date <date>,
#> #   drug_type_concept_id <dbl>, quantity <dbl>, route <chr>

Generating patterns with patternTable() function

The patternTable() function derives the distinct patterns present in the drug strength table and associates each of them with a pattern_id and a formula_name. The resulting tibble provides the following data:

number_concepts: the count of distinct concepts in the patterns.
number_ingredients: the count of distinct ingredients involved.
number_records: the overall count of records in the patterns.

The tibble also includes a validity column indicating whether each combination is potentially valid or invalid.

patternTable(cdm)
#> # A tibble: 5 × 12
#>   pattern_id formula_name            validity number_concepts number_ingredients
#>        <dbl> <chr>                   <chr>              <dbl>              <dbl>
#> 1          9 fixed amount formulati… pattern…               7                  4
#> 2         18 concentration formulat… pattern…               1                  1
#> 3         24 concentration formulat… pattern…               1                  1
#> 4         40 concentration formulat… pattern…               1                  1
#> 5         NA NA                      no patt…               4                  4
#> # ℹ 7 more variables: number_records <dbl>, amount_numeric <dbl>,
#> #   amount_unit_concept_id <dbl>, numerator_numeric <dbl>,
#> #   numerator_unit_concept_id <dbl>, denominator_numeric <dbl>,
#> #   denominator_unit_concept_id <dbl>

For detailed information about the patterns, their associated formulas, and the combinations of amount_unit, numerator_unit, and denominator_unit, see the following data:

patternsWithFormula

Get daily dose with addDailyDose() function

Now that we know the supported patterns and formulas, daily doses can be computed with the addDailyDose() function. As the output below shows, it adds the daily_dose and unit columns to the table.

addDailyDose(
  cdm$drug_exposure,
  cdm = cdm,
  ingredientConceptId = 1125315
)
#> # Source:   table<og_009_1721204216> [?? x 9]
#> # Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1023-azure:R 4.4.1/:memory:]
#>    drug_exposure_id person_id drug_concept_id drug_exposure_start_date
#>               <int>     <int>           <dbl> <date>                  
#>  1                2         1         2905077 2021-10-26              
#>  2                6         2         2905077 2013-01-20              
#>  3                7         3         1125360 2013-05-25              
#>  4               10         3         2905077 2013-01-26              
#>  5               11         4         1125360 2003-12-11              
#>  6               13         4         1125360 1990-09-04              
#>  7               15         4        43135274 1995-04-04              
#>  8               21         6        43135274 2020-05-19              
#>  9               25         7         2905077 2005-11-18              
#> 10               26         7         2905077 2008-10-22              
#> # ℹ more rows
#> # ℹ 5 more variables: drug_exposure_end_date <date>,
#> #   drug_type_concept_id <dbl>, quantity <dbl>, daily_dose <dbl>, unit <chr>
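
As a rough illustration of what a daily dose calculation involves, the sketch below assumes a fixed amount formulation (for example tablets), where the daily dose is taken as quantity times amount per unit divided by the days of exposure. The helper name, the 500 mg strength and the example record are hypothetical; the formula that the package actually applies to each pattern is the one listed in patternsWithFormula.

```r
# Hypothetical helper, not part of DrugUtilisation: daily dose for a
# fixed amount formulation, ASSUMING dose = quantity * amount / days exposed.
dailyDoseFixedAmount <- function(quantity, amount, startDate, endDate) {
  # exposure days are counted inclusively (start and end day both count)
  daysExposed <- as.numeric(as.Date(endDate) - as.Date(startDate)) + 1
  quantity * amount / daysExposed
}

# 30 tablets of 500 mg recorded for a 10-day exposure
dose <- dailyDoseFixedAmount(
  quantity  = 30,
  amount    = 500,
  startDate = "2021-10-26",
  endDate   = "2021-11-04"
)
dose
#> [1] 1500
```

Records whose drug strength pattern is not supported have no formula to apply, which is why their daily dose would be missing.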

There is also a function, summariseDoseCoverage(), to check the coverage of daily dose computation for chosen concept sets and ingredients.

summariseDoseCoverage(cdm, 1125315)
#> ℹ The following estimates will be computed:
#> • daily_dose: count_missing, percentage_missing, mean, sd, q25, median, q75
#> ! Table is collected to memory as not all requested estimates are supported on
#>   the database side
#> → Start summary of data, at 2024-07-17 08:16:56.90615
#> 
#> ✔ Summary finished, at 2024-07-17 08:16:57.176255
#> # A tibble: 56 × 13
#>    result_id cdm_name group_name      group_level   strata_name strata_level
#>        <int> <chr>    <chr>           <chr>         <chr>       <chr>       
#>  1         1 DUS MOCK ingredient_name acetaminophen overall     overall     
#>  2         1 DUS MOCK ingredient_name acetaminophen overall     overall     
#>  3         1 DUS MOCK ingredient_name acetaminophen overall     overall     
#>  4         1 DUS MOCK ingredient_name acetaminophen overall     overall     
#>  5         1 DUS MOCK ingredient_name acetaminophen overall     overall     
#>  6         1 DUS MOCK ingredient_name acetaminophen overall     overall     
#>  7         1 DUS MOCK ingredient_name acetaminophen overall     overall     
#>  8         1 DUS MOCK ingredient_name acetaminophen overall     overall     
#>  9         1 DUS MOCK ingredient_name acetaminophen unit        milligram   
#> 10         1 DUS MOCK ingredient_name acetaminophen unit        milligram   
#> # ℹ 46 more rows
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> #   estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> #   additional_name <chr>, additional_level <chr>

Adding Drug Usage Details to a Cohort with addDrugUse() function

Additional drug usage details, including duration, initial dose, cumulative dose, etc., can be incorporated into a cohort using the addDrugUse() function.

cdm$acetaminophen_example1 |>
  addDrugUse(ingredientConceptId = 1125315)
#> # Source:   table<og_016_1721204227> [?? x 13]
#> # Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1023-azure:R 4.4.1/:memory:]
#>    cohort_definition_id subject_id cohort_start_date cohort_end_date duration
#>                   <int>      <int> <date>            <date>             <dbl>
#>  1                    1          3 2013-05-25        2015-11-12           902
#>  2                    1         73 2021-04-16        2021-04-30            15
#>  3                    1          9 1989-02-04        1992-11-02          1368
#>  4                    1         52 2015-12-13        2016-01-04            23
#>  5                    1        183 2011-05-07        2013-07-10           796
#>  6                    1         19 2020-12-22        2021-11-11           325
#>  7                    1          4 2013-04-21        2015-04-01           711
#>  8                    1         26 1991-01-10        1991-05-19           130
#>  9                    1         30 2007-04-18        2007-04-24             7
#> 10                    1        184 2011-06-09        2012-05-08           335
#> # ℹ more rows
#> # ℹ 8 more variables: number_exposures <dbl>, cumulative_quantity <dbl>,
#> #   initial_quantity <dbl>, impute_duration_percentage <dbl>,
#> #   number_eras <dbl>, impute_daily_dose_percentage <dbl>,
#> #   initial_daily_dose_milligram <dbl>, cumulative_dose_milligram <dbl>

duration parameter

The duration parameter is a boolean variable (TRUE/FALSE) determining whether to include duration related columns, which correspond to:

duration: duration is calculated as cohort_end_date - cohort_start_date + 1.
impute_duration_percentage: if a drug exposure record has no duration, or its duration falls outside the specified duration range, the duration is imputed. This column records the percentage of records that have been imputed, or that would have been imputed if we chose not to impute the duration.

To set the imputation method for duration, use the imputeDuration parameter, which can take values such as none (default), median, mode, or a numeric value. Define the durationRange parameter as a numeric vector of length two, where the first value must be less than or equal to the second. If set to NULL, no restrictions are applied.
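
The interplay between the duration formula, durationRange and median imputation can be sketched in base R. This is a conceptual illustration under simplifying assumptions, not the package's implementation: the four records, the range of 1 to 365 days, and the choice to take the median over the in-range values only are all hypothetical.

```r
# Conceptual sketch only, not the DrugUtilisation implementation.
# Four hypothetical exposure records; the last one has no end date.
start <- as.Date(c("2013-05-25", "2021-04-16", "2020-01-01", "2020-03-01"))
end   <- as.Date(c("2015-11-12", "2021-04-30", "2020-01-30", NA))

# duration = end - start + 1 (start and end day both count)
duration <- as.numeric(end - start) + 1
duration
#> [1] 902  15  30  NA

# Hypothetical durationRange: values outside it are handled like missing ones
durationRange <- c(1, 365)
toImpute <- is.na(duration) |
  duration < durationRange[1] | duration > durationRange[2]

# Percentage of records flagged for imputation
imputePercentage <- 100 * mean(toImpute)
imputePercentage
#> [1] 50

# "median" imputation: flagged values are replaced by the median of the rest
imputed <- duration
imputed[toImpute] <- median(duration[!toImpute])
imputed
#> [1] 22.5 15.0 30.0 22.5
```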

quantity parameter

The quantity parameter, another boolean variable (TRUE/FALSE), controls the inclusion of quantity-related columns. If set to TRUE (default), the following columns are added:

cumulative_quantity: cumulative sum of the column quantity of the drug_exposure table during the drug exposure period.
initial_quantity: quantity at drug_exposure_start_date.

dose parameter

The dose parameter, also a boolean variable (TRUE/FALSE), governs the addition of daily dose-related columns. When set to TRUE, the following columns are added:

initial_daily_dose_milligram: dose at drug_exposure_start_date.
cumulative_dose_milligram: cumulative sum of the column dose of drug_exposure table during the drug exposure period.
impute_daily_dose_percentage: if the daily dose is missing, or falls outside the specified daily dose range, it is imputed. This column shows the percentage of records that have been imputed, or that would have been imputed if we chose not to impute the daily dose.

As with duration imputation, use the imputeDose parameter to set the method for imputing the daily dose, with options such as none (default), median, mean, or mode. Define the accepted range with the dailyDoseRange parameter, a numeric vector of length two, where the first value must be less than or equal to the second. If set to NULL, no restrictions are applied.

These parameters offer flexibility in customizing the drug usage details added to the cohort. The next example uses acetaminophen_example1, the cohort created at the beginning of this vignette.

addDrugUse(
  cohort = cdm[["acetaminophen_example1"]],
  cdm    = cdm,
  ingredientConceptId = 1125315,
  duration = TRUE,
  quantity = TRUE,
  dose     = TRUE
)
#> # Source:   table<og_021_1721204238> [?? x 13]
#> # Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1023-azure:R 4.4.1/:memory:]
#>    cohort_definition_id subject_id cohort_start_date cohort_end_date duration
#>                   <int>      <int> <date>            <date>             <dbl>
#>  1                    1         99 1990-02-04        1990-02-07             4
#>  2                    1        164 2022-11-12        2022-11-15             4
#>  3                    1        103 2004-03-29        2008-10-09          1656
#>  4                    1          7 2005-11-18        2008-07-09           965
#>  5                    1        197 2021-02-21        2021-02-22             2
#>  6                    1        107 2019-12-11        2021-06-02           540
#>  7                    1         69 2020-01-12        2020-10-26           289
#>  8                    1        114 2022-06-24        2022-07-15            22
#>  9                    1        160 2008-06-04        2015-11-19          2725
#> 10                    1         87 2009-03-11        2011-11-20           985
#> # ℹ more rows
#> # ℹ 8 more variables: number_exposures <dbl>, cumulative_quantity <dbl>,
#> #   initial_quantity <dbl>, impute_duration_percentage <dbl>,
#> #   number_eras <dbl>, impute_daily_dose_percentage <dbl>,
#> #   initial_daily_dose_milligram <dbl>, cumulative_dose_milligram <dbl>

If all these parameters are set to FALSE, only number_exposures and number_eras will be added.

Parameters for Joining Exposures

The way continuous exposures are joined can be configured with several parameters. Let's look at the available options.

gapEra parameter

This parameter sets the maximum number of days between two continuous exposures for them to be considered part of the same era. If the next exposure's start date minus the previous exposure's end date is less than or equal to the specified gapEra, the two exposures are joined. Let's see an illustrative example.

First, let’s create a cohort with gapEra = 0. For a better understanding, we will observe only subject number 56.

cdm <- generateDrugUtilisationCohortSet(
  cdm = cdm,
  name = "acetaminophen_example2",
  conceptSet = conceptList,
  gapEra = 0
)
 
cdm$drug_exposure |>
  # the codelist element is named "161_acetaminophen" (see conceptList above),
  # so conceptList$acetaminophen would be NULL and match no rows
  filter(drug_concept_id %in% !!conceptList[["161_acetaminophen"]]) |>
  filter(person_id == 56)

This subject has two different drug exposure periods separated by less than 6 months. Because the cohort was created with gapEra = 0, they are kept as two separate cohort entries:

cdm[["acetaminophen_example2"]] |>
  addDrugUse(
    ingredientConceptId = 1125315,
    gapEra = 0
  ) |>
  filter(subject_id == 56)
#> # Source:   SQL [2 x 13]
#> # Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1023-azure:R 4.4.1/:memory:]
#>   cohort_definition_id subject_id cohort_start_date cohort_end_date duration
#>                  <int>      <int> <date>            <date>             <dbl>
#> 1                    1         56 2021-08-08        2021-09-17            41
#> 2                    1         56 2022-01-27        2022-03-04            37
#> # ℹ 8 more variables: number_exposures <dbl>, cumulative_quantity <dbl>,
#> #   initial_quantity <dbl>, impute_duration_percentage <dbl>,
#> #   number_eras <dbl>, impute_daily_dose_percentage <dbl>,
#> #   initial_daily_dose_milligram <dbl>, cumulative_dose_milligram <dbl>

Now, we merge these two periods by modifying the gapEra input when creating the cohort. For more detail on the gapEra argument, please see the "Use DrugUtilisation to create a cohort" vignette.

cdm <- generateDrugUtilisationCohortSet(
  cdm = cdm,
  name = "acetaminophen_example3",
  conceptSet = conceptList,
  gapEra = 180
)

cdm$acetaminophen_example3 |>
  addDrugUse(
    ingredientConceptId = 1125315,
    gapEra = 180,
    duration = TRUE,
    quantity = FALSE,
    dose = FALSE
  ) |>
  filter(subject_id == 56) 
#> # Source:   SQL [1 x 8]
#> # Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1023-azure:R 4.4.1/:memory:]
#>   cohort_definition_id subject_id cohort_start_date cohort_end_date duration
#>                  <int>      <int> <date>            <date>             <dbl>
#> 1                    1         56 2021-08-08        2022-03-04           209
#> # ℹ 3 more variables: number_exposures <dbl>, impute_duration_percentage <dbl>,
#> #   number_eras <dbl>

We now have a single record, containing two exposures, for subject number 56. Note that the number of eras is still 1, because addDrugUse() was called with the same gapEra that was used to create the cohort. However, it is possible to specify a different gapEra from the one defined when the cohort was created.

cdm$acetaminophen_example3 |>
  addDrugUse(
    ingredientConceptId = 1125315,
    gapEra = 0
  ) |>
  filter(subject_id == 56) 
#> # Source:   SQL [1 x 13]
#> # Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1023-azure:R 4.4.1/:memory:]
#>   cohort_definition_id subject_id cohort_start_date cohort_end_date duration
#>                  <int>      <int> <date>            <date>             <dbl>
#> 1                    1         56 2021-08-08        2022-03-04           209
#> # ℹ 8 more variables: number_exposures <dbl>, cumulative_quantity <dbl>,
#> #   initial_quantity <dbl>, impute_duration_percentage <dbl>,
#> #   number_eras <dbl>, impute_daily_dose_percentage <dbl>,
#> #   initial_daily_dose_milligram <dbl>, cumulative_dose_milligram <dbl>

Notice that number_eras now indicates that we have two eras within the same record.
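
The effect of gapEra on the two periods shown above for subject 56 can be reproduced with plain date arithmetic. The helper below is a hypothetical sketch of the joining rule as described in this section (a new era starts when the gap between consecutive exposures exceeds gapEra); the package's exact day counting at the boundary may differ.

```r
# Conceptual sketch only: count eras for one subject, given exposure
# periods sorted by start date that do not overlap, and a gapEra in days.
countEras <- function(start, end, gapEra) {
  start <- as.Date(start)
  end   <- as.Date(end)
  # days between the end of one exposure and the start of the next
  gaps <- as.numeric(start[-1] - end[-length(end)])
  # a new era begins whenever the gap exceeds gapEra
  1 + sum(gaps > gapEra)
}

# The two periods of subject 56 (132 days between them)
start <- c("2021-08-08", "2022-01-27")
end   <- c("2021-09-17", "2022-03-04")

countEras(start, end, gapEra = 0)
#> [1] 2
countEras(start, end, gapEra = 180)
#> [1] 1
```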

eraJoinMode parameter

This parameter defines how two different continuous exposures are joined in an era. There are four options:

eraJoinMode = "zero" (default option): Exposures are joined considering that the period between both continuous exposures means the subject is treated with a daily dose of zero. The time between both exposures contributes to the total exposed time.
eraJoinMode = "join": Exposures are joined, considering that the period between both continuous exposures means the subject is treated with a daily dose of zero. The time between both exposures does not contribute to the total exposed time.
eraJoinMode = "previous": Exposures are joined, considering that the period between both continuous exposures means the subject is treated with the daily dose of the previous subexposure. The time between both exposures contributes to the total exposed time.
eraJoinMode = "subsequent": Exposures are joined, considering that the period between both continuous exposures means the subject is treated with the daily dose of the subsequent subexposure. The time between both exposures contributes to the total exposed time.
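
To make the difference between the four options concrete, the sketch below applies each of them to two hypothetical exposures (10 days at 500 mg/day, then 10 days at 1000 mg/day) separated by a 5-day gap. The function and its default values are invented for illustration and only mirror the descriptions above; they are not the package's implementation.

```r
# Conceptual sketch only: exposed time and cumulative dose of an era made
# of two hypothetical exposures separated by a gap.
joinEra <- function(eraJoinMode, days1 = 10, dose1 = 500,
                    gapDays = 5, days2 = 10, dose2 = 1000) {
  # daily dose assigned to the gap under each mode
  gapDose <- switch(
    eraJoinMode,
    zero       = 0,
    join       = 0,
    previous   = dose1,
    subsequent = dose2,
    stop("unknown eraJoinMode: ", eraJoinMode)
  )
  # with "join" the gap does not count as exposed time
  gapCounted <- if (eraJoinMode == "join") 0 else gapDays
  c(
    exposed_days    = days1 + gapCounted + days2,
    cumulative_dose = days1 * dose1 + gapDays * gapDose + days2 * dose2
  )
}

# One column per mode, one row per quantity
result <- sapply(c("zero", "join", "previous", "subsequent"), joinEra)
result
```

In this example "zero" and "join" give the same cumulative dose (15000 mg) but differ in exposed time (25 versus 20 days), while "previous" and "subsequent" keep 25 exposed days and raise the cumulative dose to 17500 mg and 20000 mg respectively.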

overlapMode parameter

This parameter defines how the overlapping between two exposures that do not start on the same day is resolved inside a subexposure. There are five possible options:

overlapMode = "sum" (default option): The considered daily dose is the sum of all the exposures present in the subexposure.
overlapMode = "minimum": The considered daily dose is the minimum of all the exposures in the subexposure.
overlapMode = "maximum": The considered daily dose is the maximum of all the exposures in the subexposure.
overlapMode = "previous": The considered daily dose is that of the earliest exposure.
overlapMode = "subsequent": The considered daily dose is that of the latest exposure.
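
The options above can be read as different ways of reducing several overlapping daily doses to a single value. The following base R sketch is a hypothetical illustration of that reading, with invented doses; it is not the package's code.

```r
# Conceptual sketch only: daily dose of a subexposure in which several
# exposures overlap. `doses` must be ordered by exposure start date.
overlapDose <- function(doses, overlapMode = "sum") {
  switch(
    overlapMode,
    sum        = sum(doses),
    minimum    = min(doses),
    maximum    = max(doses),
    previous   = doses[1],              # earliest exposure
    subsequent = doses[length(doses)],  # latest exposure
    stop("unknown overlapMode: ", overlapMode)
  )
}

# Hypothetical overlap: 1000 mg/day (started first), 500 mg/day (started later)
doses <- c(1000, 500)
overlapDose(doses, "sum")        # 1500
overlapDose(doses, "minimum")    # 500
overlapDose(doses, "maximum")    # 1000
overlapDose(doses, "previous")   # 1000
overlapDose(doses, "subsequent") # 500
```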

sameIndexMode parameter

This parameter works similarly to overlapMode, but it defines how the overlap between two exposures starting on the same date is resolved. It accepts the options sum (default), minimum, and maximum described in overlapMode.

For example, the following code sets a maximum gap of 30 days for exposures to be joined. It fills the gap between joined exposures with the daily dose of the previous subexposure, and takes the minimum daily dose both for exposures starting on the same day and for exposures that overlap.

cdm[["acetaminophen_example1"]] |>
  addDrugUse(ingredientConceptId = 1125315,
             gapEra = 30,
             eraJoinMode = "previous",
             overlapMode = "minimum",
             sameIndexMode = "minimum")
#> # Source:   table<og_040_1721204283> [?? x 13]
#> # Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1023-azure:R 4.4.1/:memory:]
#>    cohort_definition_id subject_id cohort_start_date cohort_end_date duration
#>                   <int>      <int> <date>            <date>             <dbl>
#>  1                    1          6 2020-05-19        2020-11-23           189
#>  2                    1         65 2017-03-11        2021-11-06          1702
#>  3                    1          4 1990-09-04        2008-05-24          6473
#>  4                    1         33 2018-09-30        2018-12-23            85
#>  5                    1         72 2007-09-12        2007-10-12            31
#>  6                    1         42 1998-05-15        1998-12-22           222
#>  7                    1        119 2006-04-03        2017-07-18          4125
#>  8                    1        114 2022-10-17        2022-11-13            28
#>  9                    1        184 2009-04-22        2010-03-13           326
#> 10                    1         49 2021-06-30        2021-11-11           135
#> # ℹ more rows
#> # ℹ 8 more variables: number_exposures <dbl>, cumulative_quantity <dbl>,
#> #   initial_quantity <dbl>, impute_duration_percentage <dbl>,
#> #   number_eras <dbl>, impute_daily_dose_percentage <dbl>,
#> #   initial_daily_dose_milligram <dbl>, cumulative_dose_milligram <dbl>

Summarise drug usage information with summariseDrugUse() function

This function creates a tibble summarising the drug use columns across multiple cohorts. See an example below:

cdm[["acetaminophen_example1"]] <- cdm[["acetaminophen_example1"]] |> 
  addDrugUse(
    cdm = cdm,
    ingredientConceptId = 1125315
  )

summariseDrugUse(cdm[["acetaminophen_example1"]])
#> # A tibble: 101 × 13
#>    result_id cdm_name group_name  group_level       strata_name strata_level
#>        <int> <chr>    <chr>       <chr>             <chr>       <chr>       
#>  1         1 DUS MOCK cohort_name 161_acetaminophen overall     overall     
#>  2         1 DUS MOCK cohort_name 161_acetaminophen overall     overall     
#>  3         1 DUS MOCK cohort_name 161_acetaminophen overall     overall     
#>  4         1 DUS MOCK cohort_name 161_acetaminophen overall     overall     
#>  5         1 DUS MOCK cohort_name 161_acetaminophen overall     overall     
#>  6         1 DUS MOCK cohort_name 161_acetaminophen overall     overall     
#>  7         1 DUS MOCK cohort_name 161_acetaminophen overall     overall     
#>  8         1 DUS MOCK cohort_name 161_acetaminophen overall     overall     
#>  9         1 DUS MOCK cohort_name 161_acetaminophen overall     overall     
#> 10         1 DUS MOCK cohort_name 161_acetaminophen overall     overall     
#> # ℹ 91 more rows
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> #   estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> #   additional_name <chr>, additional_level <chr>

strata parameter

We can also stratify our cohort and calculate the estimates within each stratum by using the strata parameter.

cdm[["acetaminophen_example1"]] <- cdm[["acetaminophen_example1"]] |>
  addSex() # Function from PatientProfiles

summariseDrugUse(cdm[["acetaminophen_example1"]],
                 strata = list("sex" = "sex")) 
#> # A tibble: 303 × 13
#>    result_id cdm_name group_name  group_level       strata_name strata_level
#>        <int> <chr>    <chr>       <chr>             <chr>       <chr>       
#>  1         1 DUS MOCK cohort_name 161_acetaminophen overall     overall     
#>  2         1 DUS MOCK cohort_name 161_acetaminophen overall     overall     
#>  3         1 DUS MOCK cohort_name 161_acetaminophen overall     overall     
#>  4         1 DUS MOCK cohort_name 161_acetaminophen overall     overall     
#>  5         1 DUS MOCK cohort_name 161_acetaminophen overall     overall     
#>  6         1 DUS MOCK cohort_name 161_acetaminophen overall     overall     
#>  7         1 DUS MOCK cohort_name 161_acetaminophen overall     overall     
#>  8         1 DUS MOCK cohort_name 161_acetaminophen overall     overall     
#>  9         1 DUS MOCK cohort_name 161_acetaminophen overall     overall     
#> 10         1 DUS MOCK cohort_name 161_acetaminophen overall     overall     
#> # ℹ 293 more rows
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> #   estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> #   additional_name <chr>, additional_level <chr>

drugEstimates parameter

Customize the estimates to be calculated by using the drugEstimates parameter. By default, it will compute the minimum value, quantiles (5%, 25%, 50% - median, 75% and 95%), the maximum value, the mean, the standard deviation, and the number of missing values for each column added with addDrugUse().
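
For orientation, the default estimates listed above correspond to the following base R computations on a single column. The vector of durations is hypothetical, and the element names are those produced by base R rather than the estimate names used in the package output.

```r
# Conceptual sketch only: the estimates listed above, computed in base R
# for a hypothetical vector of durations with one missing value.
duration <- c(7, 15, 23, 130, 325, NA)

estimates <- c(
  min           = min(duration, na.rm = TRUE),
  quantile(duration, probs = c(0.05, 0.25, 0.5, 0.75, 0.95), na.rm = TRUE),
  max           = max(duration, na.rm = TRUE),
  mean          = mean(duration, na.rm = TRUE),
  sd            = sd(duration, na.rm = TRUE),
  count_missing = sum(is.na(duration))
)
round(estimates, 1)
```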

minCellCount parameter

Specify the minimum number of individuals that a strata group must have in order to appear in the table.

Marti Catala, Mike Du, Yuchen Guo, Kim Lopez-Guell, Edward Burn, Xintong Li

2024-07-17