Characterise a cohort using Drug Utilisation
Marti Catala, Mike Du, Yuchen Guo, Kim Lopez-Guell, Edward Burn, Xintong Li, Xihang Chen
The DrugUtilisation package also provides functions for working with the daily dose of drugs. The best way to learn these functions is to work through a few examples.
Create mock data first
library(DrugUtilisation)
cdm <- mockDrugUtilisation(numberIndividual = 200)
Add daily dose information via the addDailyDose() function
The following example calculates the daily dose of
acetaminophen for records in the drug_exposure table.
cdm$drug_exposure |>
  dplyr::filter(drug_concept_id == 2905077) |>
  addDailyDose(ingredientConceptId = 1125315) |>
  dplyr::glimpse()
#> Rows: ??
#> Columns: 9
#> Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1023-azure:R 4.4.1/:memory:]
#> $ drug_exposure_id <int> 2, 6, 10, 16, 25, 26, 42, 46, 62, 64, 76, 90,…
#> $ person_id <int> 1, 2, 3, 4, 7, 7, 17, 18, 22, 22, 25, 33, 36,…
#> $ drug_concept_id <dbl> 2905077, 2905077, 2905077, 2905077, 2905077, …
#> $ drug_exposure_start_date <date> 2021-10-26, 2013-01-20, 2013-01-26, 2013-04-…
#> $ drug_exposure_end_date <date> 2022-01-22, 2013-06-16, 2013-05-09, 2015-04-…
#> $ drug_type_concept_id <dbl> 38000177, 38000177, 38000177, 38000177, 38000…
#> $ quantity <dbl> 90, 50, 15, 100, 100, 60, 15, 40, 50, 20, 30,…
#> $ daily_dose <dbl> 9.707865e+03, 3.243243e+03, 1.384615e+03, 1.3…
#> $ unit <chr> "milligram", "milligram", "milligram", "milli…
The drug_exposure table was first filtered to the records whose
drug_concept_id is 2905077. This subsets the drug_exposure table to the
use of acetaminophen in the marketed product form. The ID 1125315 is then
supplied to the addDailyDose() function as ingredientConceptId, which
corresponds to the ingredient form of acetaminophen.
We see that there are two extra columns at the end detailing the
daily dose of acetaminophen for each drug exposure of interest:
daily_dose and unit. Taking the first entry as an example, the average
daily dose of acetaminophen in that drug exposure was about 9,708
milligram. The daily dose is calculated by taking into account both the
strength of the drug and the duration of the exposure.
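As a rough consistency check on the output above, the first three rows can be reproduced by hand. The sketch below assumes that, for this product, the daily dose equals quantity times strength per unit divided by the number of exposure days, with both the start and end dates counted. That formula is an assumption inferred from the printed numbers, not taken from the package documentation, and the variable names are illustrative.

```r
# First three rows of the addDailyDose() output shown above
exposures <- data.frame(
  start      = as.Date(c("2021-10-26", "2013-01-20", "2013-01-26")),
  end        = as.Date(c("2022-01-22", "2013-06-16", "2013-05-09")),
  quantity   = c(90, 50, 15),
  daily_dose = c(9707.865, 3243.243, 1384.615)  # milligram, as printed
)

# Exposure length in days, counting both start and end dates (assumption)
exposures$days <- as.integer(exposures$end - exposures$start) + 1L

# Strength per unit implied by: daily_dose = quantity * strength / days
exposures$implied_strength <-
  exposures$daily_dose * exposures$days / exposures$quantity

print(exposures[, c("quantity", "days", "daily_dose", "implied_strength")])
```

All three rows imply roughly the same strength per unit (about 9,600 milligram), which is what one would expect if the assumed formula holds for records of a single drug concept.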
Add route information via the addRoute() function
Next, the user can also obtain the route of each
drug recorded in the drug_exposure table.
cdm$drug_exposure |>
  addRoute() |>
  dplyr::glimpse()
#> Rows: ??
#> Columns: 8
#> Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1023-azure:R 4.4.1/:memory:]
#> $ drug_exposure_id <int> 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 13, 15, 16, 1…
#> $ person_id <int> 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 6, …
#> $ drug_concept_id <dbl> 1503328, 2905077, 1539463, 1516980, 1516978, …
#> $ drug_exposure_start_date <date> 2022-01-11, 2021-10-26, 2021-08-10, 2021-12-…
#> $ drug_exposure_end_date <date> 2022-01-26, 2022-01-22, 2021-11-22, 2022-01-…
#> $ drug_type_concept_id <dbl> 38000177, 38000177, 38000177, 38000177, 38000…
#> $ quantity <dbl> 80, 90, 100, 35, 1, 50, 1, 100, 15, 45, 1, 80…
#> $ route <chr> "topical", "topical", "topical", "topical", "…
Similarly, this adds one extra column named route. For
example, the drug used in the first entry of the drug_exposure
table was a topical drug.
Finding out the pattern information using the patternTable() function
The user can also find the patterns used in the drug_strength
table. The output also includes a column, validity, that flags
potentially valid and invalid combinations. The idea of a pattern is to
provide a way to associate each drug in the drug_exposure
table with its constituent ingredients.
patternTable(cdm) |>
  dplyr::glimpse()
#> Rows: 5
#> Columns: 12
#> $ pattern_id <dbl> 9, 18, 24, 40, NA
#> $ formula_name <chr> "fixed amount formulation", "concentration…
#> $ validity <chr> "pattern with formula", "pattern with form…
#> $ number_concepts <dbl> 7, 1, 1, 1, 4
#> $ number_ingredients <dbl> 4, 1, 1, 1, 4
#> $ number_records <dbl> 350, 67, 63, 74, 61
#> $ amount_numeric <dbl> 1, 0, 0, 0, NA
#> $ amount_unit_concept_id <dbl> 8576, NA, NA, NA, NA
#> $ numerator_numeric <dbl> 0, 1, 1, 1, NA
#> $ numerator_unit_concept_id <dbl> NA, 8576, 8510, 8510, NA
#> $ denominator_numeric <dbl> 0, 1, 1, 0, NA
#> $ denominator_unit_concept_id <dbl> NA, 8587, 8587, 8587, NA
The output has three important columns, namely number_concepts,
number_ingredients and number_records, which correspond to the count of
distinct concepts in the patterns, the count of distinct ingredients
involved and the overall count of records in the patterns, respectively.
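To see how the records are distributed across patterns, the counts can be turned into percentages. This is a minimal sketch that re-types the pattern_id and number_records values printed above into a small data frame; it assumes the per-pattern record counts do not overlap, so that their sum is a meaningful total.

```r
# Values copied from the patternTable() output shown above
patterns <- data.frame(
  pattern_id     = c(9, 18, 24, 40, NA),
  number_records = c(350, 67, 63, 74, 61)
)

# Percentage of all listed records that falls under each pattern
patterns$pct_records <-
  round(100 * patterns$number_records / sum(patterns$number_records), 1)

# Rows whose pattern_id is NA are not linked to any pattern
no_pattern <- patterns[is.na(patterns$pattern_id), ]

print(patterns)
print(no_pattern$pct_records)
```

A large share of records in the NA row would suggest checking those concepts before relying on the computed daily doses.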
We also see that there is a column named pattern_id
together with information such as number_concepts. The idea is to use
this output in conjunction with the data named patternsWithFormula.
Please see the patternsWithFormula data for the different patterns, their
associated formulas and the combinations of amount_unit, numerator_unit
and denominator_unit.
patternsWithFormula |>
  dplyr::glimpse()
#> Rows: 41
#> Columns: 9
#> $ pattern_id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16…
#> $ amount <chr> NA, NA, NA, NA, NA, "number", "number", "number", "nu…
#> $ amount_unit <chr> NA, NA, NA, NA, NA, "international unit", "microgram"…
#> $ numerator <chr> "number", "number", "number", "number", "number", NA,…
#> $ numerator_unit <chr> "microgram", "milligram", "unit", "microgram", "milli…
#> $ denominator <chr> "number", "number", "number", NA, NA, NA, NA, NA, NA,…
#> $ denominator_unit <chr> "hour", "hour", "hour", "hour", "hour", NA, NA, NA, N…
#> $ formula_name <chr> "time based with denominator", "time based with denom…
#> $ formula <chr> "if (denominator>24) {numerator * 24 / denominator} e…
Finding out the dose coverage using the summariseDoseCoverage() function
This package also provides a function to check the coverage of daily dose computation for chosen concept sets and ingredients. Again let’s take acetaminophen as an example.
summariseDoseCoverage(cdm = cdm, ingredientConceptId = 1125315) |>
  dplyr::glimpse()
#> Rows: 56
#> Columns: 13
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name <chr> "DUS MOCK", "DUS MOCK", "DUS MOCK", "DUS MOCK", "DUS …
#> $ group_name <chr> "ingredient_name", "ingredient_name", "ingredient_nam…
#> $ group_level <chr> "acetaminophen", "acetaminophen", "acetaminophen", "a…
#> $ strata_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name <chr> "number records", "Missing dose", "Missing dose", "da…
#> $ variable_level <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ estimate_name <chr> "count", "count_missing", "percentage_missing", "mean…
#> $ estimate_type <chr> "integer", "integer", "percentage", "numeric", "numer…
#> $ estimate_value <chr> "174", "0", "0", "4986.89847658171", "22201.148944306…
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
The output summarises the usage of acetaminophen in the database.
For example, overall there were 174 records of acetaminophen use, and
none of them was missing the duration or strength information needed to
compute a daily dose. By default the output also includes the mean,
median, lower and upper quartiles and standard deviation of the daily
dose of acetaminophen (calculated using addDailyDose()). The results are
also stratified by unit, route (which we saw in the addRoute() function)
and pattern (which we saw in the patternTable() function).
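Note that estimate_value is stored as character in the output above, with the intended type recorded in estimate_type. The following is a minimal sketch of converting the values before further computation. It uses only the first three rows that are fully visible in the glimpse, and the data frame `res` is built by hand for illustration rather than returned by the package.

```r
# First three rows of the summarised result, re-typed from the glimpse above
res <- data.frame(
  variable_name  = c("number records", "Missing dose", "Missing dose"),
  estimate_name  = c("count", "count_missing", "percentage_missing"),
  estimate_type  = c("integer", "integer", "percentage"),
  estimate_value = c("174", "0", "0"),
  stringsAsFactors = FALSE
)

# Convert the character estimates to numbers so they can be compared or plotted
res$value_num <- as.numeric(res$estimate_value)

# Number of acetaminophen records and how many lack a daily dose
n_records <- res$value_num[res$estimate_name == "count"]
n_missing <- res$value_num[res$estimate_name == "count_missing"]
cat("records:", n_records, "- missing dose:", n_missing, "\n")
```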
One may wish to display the output of summariseDoseCoverage()
in a gt table like so:
summariseDoseCoverage(cdm = cdm, ingredientConceptId = 1125315) |>
  tableDoseCoverage()
Database name | Ingredient name | Unit | Route | Pattern id | Number records, N | Missing dose, N (%) | Daily dose, Mean (SD) | Daily dose, Median (Q25 - Q75)
DUS MOCK | Acetaminophen | Overall | Overall | Overall | 174 | 0 (0.00 %) | 4,986.90 (22,201.15) | 131.07 (18.45 - 936.23)
DUS MOCK | Acetaminophen | Milligram | Overall | Overall | 174 | 0 (0.00 %) | 4,986.90 (22,201.15) | 131.07 (18.45 - 936.23)
DUS MOCK | Acetaminophen | Milligram | Oral | Overall | 49 | 0 (0.00 %) | 328.91 (956.18) | 26.85 (7.81 - 188.98)
DUS MOCK | Acetaminophen | Milligram | Topical | Overall | 125 | 0 (0.00 %) | 6,812.83 (25,987.96) | 261.78 (39.68 - 1,384.62)
DUS MOCK | Acetaminophen | Milligram | Oral | 9 | 49 | 0 (0.00 %) | 328.91 (956.18) | 26.85 (7.81 - 188.98)
DUS MOCK | Acetaminophen | Milligram | Topical | 18 | 67 | 0 (0.00 %) | 11,277.48 (34,285.86) | 970.92 (317.48 - 4,882.05)
DUS MOCK | Acetaminophen | Milligram | Topical | 9 | 58 | 0 (0.00 %) | 1,655.39 (7,590.12) | 35.55 (12.06 - 167.58)
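The stratified rows can be cross-checked against the rows above them. The sketch below uses the counts and means printed in the table (typed in by hand, so subject to rounding) to confirm that the strata partition the records and that the record-weighted mean of the strata reproduces the parent mean.

```r
# Counts and mean daily doses by route, copied from the table above
route <- data.frame(
  route = c("Oral", "Topical"),
  n     = c(49, 125),
  mean  = c(328.91, 6812.83)
)

# Topical records split by pattern id
topical <- data.frame(
  pattern_id = c(18, 9),
  n          = c(67, 58),
  mean       = c(11277.48, 1655.39)
)

# Strata should add up to the level above them
total_records   <- sum(route$n)
topical_records <- sum(topical$n)

# Record-weighted means should match the printed parent means up to rounding
overall_mean <- weighted.mean(route$mean, route$n)
topical_mean <- weighted.mean(topical$mean, topical$n)

print(c(total_records = total_records, topical_records = topical_records))
print(round(c(overall_mean = overall_mean, topical_mean = topical_mean), 2))
```

The same check does not apply to medians or quartiles, which cannot be recombined from stratum-level summaries.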
The user is also free to customise the gt table output. For
example, the following will suppress the cdmName column:
summariseDoseCoverage(cdm = cdm, ingredientConceptId = 1125315) |>
  tableDoseCoverage(cdmName = FALSE)
Ingredient name | Unit | Route | Pattern id | Number records, N | Missing dose, N (%) | Daily dose, Mean (SD) | Daily dose, Median (Q25 - Q75)
Acetaminophen | Overall | Overall | Overall | 174 | 0 (0.00 %) | 4,986.90 (22,201.15) | 131.07 (18.45 - 936.23)
Acetaminophen | Milligram | Overall | Overall | 174 | 0 (0.00 %) | 4,986.90 (22,201.15) | 131.07 (18.45 - 936.23)
Acetaminophen | Milligram | Oral | Overall | 49 | 0 (0.00 %) | 328.91 (956.18) | 26.85 (7.81 - 188.98)
Acetaminophen | Milligram | Topical | Overall | 125 | 0 (0.00 %) | 6,812.83 (25,987.96) | 261.78 (39.68 - 1,384.62)
Acetaminophen | Milligram | Oral | 9 | 49 | 0 (0.00 %) | 328.91 (956.18) | 26.85 (7.81 - 188.98)
Acetaminophen | Milligram | Topical | 18 | 67 | 0 (0.00 %) | 11,277.48 (34,285.86) | 970.92 (317.48 - 4,882.05)
Acetaminophen | Milligram | Topical | 9 | 58 | 0 (0.00 %) | 1,655.39 (7,590.12) | 35.55 (12.06 - 167.58)
Additionally, if the user wants to specify a title for the gt output, they could try the following:
summariseDoseCoverage(cdm = cdm, ingredientConceptId = 1125315) |>
tableDoseCoverage(.options = list(title = "Title of summariseDoseCoverage"))
Title of summariseDoseCoverage
Database name | Ingredient name | Unit | Route | Pattern id | Number records, N | Missing dose, N (%) | Daily dose, Mean (SD) | Daily dose, Median (Q25 - Q75)
DUS MOCK | Acetaminophen | Overall | Overall | Overall | 174 | 0 (0.00 %) | 4,986.90 (22,201.15) | 131.07 (18.45 - 936.23)
DUS MOCK | Acetaminophen | Milligram | Overall | Overall | 174 | 0 (0.00 %) | 4,986.90 (22,201.15) | 131.07 (18.45 - 936.23)
DUS MOCK | Acetaminophen | Milligram | Oral | Overall | 49 | 0 (0.00 %) | 328.91 (956.18) | 26.85 (7.81 - 188.98)
DUS MOCK | Acetaminophen | Milligram | Topical | Overall | 125 | 0 (0.00 %) | 6,812.83 (25,987.96) | 261.78 (39.68 - 1,384.62)
DUS MOCK | Acetaminophen | Milligram | Oral | 9 | 49 | 0 (0.00 %) | 328.91 (956.18) | 26.85 (7.81 - 188.98)
DUS MOCK | Acetaminophen | Milligram | Topical | 18 | 67 | 0 (0.00 %) | 11,277.48 (34,285.86) | 970.92 (317.48 - 4,882.05)
DUS MOCK | Acetaminophen | Milligram | Topical | 9 | 58 | 0 (0.00 %) | 1,655.39 (7,590.12) | 35.55 (12.06 - 167.58)