Get patients' characteristics
Martí Català, Mike Du, Yuchen Guo, Kim López-Güell, Xintong Li, Núria Mercadé-Besora, and Edward Burn
2024-03-17
Source:vignettes/addPatientCharacteristics.rmd
Introduction
In this vignette we show different functions to get characteristics (e.g. age, sex, prior observation…) of subjects in OMOP CDM tables and cohort tables. These functions are useful for exploratory analysis and as building blocks for more complex analyses.
The PatientProfiles package is designed to work with data in the OMOP CDM format, so our first step is to create a reference to the data using the DBI and CDMConnector packages. The connection to a Postgres database would look like:
library(DBI)
library(CDMConnector)
# The input arguments provided are for illustrative purposes only and do not provide access to any database.
con <- DBI::dbConnect(RPostgres::Postgres(),
  dbname = "omop_cdm",
  host = "10.80.192.00",
  user = "user_name",
  password = "user_password"
)
cdm <- CDMConnector::cdm_from_con(con,
  cdm_schema = "main",
  write_schema = "main",
  cohort_tables = "cohort_example"
)
For this example we will work with simulated data generated by the mockPatientProfiles() function provided in this package, which mimics a database formatted in OMOP:
library(PatientProfiles)
library(duckdb)
library(dplyr)
cdm <- mockPatientProfiles(
  patient_size = 1000,
  drug_exposure_size = 1000
)
Example: get characteristics in tables
addAge(): adds a new column to the input table containing each patient’s age at a certain date, specified in indexDate. The function can set a default month and/or day of birth for patients whose values are missing, or impose them on all subjects. It can also classify patients into age groups through the argument ageGroup.
Suppose we want to calculate the age at condition start date for records in the condition_occurrence table. We also want to group patients into 20-year age bands and to flag whether they are 60 years old or more.
## Rows: ??
## Columns: 6
## Database: DuckDB v0.10.0 [unknown@Linux 6.5.0-1016-azure:R 4.3.3/:memory:]
## $ condition_occurrence_id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ person_id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ condition_concept_id <int> 4, 3, 5, 2, 3, 4, 4, 3, 5, 4, 1, 1, 4, 4, 3,…
## $ condition_start_date <date> 2005-06-30, 2005-05-28, 2008-06-30, 2011-01…
## $ condition_end_date <date> 2007-07-25, 2007-09-16, 2010-10-06, 2011-10…
## $ condition_type_concept_id <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
cdm$condition_occurrence_mod <- cdm$condition_occurrence %>%
  addAge(
    ageDefaultMonth = 1,
    ageDefaultDay = 6,
    indexDate = "condition_start_date",
    ageGroup = list(
      "age_band_20" =
        list(
          "0 to 19" = c(0, 19),
          "20 to 39" = c(20, 39),
          "40 to 59" = c(40, 59),
          "60 to 79" = c(60, 79),
          "80 to 99" = c(80, 99),
          ">= 100" = c(100, 150)
        ),
      "age_threshold_60" =
        list(
          "less60" = c(0, 59),
          "more60" = c(60, 150)
        )
    )
  ) %>%
  dplyr::compute(name = "condition_occurrence_mod")
cdm$condition_occurrence_mod %>%
  glimpse()
## Rows: ??
## Columns: 9
## Database: DuckDB v0.10.0 [unknown@Linux 6.5.0-1016-azure:R 4.3.3/:memory:]
## $ condition_occurrence_id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ person_id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ condition_concept_id <int> 4, 3, 5, 2, 3, 4, 4, 3, 5, 4, 1, 1, 4, 4, 1,…
## $ condition_start_date <date> 2005-06-30, 2005-05-28, 2008-06-30, 2011-01…
## $ condition_end_date <date> 2007-07-25, 2007-09-16, 2010-10-06, 2011-10…
## $ condition_type_concept_id <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ age <dbl> 7, 59, 9, 62, 69, 38, 23, 83, 43, 28, 47, 54…
## $ age_band_20 <chr> "0 to 19", "40 to 59", "0 to 19", "60 to 79"…
## $ age_threshold_60 <chr> "less60", "less60", "less60", "more60", "mor…
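To make the logic behind the two arguments more concrete, here is a minimal base R sketch of how an age at an index date and an age-group label could be derived. It is not the implementation used by addAge(): the helpers age_at_index() and assign_age_group() are hypothetical names introduced only for illustration, the toy birth years are made up, and returning NA for ages outside every group is an assumption of this sketch.

```r
# Hypothetical helper (not part of PatientProfiles): age in completed years at
# index_date, given a year of birth and an assumed month and day of birth.
age_at_index <- function(birth_year, index_date, birth_month = 1L, birth_day = 1L) {
  idx <- as.POSIXlt(index_date)
  index_year <- idx$year + 1900L
  index_month <- idx$mon + 1L
  index_day <- idx$mday
  # Has the (assumed) birthday already happened in the index year?
  had_birthday <- index_month > birth_month |
    (index_month == birth_month & index_day >= birth_day)
  index_year - birth_year - ifelse(had_birthday, 0L, 1L)
}

# Hypothetical helper: label each age with the name of the group whose
# inclusive c(lower, upper) bounds contain it. Ages outside every group stay
# NA (an assumption of this sketch).
assign_age_group <- function(age, groups) {
  out <- rep(NA_character_, length(age))
  for (label in names(groups)) {
    bounds <- groups[[label]]
    out[!is.na(age) & age >= bounds[1] & age <= bounds[2]] <- label
  }
  out
}

# Toy data: two records with made-up years of birth.
index_dates <- as.Date(c("2005-06-30", "2011-01-03"))
ages <- age_at_index(c(1998, 1949), index_dates, birth_month = 1L, birth_day = 6L)

age_band_20 <- assign_age_group(ages, list(
  "0 to 19" = c(0, 19),
  "20 to 39" = c(20, 39),
  "40 to 59" = c(40, 59),
  "60 to 79" = c(60, 79)
))
age_threshold_60 <- assign_age_group(ages, list(
  "less60" = c(0, 59),
  "more60" = c(60, 150)
))
```

In the second record the index date (3 January) falls before the assumed birthday (6 January), so one year is subtracted. This is the kind of effect that the default month and day of birth have on the computed age.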
addSex(): appends a column to the input table indicating the sex for each patient as “Female” or “Male”.
First, we can add the sex of the patients to the table. This information can be used to count the occurrences of the condition_concept_id = 5 in males aged 60 years or older. We can also stratify the number of events by age, grouping patients into 20-year age bands.
cdm$condition_occurrence_mod <- cdm$condition_occurrence_mod %>%
  addSex()
cdm$condition_occurrence_mod %>%
  glimpse()
## Rows: ??
## Columns: 10
## Database: DuckDB v0.10.0 [unknown@Linux 6.5.0-1016-azure:R 4.3.3/:memory:]
## $ condition_occurrence_id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ person_id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ condition_concept_id <int> 4, 3, 5, 2, 3, 4, 4, 3, 5, 4, 1, 1, 4, 4, 1,…
## $ condition_start_date <date> 2005-06-30, 2005-05-28, 2008-06-30, 2011-01…
## $ condition_end_date <date> 2007-07-25, 2007-09-16, 2010-10-06, 2011-10…
## $ condition_type_concept_id <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ age <dbl> 7, 59, 9, 62, 69, 38, 23, 83, 43, 28, 47, 54…
## $ age_band_20 <chr> "0 to 19", "40 to 59", "0 to 19", "60 to 79"…
## $ age_threshold_60 <chr> "less60", "less60", "less60", "more60", "mor…
## $ sex <chr> "Female", "Female", "Male", "Female", "Male"…
# Count records of condition_concept_id 5 in males aged 60 or more,
# stratified by 20-year age band.
numConditions <- cdm$condition_occurrence_mod %>%
  filter(
    sex == "Male",
    age_threshold_60 == "more60",
    condition_concept_id == 5
  ) %>%
  group_by(age_band_20) %>%
  summarise(n = n())
numConditions
## # Source: SQL [2 x 2]
## # Database: DuckDB v0.10.0 [unknown@Linux 6.5.0-1016-azure:R 4.3.3/:memory:]
## age_band_20 n
## <chr> <dbl>
## 1 60 to 79 28
## 2 80 to 99 15
Example: get characteristics in cohort tables
PatientProfiles functions can be used on both OMOP CDM tables and cohort tables. In this example we will see some of the package functionalities applied to a cohort table:
addInObservation(): adds a new binary column to the input table, indicating whether the subjects are being observed at a specific time.
addPriorObservation(): appends a column to the input table containing the number of days each patient has been in observation up to a specified date.
addFutureObservation(): adds a column with the days of future observation for an individual at a certain date.
We can use the first function to select the patients who are in observation at “cohort_start_date” and then get their prior and future observation days. Note that we do not set the argument “indexDate”, since it defaults to “cohort_start_date”.
## Rows: ??
## Columns: 4
## Database: DuckDB v0.10.0 [unknown@Linux 6.5.0-1016-azure:R 4.3.3/:memory:]
## $ cohort_definition_id <dbl> 1, 1, 1, 2
## $ subject_id <dbl> 1, 1, 2, 3
## $ cohort_start_date <date> 2020-01-01, 2020-06-01, 2020-01-02, 2020-01-01
## $ cohort_end_date <date> 2020-04-01, 2020-08-01, 2020-02-02, 2020-03-01
cdm$cohort1 <- cdm$cohort1 %>%
  addInObservation() %>%
  filter(
    in_observation == 1
  ) %>%
  addPriorObservation() %>%
  addFutureObservation()
cdm$cohort1 %>%
  glimpse()
## Rows: ??
## Columns: 7
## Database: DuckDB v0.10.0 [unknown@Linux 6.5.0-1016-azure:R 4.3.3/:memory:]
## $ cohort_definition_id <dbl> 1, 1, 2, 1
## $ subject_id <dbl> 1, 2, 3, 1
## $ cohort_start_date <date> 2020-06-01, 2020-01-02, 2020-01-01, 2020-01-01
## $ cohort_end_date <date> 2020-08-01, 2020-02-02, 2020-03-01, 2020-04-01
## $ in_observation <dbl> 1, 1, 1, 1
## $ prior_observation <dbl> 4209, 4486, 5267, 4057
## $ future_observation <dbl> 9196, 6296, 1121, 9348
If the database allows for multiple observation periods, note that the results of the previous functions are based on the period within which “indexDate” falls. If a patient is not under observation at the specified date, addPriorObservation() and addFutureObservation() return NA.
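The behaviour with multiple observation periods can be illustrated with a small base R sketch on toy data. This is only an illustration under stated assumptions, not the package’s implementation: the data frames, the column names start and end, and the helper observation_at() are hypothetical. The sketch assumes prior observation is counted in days from the start of the period containing the index date, and future observation in days up to the end of that period.

```r
# Toy observation periods: person 1 has two separate periods.
observation_periods <- data.frame(
  person_id = c(1, 1, 2),
  start = as.Date(c("2000-01-01", "2015-01-01", "2010-01-01")),
  end = as.Date(c("2009-12-31", "2022-12-31", "2019-12-31"))
)

# Toy cohort entries; the last one falls outside any observation period.
cohort <- data.frame(
  subject_id = c(1, 2, 2),
  cohort_start_date = as.Date(c("2020-01-01", "2015-06-01", "2021-01-01"))
)

# Hypothetical helper: find the period that contains index_date and measure
# the days before and after it within that period.
observation_at <- function(subject_id, index_date, periods) {
  hit <- periods[periods$person_id == subject_id &
    periods$start <= index_date &
    periods$end >= index_date, ]
  if (nrow(hit) == 0) {
    # Not under observation at index_date: both day counts are NA.
    return(data.frame(
      in_observation = 0,
      prior_observation = NA_real_,
      future_observation = NA_real_
    ))
  }
  data.frame(
    in_observation = 1,
    prior_observation = as.numeric(index_date - hit$start[1]),
    future_observation = as.numeric(hit$end[1] - index_date)
  )
}

rows <- lapply(seq_len(nrow(cohort)), function(i) {
  observation_at(cohort$subject_id[i], cohort$cohort_start_date[i], observation_periods)
})
result <- cbind(cohort, do.call(rbind, rows))
```

For subject 1 the day counts are measured against the second period (the one starting in 2015), because that is the period containing the cohort start date. The earlier period is ignored.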
Example: get all characteristics at once
addDemographics(): can be used to add all the features presented in this vignette (except for addInObservation()) at once, in both OMOP CDM tables and cohort tables.
If we want to get the age, sex and prior observation of individuals on the day they enter a cohort, we can use the function addDemographics() as follows:
## Rows: ??
## Columns: 4
## Database: DuckDB v0.10.0 [unknown@Linux 6.5.0-1016-azure:R 4.3.3/:memory:]
## $ cohort_definition_id <dbl> 1, 1, 2, 3, 1
## $ subject_id <dbl> 1, 3, 1, 2, 1
## $ cohort_start_date <date> 2019-12-30, 2020-01-01, 2020-05-25, 2020-01-01, 2…
## $ cohort_end_date <date> 2019-12-30, 2020-01-01, 2020-05-25, 2020-01-01, 2…
cdm$cohort2 <- cdm$cohort2 %>%
  addDemographics(
    age = TRUE,
    ageName = "age",
    ageGroup = NULL,
    sex = TRUE,
    sexName = "sex",
    priorObservation = TRUE,
    priorObservationName = "prior_observation",
    futureObservation = FALSE
  )
cdm$cohort2 %>%
  glimpse()
## Rows: ??
## Columns: 7
## Database: DuckDB v0.10.0 [unknown@Linux 6.5.0-1016-azure:R 4.3.3/:memory:]
## $ cohort_definition_id <dbl> 1, 3, 1, 2, 1
## $ subject_id <dbl> 1, 2, 3, 1, 1
## $ cohort_start_date <date> 2020-05-25, 2020-01-01, 2020-01-01, 2020-05-25, 2…
## $ cohort_end_date <date> 2020-05-25, 2020-01-01, 2020-01-01, 2020-05-25, 2…
## $ age <dbl> 22, 73, 21, 22, 22
## $ sex <chr> "Female", "Female", "Male", "Female", "Female"
## $ prior_observation <dbl> 4202, 4485, 5267, 4202, 4055