Get cohort intersections

Introduction

In this vignette we show how functions from this package can be used to find intersections between cohorts. This is useful, for instance, when we want to identify patients with previous conditions.

The PatientProfiles package is designed to work with data in the OMOP CDM format, so our first step is to create a reference to the data using the DBI and CDMConnector packages. The connection to a Postgres database would look like:

library(DBI)
library(CDMConnector)

# The input arguments provided are for illustrative purposes only and do not provide access to any database.

con <- DBI::dbConnect(RPostgres::Postgres(),
  dbname = "omop_cdm",
  host = "10.80.192.00",
  user = "user_name",
  password = "user_password"
)

cdm <- CDMConnector::cdm_from_con(con,
  cdm_schema = "main",
  write_schema = "main",
  cohort_tables = "cohort_example"
)

In this vignette we will work with simulated data generated by the mockPatientProfiles() function provided in this package, which mimics a database in the OMOP CDM format.

library(PatientProfiles)
library(duckdb)
library(dplyr)

cdm <- mockPatientProfiles(
  patient_size = 1000,
  drug_exposure_size = 1000
)

In this mock dataset there are the following cohort tables:

cdm$cohort1 %>%
  glimpse()

## Rows: ??
## Columns: 4
## Database: DuckDB v0.10.0 [unknown@Linux 6.5.0-1016-azure:R 4.3.3/:memory:]
## $ cohort_definition_id <dbl> 1, 1, 1, 2
## $ subject_id           <dbl> 1, 1, 2, 3
## $ cohort_start_date    <date> 2020-01-01, 2020-06-01, 2020-01-02, 2020-01-01
## $ cohort_end_date      <date> 2020-04-01, 2020-08-01, 2020-02-02, 2020-03-01

cdm$cohort2 %>%
  glimpse()

## Rows: ??
## Columns: 4
## Database: DuckDB v0.10.0 [unknown@Linux 6.5.0-1016-azure:R 4.3.3/:memory:]
## $ cohort_definition_id <dbl> 1, 1, 2, 3, 1
## $ subject_id           <dbl> 1, 3, 1, 2, 1
## $ cohort_start_date    <date> 2019-12-30, 2020-01-01, 2020-05-25, 2020-01-01, 2…
## $ cohort_end_date      <date> 2019-12-30, 2020-01-01, 2020-05-25, 2020-01-01, 2…

Example: addCohortIntersectFlag and addCohortIntersectCount functions

addCohortIntersectFlag(): adds a binary column indicating whether there is an intersection with a cohort in a time window.

Suppose the cohort in cohort2 with cohort_definition_id = 1 contains stroke occurrences. To exclude patients from cohort1 who had a stroke in the 180 days before entering the cohort, we can use addCohortIntersectFlag() like this:

cohort1WashOut <- cdm$cohort1 %>%
  addCohortIntersectFlag(
    targetCohortTable = "cohort2",
    window = list(c(-180, -1)),
    targetCohortId = 1
  ) %>%
  filter(cohort_1_m180_to_m1 == 0)

cohort1WashOut %>%
  glimpse()

## Rows: ??
## Columns: 5
## Database: DuckDB v0.10.0 [unknown@Linux 6.5.0-1016-azure:R 4.3.3/:memory:]
## $ cohort_definition_id <dbl> 2, 1
## $ subject_id           <dbl> 3, 2
## $ cohort_start_date    <date> 2020-01-01, 2020-01-02
## $ cohort_end_date      <date> 2020-03-01, 2020-02-02
## $ cohort_1_m180_to_m1  <dbl> 0, 0

addCohortIntersectCount(): adds a column with the number of intersections in a given time window.

We can use this function to count the occurrences of an event of interest in different time windows before entering the study population. For example, we can look at the number of strokes in the 1 to 180 days, 181 to 365 days, and more than 365 days before cohort entry:

cohort1StrokeCounts <- cdm$cohort1 %>%
  addCohortIntersectCount(
    targetCohortTable = "cohort2",
    window = list(c(-Inf, -366), c(-365, -181), c(-180, -1)),
    targetCohortId = 1
  )

cohort1StrokeCounts %>%
  glimpse()

## Rows: ??
## Columns: 7
## Database: DuckDB v0.10.0 [unknown@Linux 6.5.0-1016-azure:R 4.3.3/:memory:]
## $ cohort_definition_id  <dbl> 1, 1, 2, 1
## $ subject_id            <dbl> 1, 1, 3, 2
## $ cohort_start_date     <date> 2020-01-01, 2020-06-01, 2020-01-01, 2020-01-02
## $ cohort_end_date       <date> 2020-04-01, 2020-08-01, 2020-03-01, 2020-02-02
## $ cohort_1_minf_to_m366 <dbl> 0, 0, 0, 0
## $ cohort_1_m365_to_m181 <dbl> 0, 0, 0, 0
## $ cohort_1_m180_to_m1   <dbl> 1, 2, 0, 0
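
As a rough illustration of what a count per window means, the following base R sketch buckets events by their distance in days from the index date. The data are hypothetical, the code does not call PatientProfiles, and inclusive window bounds are an assumption made for the sketch.

```r
# Hypothetical data: days between each prior stroke and the index date
# (negative values are days before the index date).
daysFromIndex <- c(-400, -200, -190)

# Same windows as in the call above.
windows <- list(c(-Inf, -366), c(-365, -181), c(-180, -1))

# Count the events falling in each window (bounds assumed inclusive).
counts <- vapply(
  windows,
  function(w) sum(daysFromIndex >= w[1] & daysFromIndex <= w[2]),
  integer(1)
)

# A flag is simply 1 whenever the corresponding count is positive.
flags <- as.integer(counts > 0)

counts # 1 2 0
flags  # 1 1 0
```

Because the three windows do not overlap, each event contributes to exactly one count.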

Let us comment on the targetEndDate functionality found in the addCohortIntersectCount() and addCohortIntersectFlag() functions. In both of them, three reference dates can be specified:
* indexDate: date from the primary cohort table, which contains the individuals for whom we want to find the intersecting events
* targetStartDate: date from the events table that marks the start of each event used for the intersection
* targetEndDate: date from the events table that marks the end of each event used for the intersection

By default, indexDate = cohort_start_date, targetStartDate = cohort_start_date and targetEndDate = cohort_end_date. This means that, if we are intersecting two cohorts and specify window = c(-30, -1), we will get any events from the intersecting cohort that overlap the 30 days before the cohort start date of the main cohort. Namely:

# This will be our "main" cohort
cohort1 <- dplyr::tibble(
  cohort_definition_id = 1,
  subject_id = c("1", "2"),
  cohort_start_date = c(
    as.Date("2010-03-01"),
    as.Date("2012-03-01")
  ),
  cohort_end_date = c(
    as.Date("2015-01-01"),
    as.Date("2016-03-01")
  )
)

# This is the cohort with the events we are interested in
cohort2 <- dplyr::tibble(
  cohort_definition_id = 1,
  subject_id = c("1", "1", "1", "2"),
  cohort_start_date = c(
    as.Date("2010-03-03"),
    as.Date("2010-02-27"),
    as.Date("2010-03-25"),
    as.Date("2013-01-03")
  ),
  cohort_end_date = c(
    as.Date("2010-03-03"),
    as.Date("2010-02-27"),
    as.Date("2012-03-25"),
    as.Date("2013-01-03")
  )
)

observation_period <- dplyr::tibble(
  observation_period_id = 1:2,
  person_id = c(1,2),
  observation_period_start_date = as.Date(c("1990-01-01", "1995-08-16")),
  observation_period_end_date = as.Date(c("2025-01-01", "2030-08-16")),
  period_type_concept_id = 0
)

cdm <- mockPatientProfiles(
  observation_period = observation_period,
  cohort1 = cohort1,
  cohort2 = cohort2
)

cdm$cohort1 <- cdm$cohort1 %>%
  addCohortIntersectCount(
    targetCohortTable = "cohort2",
    window = list(c(-30, -1))
  )
cdm$cohort1

## # Source:   table<og_027_1710713493> [2 x 5]
## # Database: DuckDB v0.10.0 [unknown@Linux 6.5.0-1016-azure:R 4.3.3/:memory:]
##   cohort_definition_id subject_id cohort_start_date cohort_end_date
##                  <dbl> <chr>      <date>            <date>         
## 1                    1 1          2010-03-01        2015-01-01     
## 2                    1 2          2012-03-01        2016-03-01     
## # ℹ 1 more variable: cohort_1_m30_to_m1 <dbl>

We get one event for subject_id = 1, the one starting and ending on 2010-02-27, which falls within the window before the index date 2010-03-01. The individual with subject_id = 2 does not have any intersecting events of interest.
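
The window bounds for this index date can be checked with plain date arithmetic. The helper windowDates() below is hypothetical and not part of PatientProfiles; it is only a sketch of how a window given in days maps to calendar dates.

```r
# Hypothetical helper (not part of PatientProfiles): translate a window,
# given in days relative to the index date, into calendar dates.
windowDates <- function(indexDate, window) {
  list(start = indexDate + window[1], end = indexDate + window[2])
}

bounds <- windowDates(as.Date("2010-03-01"), c(-30, -1))
bounds$start # 2010-01-30
bounds$end   # 2010-02-28

# The event on 2010-02-27 lies between the two bounds.
eventDate <- as.Date("2010-02-27")
inWindow <- eventDate >= bounds$start & eventDate <= bounds$end
inWindow # TRUE
```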

Note that, with the default settings, an event does not need to start in the window of interest to be counted; it only needs to overlap it. An event whose start date is before the window and whose end date is inside or after it is therefore regarded as intersecting, even though it is not incident in the window.

However, we may only be interested in events that start in the window of interest. To screen for those, we can set targetEndDate = "cohort_start_date".

cdm$cohort1 <- cdm$cohort1 %>%
  addCohortIntersectCount(
    targetCohortTable = "cohort2",
    window = list(c(-30, -1)),
    targetEndDate = "cohort_start_date"
  )
cdm$cohort1

## # Source:   table<og_034_1710713494> [2 x 5]
## # Database: DuckDB v0.10.0 [unknown@Linux 6.5.0-1016-azure:R 4.3.3/:memory:]
##   cohort_definition_id subject_id cohort_start_date cohort_end_date
##                  <dbl> <chr>      <date>            <date>         
## 1                    1 1          2010-03-01        2015-01-01     
## 2                    1 2          2012-03-01        2016-03-01     
## # ℹ 1 more variable: cohort_1_m30_to_m1 <dbl>

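With this setting, an event that starts more than 30 days before the index date is no longer counted, even if it is still ongoing during the window. For instance, for the index date 2010-03-01, an event starting on 2010-01-25 would be excluded.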

The targetEndDate argument therefore lets us choose whether the intersection is performed in an “overlapping” or an “incident” way.
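
The difference between the two readings can be sketched in base R. The helper countEvents() and the two events are hypothetical; the overlap rule is an assumption based on the description in this vignette, not a copy of the package internals.

```r
# Hypothetical helper (not part of PatientProfiles): count target events for a
# single index date under the "overlapping" and "incident" readings.
countEvents <- function(indexDate, eventStart, eventEnd, window) {
  windowStart <- indexDate + window[1]
  windowEnd <- indexDate + window[2]
  c(
    # overlapping: any part of the event falls inside the window
    overlapping = sum(eventStart <= windowEnd & eventEnd >= windowStart),
    # incident: the event must start inside the window
    incident = sum(eventStart >= windowStart & eventStart <= windowEnd)
  )
}

# Assumed data: one event starts and ends inside the window, the other starts
# 35 days before the index date and is still ongoing after it.
eventStart <- as.Date(c("2010-02-27", "2010-01-25"))
eventEnd <- as.Date(c("2010-02-27", "2010-03-25"))

counts <- countEvents(as.Date("2010-03-01"), eventStart, eventEnd, c(-30, -1))
counts # overlapping = 2, incident = 1
```

The incident rule is the overlap rule with the event end date replaced by the event start date, which is what setting targetEndDate to the start date column amounts to.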

As for the functions addCohortIntersectDays() and addCohortIntersectDate(), they need a specific date in the target cohort to calculate time outputs. Therefore, only targetDate needs to be specified, which is set to “cohort_start_date” by default.

Example: addCohortIntersectDays function

addCohortIntersectDays(): adds a new column with the number of days from the index date to an intersecting entry in another cohort within a specific time window. If there are multiple intersections, only one is used, either the first or the last one in the time window (“order” argument).

The function can be used to calculate the time to an event of interest, such as the time until the first stroke after the index date. If the patient did not experience the event, the function returns NA.

cohort1TimeTo <- cdm$cohort1 %>%
  addCohortIntersectDays(
    targetCohortTable = "cohort2",
    targetCohortId = 1,
    order = "first"
  )

cohort1TimeTo %>%
  glimpse()

## Rows: ??
## Columns: 6
## Database: DuckDB v0.10.0 [unknown@Linux 6.5.0-1016-azure:R 4.3.3/:memory:]
## $ cohort_definition_id <dbl> 1, 1
## $ subject_id           <chr> "1", "2"
## $ cohort_start_date    <date> 2010-03-01, 2012-03-01
## $ cohort_end_date      <date> 2015-01-01, 2016-03-01
## $ cohort_1_m30_to_m1   <dbl> 1, 0
## $ cohort_1_0_to_inf    <dbl> 2, 308
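
The value for subject_id = 1 can be reproduced by hand with a small base R sketch. The helper daysToEvent() is hypothetical and not part of PatientProfiles; it assumes inclusive window bounds and uses the event start dates of subject 1 from cohort2.

```r
# Hypothetical helper (not part of PatientProfiles): days from the index date
# to the first or last target date inside the window, NA if there is none.
daysToEvent <- function(indexDate, targetDate, window = c(0, Inf), order = "first") {
  days <- as.numeric(targetDate - indexDate)
  days <- days[days >= window[1] & days <= window[2]]
  if (length(days) == 0) {
    return(NA_real_)
  }
  if (order == "first") min(days) else max(days)
}

index <- as.Date("2010-03-01")
target <- as.Date(c("2010-03-03", "2010-02-27", "2010-03-25"))

daysToEvent(index, target)                 # 2
daysToEvent(index, target, order = "last") # 24
daysToEvent(index, as.Date("2010-02-27"))  # NA, the event is before the window
```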

Example: addCohortIntersectDate function

addCohortIntersectDate(): appends a column containing the start date of cohorts that are present in a certain window.

This function can be handy in obtaining the date of the next occurrence of a specific event. For instance, suppose cohort1 comprises patients who enrolled when they received their first vaccine dose. We could use this function to obtain the date of their second dose if we have a cohort with vaccine records (e.g. cohort2):

cohort1NextEvent <- cdm$cohort1 %>%
  addCohortIntersectDate(
    targetCohortTable = "cohort2",
    order = "first",
    targetCohortId = 1,
    window = c(1, Inf)
  )

cohort1NextEvent %>%
  glimpse()

## Rows: ??
## Columns: 6
## Database: DuckDB v0.10.0 [unknown@Linux 6.5.0-1016-azure:R 4.3.3/:memory:]
## $ cohort_definition_id <dbl> 1, 1
## $ subject_id           <chr> "1", "2"
## $ cohort_start_date    <date> 2010-03-01, 2012-03-01
## $ cohort_end_date      <date> 2015-01-01, 2016-03-01
## $ cohort_1_m30_to_m1   <dbl> 1, 0
## $ cohort_1_1_to_inf    <date> 2010-03-03, 2013-01-03

Please note that the new columns added to the table (for all functions presented) are named cohort_“cohort_definition_id”_“time window”. If a window bound is negative, an “m” is placed in front of the number to indicate it; positive numbers carry no sign.
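
The naming pattern can be mimicked with a short base R sketch. The helpers boundName() and windowName() are hypothetical and not part of PatientProfiles; the “inf” and “minf” labels are taken from the column names shown in the outputs above.

```r
# Hypothetical helpers (not part of PatientProfiles): rebuild the window part
# of a column name from the convention described above.
boundName <- function(x) {
  if (is.infinite(x)) {
    return(if (x < 0) "minf" else "inf")
  }
  label <- format(abs(x), scientific = FALSE, trim = TRUE)
  # negative bounds get an "m" prefix, positive bounds carry no sign
  if (x < 0) paste0("m", label) else label
}

windowName <- function(window) {
  paste0(boundName(window[1]), "_to_", boundName(window[2]))
}

windowName(c(-180, -1))                     # "m180_to_m1"
windowName(c(-Inf, -366))                   # "minf_to_m366"
windowName(c(1, Inf))                       # "1_to_inf"
paste0("cohort_1_", windowName(c(-30, -1))) # "cohort_1_m30_to_m1"
```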

Example: addCohortIntersect function

addCohortIntersect(): computes the intersection with a target cohort. It can return the number of occurrences, a flag of presence, a date and/or the difference in days.

We can use this function to compute all the intersection information with a target cohort at once. By default it returns the output of addCohortIntersectCount(), addCohortIntersectFlag(), addCohortIntersectDate() and addCohortIntersectDays() for a time window. For details on what each of these functions does, see the examples above.

cohort1CohortIntersect <- cdm$cohort1 %>%
  addCohortIntersect(
    targetCohortTable = "cohort2",
    order = "first",
    targetCohortId = 1,
    window = c(1, Inf)
  )
cohort1CohortIntersect %>%
  glimpse()

## Rows: ??
## Columns: 9
## Database: DuckDB v0.10.0 [unknown@Linux 6.5.0-1016-azure:R 4.3.3/:memory:]
## $ cohort_definition_id    <dbl> 1, 1
## $ subject_id              <chr> "1", "2"
## $ cohort_start_date       <date> 2010-03-01, 2012-03-01
## $ cohort_end_date         <date> 2015-01-01, 2016-03-01
## $ cohort_1_m30_to_m1      <dbl> 1, 0
## $ count_cohort_1_1_to_inf <dbl> 2, 1
## $ flag_cohort_1_1_to_inf  <dbl> 1, 1
## $ date_cohort_1_1_to_inf  <date> 2010-03-03, 2013-01-03
## $ days_cohort_1_1_to_inf  <dbl> 2, 308

We can also control which columns are appended to the data by using the flag, count, date and days arguments of the function, if we do not want everything. For example, if we only want the cohort count and flag, we can use:

cohort1CohortIntersect <- cdm$cohort1 %>%
  addCohortIntersect(
    targetCohortTable = "cohort2",
    order = "first",
    targetCohortId = 1,
    window = c(1, Inf),
    flag = TRUE,
    count = TRUE,
    date = FALSE,
    days = FALSE
  )
cohort1CohortIntersect %>%
  glimpse()

## Rows: ??
## Columns: 7
## Database: DuckDB v0.10.0 [unknown@Linux 6.5.0-1016-azure:R 4.3.3/:memory:]
## $ cohort_definition_id    <dbl> 1, 1
## $ subject_id              <chr> "1", "2"
## $ cohort_start_date       <date> 2010-03-01, 2012-03-01
## $ cohort_end_date         <date> 2015-01-01, 2016-03-01
## $ cohort_1_m30_to_m1      <dbl> 1, 0
## $ count_cohort_1_1_to_inf <dbl> 2, 1
## $ flag_cohort_1_1_to_inf  <dbl> 1, 1

Martí Català, Mike Du, Yuchen Guo, Kim López-Güell, Xintong Li, Núria Mercadé-Besora, and Edward Burn

2024-03-17
