In 1. Defining Cohorts we discussed how to define and generate cohorts for TreatmentPatterns. In this section we assume you are able to generate a cohort table using either CohortGenerator or CDMConnector.

Lets generate our Viral Sinusitis dummy cohorts provided in TreatmentPatterns using CDMConnector.

Generating Cohorts

First we need to read in our cohorts.

library(CDMConnector)

cohortSet <- readCohortSet(
  path = system.file(package = "TreatmentPatterns", "exampleCohorts")
)

cohortSet
## # A tibble: 8 × 5
##   cohort_definition_id cohort_name    cohort       json   cohort_name_snakecase
##                  <int> <chr>          <list>       <list> <chr>                
## 1                    1 acetaminophen  <named list> <chr>  acetaminophen        
## 2                    2 amoxicillin    <named list> <chr>  amoxicillin          
## 3                    3 aspirin        <named list> <chr>  aspirin              
## 4                    4 clavulanate    <named list> <chr>  clavulanate          
## 5                    5 death          <named list> <chr>  death                
## 6                    6 doxylamine     <named list> <chr>  doxylamine           
## 7                    7 penicillinv    <named list> <chr>  penicillinv          
## 8                    8 viralsinusitis <named list> <chr>  viralsinusitis

Then we can open a connection to our database, in this case Eunomia.

## 
## Attaching package: 'DatabaseConnector'
## The following objects are masked from 'package:CDMConnector':
## 
##     dbms, insertTable
## 
## Download completed!
con <- DBI::dbConnect(
  drv = duckdb::duckdb(),
  dbdir = eunomia_dir()
)

cdm <- cdmFromCon(
  con = con,
  cdmSchema = "main",
  writeSchema = "main"
)
cdm
## 
## ── # OMOP CDM reference (duckdb) of Synthea synthetic health database ──────────
## • omop tables: person, observation_period, visit_occurrence, visit_detail,
## condition_occurrence, drug_exposure, procedure_occurrence, device_exposure,
## measurement, observation, death, note, note_nlp, specimen, fact_relationship,
## location, care_site, provider, payer_plan_period, cost, drug_era, dose_era,
## condition_era, metadata, cdm_source, concept, vocabulary, domain,
## concept_class, concept_relationship, relationship, concept_synonym,
## concept_ancestor, source_to_concept_map, drug_strength
## • cohort tables: -
## • achilles tables: -
## • other tables: -

Finally we can generate our cohort set as a cohort table into the database

cdm <- generateCohortSet(
  cdm = cdm,
  cohortSet = cohortSet,
  name = "cohort_table",
  overwrite = TRUE
)
##  Generating 8 cohorts
##  Generating cohort (1/8) - acetaminophen Generating cohort (1/8) - acetaminophen [263ms]
##  Generating cohort (2/8) - amoxicillin Generating cohort (2/8) - amoxicillin [177ms]
##  Generating cohort (3/8) - aspirin Generating cohort (3/8) - aspirin [167ms]
##  Generating cohort (4/8) - clavulanate Generating cohort (4/8) - clavulanate [158ms]
##  Generating cohort (5/8) - death Generating cohort (5/8) - death [112ms]
##  Generating cohort (6/8) - doxylamine Generating cohort (6/8) - doxylamine [165ms]
##  Generating cohort (7/8) - penicillinv Generating cohort (7/8) - penicillinv [161ms]
##  Generating cohort (8/8) - viralsinusitis Generating cohort (8/8) - viralsinusitis [232ms]
## Warning: ! 5 casted column in cohort_table (cohort_attrition) as do not match expected
##   column type:
##  `number_records` from numeric to integer
##  `number_subjects` from numeric to integer
##  `reason_id` from numeric to integer
##  `excluded_records` from numeric to integer
##  `excluded_subjects` from numeric to integer
## Warning: ! 1 column in cohort_table do not match expected column type:
##  `subject_id` is numeric but expected integer
cohortCount(cdm$cohort_table)
## # A tibble: 8 × 3
##   cohort_definition_id number_records number_subjects
##                  <int>          <int>           <int>
## 1                    1           2679            2679
## 2                    2           2130            2130
## 3                    3           1927            1927
## 4                    4           2021            2021
## 5                    5              0               0
## 6                    6           1393            1393
## 7                    7           1732            1732
## 8                    8           2159            2159

We can see that all our cohorts are generated in the cohort table. The cohort with cohort_definition_id 5 has a count of 0, this is the Death cohort. This is not detrimental, as exit cohorts are optional, but good to know that Death will not show up in our results.

Computing pathways

The computePathways function of TreatmentPatterns allows us to compute treatment pathways in our cohort table. In order to do this we need to pre-specify some parameters.

According to the documentation we need a data.frame that specifies what cohort is of which type.

Data frame containing the following columns and data types:

cohortId numeric(1) Cohort ID’s of the cohorts to be used in the cohort table.

cohortName character(1) Cohort names of the cohorts to be used in the cohort table.

type character(1) [“target”, “event’,”exit”] Cohort type, describing if the cohort is a target, event, or exit cohort

We are able to re-use our cohortSet for this. As it already contains the cohort ID’s and cohort names. We only have to remove the cohort and json columns, add a type column, and rename cohort_definition_id to cohortId and cohort_name to cohortName.

library(dplyr)

cohorts <- cohortSet %>%
  # Remove 'cohort' and 'json' columns
  select(-"cohort", -"json", -"cohort_name_snakecase") %>%
  mutate(type = c("event", "event", "event", "event", "exit", "event", "event", "target")) %>%
  rename(
    cohortId = "cohort_definition_id",
    cohortName = "cohort_name",
  )

cohorts
## # A tibble: 8 × 3
##   cohortId cohortName     type  
##      <int> <chr>          <chr> 
## 1        1 acetaminophen  event 
## 2        2 amoxicillin    event 
## 3        3 aspirin        event 
## 4        4 clavulanate    event 
## 5        5 death          exit  
## 6        6 doxylamine     event 
## 7        7 penicillinv    event 
## 8        8 viralsinusitis target

With our data.frame of cohort types, CDM reference, and the cohort table name in our database we can compute the treatment pathways, with all of the other settings as their defaults.

library(TreatmentPatterns)

defaultSettings <- computePathways(
  cohorts = cohorts,
  cohortTableName = "cohort_table",
  cdm = cdm
)
## -- Qualifying records for cohort definitions: 1, 2, 3, 4, 5, 6, 7, 8
##  Records: 14041
##  Subjects: 2693
## -- Removing records < minEraDuration (0)
##  Records: 11386
##  Subjects: 2159
## -- Removing events where index date < target index date + indexDateOffset (0)
##  Records: 8381
##  Subjects: 2159
## -- splitEventCohorts
##  Records: 8366
##  Subjects: 2144
## -- eraCollapse (30)
##  Records: 8366
##  Subjects: 2144
## -- Iteration 1: minPostCombinationDuration (30), combinatinoWindow (30)
##  Records: 558
##  Subjects: 512
## -- Iteration 2: minPostCombinationDuration (30), combinatinoWindow (30)
##  Records: 554
##  Subjects: 512
## -- After Combination
##  Records: 554
##  Subjects: 512
## -- filterTreatments (First)
##  Records: 553
##  Subjects: 512
## -- treatment construction done
##  Records: 553
##  Subjects: 512
defaultSettings
## # Andromeda object
## # Physical location:  C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0\file81f87d1d6804.sqlite
## 
## Tables:
## $addRowsFRFS_1 (personId, indexYear, eventCohortId, eventStartDate, eventEndDate, type, age, sex, durationEra, sortOrder, gapPrevious, selectedRows, switch, combinationFRFS, combinationLRFS, eventStartDateNext, eventEndDatePrevious, eventEndDateNext, eventCohortIdPrevious)
## $addRowsFRFS_2 (personId, indexYear, eventCohortId, eventStartDate, age, sex, eventEndDate, durationEra, gapPrevious, sortOrder, selectedRows, switch, combinationFRFS, combinationLRFS, eventStartDateNext, eventEndDatePrevious, eventEndDateNext, eventCohortIdPrevious)
## $addRowsLRFS_1 (personId, indexYear, eventCohortId, eventStartDate, eventEndDate, type, age, sex, durationEra, sortOrder, gapPrevious, selectedRows, switch, combinationFRFS, combinationLRFS, eventStartDateNext, eventEndDatePrevious, eventEndDateNext, eventCohortIdPrevious, checkDuration)
## $addRowsLRFS_2 (personId, indexYear, eventCohortId, eventStartDate, age, sex, eventEndDate, durationEra, gapPrevious, sortOrder, selectedRows, switch, combinationFRFS, combinationLRFS, eventStartDateNext, eventEndDatePrevious, eventEndDateNext, eventCohortIdPrevious, checkDuration)
## $attrition (number_records, number_subject, reason_id, reason, time)
## $cohortTable (cohortId.x, personId, subject_id_origin, startDate.x, endDate.x, age.x, sex.x, type.x, cohortId.y, startDate.y, endDate.y, age.y, sex.y, type.y, indexYear, indexDate)
## $cohorts (cohortId, cohortName, type)
## $currentCohorts (cohortId, personId, subject_id_origin, startDate, endDate, age, sex)
## $dbplyr_6Epxqj9f86 (number_records, number_subject, reason_id, reason, time)
## $dbplyr_A6SYQa2Grm (number_records, number_subject, reason_id, reason, time)
## $dbplyr_Y4ySoaoQh2 (number_records, number_subject, reason_id, reason, time)
## $dbplyr_Z2M8Xrcunc (number_records, number_subject, reason_id, reason, time)
## $dbplyr_cSkDQutjYF (number_records, number_subject, reason_id, reason, time)
## $dbplyr_egRFrGyyPF (number_records, number_subject, reason_id, reason, time)
## $dbplyr_nGJacsYo9B (number_records, number_subject, reason_id, reason, time)
## $dbplyr_oLN1X87rlp (number_records, number_subject, reason_id, reason, time)
## $dbplyr_ol3lkrBoLU (number_records, number_subject, reason_id, reason, time)
## $dbplyr_sI90271IHm (number_records, number_subject, reason_id, reason, time)
## $eventCohorts (cohortId, personId, subject_id_origin, startDate, endDate, age, sex, type)
## $exitCohorts (cohortId, personId, subject_id_origin, startDate, endDate, age, sex, type)
## $exitHistory (personId, indexYear, eventCohortId, eventStartDate, eventEndDate, age, sex, durationEra)
## $labels (eventCohortId, eventCohortName)
## $metadata (cdmSourceName, cdmSourceAbbreviation, cdmReleaseDate, vocabularyVersion, executionStartDate, packageVersion, rVersion, platform, execution_end_date)
## $sqlite_stat1 (tbl, idx, stat)
## $sqlite_stat4 (tbl, idx, neq, nlt, ndlt, sample)
## $targetCohorts (cohortId, personId, subject_id_origin, startDate, endDate, age, sex, type, indexYear, indexDate)
## $treatmentHistory (eventCohortId, personId, indexYear, eventStartDate, age, sex, eventEndDate, durationEra, sortOrder, eventSeq, eventCohortName)

The output of computePathways is an Andromeda environment, which allows us to investigate intermediate results and patient-level data. This data is not sharable.

# treatmentHistory table
head(defaultSettings$treatmentHistory)
## # Source:   SQL [6 x 11]
## # Database: sqlite 3.46.0 [C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0\file81f87d1d6804.sqlite]
##   eventCohortId personId indexYear eventStartDate   age sex    eventEndDate
##   <chr>            <dbl>     <dbl>          <int> <dbl> <chr>         <int>
## 1 1                  275      1982           4425     4 MALE           4455
## 2 1                  570      1968           -562     1 MALE           -532
## 3 1                  195      1984           5156     4 FEMALE         5216
## 4 1                  868      1964          -1956     6 MALE          -1866
## 5 1                  164      1954          -5677     2 MALE          -5617
## 6 1                  132      1966          -1105     6 FEMALE        -1015
## # ℹ 4 more variables: durationEra <int>, sortOrder <dbl>, eventSeq <int>,
## #   eventCohortName <chr>
# metadata table
defaultSettings$metadata
## # Source:   table<`metadata`> [1 x 9]
## # Database: sqlite 3.46.0 [C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0\file81f87d1d6804.sqlite]
##   cdmSourceName           cdmSourceAbbreviation cdmReleaseDate vocabularyVersion
##   <chr>                   <chr>                 <date>         <chr>            
## 1 Synthea synthetic heal… Synthea               2019-05-25     v5.0 18-JAN-19   
## # ℹ 5 more variables: executionStartDate <chr>, packageVersion <chr>,
## #   rVersion <chr>, platform <chr>, execution_end_date <chr>
# First Recieved First Stopped
head(defaultSettings$addRowsFRFS_1)
## # Source:   SQL [0 x 19]
## # Database: sqlite 3.46.0 [C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0\file81f87d1d6804.sqlite]
## # ℹ 19 variables: personId <lgl>, indexYear <dbl>, eventCohortId <lgl>,
## #   eventStartDate <dbl>, eventEndDate <lgl>, type <lgl>, age <dbl>, sex <chr>,
## #   durationEra <lgl>, sortOrder <lgl>, gapPrevious <lgl>, selectedRows <lgl>,
## #   switch <lgl>, combinationFRFS <lgl>, combinationLRFS <lgl>,
## #   eventStartDateNext <lgl>, eventEndDatePrevious <lgl>,
## #   eventEndDateNext <lgl>, eventCohortIdPrevious <lgl>
# Last Recieved Last Stopped
head(defaultSettings$addRowsLRFS_1)
## # Source:   SQL [6 x 20]
## # Database: sqlite 3.46.0 [C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0\file81f87d1d6804.sqlite]
##   personId indexYear eventCohortId eventStartDate eventEndDate type    age sex  
##      <dbl>     <dbl> <chr>                  <int>        <int> <chr> <dbl> <chr>
## 1        3     -3284 2                      12854        12854 event    46 FEMA…
## 2        4       690 2                       7081         7081 event    19 FEMA…
## 3        5     -3648 2                         27           27 event    12 MALE 
## 4        8     -1071 2                      12044        12044 event    40 FEMA…
## 5        9       803 2                      14022        14022 event    38 FEMA…
## 6       10       163 2                      10465        10465 event    31 MALE 
## # ℹ 12 more variables: durationEra <int>, sortOrder <dbl>, gapPrevious <int>,
## #   selectedRows <dbl>, switch <dbl>, combinationFRFS <dbl>,
## #   combinationLRFS <dbl>, eventStartDateNext <int>,
## #   eventEndDatePrevious <int>, eventEndDateNext <int>,
## #   eventCohortIdPrevious <chr>, checkDuration <dbl>

DatabaseConnector is also supported. The following parameters are required instead of cdm:

  1. connectionDetails: ConnectionDetails object form DatabaseConnector.
  2. cdmSchema: Schema where the CDM exists.
  3. resultSchema: Schema to write the cohort table to.
  4. tempEmulationSchema: Some database platforms like Oracle and Impala do not truly support temp tables. To emulate temp tables, provide a schema with write privileges where temp tables can be created.

The following code snippet works with Eunomia, a cohort table (cohort_table) exists in the database, and a cohorts data frame has been created.

computePathways(
  cohorts = cohorts,
  cohortTableName = cohortTableName,
  connectionDetails = Eunomia::getEunomiaConnectionDetails(),
  cdmSchema = "main",
  resultSchema = "main",
  tempEmulationSchema = NULL
)

Pathway settings

Even though the default settings work well for most use cases, it might not work for all situations. The settings below allow us to influence how the events of interest should be processed to form treatment pathways.

Parameter Value Description
indexDateOffset 0 Offset the index date of the Target cohort.
minEraDuration 0 Minimum time an event era should last to be included in analysis
eraCollapseSize 30 Window of time between which two eras of the same event cohort are collapsed into one era
combinationWindow 30 Window of time two event cohorts need to overlap to be considered a combination treatment
minPostCombinationDuration 30 Minimum time an event era before or after a generated combination treatment should last to be included in analysis
filterTreatments First Select first occurrence of (‘First’); changes between (‘Changes’); or all event cohorts (‘All’).
maxPathLength 5 Maximum number of steps included in treatment pathway

The following figure shows how each of these parameters affect the computation of the treatment pathway.

pathwaySettings You can add these settings to the computePathways function call. Lets see what happens when we set our minEraDuration to 60, but keep the rest of the settings mentioned as their default values.

minEra60 <- computePathways(
  cohorts = cohorts,
  cohortTableName = "cohort_table",
  cdm = cdm,
  # Pathway settings
  indexDateOffset = 0,
  minEraDuration = 60,
  eraCollapseSize = 30,
  combinationWindow = 30,
  minPostCombinationDuration = 30,
  filterTreatments = "First",
  maxPathLength = 5
)
## Warning in validateComputePathways(): The `minPostCombinationDuration` is set
## lower than the `minEraDuration`, this might result in unexpected behavior
## Warning in validateComputePathways(): The `combinationWindow` is set lower than
## the `minEraDuration`, this might result in unexpected behavior
## -- Qualifying records for cohort definitions: 1, 2, 3, 4, 5, 6, 7, 8
##  Records: 14041
##  Subjects: 2693
## -- Removing records < minEraDuration (60)
##  Records: 2523
##  Subjects: 2159
## -- Removing events where index date < target index date + indexDateOffset (0)
##  Records: 2216
##  Subjects: 2159
## -- splitEventCohorts
##  Records: 336
##  Subjects: 279
## -- eraCollapse (30)
##  Records: 336
##  Subjects: 279
## -- Iteration 1: minPostCombinationDuration (30), combinatinoWindow (30)
##  Records: 291
##  Subjects: 279
## -- After Combination
##  Records: 291
##  Subjects: 279
## -- filterTreatments (First)
##  Records: 291
##  Subjects: 279
## -- treatment construction done
##  Records: 291
##  Subjects: 279

Number of treatments with a minimum duration of greater or equal to 0 days.

defaultSettings$treatmentHistory %>%
  collect() %>%
  nrow()
## [1] 553

Number of treatments with a minimum duration of greater or equal to 60 days.

minEra60$treatmentHistory %>%
  collect() %>%
  nrow()
## [1] 291

Acute and Therapy splits

We can also split our defined event cohorts into acute and therapy cohorts.

Parameter Description
splitEventCohorts Specify event cohort ID’s (i.e. c(1, 2, 3) to split in acute (< splitTime days) and therapy (>= splitTime days). As an example treatment Drug A could be split into Drug A (therapy) and Drug A (acute). And we could set our splitTime to 30. Drug A (acute) would be the time before day 0-29 and Drug A (therapy) would be the day 30 or later.
splitTime Specify number of days at which each of the split event cohorts should be split in acute and therapy (i.e. c(20, 30, 10)). The length of splitTime must equal the length of splitEventCohorts

Let’s say we want to assume that the first 60 days of our treatment is acute, and beyond that therapy.

splitAcuteTherapy <- computePathways(
  cohorts = cohorts,
  cohortTableName = "cohort_table",
  cdm = cdm,
  # Split settings
  splitEventCohorts = 1,
  splitTime = 60
)
## -- Qualifying records for cohort definitions: 1, 2, 3, 4, 5, 6, 7, 8
##  Records: 14041
##  Subjects: 2693
## -- Removing records < minEraDuration (0)
##  Records: 11386
##  Subjects: 2159
## -- Removing events where index date < target index date + indexDateOffset (0)
##  Records: 8381
##  Subjects: 2159
## -- splitEventCohorts
##  Records: 8366
##  Subjects: 2144
## -- eraCollapse (30)
##  Records: 8366
##  Subjects: 2144
## -- Iteration 1: minPostCombinationDuration (30), combinatinoWindow (30)
##  Records: 558
##  Subjects: 512
## -- Iteration 2: minPostCombinationDuration (30), combinatinoWindow (30)
##  Records: 554
##  Subjects: 512
## -- After Combination
##  Records: 554
##  Subjects: 512
## -- filterTreatments (First)
##  Records: 553
##  Subjects: 512
## -- treatment construction done
##  Records: 553
##  Subjects: 512
splitAcuteTherapy$treatmentHistory %>%
  pull(.data$eventCohortName) %>% unique()
## [1] "acetaminophen (acute)"             "acetaminophen (acute)+amoxicillin"
## [3] "acetaminophens (therapy)"          "amoxicillin"                      
## [5] "amoxicillin+clavulanate"           "aspirin"                          
## [7] "clavulanate"

We can see that our Acetaminophen cohorts are split into Acetaminophen (acute) and (therapy). Acute labels all the Acetaminophen cohorts lasting less than our defined splitTime, in this case 60 days. Therapy labels all the Acetaminophen cohorts lasting 60 days or more.

Include treatments in a time frame

We can dictate in what time frame we want to look. We can look from the start date of our target cohort and on wards, or we can look before the end date of our target cohort. By default TreatmentPatterns looks from the start date and onwards.

includeEndDate <- computePathways(
  cohorts = cohorts,
  cohortTableName = "cohort_table",
  cdm = cdm,
  # Split settings
  includeTreatments = "endDate"
)
## -- Qualifying records for cohort definitions: 1, 2, 3, 4, 5, 6, 7, 8
##  Records: 14041
##  Subjects: 2693
## -- Removing records < minEraDuration (0)
##  Records: 11386
##  Subjects: 2159
## -- Removing events where index date < target index date + indexDateOffset (0)
##  Records: 8381
##  Subjects: 2159
## -- splitEventCohorts
##  Records: 8366
##  Subjects: 2144
## -- eraCollapse (30)
##  Records: 8366
##  Subjects: 2144
## -- Iteration 1: minPostCombinationDuration (30), combinatinoWindow (30)
##  Records: 558
##  Subjects: 512
## -- Iteration 2: minPostCombinationDuration (30), combinatinoWindow (30)
##  Records: 554
##  Subjects: 512
## -- After Combination
##  Records: 554
##  Subjects: 512
## -- filterTreatments (First)
##  Records: 553
##  Subjects: 512
## -- treatment construction done
##  Records: 553
##  Subjects: 512
identical(
  includeEndDate$treatmentHistory %>% pull(personId),
  defaultSettings$treatmentHistory %>% pull(personId)
)
## [1] TRUE

In our example case for Viral Sinusitis it appears to not matter, as the personID’s are identical.

Exporting result objects

The export function allows us to export the generated result objects from computePathways. There are several arguments that we can change to alter the behavior, depending on what we are allowed to share.

minCellCount and censorType

Let’s say we are only able to share results of groups of subjects that have at least 5 subjects in them.

tempDir <- tempdir()

export(
  andromeda = defaultSettings,
  outputPath = file.path(tempDir, "default"),
  minCellCount = 5
)
## Writing attrition to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/default/attrition.csv
## Writing metadata to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/default/metadata.csv
## Writing treatmentPathways to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/default/treatmentPathways.csv
## Censoring 1219 pathways with a frequency <5 to 5.
## Writing summaryStatsTherapyDuration to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/default/summaryEventDuration.csv
## Writing countsYearPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/default/countsYear.csv
## Writing countsAgePath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/default/countsAge.csv
## Writing countsSexPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/default/countsSex.csv

We can also choose between different methods how to handle pathways that fall below are specified minCellCount. These types are 1) "cellCount", 2) "remove", and 3) "mean".

We could say we want to censor all pathways that fall below the minCellCount to be censored to the minCellCount.

export(
  andromeda = minEra60,
  outputPath = file.path(tempDir, "minEra60_cellCount"),
  minCellCount = 5,
  censorType = "minCellCount"
)
## Writing attrition to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_cellCount/attrition.csv
## Writing metadata to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_cellCount/metadata.csv
## Writing treatmentPathways to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_cellCount/treatmentPathways.csv
## Censoring 983 pathways with a frequency <5 to 5.
## Writing summaryStatsTherapyDuration to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_cellCount/summaryEventDuration.csv
## Writing countsYearPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_cellCount/countsYear.csv
## Writing countsAgePath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_cellCount/countsAge.csv
## Writing countsSexPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_cellCount/countsSex.csv

Or we could completely remove them

export(
  andromeda = minEra60,
  outputPath = file.path(tempDir, "minEra60_remove"),
  minCellCount = 5,
  censorType = "remove"
)
## Writing attrition to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_remove/attrition.csv
## Writing metadata to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_remove/metadata.csv
## Writing treatmentPathways to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_remove/treatmentPathways.csv
## Removing 983 pathways with a frequency <5.
## Writing summaryStatsTherapyDuration to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_remove/summaryEventDuration.csv
## Writing countsYearPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_remove/countsYear.csv
## Writing countsAgePath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_remove/countsAge.csv
## Writing countsSexPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_remove/countsSex.csv

Or finally we can censor them as the mean of all the groups that fall below the minCellCount.

export(
  andromeda = minEra60,
  outputPath = file.path(tempDir, "minEra60_mean"),
  minCellCount = 5,
  censorType = "mean"
)
## Writing attrition to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_mean/attrition.csv
## Writing metadata to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_mean/metadata.csv
## Writing treatmentPathways to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_mean/treatmentPathways.csv
## Censoring 983 pathways with a frequency <5 to mean.
## Writing summaryStatsTherapyDuration to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_mean/summaryEventDuration.csv
## Writing countsYearPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_mean/countsYear.csv
## Writing countsAgePath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_mean/countsAge.csv
## Writing countsSexPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_mean/countsSex.csv

ageWindow

We can also specify an age window.

export(
  andromeda = splitAcuteTherapy,
  outputPath = file.path(tempDir, "splitAcuteTherapy_ageWindow3"),
  minCellCount = 5,
  censorType = "mean",
  ageWindow = 3
)
## Writing attrition to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindow3/attrition.csv
## Writing metadata to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindow3/metadata.csv
## Writing treatmentPathways to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindow3/treatmentPathways.csv
## Censoring 2044 pathways with a frequency <5 to mean.
## Writing summaryStatsTherapyDuration to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindow3/summaryEventDuration.csv
## Writing countsYearPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindow3/countsYear.csv
## Writing countsAgePath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindow3/countsAge.csv
## Writing countsSexPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindow3/countsSex.csv

Or a collection of ages.

export(
  andromeda = splitAcuteTherapy,
  outputPath = file.path(tempDir, "splitAcuteTherapy_ageWindowMultiple"),
  minCellCount = 5,
  censorType = "mean",
  ageWindow = c(0, 18, 25, 30, 40, 50, 60, 150)
)
## Writing attrition to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindowMultiple/attrition.csv
## Writing metadata to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindowMultiple/metadata.csv
## Writing treatmentPathways to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindowMultiple/treatmentPathways.csv
## Censoring 1275 pathways with a frequency <5 to mean.
## Writing summaryStatsTherapyDuration to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindowMultiple/summaryEventDuration.csv
## Writing countsYearPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindowMultiple/countsYear.csv
## Writing countsAgePath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindowMultiple/countsAge.csv
## Writing countsSexPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindowMultiple/countsSex.csv

archiveName

Finally we can also specify an archiveName which is the name of a zip-file to zip all our output csv-files to.

export(
  andromeda = includeEndDate,
  outputPath = file.path(tempDir, "includeEndDate"),
  minCellCount = 5,
  censorType = "mean",
  ageWindow = 3,
  archiveName = "output.zip"
)
## Writing attrition to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/includeEndDate/attrition.csv
## Writing metadata to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/includeEndDate/metadata.csv
## Writing treatmentPathways to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/includeEndDate/treatmentPathways.csv
## Censoring 1812 pathways with a frequency <5 to mean.
## Writing summaryStatsTherapyDuration to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/includeEndDate/summaryEventDuration.csv
## Writing countsYearPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/includeEndDate/countsYear.csv
## Writing countsAgePath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/includeEndDate/countsAge.csv
## Writing countsSexPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/includeEndDate/countsSex.csv
## Zipping files to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/includeEndDate/output.zip

All-in-one

Instead of using computePathways and export, instead we could use executeTreatmentPatterns. Which is an all-in-one function that trades full control for convenience.

executeTreatmentPatterns(
  cohorts = cohorts,
  cohortTableName = "cohort_table",
  outputPath = file.path(tempDir, "all-in-one"),
  cdm = cdm,
  minEraDuration = 0,
  eraCollapseSize = 30,
  combinationWindow = 30,
  minCellCount = 5
)
## -- Qualifying records for cohort definitions: 1, 2, 3, 4, 5, 6, 7, 8
##  Records: 14041
##  Subjects: 2693
## -- Removing records < minEraDuration (0)
##  Records: 11386
##  Subjects: 2159
## -- Removing events where index date < target index date + indexDateOffset (0)
##  Records: 8381
##  Subjects: 2159
## -- splitEventCohorts
##  Records: 8366
##  Subjects: 2144
## -- eraCollapse (30)
##  Records: 8366
##  Subjects: 2144
## -- Iteration 1: minPostCombinationDuration (30), combinatinoWindow (30)
##  Records: 558
##  Subjects: 512
## -- Iteration 2: minPostCombinationDuration (30), combinatinoWindow (30)
##  Records: 554
##  Subjects: 512
## -- After Combination
##  Records: 554
##  Subjects: 512
## -- filterTreatments (First)
##  Records: 553
##  Subjects: 512
## -- treatment construction done
##  Records: 553
##  Subjects: 512
## Writing attrition to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/all-in-one/attrition.csv
## Writing metadata to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/all-in-one/metadata.csv
## Writing treatmentPathways to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/all-in-one/treatmentPathways.csv
## Censoring 1540 pathways with a frequency <5 to mean.
## Writing summaryStatsTherapyDuration to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/all-in-one/summaryEventDuration.csv
## Writing countsYearPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/all-in-one/countsYear.csv
## Writing countsAgePath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/all-in-one/countsAge.csv
## Writing countsSexPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/all-in-one/countsSex.csv
## Zipping files to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/all-in-one/TreatmentPatterns-Output.zip

When using DatabaseConnector we can substitute the cdm object with connectionDetails, cdmSchema, resultSchema, and tempEmulationSchema.

executeTreatmentPatterns(
  cohorts = cohorts,
  cohortTableName = "cohort_table",
  outputPath = file.path(tempDir, "all-in-one"),
  connectionDetails = Eunomia::getEunomiaConnectionDetails(),
  cdmSchema = "main",
  resultSchema = "main",
  tempEmulationSchema = NULL,
  minEraDuration = 0,
  eraCollapseSize = 30,
  combinationWindow = 30,
  minCellCount = 5
)

Evaluating output

Now that we have exported our output, in various ways, we can evaluate the output. As you may have noticed the export function exports 6 csv-files: 1) treatmentPathways.csv, 2) countsAge.csv, 3) countsSex.csv, 4) countsYear.csv, 5) summaryStatsTherapyDuraion.csv, and 6) metadata.csv

treatmentPathways

The treatmentPathways file contains all the pathways found, with a frequency, pairwise stratified by age group, sex and index year.

treatmentPathways <- read.csv(file.path(tempDir, "default", "treatmentPathways.csv"))
head(treatmentPathways)
##   path freq  age    sex indexYear
## 1 None   12 0-10 female      1950
## 2 None   12 0-10 female      1951
## 3 None   17 0-10 female      1952
## 4 None   19 0-10 female      1953
## 5 None   11 0-10 female      1954
## 6 None   18 0-10 female      1955

We are able to filter based on the strata, and filter on a frequency > 5.

all <- treatmentPathways %>%
  filter(
    age == "all",
    sex == "all",
    indexYear == "all",
    freq > 5
  )

We can see the pathways contain the treatment names we provided in our event cohorts. Besides that we also see the paths are annoted with a + or -. The + indicates two treatments are a combination therapy, i.e. Acetaminophen+Amoxicillin is a combination of Acetaminophen and Amoxicillin. The - indicates a switch between treatments, i.e. Aspirin-Acetaminophen is a switch from Aspirin to Acetaminophen. Note that these combinations and switches can occur in the same pathway, i.e. Amoxicillin+Clavulanate-Aspirin. The first treatment is a combination of Amoxicillin and Clavulanate that switches to Aspirin.

countsAge, countsSex, and countsYear

The countsAge, countsSex, and countsYear contain counts per age, sex, and index year.

age <- read.csv(file.path(tempDir, "default", "countsAge.csv"))
sex <- read.csv(file.path(tempDir, "default", "countsSex.csv"))
year <- read.csv(file.path(tempDir, "default", "countsYear.csv"))

head(age)
##   age   n
## 1   1 311
## 2   2 550
## 3   3 340
## 4   4 181
## 5   5 158
## 6   6 107
head(sex)
##      sex    n
## 1 FEMALE 1092
## 2   MALE 1067
head(year)
##   indexYear  n
## 1      1950 43
## 2      1951 40
## 3      1952 54
## 4      1953 47
## 5      1954 47
## 6      1955 49

summaryStatsTherapyDuration

The summaryEventDuration file contains summary statistics from different events, across all found “lines”. A “line” is equal to the level in the Sunburst or Sankey diagrams. The summary statistics allow for plotting of boxplots with the plotEventDuration() function.

eventDuration <- read.csv(file.path(tempDir, "default", "summaryEventDuration.csv"))

plotEventDuration(
  eventDurations = eventDuration,
  minCellCount = 5,
  treatmentGroups = "both",
  eventLines = NULL,
  includeOverall = TRUE
)

metadata

The metadata file is a file that contains information about the circumstances the analysis was performed in, and information about R, and the CDM.

read.csv(file.path(tempDir, "default", "metadata.csv"))
##                       cdmSourceName cdmSourceAbbreviation cdmReleaseDate
## 1 Synthea synthetic health database               Synthea     2019-05-25
##   vocabularyVersion executionStartDate packageVersion
## 1    v5.0 18-JAN-19         2024-11-26         2.6.10
##                            rVersion           platform execution_end_date
## 1 R version 4.3.3 (2024-02-29 ucrt) x86_64-w64-mingw32         2024-11-26

Sunburst Plot & Sankey Diagram

From the filtered treatmentPathways file we are able to create a sunburst plot.

createSunburstPlot(treatmentPathways = all, legend = list(w = 300))
Legend

Or a Sankey Diagram.

Both plots are interactive in an HTML-environment, and are easy to include in shiny applications.