vignettes/articles/ComputingTreatmentPathways.Rmd
ComputingTreatmentPathways.Rmd
In 1. Defining Cohorts we discussed how to define
and generate cohorts for TreatmentPatterns
. In this section
we assume you are able to generate a cohort table using either
CohortGenerator
or CDMConnector
.
Lets generate our Viral Sinusitis dummy cohorts provided in
TreatmentPatterns
using CDMConnector
.
First we need to read in our cohorts.
library(CDMConnector)
cohortSet <- readCohortSet(
path = system.file(package = "TreatmentPatterns", "exampleCohorts")
)
cohortSet
## # A tibble: 8 × 5
## cohort_definition_id cohort_name cohort json cohort_name_snakecase
## <int> <chr> <list> <list> <chr>
## 1 1 acetaminophen <named list> <chr> acetaminophen
## 2 2 amoxicillin <named list> <chr> amoxicillin
## 3 3 aspirin <named list> <chr> aspirin
## 4 4 clavulanate <named list> <chr> clavulanate
## 5 5 death <named list> <chr> death
## 6 6 doxylamine <named list> <chr> doxylamine
## 7 7 penicillinv <named list> <chr> penicillinv
## 8 8 viralsinusitis <named list> <chr> viralsinusitis
Then we can open a connection to our database, in this case Eunomia.
##
## Attaching package: 'DatabaseConnector'
## The following objects are masked from 'package:CDMConnector':
##
## dbms, insertTable
##
## Download completed!
con <- DBI::dbConnect(
drv = duckdb::duckdb(),
dbdir = eunomia_dir()
)
cdm <- cdmFromCon(
con = con,
cdmSchema = "main",
writeSchema = "main"
)
cdm
##
## ── # OMOP CDM reference (duckdb) of Synthea synthetic health database ──────────
## • omop tables: person, observation_period, visit_occurrence, visit_detail,
## condition_occurrence, drug_exposure, procedure_occurrence, device_exposure,
## measurement, observation, death, note, note_nlp, specimen, fact_relationship,
## location, care_site, provider, payer_plan_period, cost, drug_era, dose_era,
## condition_era, metadata, cdm_source, concept, vocabulary, domain,
## concept_class, concept_relationship, relationship, concept_synonym,
## concept_ancestor, source_to_concept_map, drug_strength
## • cohort tables: -
## • achilles tables: -
## • other tables: -
Finally we can generate our cohort set as a cohort table into the database
cdm <- generateCohortSet(
cdm = cdm,
cohortSet = cohortSet,
name = "cohort_table",
overwrite = TRUE
)
## ℹ Generating 8 cohorts
## ℹ Generating cohort (1/8) - acetaminophen✔ Generating cohort (1/8) - acetaminophen [263ms]
## ℹ Generating cohort (2/8) - amoxicillin✔ Generating cohort (2/8) - amoxicillin [177ms]
## ℹ Generating cohort (3/8) - aspirin✔ Generating cohort (3/8) - aspirin [167ms]
## ℹ Generating cohort (4/8) - clavulanate✔ Generating cohort (4/8) - clavulanate [158ms]
## ℹ Generating cohort (5/8) - death✔ Generating cohort (5/8) - death [112ms]
## ℹ Generating cohort (6/8) - doxylamine✔ Generating cohort (6/8) - doxylamine [165ms]
## ℹ Generating cohort (7/8) - penicillinv✔ Generating cohort (7/8) - penicillinv [161ms]
## ℹ Generating cohort (8/8) - viralsinusitis✔ Generating cohort (8/8) - viralsinusitis [232ms]
## Warning: ! 5 casted column in cohort_table (cohort_attrition) as do not match expected
## column type:
## • `number_records` from numeric to integer
## • `number_subjects` from numeric to integer
## • `reason_id` from numeric to integer
## • `excluded_records` from numeric to integer
## • `excluded_subjects` from numeric to integer
## Warning: ! 1 column in cohort_table do not match expected column type:
## • `subject_id` is numeric but expected integer
cohortCount(cdm$cohort_table)
## # A tibble: 8 × 3
## cohort_definition_id number_records number_subjects
## <int> <int> <int>
## 1 1 2679 2679
## 2 2 2130 2130
## 3 3 1927 1927
## 4 4 2021 2021
## 5 5 0 0
## 6 6 1393 1393
## 7 7 1732 1732
## 8 8 2159 2159
We can see that all our cohorts are generated in the cohort table. The cohort with cohort_definition_id 5 has a count of 0, this is the Death cohort. This is not detrimental, as exit cohorts are optional, but good to know that Death will not show up in our results.
The computePathways
function of
TreatmentPatterns
allows us to compute treatment pathways
in our cohort table. In order to do this we need to pre-specify some
parameters.
According to the documentation we need a data.frame
that
specifies what cohort is of which type.
Data frame containing the following columns and data types:
cohortId numeric(1) Cohort ID’s of the cohorts to be used in the cohort table.
cohortName character(1) Cohort names of the cohorts to be used in the cohort table.
type character(1) [“target”, “event’,”exit”] Cohort type, describing if the cohort is a target, event, or exit cohort
We are able to re-use our cohortSet
for this. As it
already contains the cohort ID’s and cohort names. We only have to
remove the cohort
and json
columns, add a
type
column, and rename cohort_definition_id
to cohortId
and cohort_name
to
cohortName
.
library(dplyr)
cohorts <- cohortSet %>%
# Remove 'cohort' and 'json' columns
select(-"cohort", -"json", -"cohort_name_snakecase") %>%
mutate(type = c("event", "event", "event", "event", "exit", "event", "event", "target")) %>%
rename(
cohortId = "cohort_definition_id",
cohortName = "cohort_name",
)
cohorts
## # A tibble: 8 × 3
## cohortId cohortName type
## <int> <chr> <chr>
## 1 1 acetaminophen event
## 2 2 amoxicillin event
## 3 3 aspirin event
## 4 4 clavulanate event
## 5 5 death exit
## 6 6 doxylamine event
## 7 7 penicillinv event
## 8 8 viralsinusitis target
With our data.frame
of cohort types, CDM reference, and
the cohort table name in our database we can compute the treatment
pathways, with all of the other settings as their defaults.
library(TreatmentPatterns)
defaultSettings <- computePathways(
cohorts = cohorts,
cohortTableName = "cohort_table",
cdm = cdm
)
## -- Qualifying records for cohort definitions: 1, 2, 3, 4, 5, 6, 7, 8
## Records: 14041
## Subjects: 2693
## -- Removing records < minEraDuration (0)
## Records: 11386
## Subjects: 2159
## -- Removing events where index date < target index date + indexDateOffset (0)
## Records: 8381
## Subjects: 2159
## -- splitEventCohorts
## Records: 8366
## Subjects: 2144
## -- eraCollapse (30)
## Records: 8366
## Subjects: 2144
## -- Iteration 1: minPostCombinationDuration (30), combinatinoWindow (30)
## Records: 558
## Subjects: 512
## -- Iteration 2: minPostCombinationDuration (30), combinatinoWindow (30)
## Records: 554
## Subjects: 512
## -- After Combination
## Records: 554
## Subjects: 512
## -- filterTreatments (First)
## Records: 553
## Subjects: 512
## -- treatment construction done
## Records: 553
## Subjects: 512
defaultSettings
## # Andromeda object
## # Physical location: C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0\file81f87d1d6804.sqlite
##
## Tables:
## $addRowsFRFS_1 (personId, indexYear, eventCohortId, eventStartDate, eventEndDate, type, age, sex, durationEra, sortOrder, gapPrevious, selectedRows, switch, combinationFRFS, combinationLRFS, eventStartDateNext, eventEndDatePrevious, eventEndDateNext, eventCohortIdPrevious)
## $addRowsFRFS_2 (personId, indexYear, eventCohortId, eventStartDate, age, sex, eventEndDate, durationEra, gapPrevious, sortOrder, selectedRows, switch, combinationFRFS, combinationLRFS, eventStartDateNext, eventEndDatePrevious, eventEndDateNext, eventCohortIdPrevious)
## $addRowsLRFS_1 (personId, indexYear, eventCohortId, eventStartDate, eventEndDate, type, age, sex, durationEra, sortOrder, gapPrevious, selectedRows, switch, combinationFRFS, combinationLRFS, eventStartDateNext, eventEndDatePrevious, eventEndDateNext, eventCohortIdPrevious, checkDuration)
## $addRowsLRFS_2 (personId, indexYear, eventCohortId, eventStartDate, age, sex, eventEndDate, durationEra, gapPrevious, sortOrder, selectedRows, switch, combinationFRFS, combinationLRFS, eventStartDateNext, eventEndDatePrevious, eventEndDateNext, eventCohortIdPrevious, checkDuration)
## $attrition (number_records, number_subject, reason_id, reason, time)
## $cohortTable (cohortId.x, personId, subject_id_origin, startDate.x, endDate.x, age.x, sex.x, type.x, cohortId.y, startDate.y, endDate.y, age.y, sex.y, type.y, indexYear, indexDate)
## $cohorts (cohortId, cohortName, type)
## $currentCohorts (cohortId, personId, subject_id_origin, startDate, endDate, age, sex)
## $dbplyr_6Epxqj9f86 (number_records, number_subject, reason_id, reason, time)
## $dbplyr_A6SYQa2Grm (number_records, number_subject, reason_id, reason, time)
## $dbplyr_Y4ySoaoQh2 (number_records, number_subject, reason_id, reason, time)
## $dbplyr_Z2M8Xrcunc (number_records, number_subject, reason_id, reason, time)
## $dbplyr_cSkDQutjYF (number_records, number_subject, reason_id, reason, time)
## $dbplyr_egRFrGyyPF (number_records, number_subject, reason_id, reason, time)
## $dbplyr_nGJacsYo9B (number_records, number_subject, reason_id, reason, time)
## $dbplyr_oLN1X87rlp (number_records, number_subject, reason_id, reason, time)
## $dbplyr_ol3lkrBoLU (number_records, number_subject, reason_id, reason, time)
## $dbplyr_sI90271IHm (number_records, number_subject, reason_id, reason, time)
## $eventCohorts (cohortId, personId, subject_id_origin, startDate, endDate, age, sex, type)
## $exitCohorts (cohortId, personId, subject_id_origin, startDate, endDate, age, sex, type)
## $exitHistory (personId, indexYear, eventCohortId, eventStartDate, eventEndDate, age, sex, durationEra)
## $labels (eventCohortId, eventCohortName)
## $metadata (cdmSourceName, cdmSourceAbbreviation, cdmReleaseDate, vocabularyVersion, executionStartDate, packageVersion, rVersion, platform, execution_end_date)
## $sqlite_stat1 (tbl, idx, stat)
## $sqlite_stat4 (tbl, idx, neq, nlt, ndlt, sample)
## $targetCohorts (cohortId, personId, subject_id_origin, startDate, endDate, age, sex, type, indexYear, indexDate)
## $treatmentHistory (eventCohortId, personId, indexYear, eventStartDate, age, sex, eventEndDate, durationEra, sortOrder, eventSeq, eventCohortName)
The output of computePathways
is an Andromeda environment,
which allows us to investigate intermediate results and patient-level
data. This data is not sharable.
# treatmentHistory table
head(defaultSettings$treatmentHistory)
## # Source: SQL [6 x 11]
## # Database: sqlite 3.46.0 [C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0\file81f87d1d6804.sqlite]
## eventCohortId personId indexYear eventStartDate age sex eventEndDate
## <chr> <dbl> <dbl> <int> <dbl> <chr> <int>
## 1 1 275 1982 4425 4 MALE 4455
## 2 1 570 1968 -562 1 MALE -532
## 3 1 195 1984 5156 4 FEMALE 5216
## 4 1 868 1964 -1956 6 MALE -1866
## 5 1 164 1954 -5677 2 MALE -5617
## 6 1 132 1966 -1105 6 FEMALE -1015
## # ℹ 4 more variables: durationEra <int>, sortOrder <dbl>, eventSeq <int>,
## # eventCohortName <chr>
# metadata table
defaultSettings$metadata
## # Source: table<`metadata`> [1 x 9]
## # Database: sqlite 3.46.0 [C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0\file81f87d1d6804.sqlite]
## cdmSourceName cdmSourceAbbreviation cdmReleaseDate vocabularyVersion
## <chr> <chr> <date> <chr>
## 1 Synthea synthetic heal… Synthea 2019-05-25 v5.0 18-JAN-19
## # ℹ 5 more variables: executionStartDate <chr>, packageVersion <chr>,
## # rVersion <chr>, platform <chr>, execution_end_date <chr>
# First Recieved First Stopped
head(defaultSettings$addRowsFRFS_1)
## # Source: SQL [0 x 19]
## # Database: sqlite 3.46.0 [C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0\file81f87d1d6804.sqlite]
## # ℹ 19 variables: personId <lgl>, indexYear <dbl>, eventCohortId <lgl>,
## # eventStartDate <dbl>, eventEndDate <lgl>, type <lgl>, age <dbl>, sex <chr>,
## # durationEra <lgl>, sortOrder <lgl>, gapPrevious <lgl>, selectedRows <lgl>,
## # switch <lgl>, combinationFRFS <lgl>, combinationLRFS <lgl>,
## # eventStartDateNext <lgl>, eventEndDatePrevious <lgl>,
## # eventEndDateNext <lgl>, eventCohortIdPrevious <lgl>
# Last Recieved Last Stopped
head(defaultSettings$addRowsLRFS_1)
## # Source: SQL [6 x 20]
## # Database: sqlite 3.46.0 [C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0\file81f87d1d6804.sqlite]
## personId indexYear eventCohortId eventStartDate eventEndDate type age sex
## <dbl> <dbl> <chr> <int> <int> <chr> <dbl> <chr>
## 1 3 -3284 2 12854 12854 event 46 FEMA…
## 2 4 690 2 7081 7081 event 19 FEMA…
## 3 5 -3648 2 27 27 event 12 MALE
## 4 8 -1071 2 12044 12044 event 40 FEMA…
## 5 9 803 2 14022 14022 event 38 FEMA…
## 6 10 163 2 10465 10465 event 31 MALE
## # ℹ 12 more variables: durationEra <int>, sortOrder <dbl>, gapPrevious <int>,
## # selectedRows <dbl>, switch <dbl>, combinationFRFS <dbl>,
## # combinationLRFS <dbl>, eventStartDateNext <int>,
## # eventEndDatePrevious <int>, eventEndDateNext <int>,
## # eventCohortIdPrevious <chr>, checkDuration <dbl>
DatabaseConnector
is also supported. The following
parameters are required instead of
cdm
:
connectionDetails
: ConnectionDetails object form DatabaseConnector.cdmSchema
: Schema where the CDM exists.resultSchema
: Schema to write the cohort table to.tempEmulationSchema
: Some database platforms like
Oracle and Impala do not truly support temp tables. To emulate temp
tables, provide a schema with write privileges where temp tables can be
created.The following code snippet works with Eunomia
, a cohort
table (cohort_table) exists in the database, and a cohorts
data frame has been created.
computePathways(
cohorts = cohorts,
cohortTableName = cohortTableName,
connectionDetails = Eunomia::getEunomiaConnectionDetails(),
cdmSchema = "main",
resultSchema = "main",
tempEmulationSchema = NULL
)
Even though the default settings work well for most use cases, it might not work for all situations. The settings below allow us to influence how the events of interest should be processed to form treatment pathways.
Parameter | Value | Description |
---|---|---|
indexDateOffset | 0 | Offset the index date of the Target cohort. |
minEraDuration | 0 | Minimum time an event era should last to be included in analysis |
eraCollapseSize | 30 | Window of time between which two eras of the same event cohort are collapsed into one era |
combinationWindow | 30 | Window of time two event cohorts need to overlap to be considered a combination treatment |
minPostCombinationDuration | 30 | Minimum time an event era before or after a generated combination treatment should last to be included in analysis |
filterTreatments | First | Select first occurrence of (‘First’); changes between (‘Changes’); or all event cohorts (‘All’). |
maxPathLength | 5 | Maximum number of steps included in treatment pathway |
The following figure shows how each of these parameters affect the computation of the treatment pathway.
You can add these settings to the
computePathways
function call. Lets see what happens when
we set our minEraDuration
to 60, but keep
the rest of the settings mentioned as their default values.
minEra60 <- computePathways(
cohorts = cohorts,
cohortTableName = "cohort_table",
cdm = cdm,
# Pathway settings
indexDateOffset = 0,
minEraDuration = 60,
eraCollapseSize = 30,
combinationWindow = 30,
minPostCombinationDuration = 30,
filterTreatments = "First",
maxPathLength = 5
)
## Warning in validateComputePathways(): The `minPostCombinationDuration` is set
## lower than the `minEraDuration`, this might result in unexpected behavior
## Warning in validateComputePathways(): The `combinationWindow` is set lower than
## the `minEraDuration`, this might result in unexpected behavior
## -- Qualifying records for cohort definitions: 1, 2, 3, 4, 5, 6, 7, 8
## Records: 14041
## Subjects: 2693
## -- Removing records < minEraDuration (60)
## Records: 2523
## Subjects: 2159
## -- Removing events where index date < target index date + indexDateOffset (0)
## Records: 2216
## Subjects: 2159
## -- splitEventCohorts
## Records: 336
## Subjects: 279
## -- eraCollapse (30)
## Records: 336
## Subjects: 279
## -- Iteration 1: minPostCombinationDuration (30), combinatinoWindow (30)
## Records: 291
## Subjects: 279
## -- After Combination
## Records: 291
## Subjects: 279
## -- filterTreatments (First)
## Records: 291
## Subjects: 279
## -- treatment construction done
## Records: 291
## Subjects: 279
Number of treatments with a minimum duration of greater or equal to 0 days.
## [1] 553
Number of treatments with a minimum duration of greater or equal to 60 days.
## [1] 291
We can also split our defined event cohorts into acute and therapy cohorts.
Parameter | Description |
---|---|
splitEventCohorts | Specify event cohort ID’s (i.e. c(1, 2, 3) to split in
acute (< splitTime days) and therapy (>= splitTime days). As an
example treatment Drug A could be split into
Drug A (therapy) and Drug A (acute).
And we could set our splitTime to 30. Drug A
(acute) would be the time before day 0-29 and Drug A
(therapy) would be the day 30 or later. |
splitTime | Specify number of days at which each of the split event cohorts
should be split in acute and therapy (i.e. c(20, 30, 10) ).
The length of splitTime must equal the length of
splitEventCohorts
|
Let’s say we want to assume that the first 60 days of our treatment is acute, and beyond that therapy.
splitAcuteTherapy <- computePathways(
cohorts = cohorts,
cohortTableName = "cohort_table",
cdm = cdm,
# Split settings
splitEventCohorts = 1,
splitTime = 60
)
## -- Qualifying records for cohort definitions: 1, 2, 3, 4, 5, 6, 7, 8
## Records: 14041
## Subjects: 2693
## -- Removing records < minEraDuration (0)
## Records: 11386
## Subjects: 2159
## -- Removing events where index date < target index date + indexDateOffset (0)
## Records: 8381
## Subjects: 2159
## -- splitEventCohorts
## Records: 8366
## Subjects: 2144
## -- eraCollapse (30)
## Records: 8366
## Subjects: 2144
## -- Iteration 1: minPostCombinationDuration (30), combinatinoWindow (30)
## Records: 558
## Subjects: 512
## -- Iteration 2: minPostCombinationDuration (30), combinatinoWindow (30)
## Records: 554
## Subjects: 512
## -- After Combination
## Records: 554
## Subjects: 512
## -- filterTreatments (First)
## Records: 553
## Subjects: 512
## -- treatment construction done
## Records: 553
## Subjects: 512
## [1] "acetaminophen (acute)" "acetaminophen (acute)+amoxicillin"
## [3] "acetaminophens (therapy)" "amoxicillin"
## [5] "amoxicillin+clavulanate" "aspirin"
## [7] "clavulanate"
We can see that our Acetaminophen cohorts are split into
Acetaminophen (acute) and (therapy).
Acute labels all the Acetaminophen cohorts lasting less than
our defined splitTime
, in this case 60 days.
Therapy labels all the Acetaminophen cohorts lasting 60 days or
more.
We can dictate in what time frame we want to look. We can look from
the start date of our target cohort and on wards, or we can look before
the end date of our target cohort. By default
TreatmentPatterns
looks from the start date and
onwards.
includeEndDate <- computePathways(
cohorts = cohorts,
cohortTableName = "cohort_table",
cdm = cdm,
# Split settings
includeTreatments = "endDate"
)
## -- Qualifying records for cohort definitions: 1, 2, 3, 4, 5, 6, 7, 8
## Records: 14041
## Subjects: 2693
## -- Removing records < minEraDuration (0)
## Records: 11386
## Subjects: 2159
## -- Removing events where index date < target index date + indexDateOffset (0)
## Records: 8381
## Subjects: 2159
## -- splitEventCohorts
## Records: 8366
## Subjects: 2144
## -- eraCollapse (30)
## Records: 8366
## Subjects: 2144
## -- Iteration 1: minPostCombinationDuration (30), combinatinoWindow (30)
## Records: 558
## Subjects: 512
## -- Iteration 2: minPostCombinationDuration (30), combinatinoWindow (30)
## Records: 554
## Subjects: 512
## -- After Combination
## Records: 554
## Subjects: 512
## -- filterTreatments (First)
## Records: 553
## Subjects: 512
## -- treatment construction done
## Records: 553
## Subjects: 512
identical(
includeEndDate$treatmentHistory %>% pull(personId),
defaultSettings$treatmentHistory %>% pull(personId)
)
## [1] TRUE
In our example case for Viral Sinusitis it appears to not matter, as the personID’s are identical.
The export
function allows us to export the generated
result objects from computePathways
. There are several
arguments that we can change to alter the behavior, depending on what we
are allowed to share.
Let’s say we are only able to share results of groups of subjects that have at least 5 subjects in them.
tempDir <- tempdir()
export(
andromeda = defaultSettings,
outputPath = file.path(tempDir, "default"),
minCellCount = 5
)
## Writing attrition to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/default/attrition.csv
## Writing metadata to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/default/metadata.csv
## Writing treatmentPathways to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/default/treatmentPathways.csv
## Censoring 1219 pathways with a frequency <5 to 5.
## Writing summaryStatsTherapyDuration to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/default/summaryEventDuration.csv
## Writing countsYearPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/default/countsYear.csv
## Writing countsAgePath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/default/countsAge.csv
## Writing countsSexPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/default/countsSex.csv
We can also choose between different methods how to handle pathways
that fall below are specified minCellCount
. These types are
1) "cellCount"
, 2)
"remove"
, and 3) "mean"
.
We could say we want to censor all pathways that fall below the
minCellCount
to be censored to the
minCellCount
.
export(
andromeda = minEra60,
outputPath = file.path(tempDir, "minEra60_cellCount"),
minCellCount = 5,
censorType = "minCellCount"
)
## Writing attrition to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_cellCount/attrition.csv
## Writing metadata to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_cellCount/metadata.csv
## Writing treatmentPathways to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_cellCount/treatmentPathways.csv
## Censoring 983 pathways with a frequency <5 to 5.
## Writing summaryStatsTherapyDuration to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_cellCount/summaryEventDuration.csv
## Writing countsYearPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_cellCount/countsYear.csv
## Writing countsAgePath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_cellCount/countsAge.csv
## Writing countsSexPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_cellCount/countsSex.csv
Or we could completely remove them
export(
andromeda = minEra60,
outputPath = file.path(tempDir, "minEra60_remove"),
minCellCount = 5,
censorType = "remove"
)
## Writing attrition to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_remove/attrition.csv
## Writing metadata to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_remove/metadata.csv
## Writing treatmentPathways to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_remove/treatmentPathways.csv
## Removing 983 pathways with a frequency <5.
## Writing summaryStatsTherapyDuration to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_remove/summaryEventDuration.csv
## Writing countsYearPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_remove/countsYear.csv
## Writing countsAgePath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_remove/countsAge.csv
## Writing countsSexPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_remove/countsSex.csv
Or finally we can censor them as the mean of all the groups that fall
below the minCellCount
.
export(
andromeda = minEra60,
outputPath = file.path(tempDir, "minEra60_mean"),
minCellCount = 5,
censorType = "mean"
)
## Writing attrition to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_mean/attrition.csv
## Writing metadata to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_mean/metadata.csv
## Writing treatmentPathways to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_mean/treatmentPathways.csv
## Censoring 983 pathways with a frequency <5 to mean.
## Writing summaryStatsTherapyDuration to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_mean/summaryEventDuration.csv
## Writing countsYearPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_mean/countsYear.csv
## Writing countsAgePath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_mean/countsAge.csv
## Writing countsSexPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/minEra60_mean/countsSex.csv
We can also specify an age window.
export(
andromeda = splitAcuteTherapy,
outputPath = file.path(tempDir, "splitAcuteTherapy_ageWindow3"),
minCellCount = 5,
censorType = "mean",
ageWindow = 3
)
## Writing attrition to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindow3/attrition.csv
## Writing metadata to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindow3/metadata.csv
## Writing treatmentPathways to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindow3/treatmentPathways.csv
## Censoring 2044 pathways with a frequency <5 to mean.
## Writing summaryStatsTherapyDuration to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindow3/summaryEventDuration.csv
## Writing countsYearPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindow3/countsYear.csv
## Writing countsAgePath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindow3/countsAge.csv
## Writing countsSexPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindow3/countsSex.csv
Or a collection of ages.
export(
andromeda = splitAcuteTherapy,
outputPath = file.path(tempDir, "splitAcuteTherapy_ageWindowMultiple"),
minCellCount = 5,
censorType = "mean",
ageWindow = c(0, 18, 25, 30, 40, 50, 60, 150)
)
## Writing attrition to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindowMultiple/attrition.csv
## Writing metadata to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindowMultiple/metadata.csv
## Writing treatmentPathways to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindowMultiple/treatmentPathways.csv
## Censoring 1275 pathways with a frequency <5 to mean.
## Writing summaryStatsTherapyDuration to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindowMultiple/summaryEventDuration.csv
## Writing countsYearPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindowMultiple/countsYear.csv
## Writing countsAgePath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindowMultiple/countsAge.csv
## Writing countsSexPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/splitAcuteTherapy_ageWindowMultiple/countsSex.csv
Finally we can also specify an archiveName
which is the
name of a zip-file to zip all our output csv-files to.
export(
andromeda = includeEndDate,
outputPath = file.path(tempDir, "includeEndDate"),
minCellCount = 5,
censorType = "mean",
ageWindow = 3,
archiveName = "output.zip"
)
## Writing attrition to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/includeEndDate/attrition.csv
## Writing metadata to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/includeEndDate/metadata.csv
## Writing treatmentPathways to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/includeEndDate/treatmentPathways.csv
## Censoring 1812 pathways with a frequency <5 to mean.
## Writing summaryStatsTherapyDuration to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/includeEndDate/summaryEventDuration.csv
## Writing countsYearPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/includeEndDate/countsYear.csv
## Writing countsAgePath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/includeEndDate/countsAge.csv
## Writing countsSexPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/includeEndDate/countsSex.csv
## Zipping files to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/includeEndDate/output.zip
Instead of using computePathways
and
export
, instead we could use
executeTreatmentPatterns
. Which is an all-in-one function
that trades full control for convenience.
executeTreatmentPatterns(
cohorts = cohorts,
cohortTableName = "cohort_table",
outputPath = file.path(tempDir, "all-in-one"),
cdm = cdm,
minEraDuration = 0,
eraCollapseSize = 30,
combinationWindow = 30,
minCellCount = 5
)
## -- Qualifying records for cohort definitions: 1, 2, 3, 4, 5, 6, 7, 8
## Records: 14041
## Subjects: 2693
## -- Removing records < minEraDuration (0)
## Records: 11386
## Subjects: 2159
## -- Removing events where index date < target index date + indexDateOffset (0)
## Records: 8381
## Subjects: 2159
## -- splitEventCohorts
## Records: 8366
## Subjects: 2144
## -- eraCollapse (30)
## Records: 8366
## Subjects: 2144
## -- Iteration 1: minPostCombinationDuration (30), combinatinoWindow (30)
## Records: 558
## Subjects: 512
## -- Iteration 2: minPostCombinationDuration (30), combinatinoWindow (30)
## Records: 554
## Subjects: 512
## -- After Combination
## Records: 554
## Subjects: 512
## -- filterTreatments (First)
## Records: 553
## Subjects: 512
## -- treatment construction done
## Records: 553
## Subjects: 512
## Writing attrition to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/all-in-one/attrition.csv
## Writing metadata to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/all-in-one/metadata.csv
## Writing treatmentPathways to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/all-in-one/treatmentPathways.csv
## Censoring 1540 pathways with a frequency <5 to mean.
## Writing summaryStatsTherapyDuration to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/all-in-one/summaryEventDuration.csv
## Writing countsYearPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/all-in-one/countsYear.csv
## Writing countsAgePath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/all-in-one/countsAge.csv
## Writing countsSexPath to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/all-in-one/countsSex.csv
## Zipping files to C:\Users\MVANKE~1\AppData\Local\Temp\RtmpQ9uZv0/all-in-one/TreatmentPatterns-Output.zip
When using DatabaseConnector
we can substitute the
cdm
object with connectionDetails
,
cdmSchema
, resultSchema
, and
tempEmulationSchema
.
executeTreatmentPatterns(
cohorts = cohorts,
cohortTableName = "cohort_table",
outputPath = file.path(tempDir, "all-in-one"),
connectionDetails = Eunomia::getEunomiaConnectionDetails(),
cdmSchema = "main",
resultSchema = "main",
tempEmulationSchema = NULL,
minEraDuration = 0,
eraCollapseSize = 30,
combinationWindow = 30,
minCellCount = 5
)
Now that we have exported our output, in various ways, we can
evaluate the output. As you may have noticed the export
function exports 6 csv-files: 1) treatmentPathways.csv,
2) countsAge.csv, 3) countsSex.csv,
4) countsYear.csv, 5)
summaryStatsTherapyDuraion.csv, and 6) metadata.csv
The treatmentPathways file contains all the pathways found, with a frequency, pairwise stratified by age group, sex and index year.
treatmentPathways <- read.csv(file.path(tempDir, "default", "treatmentPathways.csv"))
head(treatmentPathways)
## path freq age sex indexYear
## 1 None 12 0-10 female 1950
## 2 None 12 0-10 female 1951
## 3 None 17 0-10 female 1952
## 4 None 19 0-10 female 1953
## 5 None 11 0-10 female 1954
## 6 None 18 0-10 female 1955
We are able to filter based on the strata, and filter on a frequency > 5.
We can see the pathways contain the treatment names we provided in
our event cohorts. Besides that we also see the paths are annoted with a
+
or -
. The +
indicates two
treatments are a combination therapy,
i.e. Acetaminophen+Amoxicillin
is a combination of
Acetaminophen and Amoxicillin. The -
indicates a switch between treatments,
i.e. Aspirin-Acetaminophen
is a switch from
Aspirin to Acetaminophen. Note that these combinations
and switches can occur in the same pathway,
i.e. Amoxicillin+Clavulanate-Aspirin
. The first treatment
is a combination of Amoxicillin and Clavulanate that
switches to Aspirin.
The countsAge, countsSex, and countsYear contain counts per age, sex, and index year.
age <- read.csv(file.path(tempDir, "default", "countsAge.csv"))
sex <- read.csv(file.path(tempDir, "default", "countsSex.csv"))
year <- read.csv(file.path(tempDir, "default", "countsYear.csv"))
head(age)
## age n
## 1 1 311
## 2 2 550
## 3 3 340
## 4 4 181
## 5 5 158
## 6 6 107
head(sex)
## sex n
## 1 FEMALE 1092
## 2 MALE 1067
head(year)
## indexYear n
## 1 1950 43
## 2 1951 40
## 3 1952 54
## 4 1953 47
## 5 1954 47
## 6 1955 49
The summaryEventDuration file contains summary statistics from
different events, across all found “lines”. A “line” is equal to the
level in the Sunburst or Sankey diagrams. The summary statistics allow
for plotting of boxplots with the plotEventDuration()
function.
eventDuration <- read.csv(file.path(tempDir, "default", "summaryEventDuration.csv"))
plotEventDuration(
eventDurations = eventDuration,
minCellCount = 5,
treatmentGroups = "both",
eventLines = NULL,
includeOverall = TRUE
)
The metadata file is a file that contains information about the circumstances the analysis was performed in, and information about R, and the CDM.
## cdmSourceName cdmSourceAbbreviation cdmReleaseDate
## 1 Synthea synthetic health database Synthea 2019-05-25
## vocabularyVersion executionStartDate packageVersion
## 1 v5.0 18-JAN-19 2024-11-26 2.6.10
## rVersion platform execution_end_date
## 1 R version 4.3.3 (2024-02-29 ucrt) x86_64-w64-mingw32 2024-11-26
From the filtered treatmentPathways file we are able to create a sunburst plot.
createSunburstPlot(treatmentPathways = all, legend = list(w = 300))
Or a Sankey Diagram.
createSankeyDiagram(all)
Both plots are interactive in an HTML-environment, and are easy to include in shiny applications.