episodes_wf_splits() is a wrapper function of episodes().
It is designed to shorten processing time on larger datasets, particularly those with many duplicate records.
Duplicate records that do not affect the case definition are excluded before episode tracking.
The resulting episode identifiers are then recycled for the duplicate records.
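The exclude-then-recycle idea can be illustrated in base R. This is a minimal, hypothetical sketch: toy_episodes() and toy_episodes_wf_splits() are not part of diyar, and toy_episodes() uses a simplified fixed-episode rule (a record joins the current episode if its date is within case_length days of that episode's first date), not the full logic of episodes(). It only shows that tracking episodes on the unique dates and mapping the identifiers back with match() gives the same result as tracking every record.

```r
# Hypothetical, simplified stand-in for episode tracking (not diyar's algorithm):
# a record joins the current episode if its date is within `case_length` days
# of that episode's first (index) date; otherwise it opens a new episode.
toy_episodes <- function(dates, case_length) {
  epid <- integer(length(dates))
  episode <- 0L
  index_date <- NULL
  for (i in order(dates)) {            # walk records in chronological order
    if (is.null(index_date) ||
        as.numeric(dates[i] - index_date) > case_length) {
      episode <- episode + 1L          # open a new episode
      index_date <- dates[i]
    }
    epid[i] <- episode
  }
  epid
}

# Hypothetical wrapper: track episodes on the unique dates only, then recycle
# the resulting identifiers to every (duplicate) record with match().
toy_episodes_wf_splits <- function(dates, case_length) {
  unique_dates <- unique(dates)
  unique_epid <- toy_episodes(unique_dates, case_length)
  unique_epid[match(dates, unique_dates)]
}

dates <- seq(from = as.Date("2019-04-01"), to = as.Date("2019-04-20"), by = 1)
dates <- rep(dates, 2000)

ep_full  <- toy_episodes(dates, 1)
ep_split <- toy_episodes_wf_splits(dates, 1)
identical(ep_full, ep_split)
#> [1] TRUE
```

Here the 40,000 records collapse to 20 unique dates, so the tracking loop in toy_episodes_wf_splits() visits 20 records instead of 40,000.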
episodes_wf_splits(..., duplicates_recovered = "ANY", reframe = FALSE)
...: Arguments passed to episodes().
duplicates_recovered: [character]. Determines which duplicate records are recycled. Options are "ANY" (default), "without_sub_criteria", "with_sub_criteria" or "ALL". See Details.
reframe: [logical]. Determines whether the duplicate records in a sub_criteria are reframed (TRUE) or excluded (FALSE).
Value: epid; list.
episodes_wf_splits() reduces or reframes a dataset to the minimum set of records required to implement a case definition.
This produces the same outcome as episodes() but with a shorter processing time.
The duplicates_recovered argument determines which identifiers are recycled.
Selecting "with_sub_criteria" recycles only identifiers created from a matched sub_criteria ("Case_CR" and "Recurrent_CR").
Selecting "without_sub_criteria" recycles only identifiers that do not result from a matched sub_criteria ("Case" and "Recurrent").
Excluded "Duplicate_C" and "Duplicate_R" records are always recycled.
The reframe argument determines whether a sub_criteria is reframed (TRUE) or subset (FALSE).
Each option requires slightly different functions for match_funcs or equal_funcs.
library(diyar)

# With 40,000 records (2,000 copies of each of 20 dates),
# `episodes_wf_splits()` takes less time than `episodes()`
dates <- seq(from = as.Date("2019-04-01"), to = as.Date("2019-04-20"), by = 1)
dates <- rep(dates, 2000)
system.time(
ep1 <- episodes(dates, 1)
)
#> user system elapsed
#> 0.35 0.05 0.39
system.time(
ep2 <- episodes_wf_splits(dates, 1)
)
#> user system elapsed
#> 0.03 0.00 0.03
# Both lead to the same outcome.
all(ep1 == ep2)
#> [1] TRUE