Dated events (records) within a certain duration of an index event are assigned to a unique group.
Each group has a unique ID and is described as an "episode".
Episodes can be "fixed" or "rolling" ("recurring").
Each episode has a "Case" and/or "Recurrent" record, while all other records within the group are "Duplicates" of either the "Case" or the "Recurrent" event.
episodes(
  date,
  case_length = Inf,
  episode_type = "fixed",
  recurrence_length = case_length,
  episode_unit = "days",
  strata = NULL,
  sn = NULL,
  episodes_max = Inf,
  rolls_max = Inf,
  case_overlap_methods = 8,
  recurrence_overlap_methods = case_overlap_methods,
  skip_if_b4_lengths = FALSE,
  data_source = NULL,
  data_links = "ANY",
  custom_sort = NULL,
  skip_order = Inf,
  reference_event = "last_record",
  case_for_recurrence = FALSE,
  from_last = FALSE,
  group_stats = c("case_nm", "wind", "epid_interval"),
  display = "none",
  case_sub_criteria = NULL,
  recurrence_sub_criteria = case_sub_criteria,
  case_length_total = 1,
  recurrence_length_total = case_length_total,
  skip_unique_strata = TRUE,
  splits_by_strata = 1,
  batched = "semi"
)

links_wf_episodes(
  date,
  case_length = Inf,
  episode_type = "fixed",
  strata = NULL,
  sn = NULL,
  display = "none"
)

episodes_af_shift(
  date,
  case_length = Inf,
  sn = NULL,
  strata = NULL,
  group_stats = FALSE,
  episode_type = "fixed",
  data_source = NULL,
  episode_unit = "days",
  data_links = "ANY",
  display = "none"
)
date - [date|datetime|integer|number_line]. Record date or period.
case_length - [integer|number_line]. Duration from an index event distinguishing one "Case" from another.
episode_type - [character]. Options are "fixed" (default) or "rolling". See Details.
recurrence_length - [integer|number_line]. Duration from an index event distinguishing a "Recurrent" event from its "Case" or prior "Recurrent" event.
episode_unit - [character]. Unit of time for case_length and recurrence_length. Options are "seconds", "minutes", "hours", "days" (default), "weeks", "months" or "years". See diyar::episode_unit.
strata - [atomic]. Subsets of the dataset. Episodes are created separately within each strata.
sn - [integer]. Unique record ID.
episodes_max - [integer]. Maximum number of episodes permitted within each strata.
rolls_max - [integer]. Maximum number of times an index event can recur. Only used if episode_type is "rolling".
case_overlap_methods - [character|integer]. Specific ways a period (record) must overlap with a "Case" event. See overlaps.
recurrence_overlap_methods - [character|integer]. Specific ways a period (record) must overlap with a "Recurrent" event. See overlaps.
skip_if_b4_lengths - [logical]. If TRUE, events before a lagged case_length or recurrence_length are skipped. The default is FALSE.
data_source - [character]. Source ID for each record. If provided, a list of all sources in each episode is returned. See epid_dataset slot.
data_links - [list|character]. data_source values required in each epid. An episode without records from these data_sources will be unlinked. See Details.
custom_sort - [atomic]. Preferential order for selecting index events. See custom_sort.
skip_order - [integer]. End episode tracking in a strata when an index event's custom_sort order is greater than the supplied skip_order.
reference_event - [character]. Specifies which of the records are used as index events. Options are "last_record" (default), "last_event", "first_record" or "first_event".
case_for_recurrence - [logical]. If TRUE, a case_length is applied to both "Case" and "Recurrent" events. If FALSE (default), a case_length is applied to only "Case" events.
from_last - [logical]. Track episodes beginning from the earliest to the most recent record (FALSE) or vice versa (TRUE).
group_stats - [character]. A selection of group metrics to return for each episode. Most are added to slots of the epid object. Options are NULL or any combination of "case_nm", "wind" and "epid_interval".
display - [character]. Display progress update and/or generate a linkage report for the analysis. Options are: "none" (default), "progress", "stats", "none_with_report", "progress_with_report" or "stats_with_report".
case_sub_criteria - [sub_criteria]. Additional nested match criteria for events in a case_length.
recurrence_sub_criteria - [sub_criteria]. Additional nested match criteria for events in a recurrence_length.
case_length_total - [integer|number_line]. Minimum number of matched case_lengths required for an episode.
recurrence_length_total - [integer|number_line]. Minimum number of matched recurrence_lengths required for an episode.
skip_unique_strata - [logical]. If TRUE, a strata with a single event is skipped.
splits_by_strata - [integer]. Split analysis into n parts. This typically lowers max memory usage but increases run time.
batched - [character]. Create and compare records in batches. Options are "yes", "no", and "semi". Typically, the "semi" option will have a higher max memory and shorter run-time, while "no" will have a lower max memory but longer run-time.
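To make the custom_sort and skip_order descriptions above more concrete, here is a minimal base R sketch of how an index event might be picked. It is not diyar's implementation: pick_index() is a hypothetical helper, and it assumes that lower custom_sort values are preferred and that ties go to the earliest date.

```r
# Hypothetical helper (not a diyar function): choose the next index event.
# Assumption: lower custom_sort values are preferred; ties go to the earliest date.
pick_index <- function(dates, custom_sort, skip_order = Inf) {
  candidates <- order(custom_sort, dates)
  index <- candidates[1]
  # Stop tracking when the best candidate's custom_sort value exceeds skip_order
  if (custom_sort[index] > skip_order) NA_integer_ else index
}

dates <- as.Date(c("2018-04-01", "2018-04-07", "2018-04-13"))
custom_sort <- c(2, 1, 1)

pick_index(dates, custom_sort)                  # 2: lowest sort value, earliest date
pick_index(dates, custom_sort, skip_order = 0)  # NA: every sort value exceeds 0
```

In this sketch, a returned NA stands for "end episode tracking"; how diyar signals this internally is not covered here.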
epid; list
episodes() links dated records (events) that are within a set duration of each other in iterations.
Every record is linked to a unique group (episode; epid object).
These episodes represent occurrences of interest as specified by the function's arguments and defined by a case definition.
Two main types of episodes are possible:
"fixed" - An episode where all events are within a fixed duration of an index event.
"rolling" - An episode where all events are within a recurring duration of an index event.
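The difference between the two types can be sketched in base R. This is a simplified illustration, not diyar's implementation: sketch_episodes() is a hypothetical helper that assumes single dates (no periods), no missing values, lengths in days, tracking from the earliest record, and that each recurrence window starts from the last record already in the episode.

```r
# Hypothetical helper (not a diyar function): assign each date to an episode.
# The episode ID is the position of the episode's index event.
sketch_episodes <- function(dates, case_length, episode_type = "fixed",
                            recurrence_length = case_length) {
  dates <- as.Date(dates)
  stopifnot(!anyNA(dates))
  epid <- rep(NA_integer_, length(dates))
  while (anyNA(epid)) {
    pending <- which(is.na(epid))
    # Index event: the earliest record not yet assigned to an episode
    index <- pending[which.min(dates[pending])]
    in_window <- is.na(epid) & dates >= dates[index] &
      dates <= dates[index] + case_length
    epid[in_window] <- index
    if (episode_type == "rolling") {
      repeat {
        # Reference event: the last record currently in the episode
        ref_date <- max(dates[which(epid == index)])
        recurs <- is.na(epid) & dates > ref_date &
          dates <= ref_date + recurrence_length
        if (!any(recurs)) break
        epid[recurs] <- index
      }
    }
  }
  epid
}

# Eleven records, six days apart, starting on 2018-04-01
dates <- seq(as.Date("2018-04-01"), by = 6, length.out = 11)

sketch_episodes(dates, case_length = 15)
# 1 1 1 4 4 4 7 7 7 10 10
sketch_episodes(dates, case_length = 15, episode_type = "rolling",
                recurrence_length = 10)
# 1 1 1 1 1 1 1 1 1 1 1
```

With a fixed window the records split into four episodes, while the rolling version keeps extending the first episode because each record falls within 10 days of the previous one.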
Every record in each episode is categorised as one of the following:
"Case" - Index event of the episode (without nested match criteria).
"Case_CR" - Index event of the episode (with nested match criteria).
"Duplicate_C" - Duplicate of the index event.
"Recurrent" - Recurrence of the index event (without nested match criteria).
"Recurrent_CR" - Recurrence of the index event (with nested match criteria).
"Duplicate_R" - Duplicate of the recurrent event.
"Skipped" - Skipped records.
If data_links is supplied, every element of the list must be named "l" (links) or "g" (groups).
Unnamed elements are assumed to be "l".
If named "l", groups without records from every listed data_source will be unlinked.
If named "g", groups without records from any listed data_source will be unlinked.
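The "l" and "g" rules amount to an all-versus-any check on the sources found in each group. The base R sketch below illustrates that reading; keeps_group() is a hypothetical helper and the episode IDs and sources are made-up values, so treat it as an interpretation of the rule rather than diyar's code.

```r
# Hypothetical helper (not a diyar function): does a group satisfy a data_links rule?
keeps_group <- function(sources_in_group, required, type = "l") {
  present <- required %in% sources_in_group
  # "l": every listed source must be present; "g": at least one must be present
  if (type == "l") all(present) else any(present)
}

# Made-up example: three groups and the source of each record
epid <- c(1, 1, 2, 2, 3)
data_source <- c("A", "B", "A", "A", "C")
by_group <- split(data_source, epid)

keep_l <- vapply(by_group, keeps_group, logical(1),
                 required = c("A", "B"), type = "l")
keep_g <- vapply(by_group, keeps_group, logical(1),
                 required = c("A", "B"), type = "g")
keep_l  # group 1 TRUE, group 2 FALSE, group 3 FALSE
keep_g  # group 1 TRUE, group 2 TRUE,  group 3 FALSE
```

Groups flagged FALSE here correspond to the ones that would be unlinked.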
All records with a missing (NA) strata or date are skipped.
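As a rough illustration of this rule together with per-strata tracking, the base R sketch below drops records with a missing strata or date and then groups the rest within each strata. fixed_epid() is a hypothetical helper for fixed episodes only, the data are made up, and skipped records are simply left as NA here, which is a simplification rather than a description of diyar's output.

```r
# Hypothetical helper (not a diyar function): fixed episodes for complete dates.
fixed_epid <- function(d, case_length) {
  epid <- rep(NA_integer_, length(d))
  for (i in order(d)) {
    # An unassigned record becomes the index event of a new episode
    if (is.na(epid[i])) {
      epid[is.na(epid) & d >= d[i] & d <= d[i] + case_length] <- i
    }
  }
  epid
}

# Made-up records: record 3 has no date and record 5 has no strata
dates <- as.Date(c("2018-04-01", "2018-04-07", NA, "2018-04-13", "2018-04-19"))
strata <- c("a", "a", "a", "b", NA)

usable <- !is.na(dates) & !is.na(strata)
epid <- rep(NA_integer_, length(dates))
for (s in unique(strata[usable])) {
  rows <- which(usable & strata == s)
  # Map within-strata positions back to positions in the full dataset
  epid[rows] <- rows[fixed_epid(dates[rows], case_length = 15)]
}
epid
# 1 1 NA 4 NA
```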
Wrapper functions or alternative implementations of episodes() for specific use cases or benefits:
episodes_wf_splits() - Identical records are excluded from the main analysis.
episodes_af_shift() - A mostly vectorised approach.
links_wf_episodes() - The same functionality achieved with links.
See vignette("episodes") for further details.
data(infections)
data(hospital_admissions)
# One 16-day (15-day difference) fixed episode per type of infection
episodes(date = infections$date,
         strata = infections$infection,
         case_length = 15,
         episodes_max = 1,
         episode_type = "fixed")
#> [1] "E.01 2018-04-01 == 2018-04-01 (C)" "E.02 2018-04-07 -> 2018-04-19 (C)"
#> [3] "E.02 2018-04-07 -> 2018-04-19 (D)" "E.02 2018-04-07 -> 2018-04-19 (D)"
#> [5] "E.05 2018-04-25 == 2018-04-25 (S)" "E.06 2018-05-01 == 2018-05-01 (S)"
#> [7] "E.07 2018-05-07 == 2018-05-07 (S)" "E.08 2018-05-13 == 2018-05-13 (S)"
#> [9] "E.09 2018-05-19 -> 2018-05-25 (C)" "E.09 2018-05-19 -> 2018-05-25 (D)"
#> [11] "E.11 2018-05-31 == 2018-05-31 (S)"
# Multiple 16-day episodes with an 11-day recurrence period
episodes(date = infections$date,
         strata = NULL,
         case_length = 15,
         episodes_max = Inf,
         episode_type = "rolling",
         recurrence_length = 10)
#> [1] "E.1 2018-04-01 -> 2018-05-31 (C)" "E.1 2018-04-01 -> 2018-05-31 (D)"
#> [3] "E.1 2018-04-01 -> 2018-05-31 (D)" "E.1 2018-04-01 -> 2018-05-31 (R)"
#> [5] "E.1 2018-04-01 -> 2018-05-31 (R)" "E.1 2018-04-01 -> 2018-05-31 (R)"
#> [7] "E.1 2018-04-01 -> 2018-05-31 (R)" "E.1 2018-04-01 -> 2018-05-31 (R)"
#> [9] "E.1 2018-04-01 -> 2018-05-31 (R)" "E.1 2018-04-01 -> 2018-05-31 (R)"
#> [11] "E.1 2018-04-01 -> 2018-05-31 (R)"
# Overlapping periods of hospital stays
dfr <- hospital_admissions[2:3]
dfr$admin_period <- number_line(dfr$admin_dt, dfr$discharge_dt)
dfr$ep <- episodes(date = dfr$admin_period,
                   strata = NULL,
                   case_length = index_window(dfr$admin_period),
                   case_overlap_methods = "inbetween")
dfr
#> admin_dt discharge_dt admin_period
#> 1 2019-01-01 2019-01-01 2019-01-01 == 2019-01-01
#> 2 2019-01-01 2019-01-10 2019-01-01 -> 2019-01-10
#> 3 2019-01-10 2019-01-13 2019-01-10 -> 2019-01-13
#> 4 2019-01-05 2019-01-06 2019-01-05 -> 2019-01-06
#> 5 2019-01-05 2019-01-15 2019-01-05 -> 2019-01-15
#> 6 2019-01-07 2019-01-15 2019-01-07 -> 2019-01-15
#> 7 2019-01-04 2019-01-13 2019-01-04 -> 2019-01-13
#> 8 2019-01-20 2019-01-30 2019-01-20 -> 2019-01-30
#> 9 2019-01-26 2019-01-31 2019-01-26 -> 2019-01-31
#> 10 2019-01-01 2019-01-10 2019-01-01 -> 2019-01-10
#> 11 2019-01-20 2019-01-30 2019-01-20 -> 2019-01-30
#> ep
#> 1 E.01 2019-01-01 == 2019-01-01 (C)
#> 2 E.02 2019-01-01 -> 2019-01-10 (C)
#> 3 E.05 2019-01-05 -> 2019-01-15 (D)
#> 4 E.02 2019-01-01 -> 2019-01-10 (D)
#> 5 E.05 2019-01-05 -> 2019-01-15 (C)
#> 6 E.06 2019-01-07 -> 2019-01-15 (C)
#> 7 E.07 2019-01-04 -> 2019-01-13 (C)
#> 8 E.08 2019-01-20 -> 2019-01-30 (C)
#> 9 E.09 2019-01-26 -> 2019-01-31 (C)
#> 10 E.10 2019-01-01 -> 2019-01-10 (C)
#> 11 E.11 2019-01-20 -> 2019-01-30 (C)
as.data.frame(dfr$ep)
#> epid sn case_nm dist_wind_index dist_epid_index epid_length epid_total
#> 1 1 1 Case 0.0 days 0.0 days 0 days 1
#> 2 2 2 Case 0.0 days 0.0 days 9 days 2
#> 3 5 3 Duplicate_C 1.5 days 1.5 days 10 days 2
#> 4 2 4 Duplicate_C 0.0 days 0.0 days 9 days 2
#> 5 5 5 Case 0.0 days 0.0 days 10 days 2
#> 6 6 6 Case 0.0 days 0.0 days 8 days 1
#> 7 7 7 Case 0.0 days 0.0 days 9 days 1
#> 8 8 8 Case 0.0 days 0.0 days 10 days 1
#> 9 9 9 Case 0.0 days 0.0 days 5 days 1
#> 10 10 10 Case 0.0 days 0.0 days 9 days 1
#> 11 11 11 Case 0.0 days 0.0 days 10 days 1
#> iteration wind_id1 wind_nm1 epid_start epid_end
#> 1 3 1 Case 2019-01-01 2019-01-01
#> 2 1 2 Case 2019-01-01 2019-01-10
#> 3 5 5 Case 2019-01-05 2019-01-15
#> 4 1 2 Case 2019-01-01 2019-01-10
#> 5 5 5 Case 2019-01-05 2019-01-15
#> 6 6 6 Case 2019-01-07 2019-01-15
#> 7 4 7 Case 2019-01-04 2019-01-13
#> 8 7 8 Case 2019-01-20 2019-01-30
#> 9 9 9 Case 2019-01-26 2019-01-31
#> 10 2 10 Case 2019-01-01 2019-01-10
#> 11 8 11 Case 2019-01-20 2019-01-30