Group dated events into episodes.

Dated events (records) within a certain duration of an index event are assigned to a unique group. Each group has a unique ID and is described as an "episode". Episodes can be "fixed" or "rolling" ("recurring"). Each episode has a "Case" and/or "Recurrent" record, while all other records within the group are "Duplicates" of the "Case" or "Recurrent" event.

episodes(
  date,
  case_length = Inf,
  episode_type = "fixed",
  recurrence_length = case_length,
  episode_unit = "days",
  strata = NULL,
  sn = NULL,
  episodes_max = Inf,
  rolls_max = Inf,
  case_overlap_methods = 8,
  recurrence_overlap_methods = case_overlap_methods,
  skip_if_b4_lengths = FALSE,
  data_source = NULL,
  data_links = "ANY",
  custom_sort = NULL,
  skip_order = Inf,
  reference_event = "last_record",
  case_for_recurrence = FALSE,
  from_last = FALSE,
  group_stats = c("case_nm", "wind", "epid_interval"),
  display = "none",
  case_sub_criteria = NULL,
  recurrence_sub_criteria = case_sub_criteria,
  case_length_total = 1,
  recurrence_length_total = case_length_total,
  skip_unique_strata = TRUE,
  splits_by_strata = 1,
  batched = "semi"
)

links_wf_episodes(
  date,
  case_length = Inf,
  episode_type = "fixed",
  strata = NULL,
  sn = NULL,
  display = "none"
)

episodes_af_shift(
  date,
  case_length = Inf,
  sn = NULL,
  strata = NULL,
  group_stats = FALSE,
  episode_type = "fixed",
  data_source = NULL,
  episode_unit = "days",
  data_links = "ANY",
  display = "none"
)

Arguments

date: [date|datetime|integer|number_line]. Record date or period.
case_length: [integer|number_line]. Duration from an index event distinguishing one "Case" from another.
episode_type: [character]. Options are "fixed" (default) or "rolling". See Details.
recurrence_length: [integer|number_line]. Duration from an index event distinguishing a "Recurrent" event from its "Case" or prior "Recurrent" event.
episode_unit: [character]. Unit of time for case_length and recurrence_length. Options are "seconds", "minutes", "hours", "days" (default), "weeks", "months" or "years". See diyar::episode_unit.
strata: [atomic]. Subsets of the dataset. Episodes are created separately within each strata.
sn: [integer]. Unique record ID.
episodes_max: [integer]. Maximum number of episodes permitted within each strata.
rolls_max: [integer]. Maximum number of times an index event can recur. Only used if episode_type is "rolling".
case_overlap_methods: [character|integer]. Specific ways a period (record) must overlap with a "Case" event. See (overlaps).
recurrence_overlap_methods: [character|integer]. Specific ways a period (record) must overlap with a "Recurrent" event. See (overlaps).
skip_if_b4_lengths: [logical]. If TRUE, events before a lagged case_length or recurrence_length are skipped. The default shown in the usage above is FALSE.
data_source: [character]. Source ID for each record. If provided, a list of all sources in each episode is returned. See epid_dataset slot.
data_links: [list|character]. The data_source values required in each epid. An episode without records from these data_sources will be unlinked. See Details.
custom_sort: [atomic]. Preferential order for selecting index events. See custom_sort.
skip_order: [integer]. End episode tracking in a strata when an index event's custom_sort order is greater than the supplied skip_order.
reference_event: [character]. Specifies which of the records are used as index events. Options are "last_record" (default), "last_event", "first_record" or "first_event".
case_for_recurrence: [logical]. If TRUE, a case_length is applied to both "Case" and "Recurrent" events. If FALSE (default), a case_length is applied to only "Case" events.
from_last: [logical]. Track episodes beginning from the earliest to the most recent record (FALSE) or vice versa (TRUE).
group_stats: [character]. A selection of group metrics to return for each episode. Most are added to slots of the epid object. Options are NULL or any combination of "case_nm", "wind" and "epid_interval".
display: [character]. Display progress updates and/or generate a linkage report for the analysis. Options are: "none" (default), "progress", "stats", "none_with_report", "progress_with_report" or "stats_with_report".
case_sub_criteria: [sub_criteria]. Additional nested match criteria for events in a case_length.
recurrence_sub_criteria: [sub_criteria]. Additional nested match criteria for events in a recurrence_length.
case_length_total: [integer|number_line]. Minimum number of matched case_lengths required for an episode.
recurrence_length_total: [integer|number_line]. Minimum number of matched recurrence_lengths required for an episode.
skip_unique_strata: [logical]. If TRUE, a strata with a single event is skipped.
splits_by_strata: [integer]. Split analysis into n parts. This typically lowers max memory usage but increases run time.
batched: [character]. Create and compare records in batches. Options are "yes", "no", and "semi". Typically, the "semi" option will have a higher max memory and shorter run-time, while "no" will have a lower max memory but longer run-time.

Value

epid; list

Details

episodes() iteratively links dated records (events) that are within a set duration of each other. Every record is linked to a unique group (episode; epid object). These episodes represent occurrences of interest as specified by the function's arguments and defined by a case definition.

Two main types of episodes are possible:

"fixed" - An episode where all events are within a fixed duration of an index event.
"rolling" - An episode where all events are within a recurring duration of an index event.

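The difference between the two types can be sketched in base R. This is a simplified illustration and not diyar's implementation: sketch_episodes() is a hypothetical helper that assumes each record is a single date and ignores strata, overlap methods, sub-criteria and reference_event. A record that falls exactly on the end of a window is treated as inside it.

```r
# Simplified sketch of "fixed" vs "rolling" episode tracking.
# sketch_episodes() is hypothetical; it is NOT part of diyar.
sketch_episodes <- function(dates, case_length,
                            recurrence_length = case_length,
                            episode_type = c("fixed", "rolling")) {
  episode_type <- match.arg(episode_type)
  ord <- order(dates)
  d <- as.numeric(as.Date(dates))[ord]   # days since epoch, sorted
  epid <- integer(length(d))
  current <- 0L
  window_end <- -Inf
  for (i in seq_along(d)) {
    if (d[i] > window_end) {
      # Outside the current window: this record is a new index event
      current <- current + 1L
      window_end <- d[i] + case_length
    } else if (episode_type == "rolling") {
      # Inside the window: a rolling episode is extended from this record
      window_end <- max(window_end, d[i] + recurrence_length)
    }
    epid[i] <- current
  }
  epid[order(ord)]                       # back to the input order
}

# Eleven records, six days apart, starting on 2018-04-01
dates <- as.Date("2018-04-01") + 6 * (0:10)

fixed_ids <- sketch_episodes(dates, case_length = 15, episode_type = "fixed")
rolling_ids <- sketch_episodes(dates, case_length = 15, recurrence_length = 10,
                               episode_type = "rolling")
fixed_ids    # 1 1 1 2 2 2 3 3 3 4 4
rolling_ids  # 1 1 1 1 1 1 1 1 1 1 1
```

With a fixed window, every record more than 15 days after an index event starts a new episode. With a rolling window, each 6-day gap is shorter than the 10-day recurrence period, so the episode keeps extending and all records end up in one group.
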
Every record in each episode is categorised as one of the following:

"Case" - Index event of the episode (without a nested match criteria).
"Case_CR" - Index event of the episode (with a nested match criteria).
"Duplicate_C" - Duplicate of the index event.
"Recurrent" - Recurrence of the index event (without a nested match criteria).
"Recurrent_CR" - Recurrence of the index event (with a nested match criteria).
"Duplicate_R" - Duplicate of the recurrent event.
"Skipped" - Skipped records.

If data_links is supplied, every element of the list must be named "l" (links) or "g" (groups). Unnamed elements are assumed to be "l".

If named "l", groups without records from every listed data_source will be unlinked.
If named "g", groups without records from any listed data_source will be unlinked.
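
The two rules can be read as an "all" versus "any" check on the data sources present in a group. The sketch below assumes that reading; keep_group() is a hypothetical helper written for illustration and is not a diyar function.

```r
# Sketch of the "l" (links) and "g" (groups) rules for data_links.
# keep_group() is hypothetical; it is NOT part of diyar.
keep_group <- function(sources_in_group, required, type = c("l", "g")) {
  type <- match.arg(type)
  present <- required %in% sources_in_group
  # "l": every listed source must be present; "g": at least one must be
  if (type == "l") all(present) else any(present)
}

# A group with records from DS1 and DS2, checked against DS1 and DS3
keep_l <- keep_group(c("DS1", "DS2"), required = c("DS1", "DS3"), type = "l")
keep_g <- keep_group(c("DS1", "DS2"), required = c("DS1", "DS3"), type = "g")
keep_l  # FALSE: DS3 is missing, so the group would be unlinked
keep_g  # TRUE: DS1 is present, so the group would be kept
```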

All records with a missing (NA) strata or date are skipped.

Wrapper functions or alternative implementations of episodes() for specific use cases or benefits:

episodes_wf_splits() - Identical records are excluded from the main analysis.
episodes_af_shift() - A mostly vectorised approach.
links_wf_episodes() - The same functionality achieved with links.

See vignette("episodes") for further details.

Examples

library(diyar)
data(infections)
data(hospital_admissions)

# One 16-day (15-day difference) fixed episode per type of infection
episodes(date = infections$date,
         strata = infections$infection,
         case_length = 15,
         episodes_max = 1,
         episode_type = "fixed")
#>  [1] "E.01 2018-04-01 == 2018-04-01 (C)" "E.02 2018-04-07 -> 2018-04-19 (C)"
#>  [3] "E.02 2018-04-07 -> 2018-04-19 (D)" "E.02 2018-04-07 -> 2018-04-19 (D)"
#>  [5] "E.05 2018-04-25 == 2018-04-25 (S)" "E.06 2018-05-01 == 2018-05-01 (S)"
#>  [7] "E.07 2018-05-07 == 2018-05-07 (S)" "E.08 2018-05-13 == 2018-05-13 (S)"
#>  [9] "E.09 2018-05-19 -> 2018-05-25 (C)" "E.09 2018-05-19 -> 2018-05-25 (D)"
#> [11] "E.11 2018-05-31 == 2018-05-31 (S)"

# Multiple 16-day episodes with an 11-day recurrence period
episodes(date = infections$date,
         strata = NULL,
         case_length = 15,
         episodes_max = Inf,
         episode_type = "rolling",
         recurrence_length = 10)
#>  [1] "E.1 2018-04-01 -> 2018-05-31 (C)" "E.1 2018-04-01 -> 2018-05-31 (D)"
#>  [3] "E.1 2018-04-01 -> 2018-05-31 (D)" "E.1 2018-04-01 -> 2018-05-31 (R)"
#>  [5] "E.1 2018-04-01 -> 2018-05-31 (R)" "E.1 2018-04-01 -> 2018-05-31 (R)"
#>  [7] "E.1 2018-04-01 -> 2018-05-31 (R)" "E.1 2018-04-01 -> 2018-05-31 (R)"
#>  [9] "E.1 2018-04-01 -> 2018-05-31 (R)" "E.1 2018-04-01 -> 2018-05-31 (R)"
#> [11] "E.1 2018-04-01 -> 2018-05-31 (R)"

# Overlapping periods of hospital stays
dfr <- hospital_admissions[2:3]

dfr$admin_period <-
  number_line(dfr$admin_dt, dfr$discharge_dt)

dfr$ep <-
  episodes(date = dfr$admin_period,
           strata = NULL,
           case_length = index_window(dfr$admin_period),
           case_overlap_methods = "inbetween")

dfr
#>      admin_dt discharge_dt             admin_period
#> 1  2019-01-01   2019-01-01 2019-01-01 == 2019-01-01
#> 2  2019-01-01   2019-01-10 2019-01-01 -> 2019-01-10
#> 3  2019-01-10   2019-01-13 2019-01-10 -> 2019-01-13
#> 4  2019-01-05   2019-01-06 2019-01-05 -> 2019-01-06
#> 5  2019-01-05   2019-01-15 2019-01-05 -> 2019-01-15
#> 6  2019-01-07   2019-01-15 2019-01-07 -> 2019-01-15
#> 7  2019-01-04   2019-01-13 2019-01-04 -> 2019-01-13
#> 8  2019-01-20   2019-01-30 2019-01-20 -> 2019-01-30
#> 9  2019-01-26   2019-01-31 2019-01-26 -> 2019-01-31
#> 10 2019-01-01   2019-01-10 2019-01-01 -> 2019-01-10
#> 11 2019-01-20   2019-01-30 2019-01-20 -> 2019-01-30
#>                                   ep
#> 1  E.01 2019-01-01 == 2019-01-01 (C)
#> 2  E.02 2019-01-01 -> 2019-01-10 (C)
#> 3  E.05 2019-01-05 -> 2019-01-15 (D)
#> 4  E.02 2019-01-01 -> 2019-01-10 (D)
#> 5  E.05 2019-01-05 -> 2019-01-15 (C)
#> 6  E.06 2019-01-07 -> 2019-01-15 (C)
#> 7  E.07 2019-01-04 -> 2019-01-13 (C)
#> 8  E.08 2019-01-20 -> 2019-01-30 (C)
#> 9  E.09 2019-01-26 -> 2019-01-31 (C)
#> 10 E.10 2019-01-01 -> 2019-01-10 (C)
#> 11 E.11 2019-01-20 -> 2019-01-30 (C)
as.data.frame(dfr$ep)
#>    epid sn     case_nm dist_wind_index dist_epid_index epid_length epid_total
#> 1     1  1        Case        0.0 days        0.0 days      0 days          1
#> 2     2  2        Case        0.0 days        0.0 days      9 days          2
#> 3     5  3 Duplicate_C        1.5 days        1.5 days     10 days          2
#> 4     2  4 Duplicate_C        0.0 days        0.0 days      9 days          2
#> 5     5  5        Case        0.0 days        0.0 days     10 days          2
#> 6     6  6        Case        0.0 days        0.0 days      8 days          1
#> 7     7  7        Case        0.0 days        0.0 days      9 days          1
#> 8     8  8        Case        0.0 days        0.0 days     10 days          1
#> 9     9  9        Case        0.0 days        0.0 days      5 days          1
#> 10   10 10        Case        0.0 days        0.0 days      9 days          1
#> 11   11 11        Case        0.0 days        0.0 days     10 days          1
#>    iteration wind_id1 wind_nm1 epid_start   epid_end
#> 1          3        1     Case 2019-01-01 2019-01-01
#> 2          1        2     Case 2019-01-01 2019-01-10
#> 3          5        5     Case 2019-01-05 2019-01-15
#> 4          1        2     Case 2019-01-01 2019-01-10
#> 5          5        5     Case 2019-01-05 2019-01-15
#> 6          6        6     Case 2019-01-07 2019-01-15
#> 7          4        7     Case 2019-01-04 2019-01-13
#> 8          7        8     Case 2019-01-20 2019-01-30
#> 9          9        9     Case 2019-01-26 2019-01-31
#> 10         2       10     Case 2019-01-01 2019-01-10
#> 11         8       11     Case 2019-01-20 2019-01-30

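The groupings in the hospital stays example are consistent with reading "inbetween" as one period lying strictly inside another, with no shared start or end point. The base R sketch below assumes that reading; is_inbetween() is a hypothetical helper and not diyar's own overlap function.

```r
# Sketch of a strict "inbetween" overlap check between two periods.
# is_inbetween() is hypothetical; it is NOT part of diyar.
is_inbetween <- function(x_start, x_end, y_start, y_end) {
  x_in_y <- x_start > y_start & x_end < y_end
  y_in_x <- y_start > x_start & y_end < x_end
  x_in_y | y_in_x
}

d <- function(x) as.Date(x)

# Records 2 and 4: 2019-01-05 -> 2019-01-06 sits inside 2019-01-01 -> 2019-01-10
rec_2_4 <- is_inbetween(d("2019-01-01"), d("2019-01-10"),
                        d("2019-01-05"), d("2019-01-06"))
# Records 5 and 6 share an end point, so neither is strictly inside the other
rec_5_6 <- is_inbetween(d("2019-01-05"), d("2019-01-15"),
                        d("2019-01-07"), d("2019-01-15"))
rec_2_4  # TRUE
rec_5_6  # FALSE
```

This matches the output above, where record 4 is a duplicate in episode 2 while records 5 and 6 remain in separate episodes.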