Dated events (records) within a certain duration of an index event are assigned to a unique group. Each group has a unique ID and is described as an "episode". Episodes can be "fixed" or "rolling" ("recurring"). Each episode has a "Case" and/or "Recurrent" record, while all other records within the group are "Duplicates" of the "Case" or "Recurrent" event.

episodes(
  date,
  case_length = Inf,
  episode_type = "fixed",
  recurrence_length = case_length,
  episode_unit = "days",
  strata = NULL,
  sn = NULL,
  episodes_max = Inf,
  rolls_max = Inf,
  case_overlap_methods = 8,
  recurrence_overlap_methods = case_overlap_methods,
  skip_if_b4_lengths = FALSE,
  data_source = NULL,
  data_links = "ANY",
  custom_sort = NULL,
  skip_order = Inf,
  reference_event = "last_record",
  case_for_recurrence = FALSE,
  from_last = FALSE,
  group_stats = c("case_nm", "wind", "epid_interval"),
  display = "none",
  case_sub_criteria = NULL,
  recurrence_sub_criteria = case_sub_criteria,
  case_length_total = 1,
  recurrence_length_total = case_length_total,
  skip_unique_strata = TRUE,
  splits_by_strata = 1,
  batched = "semi"
)

links_wf_episodes(
  date,
  case_length = Inf,
  episode_type = "fixed",
  strata = NULL,
  sn = NULL,
  display = "none"
)

episodes_af_shift(
  date,
  case_length = Inf,
  sn = NULL,
  strata = NULL,
  group_stats = FALSE,
  episode_type = "fixed",
  data_source = NULL,
  episode_unit = "days",
  data_links = "ANY",
  display = "none"
)

Arguments

date

[date|datetime|integer|number_line]. Record date or period.

case_length

[integer|number_line]. Duration from an index event distinguishing one "Case" from another.

episode_type

[character]. Options are "fixed" (default) or "rolling". See Details.

recurrence_length

[integer|number_line]. Duration from an index event distinguishing a "Recurrent" event from its "Case" or prior "Recurrent" event.

episode_unit

[character]. Unit of time for case_length and recurrence_length. Options are "seconds", "minutes", "hours", "days" (default), "weeks", "months" or "years". See diyar::episode_unit.

strata

[atomic]. Subsets of the dataset. Episodes are created separately within each stratum.

sn

[integer]. Unique record ID.

episodes_max

[integer]. Maximum number of episodes permitted within each stratum.

rolls_max

[integer]. Maximum number of times an index event can recur. Only used if episode_type is "rolling".

case_overlap_methods

[character|integer]. Specific ways a period (record) must overlap with a "Case" event. See (overlaps).

recurrence_overlap_methods

[character|integer]. Specific ways a period (record) must overlap with a "Recurrent" event. See (overlaps).

skip_if_b4_lengths

[logical]. If TRUE, events before a lagged case_length or recurrence_length are skipped. Default is FALSE.

data_source

[character]. Source ID for each record. If provided, a list of all sources in each episode is returned. See epid_dataset slot.

data_links

[list|character]. The data_source values required in each epid. An episode without records from these data_sources will be unlinked. See Details.

custom_sort

[atomic]. Preferential order for selecting index events. See custom_sort.

skip_order

[integer]. End episode tracking in a stratum when an index event's custom_sort order is greater than the supplied skip_order.

reference_event

[character]. Specifies which of the records are used as index events. Options are "last_record" (default), "last_event", "first_record" or "first_event".

case_for_recurrence

[logical]. If TRUE, a case_length is applied to both "Case" and "Recurrent" events. If FALSE (default), a case_length is applied to only "Case" events.

from_last

[logical]. Track episodes beginning from the earliest to the most recent record (FALSE) or vice versa (TRUE).

group_stats

[character]. A selection of group metrics to return for each episode. Most are added to slots of the epid object. Options are NULL or any combination of "case_nm", "wind" and "epid_interval".

display

[character]. Display progress updates and/or generate a linkage report for the analysis. Options are: "none" (default), "progress", "stats", "none_with_report", "progress_with_report" or "stats_with_report".

case_sub_criteria

[sub_criteria]. Additional nested match criteria for events in a case_length.

recurrence_sub_criteria

[sub_criteria]. Additional nested match criteria for events in a recurrence_length.

case_length_total

[integer|number_line]. Minimum number of matched case_lengths required for an episode.

recurrence_length_total

[integer|number_line]. Minimum number of matched recurrence_lengths required for an episode.

skip_unique_strata

[logical]. If TRUE (default), a stratum with a single event is skipped.

splits_by_strata

[integer]. Split analysis into n parts. This typically lowers max memory usage but increases run time.

batched

[character]. Create and compare records in batches. Options are "yes", "no" and "semi" (default). Typically, the "semi" option has a higher maximum memory usage and a shorter run time, while "no" has a lower maximum memory usage but a longer run time.

Value

epid; list

Details

episodes() iteratively links dated records (events) that are within a set duration of each other. Every record is linked to a unique group (episode; epid object). These episodes represent occurrences of interest as specified by the function's arguments and defined by a case definition.

Two main types of episodes are possible:

  • "fixed" - An episode where all events are within a fixed duration of an index event.

  • "rolling" - An episode where all events are within a recurring duration of an index event.
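
The difference between the two types can be sketched in base R. This is a simplified illustration only, not diyar's implementation: sketch_episodes() is a hypothetical helper, the dates are made-up records at 6-day intervals, and the sketch assumes sorted single-day events where each recurrence window is measured from the most recently linked record. It ignores strata, overlap methods, rolls_max and every other argument.

```r
# Hypothetical helper (not part of diyar): assigns an episode number to each date.
sketch_episodes <- function(dates, case_length, rolling = FALSE,
                            recurrence_length = case_length) {
  dates <- sort(as.Date(dates))
  epid <- integer(length(dates))
  index <- NULL  # date of the current index event
  last <- NULL   # date of the most recently linked record
  for (i in seq_along(dates)) {
    # Fixed window: measured from the index event only
    in_case <- !is.null(index) &&
      as.numeric(dates[i] - index) <= case_length
    # Recurring window: measured from the most recently linked record
    in_recurrence <- rolling && !is.null(last) &&
      as.numeric(dates[i] - last) <= recurrence_length
    if (in_case || in_recurrence) {
      epid[i] <- epid[i - 1L]
    } else {
      # Start a new episode with this record as its index event
      index <- dates[i]
      epid[i] <- if (i == 1L) 1L else epid[i - 1L] + 1L
    }
    last <- dates[i]
  }
  epid
}

# Made-up example: 11 records, 6 days apart
dates <- seq(as.Date("2018-04-01"), by = 6, length.out = 11)

fixed <- sketch_episodes(dates, case_length = 15)
fixed
#>  [1] 1 1 1 2 2 2 3 3 3 4 4

rolling_ids <- sketch_episodes(dates, case_length = 15, rolling = TRUE,
                               recurrence_length = 10)
rolling_ids
#>  [1] 1 1 1 1 1 1 1 1 1 1 1
```

In the "fixed" run, the fourth record is 18 days from the index event, so it starts a new episode. In the "rolling" run, the same record is only 6 days from the previous linked record, so the episode continues.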

Every record in each episode is categorised as one of the following:

  • "Case" - Index event of the episode (without nested match criteria).

  • "Case_CR" - Index event of the episode (with nested match criteria).

  • "Duplicate_C" - Duplicate of the index event.

  • "Recurrent" - Recurrence of the index event (without nested match criteria).

  • "Recurrent_CR" - Recurrence of the index event (with nested match criteria).

  • "Duplicate_R" - Duplicate of the recurrent event.

  • "Skipped" - Skipped records.

If data_links is supplied, every element of the list must be named "l" (links) or "g" (groups). Unnamed elements are assumed to be "l".

  • If named "l", groups without records from every listed data_source will be unlinked.

  • If named "g", groups without records from any listed data_source will be unlinked.
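
One reading of the two rules above, sketched in base R. has_links() is a hypothetical helper and the source names are made up; this is not how diyar implements the check, only an illustration of "every" versus "any".

```r
# Hypothetical helper (not part of diyar): TRUE if an episode would be kept.
# sources  - data_source values found in one episode
# required - data_source values listed in one data_links element
# type     - "l" (every listed source required) or "g" (any listed source required)
has_links <- function(sources, required, type = c("l", "g")) {
  type <- match.arg(type)
  hits <- required %in% sources
  if (type == "l") all(hits) else any(hits)
}

# Made-up episode containing records from two sources
episode_sources <- c("lab", "clinic")

keep_l_full <- has_links(episode_sources, c("lab", "clinic"), "l")
keep_l_part <- has_links(episode_sources, c("lab", "hospital"), "l")
keep_g_part <- has_links(episode_sources, c("lab", "hospital"), "g")

keep_l_full  # TRUE: every listed source is present
keep_l_part  # FALSE: "hospital" is missing, so the episode would be unlinked
keep_g_part  # TRUE: at least one listed source is present
```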

All records with a missing (NA) strata or date are skipped.

Wrapper functions or alternative implementations of episodes() for specific use cases or benefits:

  • episodes_wf_splits() - Identical records are excluded from the main analysis.

  • episodes_af_shift() - A mostly vectorised approach.

  • links_wf_episodes() - The same functionality achieved with links.

See vignette("episodes") for further details.

Examples

data(infections)
data(hospital_admissions)

# One 16-day (15-day difference) fixed episode per type of infection
episodes(date = infections$date,
         strata = infections$infection,
         case_length = 15,
         episodes_max = 1,
         episode_type = "fixed")
#>  [1] "E.01 2018-04-01 == 2018-04-01 (C)" "E.02 2018-04-07 -> 2018-04-19 (C)"
#>  [3] "E.02 2018-04-07 -> 2018-04-19 (D)" "E.02 2018-04-07 -> 2018-04-19 (D)"
#>  [5] "E.05 2018-04-25 == 2018-04-25 (S)" "E.06 2018-05-01 == 2018-05-01 (S)"
#>  [7] "E.07 2018-05-07 == 2018-05-07 (S)" "E.08 2018-05-13 == 2018-05-13 (S)"
#>  [9] "E.09 2018-05-19 -> 2018-05-25 (C)" "E.09 2018-05-19 -> 2018-05-25 (D)"
#> [11] "E.11 2018-05-31 == 2018-05-31 (S)"

# Multiple 16-day episodes with an 11-day recurrence period
episodes(date = infections$date,
         strata = NULL,
         case_length = 15,
         episodes_max = Inf,
         episode_type = "rolling",
         recurrence_length = 10)
#>  [1] "E.1 2018-04-01 -> 2018-05-31 (C)" "E.1 2018-04-01 -> 2018-05-31 (D)"
#>  [3] "E.1 2018-04-01 -> 2018-05-31 (D)" "E.1 2018-04-01 -> 2018-05-31 (R)"
#>  [5] "E.1 2018-04-01 -> 2018-05-31 (R)" "E.1 2018-04-01 -> 2018-05-31 (R)"
#>  [7] "E.1 2018-04-01 -> 2018-05-31 (R)" "E.1 2018-04-01 -> 2018-05-31 (R)"
#>  [9] "E.1 2018-04-01 -> 2018-05-31 (R)" "E.1 2018-04-01 -> 2018-05-31 (R)"
#> [11] "E.1 2018-04-01 -> 2018-05-31 (R)"

# Overlapping periods of hospital stays
dfr <- hospital_admissions[2:3]

dfr$admin_period <-
  number_line(dfr$admin_dt, dfr$discharge_dt)

dfr$ep <-
  episodes(date = dfr$admin_period,
           strata = NULL,
           case_length = index_window(dfr$admin_period),
           case_overlap_methods = "inbetween")

dfr
#>      admin_dt discharge_dt             admin_period
#> 1  2019-01-01   2019-01-01 2019-01-01 == 2019-01-01
#> 2  2019-01-01   2019-01-10 2019-01-01 -> 2019-01-10
#> 3  2019-01-10   2019-01-13 2019-01-10 -> 2019-01-13
#> 4  2019-01-05   2019-01-06 2019-01-05 -> 2019-01-06
#> 5  2019-01-05   2019-01-15 2019-01-05 -> 2019-01-15
#> 6  2019-01-07   2019-01-15 2019-01-07 -> 2019-01-15
#> 7  2019-01-04   2019-01-13 2019-01-04 -> 2019-01-13
#> 8  2019-01-20   2019-01-30 2019-01-20 -> 2019-01-30
#> 9  2019-01-26   2019-01-31 2019-01-26 -> 2019-01-31
#> 10 2019-01-01   2019-01-10 2019-01-01 -> 2019-01-10
#> 11 2019-01-20   2019-01-30 2019-01-20 -> 2019-01-30
#>                                   ep
#> 1  E.01 2019-01-01 == 2019-01-01 (C)
#> 2  E.02 2019-01-01 -> 2019-01-10 (C)
#> 3  E.05 2019-01-05 -> 2019-01-15 (D)
#> 4  E.02 2019-01-01 -> 2019-01-10 (D)
#> 5  E.05 2019-01-05 -> 2019-01-15 (C)
#> 6  E.06 2019-01-07 -> 2019-01-15 (C)
#> 7  E.07 2019-01-04 -> 2019-01-13 (C)
#> 8  E.08 2019-01-20 -> 2019-01-30 (C)
#> 9  E.09 2019-01-26 -> 2019-01-31 (C)
#> 10 E.10 2019-01-01 -> 2019-01-10 (C)
#> 11 E.11 2019-01-20 -> 2019-01-30 (C)
as.data.frame(dfr$ep)
#>    epid sn     case_nm dist_wind_index dist_epid_index epid_length epid_total
#> 1     1  1        Case        0.0 days        0.0 days      0 days          1
#> 2     2  2        Case        0.0 days        0.0 days      9 days          2
#> 3     5  3 Duplicate_C        1.5 days        1.5 days     10 days          2
#> 4     2  4 Duplicate_C        0.0 days        0.0 days      9 days          2
#> 5     5  5        Case        0.0 days        0.0 days     10 days          2
#> 6     6  6        Case        0.0 days        0.0 days      8 days          1
#> 7     7  7        Case        0.0 days        0.0 days      9 days          1
#> 8     8  8        Case        0.0 days        0.0 days     10 days          1
#> 9     9  9        Case        0.0 days        0.0 days      5 days          1
#> 10   10 10        Case        0.0 days        0.0 days      9 days          1
#> 11   11 11        Case        0.0 days        0.0 days     10 days          1
#>    iteration wind_id1 wind_nm1 epid_start   epid_end
#> 1          3        1     Case 2019-01-01 2019-01-01
#> 2          1        2     Case 2019-01-01 2019-01-10
#> 3          5        5     Case 2019-01-05 2019-01-15
#> 4          1        2     Case 2019-01-01 2019-01-10
#> 5          5        5     Case 2019-01-05 2019-01-15
#> 6          6        6     Case 2019-01-07 2019-01-15
#> 7          4        7     Case 2019-01-04 2019-01-13
#> 8          7        8     Case 2019-01-20 2019-01-30
#> 9          9        9     Case 2019-01-26 2019-01-31
#> 10         2       10     Case 2019-01-01 2019-01-10
#> 11         8       11     Case 2019-01-20 2019-01-30