Match criteria for record linkage with links and episodes

sub_criteria(
  ...,
  match_funcs = c(exact = diyar::exact_match),
  equal_funcs = c(exact = diyar::exact_match),
  operator = "or"
)

attrs(..., .obj = NULL)

eval_sub_criteria(x, ...)

# S3 method for sub_criteria
print(x, ...)

# S3 method for sub_criteria
format(x, show_levels = FALSE, ...)

# S3 method for sub_criteria
eval_sub_criteria(
  x,
  x_pos = seq_len(max(attr_eval(x))),
  y_pos = rep(1L, length(x_pos)),
  check_duplicates = TRUE,
  depth = 0,
  ...
)

Arguments

...

[atomic]. Attributes passed to sub_criteria() or attrs().

Arguments passed to methods for eval_sub_criteria()

match_funcs

[function]. User defined logical test for matches.

equal_funcs

[function]. User defined logical test for identical record sets (all attributes of the same record).

operator

[character]. Options are "and" or "or".

.obj

[data.frame|list]. Attributes.

x

[sub_criteria]. A sub_criteria object.

show_levels

[logical]. If TRUE, show recursive depth for each logic statement of the match criteria.

x_pos

[integer]. Index of the `x` half of each record pair.

y_pos

[integer]. Index of the `y` half of each record pair.

check_duplicates

[logical]. If FALSE, does not check duplicate values. The result of the initial check will be recycled.

depth

[integer]. First order of recursion.

Value

sub_criteria

Details

sub_criteria() - Create a match criteria as a sub_criteria object. A sub_criteria object contains attributes to be compared, logical tests for the comparisons (see predefined_tests for examples) and another set of logical tests to determine identical records.

attrs() - Create a d_attribute object - a collection of atomic objects that can be passed to sub_criteria() as a single attribute.

eval_sub_criteria() - Evaluates a sub_criteria object.

At each iteration of links or episodes, record-pairs are created from each attribute of a sub_criteria object. eval_sub_criteria() evaluates each record-pair using the match_funcs and equal_funcs functions of a sub_criteria object. See predefined_tests for examples of match_funcs and equal_funcs.
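The pairing described above can be sketched in base R. This is only an illustration, assuming each record pair is formed by indexing an attribute with `x_pos` and `y_pos`. `pair_test()` is a hypothetical helper, not part of diyar, and it does not reproduce the duplicate checks or recursion of eval_sub_criteria().

```r
# Hypothetical helper (not a diyar function): apply a logical test to
# record pairs built from a single attribute.
pair_test <- function(attr, x_pos, y_pos, test) {
  x <- attr[x_pos]  # one half of each record pair
  y <- attr[y_pos]  # the other half of each record pair
  as.integer(test(x, y))
}

attr_2 <- c("M", "F", "U", "M", "F", "U", "M")
x_pos <- seq_along(attr_2)
y_pos <- rep(1L, length(x_pos))  # compare every record against record 1

res <- pair_test(attr_2, x_pos, y_pos, function(x, y) x == y)
res
#> [1] 1 0 0 1 0 0 1
```

Changing `y_pos` to `rep(2L, length(x_pos))` would compare every record against the second record instead.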

User-defined functions are also permitted as match_funcs and equal_funcs. Each such function must meet three requirements:

  1. It must be able to compare the attributes.

  2. It must have two arguments named `x` and `y`, where `y` is the value for one observation being compared against all other observations (`x`).

  3. It must return a logical object i.e. TRUE or FALSE.
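As a minimal sketch of a function that meets these requirements, `within_5()` below is a hypothetical test (not part of diyar) that flags pairs whose values differ by no more than 5. The call to sub_criteria() is guarded so that the block also runs when diyar is not installed.

```r
# Hypothetical user-defined test: its arguments are named `x` and `y`,
# and it returns a logical vector.
within_5 <- function(x, y) abs(x - y) <= 5

attr_1 <- c(30, 28, 40, 25, 25, 29, 27)

# Direct check: compare the first value (`y`) against all values (`x`)
chk <- within_5(x = attr_1, y = attr_1[1])
chk
#> [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE

# Pass it to sub_criteria() if diyar is available
if (requireNamespace("diyar", quietly = TRUE)) {
  s_cri <- diyar::sub_criteria(attr_1, match_funcs = within_5)
  print(diyar::eval_sub_criteria(s_cri))
}
```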

attrs() is useful when the match criteria requires an interaction between multiple attributes. For example, attribute 1 + attribute 2 > attribute 3.

Every attribute, including those in attrs(), must have the same length or a length of 1.
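An interaction such as attribute 1 + attribute 2 > attribute 3 could be written as a single test over a collection of attributes. The sketch below assumes, as in the `AND_func` example on this page, that the test receives `x` and `y` as lists whose elements carry the names given to attrs(). `sum_gt()` and the names `a1`, `a2` and `a3` are hypothetical, and plain lists stand in for the record pairs so that the block runs without diyar.

```r
# Hypothetical test: is a1 + a2 of one record greater than a3 of the other?
sum_gt <- function(x, y) (x$a1 + x$a2) > y$a3

# Made-up values, all of the same length
a1 <- c(1, 2, 3)
a2 <- c(4, 5, 6)
a3 <- c(6, 6, 6)

# Plain lists standing in for the two halves of each record pair;
# here every record (`x`) is compared against the first record (`y`).
x <- list(a1 = a1, a2 = a2, a3 = a3)
y <- list(a1 = rep(a1[1], 3), a2 = rep(a2[1], 3), a3 = rep(a3[1], 3))

res <- sum_gt(x, y)
res
#> [1] FALSE  TRUE  TRUE
```

Following the pattern in the examples on this page, the same function would be supplied as `sub_criteria(attrs(.obj = list(a1 = a1, a2 = a2, a3 = a3)), match_funcs = sum_gt)`.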

See also

predefined_tests; links; episodes; eval_sub_criteria

Examples

# Attributes
attr_1 <- c(30, 28, 40, 25, 25, 29, 27)
attr_2 <- c("M", "F", "U", "M", "F", "U", "M")

# A match criteria
## Example 1 - A maximum difference of 10 in attribute 1
s_cri1 <- sub_criteria(attr_1, match_funcs = range_match)
s_cri1
#> {
#> match_func(30,28,40 ...)
#> }

# Evaluate the match criteria
## Compare the first element of 'attr_1' against all other elements
eval_sub_criteria(s_cri1)
#> $logical_test
#> [1] 1 1 0 1 1 1 1
#> 
## Compare the second element of 'attr_1' against all other elements
x_pos_val <- seq_len(max(attr_eval(s_cri1)))
eval_sub_criteria(s_cri1,
                  x_pos = x_pos_val,
                  y_pos = rep(2, length(x_pos_val)))
#> $logical_test
#> [1] 0 1 0 1 1 0 1
#> 

## Example 2 - `s_cri1` AND an exact match on attribute 2
s_cri2 <- sub_criteria(
  s_cri1,
  sub_criteria(attr_2, match_funcs = exact_match),
  operator = "and")
s_cri2
#> {
#>   {
#>   match_func(30,28,40 ...)
#>   } AND 
#>   {
#>   match_func(M,F,U ...)
#>   }
#> }

## Example 3 - `s_cri1` OR an exact match on attribute 2
s_cri3 <- sub_criteria(
  s_cri1,
  sub_criteria(attr_2, match_funcs = exact_match),
  operator = "or")
s_cri3
#> {
#>   {
#>   match_func(30,28,40 ...)
#>   } OR 
#>   {
#>   match_func(M,F,U ...)
#>   }
#> }

# Evaluate the match criteria
eval_sub_criteria(s_cri2)
#> $logical_test
#> [1] 1 0 0 1 0 0 1
#> 
eval_sub_criteria(s_cri3)
#> $logical_test
#> [1] 1 1 0 1 1 1 1
#> 

# Alternatively, using `attrs()`
AND_func <- function(x, y) range_match(x$a1, y$a1) & x$a2 == y$a2
OR_func <- function(x, y) range_match(x$a1, y$a1) | x$a2 == y$a2

## Create a match criteria
s_cri2b <- sub_criteria(attrs(.obj = list(a1 = attr_1, a2 = attr_2)),
                        match_funcs = AND_func)
s_cri3b <- sub_criteria(attrs(.obj = list(a1 = attr_1, a2 = attr_2)),
                        match_funcs = OR_func)

# Evaluate the match criteria
eval_sub_criteria(s_cri2b)
#> $logical_test
#> [1] 1 0 0 1 0 0 1
#> 
eval_sub_criteria(s_cri3b)
#> $logical_test
#> [1] 1 1 0 1 1 1 1
#>