Creates a rule that checks whether values in a local column exist in a
column of a referenced dataset. Use with check_data() by supplying x as
a named list of datasets and setting data_name in ruleset() (or by
ordering the list so the first entry is the primary dataset).
Usage
reference_rule(
  local_col,
  ref_dataset,
  ref_col,
  name = NA,
  allow_na = FALSE,
  negate = FALSE,
  ...
)
Arguments
- local_col
column name in the primary dataset.
- ref_dataset
name of the referenced dataset in the x list.
- ref_col
column name in the referenced dataset.
- name
optional display name for the rule.
- allow_na
logical; if TRUE, missing values in local_col are treated as passing.
- negate
logical; if TRUE, inverts the rule (values must not be in the referenced column).
- ...
additional fields attached to the rule object.
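The membership test and the effect of allow_na can be illustrated in base R. This is only a sketch of the idea, not the package's implementation: the variable names are hypothetical, the values are copied from the first example in Examples, and only the default case (negate = FALSE) is covered.

```r
# Base R sketch only; not how reference_rule() is implemented internally.
local_vals <- c("AA", "BB", NA_character_)  # flights$carrier
ref_vals   <- c("AA")                       # carriers$carrier_id
allow_na   <- TRUE

# %in% never returns NA: a missing local value just counts as "not found".
found <- local_vals %in% ref_vals

# With allow_na = TRUE, missing local values are counted as passing.
passes <- found | (allow_na & is.na(local_vals))

sum(passes)   # 2 rows pass ("AA" and NA)
sum(!passes)  # 1 row fails ("BB")
```

These counts agree with the pass and fail columns reported for the first example.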
Value
A reference_rule object that can be included in ruleset().
Examples
flights <- data.frame(carrier = c("AA", "BB", NA_character_))
carriers <- data.frame(carrier_id = c("AA"))
rs <- ruleset(
  reference_rule(
    local_col = "carrier",
    ref_dataset = "carriers",
    ref_col = "carrier_id",
    allow_na = TRUE
  ),
  data_name = "flights"
)
check_data(list(flights = flights, carriers = carriers), rs)
#>        check_type                                           name
#>            <char>                                         <char>
#> 1: reference_rule Reference rule: carrier in carriers$carrier_id
#>                                expr allow_na negate tests  pass  fail   warn
#>                              <char>   <lgcl> <lgcl> <int> <int> <int> <char>
#> 1: carrier %in% carriers$carrier_id     TRUE  FALSE     3     2     1
#>     error             time
#>    <char>       <difftime>
#> 1:        5.65052e-05 secs
# negated relation: value must NOT exist in blacklist
blacklist <- data.frame(carrier_id = c("XX", "YY"))
rs_neg <- ruleset(
  reference_rule(
    local_col = "carrier",
    ref_dataset = "blacklist",
    ref_col = "carrier_id",
    negate = TRUE,
    allow_na = TRUE
  ),
  data_name = "flights"
)
check_data(list(flights = flights, blacklist = blacklist), rs_neg)
#>        check_type                                            name
#>            <char>                                          <char>
#> 1: reference_rule Reference rule: carrier in blacklist$carrier_id
#>                                 expr allow_na negate tests  pass  fail   warn
#>                               <char>   <lgcl> <lgcl> <int> <int> <int> <char>
#> 1: carrier %in% blacklist$carrier_id     TRUE   TRUE     3     2     1
#>     error              time
#>    <char>        <difftime>
#> 1:        5.054474e-05 secs