Checks if a dataset conforms to a given set of rules

Usage

check_data(
  x,
  rules,
  xname = deparse(substitute(x)),
  stop_on_fail = FALSE,
  stop_on_warn = FALSE,
  stop_on_error = FALSE
)

Arguments

x

a dataset, either a data.frame, dplyr::tibble, data.table::data.table, arrow::arrow_table, arrow::open_dataset, or dplyr::tbl (SQL connection)

rules

a list of rules

xname

optional, a name for the x variable (only used in error messages)

stop_on_fail

if TRUE, throw an error with stop() when any of the rules fails (default: FALSE)

stop_on_warn

if TRUE, throw an error with stop() when a warning is raised while evaluating a rule (default: FALSE)

stop_on_error

if TRUE, throw an error with stop() when an error is raised while evaluating a rule (default: FALSE)
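
The stop_on_fail argument can act as a hard gate in a script. The following is a minimal sketch, assuming the package that provides rule(), ruleset(), and check_data() (dataverifyr) is installed. The wording of the error message is not documented on this page, so the sketch only captures the message and does not match on it.

```r
library(dataverifyr)

rs <- ruleset(
  rule(mpg > 10),
  rule(cyl %in% c(4, 6)) # fails for the 8-cylinder cars in mtcars
)

# With stop_on_fail = TRUE a failing rule raises an error instead of
# returning the result table; tryCatch() captures it here for illustration.
outcome <- tryCatch(
  {
    check_data(mtcars, rs, stop_on_fail = TRUE)
    "all rules passed"
  },
  error = function(e) paste("check failed:", conditionMessage(e))
)
outcome
```

In a pipeline one would usually let the error propagate rather than catch it, so that downstream steps do not run on invalid data.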

Value

a data.frame-like object with one row for each rule and its results

See also

Examples

rs <- ruleset(
  rule(mpg > 10),
  rule(cyl %in% c(4, 6)), # missing 8
  rule(qsec >= 14.5 & qsec <= 22.9)
)
rs
#> <Verification Ruleset with 3 elements>
#>   [1] 'Rule for: mpg' matching `mpg > 10` (allow_na: FALSE)
#>   [2] 'Rule for: cyl' matching `cyl %in% c(4, 6)` (allow_na: FALSE)
#>   [3] 'Rule for: qsec' matching `qsec >= 14.5 & qsec <= 22.9` (allow_na: FALSE)

check_data(mtcars, rs)
#>              name                        expr allow_na negate tests pass fail
#> 1:  Rule for: mpg                    mpg > 10    FALSE  FALSE    32   32    0
#> 2:  Rule for: cyl            cyl %in% c(4, 6)    FALSE  FALSE    32   18   14
#> 3: Rule for: qsec qsec >= 14.5 & qsec <= 22.9    FALSE  FALSE    32   32    0
#>    warn error              time
#> 1:            0.0052380562 secs
#> 2:            0.0027456284 secs
#> 3:            0.0002403259 secs
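
Because the result has one row per rule, it can be filtered like any other data.frame. The sketch below, which assumes the same three rules as above and uses only the columns visible in the printed output (name, tests, pass, fail), shows one possible way to pull out the failing rules and compute a pass rate.

```r
library(dataverifyr)

rs <- ruleset(
  rule(mpg > 10),
  rule(cyl %in% c(4, 6)),
  rule(qsec >= 14.5 & qsec <= 22.9)
)
res <- check_data(mtcars, rs)

# Keep only the rules with at least one failing row.
failed <- res[res$fail > 0, ]
failed$name

# Share of tested rows that passed each rule.
pass_rate <- res$pass / res$tests
pass_rate
```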