Get started with the jfa package

Koen Derks

2023-07-16

Introduction

Welcome to the ‘Get started’ vignette of the jfa package. jfa is an R package that provides Bayesian and classical statistical methods for audit sampling and data auditing. First, the package provides functions for planning, performing, evaluating, and reporting an audit sample compliant with international standards on auditing. Second, the package includes functions for auditing data, such as testing the distribution of first digits of a data set against Benford’s law.

This vignette gives a brief overview of the functionality in the package. For a more detailed explanation of the individual functions, see the other vignettes on the package website.

Audit sampling

jfa provides a user-friendly interface for statistical audit sampling. The standard audit sampling workflow is divided into four distinct parts: planning a sample, selecting the sample from the population, executing the audit, and evaluating the misstatement by extrapolating the errors in the sample to the population.

Figure 1: The audit sampling workflow

See the package vignette Audit sampling: Get started for more details about jfa’s audit sampling functionality.

Functions and intended usage

To illustrate jfa’s statistical audit sampling functionality, consider the BuildIt data set that comes with the package. This data set consists of 3500 items from a financial statement line item, each of which has a book value and (for illustrative purposes) a true (audit) value.

# Load the jfa package and the BuildIt population
library(jfa)
data("BuildIt")
head(BuildIt)
##      ID bookValue auditValue
## 1 82884    242.61     242.61
## 2 25064    642.99     642.99
## 3 81235    628.53     628.53
## 4 71769    431.87     431.87
## 5 55080    620.88     620.88
## 6 93224    501.76     501.76

The first step in the audit sampling workflow is to calculate the minimum sample size required for the purpose of the sample. You can use the sample to 1) obtain evidence for or against the hypothesis that the misstatement in the population is lower than the performance materiality and/or 2) estimate the misstatement in the population. The planning() function calculates this minimum sample size. The example below uses a performance materiality of 3%, an expected misstatement of 1%, a Poisson likelihood, and a 95% confidence level.

# Stage 1: Planning
stage1 <- planning(
  materiality = 0.03, expected = 0.01,
  likelihood = "poisson", conf.level = 0.95
)
summary(stage1)
## 
##  Classical Audit Sample Planning Summary
## 
## Options:
##   Confidence level:              0.95 
##   Materiality:                   0.03 
##   Hypotheses:                    H₀: Θ >= 0.03 vs. H₁: Θ < 0.03 
##   Expected:                      0.01 
##   Likelihood:                    poisson 
## 
## Results:
##   Minimum sample size:           220 
##   Tolerable errors:              2.2 
##   Expected most likely error:    0.01 
##   Expected upper bound:          0.02997 
##   Expected precision:            0.01997 
##   Expected p-value:              0.049761
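The planning result above can be approximated with a short base-R search. This is a sketch under stated assumptions, not jfa's internal code: it assumes that, for the Poisson likelihood, the upper bound is the conf.level quantile of a gamma distribution with shape 1 + n * expected and rate n, with the expected errors n * expected left unrounded (as the "Tolerable errors: 2.2" line suggests), and that the minimum sample size is the smallest n whose expected upper bound does not exceed the materiality. The helper poisson_sample_size() is hypothetical.

```r
# Hypothetical helper: smallest n whose expected upper bound is <= materiality,
# assuming the gamma (Poisson) formulation described above
poisson_sample_size <- function(materiality, expected, conf.level = 0.95,
                                max.n = 5000) {
  for (n in seq_len(max.n)) {
    x <- n * expected                                  # expected errors (not rounded)
    bound <- qgamma(conf.level, shape = 1 + x, rate = n)
    if (bound <= materiality) {
      # One-sided p-value: upper-tail gamma probability at the materiality
      p <- pgamma(materiality, shape = 1 + x, rate = n, lower.tail = FALSE)
      return(list(n = n, errors = x, bound = bound, p.value = p))
    }
  }
  stop("No sample size found up to max.n")
}

res <- poisson_sample_size(materiality = 0.03, expected = 0.01)
res$n      # minimum sample size under these assumptions
res$bound  # expected upper bound
```

Under these assumptions the search stops at n = 220 with an expected upper bound of about 0.02997, in line with the summary. Because the bound only just falls below 0.03, the corresponding p-value is only just below 0.05.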

The selection() function can be used to select the required sample from the population. Below, monetary units are selected using fixed interval sampling with a starting point of 1.

# Stage 2: Selection
stage2 <- selection(
  data = BuildIt, size = stage1,
  units = "values", values = "bookValue",
  method = "interval", start = 1
)
summary(stage2)
## 
##  Audit Sample Selection Summary
## 
## Options:
##   Requested sample size:         220 
##   Sampling units:                monetary units 
##   Method:                        fixed interval sampling 
##   Starting point:                1 
## 
## Data:
##   Population size:               3500 
##   Population value:              1403221 
##   Selection interval:            6378.3 
## 
## Results:
##   Selected sampling units:       220 
##   Proportion of value:           0.080554 
##   Selected items:                220 
##   Proportion of size:            0.062857
# Stage 3: Execution
# Extract the selected items; in practice, the auditor determines their audit values
sample <- stage2[["sample"]]
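Fixed interval sampling of monetary units can be sketched in base R. This sketch assumes that the selection interval equals the total book value divided by the sample size (1403221 / 220 ≈ 6378.3, matching the summary above), that the selected monetary units are start, start + interval, start + 2 * interval, and so on, and that each unit is mapped to the item whose cumulative book value range contains it. The function select_fixed_interval() and the toy book values are hypothetical, and jfa's own implementation may differ in details.

```r
# Hypothetical sketch of fixed interval monetary unit selection
select_fixed_interval <- function(book_values, n, start = 1) {
  total <- sum(book_values)
  interval <- total / n                             # selection interval
  units <- start + interval * (0:(n - 1))           # selected monetary units
  cum_values <- cumsum(book_values)
  # Map each monetary unit to the item whose range (previous cumsum, cumsum] holds it
  items <- findInterval(units, c(0, cum_values), left.open = TRUE)
  list(interval = interval, units = units, items = items)
}

# Toy population of five items (hypothetical book values)
book <- c(120, 40, 300, 15, 25)
sel <- select_fixed_interval(book, n = 4, start = 1)
sel$interval  # 125
sel$items     # 1 2 3 3: item 3 (book value 300 > interval 125) is hit twice
```

Because selection is proportional to book value, an item whose book value exceeds the interval can be selected more than once; in the BuildIt sample above, the 220 sampling units correspond to 220 distinct items.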

The evaluation() function can be used to evaluate the sample and extrapolate the misstatement to the population. Here, the upper bound is calculated using the Stringer method.

# Stage 4: Evaluation
stage4 <- evaluation(
  materiality = 0.03, method = "stringer",
  conf.level = 0.95, data = sample,
  values = "bookValue", values.audit = "auditValue"
)
summary(stage4)
## 
##  Classical Audit Sample Evaluation Summary
## 
## Options:
##   Confidence level:               0.95 
##   Materiality:                    0.03 
##   Method:                         stringer 
## 
## Data:
##   Sample size:                    220 
##   Number of errors:               5 
##   Sum of taints:                  2.9999929 
## 
## Results:
##   Most likely error:              0.013636 
##   95 percent confidence interval: [0, 0.033724] 
##   Precision:                      0.020087
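The Stringer bound combines the taints of the misstated items (the relative errors (book value - audit value) / book value) with upper confidence limits for 0, 1, ..., k errors. The sketch below assumes binomial (Clopper-Pearson) limits, uses only positive (overstatement) taints, and orders them from largest to smallest. The function stringer_bound() and the taint values are hypothetical, so its output is not meant to reproduce the 0.033724 above. The most likely error in the summary is the sum of taints divided by the sample size (2.9999929 / 220 ≈ 0.013636), and the precision is the upper bound minus the most likely error.

```r
# Hypothetical sketch of the Stringer upper bound (binomial limits assumed)
stringer_bound <- function(taints, n, conf.level = 0.95) {
  t <- sort(taints[taints > 0], decreasing = TRUE)  # overstatement taints, largest first
  k <- length(t)
  # Clopper-Pearson upper limits for 0, 1, ..., k errors in n items
  p <- qbeta(conf.level, shape1 = 1 + 0:k, shape2 = n - 0:k)
  # Basic precision plus taint-weighted increments between successive limits
  p[1] + sum(diff(p) * t)
}

taints <- c(1, 0.5, 0.25)        # hypothetical taints of three misstatements
n <- 100
mle <- sum(taints) / n           # most likely error: 0.0175
ub <- stringer_bound(taints, n)  # Stringer upper bound
precision <- ub - mle
```

With no errors the bound reduces to the basic precision qbeta(0.95, 1, n) = 1 - 0.05^(1/n).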

Data auditing

The digit_test() function can be used to test the distribution of leading or last digits in a variable against a pre-specified distribution (e.g., Benford’s law).

See the package vignette Data auditing: Get started for more details about jfa’s data auditing functionality.

Functions and intended usage

# Digit distribution test
x <- digit_test(sinoForest$value, check = "first", reference = "benford")
print(x)
## 
##  Digit Distribution Test
## 
## data:  sinoForest$value
## n = 772, MAD = 0.0065981, X-squared = 7.6517, df = 8, p-value = 0.4682
## alternative hypothesis: leading digit(s) are not distributed according to the benford distribution.
plot(x)
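Under Benford's law, the probability that the first digit equals d is log10(1 + 1/d) for d = 1, ..., 9. The base-R sketch below extracts first digits and computes the two statistics printed above: the mean absolute deviation (MAD) between observed and expected digit proportions, and Pearson's chi-squared statistic with 9 - 1 = 8 degrees of freedom. The helpers first_digit() and benford_test() and the example vector are hypothetical; this illustrates the statistics and is not jfa's implementation.

```r
# Hypothetical sketch: first-digit test against Benford's law
first_digit <- function(x) {
  x <- abs(x[!is.na(x) & x != 0])                  # drop missing values and zeros
  # Scientific notation puts the leading significant digit first
  as.integer(substr(sprintf("%.14e", x), 1, 1))
}

benford_test <- function(x) {
  d <- first_digit(x)
  n <- length(d)
  expected <- log10(1 + 1 / (1:9))                 # Benford probabilities for digits 1-9
  counts <- tabulate(d, nbins = 9)                 # observed counts for digits 1-9
  mad <- mean(abs(counts / n - expected))          # mean absolute deviation
  chisq <- sum((counts - n * expected)^2 / (n * expected))
  p <- pchisq(chisq, df = 8, lower.tail = FALSE)   # 9 digits - 1 = 8 df
  list(n = n, MAD = mad, statistic = chisq, p.value = p)
}

x <- c(123.4, 18.2, 2001, 305, 0.042, 1.5, 77, 1150, 9.9, 26)
benford_test(x)
```

Formatting in scientific notation avoids floating-point surprises of the floor(x / 10^floor(log10(x))) approach, which can return 2 instead of 3 for 0.3.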

The repeated_test() function can be used to test whether a variable contains more repeated values than would be expected in random data.

# Repeated values test
x <- repeated_test(sanitizer$value, check = "lasttwo", samples = 5000)
print(x)
## 
##  Repeated Values Test
## 
## data:  sanitizer$value
## n = 1600, AF = 1.5225, p-value = 0.0026
## alternative hypothesis: average frequency in data is greater than for random data.
plot(x)
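The average frequency (AF) printed above summarizes how often values repeat. A minimal sketch, assuming the definition AF = sum(f^2) / sum(f), where f are the counts of the distinct values, is shown below: AF equals 1 when every value is unique and grows as values repeat. The helper average_frequency() is hypothetical, and the sketch ignores both the check = "lasttwo" option and the simulation (samples = 5000) that repeated_test() uses to obtain its p-value.

```r
# Hypothetical sketch: average frequency of repeated values
average_frequency <- function(x) {
  f <- table(x)           # count of each distinct value
  sum(f^2) / sum(f)       # average, over observations, of how often their value occurs
}

average_frequency(c(1, 2, 3, 4))  # 1: all values are unique
average_frequency(c(5, 5, 5, 7))  # (3^2 + 1^2) / 4 = 2.5
```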