**whSample** helps analysts quickly generate
statistical samples from Excel or Comma Separated Value (CSV) files and
write them to a new Excel workbook. Users have a choice of Simple Random
or Stratified Random samples, and a third choice of having each stratum
included in a separate worksheet.

See package vignettes for detailed documentation.

The workhorse function is *sampler*. A helper function,
*ssize*, estimates the minimum sample size necessary to achieve
statistical requirements using a Normal Approximation to the
Hypergeometric Distribution. This distribution describes the probabilities
of yes/no-type outcomes when sampling without replacement. *ssize* takes
four parameters:

- **N**, the population size.
- **ci**, the required confidence interval. The default is 95%.
- **me**, the required level of precision, or margin of error. The default is +/- 7%.
- **p**, the anticipated rate of occurrence. The default is 50%.

*ssize(N, ci=0.95, me=0.07, p=0.50)* (showing the defaults)
requires only the **N** argument. Called on its own, it
can be used to explore sample sizes under other conditions. For example,
a probe sample may suggest that a 50-50 probability isn’t realistic. A
revised sample size can then be estimated with the observed success
probability (p=0.6, for example).
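
The estimate described above can be sketched in a few lines of R. This is a minimal sketch, assuming the common finite-population form of the normal approximation, n = N z² p(1-p) / (me² (N-1) + z² p(1-p)). The name `ssize_sketch` is hypothetical, and the package's own *ssize* may differ in rounding or other details.

```r
# Hypothetical stand-in for ssize(); NOT the package's own code.
# Assumes the standard finite-population sample-size formula.
ssize_sketch <- function(N, ci = 0.95, me = 0.07, p = 0.50) {
  z <- qnorm(1 - (1 - ci) / 2)   # two-sided critical value for the confidence level
  v <- z^2 * p * (1 - p)         # scaled variance of a yes/no response
  ceiling(N * v / (me^2 * (N - 1) + v))
}

# Defaults, then a revised estimate after a probe sample suggests p = 0.60
n_default <- ssize_sketch(5000)
n_revised <- ssize_sketch(5000, p = 0.60)
```

Under this formula, p = 0.5 maximizes p(1-p), so the default is the most conservative choice: moving **p** away from 50% can only lower the estimate.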

The *sampler* function calls *ssize* to get its sample
size estimate. It therefore accepts the **ci**,
**me**, and **p** arguments, which it passes
to *ssize*.

*sampler* also takes four additional arguments:

- **irisData** opens the file chooser to a folder with example files of Anderson’s Iris dataset of flower characteristics,
- **backups** provides a buffer of extra samples to replace any found to be invalid for some reason,
- **seed** is used to seed the internal random number generator, and
- **keepOrg** determines whether a copy of the population is included in the output.

The defaults for these additional arguments are *backups=5*,
*irisData=F*, *seed=NULL* and *keepOrg=F*. The
default seed tells *sampler* to use the current system time
in milliseconds. Any number can be used as a seed. Whichever one is used
is listed in the *Report* output tab. The keep-original
option (*keepOrg*) defaults to FALSE, but can be set to
*keepOrg=T* for smaller populations that won’t exceed Excel’s
limit of 1,048,576 rows.
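
The reason a recorded seed matters can be illustrated with base R alone. This is a minimal sketch, not *sampler*'s internals: `pick_rows` is a hypothetical helper, and the population size of 150 rows is just an example.

```r
# Hypothetical helper: draw row numbers for a simple random sample.
# Illustrates seeding only; sampler() may draw its sample differently.
pick_rows <- function(n_rows, n_sample, seed) {
  set.seed(seed)                       # fix the random number generator state
  sort(sample.int(n_rows, n_sample))   # sample without replacement
}

first_draw  <- pick_rows(150, 10, seed = 12345)
second_draw <- pick_rows(150, 10, seed = 12345)
same_sample <- identical(first_draw, second_draw)  # same seed, same rows
```

Reusing the seed shown in a report should let a reviewer regenerate the same sample from the same source file, provided nothing else about the run changes.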

To override any of these defaults, enter *name=value* as an
argument.

*sampler* uses a series of menus to guide users through the
sampling process.

*sampler* creates a new Excel workbook with three parts:

- a copy of the original (source) data if previously requested,
- an Excel spreadsheet with the requested sample, and
- a new tab called *Report* with key reference information:
  - path and name of the source file
  - size (in rows) of the source file
  - sample type (Simple Random Sample, Stratified Random Sample, or Tabbed Stratified Sample)
  - sampling parameters
  - sample size
  - stratification key
  - number of strata
  - number of backups requested (this number is applied to every stratum in a stratified sample)
  - random number seed used, for documentation and reproducibility
  - date-time stamp of when the sample was generated
  - stratification information (name, number in the population, proportion of the population, and the number of samples)
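
The stratification figures listed above can be sketched with base R and the built-in `iris` data. This is a rough illustration that assumes proportional allocation across strata, which this README does not confirm. All names here (`n_total`, `picked`, `summary_tbl`) are hypothetical, and the total of 30 is an arbitrary example rather than an *ssize* result.

```r
# Assumption: each stratum's share of the sample matches its share of the
# population. This is NOT taken from the package source.
set.seed(12345)
n_total <- 30   # example overall sample size
backups <- 5    # extra rows drawn in every stratum

strata    <- split(iris, iris$Species)        # one data frame per stratum
stratum_n <- vapply(strata, nrow, integer(1)) # rows per stratum
n_samples <- ceiling(n_total * stratum_n / sum(stratum_n))

# Draw the allocated rows plus backups, never more than the stratum holds
picked <- lapply(names(strata), function(s) {
  rows <- strata[[s]]
  take <- min(nrow(rows), n_samples[[s]] + backups)
  rows[sample.int(nrow(rows), take), , drop = FALSE]
})
names(picked) <- names(strata)

# Name, population count, proportion, and number of samples per stratum
summary_tbl <- data.frame(
  stratum    = names(strata),
  population = stratum_n,
  proportion = stratum_n / sum(stratum_n),
  samples    = n_samples,
  row.names  = NULL
)
```

Keeping `picked` as a list with one element per stratum mirrors the idea of writing each stratum to its own worksheet.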

You can install *whSample* from CRAN with:

`install.packages("whSample")`

or get the latest developmental version with:

`devtools::install_github("km4ivi/whSample")`

*sampler* depends on several external packages to run
properly. If you’re running a developmental version, make sure these
packages are installed on your computer:

- tidyverse (or individually: magrittr, dplyr, purrr)
- openxlsx
- data.table
- tools
- utils
- tcltk
- bit64
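
One way to check for these packages before running a development version is sketched below. It only reports what is missing and leaves installation as a commented-out step. The variable names are hypothetical, and the list covers only the packages that come from CRAN, since *tools*, *utils*, and *tcltk* normally ship with R itself.

```r
# CRAN dependencies from the list above (tidyverse given by its components)
cran_deps <- c("magrittr", "dplyr", "purrr", "openxlsx", "data.table", "bit64")

# requireNamespace() returns FALSE rather than failing when a package is absent
installed    <- vapply(cran_deps, requireNamespace, logical(1), quietly = TRUE)
missing_deps <- cran_deps[!installed]

if (length(missing_deps) > 0) {
  message("Missing packages: ", paste(missing_deps, collapse = ", "))
  # install.packages(missing_deps)  # uncomment to install them
}
```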

*ssize(5000)*: N=5000, other arguments use defaults

*ssize(5000, p=0.60)*: N=5000, with a 60% expected rate of
occurrence

*sampler()*: Uses all defaults, gets N from the source
data.

*sampler(backups=2, seed=12345)*: Overrides specific
defaults