Prepare variables for add_summary — add

This function prepares a dataset for statistical analysis: it classifies variables as continuous or categorical, runs validity checks, and applies any necessary conversions. It automatically checks normality, equality of variances, and expected-frequency assumptions.

add_var(data, var = NULL, group = "group", norm = "auto", center = "median")

Arguments

data

A data frame containing the variables to analyze, with variables in columns and observations in rows.

var

A character vector of variable names to include. If NULL (the default), all columns except the group column are used.

group

A character string naming the grouping variable in data. Defaults to "group".

norm

Controls how normality is assessed. Accepts:

'auto' (default): Decide automatically based on p-values; behaves like 'ask' when n > 1000
'ask': Show p-values, draw Q-Q plots, and prompt for a decision
TRUE/'true': Always assume the data are normally distributed
FALSE/'false': Always assume the data are not normally distributed
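
The page does not say which normality test produces the p-values. As a minimal sketch, assuming a Shapiro-Wilk test per group and a 0.05 threshold (both assumptions, not confirmed behavior of add_var), an 'auto'-style decision could look like this in base R:

```r
# Sketch only: Shapiro-Wilk per group is an assumed stand-in for
# whatever test add_var() actually runs.
y <- iris$Sepal.Length
g <- iris$Species

# One p-value per group level
pvals <- tapply(y, g, function(x) shapiro.test(x)$p.value)

# Assumed decision rule: treat the variable as normal only if
# no group rejects normality at the chosen threshold
alpha <- 0.05
treat_as_normal <- all(pvals > alpha)

print(pvals)
print(treat_as_normal)
```

For the 'ask' mode, the same p-values would typically be shown next to `qqnorm()` and `qqline()` plots before the user decides.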

center

A character string specifying the center to use in Levene's test for equality of variances. Default is "median", which is more robust than the mean.
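
To illustrate what the center choice means, here is a minimal base R sketch of Levene's test: a one-way ANOVA on the absolute deviations of each observation from its group center. The helper name `levene_sketch` is hypothetical, and it takes the center as a function rather than a string; how add_var computes the test internally is not stated on this page.

```r
# Sketch only: levene_sketch() is a hypothetical helper, not part of the package.
levene_sketch <- function(y, g, center = median) {
  g <- as.factor(g)
  # Absolute deviation of each observation from its own group's center
  dev <- abs(y - ave(y, g, FUN = center))
  # One-way ANOVA on the deviations; its F test is the Levene statistic
  fit <- anova(lm(dev ~ g))
  c(F = fit[["F value"]][1], p = fit[["Pr(>F)"]][1])
}

# Median-centered version (often called the Brown-Forsythe variant)
res_median <- levene_sketch(iris$Sepal.Length, iris$Species, center = median)

# Mean-centered version, for comparison
res_mean <- levene_sketch(iris$Sepal.Length, iris$Species, center = mean)

print(res_median)
print(res_mean)
```

Centering on the median makes the deviations less sensitive to skewed or heavy-tailed groups, which is why it is usually the safer default.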

Value

A modified data frame with an attribute 'add_var' containing a list of categorized variables and their properties:

var: List of categorized variables:
- valid: All valid variable names after checks
- continuous: Sublist of continuous variables (further divided by normality/equal variance)
- categorical: Sublist of categorical variables (further divided by ordered/expected frequency)
group: Grouping variable name
overall_n: Total number of observations
group_n: Observation counts per group
group_nlevels: Number of groups
group_levels: Group level names
norm: Normality check method used
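
The categorical sublist is split by expected frequency, but the exact rule is not given here. As a hedged sketch, the following base R code computes expected cell counts under independence and applies a common rule of thumb (no expected count below 1, and at most 20% of cells below 5). The rule and the `mtcars` data are assumptions chosen for illustration only.

```r
# Sketch only: the thresholds below are a common convention,
# not necessarily the rule add_var() applies.
tab <- table(mtcars$cyl, mtcars$am)

# Expected counts under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

min_expected <- min(expected)
prop_below_5 <- mean(expected < 5)

# TRUE when the chi-squared approximation is usually considered adequate
expected_ok <- min_expected >= 1 && prop_below_5 <= 0.2

print(round(expected, 2))
print(expected_ok)
```

When the check fails, an exact test such as `fisher.test()` is the usual alternative.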

Examples

if (FALSE) {
  # Prepare selected iris columns, using Species as the grouping variable
  datalist <- add_var(iris, var = c("Sepal.Length", "Species"), group = "Species")
}