This function prepares a dataset for statistical analysis by categorizing variables as continuous or categorical, performing validity checks, and applying necessary conversions. It automatically checks normality, equality of variances, and expected-frequency assumptions.

add_var(data, var = NULL, group = "group", norm = "auto", center = "median")

Arguments

data

A data frame containing the variables to analyze, with variables in columns and observations in rows.

var

A character vector of variable names to include. If NULL (the default), all columns except the group column are used.

group

A character string specifying the grouping variable in data. Defaults to 'group'.

norm

Control parameter for normality tests. Accepts:

  • 'auto' (default): Decides automatically based on p-values; behaves like 'ask' when n > 1000

  • 'ask': Shows p-values, draws Q-Q plots, and prompts for a decision

  • TRUE/'true': Always assumes the data are normally distributed

  • FALSE/'false': Always assumes the data are not normally distributed
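
The documentation does not say which normality test or threshold 'auto' relies on. As a rough illustration of a p-value-based decision, the sketch below assumes a per-group Shapiro-Wilk test and a cutoff of 0.05. The helper name is_normal_by_group and its alpha argument are hypothetical and are not part of this package.

```r
# Illustrative sketch only: not the package's actual implementation.
# Assumption: normality is judged per group with the Shapiro-Wilk test
# at a hypothetical threshold alpha = 0.05.
is_normal_by_group <- function(x, g, alpha = 0.05) {
  # Shapiro-Wilk p-value for each group (needs 3 to 5000 values per group)
  p_values <- tapply(x, g, function(v) stats::shapiro.test(v)$p.value)
  # Treat the variable as normal only if no group rejects normality
  list(p_values = p_values, normal = all(p_values > alpha))
}

res <- is_normal_by_group(iris$Sepal.Length, iris$Species)
res$p_values  # one p-value per species
res$normal    # single TRUE/FALSE decision
```

Requiring every group to pass is only one possible rule; a real implementation may combine the p-values differently.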

center

A character string specifying the center to use in Levene's test for equality of variances. Default is "median", which is more robust than the mean.
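
To show what the choice of center changes, here is a minimal base-R sketch of Levene's test: a one-way ANOVA on the absolute deviations from each group's center. It is an illustration only and may differ from what add_var runs internally; levene_sketch is a hypothetical helper, and it takes the centering function itself rather than a character string.

```r
# Illustrative sketch only: a base-R version of Levene's test.
# center = median gives the Brown-Forsythe variant; center = mean gives
# the original mean-centered test.
levene_sketch <- function(x, g, center = median) {
  g <- as.factor(g)
  # Absolute deviation of each observation from its own group's center
  z <- abs(x - ave(x, g, FUN = center))
  # One-way ANOVA on the deviations; its F test is the Levene statistic
  fit <- anova(lm(z ~ g))
  c(F = fit[["F value"]][1], p = fit[["Pr(>F)"]][1])
}

lev_median <- levene_sketch(iris$Sepal.Length, iris$Species)
lev_mean   <- levene_sketch(iris$Sepal.Length, iris$Species, center = mean)
lev_median
lev_mean
```

Centering on the median makes the deviations less sensitive to skewed or heavy-tailed data, which is the robustness the default refers to.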

Value

A modified data frame with an attribute 'add_var' containing a list of categorized variables and their properties:

  • var: List of categorized variables:

    • valid: All valid variable names after checks

    • continuous: Sublist of continuous variables (further divided by normality/equal variance)

    • categorical: Sublist of categorical variables (further divided by ordered/expected frequency)

  • group: Grouping variable name

  • overall_n: Total number of observations

  • group_n: Observation counts per group

  • group_nlevels: Number of groups

  • group_levels: Group level names

  • norm: Normality check method used
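
The group-related fields above can be pictured with a small base-R mock-up that attaches a similar list to a data frame and reads it back with attr(). This sketch does not call add_var(); the values are computed directly from iris, and the exact types the function stores (for example, whether group_n is a table) are assumptions.

```r
# Illustrative sketch only: mimics part of the documented structure with
# base R, without calling add_var(). Field names follow the list above;
# the stored types are assumptions.
df <- iris
group <- "Species"
g <- as.factor(df[[group]])

attr(df, "add_var") <- list(
  group         = group,       # grouping variable name
  overall_n     = nrow(df),    # total number of observations
  group_n       = table(g),    # observation counts per group
  group_nlevels = nlevels(g),  # number of groups
  group_levels  = levels(g)    # group level names
)

# Read the metadata back from the data frame
meta <- attr(df, "add_var")
meta$overall_n
meta$group_levels
```

Note that many base-R operations that rebuild a data frame, such as row subsetting, drop custom attributes like this one, so it is safest to read the metadata from the object returned by the function itself.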

Examples

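if (FALSE) {
  # Process two iris columns, using Species as the grouping variable
  datalist <- add_var(iris, var = c("Sepal.Length", "Species"), group = "Species")
  # Inspect the categorized variables and group metadata
  attr(datalist, "add_var")
}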