describe_data returns a set of common descriptive statistics (e.g., n, mean, sd) for numeric variables.
Arguments
- data
A data frame.
- column
The unquoted name of a numeric column in the data frame.
- na.rm
Logical. Should missing values (including NaN) be excluded when calculating the descriptives? The default is TRUE.
- short
Logical. Should only a subset of descriptives be reported? If set to TRUE, only the N, M, and SD will be returned. The default is FALSE.
Details
The data can be grouped using dplyr::group_by so that descriptives will be calculated for each group level.
When na.rm is set to FALSE, a percentage column will be added to the output that contains the percentage of non-missing data.
Skew and kurtosis are based on the skewness and kurtosis functions of the moments package (Komsta & Novomestky, 2015).
Percentages are calculated based on the total of non-missing observations. When na.rm is set to FALSE, percentages are based on the total of missing and non-missing observations.
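To make the skew and kurtosis columns easier to interpret, the following is a minimal base R sketch of the moment-based definitions that the moments package is understood to use. The helper names moment_skewness and moment_kurtosis are hypothetical and are not part of tidystats or moments. Whether describe_data post-processes these values is not stated here, so treat this as an illustration rather than the package's implementation.

```r
# Hypothetical helpers illustrating moment-based skewness and kurtosis.
# These are not functions from tidystats or moments.

moment_skewness <- function(x, na.rm = TRUE) {
  # is.na() is TRUE for both NA and NaN, so both are dropped
  if (na.rm) x <- x[!is.na(x)]
  n <- length(x)
  d <- x - mean(x)
  # third central moment divided by the second central moment to the power 3/2
  (sum(d^3) / n) / (sum(d^2) / n)^(3 / 2)
}

moment_kurtosis <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  n <- length(x)
  d <- x - mean(x)
  # fourth central moment divided by the squared second central moment
  # (Pearson's kurtosis, not excess kurtosis)
  n * sum(d^4) / sum(d^2)^2
}

x <- c(1, 2, 3, 4, 10, NA)
moment_skewness(x)
moment_kurtosis(x)
```

Under this definition a normal distribution has a kurtosis of about 3 rather than 0, which matters when comparing the kurtosis column with software that reports excess kurtosis.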
Examples
# Inspect descriptives of the response column from the 'quote_source' data
# frame included in tidystats
describe_data(quote_source, response)
#> # A tibble: 1 × 13
#> var missing N M SD SE min max range median mode skew
#> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 respon… 18 6325 5.59 2.19 0.0275 1 9 8 5 5 -0.137
#> # ℹ 1 more variable: kurtosis <dbl>
# Repeat the above, now for each level of the source column
quote_source |>
  dplyr::group_by(source) |>
  describe_data(response)
#> # A tibble: 2 × 14
#> # Groups: source [2]
#> var source missing N M SD SE min max range median mode
#> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 respon… Bin L… 18 3083 5.23 2.11 0.0380 1 9 8 5 5
#> 2 respon… Washi… 0 3242 5.93 2.21 0.0388 1 9 8 6 5
#> # ℹ 2 more variables: skew <dbl>, kurtosis <dbl>
# Only inspect the total N, mean, and standard deviation
quote_source |>
  dplyr::group_by(source) |>
  describe_data(response, short = TRUE)
#> # A tibble: 2 × 5
#> # Groups: source [2]
#> var source N M SD
#> <chr> <chr> <int> <dbl> <dbl>
#> 1 response Bin Laden 3083 5.23 2.11
#> 2 response Washington 3242 5.93 2.21