Package 'ezEDA' reference manual

Title:	Task Oriented Interface for Exploratory Data Analysis
Description:	Enables users to create visualizations using functions based on the data analysis task rather than on plotting mechanics. It hides the details of the individual 'ggplot2' function calls and allows the user to focus on the end goal. Useful for quick preliminary explorations. Provides functions for common exploration patterns. Some of the ideas in this package are motivated by Fox (2015, ISBN:1938377052).
Authors:	Viswa Viswanathan [aut, cre]
Maintainer:	Viswa Viswanathan <[email protected]>
License:	MIT + file LICENSE
Version:	0.1.1
Built:	2025-03-12 04:02:06 UTC
Source:	https://github.com/kviswana/ezeda

Plot the contribution of different categories to a measure

Description

Plot the contribution of different categories to a measure

Usage

category_contribution(data, category, measure)

Arguments

`data`	A data frame or tibble
`category`	Unquoted column name of the category (can be factor, character or numeric)
`measure`	Unquoted column name of the measure

Value

A ggplot plot object

Examples

category_contribution(ggplot2::diamonds, cut, price)
category_contribution(ggplot2::diamonds, clarity, price)
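To make the idea of a "contribution" plot concrete, here is a minimal ggplot2 sketch. It assumes that a category's contribution is the sum of the measure over that category's rows; `contribution_sketch()` and the toy data are hypothetical, and the package's actual implementation may differ (for example, by showing shares instead of totals).

```r
library(ggplot2)

# Hypothetical sketch, not ezEDA's code. Assumes a category's "contribution"
# is the sum of the measure over that category's rows.
contribution_sketch <- function(data, category, measure) {
  # factor() lets a numeric category be drawn as discrete bars
  ggplot(data, aes(x = factor({{ category }}), y = {{ measure }})) +
    # stat_summary() collapses each category to sum(measure); geom "col" draws it
    stat_summary(fun = sum, geom = "col")
}

toy <- data.frame(grp = c("a", "a", "b"), amount = c(1, 2, 5))
p <- contribution_sketch(toy, grp, amount)
print(p)
```

The same call works on the data sets used above, for example `contribution_sketch(diamonds, cut, price)`.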

Plot counts of a category

Description

Plot counts of a category

Usage

category_tally(data, category_column)

Arguments

`data`	A data frame or tibble
`category_column`	Unquoted column name of category (can be factor, character or numeric)

Value

A ggplot plot object

Examples

category_tally(ggplot2::mpg, class)
category_tally(ggplot2::diamonds, cut)
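A tally plot is essentially a bar chart of row counts per category. The sketch below, using the hypothetical name `tally_sketch()`, shows one way to get that behaviour with ggplot2's `geom_bar()`, including a numeric category such as `cyl`; it is an illustration under that assumption, not the package's code.

```r
library(ggplot2)

# Hypothetical sketch of a tally plot; ezEDA's own implementation may differ.
tally_sketch <- function(data, category_column) {
  # factor() treats a numeric column (e.g. cyl) as discrete categories
  ggplot(data, aes(x = factor({{ category_column }}))) +
    geom_bar()  # geom_bar() counts the rows in each category
}

p <- tally_sketch(mpg, cyl)
counts <- layer_data(p)$count  # one count per cylinder value
print(p)
```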

Private utility function: given a possibly non-factor column passed as a quosure, convert it to a factor

Description

Private utility function: given a possibly non-factor column passed as a quosure, convert it to a factor

Usage

col_to_factor(data, col_enquo)

Arguments

`data`	A data frame or tibble
`col_enquo`	A quosure identifying the column to convert

Value

A data frame or tibble with the corresponding column converted to a factor if necessary
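Because this helper is private, its code is not part of the documented interface. The following is a minimal sketch of the idea, assuming the quosure wraps a bare column name; it uses rlang's `enquo()` and `as_name()`, and the names `col_to_factor_sketch()` and `caller()` are hypothetical.

```r
library(rlang)

# Hypothetical stand-in for the private helper; assumes col_enquo wraps a
# bare column name such as `cyl`.
col_to_factor_sketch <- function(data, col_enquo) {
  col_name <- as_name(col_enquo)  # quosure -> "cyl"
  if (!is.factor(data[[col_name]])) {
    data[[col_name]] <- as.factor(data[[col_name]])
  }
  data
}

# A caller captures the user's unquoted argument with enquo() before passing it on
caller <- function(data, col) col_to_factor_sketch(data, enquo(col))

out <- caller(mtcars, cyl)
levels(out$cyl)  # "4" "6" "8"
```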

ezeda: A package for task-oriented exploratory data analysis

Description

The ezeda package provides visualization functions for exploratory data analysis. Whereas graphics packages generally provide many functions that users assemble into suitable plots, each ezeda function wraps ggplot2 and other code to generate a complete plot for a common exploratory data analysis task that corresponds to a recurring pattern.

Details

ezeda provides five categories of functions: tally, contribution, measure distribution, measure relationship, and measure trend.

tally functions

category_tally
two_category_tally

contribution functions

category_contribution
two_category_contribution

measure distribution functions

measure_distribution
measure_distribution_by_category
measure_distribution_by_two_categories
measure_distribution_over_time

measure relationship functions

two_measures_relationship
multi_measures_relationship

measure trend functions

measure_change_over_time_wide
measure_change_over_time_long

Plot the change of a measure (or set of measures) over time where the data is in "long" format: all measure values are in one column, and another column labels which measure each value belongs to

Description

Plot the change of a measure (or set of measures) over time where the data is in "long" format: all measure values are in one column, and another column labels which measure each value belongs to

Usage

measure_change_over_time_long(
  data,
  time_col,
  measure_labels,
  measure_values,
  ...
)

Arguments

`data`	A data frame or tibble
`time_col`	Unquoted column name with time values to plot on the x axis
`measure_labels`	Unquoted name of the column that identifies which measure each row's value in `measure_values` belongs to (up to 6 measures)
`measure_values`	Unquoted name of the column containing the measure values to be plotted
`...`	Unquoted names of the measures to plot, as they appear in `measure_labels` (up to 6 measures)

Value

A ggplot plot object

Examples

measure_change_over_time_long(ggplot2::economics_long, date, variable, value, pop, unemploy)
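With long data, plotting a subset of measures comes down to keeping only the rows whose label is one of the requested measures and drawing one line per label. The sketch below, `long_sketch()`, is a hypothetical illustration of that approach using rlang's `enquos()` to read the unquoted measure names from `...`; it is not the package's own code.

```r
library(ggplot2)
library(rlang)

# Hypothetical sketch, not ezEDA's code: keep only the requested measures,
# then draw one line per measure.
long_sketch <- function(data, time_col, measure_labels, measure_values, ...) {
  wanted <- vapply(enquos(...), as_name, character(1))  # e.g. c("pop", "unemploy")
  label_col <- as_name(enquo(measure_labels))
  subset_rows <- data[data[[label_col]] %in% wanted, ]
  ggplot(subset_rows,
         aes(x = {{ time_col }}, y = {{ measure_values }},
             colour = {{ measure_labels }})) +
    geom_line()
}

p <- long_sketch(economics_long, date, variable, value, pop, unemploy)
print(p)
```

Because `pop` and `unemploy` differ by orders of magnitude, mapping `y` to `value01` (the column in `economics_long` that rescales each variable to the range 0 to 1) can make the two lines easier to compare.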

Plot the change of a measure (or set of measures) over time where each measure is in a different column

Description

Plot the change of a measure (or set of measures) over time where each measure is in a different column

Usage

measure_change_over_time_wide(data, time_col, ...)

Arguments

`data`	A data frame or tibble
`time_col`	Unquoted column name with time values to plot on the x axis
`...`	Unquoted column names of one or more measures to plot (up to 6 measures)

Value

A ggplot plot object

Examples

measure_change_over_time_wide(ggplot2::economics, date, pop, unemploy)
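One way to plot wide data is to reshape the selected measure columns into long form first, so that each measure becomes a labeled group of rows. The sketch below does this with base R's `stack()`; it only illustrates the reshaping idea and is not necessarily how the package does it (`tidyr::pivot_longer()` is a common alternative).

```r
library(ggplot2)

# Sketch only: reshape the selected wide columns into long form with stack().
measures <- c("pop", "unemploy")
long <- data.frame(
  date = rep(economics$date, times = length(measures)),
  # stack() returns columns `values` and `ind` (the measure name as a factor)
  stack(as.data.frame(economics[measures]))
)

p <- ggplot(long, aes(x = date, y = values, colour = ind)) +
  geom_line()
print(p)
```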

Plot the distribution of a numeric (measure) column

Description

Plot the distribution of a numeric (measure) column

Usage

measure_distribution(data, measure, type = "hist", bwidth = NULL)

Arguments

`data`	A data frame or tibble
`measure`	Unquoted name of the column containing numbers (measure)
`type`	Histogram ("hist") or Boxplot ("box")
`bwidth`	Bin width for the histogram (by default, the bin width that gives 30 bins)

Value

A ggplot plot object

Examples

measure_distribution(ggplot2::diamonds, price)
measure_distribution(ggplot2::mpg, hwy)
measure_distribution(ggplot2::mpg, hwy, bwidth = 2)
measure_distribution(ggplot2::mpg, hwy, "hist")
measure_distribution(ggplot2::mpg, hwy, "box")
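The `type` and `bwidth` arguments map naturally onto ggplot2's histogram and boxplot geoms. The hypothetical `distribution_sketch()` below shows that mapping under the assumption that `bwidth = NULL` simply defers to ggplot2's default of 30 bins; the package's actual code may differ.

```r
library(ggplot2)

# Hypothetical sketch of the two plot types; not ezEDA's actual code.
distribution_sketch <- function(data, measure, type = "hist", bwidth = NULL) {
  if (type == "hist") {
    # binwidth = NULL lets ggplot2 fall back to its default of 30 bins
    ggplot(data, aes(x = {{ measure }})) + geom_histogram(binwidth = bwidth)
  } else if (type == "box") {
    ggplot(data, aes(y = {{ measure }})) + geom_boxplot()
  } else {
    stop('type must be "hist" or "box"')
  }
}

p_hist <- distribution_sketch(mpg, hwy, bwidth = 2)
p_box <- distribution_sketch(mpg, hwy, "box")
print(p_hist)
```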

Plot the distribution of a numeric (measure) column differentiated by a category

Description

Plot the distribution of a numeric (measure) column differentiated by a category

Usage

measure_distribution_by_category(
  data,
  measure,
  category,
  type = "hist",
  separate = FALSE,
  bwidth = NULL
)

Arguments

`data`	A data frame or tibble
`measure`	Unquoted column name of measure (containing numbers)
`category`	Unquoted column name of category (can be factor, character or numeric)
`type`	Histogram ("hist") or Boxplot ("box")
`separate`	Boolean specifying whether to plot each category in a separate facet
`bwidth`	Bin width for the histogram (by default, the bin width that gives 30 bins)

Value

A ggplot plot object

Examples

measure_distribution_by_category(ggplot2::diamonds, price, cut)
measure_distribution_by_category(ggplot2::mpg, hwy, class)
measure_distribution_by_category(ggplot2::diamonds, price, cut, separate = TRUE)
measure_distribution_by_category(ggplot2::mpg, hwy, class, separate = TRUE)
measure_distribution_by_category(ggplot2::mpg, hwy, class, "box")
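The `separate` argument corresponds to the choice between overlaying categories in one panel and giving each category its own facet. The sketch below covers only the histogram case; `by_category_sketch()` is a hypothetical name, and filling bars by category is an assumption about how categories are differentiated.

```r
library(ggplot2)

# Hypothetical sketch (histogram case only); the package's actual layering may differ.
by_category_sketch <- function(data, measure, category, separate = FALSE,
                               bwidth = NULL) {
  p <- ggplot(data, aes(x = {{ measure }}, fill = factor({{ category }}))) +
    geom_histogram(binwidth = bwidth)
  if (separate) {
    p <- p + facet_wrap(vars({{ category }}))  # one panel per category
  }
  p
}

p <- by_category_sketch(mpg, hwy, class, separate = TRUE)
print(p)
```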

Plot the distribution of a numeric (measure) column differentiated by two categories

Description

Plot the distribution of a numeric (measure) column differentiated by two categories

Usage

measure_distribution_by_two_categories(
  data,
  measure,
  category1,
  category2,
  bwidth = NULL
)

Arguments

`data`	A data frame or tibble
`measure`	Unquoted name of the column containing numbers (measure)
`category1`, `category2`	Unquoted column names of categories (can be factor, character or numeric)
`bwidth`	Bin width for the histogram (by default, the bin width that gives 30 bins)

Value

A ggplot plot object

Examples

measure_distribution_by_two_categories(ggplot2::mpg, hwy, class, fl)
measure_distribution_by_two_categories(ggplot2::diamonds, carat, cut, clarity)
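With two categories, a natural layout is a grid of histograms with one category on the rows and the other on the columns. The hypothetical `two_categories_sketch()` below uses ggplot2's `facet_grid()` for that; it is an assumption about the layout, not the package's code.

```r
library(ggplot2)

# Hypothetical sketch: one histogram panel per combination of the two categories.
two_categories_sketch <- function(data, measure, category1, category2,
                                  bwidth = NULL) {
  ggplot(data, aes(x = {{ measure }})) +
    geom_histogram(binwidth = bwidth) +
    facet_grid(rows = vars({{ category1 }}), cols = vars({{ category2 }}))
}

p <- two_categories_sketch(mpg, hwy, class, fl, bwidth = 2)
print(p)
```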

Plot the change in the distribution of a numeric (measure) column over time

Description

Plot the change in the distribution of a numeric (measure) column over time

Usage

measure_distribution_over_time(data, measure, time, bwidth = NULL)

Arguments

`data`	A data frame or tibble
`measure`	Unquoted name of the column containing numbers (measure)
`time`	Unquoted name of the column containing the time values
`bwidth`	Bin width for the histogram (by default, the bin width that gives 30 bins)

Value

A ggplot plot object

Examples

set.seed(1)  # make the simulated heights reproducible
h1 <- round(rnorm(50, 60, 8), 0)
h2 <- round(rnorm(50, 65, 8), 0)
h3 <- round(rnorm(50, 70, 8), 0)
h <- c(h1, h2, h3)
y <- c(rep(1999, 50), rep(2000, 50), rep(2001, 50))
df <- data.frame(height = h, year = y)
measure_distribution_over_time(df, height, year)

Plot the relationship between many measures

Description

Plot the relationship between many measures

Usage

multi_measures_relationship(data, ...)

Arguments

`data`	A data frame or tibble
`...`	Unquoted column names of numeric columns (measures)

Value

A ggplot plot object

Examples

multi_measures_relationship(ggplot2::mpg, hwy, displ)
multi_measures_relationship(ggplot2::mpg, cty, hwy, displ)

Plot the contribution to a measure by combinations of two categories

Description

Plot the contribution to a measure by combinations of two categories

Usage

two_category_contribution(
  data,
  category1,
  category2,
  measure,
  separate = FALSE
)

Arguments

`data`	A data frame or tibble
`category1`, `category2`	Unquoted names of category columns (can be factor, character or numeric)
`measure`	Unquoted name of measure
`separate`	Boolean to indicate whether the plots for different combinations should be in different facets

Value

A ggplot plot object

Examples

two_category_contribution(ggplot2::diamonds, cut, clarity, price)
two_category_contribution(ggplot2::diamonds, clarity, cut, price, separate = TRUE)
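Contribution by two categories can be pictured as stacked bars: one bar per level of the first category, split into segments by the second. The sketch below assumes contributions are sums of the measure and relies on `geom_col()` stacking rows with the same position; `two_contribution_sketch()` and the toy data are hypothetical.

```r
library(ggplot2)

# Hypothetical sketch: bar height = total of the measure for category1,
# split by category2; not ezEDA's actual code.
two_contribution_sketch <- function(data, category1, category2, measure,
                                    separate = FALSE) {
  p <- ggplot(data, aes(x = factor({{ category1 }}), y = {{ measure }},
                        fill = factor({{ category2 }}))) +
    geom_col()  # stacks rows, so each segment sums the measure
  if (separate) {
    p <- p + facet_wrap(vars({{ category2 }}))
  }
  p
}

toy <- data.frame(
  region = c("N", "N", "S", "S"),
  product = c("x", "y", "x", "x"),
  sales = c(10, 5, 3, 4)
)
p <- two_contribution_sketch(toy, region, product, sales)
print(p)
```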

Plot counts of combinations of two category columns

Description

Plot counts of combinations of two category columns

Usage

two_category_tally(
  data,
  main_category,
  sub_category,
  separate = FALSE,
  position = "stack"
)

Arguments

`data`	A data frame or tibble
`main_category`, `sub_category`	Unquoted column names of two categories (can be factor, character or numeric)
`separate`	Boolean indicating whether the plot should be faceted or not
`position`	Bar position: "stack" (default) or "dodge"

Value

A ggplot plot object

Examples

two_category_tally(ggplot2::mpg, class, drv)
two_category_tally(ggplot2::mpg, class, drv, position = "dodge")
two_category_tally(ggplot2::mpg, class, drv, separate = TRUE)
two_category_tally(ggplot2::diamonds, cut, clarity)
two_category_tally(ggplot2::diamonds, cut, clarity, separate = TRUE)
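The `position` argument mirrors ggplot2's own bar positions: "stack" piles the sub-category counts on top of each other, while "dodge" places them side by side. The hypothetical `two_tally_sketch()` below passes the argument straight to `geom_bar()`, which is one plausible design rather than the package's confirmed code.

```r
library(ggplot2)

# Hypothetical sketch showing how the position argument could map onto geom_bar().
two_tally_sketch <- function(data, main_category, sub_category,
                             separate = FALSE, position = "stack") {
  p <- ggplot(data, aes(x = factor({{ main_category }}),
                        fill = factor({{ sub_category }}))) +
    # "stack" piles sub-categories; "dodge" puts them side by side
    geom_bar(position = position)
  if (separate) {
    p <- p + facet_wrap(vars({{ sub_category }}))
  }
  p
}

p_dodge <- two_tally_sketch(mpg, class, drv, position = "dodge")
print(p_dodge)
```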

Plot the relationship between two measures and optionally highlight a category

Description

Plot the relationship between two measures and optionally highlight a category

Usage

two_measures_relationship(data, measure1, measure2, category = NULL)
two_measures_relationship(data, measure1, measure2, category = NULL)

Arguments

`data`	A data frame or tibble
`measure1`, `measure2`	Unquoted column names of measures
`category`	Optional unquoted name of a category to highlight (can be factor, character or numeric)

Value

A ggplot plot object

Examples


two_measures_relationship(ggplot2::diamonds, carat, price)
two_measures_relationship(ggplot2::diamonds, carat, depth)

two_measures_relationship(ggplot2::mpg, displ, hwy)
two_measures_relationship(ggplot2::mpg, cty, hwy)
two_measures_relationship(ggplot2::mpg, displ, hwy, class)
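An optional unquoted argument with a `NULL` default can be detected with rlang's `quo_is_null()` after capturing it with `enquo()`. The sketch below uses that pattern and assumes that "highlighting" a category means colouring the points by it; `relationship_sketch()` is a hypothetical name.

```r
library(ggplot2)
library(rlang)

# Hypothetical sketch of an optional unquoted argument; not ezEDA's code.
relationship_sketch <- function(data, measure1, measure2, category = NULL) {
  cat_quo <- enquo(category)
  p <- ggplot(data, aes(x = {{ measure1 }}, y = {{ measure2 }}))
  if (quo_is_null(cat_quo)) {
    p + geom_point()
  } else {
    # colour the points by the category when one is supplied
    p + geom_point(aes(colour = factor(!!cat_quo)))
  }
}

p_plain <- relationship_sketch(mpg, displ, hwy)
p_class <- relationship_sketch(mpg, displ, hwy, class)
print(p_class)
```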
