Title: | Task Oriented Interface for Exploratory Data Analysis |
---|---|
Description: | Enables users to create visualizations using functions based on the data analysis task rather than on plotting mechanics. It hides the details of the individual 'ggplot2' function calls and allows the user to focus on the end goal. Useful for quick preliminary explorations. Provides functions for common exploration patterns. Some of the ideas in this package are motivated by Fox (2015, ISBN:1938377052). |
Authors: | Viswa Viswanathan [aut, cre] |
Maintainer: | Viswa Viswanathan <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.1 |
Built: | 2025-03-12 04:02:06 UTC |
Source: | https://github.com/kviswana/ezeda |
Plot the contribution of different categories to a measure
category_contribution(data, category, measure)
data | A data frame or tibble |
category | Unquoted name of category (can be factor, character or numeric) |
measure | Unquoted name of measure |
A ggplot plot object
category_contribution(ggplot2::diamonds, cut, price)
category_contribution(ggplot2::diamonds, clarity, price)
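To make concrete what a single call like the ones above stands in for, here is a minimal sketch of a comparable plot built by hand. It assumes that "contribution" means the total of the measure within each category. That reading, and the names `totals` and `p`, are assumptions for illustration and not the package's actual implementation.

```r
library(ggplot2)

# Assumption: "contribution" is taken to be the per-category sum of the measure.
totals <- aggregate(price ~ cut, data = ggplot2::diamonds, FUN = sum)

# One bar per category; bar height is the summed measure.
p <- ggplot(totals, aes(x = cut, y = price)) +
  geom_col() +
  labs(x = "cut", y = "total price")
p
```

The aggregation step and the plotting step are separate here, which is the kind of mechanics the task-oriented function is meant to hide.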
Plot counts of a category
category_tally(data, category_column)
data | A data frame or tibble |
category_column | Unquoted column name of category (can be factor, character or numeric) |
A ggplot plot object
category_tally(ggplot2::mpg, class)
category_tally(ggplot2::diamonds, cut)
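For comparison, a tally plot of this kind can be assembled directly in ggplot2. The sketch below is an approximation under the assumption that a tally is a bar chart of row counts per category; it is not taken from the package source, and the names `p` and `counts` are illustrative.

```r
library(ggplot2)

# geom_bar() counts the rows in each category and draws one bar per level.
p <- ggplot(ggplot2::mpg, aes(x = class)) +
  geom_bar()
p

# The same counts as a plain table, useful for checking the bar heights.
counts <- table(ggplot2::mpg$class)
counts
```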
Private utility function: given a possibly non-factor column passed as a quosure, convert it into a factor
col_to_factor(data, col_enquo)
data | A data frame or tibble |
col_enquo | A quosure |
A data frame or tibble with the corresponding column converted to a factor if necessary
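The quosure-based conversion described above could look roughly like the following. This is a hypothetical stand-in written with rlang, not the package's own code: the names `col_to_factor_sketch` and `as_factor_demo` are invented for illustration, and the sketch assumes the quosure wraps a bare column name.

```r
library(rlang)

# Hypothetical stand-in for the private helper; name and body are assumptions.
col_to_factor_sketch <- function(data, col_enquo) {
  # A quosure wrapping a symbol is turned into the column name as a string.
  col_name <- as_name(col_enquo)
  # Convert only when the column is not already a factor.
  if (!is.factor(data[[col_name]])) {
    data[[col_name]] <- factor(data[[col_name]])
  }
  data
}

# A caller would typically capture the user's unquoted argument with enquo().
as_factor_demo <- function(data, column) {
  col_to_factor_sketch(data, enquo(column))
}

out <- as_factor_demo(mtcars, cyl)
class(out$cyl)
```

Capturing the argument with `enquo()` in the user-facing function and passing the quosure on is what lets the public functions accept unquoted column names.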
The ezeda package provides visualization functions for exploratory data analysis. Whereas graphics packages generally provide many functions that users assemble into suitable plots, each ezeda function wraps ggplot2 and other code to generate a complete plot for a common exploratory data analysis task that corresponds to a recurring pattern.
ezeda provides five categories of functions: tally, contribution, measure distribution, measure relationship, and measure trend.
category_tally
two_category_tally
category_contribution
two_category_contribution
measure_distribution
measure_distribution_by_category
measure_distribution_by_two_categories
measure_distribution_over_time
two_measures_relationship
multi_measures_relationship
measure_change_over_time_wide
measure_change_over_time_long
Plot the change of a measure (or set of measures) over time where the data is in "long" format. That is, all measures are in one column, with another column labeling each measure value.
measure_change_over_time_long(data, time_col, measure_labels, measure_values, ...)
data | A data frame or tibble |
time_col | Unquoted column name with time values to plot on the x axis |
measure_labels | Unquoted name of the column that labels which measure each row's measure_values entry (see below) belongs to (up to 6 measures) |
measure_values | Unquoted name of the column with the measure values to be plotted |
... | Unquoted names of measures to plot (up to 6 measures) |
A ggplot plot object
measure_change_over_time_long(ggplot2::economics_long, date, variable, value, pop, unemploy)
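To illustrate what "long" format means here, the sketch below reshapes two columns of the wide `ggplot2::economics` data into one label column and one value column using `tidyr::pivot_longer()`. The use of tidyr and the names `wide` and `long` are assumptions for illustration. The output column names `variable` and `value` are chosen to mirror those used in the example above.

```r
library(tidyr)

# ggplot2::economics is "wide": pop and unemploy are separate columns.
wide <- ggplot2::economics[, c("date", "pop", "unemploy")]

# Stack the measure columns: labels go to `variable`, numbers go to `value`.
long <- pivot_longer(
  wide,
  cols = c(pop, unemploy),
  names_to = "variable",
  values_to = "value"
)
head(long)
```

Each date now appears once per measure, so `variable` plays the role of measure_labels and `value` the role of measure_values.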
Plot the change of a measure (or set of measures) over time where each measure is in a different column
measure_change_over_time_wide(data, time_col, ...)
data | A data frame or tibble |
time_col | Unquoted column name with time values to plot on the x axis |
... | Unquoted column names of one or more measures to plot (up to 6 measures) |
A ggplot plot object
measure_change_over_time_wide(ggplot2::economics, date, pop, unemploy)
Plot the distribution of a numeric (measure) column
measure_distribution(data, measure, type = "hist", bwidth = NULL)
data | A data frame or tibble |
measure | Unquoted name of the column containing numbers (the measure) |
type | Histogram ("hist") or Boxplot ("box") |
bwidth | Width of each histogram bin (by default, the binwidth that yields 30 bins) |
A ggplot plot object
measure_distribution(ggplot2::diamonds, price)
measure_distribution(ggplot2::mpg, hwy)
measure_distribution(ggplot2::mpg, hwy, bwidth = 2)
measure_distribution(ggplot2::mpg, hwy, "hist")
measure_distribution(ggplot2::mpg, hwy, "box")
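The relationship between a bin width and a bin count can be worked out by hand. The sketch below assumes that the default amounts to roughly the data range divided into 30 equal parts. This is an approximation for intuition only, since the exact bin boundaries a plotting library chooses may shift the count by one. The names `bwidth_30` and `bins_at_2` are illustrative.

```r
hwy <- ggplot2::mpg$hwy

# Assumption: the 30-bin default is approximated as the range split into 30 equal parts.
bwidth_30 <- diff(range(hwy)) / 30
bwidth_30

# Approximate number of bins needed to span the same range at bwidth = 2.
bins_at_2 <- ceiling(diff(range(hwy)) / 2)
bins_at_2
```

A larger bwidth gives fewer, coarser bins, which can be easier to read for small data sets such as `mpg`.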
Plot the distribution of a numeric (measure) column differentiated by a category
measure_distribution_by_category(data, measure, category, type = "hist", separate = FALSE, bwidth = NULL)
data | A data frame or tibble |
measure | Unquoted column name of measure (containing numbers) |
category | Unquoted column name of category (can be factor, character or numeric) |
type | Histogram ("hist") or Boxplot ("box") |
separate | Boolean specifying whether to plot each category in a separate facet |
bwidth | Width of each histogram bin (by default, the binwidth that yields 30 bins) |
A ggplot plot object
measure_distribution_by_category(ggplot2::diamonds, price, cut)
measure_distribution_by_category(ggplot2::mpg, hwy, class)
measure_distribution_by_category(ggplot2::diamonds, price, cut, separate = TRUE)
measure_distribution_by_category(ggplot2::mpg, hwy, class, separate = TRUE)
measure_distribution_by_category(ggplot2::mpg, hwy, class, "box")
Plot the distribution of a numeric (measure) column differentiated by two categories
measure_distribution_by_two_categories(data, measure, category1, category2, bwidth = NULL)
data | A data frame or tibble |
measure | Unquoted name of the column containing numbers (the measure) |
category1, category2 | Unquoted column names of categories (can be factor, character or numeric) |
bwidth | Width of each histogram bin (by default, the binwidth that yields 30 bins) |
A ggplot plot object
measure_distribution_by_two_categories(ggplot2::mpg, hwy, class, fl)
measure_distribution_by_two_categories(ggplot2::diamonds, carat, cut, clarity)
Plot the change of distribution of a numeric (measure) column over time
measure_distribution_over_time(data, measure, time, bwidth = NULL)
data | A data frame or tibble |
measure | Unquoted name of the column containing numbers (the measure) |
time | Unquoted name of column containing the time object |
bwidth | Width of each histogram bin (by default, the binwidth that yields 30 bins) |
A ggplot plot object
h1 <- round(rnorm(50, 60, 8), 0)
h2 <- round(rnorm(50, 65, 8), 0)
h3 <- round(rnorm(50, 70, 8), 0)
h <- c(h1, h2, h3)
y <- c(rep(1999, 50), rep(2000, 50), rep(2001, 50))
df <- data.frame(height = h, year = y)
measure_distribution_over_time(df, height, year)
Plot the relationship between many measures
multi_measures_relationship(data, ...)
data | A data frame or tibble |
... | Unquoted column names of numeric columns (measures) |
A ggplot plot object
multi_measures_relationship(ggplot2::mpg, hwy, displ)
multi_measures_relationship(ggplot2::mpg, cty, hwy, displ)
Plot the contribution to a measure by combinations of two categories
two_category_contribution(data, category1, category2, measure, separate = FALSE)
data | A data frame or tibble |
category1, category2 | Unquoted names of category columns (can be factor, character or numeric) |
measure | Unquoted name of measure |
separate | Boolean to indicate whether the plots for different combinations should be in different facets |
A ggplot plot object
two_category_contribution(ggplot2::diamonds, cut, clarity, price)
two_category_contribution(ggplot2::diamonds, clarity, cut, price, separate = TRUE)
Plot counts of combinations of two category columns
two_category_tally(data, main_category, sub_category, separate = FALSE, position = "stack")
data | A data frame or tibble |
main_category, sub_category | Unquoted column names of two categories (can be factor, character or numeric) |
separate | Boolean indicating whether the plot should be faceted or not |
position | "stack" or "dodge" |
A ggplot plot object
two_category_tally(ggplot2::mpg, class, drv)
two_category_tally(ggplot2::mpg, class, drv, position = "dodge")
two_category_tally(ggplot2::mpg, class, drv, separate = TRUE)
two_category_tally(ggplot2::diamonds, cut, clarity)
two_category_tally(ggplot2::diamonds, cut, clarity, separate = TRUE)
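The difference between the "stack" and "dodge" positions, and between a single panel and facets, can be seen in plain ggplot2. The following is a sketch under the assumption that the two-category tally is a bar chart of counts filled by the sub-category. The faceted variant is one possible reading of `separate = TRUE`, not a statement of how the package draws it, and all object names are illustrative.

```r
library(ggplot2)

base <- ggplot(ggplot2::mpg, aes(x = class, fill = drv))

# "stack": sub-category counts are piled into one bar per main category.
p_stack <- base + geom_bar(position = "stack")

# "dodge": sub-category counts are drawn side by side within each main category.
p_dodge <- base + geom_bar(position = "dodge")

# Assumption: a faceted layout with one panel per sub-category.
p_facet <- ggplot(ggplot2::mpg, aes(x = class)) +
  geom_bar() +
  facet_wrap(vars(drv))

# The underlying counts for each combination of the two categories.
combo_counts <- table(ggplot2::mpg$class, ggplot2::mpg$drv)
combo_counts
```

Stacking emphasizes the total per main category, while dodging makes the individual sub-category counts easier to compare.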
Plot the relationship between two measures and optionally highlight a category
two_measures_relationship(data, measure1, measure2, category = NULL)
data | A data frame or tibble |
measure1, measure2 | Unquoted column names of measures |
category | Optional unquoted name of a category to highlight (can be factor, character or numeric) |
A ggplot plot object
two_measures_relationship(ggplot2::diamonds, carat, price)
two_measures_relationship(ggplot2::diamonds, carat, depth)
two_measures_relationship(ggplot2::mpg, displ, hwy)
two_measures_relationship(ggplot2::mpg, cty, hwy)
two_measures_relationship(ggplot2::mpg, displ, hwy, class)