Title: | Facilitate Exploration of touRR optimisatioN |
---|---|
Description: | Diagnostic plots for optimisation, with a focus on projection pursuit. These show paths the optimiser takes in the high-dimensional space in multiple ways: by reducing the dimension using principal component analysis, and also using the tour to show the path on the high-dimensional space. Several botanical colour palettes are included, reflecting the name of the package. A paper describing the methodology can be found at <https://journal.r-project.org/archive/2021/RJ-2021-105/index.html>. |
Authors: | H. Sherry Zhang [aut, cre] , Dianne Cook [aut] , Ursula Laa [aut] , Nicolas Langrené [aut] , Patricia Menéndez [aut] |
Maintainer: | H. Sherry Zhang <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.0 |
Built: | 2025-01-15 04:25:39 UTC |
Source: | https://github.com/huizezhang-sherry/ferrn |
This is a wrapper function used by explore_space_pca() and should not be called directly by the user.
add_anchor(dt, anchor_size = 3, anchor_alpha = 0.5, anchor_color = NULL, ...)
dt | A data object from running the optimisation algorithm in the guided tour |
anchor_size | numeric; the size of the anchor points |
anchor_alpha | numeric; the alpha of the anchor points |
anchor_color | the variable to be coloured by |
... | other aesthetics inherited from |
a wrapper for drawing anchor points in explore_space_pca()
Other draw functions: add_anno(), add_dir_search(), add_end(), add_interp(), add_interp_last(), add_interrupt(), add_search(), add_space(), add_start(), add_theo()
This is a wrapper function used by explore_space_pca() and should not be called directly by the user.
add_anno(dt, anno_color = "black", anno_lty = "dashed", anno_alpha = 0.1, ...)
dt | A data object from running the optimisation algorithm in the guided tour |
anno_color | character; the colour of the annotation line |
anno_lty | character; the linetype of the annotation line |
anno_alpha | numeric; the alpha of the annotation line |
... | other aesthetics inherited from |
a wrapper for annotating the symmetry of start points in explore_space_pca()
Other draw functions: add_anchor(), add_dir_search(), add_end(), add_interp(), add_interp_last(), add_interrupt(), add_search(), add_space(), add_start(), add_theo()
This is a wrapper function used by explore_space_pca() and should not be called directly by the user.
add_dir_search(dt, dir_size = 0.5, dir_alpha = 0.5, dir_color = NULL, ...)
dt | A data object from running the optimisation algorithm in the guided tour |
dir_size | numeric; the size of the directional search points in pseudo derivative search |
dir_alpha | numeric; the alpha of the directional search points in pseudo derivative search |
dir_color | the variable to be coloured by |
... | other aesthetics inherited from |
a wrapper for drawing directional search points (used in pseudo derivative search) with buffer in explore_space_pca()
Other draw functions: add_anchor(), add_anno(), add_end(), add_interp(), add_interp_last(), add_interrupt(), add_search(), add_space(), add_start(), add_theo()
This is a wrapper function used by explore_space_pca() and should not be called directly by the user.
add_end(dt, end_size = 5, end_alpha = 1, end_color = NULL, ...)
dt | A data object from running the optimisation algorithm in the guided tour |
end_size | numeric; the size of the end point |
end_alpha | numeric; the alpha of the end point |
end_color | the variable to be coloured by |
... | other aesthetics inherited from |
a wrapper for drawing end points in explore_space_pca()
Other draw functions: add_anchor(), add_anno(), add_dir_search(), add_interp(), add_interp_last(), add_interrupt(), add_search(), add_space(), add_start(), add_theo()
This is a wrapper function used by explore_space_pca() and should not be called directly by the user.
add_interp(
  dt,
  interp_size = 1.5,
  interp_alpha = NULL,
  interp_color = NULL,
  interp_group = NULL,
  ...
)
dt | A data object from running the optimisation algorithm in the guided tour |
interp_size | numeric; the size of the interpolation path |
interp_alpha | numeric; the alpha of the interpolation path |
interp_color | the variable to be coloured by |
interp_group | the variable to label different interpolation paths |
... | other aesthetics inherited from |
a wrapper for drawing the interpolation points in explore_space_pca()
Other draw functions: add_anchor(), add_anno(), add_dir_search(), add_end(), add_interp_last(), add_interrupt(), add_search(), add_space(), add_start(), add_theo()
This is a wrapper function used by explore_space_pca() and should not be called directly by the user.
add_interp_last(
  dt,
  interp_last_size = 3,
  interp_last_alpha = 1,
  interp_last_color = NULL,
  ...
)
dt | A data object from running the optimisation algorithm in the guided tour |
interp_last_size | numeric; the size of the last interpolation points in each iteration |
interp_last_alpha | numeric; the alpha of the last interpolation points in each iteration |
interp_last_color | the variable to be coloured by |
... | other aesthetics inherited from |
a wrapper for drawing the last interpolation points of each iteration in explore_space_pca()
Other draw functions: add_anchor(), add_anno(), add_dir_search(), add_end(), add_interp(), add_interrupt(), add_search(), add_space(), add_start(), add_theo()
This is a wrapper function used by explore_space_pca() and should not be called directly by the user.
add_interrupt(
  dt,
  interrupt_size = 0.5,
  interrupt_alpha = NULL,
  interrupt_color = NULL,
  interrupt_group = NULL,
  interrupt_linetype = "dashed",
  ...
)
dt | A data object from running the optimisation algorithm in the guided tour |
interrupt_size | numeric; the size of the interruption path |
interrupt_alpha | numeric; the alpha of the interruption path |
interrupt_color | the variable to be coloured by |
interrupt_group | the variable to label different interruptions |
interrupt_linetype | character; the linetype to annotate the interruption |
... | other aesthetics inherited from |
a wrapper for annotating the interruption in explore_space_pca()
Other draw functions: add_anchor(), add_anno(), add_dir_search(), add_end(), add_interp(), add_interp_last(), add_search(), add_space(), add_start(), add_theo()
This is a wrapper function used by explore_space_pca() and should not be called directly by the user.
add_search(dt, search_size = 0.5, search_alpha = 0.5, search_color = NULL, ...)
dt | A data object from running the optimisation algorithm in the guided tour |
search_size | numeric; the size of the search points |
search_alpha | numeric; the alpha of the search points |
search_color | the variable to be coloured by |
... | other aesthetics inherited from |
a wrapper for drawing search points in explore_space_pca()
Other draw functions: add_anchor(), add_anno(), add_dir_search(), add_end(), add_interp(), add_interp_last(), add_interrupt(), add_space(), add_start(), add_theo()
This is a wrapper function used by explore_space_pca() and should not be called directly by the user.
add_space(
  dt,
  space_alpha = 0.5,
  space_fill = "grey92",
  space_color = "white",
  cent_size = 1,
  cent_alpha = 1,
  cent_color = "black",
  ...
)
dt | A data object from running the optimisation algorithm in the guided tour |
space_alpha | numeric; the alpha of the basis space |
space_fill | character; the colour of the space filling |
space_color | character; the colour of the space brim |
cent_size | numeric; the size of the centre point |
cent_alpha | numeric; the alpha of the centre point |
cent_color | character; the colour of the centre point |
... | other aesthetics inherited from |
a wrapper for drawing the space in explore_space_pca()
Other draw functions: add_anchor(), add_anno(), add_dir_search(), add_end(), add_interp(), add_interp_last(), add_interrupt(), add_search(), add_start(), add_theo()
library(ggplot2)
space <- tibble::tibble(x0 = 0, y0 = 0, r = 5)
ggplot() +
  add_space(space) +
  theme_void() +
  theme(aspect.ratio = 1)
This is a wrapper function used by explore_space_pca() and should not be called directly by the user.
add_start(dt, start_size = 5, start_alpha = 1, start_color = NULL, ...)
dt | A data object from running the optimisation algorithm in the guided tour |
start_size | numeric; the size of the start point |
start_alpha | numeric; the alpha of the start point |
start_color | the variable to be coloured by |
... | other aesthetics inherited from |
a wrapper for drawing start points in explore_space_pca()
Other draw functions: add_anchor(), add_anno(), add_dir_search(), add_end(), add_interp(), add_interp_last(), add_interrupt(), add_search(), add_space(), add_theo()
library(ggplot2)
# construct the space and start df for plotting
space <- tibble::tibble(x0 = 0, y0 = 0, r = 5)
holes_1d_geo %>%
  compute_pca() %>%
  purrr::pluck("aug") %>%
  clean_method() %>%
  get_start()
This is a wrapper function used by explore_space_pca() and should not be called directly by the user.
add_theo(
  dt,
  theo_label = "*",
  theo_size = 25,
  theo_alpha = 0.8,
  theo_color = "#000000",
  ...
)
dt | A data object from running the optimisation algorithm in the guided tour |
theo_label | character; a symbol to label the theoretical point |
theo_size | numeric; the size of the theoretical point |
theo_alpha | numeric; the alpha of the theoretical point |
theo_color | character; the colour of the theoretical point in hex |
... | other aesthetics inherited from |
a wrapper for drawing theoretical points in explore_space_pca()
Other draw functions: add_anchor(), add_anno(), add_dir_search(), add_end(), add_interp(), add_interp_last(), add_interrupt(), add_search(), add_space(), add_start()
Given the orthonormality constraint, the projection bases live on a high-dimensional hollow sphere. Generating random points on the sphere helps to perceive where the data object sits in the high-dimensional space.
bind_random(dt, n = 500, seed = 1)
dt | a data object collected by the projection pursuit guided tour optimisation in the |
n | numeric; the number of random bases to generate in each dimension by geozoo |
seed | numeric; a seed for generating reproducible random bases from geozoo |
a tibble object containing both the searched and random bases
Other bind: bind_random_matrix(), bind_theoretical()
bind_random(holes_1d_better) %>% tail(5)
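As a rough illustration of the idea behind bind_random(), the sketch below samples random unit vectors in base R and binds them to a toy set of bases. It is not the package's implementation: the documentation states that the random bases come from geozoo, whereas this sketch simply normalises Gaussian draws. The names `random_bases`, `searched` and `all_bases`, and the dimension `p = 5`, are assumptions made for the example.

```r
# Sketch only: sample n random unit vectors in p dimensions with base R.
set.seed(1)
n <- 500  # number of random bases
p <- 5    # number of variables in the original data (assumed)

# Draw standard normal coordinates, then scale each row to unit length;
# the rows are then uniformly distributed on the surface of the unit sphere.
z <- matrix(rnorm(n * p), nrow = n, ncol = p)
random_bases <- z / sqrt(rowSums(z^2))

# A toy stand-in for the 1D bases visited by an optimiser (hypothetical values).
searched <- rbind(c(1, 0, 0, 0, 0), c(0, 1, 0, 0, 0))

# Bind the random bases after the searched ones.
all_bases <- rbind(searched, random_bases)
dim(all_bases)
```

Every row of `all_bases` has unit length, so the searched bases and the random ones can be compared on the same sphere.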
Bind random bases in the projection bases space as a matrix
bind_random_matrix(basis, n = 500, d = 1, front = FALSE, seed = 1)
basis | a matrix returned by |
n | numeric; the number of random bases to generate in each dimension by geozoo |
d | numeric; dimension of the basis, d = 1, 2, ... |
front | logical; if the random bases should be bound before or after the original bases |
seed | numeric; a seed for generating reproducible random bases from geozoo |
matrix
a matrix containing both the searched and random bases
Other bind: bind_random(), bind_theoretical()
data <- get_basis_matrix(holes_1d_geo)
bind_random_matrix(data) %>% tail(5)
The theoretical best basis is usually known for a simulated problem. Augmenting the data object with this information allows the performance of the optimisation to be evaluated against the theory.
bind_theoretical(dt, matrix, index, raw_data)
dt | a data object collected by the projection pursuit guided tour optimisation in the |
matrix | a matrix of the theoretical basis |
index | the index function used to calculate the index value |
raw_data | a tibble of the original data used to calculate the index value |
a tibble object containing both the searched and theoretical best bases
Other bind: bind_random(), bind_random_matrix()
best <- matrix(c(0, 1, 0, 0, 0), nrow = 5)
tail(holes_1d_better %>% bind_theoretical(best, tourr::holes(), raw_data = boa5), 1)
Available colours in the palettes
botanical_palettes
botanical_pal(palette = "fern", reverse = FALSE)
palette | Colour palette from botanical_palettes |
reverse | logical; if the colours should be reversed |
An object of class list of length 5.
a function for interpolating colour in the botanical palette
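The sketch below shows one way a palette function of this kind could work: it returns a function that interpolates any number of colours along a named palette. It is an illustration only. The hex codes, the list `toy_palettes` and the function `toy_pal` are hypothetical, and the use of `grDevices::colorRampPalette()` is an assumption rather than a description of how botanical_pal() is written.

```r
# Hypothetical palette: these hex codes are placeholders, not the package's colours.
toy_palettes <- list(
  leaf = c("#1B4332", "#40916C", "#95D5B2", "#D8F3DC")
)

# Return a function that interpolates n colours along the chosen palette.
toy_pal <- function(palette = "leaf", reverse = FALSE) {
  cols <- toy_palettes[[palette]]
  if (reverse) cols <- rev(cols)
  grDevices::colorRampPalette(cols)
}

toy_pal("leaf")(7)                  # seven colours from dark to light
toy_pal("leaf", reverse = TRUE)(3)  # three colours from light to dark
```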
Clean method names
clean_method(dt)
dt | a data object |
a tibble with method cleaned
head(clean_method(holes_1d_better), 5)
Plot the PCA projection of the projection bases space
explore_space_start(dt, group = NULL, pca = TRUE, ...)

explore_space_end(dt, group = NULL, pca = TRUE, ...)

explore_space_pca(
  dt,
  details = FALSE,
  pca = TRUE,
  group = NULL,
  color = NULL,
  facet = NULL,
  ...,
  animate = FALSE
)
dt | a data object collected by the projection pursuit guided tour optimisation in |
group | the variable to label different runs of the optimiser(s) |
pca | logical; if PCA coordinates need to be computed for the data |
... | other arguments passed to |
details | logical; if components other than start, end and interpolation need to be shown |
color | the variable to be coloured by |
facet | the variable to be faceted by |
animate | logical; if the interpolation path needs to be animated |
a ggplot2 object
Other main plot functions: explore_space_tour(), explore_trace_interp(), explore_trace_search()
dplyr::bind_rows(holes_1d_geo, holes_1d_better) %>%
  bind_theoretical(matrix(c(0, 1, 0, 0, 0), nrow = 5),
    index = tourr::holes(), raw_data = boa5
  ) %>%
  explore_space_pca(group = method, details = TRUE) +
  scale_color_discrete_botanical()
## Not run:
best <- matrix(c(0, 1, 0, 0, 0), nrow = 5)
dt <- bind_theoretical(holes_1d_jellyfish, best, tourr::holes(), raw_data = boa5)
explore_space_start(dt)
explore_space_end(dt, group = loop, theo_size = 10, theo_color = "#FF0000")
explore_space_pca(
  dt,
  facet = loop, interp_size = 0.5, theo_size = 10,
  start_size = 1, end_size = 3
)
## End(Not run)
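To make the dimension reduction step concrete, the following base R sketch projects a set of toy bases onto their first two principal components. It only illustrates the general technique with `stats::prcomp()`; the toy matrix `bases`, the dimension `p = 5` and the choice to centre without scaling are assumptions, and none of this is taken from the internals of explore_space_pca().

```r
set.seed(1)
p <- 5  # length of each 1D basis (assumed)

# Toy bases: 500 random points on the unit sphere (hypothetical data).
z <- matrix(rnorm(500 * p), ncol = p)
bases <- z / sqrt(rowSums(z^2))

# Principal component analysis of the bases, keeping the first two components.
pca <- stats::prcomp(bases, center = TRUE, scale. = FALSE)
coords <- pca$x[, 1:2]

# In the reduced 2D space the points fall inside a disc around the origin.
radius <- max(sqrt(rowSums(coords^2)))
head(coords)
radius
```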
Plot the grand tour animation of the bases space in high dimension
explore_space_tour(..., axes = "bottomleft")

prep_space_tour(
  dt,
  group = NULL,
  flip = FALSE,
  n_random = 2000,
  color = NULL,
  rand_size = 1,
  rand_color = "#D3D3D3",
  point_size = 1.5,
  end_size = 5,
  theo_size = 3,
  theo_shape = 17,
  theo_color = "black",
  palette = botanical_palettes$fern,
  ...
)
... | other arguments passed to |
axes | see tourr::animate_xy() |
dt | a data object collected by the projection pursuit guided tour optimisation in |
group | the variable to label different runs of the optimiser(s) |
flip | logical; if sign flipping needs to be performed |
n_random | numeric; the number of random bases to generate |
color | the variable to be coloured by |
rand_size | numeric; the size of random points |
rand_color | character; the color hex code for random points |
point_size | numeric; the size of points searched by the optimiser(s) |
end_size | numeric; the size of end points |
theo_size | numeric; the size of theoretical point(s) |
theo_shape | numeric; the shape symbol in the basic plot |
theo_color | character; the color of theoretical point(s) |
palette | the colour palette to be used |
explore_space_tour(): an animation of the search path in the high-dimensional sphere
prep_space_tour(): a list containing various components needed for producing the animation
Other main plot functions: explore_space_start(), explore_trace_interp(), explore_trace_search()
if (FALSE) {
  explore_space_tour(dplyr::bind_rows(holes_1d_better, holes_1d_geo),
    group = method, palette = botanical_palettes$fern[c(1, 6)]
  )
}
Trace the index value of search/interpolation points in guided tour optimisation
explore_trace_interp(
  dt,
  iter = NULL,
  color = NULL,
  group = NULL,
  cutoff = 50,
  target_size = 3,
  interp_size = 1,
  accuracy_x = 5,
  accuracy_y = 0.01
)
dt | a data object collected by the projection pursuit guided tour optimisation in |
iter | the variable to be plotted on the x-axis |
color | the variable to be coloured by |
group | the variable to label different runs of the optimiser(s) |
cutoff | numeric; if the number of interpolating points is smaller than |
target_size | numeric; the size of target points in the interpolation |
interp_size | numeric; the size of interpolation points |
accuracy_x | numeric; if the difference of two neighbouring x-labels is smaller than |
accuracy_y | numeric; the precision of y-axis label |
a ggplot object for diagnosing how the index value progresses during the interpolation
Other main plot functions: explore_space_start(), explore_space_tour(), explore_trace_search()
# Compare the trace of interpolated points in two algorithms
holes_1d_better %>%
  explore_trace_interp(interp_size = 2) +
  scale_color_continuous_botanical(palette = "fern")
Plot the count in each iteration
explore_trace_search(
  dt,
  iter = NULL,
  color = NULL,
  cutoff = 15,
  extend_lower = 0.95,
  ...
)
dt | a data object collected by the projection pursuit guided tour optimisation in |
iter | the variable to be plotted on the x-axis |
color | the variable to be coloured by |
cutoff | numeric; if the number of searches in one iteration is smaller than |
extend_lower | a numeric for extending the y-axis to display text labels |
... | arguments passed into geom_label_repel() for displaying text labels |
a ggplot object for diagnosing how many points the optimiser(s) have searched
Other main plot functions: explore_space_start(), explore_space_tour(), explore_trace_interp()
# Summary plots for search points in two algorithms
library(patchwork)
library(dplyr)
library(ggplot2)
p1 <- holes_1d_better %>%
  explore_trace_search() +
  scale_color_continuous_botanical(palette = "fern")
p2 <- holes_2d_better_max_tries %>%
  explore_trace_search() +
  scale_color_continuous_botanical(palette = "daisy")
p1 / p2
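The quantity behind this plot is a count of search points per iteration. A minimal base R sketch of that summary is given below. The data frame `toy` is made up, and treating a column called `tries` as the iteration counter is an assumption borrowed from the column name used in other examples of this manual.

```r
# Toy record of search points: one row per point, labelled by iteration (hypothetical).
toy <- data.frame(tries = c(1, 1, 1, 2, 2, 3, 3, 3, 3))

# Count the number of search points in each iteration.
counts <- as.data.frame(table(tries = toy$tries), responseName = "n")
counts
```

Iterations with many rows are those where the optimiser needed many attempts before accepting a new basis.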
Helper functions for 'explore_space_pca()'
flip_sign(dt, group = NULL, ...)

compute_pca(dt, group = NULL, random = TRUE, flip = TRUE, ...)
dt | a data object collected by the projection pursuit guided tour optimisation in |
group | the variable to label different runs of the optimiser(s) |
... | other arguments received from |
random | logical; if random bases from the basis space need to be added to the data |
flip | logical; if sign flipping needs to be performed |
flip_sign(): a list containing a matrix of all the bases, a logical value indicating whether a flip of sign is performed, and a data frame of the original dataset.
compute_pca(): a list containing the PCA summary and a data frame with PC coordinates augmented.
dt <- dplyr::bind_rows(holes_1d_geo, holes_1d_better)
flip_sign(dt, group = method) %>% str(max = 1)
compute_pca(dt, group = method)
Better label formatting to avoid overlapping
format_label(labels, accuracy)
labels | a numerical vector of labels |
accuracy | the accuracy of the label |
a vector of adjusted labels
format_label(c(0.87, 0.87, 0.9, 0.93, 0.95), 0.01)
format_label(c(0.87, 0.87, 0.9, 0.93, 0.95, 0.96, 0.96), 0.01)
The Huber plot presents the projection pursuit index values of 2D data in each 1D projection in polar coordinates, corresponding to each projection direction. It offers a simpler illustration of more complex projection from high-dimensional data to lower dimensions in projection pursuit. The function prep_huber() calculates each component required for the Huber plot (see details), which can then be supplied to various geom layers in ggplot2.
geom_huber(
  mapping = NULL,
  data = NULL,
  stat = "identity",
  position = "identity",
  ...,
  show.legend = NA,
  inherit.aes = TRUE
)

prep_huber(data, index)

theme_huber(...)
mapping | Set of aesthetic mappings created by |
data | The data to be displayed in this layer. There are three options: |
stat | The statistical transformation to use on the data for this layer. When using a |
position | A position adjustment to use on the data for this layer. This can be used in various ways, including to prevent overplotting and improving the display. The |
... | Other arguments passed on to |
show.legend | logical. Should this layer be included in the legends? |
inherit.aes | If |
index | a function, the projection pursuit index function, see examples |
The prep_huber() function calculates components required for making the Huber plots. It returns a list including three elements:
idx_df: data frame; the x/y coordinates of the index value, in polar coordinates. Used for plotting the index value at each projection direction, with the reference circle.
proj_df: data frame; the best 1D projection. Used for plotting the 1D projection in histogram.
slope: value; the slope to plot in the Huber plot to indicate the direction of the best 1D projection.
library(ggplot2)
library(tourr)
library(ash)
data(randu)
randu_std <- as.data.frame(apply(randu, 2, function(x) (x - mean(x)) / sd(x)))
randu_std$yz <- sqrt(35) / 6 * randu_std$y - randu_std$z / 6
randu_df <- randu_std[c(1, 4)]
randu_huber <- prep_huber(randu_df, index = norm_bin(nr = nrow(randu_df)))

ggplot() +
  geom_huber(data = randu_huber$idx_df, aes(x = x, y = y)) +
  geom_point(data = randu_df, aes(x = x, y = yz)) +
  geom_abline(slope = randu_huber$slope, intercept = 0) +
  theme_huber() +
  coord_fixed()

ggplot(randu_huber$proj_df, aes(x = x)) +
  geom_histogram(breaks = seq(-2.2, 2.4, 0.12)) +
  xlab("") + ylab("") +
  theme_bw() +
  theme(axis.text.y = element_blank())
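The three components described for prep_huber() can be mimicked in base R to show what each one represents. The sketch below is a simplified stand-in, not the package's code: the index `toy_index` (absolute excess kurtosis), the toy data, and the way the index is rescaled around a reference circle of radius 1 are all assumptions made for the example.

```r
set.seed(1)

# Toy 2D data (hypothetical): bimodal along x, normal noise along y.
n <- 300
dat <- scale(cbind(
  x = c(rnorm(n / 2, mean = -2), rnorm(n / 2, mean = 2)),
  y = rnorm(n)
))

# Hypothetical index: absolute excess kurtosis of the projected data.
toy_index <- function(proj) {
  z <- proj - mean(proj)
  abs(mean(z^4) / mean(z^2)^2 - 3)
}

# Evaluate the index for the 1D projection in every direction around the circle.
theta <- seq(0, 2 * pi, length.out = 361)[-361]
idx <- vapply(theta, function(t) {
  toy_index(as.vector(dat %*% c(cos(t), sin(t))))
}, numeric(1))

# Polar coordinates: a reference circle of radius 1 plus the rescaled index
# (this scaling is an assumption made for the sketch).
r <- 1 + (idx - min(idx)) / (max(idx) - min(idx))
idx_df <- data.frame(theta = theta, x = r * cos(theta), y = r * sin(theta))

# Best direction, its slope, and the corresponding 1D projection.
best <- theta[which.max(idx)]
slope <- tan(best)
proj_df <- data.frame(x = as.vector(dat %*% c(cos(best), sin(best))))
```

Because the toy data are bimodal along x, the best direction found this way lies close to the horizontal axis, so `slope` is close to zero.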
Functions to get components from the data collecting object
get_best(dt, group = NULL)

get_start(dt, group = NULL)

get_interp(dt, group = NULL)

get_interp_last(dt, group = NULL)

get_anchor(dt, group = NULL)

get_search(dt)

get_dir_search(dt, ratio = 5, ...)

get_space_param(dt, ...)

get_theo(dt)

get_interrupt(dt, group = NULL, precision = 0.001)

get_search_count(dt, iter = NULL, group = NULL)

get_basis_matrix(dt)
dt | a data object collected by the projection pursuit guided tour optimisation in the |
group | the variable to label different runs of the optimiser(s) |
ratio | numeric; a buffer value to deviate directional search points from the anchor points |
... | other arguments passed to |
precision | numeric; if the index value of the last interpolating point and the anchor point differ by |
iter | the variable to be counted by |
get_best: extract the best basis found by the optimiser(s)
get_start: extract the start point of the optimisation
get_interp: extract the interpolation points
get_interp_last: extract the last point in each interpolation
get_anchor: extract the anchor points on the geodesic path
get_search: extract search points in the optimisation (for search_geodesic)
get_dir_search: extract directional search points (for search_geodesic)
get_space_param: estimate the radius of the background circle based on the randomly generated points. The space of projected bases is a circle when reduced to 2D. A radius is estimated using the largest distance from the bases in the data object to the centre point.
get_theo: extract the theoretical basis, if it exists
get_interrupt: extract the end point of the interpolation and the target point in the iteration when an interruption happens. The optimiser can find a better basis on the interpolation path, so an interruption is implemented to stop further interpolation from the highest point to the target point. This discrepancy is highlighted in the PCA plot.
get_search_count: summarise the number of search points in each iteration
get_basis_matrix: extract all the bases as a matrix
a tibble object containing the best basis found by the optimiser(s)
get_search(holes_1d_geo)
get_anchor(holes_1d_geo)
get_start(holes_1d_better)
get_interrupt(holes_1d_better)
get_interp(holes_1d_better) %>% head()
get_basis_matrix(holes_1d_better) %>% head()
get_best(dplyr::bind_rows(holes_1d_better, holes_1d_geo), group = method)
get_search_count(holes_1d_better)
get_search_count(dplyr::bind_rows(holes_1d_better, holes_1d_geo), group = method)
get_interp_last(holes_1d_better)
get_interp_last(dplyr::bind_rows(holes_1d_better, holes_1d_geo), group = method)
res <- holes_1d_geo %>% compute_pca() %>% purrr::pluck("aug")
get_dir_search(res)
best <- matrix(c(0, 1, 0, 0, 0), nrow = 5)
holes_1d_better %>%
  bind_theoretical(best, tourr::holes(), raw_data = boa5) %>%
  get_theo()
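The radius estimate described for get_space_param (the largest distance from the bases to the centre point) can be sketched in base R as follows. The toy coordinates, the column names `PC1` and `PC2`, and the choice of the coordinate means as the centre are assumptions; the output columns `x0`, `y0` and `r` simply mirror the tibble passed to add_space() in its example.

```r
set.seed(1)

# Toy 2D coordinates standing in for PCA-reduced bases (hypothetical values).
pts <- data.frame(PC1 = runif(200, -1, 1), PC2 = runif(200, -1, 1))

# Assumption: the centre point is taken as the mean of the coordinates.
centre <- colMeans(pts)

# Euclidean distance of every point to the centre; the radius is the largest one.
dist_to_centre <- sqrt((pts$PC1 - centre[["PC1"]])^2 + (pts$PC2 - centre[["PC2"]])^2)
radius <- max(dist_to_centre)

# Parameters of the background circle, in the x0/y0/r layout.
space <- data.frame(x0 = centre[["PC1"]], y0 = centre[["PC2"]], r = radius)
space
```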
Simulated data to demonstrate the usage of four diagnostic plots in the package. Users can create their own guided tour data objects and diagnose them with the visualisations designed in this package.
holes_1d_geo
holes_1d_better
holes_1d_jellyfish
holes_2d_jellyfish
holes_2d_better
holes_2d_better_max_tries
An object of class tbl_df (inherits from tbl, data.frame) with 416 rows and 8 columns.
An object of class tbl_df (inherits from tbl, data.frame) with 79 rows and 8 columns.
An object of class tbl_df (inherits from tbl, data.frame) with 2500 rows and 8 columns.
An object of class tbl_df (inherits from tbl, data.frame) with 2500 rows and 8 columns.
An object of class tbl_df (inherits from tbl, data.frame) with 98 rows and 8 columns.
An object of class tbl_df (inherits from tbl, data.frame) with 1499 rows and 8 columns.
The prefix holes_* indicates the use of holes index in the guided tour. The suffix *_better/geo/jellyfish indicates the optimiser used: search_better, search_geodesic, search_jellyfish.
holes_1d_better %>%
  explore_trace_interp(interp_size = 2) +
  scale_color_continuous_botanical(palette = "fern")
Plot the projection from the optimisation data collected from projection pursuit
plot_projection(
  dt,
  data,
  id = NULL,
  cols = NULL,
  label = TRUE,
  animate_along = NULL,
  keep = 0.2
)

compute_projection(dt, data, id = NULL, cols = NULL)
dt | a data object collected by the projection pursuit guided tour optimisation in |
data | the original data |
id | the grouping variable |
cols | additional columns to include in the plot |
label | logical, whether to label each panel by its index value |
animate_along | the variable to animate along |
keep | numeric, the proportion of the data to keep for animation (default is 0.2). Only used when 'animate_along' is not NULL |
a ggplot object
library(dplyr)
holes_2d_jellyfish |>
  filter(loop == 1, tries %in% seq(1, 50, 5)) |>
  plot_projection(data = boa6)

## Not run:
library(dplyr)
# track the first jellyfish (loop == 1)
holes_2d_jellyfish |>
  filter(loop == 1) |>
  plot_projection(data = boa6, animate_along = tries, id = loop)

## End(Not run)
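For readers unfamiliar with what "the projection" refers to, the sketch below shows the underlying linear algebra in base R. It is not the implementation of compute_projection(): it assumes that a projection is the matrix product of the data with an orthonormal basis, and the toy data `x` and the random `basis` are made up for illustration.

```r
set.seed(123)

# Toy stand-in for `data`: 200 observations in 6 dimensions.
x <- matrix(rnorm(200 * 6), ncol = 6)

# A random 6 x 2 orthonormal basis, obtained from a QR decomposition.
basis <- qr.Q(qr(matrix(rnorm(6 * 2), ncol = 2)))

# Projecting the data is a matrix product: each row becomes a 2D point.
proj <- x %*% basis
colnames(proj) <- c("V1", "V2")

dim(proj)
# The basis columns are orthonormal, so this is (numerically) the identity.
round(crossprod(basis), 10)
```

Under this assumption, plotting one such `proj` per recorded basis would give one panel per step of the optimisation.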
Functions to calculate smoothness and squintability
sample_bases(
  idx,
  data = sine1000,
  n_basis = 300,
  parallel = FALSE,
  best = matrix(c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1), nrow = 6),
  min_proj_dist = NA,
  step_size = NA,
  seed = 123
)

## S3 method for class 'basis_df'
print(x, width = NULL, ...)

## S3 method for class 'basis_df'
tbl_sum(x)

calc_smoothness(
  basis_df,
  start_params = c(0.001, 0.5, 2, 2),
  other_gp_params = NULL,
  verbose = FALSE
)

## S3 method for class 'smoothness_res'
print(x, width = NULL, ...)

## S3 method for class 'smoothness_res'
tbl_sum(x)

calc_squintability(
  basis_df,
  method = c("ks", "nls"),
  scale = TRUE,
  bin_width = 0.005,
  other_params = NULL
)

## S3 method for class 'squintability_res'
print(x, width = NULL, ...)

## S3 method for class 'squintability_res'
tbl_sum(x)

fit_ks(basis_df, idx, other_params = NULL)

fit_nls(basis_df, other_params = NULL)
idx |
character, the name of projection pursuit index function, e.g. "holes" |
data |
a matrix or data frame, the high dimensional data to be projected |
n_basis |
numeric, the number of random bases to generate |
parallel |
logical, whether to use parallel computing for calculating the index. Recommended for the stringy index. |
best |
a matrix, the theoretical or empirical best projection matrix, used to calculate the projection distance from the simulated random bases. |
min_proj_dist |
only for squintability, the threshold for projection distance for the random basis to be considered in sampling |
step_size |
numeric, step size for interpolating from each random basis to the best basis; 0.005 is recommended |
seed |
numeric, seed for sampling random bases |
x |
objects with specialised printing methods |
width |
only used when |
... |
further arguments passed to or from other methods. |
basis_df |
the basis data frame returned from |
start_params |
list, the starting parameters for the Gaussian process for smoothness |
other_gp_params |
list, additional parameters to be passed to [GpGp::fit_model()] for calculating smoothness |
verbose |
logical, whether to print optimisation progression when fitting the Gaussian process |
method |
either "ks" (kernel smoothing) or "nls" (non-linear least squares) for calculating squintability. |
scale |
logical, whether to scale the index value to 0-1 in squintability |
bin_width |
numeric, the bin width over which to average the index value before fitting the kernel; it is recommended to set this to the same value as the 'step_size' parameter |
other_params |
list, additional parameters for fitting kernel smoothing or non-linear least squares, see [stats::ksmooth()] and [stats::nls()] for details |
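The `best` and `min_proj_dist` arguments both refer to a projection distance between bases. The sketch below illustrates one common definition, the Frobenius norm of the difference between the two projection matrices. This definition is an assumption here, and `proj_dist_sketch()` and `far` are hypothetical names introduced for illustration, not package functions or objects.

```r
# The default `best` basis from the usage above: the structure sits in
# the last two of six variables.
best <- matrix(c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1), nrow = 6)

# Assumed distance: Frobenius norm of the difference between the two
# projection matrices A A' and B B' (hypothetical helper).
proj_dist_sketch <- function(a, b) {
  sqrt(sum((tcrossprod(a) - tcrossprod(b))^2))
}

# A basis spanning the first two variables is orthogonal to `best`.
far <- matrix(c(1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0), nrow = 6)

proj_dist_sketch(best, best)  # 0: identical bases
proj_dist_sketch(best, far)   # 2: orthogonal 2D subspaces
```

Under this definition the distance between two 2D bases lies between 0 and 2, which gives a sense of scale for a threshold such as `min_proj_dist = 1.5`.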
## Not run:
library(GpGp)
library(fields)
library(tourr)
basis_smoothness <- sample_bases(idx = "holes")
calc_smoothness(basis_smoothness)

basis_squint <- sample_bases(idx = "holes", n_basis = 100,
                             step_size = 0.01, min_proj_dist = 1.5)
calc_squintability(basis_squint, method = "ks", bin_width = 0.01)

## End(Not run)
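The `scale`, `bin_width` and `method = "ks"` arguments describe a preprocessing and smoothing sequence. The base R sketch below walks through one plausible reading of it on simulated values; it is not the package's implementation. The variables `dist` and `index`, the decay shape, and the bandwidth of 0.1 are all assumptions made for illustration.

```r
set.seed(123)

# Simulated stand-in for a basis data frame: distance to the best basis
# and an index value that decays with distance (made-up shape).
dist <- runif(500, min = 0, max = 2)
index <- exp(-3 * dist) + rnorm(500, sd = 0.02)

# scale = TRUE: rescale the index values to the 0-1 range.
index_scaled <- (index - min(index)) / (max(index) - min(index))

# bin_width: average the index value within distance bins before smoothing.
bin_width <- 0.01
bin <- floor(dist / bin_width) * bin_width
binned <- aggregate(index_scaled, by = list(dist = bin), FUN = mean)
names(binned)[2] <- "index"

# method = "ks": kernel smoothing with stats::ksmooth().
fit <- stats::ksmooth(binned$dist, binned$index,
                      kernel = "normal", bandwidth = 0.1)
head(data.frame(dist = fit$x, index = fit$y))
```

Averaging within bins first reduces the noise that the kernel smoother has to absorb, which is presumably why the bin width is tied to the interpolation step size.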
Continuous scale colour function
Discrete scale colour function
Continuous scale fill function
Discrete scale fill function
scale_color_continuous_botanical(palette = "fern", reverse = FALSE, ...)

scale_color_discrete_botanical(palette = "fern", reverse = FALSE, ...)

scale_fill_continuous_botanical(palette = "fern", reverse = FALSE, ...)

scale_fill_discrete_botanical(palette = "fern", reverse = FALSE, ...)
palette |
colour palette from the botanical_palette |
reverse |
logical; whether the order of the colours should be reversed |
... |
other arguments passed into scale_color_gradientn |
a wrapper for continuous scales in the botanical palette
a wrapper for discrete scales in the botanical palette
a wrapper for continuous fill in the botanical palette
a wrapper for discrete fill in the botanical palette
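To illustrate what `palette` and `reverse` might control, the sketch below interpolates a small palette into a colour ramp using base R only. It is not the package's code: the hex codes in `toy_palette` are made up and are not the colours shipped in botanical_palette, and `ramp_colours()` is a hypothetical helper.

```r
# Hypothetical green palette; these hex codes are made up for illustration.
toy_palette <- c("#1B3A1B", "#3E7A3E", "#8FBF6F", "#E4F0C8")

# Hypothetical helper: interpolate `n` colours, optionally reversing
# the palette first.
ramp_colours <- function(palette, n = 256, reverse = FALSE) {
  if (reverse) palette <- rev(palette)
  grDevices::colorRampPalette(palette)(n)
}

cols <- ramp_colours(toy_palette, n = 5)
cols_rev <- ramp_colours(toy_palette, n = 5, reverse = TRUE)
cols
cols_rev
```

A vector such as `cols` is the kind of input accepted by the `colours` argument of ggplot2's scale_color_gradientn(), which the continuous wrappers pass their extra arguments to.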
Simulated sine and pipe data for calculating optimisation features. Each dataset has 1000 observations and the last two columns contain the intended structure with the rest being noise.
sine1000 sine1000_8d pipe1000 pipe1000_8d pipe1000_10d pipe1000_12d boa boa5 boa6
An object of class matrix (inherits from array) with 1000 rows and 6 columns.
An object of class matrix (inherits from array) with 1000 rows and 8 columns.
An object of class matrix (inherits from array) with 1000 rows and 6 columns.
An object of class matrix (inherits from array) with 1000 rows and 8 columns.
An object of class matrix (inherits from array) with 1000 rows and 10 columns.
An object of class matrix (inherits from array) with 1000 rows and 12 columns.
An object of class tbl_df (inherits from tbl, data.frame) with 1000 rows and 10 columns.
An object of class tbl_df (inherits from tbl, data.frame) with 1000 rows and 5 columns.
An object of class tbl_df (inherits from tbl, data.frame) with 1000 rows and 6 columns.
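The layout described above (noise columns followed by two structured columns) can be mimicked with a few lines of base R. This is a sketch under assumed settings, not the code used to generate sine1000: the noise distribution, the range of the fifth column and the noise level are all made up, and `toy_sine` is a name introduced here.

```r
set.seed(123)
n <- 1000

# Four pure-noise columns followed by two columns carrying a sine structure.
noise <- matrix(rnorm(n * 4), ncol = 4)
v5 <- runif(n, min = -pi, max = pi)
v6 <- sin(v5) + rnorm(n, sd = 0.1)

toy_sine <- cbind(noise, v5, v6)
colnames(toy_sine) <- paste0("V", 1:6)

dim(toy_sine)
head(toy_sine)
```

With data of this shape, the projection onto the last two variables shows the structure, which is consistent with the default `best` matrix in sample_bases().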
library(ggplot2)
library(tidyr)
library(dplyr)
boa %>%
  pivot_longer(cols = x1:x10, names_to = "var", values_to = "value") %>%
  mutate(var = forcats::fct_relevel(as.factor(var), paste0("x", 1:10))) %>%
  ggplot(aes(x = value)) +
  geom_density() +
  facet_wrap(vars(var))

sine1000 |>
  ggplot(aes(x = V5, y = V6)) +
  geom_point() +
  theme(aspect.ratio = 1)

pipe1000_8d |>
  ggplot(aes(x = V5, y = V6)) +
  geom_point() +
  theme(aspect.ratio = 1)

pipe1000_8d |>
  ggplot(aes(x = V7, y = V8)) +
  geom_point() +
  theme(aspect.ratio = 1)
A specific theme for trace plots
theme_fern()
a ggplot2 theme for explore_trace_interp()