library(tidyindex)
library(dplyr)
library(lubridate)
library(lmomco)
library(ggplot2)
library(tsibble)
The tidyindex package provides functionality to construct indexes in a data pipeline, align with the tidyverse paradigm. The pipeline approach is universally applicable to indexes of all kinds. It allows indexes to be broken down into a set of defined building blocks (modules) and hence provides means to standardise the workflow to construct, compare, and analyse indexes.
Here we present an example to calculate one of the most widely used drought index: Standardised Precipitation Index (SPI). The index is composed to three steps:
These three steps correspond to three modules in the tidyindex
pipeline (temporal_aggregate()
,
distribution_fit()
, and normalise()
). Each
module uses a tidyverse-mutate style to calculate a step within the
module. For example, the following code fits a gamma distribution to the
variable .agg
. Different distributions are available and
prefixed with dist_*()
and additional distribution can be
added by the user following a similar style to the existing
dist_*()
steps. The step dist_*()
can also be
evaluated standalone and seen as a recipe of the step:
dist_gamma(var = ".agg")
#> [1] "distfit_gamma"
#> attr(,"var")
#> <quosure>
#> expr: ^".agg"
#> env: empty
#> attr(,"fn")
#> function (var_para, var_fit)
#> {
#> para <- do.call("pargam", list(do.call("lmoms", list(var_para))))
#> var_fit2 <- var_fit[!is.na(var_fit)]
#> fit <- do.call("cdfgam", list(x = var_fit2, para = para))
#> n_padding <- length(var_fit) - length(fit)
#> if (n_padding > 0) {
#> fit <- c(rep(NA, n_padding), fit)
#> }
#> tibble(para = list(para), fit = list(fit))
#> }
#> <bytecode: 0x55dcfa44e968>
#> <environment: 0x55dcfa450cb0>
#> attr(,"n_boot")
#> [1] 1
#> attr(,"boot_seed")
#> [1] 123
#> attr(,"dist")
#> [1] "gamma"
#> attr(,"class")
#> [1] "dist_fit"
Here we select a single station, Texas Post Office, where is heavily impacted during the 2019/20 bushfire season, in Queensland, Australia, to demonstrate the calculation.
texas_post_office <- queensland %>%
filter(name == "TEXAS POST OFFICE") %>%
mutate(month = lubridate::month(ym))
dt <- texas_post_office |>
init(id = id, time = ym, group = month) |>
temporal_aggregate(.agg = temporal_rolling_window(prcp, scale = 24)) |>
distribution_fit(.fit = dist_gamma(var = ".agg")) |>
tidyindex::normalise(.index = norm_quantile(.fit))
dt
#> Index pipeline:
#>
#> Steps:
#> temporal: `rolling_window()` -> .agg
#> distribution_fit: `distfit_gamma()` -> .fit
#> normalise: `norm_quantile()` -> .index
#>
#> Data:
#> # A tibble: 365 × 14
#> id month ym prcp tmax tmin tavg long lat name .agg .fit
#> <chr> <dbl> <mth> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
#> 1 ASN0004… 12 1991 Dec 1340 31.7 16.1 23.9 151. -28.9 TEXA… 10807 0.216
#> 2 ASN0004… 1 1992 Jan 1732 34.4 18.5 26.5 151. -28.9 TEXA… 11699 0.313
#> 3 ASN0004… 2 1992 Feb 1850 31.2 19.2 25.2 151. -28.9 TEXA… 13091 0.533
#> 4 ASN0004… 3 1992 Mar 146 31.0 14.1 22.6 151. -28.9 TEXA… 13081 0.518
#> 5 ASN0004… 4 1992 Apr 324 27.2 12.6 19.9 151. -28.9 TEXA… 11876 0.333
#> 6 ASN0004… 5 1992 May 597 22.0 9.12 15.6 151. -28.9 TEXA… 11861 0.340
#> 7 ASN0004… 6 1992 Jun 76 18.6 2.49 10.5 151. -28.9 TEXA… 11775 0.320
#> 8 ASN0004… 7 1992 Jul 100 19.8 1.73 10.8 151. -28.9 TEXA… 11735 0.319
#> 9 ASN0004… 8 1992 Aug 578 20.8 4.01 12.4 151. -28.9 TEXA… 12213 0.397
#> 10 ASN0004… 9 1992 Sep 416 22.8 6.41 14.6 151. -28.9 TEXA… 12439 0.436
#> # ℹ 355 more rows
#> # ℹ 2 more variables: .fit_obj <list>, .index <dbl>
The results contain a summary of the steps used and the data with
intermediate variables (.agg
, .fit
, and
.fit_obj
) and the index (.index
). We can plot
the result using ggplot2
as:
dt$data |>
ggplot(aes(x = ym, y = .index)) +
geom_hline(yintercept = -2, color = "red", linewidth = 1) +
geom_line() +
scale_x_yearmonth(name = "Year", date_break = "2 years", date_label = "%Y") +
theme_bw() +
facet_wrap(vars(name), ncol = 1) +
theme(panel.grid = element_blank(),
legend.position = "bottom") +
ylab("SPI")
There are many different things you can do with the package, for example:
variable_trans(.pet = trans_thornthwaite(.tavg = tavg, .lat = lat))
idx_spi()
, idx_spei()
, idx_edi()
,
and idx_rdi()
compute_indexes()
.n_boot
argument in the
distribution_fit()