Create a mudata object, which is a collection of five tables: data, locations, params, datasets, and columns. You are only required to provide the data table, which must contain columns "param" and "value", but will more typically contain columns "location", "param", "datetime" (or "date"), and "value". See ns_climate, kentvillegreenwood, alta_lake, long_lake, and second_lake_temp for examples of data in this format.

mudata(
  data,
  locations = NULL,
  params = NULL,
  datasets = NULL,
  columns = NULL,
  x_columns = NULL,
  ...,
  more_tbls = NULL,
  dataset_id = "default",
  location_id = "default",
  validate = TRUE
)

Arguments

data

A data.frame/tibble containing columns "param" and "value" (at least), but more typically columns "location", "param", "datetime" (or "date", depending on the type of data), and "value".

locations

The locations table, which is a data frame containing at least the columns "dataset" and "location". If omitted, it will be created automatically using all unique dataset/location combinations.

params

The params table, which is a data frame containing at least the columns "dataset" and "param". If omitted, it will be created automatically using all unique dataset/param combinations.

datasets

The datasets table, which is a data frame containing the column (at least) "dataset". If omitted, it will be generated automatically using all unique datasets.

columns

The columns table, which is a data frame containing the columns (at least) "dataset", "table", and "column". If omitted, it will be created automatically using all dataset/table/column combinations.

x_columns

A vector of column names from the data table that, in combination with "dataset", "location", and "param", identify unique rows. If omitted, these will typically be guessed as the columns located between "param" and "value".

..., more_tbls

Additional tbls (as named arguments) to be included in the mudata object.

dataset_id

The dataset to use if a "dataset" column is omitted.

location_id

The location to use if a "location" column is omitted.

validate

If TRUE (the default), the input tables are validated using validate_mudata. Pass FALSE to skip this step.
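The optional arguments above can be combined as in the following minimal sketch. The data values and the identifiers "example_cores" and "core_1" are made up for illustration; only the argument names come from the usage shown above.

```r
library(mudata2)
library(tibble)

# Hypothetical long-format measurements (values invented for illustration).
# There is no "dataset" or "location" column in this table.
core_data <- tibble(
  param = c("Ca", "Ca", "Ti", "Ti"),
  depth = c(0, 1, 0, 1),
  value = c(1885, 1418, 2370, 2409)
)

# dataset_id and location_id fill in the missing columns, and x_columns is
# given explicitly instead of being guessed from the column order.
md <- mudata(
  core_data,
  x_columns = "depth",
  dataset_id = "example_cores",
  location_id = "core_1",
  validate = FALSE
)

# Validation was skipped above, so run it as a separate step.
validate_mudata(md)

# Inspect the identifiers that were filled in.
distinct_datasets(md)
distinct_locations(md)
```

Skipping validation at construction time and running validate_mudata afterwards is only one possible pattern; leaving validate at its default is usually simpler.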

Value

An object of class "mudata", which is a list with components data, locations, params, datasets, columns, and any other tables provided in more_tbls. All list components must be tbls.
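As a rough sketch of what the returned object looks like, the snippet below builds a mudata object from the bundled kentvillegreenwood data and inspects its components. It assumes only that the object is a list with the components named above.

```r
library(mudata2)

# Build a mudata object from the data table of a bundled example dataset.
md <- mudata(tbl_data(kentvillegreenwood))

# The object carries the "mudata" class and is a list of tables.
class(md)
names(md)

# Components can be read with the accessor or directly by name.
tbl_data(md)
md$params
```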

References

Dunnington DW and Spooner IS (2018). "Using a linked table-based structure to encode self-describing multiparameter spatiotemporal data". FACETS. doi:10.1139/facets-2017-0026

Examples

# use the data table from kentvillegreenwood as a template
kg_data <- tbl_data(kentvillegreenwood)

# create mudata object using just the data table
mudata(kg_data)
#> Guessing x columns: date
#> A mudata object aligned along "date"
#>   distinct_datasets():  "ecclimate"
#>   distinct_locations(): "GREENWOOD A", "KENTVILLE CDA CS"
#>   distinct_params():    "cooldegdays", "dirofmaxgust" ... and 9 more
#>   src_tbls():           "data", "locations" ... and 3 more
#>
#> tbl_data() %>% head():
#> # A tibble: 6 x 6
#>   dataset   location         param   date       value flags
#>   <chr>     <chr>            <chr>   <date>     <dbl> <chr>
#> 1 ecclimate KENTVILLE CDA CS maxtemp 1999-07-01  28.5 <NA>
#> 2 ecclimate KENTVILLE CDA CS maxtemp 1999-07-02  30.7 <NA>
#> 3 ecclimate KENTVILLE CDA CS maxtemp 1999-07-03  26.4 <NA>
#> 4 ecclimate KENTVILLE CDA CS maxtemp 1999-07-04  28.6 <NA>
#> 5 ecclimate KENTVILLE CDA CS maxtemp 1999-07-05  26   <NA>
#> 6 ecclimate KENTVILLE CDA CS maxtemp 1999-07-06  25.3 <NA>

# create a mudata object starting from a parameter-wide data frame
library(tidyr)
library(dplyr)
#>
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:lubridate’:
#>
#>     intersect, setdiff, union
#> The following objects are masked from ‘package:stats’:
#>
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#>
#>     intersect, setdiff, setequal, union

# gather columns and summarise replicates
# note: sd is computed with mean() here, so it equals value in the output
# below; use sd(param_value) to get a standard deviation instead
datatable <- pocmaj %>%
  gather(Ca, Ti, V, key = "param", value = "param_value") %>%
  group_by(core, param, depth) %>%
  summarise(value = mean(param_value), sd = mean(param_value)) %>%
  rename(location = core)

# create mudata object
mudata(datatable)
#> Guessing x columns: depth
#> A mudata object aligned along "depth"
#>   distinct_datasets():  "default"
#>   distinct_locations(): "MAJ-1", "POC-2"
#>   distinct_params():    "Ca", "Ti", "V"
#>   src_tbls():           "data", "locations" ... and 3 more
#>
#> tbl_data() %>% head():
#> # A tibble: 6 x 6
#>   dataset location param depth value    sd
#>   <chr>   <chr>    <chr> <int> <dbl> <dbl>
#> 1 default MAJ-1    Ca        0 1885. 1885.
#> 2 default MAJ-1    Ca        1 1418  1418
#> 3 default MAJ-1    Ca        2 1550  1550
#> 4 default MAJ-1    Ca        3 1448  1448
#> 5 default MAJ-1    Ca        4 1247  1247
#> 6 default MAJ-1    Ca        5 1412. 1412.