
arrow::write_parquet() can already convert and write geometry columns to Parquet when the geoarrow package is loaded. write_geoparquet() additionally generates file-level metadata and chooses a more generic encoding, which improves interoperability when the file is read by non-Arrow Parquet readers.

Usage

write_geoparquet(handleable, ..., schema = NULL, strict = FALSE)

Arguments

handleable

An object with a wk::wk_handle() method

...

Arguments passed on to arrow::write_parquet

x

data.frame, RecordBatch, or Table

sink

A string file path, URI, or OutputStream, or path in a file system (SubTreeFileSystem)

chunk_size

How many rows of data to write to disk at once. This corresponds directly to the number of rows in each Parquet row group. If NULL, a best guess is made for the optimal size (based on the number of columns and rows); if the data has fewer than 250 million cells (rows x cols), the total number of rows is used.

version

parquet version, "1.0" or "2.0". Default "1.0". Numeric values are coerced to character.

compression

compression algorithm. Default "snappy". See details.

compression_level

compression level. Meaning depends on compression algorithm

use_dictionary

Specify if we should use dictionary encoding. Default TRUE

write_statistics

Specify if we should write statistics. Default TRUE

data_page_size

Set a target threshold for the approximate encoded size of data pages within a column chunk (in bytes). Default 1 MiB.

use_deprecated_int96_timestamps

Write timestamps to INT96 Parquet format. Default FALSE.

coerce_timestamps

Cast timestamps to a particular resolution. Can be NULL, "ms" or "us". Default NULL (no casting)

allow_truncated_timestamps

Allow loss of data when coercing timestamps to a particular resolution. For example, if microsecond or nanosecond precision is lost when coercing to "ms", no exception is raised.

properties

A ParquetWriterProperties object, used instead of the options enumerated in this function's signature. Providing properties as an argument is deprecated; if you need to assemble ParquetWriterProperties outside of write_parquet(), use ParquetFileWriter instead.

arrow_properties

A ParquetArrowWriterProperties object. Like properties, this argument is deprecated.

schema

A narrow::narrow_schema() to use as a storage method.

strict

Use TRUE to respect choices of storage type, dimensions, and CRS provided by schema. The default, FALSE, updates these values to match the data.

Value

The result of arrow::write_parquet(), invisibly

Examples

tf <- tempfile()
write_geoparquet(data.frame(col1 = 1:5, col2 = wk::xy(1:5, 6:10)), tf)
read_geoparquet(tf)
#> # A tibble: 5 × 2
#>    col1 col2        
#>   <int> <grrw_wkb>  
#> 1     1 POINT (1 6) 
#> 2     2 POINT (2 7) 
#> 3     3 POINT (3 8) 
#> 4     4 POINT (4 9) 
#> 5     5 POINT (5 10)
unlink(tf)
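
The following is a minimal sketch, not part of the original examples, of how arguments in `...` reach arrow::write_parquet() and how the file-level metadata mentioned above might be inspected. It assumes the geoarrow, arrow, and wk packages are installed. The example data and the choice of `compression = "uncompressed"` are illustrative only, and which metadata keys appear depends on the installed geoarrow version.

```r
library(geoarrow)

# Illustrative data: an integer column and a wk::xy() point column
df <- data.frame(col1 = 1:5, col2 = wk::xy(1:5, 6:10))

tf <- tempfile(fileext = ".parquet")

# Named arguments in `...` are passed on to arrow::write_parquet().
# "uncompressed" is chosen here because it does not depend on which
# compression codecs the local Arrow build provides.
write_geoparquet(df, tf, compression = "uncompressed")

# Read the file back as an Arrow Table (not a data.frame) and list
# the keys of its file-level metadata
tbl <- arrow::read_parquet(tf, as_data_frame = FALSE)
names(tbl$metadata)

# Round trip through read_geoparquet() to recover the geometry column
result <- read_geoparquet(tf)
nrow(result)

unlink(tf)
```

Reading the file as a Table rather than a data.frame keeps the schema-level metadata accessible for inspection.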