Write 'GeoParquet' files
write_geoparquet.Rd
Whereas arrow::write_parquet() will convert and write geometry columns to Parquet format when the geoarrow package is loaded, write_geoparquet() also generates file-level metadata and chooses a more generic encoding. This improves the interoperability of the Parquet file when it is read by non-Arrow Parquet readers.
Arguments
- handleable
An object with a wk::wk_handle() method
- ...
Arguments passed on to arrow::write_parquet
x
data.frame, RecordBatch, or Table
sink
A string file path, URI, or OutputStream, or path in a file system (SubTreeFileSystem)
chunk_size
how many rows of data to write to disk at once. This directly corresponds to how many rows will be in each row group in parquet. If NULL, a best guess will be made for optimal size (based on the number of columns and number of rows), though if the data has fewer than 250 million cells (rows x cols), then the total number of rows is used.
version
parquet version, "1.0" or "2.0". Default "1.0". Numeric values are coerced to character.
compression
compression algorithm. Default "snappy". See the details section of arrow::write_parquet().
compression_level
compression level. Meaning depends on the compression algorithm.
use_dictionary
Specify if we should use dictionary encoding. Default TRUE.
write_statistics
Specify if we should write statistics. Default TRUE.
data_page_size
Set a target threshold for the approximate encoded size of data pages within a column chunk (in bytes). Default 1 MiB.
use_deprecated_int96_timestamps
Write timestamps to INT96 Parquet format. Default FALSE.
coerce_timestamps
Cast timestamps to a particular resolution. Can be NULL, "ms" or "us". Default NULL (no casting).
allow_truncated_timestamps
Allow loss of data when coercing timestamps to a particular resolution. E.g. if microsecond or nanosecond data is lost when coercing to "ms", do not raise an exception.
properties
A ParquetWriterProperties object, used instead of the options enumerated in this function's signature. Providing properties as an argument is deprecated; if you need to assemble ParquetWriterProperties outside of write_parquet(), use ParquetFileWriter instead.
arrow_properties
A ParquetArrowWriterProperties object. Like properties, this argument is deprecated.
- schema
A narrow::narrow_schema() to use as a storage method.
- strict
Use TRUE to respect choices of storage type, dimensions, and CRS provided by schema. The default, FALSE, updates these values to match the data.
Value
The result of arrow::write_parquet(), invisibly.
Examples
tf <- tempfile()
write_geoparquet(data.frame(col1 = 1:5, col2 = wk::xy(1:5, 6:10)), tf)
read_geoparquet(tf)
#> # A tibble: 5 × 2
#> col1 col2
#> <int> <grrw_wkb>
#> 1 1 POINT (1 6)
#> 2 2 POINT (2 7)
#> 3 3 POINT (3 8)
#> 4 4 POINT (4 9)
#> 5 5 POINT (5 10)
unlink(tf)
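Because ... is passed on to arrow::write_parquet(), writer options such as compression and chunk_size can be given directly in the write_geoparquet() call. The following is a minimal sketch rather than a recommended configuration: it assumes the geoarrow, arrow, and wk packages are installed, and the choice of "gzip" and a chunk size of 2 is purely illustrative.

```r
# A sketch, assuming the geoarrow, arrow, and wk packages are installed.
library(geoarrow)

df <- data.frame(col1 = 1:5, col2 = wk::xy(1:5, 6:10))
tf <- tempfile(fileext = ".parquet")

# Fall back to the documented default if gzip is not compiled into this
# arrow build.
codec <- if (arrow::codec_is_available("gzip")) "gzip" else "snappy"

# compression and chunk_size are not named arguments of write_geoparquet();
# they are forwarded through `...` to arrow::write_parquet().
write_geoparquet(df, tf, compression = codec, chunk_size = 2)

# Read the file back to confirm that all five rows were written.
back <- read_geoparquet(tf)
nrow(back)

unlink(tf)
```

A small chunk_size is only useful for demonstration; for real data, leaving it as NULL lets arrow::write_parquet() pick the row group size as described under Arguments.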