At the heart of the wk philosophy is the concept of a
handler, whose job it is to respond to bits of
geometric information as they are iterated over by the
reader. These bits of information have a very specific
structure and order such that the messages can be guaranteed to be
backward-compatible for all time (for each version of the API). This
means that the reader can focus on iterating over the
data structure and handlers can focus on computing a
value (usually) based on the geometries. The advantage of this system is
that readers and handlers can be mixed and matched so that handlers can
be used with many data structures! As an example, the wk package itself
uses readers and handlers to power validation of and conversion among
wkt()
, wkb()
, xy()
, and
rct()
classes.
The wk API works with a vector of
features: readers iterate over such vectors and
handlers compute a value based on some or all of the features in the
vector. Each feature can be NULL
or
contain exactly one geometry (although this geometry
can be a collection or multi-geometry which can contain a tree of other
geometries). A good way to visualize the structure, order, and type of
messages passed to a handler is to use the
wk_debug_filter()
on some well-known text (WKT):
library(wk)
wk_handle(
wkt(c(NA, "POINT (0 1)", "GEOMETRYCOLLECTION (POINT (2 3))")),
wk_debug_filter()
)
#> initialize (dirty = 0 -> 1)
#> vector_start: <Unknown type / 0>[3] <0x7ffeda70d1b0> => WK_CONTINUE
#> feature_start (1): <0x7ffeda70d1b0> => WK_CONTINUE
#> null_feature => WK_CONTINUE
#> feature_end (1): <0x7ffeda70d1b0> => WK_CONTINUE
#> feature_start (2): <0x7ffeda70d1b0> => WK_CONTINUE
#> geometry_start (<none>): POINT[UNKNOWN] <0x7ffeda70d040> => WK_CONTINUE
#> coord (1): <0x7ffeda70d040> (0.000000 1.000000) => WK_CONTINUE
#> geometry_end (<none>) => WK_CONTINUE
#> feature_end (2): <0x7ffeda70d1b0> => WK_CONTINUE
#> feature_start (3): <0x7ffeda70d1b0> => WK_CONTINUE
#> geometry_start (<none>): GEOMETRYCOLLECTION[UNKNOWN] <0x7ffeda70d040> => WK_CONTINUE
#> geometry_start (1): POINT[UNKNOWN] <0x7ffeda70cee0> => WK_CONTINUE
#> coord (1): <0x7ffeda70cee0> (2.000000 3.000000) => WK_CONTINUE
#> geometry_end (1) => WK_CONTINUE
#> geometry_end (<none>) => WK_CONTINUE
#> feature_end (3): <0x7ffeda70d1b0> => WK_CONTINUE
#> vector_end: <0x7ffeda70d1b0>
#> deinitialize
#> NULL
The concept of a vector was inspired by the
sf::sfc()
data type and draws heavily from the vctrs framework. The concept of a
feature draws from the implementations of the simple
features specification in most databases and the support in R for
“missing” values, which are not quite the same as an EMPTY geometry (in
the same way that NaN
and NA
can be
distinguished in R). The concept of a geometry is very
much tied to (and was inspired by) the definition of a geometry in the
(E)WKB, (E)WKT, and TWKB format specifications such that most of the
information provided by these formats is passed along to handlers if it
is available.
The wk_handle()
function is an S3 generic for which
methods are defined for wkb()
, wkt()
,
xy()
, rct()
, and could be implemented for any
number of in-memory and on-disk forms. When writing a new
reader, this is the method you’re defining for your
data type. While you don’t technically have to do implement the
wk_handle()
generic, doing so makes functions that use
handlers work for your data type without having to know
anything about your type/package! For example, the
wk_debug()
function uses this under the hood to support
multiple input types with one line of code.
wk_debug(wkt("POINT (0 1)"))
#> initialize (dirty = 0 -> 1)
#> vector_start: <Unknown type / 0>[1] <0x7ffeda70d1b0> => WK_CONTINUE
#> feature_start (1): <0x7ffeda70d1b0> => WK_CONTINUE
#> geometry_start (<none>): POINT[UNKNOWN] <0x7ffeda70d040> => WK_CONTINUE
#> coord (1): <0x7ffeda70d040> (0.000000 1.000000) => WK_CONTINUE
#> geometry_end (<none>) => WK_CONTINUE
#> feature_end (1): <0x7ffeda70d1b0> => WK_CONTINUE
#> vector_end: <0x7ffeda70d1b0>
#> deinitialize
#> NULL
wk_debug(as_wkb("POINT (0 1)"))
#> initialize (dirty = 0 -> 1)
#> vector_start: <Unknown type / 0>[1] <0x7ffeda70f950> => WK_CONTINUE
#> feature_start (1): <0x7ffeda70f950> => WK_CONTINUE
#> geometry_start (<none>): POINT[1] <0x7ffeda70f8a0> => WK_CONTINUE
#> coord (1): <0x7ffeda70f8a0> (0.000000 1.000000) => WK_CONTINUE
#> geometry_end (<none>) => WK_CONTINUE
#> feature_end (1): <0x7ffeda70f950> => WK_CONTINUE
#> vector_end: <0x7ffeda70f950>
#> deinitialize
#> NULL
As an example, I’ll write a handler that computes the 2D bounding box of its input (assuming Euclidean space). One way to go about this is to use the C++ extension of the C API. The approach is probably familiar: keep a running tally of the greatest and least values so far, updating every time information about a new coordinate is available.
#include <cmath>
#include "cpp11.hpp"
using namespace cpp11;
namespace writable = cpp11::writable;
#include "wk/experimental/wk-v1-handler-cpp11.hpp"
#include "wk-v1-impl.c"
class BBoxHandler: public WKVoidHandler {
public:
BBoxHandler(): xmin(R_PosInf), ymin(R_PosInf), xmax(R_NegInf), ymax(R_NegInf) {}
int coord(const wk_meta_t* meta, const double* coord, uint32_t coord_id) {
this->xmin = std::min(coord[0], this->xmin);
this->ymin = std::min(coord[1], this->ymin);
this->xmax = std::max(coord[0], this->xmax);
this->ymax = std::max(coord[1], this->ymax);
return WK_CONTINUE;
}
SEXP vector_end(const wk_vector_meta_t* meta) {
writable::doubles output = {this->xmin, this->ymin, this->xmax, this->ymax};
output.names() = {"xmin", "ymin", "xmax", "ymax"};
return output;
}
private:
double xmin, ymin, xmax, ymax;
};
[[cpp11::linking_to(wk)]]
[[cpp11::register]]
sexp cpp_bbox_handler_new() {
return WKHandlerFactory<BBoxHandler>::create_xptr(new BBoxHandler());
}
In this handler, we only care about two types of events: the
appearance of a coordinate and the end of the vector. In the wk API, the
vector_end()
method is where the return value is computed.
We need a tiny bit more code to generate the handler in the form it can
be used in R. This guarantees that other functions know that the object
you’ve returned from C++ is a pointer to a wk handler.
bbox_handler <- function() {
new_wk_handler(cpp_bbox_handler_new(), "bbox_wk_handler")
}
bbox_handler()
#> <bbox_wk_handler at 0x55f2f4bd77d0>
Once you have the object in R, you can use it with wk’s
handle_*()
functions such as handle_wkt()
or
handle_wkb()
:
The most important thing about a wk_handler
is that,
once created, it can only be used once:
handler <- bbox_handler()
wk_handle(wkt(), handler)
#> xmin ymin xmax ymax
#> Inf Inf -Inf -Inf
wk_handle(wkt(), handler)
#> Error: Can't re-use this wk_handler
If you need to program with user-supplied handlers, you can pass a
function wherever a wk_handler
was expected (the reader
will call the function to obtain the fresh handler).
handler <- function() bbox_handler()
wk_handle(wkt(), handler)
#> xmin ymin xmax ymax
#> Inf Inf -Inf -Inf
wk_handle(wkt(), handler)
#> xmin ymin xmax ymax
#> Inf Inf -Inf -Inf
When writing a handler, it is also important to consider cleaning up
the resources you allocate. With a C++ handler some of this is taken
care of for you because the deleter for the object is called at some
point after the object is garbage collected. This is taken care of in
the magical
WKHandlerFactory<T>::create_xptr(new T())
, which
registers the deleter and catches any exceptions you may have thrown so
that your handling functions are safe to call from C. The default
deleter will clean up any class members you’ve declared which is
sufficient for most handlers you might want to write.
One situation in which you will specifically not want to rely on the
deleter is a handler that writes to a file: because you have no control
over when the deleter will run, you may reserve file access for
longer than you intend. In these situations, you should open the file in
vector_start()
and clean it up in
deinitialize()
, which is guaranteed to run if
vector_start()
is called.
A few more things to keep in mind when writing handlers:
vector_start()
may not run at all (e.g., if a handler
object is created but never used).noexcept
(the deleter,
deinitialize()
, and error()
).Depending on your background and the other libraries with which you are working, it may be easier or faster to write a handler in C. The below example is a direct translation of the C++ bounding box handler from above.
#include <R.h>
#include <Rinternals.h>
#include <stdlib.h>
#include "wk-v1.h"
#include "wk-v1-impl.c"
#define MIN(a, b) (((a) < (b)) ? (a) : (b))
#define MAX(a, b) (((a) > (b)) ? (a) : (b))
typedef struct {
double xmin, ymin, xmax, ymax;
} bbox_handler_data_t;
int bbox_handler_coord(const wk_meta_t* meta, const double* coord,
uint32_t coord_id, void* handler_data) {
bbox_handler_data_t* data = (bbox_handler_data_t*) handler_data;
data->xmin = MIN(coord[0], data->xmin);
data->ymin = MIN(coord[1], data->ymin);
data->xmax = MAX(coord[0], data->xmax);
data->ymax = MAX(coord[1], data->ymax);
return WK_CONTINUE;
}
SEXP bbox_handler_vector_end(const wk_vector_meta_t* meta, void* handler_data) {
bbox_handler_data_t* data = (bbox_handler_data_t*) handler_data;
const char* names[] = {"xmin", "ymin", "xmax", "ymax", ""};
SEXP output = PROTECT(Rf_mkNamed(REALSXP, names));
REAL(output)[0] = data->xmin;
REAL(output)[1] = data->ymin;
REAL(output)[2] = data->xmax;
REAL(output)[3] = data->ymax;
UNPROTECT(1);
return output;
}
void bbox_handler_finalize(void* handler_data) {
bbox_handler_data_t* data = (bbox_handler_data_t*) handler_data;
free(data);
}
SEXP c_bbox_handler_new() {
wk_handler_t* handler = wk_handler_create();
handler->coord = &bbox_handler_coord;
handler->vector_end = &bbox_handler_vector_end;
bbox_handler_data_t* data = (bbox_handler_data_t*) malloc(sizeof(bbox_handler_data_t));
data->xmin = R_PosInf;
data->ymin = R_PosInf;
data->xmax = R_NegInf;
data->ymax = R_NegInf;
handler->handler_data = data;
return wk_handler_create_xptr(handler, R_NilValue, R_NilValue);
}
We need the same infrastructure on the R end
(new_wk_handler()
) to make sure read functions understand
that the value is a pointer to a freshly created handler:
bbox_handler_c <- function() {
new_wk_handler(.Call("c_bbox_handler_new"))
}
wk_handle(
wkt(c(NA, "POINT (0 1)", "GEOMETRYCOLLECTION (POINT (2 3))")),
bbox_handler_c()
)
#> xmin ymin xmax ymax
#> 0 1 2 3
The notes above about the lifecycle of a handler written in C++ apply
to a handler written in C except that for the C handler you will have to
explicitly free()
anything you malloc()
ed. In
the above example, the finalize()
method was used to free
the BboxHandlerData_t
. Another common pattern is to
allocate an R object in vector_start()
(because this is
where you have information about how many items are in the vector that
is about to be read). This pattern of creating objects isn’t
well-supported by the PROTECT()
/UNPROTECT()
pattern and is better-suited to R_PreserveObject()
and
R_ReleaseObject()
. You can also use the fact that
VECSXP
vectors protect their elements to keep your R
allocations from being garbage-collected while the handler is
running.
Note that you can and should use the R API within your functions!
This includes using Rf_error()
,
Rf_allocVector()
and all the other functions that might
longjmp
: any reader calling handler functions through the
wk C++ headers are designed with this in mind.
The wk API was optimized for the simplicity of writing handlers much
more than readers (after all, there are many more things one can
do with geometry than there are ways to store it). This means
that writing readers can be finicky! As an example, I’ll demonstrate a
reader that iterates over a data frame of coordinates (e.g.,
data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
) as points.
#include "cpp11.hpp"
using namespace cpp11;
#include "wk/experimental/wk-v1-reader-cpp11.hpp"
#include "wk-v1-impl.c"
#define HANDLE_CONTINUE_OR_BREAK(expr) \
result = expr; \
if (result == WK_ABORT_FEATURE) continue; else if (result == WK_ABORT) break
[[cpp11::linking_to(wk)]]
[[cpp11::register]]
sexp cpp_handle_xy(data_frame input, sexp handler_xptr) {
doubles x = input["x"];
cpp11::doubles y = input["y"];
R_xlen_t n_features = x.size();
WKHandlerXPtr handler(handler_xptr);
wk_vector_meta_t vector_meta;
WK_VECTOR_META_RESET(vector_meta, WK_POINT);
vector_meta.size = n_features;
wk_meta_t geometry_meta;
WK_META_RESET(geometry_meta, WK_POINT);
geometry_meta.size = 1;
handler.vector_start(&vector_meta);
double coord[4];
int result;
for (R_xlen_t i = 0; i < n_features; i++) {
HANDLE_CONTINUE_OR_BREAK(handler.feature_start(&vector_meta, i));
HANDLE_CONTINUE_OR_BREAK(handler.geometry_start(&geometry_meta, WK_PART_ID_NONE));
coord[0] = x[i];
coord[1] = y[i];
HANDLE_CONTINUE_OR_BREAK(handler.coord(&geometry_meta, coord, 0));
HANDLE_CONTINUE_OR_BREAK(handler.geometry_end(&geometry_meta, WK_PART_ID_NONE));
HANDLE_CONTINUE_OR_BREAK(handler.feature_end(&vector_meta, i));
}
return handler.vector_end(&vector_meta);
}
The first thing you might notice is the
HANDLE_CONTINUE_OR_BREAK()
macro. You may have noticed that
the handler functions we wrote all return WK_CONTINUE
.
Sometimes it’s really nice when you’re writing a handler to be
able to tell the reader to stop! One example is the
wk_format_handler()
, which only needs the first few
coordinates to spit out a reasonable summary. When iterating over
features that could be gigabytes in size this saves a lot of time and
several important handlers need this for the framework to do its magic.
It does, however, make writing readers a lot harder.
You might wonder why I didn’t just use exceptions to implement this
feature in the C++ wrapper since they would simplify the code a great
deal. The initial version of the API did this and was really really
really slow for handlers that returned early. If you have a better
idea of how to make this work, I’m all ears. In practice, having a macro
that does an early return
, a continue
, or a
break
keeps the code implementing the readers from getting
out of hand.
The second C++-specific item is the WKHandlerXPtr
wrapper around the handler. Under the hood the handler is one of R’s
externalptr
objects that points to a
wk_handler_t
struct. The wrapper makes sure that the
handler is “fresh”, that it is actually an externalptr
, and
makes sure that any R errors resulting from its methods are converted to
C++ exceptions (using cpp11:safe()
).
On the R side there is usually a thin wrapper to validate arguments.
If your object has a class you should implement this as
wk_handle.your_class_name()
so that handler functions like
the ones we created above can accept your geometry class as input! It is
also important to use as_wk_handler()
on the
handler
argument as it applies consistent rules for
generating handlers everywhere (e.g., the function trick described
above).
handle_xy <- function(input, handler) {
input$x <- as.numeric(input$x)
input$y <- as.numeric(input$y)
cpp_handle_xy(input, as_wk_handler(handler))
}
From here, you can test! A good first handler to try is the
wkt_writer()
, which is sensitive to mistakes in the reader
and is unlikely to crash.
df <- data.frame(x = 1:3, y = 2:4)
handle_xy(df, wkt_writer())
#> <wk_wkt[3]>
#> [1] POINT (1 2) POINT (2 3) POINT (3 4)
Another useful handler is the wk_debug_filter()
, which
will give more detailed information about each call of a handler
method.
The above reader without the help of C++ and cpp11 looks similar, except we need some help from wk’s C API to fill out a couple of details.
#include <R.h>
#include <Rinternals.h>
#include "wk-v1.h"
#include "wk-v1-impl.c"
#define HANDLE_CONTINUE_OR_BREAK(expr) \
result = expr; \
if (result == WK_ABORT_FEATURE) continue; else if (result == WK_ABORT) break
SEXP c_handle_xy_impl(SEXP input, wk_handler_t* handler) {
double* x = REAL(VECTOR_ELT(input, 0));
double* y = REAL(VECTOR_ELT(input, 1));
R_xlen_t n_features = Rf_xlength(VECTOR_ELT(input, 0));
wk_vector_meta_t vector_meta;
WK_VECTOR_META_RESET(vector_meta, WK_POINT);
vector_meta.size = n_features;
wk_meta_t geometry_meta;
WK_META_RESET(geometry_meta, WK_POINT);
geometry_meta.size = 1;
handler->vector_start(&vector_meta, handler->handler_data);
double coord[4];
int result;
for (R_xlen_t i = 0; i < n_features; i++) {
HANDLE_CONTINUE_OR_BREAK(handler->feature_start(&vector_meta, i, handler->handler_data));
HANDLE_CONTINUE_OR_BREAK(handler->geometry_start(&geometry_meta, WK_PART_ID_NONE, handler->handler_data));
coord[0] = x[i];
coord[1] = y[i];
HANDLE_CONTINUE_OR_BREAK(handler->coord(&geometry_meta, coord, 0, handler->handler_data));
HANDLE_CONTINUE_OR_BREAK(handler->geometry_end(&geometry_meta, WK_PART_ID_NONE, handler->handler_data));
HANDLE_CONTINUE_OR_BREAK(handler->feature_end(&vector_meta, i, handler->handler_data));
}
return handler->vector_end(&vector_meta, handler->handler_data);
}
SEXP c_handle_xy(SEXP input, SEXP handler_xptr) {
return wk_handler_run_xptr(&c_handle_xy_impl, input, handler_xptr);
}
The main oddity here is that we need to use
wk_handler_run_xptr()
to run a handler. This function
performs a similar task as the WKHandlerXPtr
wrapper in
C++. Notably, it makes sure that the handler’s finalize()
method gets called no matter what. This pattern was inspired by
the excellent
blog post on this topic by Gábor Csárdi and Lionel Henry. It does
mean that you will need an “inner” function with signature
(SEXP, wk_handler_t*)
(e.g.,
c_handle_xy_impl()
), and an “outer” method that calls
wk_handler_run_xptr()
.
The wrapper on the R side is much the same, except you may want to perform more checks since these are comparatively harder to write using R’s C API.
handle_xy <- function(input, handler) {
input$x <- as.numeric(input$x)
input$y <- as.numeric(input$y)
.Call("c_handle_xy", input, as_wk_handler(handler))
}
handle_xy(df, wkt_writer())
#> <wk_wkt[3]>
#> [1] POINT (1 2) POINT (2 3) POINT (3 4)
There is no reason that an object can’t be both a reader and a handler. This idea was one of the motivating ideas behind developing wk: the ability to write composable transformations that allocate as little memory as possible. For many transformations, this is as little as one coordinate! As an example, we’ll create a filter that bumps coordinates by a fixed offset. I’ll demonstrate this in C++ because it is considerably easier to get the details right (if you want to write one of these in C you can copy/paste the identity handler source code from this package’s src directory as a starting point).
#include "cpp11.hpp"
using namespace cpp11;
#include "wk/experimental/wk-v1-filter-cpp11.hpp"
#include "wk-v1-impl.c"
class BumpFilter: public WKIdentityFilter {
public:
BumpFilter(sexp next, double dx, double dy):
WKIdentityFilter(next), dx(dx), dy(dy) {}
int coord(const wk_meta_t* meta, const double* coord, uint32_t coord_id) {
this->new_coord[0] = coord[0] + this->dx;
this->new_coord[1] = coord[1] + this->dy;
return WKIdentityFilter::coord(meta, this->new_coord, coord_id);
}
private:
double dx;
double dy;
double new_coord[4];
};
[[cpp11::linking_to(wk)]]
[[cpp11::register]]
sexp cpp_bump_filter_new(SEXP handler_xptr, double dx, double dy) {
return WKHandlerFactory<BumpFilter>::create_xptr(new BumpFilter(handler_xptr, dx, dy));
}
bump_filter <- function(handler, dx = 0, dy = 0) {
new_wk_handler(cpp_bump_filter_new(as_wk_handler(handler), dx, dy), "bump_filter")
}
wk_handle(
wkt("POINT (0 0)"),
bump_filter(wkt_writer(), dx = 2, dy = -6)
)
#> <wk_wkt[1]>
#> [1] POINT (2 -6)
While a filter might be cool, it also needs some R
infrastructure to make it useful for a user that doesn’t care about
readers, filters, and handlers. For example, the
wk_identity()
function uses some infrastructure to ensure
that the output type matches the input type:
wk_identity(wkt("POINT (1 1)"))
#> <wk_wkt[1]>
#> [1] POINT (1 1)
wk_identity(as_wkb("POINT (1 1)"))
#> <wk_wkb[1]>
#> [1] <POINT (1 1)>
When writing an R wrapper around a filter, you will probably want to use the following pattern:
bump <- function(handleable, dx = 0, dy = 0, ...) {
UseMethod("bump")
}
bump.default <- function(handleable, dx = 0, dy = 0, ...) {
result <- wk_handle(handleable, bump_filter(wk_writer(handleable), dx, dy), ...)
result <- wk_restore(handleable, result)
wk_set_crs(result, wk_crs(handleable))
}
There’s a lot packed into that three-line default method. First, we
use the wk_writer()
generic to pick a handler that
generates the object based on handleable
. This is defined
for most objects that can be used with wk_handler()
to
return the same type of object as the input. Second, we use
wk_handle()
to generate the result. Next, we call
wk_restore()
, which usually returns result
but
for data frames and sf objects attempts to replace the old geometry
column with the transformed version. Finally, because we don’t change
the CRS in our filter, we can reset the CRS of the result to match that
of the input. This will make your filter work with many types of
geometric objects!
library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
nc <- read_sf(system.file("shape/nc.shp", package = "sf"))
st_bbox(nc)
#> xmin ymin xmax ymax
#> -84.32385 33.88199 -75.45698 36.58965
bump(nc, 1, 1)
#> Simple feature collection with 100 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -83.32385 ymin: 34.88199 xmax: -74.45698 ymax: 37.58965
#> Geodetic CRS: NAD27
#> # A tibble: 100 × 15
#> AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74 NWBIR74
#> * <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.114 1.44 1825 1825 Ashe 37009 37009 5 1091 1 10
#> 2 0.061 1.23 1827 1827 Alle… 37005 37005 3 487 0 10
#> 3 0.143 1.63 1828 1828 Surry 37171 37171 86 3188 5 208
#> 4 0.07 2.97 1831 1831 Curr… 37053 37053 27 508 1 123
#> 5 0.153 2.21 1832 1832 Nort… 37131 37131 66 1421 9 1066
#> 6 0.097 1.67 1833 1833 Hert… 37091 37091 46 1452 7 954
#> 7 0.062 1.55 1834 1834 Camd… 37029 37029 15 286 0 115
#> 8 0.091 1.28 1835 1835 Gates 37073 37073 37 420 0 254
#> 9 0.118 1.42 1836 1836 Warr… 37185 37185 93 968 4 748
#> 10 0.124 1.43 1837 1837 Stok… 37169 37169 85 1612 1 160
#> # ℹ 90 more rows
#> # ℹ 4 more variables: BIR79 <dbl>, SID79 <dbl>, NWBIR79 <dbl>,
#> # geometry <MULTIPOLYGON [°]>
The wrapper method should probably always be an S3 generic. This lets
you define faster implementations for simple objects like
xy()
and/or rct()
where the base-R
implementation is much faster.
The bump operation we just wrote is an example of a common type of filter: a coordinate transform. Transforms get used in all sorts of places in the spatial/geometry world and have their own type that makes it slightly less verbose to create them and easier for them to be used in other ways.
#include <R.h>
#include <Rinternals.h>
#include <stdlib.h>
#include "wk-v1.h"
#include "wk-v1-impl.c"
typedef struct {
double dx;
double dy;
} bump_trans_t;
int bump_trans_trans(R_xlen_t feature_id, const double* xyzm_in, double* xyzm_out, void* trans_data) {
bump_trans_t* data = (bump_trans_t*) trans_data;
xyzm_out[0] = xyzm_in[0] + data->dx;
xyzm_out[1] = xyzm_in[1] + data->dy;
return WK_CONTINUE;
}
void bump_trans_finalizer(void* trans_data) {
free(trans_data);
}
SEXP bump_trans_new(SEXP xy) {
double dx = REAL(xy)[0];
double dy = REAL(xy)[1];
wk_trans_t* trans = wk_trans_create();
trans->trans = &bump_trans_trans;
trans->finalizer = &bump_trans_finalizer;
bump_trans_t* data = (bump_trans_t*) malloc(sizeof(bump_trans_t));
if (data == NULL) {
free(trans);
Rf_error("Failed to alloc bump_trans_t*");
}
data->dx = dx;
data->dy = dy;
trans->trans_data = data;
return wk_trans_create_xptr(trans, R_NilValue, R_NilValue);
}
We need some R infrastructure to run these just like the C handlers:
bump_trans <- function(dx = 0, dy = 0) {
new_wk_trans(.Call("bump_trans_new", c(dx, dy)))
}
bump <- function(handleable, dx = 0, dy = 0) {
wk_transform(handleable, bump_trans(dx, dy))
}
bump(wkt("POINT (0 0)"), 12, 13)
#> <wk_wkt[1]>
#> [1] POINT (12 13)
The rules are still evolving since this is a new feature but at the
very least you should never throw exceptions or error in your transform
function (instead return WK_ABORT
and error in
vector_end()
).
The main issue with performance in wk handlers is that there are a
lot of function calls. I haven’t found a place where this
matters when writing C code, but when writing C++ some additional
wrapper code is used to isolate the C and C++ frames so that exceptions
aren’t thrown from C frames and longjmp
s generated by the R
API don’t jump over C++ object deleters. A quick test suggests that for
readers and filters written in C++ this overhead is about 1 second per 1
million handler method calls; for handlers this overhead is more like 1
second per 7 million handler method calls. For both, it’s likely that
whatever you are doing in C++ will be the limiting factor in your
handler, but for very simple filters/readers you might consider
rewriting your handler in C for performance. Future versions of wk will
hopefully eliminate this issue!
cpp_void_handler <- function() {
new_wk_handler(cpp_void_handler_new(), "cpp_void_handler")
}
cpp_identity_filter <- function(handler = wk_void_handler()) {
new_wk_handler(cpp_identity_filter_new(as_wk_handler(handler)), "cpp_identity_filter")
}
nc_wkb <- as_wkb(nc)
bench::mark(
wk_handle(nc_wkb, wk_void_handler()),
wk_handle(nc_wkb, cpp_void_handler()),
wk_handle(nc_wkb, wk_identity_filter(wk_void_handler())),
wk_handle(nc_wkb, cpp_identity_filter(wk_void_handler()))
)
#> # A tibble: 4 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:t> <bch:t> <dbl> <bch:byt> <dbl>
#> 1 wk_handle(nc_wkb, wk_void_handle… 44µs 46µs 20953. 0B 4.19
#> 2 wk_handle(nc_wkb, cpp_void_handl… 395.2µs 401.6µs 2468. 0B 0
#> 3 wk_handle(nc_wkb, wk_identity_fi… 60.4µs 65.6µs 15078. 0B 6.21
#> 4 wk_handle(nc_wkb, cpp_identity_f… 517.6µs 524.2µs 1877. 0B 2.02