Programming with the wk C and C++ API • wk

At the heart of the wk philosophy is the concept of a handler, whose job it is to respond to bits of geometric information as they are iterated over by the reader. These bits of information have a very specific structure and order such that the messages can be guaranteed to be backward-compatible for all time (for each version of the API). This means that the reader can focus on iterating over the data structure and handlers can focus on computing a value (usually) based on the geometries. The advantage of this system is that readers and handlers can be mixed and matched so that handlers can be used with many data structures! As an example, the wk package itself uses readers and handlers to power validation of and conversion among wkt(), wkb(), xy(), and rct() classes.

The wk API works with a vector of features: readers iterate over such vectors and handlers compute a value based on some or all of the features in the vector. Each feature can be NULL or contain exactly one geometry (although this geometry can be a collection or multi-geometry which can contain a tree of other geometries). A good way to visualize the structure, order, and type of messages passed to a handler is to use the wk_debug_filter() on some well-known text (WKT):

library(wk)
wk_handle(
  wkt(c(NA, "POINT (0 1)", "GEOMETRYCOLLECTION (POINT (2 3))")),
  wk_debug_filter()
)
#> initialize (dirty = 0  -> 1)
#> vector_start: <Unknown type / 0>[3] <0x7ffeda70d1b0> => WK_CONTINUE
#>   feature_start (1): <0x7ffeda70d1b0>  => WK_CONTINUE
#>     null_feature  => WK_CONTINUE
#>   feature_end (1): <0x7ffeda70d1b0>  => WK_CONTINUE
#>   feature_start (2): <0x7ffeda70d1b0>  => WK_CONTINUE
#>     geometry_start (<none>): POINT[UNKNOWN] <0x7ffeda70d040> => WK_CONTINUE
#>       coord (1): <0x7ffeda70d040> (0.000000 1.000000)  => WK_CONTINUE
#>     geometry_end (<none>)  => WK_CONTINUE
#>   feature_end (2): <0x7ffeda70d1b0>  => WK_CONTINUE
#>   feature_start (3): <0x7ffeda70d1b0>  => WK_CONTINUE
#>     geometry_start (<none>): GEOMETRYCOLLECTION[UNKNOWN] <0x7ffeda70d040> => WK_CONTINUE
#>       geometry_start (1): POINT[UNKNOWN] <0x7ffeda70cee0> => WK_CONTINUE
#>         coord (1): <0x7ffeda70cee0> (2.000000 3.000000)  => WK_CONTINUE
#>       geometry_end (1)  => WK_CONTINUE
#>     geometry_end (<none>)  => WK_CONTINUE
#>   feature_end (3): <0x7ffeda70d1b0>  => WK_CONTINUE
#> vector_end: <0x7ffeda70d1b0>
#> deinitialize
#> NULL

The concept of a vector was inspired by the sf::sfc() data type and draws heavily from the vctrs framework. The concept of a feature draws from the implementations of the simple features specification in most databases and the support in R for “missing” values, which are not quite the same as an EMPTY geometry (in the same way that NaN and NA can be distinguished in R). The concept of a geometry is very much tied to (and was inspired by) the definition of a geometry in the (E)WKB, (E)WKT, and TWKB format specifications such that most of the information provided by these formats is passed along to handlers if it is available.

R infrastructure

The wk_handle() function is an S3 generic for which methods are defined for wkb(), wkt(), xy(), rct(), and could be implemented for any number of in-memory and on-disk forms. When writing a new reader, this is the method you’re defining for your data type. While you don’t technically have to do implement the wk_handle() generic, doing so makes functions that use handlers work for your data type without having to know anything about your type/package! For example, the wk_debug() function uses this under the hood to support multiple input types with one line of code.

wk_debug(wkt("POINT (0 1)"))
#> initialize (dirty = 0  -> 1)
#> vector_start: <Unknown type / 0>[1] <0x7ffeda70d1b0> => WK_CONTINUE
#>   feature_start (1): <0x7ffeda70d1b0>  => WK_CONTINUE
#>     geometry_start (<none>): POINT[UNKNOWN] <0x7ffeda70d040> => WK_CONTINUE
#>       coord (1): <0x7ffeda70d040> (0.000000 1.000000)  => WK_CONTINUE
#>     geometry_end (<none>)  => WK_CONTINUE
#>   feature_end (1): <0x7ffeda70d1b0>  => WK_CONTINUE
#> vector_end: <0x7ffeda70d1b0>
#> deinitialize
#> NULL
wk_debug(as_wkb("POINT (0 1)"))
#> initialize (dirty = 0  -> 1)
#> vector_start: <Unknown type / 0>[1] <0x7ffeda70f950> => WK_CONTINUE
#>   feature_start (1): <0x7ffeda70f950>  => WK_CONTINUE
#>     geometry_start (<none>): POINT[1] <0x7ffeda70f8a0> => WK_CONTINUE
#>       coord (1): <0x7ffeda70f8a0> (0.000000 1.000000)  => WK_CONTINUE
#>     geometry_end (<none>)  => WK_CONTINUE
#>   feature_end (1): <0x7ffeda70f950>  => WK_CONTINUE
#> vector_end: <0x7ffeda70f950>
#> deinitialize
#> NULL

A handler in C++

As an example, I’ll write a handler that computes the 2D bounding box of its input (assuming Euclidean space). One way to go about this is to use the C++ extension of the C API. The approach is probably familiar: keep a running tally of the greatest and least values so far, updating every time information about a new coordinate is available.

#include <cmath>
#include "cpp11.hpp"
using namespace cpp11;
namespace writable = cpp11::writable;
#include "wk/experimental/wk-v1-handler-cpp11.hpp"
#include "wk-v1-impl.c"

class BBoxHandler: public WKVoidHandler {
public:
  BBoxHandler(): xmin(R_PosInf), ymin(R_PosInf), xmax(R_NegInf), ymax(R_NegInf) {}
  
  int coord(const wk_meta_t* meta, const double* coord, uint32_t coord_id) {
    this->xmin = std::min(coord[0], this->xmin);
    this->ymin = std::min(coord[1], this->ymin);
    this->xmax = std::max(coord[0], this->xmax);
    this->ymax = std::max(coord[1], this->ymax);
    return WK_CONTINUE; 
  }
  
  SEXP vector_end(const wk_vector_meta_t* meta) {
    writable::doubles output = {this->xmin, this->ymin, this->xmax, this->ymax};
    output.names() = {"xmin", "ymin", "xmax", "ymax"};
    return output;
  }
  
private:
  double xmin, ymin, xmax, ymax;
};

[[cpp11::linking_to(wk)]]
[[cpp11::register]]
sexp cpp_bbox_handler_new() {
  return WKHandlerFactory<BBoxHandler>::create_xptr(new BBoxHandler());
}

In this handler, we only care about two types of events: the appearance of a coordinate and the end of the vector. In the wk API, the vector_end() method is where the return value is computed. We need a tiny bit more code to generate the handler in the form it can be used in R. This guarantees that other functions know that the object you’ve returned from C++ is a pointer to a wk handler.

bbox_handler <- function() {
  new_wk_handler(cpp_bbox_handler_new(), "bbox_wk_handler")
}

bbox_handler()
#> <bbox_wk_handler at 0x55f2f4bd77d0>

Once you have the object in R, you can use it with wk’s handle_*() functions such as handle_wkt() or handle_wkb():

wk_handle(
  wkt(c(NA, "POINT (0 1)", "GEOMETRYCOLLECTION (POINT (2 3))")),
  bbox_handler()
)
#> xmin ymin xmax ymax 
#>    0    1    2    3

Lifecycle of a C++ handler

The most important thing about a wk_handler is that, once created, it can only be used once:

handler <- bbox_handler()
wk_handle(wkt(), handler)
#> xmin ymin xmax ymax 
#>  Inf  Inf -Inf -Inf
wk_handle(wkt(), handler)
#> Error: Can't re-use this wk_handler

If you need to program with user-supplied handlers, you can pass a function wherever a wk_handler was expected (the reader will call the function to obtain the fresh handler).

handler <- function() bbox_handler()
wk_handle(wkt(), handler)
#> xmin ymin xmax ymax 
#>  Inf  Inf -Inf -Inf
wk_handle(wkt(), handler)
#> xmin ymin xmax ymax 
#>  Inf  Inf -Inf -Inf

When writing a handler, it is also important to consider cleaning up the resources you allocate. With a C++ handler some of this is taken care of for you because the deleter for the object is called at some point after the object is garbage collected. This is taken care of in the magical WKHandlerFactory<T>::create_xptr(new T()), which registers the deleter and catches any exceptions you may have thrown so that your handling functions are safe to call from C. The default deleter will clean up any class members you’ve declared which is sufficient for most handlers you might want to write.

One situation in which you will specifically not want to rely on the deleter is a handler that writes to a file: because you have no control over when the deleter will run, you may reserve file access for longer than you intend. In these situations, you should open the file in vector_start() and clean it up in deinitialize(), which is guaranteed to run if vector_start() is called.

A few more things to keep in mind when writing handlers:

vector_start() may not run at all (e.g., if a handler object is created but never used).
You can and should throw exceptions to signify an error unless the method is marked noexcept (the deleter, deinitialize(), and error()).

A handler in C

Depending on your background and the other libraries with which you are working, it may be easier or faster to write a handler in C. The below example is a direct translation of the C++ bounding box handler from above.

#include <R.h>
#include <Rinternals.h>
#include <stdlib.h>
#include "wk-v1.h"
#include "wk-v1-impl.c"

#define MIN(a, b) (((a) < (b)) ? (a) : (b))
#define MAX(a, b) (((a) > (b)) ? (a) : (b)) 

typedef struct {
  double xmin, ymin, xmax, ymax;
} bbox_handler_data_t;

int bbox_handler_coord(const wk_meta_t* meta, const double* coord, 
                        uint32_t coord_id, void* handler_data) {
  bbox_handler_data_t* data = (bbox_handler_data_t*) handler_data;
  data->xmin = MIN(coord[0], data->xmin);
  data->ymin = MIN(coord[1], data->ymin);
  data->xmax = MAX(coord[0], data->xmax);
  data->ymax = MAX(coord[1], data->ymax);
  return WK_CONTINUE;
}

SEXP bbox_handler_vector_end(const wk_vector_meta_t* meta, void* handler_data) {
  bbox_handler_data_t* data = (bbox_handler_data_t*) handler_data;
  
  const char* names[] = {"xmin", "ymin", "xmax", "ymax", ""};
  SEXP output = PROTECT(Rf_mkNamed(REALSXP, names));
  REAL(output)[0] = data->xmin;
  REAL(output)[1] = data->ymin;
  REAL(output)[2] = data->xmax;
  REAL(output)[3] = data->ymax;
  
  UNPROTECT(1);
  return output;
}

void bbox_handler_finalize(void* handler_data) {
    bbox_handler_data_t* data = (bbox_handler_data_t*) handler_data;
    free(data);
}

SEXP c_bbox_handler_new() {
  wk_handler_t* handler = wk_handler_create();

  handler->coord = &bbox_handler_coord;
  handler->vector_end = &bbox_handler_vector_end;
  
  bbox_handler_data_t* data = (bbox_handler_data_t*) malloc(sizeof(bbox_handler_data_t));
  data->xmin = R_PosInf;
  data->ymin = R_PosInf;
  data->xmax = R_NegInf;
  data->ymax = R_NegInf;
  handler->handler_data = data;
  
  return wk_handler_create_xptr(handler, R_NilValue, R_NilValue);
}

We need the same infrastructure on the R end (new_wk_handler()) to make sure read functions understand that the value is a pointer to a freshly created handler:

bbox_handler_c <- function() {
  new_wk_handler(.Call("c_bbox_handler_new"))
}

wk_handle(
  wkt(c(NA, "POINT (0 1)", "GEOMETRYCOLLECTION (POINT (2 3))")),
  bbox_handler_c()
)
#> xmin ymin xmax ymax 
#>    0    1    2    3

Lifecycle of a C handler

The notes above about the lifecycle of a handler written in C++ apply to a handler written in C except that for the C handler you will have to explicitly free() anything you malloc()ed. In the above example, the finalize() method was used to free the BboxHandlerData_t. Another common pattern is to allocate an R object in vector_start() (because this is where you have information about how many items are in the vector that is about to be read). This pattern of creating objects isn’t well-supported by the PROTECT()/UNPROTECT() pattern and is better-suited to R_PreserveObject() and R_ReleaseObject(). You can also use the fact that VECSXP vectors protect their elements to keep your R allocations from being garbage-collected while the handler is running.

Note that you can and should use the R API within your functions! This includes using Rf_error(), Rf_allocVector() and all the other functions that might longjmp: any reader calling handler functions through the wk C++ headers are designed with this in mind.

A reader in C++

The wk API was optimized for the simplicity of writing handlers much more than readers (after all, there are many more things one can do with geometry than there are ways to store it). This means that writing readers can be finicky! As an example, I’ll demonstrate a reader that iterates over a data frame of coordinates (e.g., data.frame(x = c(1, 2, 3), y = c(4, 5, 6))) as points.

#include "cpp11.hpp"
using namespace cpp11;
#include "wk/experimental/wk-v1-reader-cpp11.hpp"
#include "wk-v1-impl.c"

#define HANDLE_CONTINUE_OR_BREAK(expr)                         \
  result = expr;                                               \
  if (result == WK_ABORT_FEATURE) continue; else if (result == WK_ABORT) break

[[cpp11::linking_to(wk)]]
[[cpp11::register]]
sexp cpp_handle_xy(data_frame input, sexp handler_xptr) {
  doubles x = input["x"];
  cpp11::doubles y = input["y"];
  R_xlen_t n_features = x.size();

  WKHandlerXPtr handler(handler_xptr);
  
  wk_vector_meta_t vector_meta;
  WK_VECTOR_META_RESET(vector_meta, WK_POINT);
  vector_meta.size = n_features;

  wk_meta_t geometry_meta;
  WK_META_RESET(geometry_meta, WK_POINT);
  geometry_meta.size = 1;

  handler.vector_start(&vector_meta);
  double coord[4];
  int result;
  
  for (R_xlen_t i = 0; i < n_features; i++) {
    HANDLE_CONTINUE_OR_BREAK(handler.feature_start(&vector_meta, i));
    HANDLE_CONTINUE_OR_BREAK(handler.geometry_start(&geometry_meta, WK_PART_ID_NONE));

    coord[0] = x[i];
    coord[1] = y[i];
    HANDLE_CONTINUE_OR_BREAK(handler.coord(&geometry_meta, coord, 0));

    HANDLE_CONTINUE_OR_BREAK(handler.geometry_end(&geometry_meta, WK_PART_ID_NONE));
    HANDLE_CONTINUE_OR_BREAK(handler.feature_end(&vector_meta, i));
  }

  return handler.vector_end(&vector_meta);
}

The first thing you might notice is the HANDLE_CONTINUE_OR_BREAK() macro. You may have noticed that the handler functions we wrote all return WK_CONTINUE. Sometimes it’s really nice when you’re writing a handler to be able to tell the reader to stop! One example is the wk_format_handler(), which only needs the first few coordinates to spit out a reasonable summary. When iterating over features that could be gigabytes in size this saves a lot of time and several important handlers need this for the framework to do its magic. It does, however, make writing readers a lot harder.

You might wonder why I didn’t just use exceptions to implement this feature in the C++ wrapper since they would simplify the code a great deal. The initial version of the API did this and was really really really slow for handlers that returned early. If you have a better idea of how to make this work, I’m all ears. In practice, having a macro that does an early return, a continue, or a break keeps the code implementing the readers from getting out of hand.

The second C++-specific item is the WKHandlerXPtr wrapper around the handler. Under the hood the handler is one of R’s externalptr objects that points to a wk_handler_t struct. The wrapper makes sure that the handler is “fresh”, that it is actually an externalptr, and makes sure that any R errors resulting from its methods are converted to C++ exceptions (using cpp11:safe()).

On the R side there is usually a thin wrapper to validate arguments. If your object has a class you should implement this as wk_handle.your_class_name() so that handler functions like the ones we created above can accept your geometry class as input! It is also important to use as_wk_handler() on the handler argument as it applies consistent rules for generating handlers everywhere (e.g., the function trick described above).

handle_xy <- function(input, handler) {
  input$x <- as.numeric(input$x)
  input$y <- as.numeric(input$y)
  cpp_handle_xy(input, as_wk_handler(handler))
}

From here, you can test! A good first handler to try is the wkt_writer(), which is sensitive to mistakes in the reader and is unlikely to crash.

df <- data.frame(x = 1:3, y = 2:4)
handle_xy(df, wkt_writer())
#> <wk_wkt[3]>
#> [1] POINT (1 2) POINT (2 3) POINT (3 4)

Another useful handler is the wk_debug_filter(), which will give more detailed information about each call of a handler method.

A reader in C

The above reader without the help of C++ and cpp11 looks similar, except we need some help from wk’s C API to fill out a couple of details.

#include <R.h>
#include <Rinternals.h>
#include "wk-v1.h"
#include "wk-v1-impl.c"

#define HANDLE_CONTINUE_OR_BREAK(expr)                         \
  result = expr;                                               \
  if (result == WK_ABORT_FEATURE) continue; else if (result == WK_ABORT) break

SEXP c_handle_xy_impl(SEXP input, wk_handler_t* handler) {
  double* x = REAL(VECTOR_ELT(input, 0));
  double* y = REAL(VECTOR_ELT(input, 1));
  R_xlen_t n_features = Rf_xlength(VECTOR_ELT(input, 0));
  
  wk_vector_meta_t vector_meta;
  WK_VECTOR_META_RESET(vector_meta, WK_POINT);
  vector_meta.size = n_features;

  wk_meta_t geometry_meta;
  WK_META_RESET(geometry_meta, WK_POINT);
  geometry_meta.size = 1;

  handler->vector_start(&vector_meta, handler->handler_data);
  double coord[4];
  int result;
  
  for (R_xlen_t i = 0; i < n_features; i++) {
    HANDLE_CONTINUE_OR_BREAK(handler->feature_start(&vector_meta, i, handler->handler_data));
    HANDLE_CONTINUE_OR_BREAK(handler->geometry_start(&geometry_meta, WK_PART_ID_NONE, handler->handler_data));

    coord[0] = x[i];
    coord[1] = y[i];
    HANDLE_CONTINUE_OR_BREAK(handler->coord(&geometry_meta, coord, 0, handler->handler_data));

    HANDLE_CONTINUE_OR_BREAK(handler->geometry_end(&geometry_meta, WK_PART_ID_NONE, handler->handler_data));
    HANDLE_CONTINUE_OR_BREAK(handler->feature_end(&vector_meta, i, handler->handler_data));
  }
  
  return handler->vector_end(&vector_meta, handler->handler_data);
}

SEXP c_handle_xy(SEXP input, SEXP handler_xptr) {
  return wk_handler_run_xptr(&c_handle_xy_impl, input, handler_xptr);
}

The main oddity here is that we need to use wk_handler_run_xptr() to run a handler. This function performs a similar task as the WKHandlerXPtr wrapper in C++. Notably, it makes sure that the handler’s finalize() method gets called no matter what. This pattern was inspired by the excellent blog post on this topic by Gábor Csárdi and Lionel Henry. It does mean that you will need an “inner” function with signature (SEXP, wk_handler_t*) (e.g., c_handle_xy_impl()), and an “outer” method that calls wk_handler_run_xptr().

The wrapper on the R side is much the same, except you may want to perform more checks since these are comparatively harder to write using R’s C API.

handle_xy <- function(input, handler) {
  input$x <- as.numeric(input$x)
  input$y <- as.numeric(input$y)
  .Call("c_handle_xy", input, as_wk_handler(handler))
}

handle_xy(df, wkt_writer())
#> <wk_wkt[3]>
#> [1] POINT (1 2) POINT (2 3) POINT (3 4)

Filters

There is no reason that an object can’t be both a reader and a handler. This idea was one of the motivating ideas behind developing wk: the ability to write composable transformations that allocate as little memory as possible. For many transformations, this is as little as one coordinate! As an example, we’ll create a filter that bumps coordinates by a fixed offset. I’ll demonstrate this in C++ because it is considerably easier to get the details right (if you want to write one of these in C you can copy/paste the identity handler source code from this package’s src directory as a starting point).

#include "cpp11.hpp"
using namespace cpp11;
#include "wk/experimental/wk-v1-filter-cpp11.hpp"
#include "wk-v1-impl.c"

class BumpFilter: public WKIdentityFilter {
public:
  BumpFilter(sexp next, double dx, double dy): 
    WKIdentityFilter(next), dx(dx), dy(dy) {}
  
  int coord(const wk_meta_t* meta, const double* coord, uint32_t coord_id) {
    this->new_coord[0] = coord[0] + this->dx;
    this->new_coord[1] = coord[1] + this->dy;
    return WKIdentityFilter::coord(meta, this->new_coord, coord_id);
  }
  
private:
  double dx;
  double dy;
  double new_coord[4];
};

[[cpp11::linking_to(wk)]]
[[cpp11::register]]
sexp cpp_bump_filter_new(SEXP handler_xptr, double dx, double dy) {
  return WKHandlerFactory<BumpFilter>::create_xptr(new BumpFilter(handler_xptr, dx, dy));
}

bump_filter <- function(handler, dx = 0, dy = 0) {
  new_wk_handler(cpp_bump_filter_new(as_wk_handler(handler), dx, dy), "bump_filter")
}

wk_handle(
  wkt("POINT (0 0)"),
  bump_filter(wkt_writer(), dx = 2, dy = -6)
)
#> <wk_wkt[1]>
#> [1] POINT (2 -6)

While a filter might be cool, it also needs some R infrastructure to make it useful for a user that doesn’t care about readers, filters, and handlers. For example, the wk_identity() function uses some infrastructure to ensure that the output type matches the input type:

wk_identity(wkt("POINT (1 1)"))
#> <wk_wkt[1]>
#> [1] POINT (1 1)
wk_identity(as_wkb("POINT (1 1)"))
#> <wk_wkb[1]>
#> [1] <POINT (1 1)>

When writing an R wrapper around a filter, you will probably want to use the following pattern:

bump <- function(handleable, dx = 0, dy = 0, ...) {
  UseMethod("bump")
}

bump.default <- function(handleable, dx = 0, dy = 0, ...) {
  result <- wk_handle(handleable, bump_filter(wk_writer(handleable), dx, dy), ...)
  result <- wk_restore(handleable, result)
  wk_set_crs(result, wk_crs(handleable))
}

There’s a lot packed into that three-line default method. First, we use the wk_writer() generic to pick a handler that generates the object based on handleable. This is defined for most objects that can be used with wk_handler() to return the same type of object as the input. Second, we use wk_handle() to generate the result. Next, we call wk_restore(), which usually returns result but for data frames and sf objects attempts to replace the old geometry column with the transformed version. Finally, because we don’t change the CRS in our filter, we can reset the CRS of the result to match that of the input. This will make your filter work with many types of geometric objects!

library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
nc <- read_sf(system.file("shape/nc.shp", package = "sf"))
st_bbox(nc)
#>      xmin      ymin      xmax      ymax 
#> -84.32385  33.88199 -75.45698  36.58965
bump(nc, 1, 1)
#> Simple feature collection with 100 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -83.32385 ymin: 34.88199 xmax: -74.45698 ymax: 37.58965
#> Geodetic CRS:  NAD27
#> # A tibble: 100 × 15
#>     AREA PERIMETER CNTY_ CNTY_ID NAME  FIPS  FIPSNO CRESS_ID BIR74 SID74 NWBIR74
#>  * <dbl>     <dbl> <dbl>   <dbl> <chr> <chr>  <dbl>    <int> <dbl> <dbl>   <dbl>
#>  1 0.114      1.44  1825    1825 Ashe  37009  37009        5  1091     1      10
#>  2 0.061      1.23  1827    1827 Alle… 37005  37005        3   487     0      10
#>  3 0.143      1.63  1828    1828 Surry 37171  37171       86  3188     5     208
#>  4 0.07       2.97  1831    1831 Curr… 37053  37053       27   508     1     123
#>  5 0.153      2.21  1832    1832 Nort… 37131  37131       66  1421     9    1066
#>  6 0.097      1.67  1833    1833 Hert… 37091  37091       46  1452     7     954
#>  7 0.062      1.55  1834    1834 Camd… 37029  37029       15   286     0     115
#>  8 0.091      1.28  1835    1835 Gates 37073  37073       37   420     0     254
#>  9 0.118      1.42  1836    1836 Warr… 37185  37185       93   968     4     748
#> 10 0.124      1.43  1837    1837 Stok… 37169  37169       85  1612     1     160
#> # ℹ 90 more rows
#> # ℹ 4 more variables: BIR79 <dbl>, SID79 <dbl>, NWBIR79 <dbl>,
#> #   geometry <MULTIPOLYGON [°]>

The wrapper method should probably always be an S3 generic. This lets you define faster implementations for simple objects like xy() and/or rct() where the base-R implementation is much faster.

bump.wk_xy <- function(handleable, dx = 0, dy = 0) {
  data <- unclass(handleable)
  data$x <- data$x + dx
  data$y <- data$y + dy
  class(data) <- class(handleable)
  data
}

Transforms

The bump operation we just wrote is an example of a common type of filter: a coordinate transform. Transforms get used in all sorts of places in the spatial/geometry world and have their own type that makes it slightly less verbose to create them and easier for them to be used in other ways.

#include <R.h>
#include <Rinternals.h>
#include <stdlib.h>
#include "wk-v1.h"
#include "wk-v1-impl.c"

typedef struct {
  double dx;
  double dy;
} bump_trans_t;

int bump_trans_trans(R_xlen_t feature_id, const double* xyzm_in, double* xyzm_out, void* trans_data) {
  bump_trans_t* data = (bump_trans_t*) trans_data;
  xyzm_out[0] = xyzm_in[0] + data->dx;
  xyzm_out[1] = xyzm_in[1] + data->dy;
  return WK_CONTINUE;
}

void bump_trans_finalizer(void* trans_data) {
  free(trans_data);
}

SEXP bump_trans_new(SEXP xy) {
  double dx = REAL(xy)[0];
  double dy = REAL(xy)[1];
  
  wk_trans_t* trans = wk_trans_create();
  trans->trans = &bump_trans_trans;
  trans->finalizer = &bump_trans_finalizer;
  
  bump_trans_t* data = (bump_trans_t*) malloc(sizeof(bump_trans_t));
  if (data == NULL) {
    free(trans);
    Rf_error("Failed to alloc bump_trans_t*");
  }
  
  data->dx = dx;
  data->dy = dy;
  trans->trans_data = data;
  
  return wk_trans_create_xptr(trans, R_NilValue, R_NilValue);
}

We need some R infrastructure to run these just like the C handlers:

bump_trans <- function(dx = 0, dy = 0) {
  new_wk_trans(.Call("bump_trans_new", c(dx, dy)))
}

bump <- function(handleable, dx = 0, dy = 0) {
  wk_transform(handleable, bump_trans(dx, dy))
}

bump(wkt("POINT (0 0)"), 12, 13)
#> <wk_wkt[1]>
#> [1] POINT (12 13)

The rules are still evolving since this is a new feature but at the very least you should never throw exceptions or error in your transform function (instead return WK_ABORT and error in vector_end()).

Performance

The main issue with performance in wk handlers is that there are a lot of function calls. I haven’t found a place where this matters when writing C code, but when writing C++ some additional wrapper code is used to isolate the C and C++ frames so that exceptions aren’t thrown from C frames and longjmps generated by the R API don’t jump over C++ object deleters. A quick test suggests that for readers and filters written in C++ this overhead is about 1 second per 1 million handler method calls; for handlers this overhead is more like 1 second per 7 million handler method calls. For both, it’s likely that whatever you are doing in C++ will be the limiting factor in your handler, but for very simple filters/readers you might consider rewriting your handler in C for performance. Future versions of wk will hopefully eliminate this issue!

cpp_void_handler <- function() {
  new_wk_handler(cpp_void_handler_new(), "cpp_void_handler")
}

cpp_identity_filter <- function(handler = wk_void_handler()) {
  new_wk_handler(cpp_identity_filter_new(as_wk_handler(handler)), "cpp_identity_filter")
}

nc_wkb <- as_wkb(nc)

bench::mark(
  wk_handle(nc_wkb, wk_void_handler()),
  wk_handle(nc_wkb, cpp_void_handler()),
  wk_handle(nc_wkb, wk_identity_filter(wk_void_handler())),
  wk_handle(nc_wkb, cpp_identity_filter(wk_void_handler()))
)
#> # A tibble: 4 × 6
#>   expression                            min  median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                        <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl>
#> 1 wk_handle(nc_wkb, wk_void_handle…    44µs    46µs    20953.        0B     4.19
#> 2 wk_handle(nc_wkb, cpp_void_handl… 395.2µs 401.6µs     2468.        0B     0   
#> 3 wk_handle(nc_wkb, wk_identity_fi…  60.4µs  65.6µs    15078.        0B     6.21
#> 4 wk_handle(nc_wkb, cpp_identity_f… 517.6µs 524.2µs     1877.        0B     2.02