--- title: "Operators" author: "John Mount, Nina Zumel" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Operators} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` `cdata` recommends an operator idiom to apply data transforms. The idea is simple, yet powerful. First let's start with some data. ```{r} d <- wrapr::build_frame( "model_id" , "measure", "value" | 1 , "AUC" , 0.7 | 1 , "R2" , 0.4 | 2 , "AUC" , 0.8 | 2 , "R2" , 0.5 ) knitr::kable(d) ``` In the above data we have two measurements each for two individuals (individuals identified by the "`model_id`" column). Using `cdata`'s `rowrecs_to_blocks_spec()` method we can capture a description of this record structure and transformation details. ```{r} library("cdata") transform <- rowrecs_to_blocks_spec( wrapr::qchar_frame( "measure", "value" | "AUC" , AUC | "R2" , R2 ), recordKeys = "model_id") print(transform) ``` Once we have this specification we can transform the data using operator notation. We can collect the record blocks into rows by a "factor-out" (or aggregation/projection) step. ```{r} knitr::kable(d) d2 <- d %//% t(transform) knitr::kable(d2) # (or using general pipe notation # includng the .() "execute immediately") d %.>% .(t(transform)) %.>% knitr::kable(.) ``` We can expand record rows into blocks by a "multiplication" (or join) step. ```{r} knitr::kable(d2) d3 <- d2 %**% transform knitr::kable(d3) # (or using general pipe notation) d2 %.>% transform %.>% knitr::kable(.) ``` (`%//%` and `%**%` being two operators introduced by the `cdata` package.) And the two specialized operators have an inverse/adjoint relation. ```{r} knitr::kable(d) # identity d4 <- d %//% t(transform) %**% transform knitr::kable(d4) ``` We can also pipe into the spec (and into its adjoint) using the [`wrapr`](https://github.com/WinVector/wrapr) [`dot pipe operator`](https://winvector.github.io/wrapr/reference/dot_arrow.html). ```{r} # reverse or adjoint/transpose operation specification t_record_spec <- t(transform) d %.>% t_record_spec %.>% knitr::kable(.) # using dot-pipe's bquote style .() execute immediate notation d %.>% .(t(transform)) %.>% knitr::kable(.) # identity d %.>% .(t(transform)) %.>% transform %.>% knitr::kable(.) ``` And, of course, the exact same functionality for database tables. ```{r} have_db <- requireNamespace("DBI", quietly = TRUE) && requireNamespace("RSQLite", quietly = TRUE) ``` ```{r eval=have_db} raw_connection <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") RSQLite::initExtension(raw_connection) db <- rquery::rquery_db_info( connection = raw_connection, is_dbi = TRUE, connection_options = rquery::rq_connection_tests(raw_connection)) d_td <- rquery::rq_copy_to(db, "d", d) ``` ```{r eval=have_db} ops <- d_td %//% t(transform) cat(format(ops)) rquery::execute(db, ops) %.>% knitr::kable(.) d_td %.>% .(t(transform)) %.>% rquery::execute(db, .) %.>% knitr::kable(.) ``` ```{r eval=have_db} DBI::dbDisconnect(raw_connection) ```