• About
  • Documentation

  • More Universes
  • Recent Updates
  • Leader board

  • All repositories
  • All packages
  • All articles
  • All datasets
  • All system Libraries
winvector
  • Builds
  • Packages
  • Articles
  • Datasets
  • Contribution
  • Badges
  • API
  • Feed

Links towinvector

wrapr - Wrap R Tools for Debugging and Parametric Programming

Tools for writing and debugging R code. Provides: '%.>%' dot-pipe (an 'S3' configurable pipe), unpack/to (R style multiple assignment/return), 'build_frame()'/'draw_frame()' ('data.frame' example tools), 'qc()' (quoting concatenate), ':=' (named map builder), 'let()' (converts non-standard evaluation interfaces to parametric standard evaluation interfaces, inspired by 'gtools::strmacro()' and 'base::bquote()'), and more.

Last updated

11.34 score 139 stars 13 dependents 412 scripts 5.8k downloads

vtreat - A Statistically Sound 'data.frame' Processor/Conditioner

A 'data.frame' processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. 'vtreat' prepares variables so that data has fewer exceptional cases, making it easier to safely use models in production. Common problems 'vtreat' defends against: 'Inf', 'NA', too many categorical levels, rare categorical levels, and new categorical levels (levels seen during application, but not during training). Reference: "'vtreat': a data.frame Processor for Predictive Modeling", Zumel, Mount, 2016, <DOI:10.5281/zenodo.1173313>.

Last updated

categorical-variablesmachine-learning-algorithmsnested-modelsprepare-data

11.10 score 285 stars 1 dependents 350 scripts 4.3k downloads

rquery - Relational Query Generator for Data Manipulation at Scale

A piped query generator based on Edgar F. Codd's relational algebra, and on production experience using 'SQL' and 'dplyr' at big data scale. The design represents an attempt to make 'SQL' more teachable by denoting composition by a sequential pipeline notation instead of nested queries or functions. The implementation delivers reliable high performance data processing on large data systems such as 'Spark', databases, and 'data.table'. Package features include: data processing trees or pipelines as observable objects (able to report both columns produced and columns used), optimized 'SQL' generation as an explicit user visible table modeling step, plus explicit query reasoning and checking.

Last updated

9.59 score 110 stars 3 dependents 127 scripts 3.9k downloads

rqdatatable - 'rquery' for 'data.table'

Implements the 'rquery' piped Codd-style query algebra using 'data.table'. This allows for a high-speed in memory implementation of Codd-style data manipulation tools.

Last updated

8.26 score 38 stars 2 dependents 127 scripts 4.2k downloads

WVPlots - Common Plots for Analysis

Select data analysis plots, under a standardized calling interface implemented on top of 'ggplot2' and 'plotly'. Plots of interest include: 'ROC', gain curve, scatter plot with marginal distributions, conditioned scatter plot with marginal densities, box and stem with matching theoretical distribution, and density with matching theoretical distribution.

Last updated

7.72 score 84 stars 313 scripts 930 downloads

cdata - Fluid Data Transformations

Supplies higher-order coordinatized data specification and fluid transform operators that include pivot and anti-pivot as special cases. The methodology is describe in 'Zumel', 2018, "Fluid data reshaping with 'cdata'", <https://winvector.github.io/FluidData/FluidDataReshapingWithCdata.html> , <DOI:10.5281/zenodo.1173299> . This package introduces the idea of explicit control table specification of data transforms. Works on in-memory data or on remote data using 'rquery' and 'SQL' database interfaces.

Last updated

7.62 score 44 stars 1 dependents 84 scripts 1.1k downloads

sigr - Succinct and Correct Statistical Summaries for Reports

Succinctly and correctly format statistical summaries of various models and tests (F-test, Chi-Sq-test, Fisher-test, T-test, and rank-significance). This package also includes empirical tests, such as Monte Carlo and bootstrap distribution estimates.

Last updated

6.95 score 28 stars 1 dependents 106 scripts 498 downloads

RcppDynProg - 'Rcpp' Dynamic Programming

Dynamic Programming implemented in 'Rcpp'. Includes example partition and out of sample fitting applications. Also supplies additional custom coders for the 'vtreat' package.

Last updated

datasciencemachinelearningcpp

5.61 score 15 stars 18 scripts 263 downloads