Package: vtreat 1.6.5

John Mount

vtreat: A Statistically Sound 'data.frame' Processor/Conditioner

A 'data.frame' processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. 'vtreat' prepares variables so that data has fewer exceptional cases, making it easier to safely use models in production. Common problems 'vtreat' defends against: 'Inf', 'NA', too many categorical levels, rare categorical levels, and new categorical levels (levels seen during application, but not during training). Reference: "'vtreat': a data.frame Processor for Predictive Modeling", Zumel, Mount, 2016, <doi:10.5281/zenodo.1173313>.
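As a rough illustration of the problems listed above, the following minimal sketch designs an outcome-free treatment plan with `designTreatmentsZ` and applies it with `prepare` to data containing an `NA` and a categorical level not seen during training. The data frames `d_train` and `d_app` and their columns are invented for this example; the exact set of derived columns depends on the package defaults.

```r
library(vtreat)

# Toy training data (hypothetical): one numeric NA and one categorical NA.
d_train <- data.frame(
  x = c(1, 2, NA, 4, 5, 6),
  g = c("a", "a", "b", "b", NA, "a"),
  stringsAsFactors = FALSE
)

# Toy application data (hypothetical): "z" is a level never seen in training.
d_app <- data.frame(
  x = c(NA, 3),
  g = c("z", "a"),
  stringsAsFactors = FALSE
)

# Design a treatment plan without an outcome variable.
plan <- designTreatmentsZ(d_train, c("x", "g"), verbose = FALSE)

# Apply the plan to the new data.
treated <- prepare(plan, d_app)

# The treated frame should be entirely numeric and free of NA values.
print(treated)
```

When an outcome column is available, `designTreatmentsN` (numeric outcome) or `designTreatmentsC` (categorical outcome) would be the analogous starting points.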

Authors: John Mount [aut, cre], Nina Zumel [aut], Win-Vector LLC [cph]

vtreat_1.6.5.tar.gz
vtreat_1.6.5.zip (r-4.5), vtreat_1.6.5.zip (r-4.4), vtreat_1.6.5.zip (r-4.3)
vtreat_1.6.5.tgz (r-4.4-any), vtreat_1.6.5.tgz (r-4.3-any)
vtreat_1.6.5.tar.gz (r-4.5-noble), vtreat_1.6.5.tar.gz (r-4.4-noble)
vtreat_1.6.5.tgz (r-4.4-emscripten), vtreat_1.6.5.tgz (r-4.3-emscripten)
vtreat.pdf | vtreat.html
vtreat/json (API)
NEWS

# Install 'vtreat' in R:
install.packages('vtreat', repos = c('https://winvector.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/winvector/vtreat/issues

Topics: categorical-variables, machine-learning-algorithms, nested-models, prepare-data

10.75 score 283 stars 1 packages 328 scripts 2.1k downloads 3 mentions 53 exports 2 dependencies

Last updated 5 months ago from: 9e28ee2eae. Checks: OK: 7. Indexed: yes.

Target | Result | Date
Doc / Vignettes | OK | Nov 10 2024
R-4.5-win | OK | Nov 10 2024
R-4.5-linux | OK | Nov 10 2024
R-4.4-win | OK | Nov 10 2024
R-4.4-mac | OK | Nov 10 2024
R-4.3-win | OK | Nov 10 2024
R-4.3-mac | OK | Nov 10 2024

Exports: .wmean, apply_transform, as_rquery_plan, BinomialOutcomeTreatment, buildEvalSets, center_scale, classification_parameters, design_missingness_treatment, designTreatmentsC, designTreatmentsN, designTreatmentsZ, fit, fit_prepare, fit_transform, flatten_fn_list, get_feature_names, get_score_frame, get_transform, getSplitPlanAppLabels, kWayCrossValidation, kWayStratifiedY, kWayStratifiedYReplace, makekWayCrossValidationGroupedByColumn, materialize_treated, mkCrossFrameCExperiment, mkCrossFrameMExperiment, mkCrossFrameNExperiment, multinomial_parameters, MultinomialOutcomeTreatment, novel_value_summary, NumericOutcomeTreatment, oneWayHoldout, patch_columns_into_frame, pre_comp_xval, prepare, problemAppPlan, regression_parameters, rqdatatable_prepare, rquery_prepare, solve_piecewise, solve_piecewisec, spline_variable, spline_variablec, square_window, square_windowc, track_values, unsupervised_parameters, UnsupervisedTreatment, value_variables_C, value_variables_N, variable_values, vnames, vorig

Dependencies: digest, wrapr

Multi Class vtreat

Rendered from MultiClassVtreat.Rmd using knitr::rmarkdown on Nov 10 2024.

Last update: 2018-10-01
Started: 2018-07-15

Saving Treatment Plans

Rendered from SavingTreamentPlans.Rmd using knitr::rmarkdown on Nov 10 2024.

Last update: 2020-08-12
Started: 2017-01-05

Variable Types

Rendered from vtreatVariableTypes.Rmd using knitr::rmarkdown on Nov 10 2024.

Last update: 2020-08-12
Started: 2016-03-18

vtreat cross frames

Rendered from vtreatCrossFrames.Rmd using knitr::rmarkdown on Nov 10 2024.

Last update: 2020-08-12
Started: 2016-04-08

vtreat data splitting

Rendered from vtreatSplitting.Rmd using knitr::rmarkdown on Nov 10 2024.

Last update: 2020-08-12
Started: 2016-06-13

vtreat Formal Article

Rendered from vtreat_article.pdf.asis using R.rsp::asis on Nov 10 2024.

Last update: 2018-11-05
Started: 2018-11-05

vtreat grouping example

Rendered from vtreatGrouping.Rmd using knitr::rmarkdown on Nov 10 2024.

Last update: 2020-08-12
Started: 2016-06-15

vtreat overfit

Rendered from vtreatOverfit.Rmd using knitr::rmarkdown on Nov 10 2024.

Last update: 2020-08-12
Started: 2015-09-08

vtreat package

Rendered from vtreat.Rmd using knitr::rmarkdown on Nov 10 2024.

Last update: 2020-08-12
Started: 2015-01-20

vtreat Rare Levels

Rendered from vtreatRareLevels.Rmd using knitr::rmarkdown on Nov 10 2024.

Last update: 2019-03-31
Started: 2016-09-29

vtreat scale mode

Rendered from vtreatScaleMode.Rmd using knitr::rmarkdown on Nov 10 2024.

Last update: 2020-08-12
Started: 2016-04-18

vtreat significance

Rendered from vtreatSignificance.Rmd using knitr::rmarkdown on Nov 10 2024.

Last update: 2020-08-12
Started: 2016-05-07

vtreat Variable Importance

Rendered from VariableImportance.Rmd using knitr::rmarkdown on Nov 10 2024.

Last update: 2020-08-12
Started: 2018-12-18

Readme and manuals

Help Manual

Help page | Topics
vtreat: A Statistically Sound 'data.frame' Processor/Conditioner | vtreat-package vtreat
Transform second argument by first. | apply_transform
Convert vtreatment plans into a sequence of rquery operations. | as_rquery_plan
Stateful object for designing and applying binomial outcome treatments. | BinomialOutcomeTreatment
Build set carve-up for out-of-sample evaluation. | buildEvalSets
Center and scale a set of variables. | center_scale
vtreat classification parameters. | classification_parameters
Design a simple treatment plan to indicate missingness and perform simple imputation. | design_missingness_treatment
Build all treatments for a data frame to predict a categorical outcome. | designTreatmentsC
Build all treatments for a data frame to predict a numeric outcome. | designTreatmentsN
Design variable treatments with no outcome variable. | designTreatmentsZ
Fit first argument to data in second argument. | fit
Fit and prepare in a cross-validated manner. | fit_prepare
Fit and transform in a cross-validated manner. | fit_transform
Display treatment plan. | format.vtreatment
Return feasible feature names. | get_feature_names
Return score frame from vps. | get_score_frame
Return underlying transform from vps. | get_transform
Read application labels off a split plan. | getSplitPlanAppLabels
k-fold cross validation, a splitFunction in the sense of vtreat::buildEvalSets. | kWayCrossValidation
k-fold cross validation stratified on y, a splitFunction in the sense of vtreat::buildEvalSets. | kWayStratifiedY
k-fold cross validation stratified with replacement on y, a splitFunction in the sense of vtreat::buildEvalSets. | kWayStratifiedYReplace
Make a categorical input custom coder. | makeCustomCoderCat
Make a numeric input custom coder. | makeCustomCoderNum
Build a k-fold cross validation splitter, respecting (never splitting) groupingColumn. | makekWayCrossValidationGroupedByColumn
Run categorical cross-frame experiment. | mkCrossFrameCExperiment
Function to build multi-outcome vtreat cross frame and treatment plan. | mkCrossFrameMExperiment
Run a numeric cross-frame experiment. | mkCrossFrameNExperiment
vtreat multinomial parameters. | multinomial_parameters
Stateful object for designing and applying multinomial outcome treatments. | MultinomialOutcomeTreatment
Report new/novel appearances of character values. | novel_value_summary
Stateful object for designing and applying numeric outcome treatments. | NumericOutcomeTreatment
One way holdout, a splitFunction in the sense of vtreat::buildEvalSets. | oneWayHoldout
Patch columns into data.frame. | patch_columns_into_frame
Pre-computed cross-plan (so same split happens each time). | pre_comp_xval
Apply treatments and restrict to useful variables. | prepare
Function to apply mkCrossFrameMExperiment treatments. | prepare.multinomial_plan
Prepare a simple treatment. | prepare.simple_plan
Apply treatments and restrict to useful variables. | prepare.treatmentplan
Print treatmentplan. | print.multinomial_plan
Print treatmentplan. | print.simple_plan
Print treatmentplan. | print.treatmentplan
Print treatmentplan. | print.vtreatment
Check whether appPlan is a good carve-up of 1:nRows into nSplits groups. | problemAppPlan
vtreat regression parameters. | regression_parameters
Materialize a treated data frame remotely. | materialize_treated rquery_prepare
Solve as piecewise linear problem, numeric target. | solve_piecewise
Solve as piecewise logit problem, categorical target. | solve_piecewisec
Spline variable numeric target. | spline_variable
Spline variable categorical target. | spline_variablec
Build a square windows variable, numeric target. | square_window
Build a square windows variable, categorical target. | square_windowc
Track unique character values for variables. | track_values
vtreat unsupervised parameters. | unsupervised_parameters
Stateful object for designing and applying unsupervised treatments. | UnsupervisedTreatment
Value variables for predicting a categorical outcome. | value_variables_C
Value variables for predicting a numeric outcome. | value_variables_N
Return variable evaluations. | variable_values
New treated variable names from a treatmentplan$treatment item. | vnames
Original variable name from a treatmentplan$treatment item. | vorig
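
Several of the help topics above describe split functions and split plans. The following minimal sketch, assuming a toy problem of 10 rows and 3 folds (both numbers chosen only for illustration), builds a split plan with `kWayCrossValidation` and checks it with `problemAppPlan`. It relies on each plan element carrying `train` and `app` index vectors.

```r
library(vtreat)

set.seed(2024)  # the split is randomized; the seed is arbitrary

n_rows <- 10    # hypothetical number of data rows
n_splits <- 3   # hypothetical number of folds

# Build a k-fold split plan; dframe and y are not needed for this splitter.
split_plan <- kWayCrossValidation(n_rows, n_splits, NULL, NULL)

# Collect the application (held-out) row indices across all folds.
app_rows <- sort(unlist(lapply(split_plan, function(s) s$app)))

# problemAppPlan returns NULL when the plan is a valid carve-up.
problem <- problemAppPlan(n_rows, n_splits, split_plan, TRUE)

print(app_rows)
print(is.null(problem))
```

A function with the same argument list can be passed as the `splitFunction` argument of the design and cross-frame functions when a custom splitting scheme is needed.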