Title: | 'Rcpp' Dynamic Programming |
---|---|
Description: | Dynamic Programming implemented in 'Rcpp'. Includes example partition and out of sample fitting applications. Also supplies additional custom coders for the 'vtreat' package. |
Authors: | John Mount [aut, cre], Nina Zumel [aut], Win-Vector LLC [cph] |
Maintainer: | John Mount <[email protected]> |
License: | GPL-2 | GPL-3 |
Version: | 0.2.1 |
Built: | 2025-02-10 03:58:57 UTC |
Source: | https://github.com/winvector/rcppdynprog |
Rcpp dynamic programming solutions for partitioning and machine learning problems. Includes out of sample fitting applications. Also supplies additional custom coders for the vtreat package. Please see https://github.com/WinVector/RcppDynProg for details.
John Mount
Useful links:
Report bugs at https://github.com/WinVector/RcppDynProg/issues
Built matrix of total out of sample interval square error costs for held-out means. One indexed.
const_costs(y, w, min_seg, indices)
const_costs(y, w, min_seg, indices)
y |
NumericVector, values to group in order. |
w |
NumericVector, weights. |
min_seg |
positive integer, minimum segment size (>=1). |
indices |
IntegerVector, order list of indices to pair. |
xcosts NumericMatix, for j>=i xcosts(i,j) is the cost of partition element [i,...,j] (inclusive).
const_costs(c(1, 1, 2, 2), c(1, 1, 1, 1), 1, 1:4)
const_costs(c(1, 1, 2, 2), c(1, 1, 1, 1), 1, 1:4)
Built matrix of interval logistic costs for held-out means. One indexed.
const_costs_logistic(y, w, min_seg, indices)
const_costs_logistic(y, w, min_seg, indices)
y |
NumericVector, 0/1 values to group in order (should be in interval [0,1]). |
w |
NumericVector, weights (should be positive). |
min_seg |
positive integer, minimum segment size (>=1). |
indices |
IntegerVector, order list of indices to pair. |
xcosts NumericMatix, for j>=i xcosts(i,j) is the cost of partition element [i,...,j] (inclusive).
const_costs_logistic(c(0.1, 0.1, 0.2, 0.2), c(1, 1, 1, 1), 1, 1:4)
const_costs_logistic(c(0.1, 0.1, 0.2, 0.2), c(1, 1, 1, 1), 1, 1:4)
Built matrix of interval costs for held-out linear models. One indexed.
lin_costs(x, y, w, min_seg, indices)
lin_costs(x, y, w, min_seg, indices)
x |
NumericVector, x-coords of values to group. |
y |
NumericVector, values to group in order. |
w |
NumericVector, weights. |
min_seg |
positive integer, minimum segment size (>=1). |
indices |
IntegerVector, ordered list of indices to pair. |
xcosts NumericMatix, for j>=i xcosts(i,j) is the cost of partition element [i,...,j] (inclusive).
lin_costs(c(1, 2, 3, 4), c(1, 2, 2, 1), c(1, 1, 1, 1), 1, 1:4)
lin_costs(c(1, 2, 3, 4), c(1, 2, 2, 1), c(1, 1, 1, 1), 1, 1:4)
Built matrix of interval deviance costs for held-out logistic models. Fits are evaluated in-sample. One indexed.
lin_costs_logistic(x, y, w, min_seg, indices)
lin_costs_logistic(x, y, w, min_seg, indices)
x |
NumericVector, x-coords of values to group. |
y |
NumericVector, values to group in order (should be in interval [0,1]). |
w |
NumericVector, weights (should be positive). |
min_seg |
positive integer, minimum segment size (>=1). |
indices |
IntegerVector, ordered list of indices to pair. |
xcosts NumericMatix, for j>=i xcosts(i,j) is the cost of partition element [i,...,j] (inclusive).
lin_costs_logistic(c(1, 2, 3, 4, 5, 6, 7), c(0, 0, 1, 0, 1, 1, 0), c(1, 1, 1, 1, 1, 1, 1), 3, 1:7)
lin_costs_logistic(c(1, 2, 3, 4, 5, 6, 7), c(0, 0, 1, 0, 1, 1, 0), c(1, 1, 1, 1, 1, 1, 1), 3, 1:7)
vtreat
custom coder based on RcppDynProg::solve_for_partition()
.
piecewise_constant(varName, x, y, w = NULL)
piecewise_constant(varName, x, y, w = NULL)
varName |
character, name of variable to work on. |
x |
numeric, input values. |
y |
numeric, values to estimate. |
w |
numeric, weights. |
piecewise_constant("x", 1:8, c(-1, -1, -1, -1, 1, 1, 1, 1))
piecewise_constant("x", 1:8, c(-1, -1, -1, -1, 1, 1, 1, 1))
Build a piecewise constant fit coder with some parameters bound in.
piecewise_constant_coder( penalty = 1, min_n_to_chunk = 1000, min_seg = 10, max_k = 1000 )
piecewise_constant_coder( penalty = 1, min_n_to_chunk = 1000, min_seg = 10, max_k = 1000 )
penalty |
per-segment cost penalty. |
min_n_to_chunk |
minimum n to subdivied problem. |
min_seg |
positive integer, minimum segment size. |
max_k |
maximum segments to divide into. |
a vtreat coder
coder <- piecewise_constant_coder(min_seg = 1) coder("x", 1:8, c(-1, -1, -1, -1, 1, 1, 1, 1))
coder <- piecewise_constant_coder(min_seg = 1) coder("x", 1:8, c(-1, -1, -1, -1, 1, 1, 1, 1))
vtreat
custom coder based on RcppDynProg::solve_for_partition()
.
piecewise_linear(varName, x, y, w = NULL)
piecewise_linear(varName, x, y, w = NULL)
varName |
character, name of variable to work on. |
x |
numeric, input values. |
y |
numeric, values to estimate. |
w |
numeric, weights. |
piecewise_linear("x", 1:8, c(1, 2, 3, 4, 4, 3, 2, 1))
piecewise_linear("x", 1:8, c(1, 2, 3, 4, 4, 3, 2, 1))
Build a piecewise linear fit coder with some parameters bound in.
piecewise_linear_coder( penalty = 1, min_n_to_chunk = 1000, min_seg = 10, max_k = 1000 )
piecewise_linear_coder( penalty = 1, min_n_to_chunk = 1000, min_seg = 10, max_k = 1000 )
penalty |
per-segment cost penalty. |
min_n_to_chunk |
minimum n to subdivied problem. |
min_seg |
positive integer, minimum segment size. |
max_k |
maximum segments to divide into. |
a vtreat coder
coder <- piecewise_linear_coder(min_seg = 1) coder("x", 1:8, c(1, 2, 3, 4, 4, 3, 2, 1))
coder <- piecewise_linear_coder(min_seg = 1) coder("x", 1:8, c(1, 2, 3, 4, 4, 3, 2, 1))
compute the price of a partition solution (and check is valid).
score_solution(x, solution)
score_solution(x, solution)
x |
NumericMatix, for j>=i x(i,j) is the cost of partition element [i,...,j] (inclusive). |
solution |
vector of indices |
price
x <- matrix(c(1,1,5,1,1,0,5,0,1), nrow=3) s <- c(1, 2, 4) score_solution(x, s)
x <- matrix(c(1,1,5,1,1,0,5,0,1), nrow=3) s <- c(1, 2, 4) score_solution(x, s)
Solve for a good set of right-exclusive x-cuts such that the
overall graph of y~x is well-approximated by a piecewise linear
function. Solution is a ready for use with
with base::findInterval()
and stats::approx()
(demonstrated in the examples).
solve_for_partition( x, y, ..., w = NULL, penalty = 0, min_n_to_chunk = 1000, min_seg = 1, max_k = length(x) )
solve_for_partition( x, y, ..., w = NULL, penalty = 0, min_n_to_chunk = 1000, min_seg = 1, max_k = length(x) )
x |
numeric, input variable (no NAs). |
y |
numeric, result variable (no NAs, same length as x). |
... |
not used, force later arguments by name. |
w |
numeric, weights (no NAs, positive, same length as x). |
penalty |
per-segment cost penalty. |
min_n_to_chunk |
minimum n to subdivied problem. |
min_seg |
positive integer, minimum segment size. |
max_k |
maximum segments to divide into. |
a data frame appropriate for stats::approx().
# example data d <- data.frame( x = 1:8, y = c(1, 2, 3, 4, 4, 3, 2, 1)) # solve for break points soln <- solve_for_partition(d$x, d$y) # show solution print(soln) # label each point d$group <- base::findInterval( d$x, soln$x[soln$what=='left']) # apply piecewise approximation d$estimate <- stats::approx( soln$x, soln$pred, xout = d$x, method = 'linear', rule = 2)$y # show result print(d)
# example data d <- data.frame( x = 1:8, y = c(1, 2, 3, 4, 4, 3, 2, 1)) # solve for break points soln <- solve_for_partition(d$x, d$y) # show solution print(soln) # label each point d$group <- base::findInterval( d$x, soln$x[soln$what=='left']) # apply piecewise approximation d$estimate <- stats::approx( soln$x, soln$pred, xout = d$x, method = 'linear', rule = 2)$y # show result print(d)
Solve for a good set of right-exclusive x-cuts such that the
overall graph of y~x is well-approximated by a piecewise linear
function. Solution is a ready for use with
with base::findInterval()
and stats::approx()
(demonstrated in the examples).
solve_for_partitionc( x, y, ..., w = NULL, penalty = 0, min_n_to_chunk = 1000, min_seg = 1, max_k = length(x) )
solve_for_partitionc( x, y, ..., w = NULL, penalty = 0, min_n_to_chunk = 1000, min_seg = 1, max_k = length(x) )
x |
numeric, input variable (no NAs). |
y |
numeric, result variable (no NAs, same length as x). |
... |
not used, force later arguments by name. |
w |
numeric, weights (no NAs, positive, same length as x). |
penalty |
per-segment cost penalty. |
min_n_to_chunk |
minimum n to subdivied problem. |
min_seg |
positive integer, minimum segment size. |
max_k |
maximum segments to divide into. |
a data frame appropriate for stats::approx().
# example data d <- data.frame( x = 1:8, y = c(-1, -1, -1, -1, 1, 1, 1, 1)) # solve for break points soln <- solve_for_partitionc(d$x, d$y) # show solution print(soln) # label each point d$group <- base::findInterval( d$x, soln$x[soln$what=='left']) # apply piecewise approximation d$estimate <- stats::approx( soln$x, soln$pred, xout = d$x, method = 'constant', rule = 2)$y # show result print(d)
# example data d <- data.frame( x = 1:8, y = c(-1, -1, -1, -1, 1, 1, 1, 1)) # solve for break points soln <- solve_for_partitionc(d$x, d$y) # show solution print(soln) # label each point d$group <- base::findInterval( d$x, soln$x[soln$what=='left']) # apply piecewise approximation d$estimate <- stats::approx( soln$x, soln$pred, xout = d$x, method = 'constant', rule = 2)$y # show result print(d)
Solve a for a minimal cost partition of the integers [1,...,nrow(x)] problem where for j>=i x(i,j). is the cost of choosing the partition element [i,...,j]. Returned solution is an ordered vector v of length k<=kmax where: v[1]==1, v[k]==nrow(x)+1, and the partition is of the form [v[i], v[i+1]) (intervals open on the right).
solve_interval_partition(x, kmax)
solve_interval_partition(x, kmax)
x |
square NumericMatix, for j>=i x(i,j) is the cost of partition element [i,...,j] (inclusive). |
kmax |
int, maximum number of segments in solution. |
dynamic program solution.
costs <- matrix(c(1.5, NA ,NA ,1 ,0 , NA, 5, -1, 1), nrow = 3) solve_interval_partition(costs, nrow(costs))
costs <- matrix(c(1.5, NA ,NA ,1 ,0 , NA, 5, -1, 1), nrow = 3) solve_interval_partition(costs, nrow(costs))
Solve a for a minimal cost partition of the integers [1,...,nrow(x)] problem where for j>=i x(i,j). is the cost of choosing the partition element [i,...,j]. Returned solution is an ordered vector v of length k<=kmax where: v[1]==1, v[k]==nrow(x)+1, and the partition is of the form [v[i], v[i+1]) (intervals open on the right).
solve_interval_partition_k(x, kmax)
solve_interval_partition_k(x, kmax)
x |
square NumericMatix, for j>=i x(i,j) is the cost of partition element [i,...,j] (inclusive). |
kmax |
int, maximum number of segments in solution. |
dynamic program solution.
costs <- matrix(c(1.5, NA ,NA ,1 ,0 , NA, 5, -1, 1), nrow = 3) solve_interval_partition(costs, nrow(costs))
costs <- matrix(c(1.5, NA ,NA ,1 ,0 , NA, 5, -1, 1), nrow = 3) solve_interval_partition(costs, nrow(costs))
Not working yet.
solve_interval_partition_no_k(x)
solve_interval_partition_no_k(x)
x |
square NumericMatix, for j>=i x(i,j) is the cost of partition element [i,...,j] (inclusive). |
Solve a for a minimal cost partition of the integers [1,...,nrow(x)] problem where for j>=i x(i,j). is the cost of choosing the partition element [i,...,j]. Returned solution is an ordered vector v of length k where: v[1]==1, v[k]==nrow(x)+1, and the partition is of the form [v[i], v[i+1]) (intervals open on the right).
dynamic program solution.
costs <- matrix(c(1.5, NA ,NA ,1 ,0 , NA, 5, -1, 1), nrow = 3) solve_interval_partition(costs, nrow(costs))
costs <- matrix(c(1.5, NA ,NA ,1 ,0 , NA, 5, -1, 1), nrow = 3) solve_interval_partition(costs, nrow(costs))