Fit Network-assisted Random Forest+ (NeRF+) with Cross-Validation
Usage
nerfplus_cv(
  x,
  y,
  A = NULL,
  nodeids = NULL,
  cv = 5,
  cv_foldids = NULL,
  family = c("linear", "logistic"),
  include_raw = TRUE,
  include_netcoh = TRUE,
  embedding = NULL,
  embedding_options = list(ndim = 2, regularization = 0.5, varimax = FALSE,
    center = TRUE, scale = TRUE),
  standardize_x = TRUE,
  normalize_stump = FALSE,
  sample_split = c("none", "oob", "inbag"),
  ntrees = 500,
  ntrees_cv = ntrees,
  mtry = NULL,
  lambdas_netcoh,
  lambdas_embed = NULL,
  lambdas_raw = NULL,
  lambdas_stump,
  lambdas_l = 0.05,
  parallel = FALSE,
  num.threads = 1,
  ...
)
Arguments
- x
A numeric matrix or data frame of predictors (features); size n x p. Should be centered so that each column has mean 0.
- y
A numeric vector of responses of length n. Should be centered so that the mean is 0.
- A
An adjacency matrix representing the network structure.
- nodeids
(Optional) vector of node IDs of length n. If provided, node IDs indicate the rows of A, corresponding to each sample. If not provided, the rows of A are assumed to be in the same order as the rows of x and y.
- cv
Number of cross-validation folds. Default is 5.
- cv_foldids
(Optional) List of length cv, where each component in the list is a vector of sample indices in that fold. If NULL (default), cross-validation folds will be created randomly.
- family
A character string indicating the type of model to fit. Currently, only "linear" and "logistic" are supported.
- include_raw
Logical indicating whether to include the raw covariates in the NeRF+ model. Default is TRUE.
- include_netcoh
Logical indicating whether to include the individual node effects and network cohesion regularization in the NeRF+ model. Default is TRUE.
- embedding
Embedding type(s), at least one of "adjacency", "laplacian", "score", or NULL (i.e., do not include any network embedding features). Alternatively, an n x d matrix of network embedding features corresponding to x can be supplied directly.
- embedding_options
A list of options for the network embedding. Ignored if embedding = NULL. If provided, the list should contain the following components:
  - ndim: Number of dimensions in the embedding (default is 2).
  - regularization: Regularization parameter for the adjacency matrix (default is 0.5).
  - varimax: Whether to apply varimax rotation to the embedding (default is FALSE).
  - center: Whether to center the embedding so that each column has mean 0 (default is TRUE).
  - scale: Whether to scale the embedding so that the first embedding component column has SD 1 (default is TRUE). All other embedding components are scaled proportionally to their eigenvalues.
- standardize_x
Logical indicating whether to standardize the covariates so that each column has mean 0 and SD 1. Default is TRUE.
- normalize_stump
Logical indicating whether to normalize the decision stump features by number of samples in children nodes. Default is FALSE.
- sample_split
Character string indicating how to split the samples for training the model; one of "none" (default), "oob", or "inbag". If "none", all samples are used for estimating coefficients in NeRF+. If "oob", only out-of-bag samples are used for estimating coefficients in NeRF+. If "inbag", only in-bag samples are used for estimating coefficients in NeRF+.
- ntrees
Number of trees in ensemble.
- ntrees_cv
Number of trees that will be tuned using cross-validation. Default is ntrees (i.e., every tree will be tuned). Reduce this number to speed up the cross-validation process. For all trees that aren't tuned, the hyperparameter will be chosen randomly from the tuned trees.
- mtry
Number of features to consider at each split. Default is the number of features / 3 for regression and the square root of the number of features for classification.
- lambdas_netcoh
Vector of regularization parameters for the network cohesion term.
- lambdas_embed
Vector of regularization parameters for the network embedding features. If NULL, the regularization parameter corresponding to the network embedding features will be equal to the regularization parameter for the raw covariates.
- lambdas_raw
Vector of regularization parameters for the raw covariate features. If NULL, the regularization parameter for the raw covariates will be equal to the regularization parameter for the decision stump features.
- lambdas_stump
Vector of regularization parameters for the decision stump features.
- lambdas_l
Vector of regularization parameters for the graph Laplacian.
- parallel
Logical indicating whether to use parallel processing.
- num.threads
Number of threads to use for parallel processing. Default is 1. Ignored if parallel = FALSE.
- ...
Additional arguments passed to the ranger::ranger() function for fitting the random forest model.
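Several of the arguments above expect inputs in a particular shape (centered x and y, an adjacency matrix A aligned with the rows of x, and cv_foldids as a list of index vectors). The following is a minimal base-R sketch of how such inputs might be prepared. The simulated data, the undirected 0/1 network, and the lambda grid values are illustrative assumptions, not recommendations from the package.

```r
set.seed(1)
n <- 60   # number of samples (nodes)
p <- 4    # number of predictors
cv <- 5   # number of cross-validation folds

# Simulated predictors and response, centered as the argument descriptions ask
x <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)
y <- rnorm(n)
y <- y - mean(y)

# Illustrative undirected network: symmetric 0/1 adjacency matrix, no self-loops
A <- matrix(0, n, n)
upper <- upper.tri(A)
A[upper] <- rbinom(sum(upper), 1, 0.1)
A <- A + t(A)

# cv_foldids: a list of length cv; each element holds the sample indices of one fold
cv_foldids <- unname(split(sample(n), rep(seq_len(cv), length.out = n)))

# Candidate regularization grids for the two required lambda arguments
# (placeholder values chosen only for illustration)
lambdas_netcoh <- c(0.01, 0.1, 1)
lambdas_stump <- c(0.01, 0.1, 1)
```

Because the rows of A here are in the same order as the rows of x and y, nodeids can be left as NULL.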
Value
A list containing the following:
- rf_fit: The fitted random forest model object from ranger::ranger().
- nerfplus_fits: A list of fitted NeRF+ models for each tree in the random forest using the tuned hyperparameters. Each element of the list is a fitted model object that can be used to make predictions.
- cv_losses: A list of ntrees_cv data frames containing the cross-validation losses for each tree and each fold. Each item in the list corresponds to a tree in the random forest. Each row in the data frame corresponds to a different set of hyperparameters.
- best_cv_params: A data frame containing the used hyperparameters for each tree in the random forest.
- tree_infos: A list of tree information objects for each tree in the random forest.
- pre_rf_preprocessing_info: A list containing preprocessing information for the NeRF+ model; output of fit_pre_rf_preprocessing().
- regularization_params: A list containing the regularization parameters used in the NeRF+ model.
- model_info: A list containing information about the model, such as family, include_raw, include_netcoh, normalize_stump, and sample_split.
- unordered_factors: A character vector of variable names that are unordered factors.
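As a rough illustration of how the returned list might be inspected, the sketch below fits a small model on simulated data and looks at a few of the documented components. It assumes the function is exported from a package named nerfplus (an assumption; adjust to the actual package name) and skips the fit if that package is not installed. The data, tree counts, and lambda grids are placeholders.

```r
set.seed(2)
n <- 40
p <- 3

# Simulated, centered inputs (illustrative only)
x <- matrix(rnorm(n * p), n, p)
x <- sweep(x, 2, colMeans(x))
y <- x[, 1] + rnorm(n)
y <- y - mean(y)

# Illustrative symmetric 0/1 adjacency matrix with zero diagonal
A <- matrix(0, n, n)
A[upper.tri(A)] <- rbinom(n * (n - 1) / 2, 1, 0.1)
A <- A + t(A)

fit <- NULL
# Assumption: the package is called "nerfplus"; the block is skipped otherwise
if (requireNamespace("nerfplus", quietly = TRUE)) {
  fit <- nerfplus::nerfplus_cv(
    x = x, y = y, A = A,
    family = "linear",
    ntrees = 20,      # small ensemble to keep the example fast
    ntrees_cv = 5,    # tune only 5 trees via cross-validation
    lambdas_netcoh = c(0.1, 1),
    lambdas_stump = c(0.1, 1)
  )

  print(names(fit))                 # components listed under Value
  print(length(fit$nerfplus_fits))  # one fitted NeRF+ model per tree
  print(length(fit$cv_losses))      # documented as ntrees_cv data frames
  print(head(fit$best_cv_params))   # hyperparameters used for each tree
}
```

Setting ntrees_cv below ntrees, as here, trades some tuning precision for speed, in line with the ntrees_cv description.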