Fit Network-assisted Random Forest+ (NeRF+)
Fit Network-assisted Random Forest+ (NeRF+)
Usage
nerfplus(
  x,
  y,
  A = NULL,
  nodeids = NULL,
  family = c("linear", "logistic"),
  include_raw = TRUE,
  include_netcoh = TRUE,
  embedding = NULL,
  embedding_options = list(ndim = 2, regularization = 0.5, varimax = FALSE,
    center = TRUE, scale = TRUE),
  standardize_x = TRUE,
  normalize_stump = FALSE,
  sample_split = c("none", "oob", "inbag"),
  ntrees = 500,
  mtry = NULL,
  lambda_netcoh,
  lambda_embed = lambda_raw,
  lambda_raw = lambda_stump,
  lambda_stump,
  lambda_l = 0.05,
  parallel = FALSE,
  num.threads = 1,
  ...
)
Arguments
- x
A numeric matrix or data frame of predictors (features); size n x p. Should be centered so that each column has mean 0.
- y
A numeric vector of responses of length n. Should be centered so that the mean is 0.
- A
An adjacency matrix representing the network structure.
- nodeids
(Optional) vector of node IDs of length n. If provided, node IDs indicate the rows of A, corresponding to each sample. If not provided, the rows of A are assumed to be in the same order as the rows of x and y.
- family
A character string indicating the type of model to fit. Currently, only "linear" and "logistic" are supported.
- include_raw
Logical indicating whether to include the raw covariates in the NeRF+ model. Default is TRUE.
- include_netcoh
Logical indicating whether to include the individual node effects and network cohesion regularization in the NeRF+ model. Default is TRUE.
- embedding
Embedding type(s): one or more of "adjacency", "laplacian", "score"; or NULL (i.e., do not include any network embedding features). Alternatively, an n x d matrix of network embedding features corresponding to x can be supplied directly.
- embedding_options
A list of options for the network embedding. Ignored if embedding = NULL. If provided, the list should contain the following components:
  - ndim: Number of dimensions in the embedding (default is 2).
  - regularization: Regularization parameter for the adjacency matrix (default is 0.5).
  - varimax: Whether to apply varimax rotation to the embedding (default is FALSE).
  - center: Whether to center the embedding so that each column has mean 0 (default is TRUE).
  - scale: Whether to scale the embedding so that the first embedding component column has SD 1 (default is TRUE). All other embedding components are scaled proportionally to their eigenvalues.
- standardize_x
Logical indicating whether to standardize the covariates so that each column has mean 0 and SD 1. Default is TRUE.
- normalize_stump
Logical indicating whether to normalize the decision stump features by the number of samples in the children nodes. Default is FALSE.
- sample_split
Character string indicating how to split the samples for training the model; one of "none" (default), "oob", or "inbag". If "none", all samples are used for estimating coefficients in NeRF+. If "oob", only out-of-bag samples are used for estimating coefficients in NeRF+. If "inbag", only in-bag samples are used for estimating coefficients in NeRF+.
- ntrees
Number of trees in ensemble.
- mtry
Number of features to consider at each split. Default is the number of features / 3 for regression and the square root of the number of features for classification.
- lambda_netcoh
Regularization parameter for the network cohesion term. Can be either a scalar or a vector of length ntrees, specifying the regularization parameter for each tree. Ignored if include_netcoh = FALSE.
- lambda_embed
Regularization parameter for the network embedding features. Default is the same as lambda_raw. Can be either a scalar or a vector of length ntrees, specifying the regularization parameter for each tree. Ignored if embedding = NULL.
- lambda_raw
Regularization parameter for the raw covariates. Default is the same as lambda_stump. Can be either a scalar or a vector of length ntrees, specifying the regularization parameter for each tree. Ignored if include_raw = FALSE.
- lambda_stump
Regularization parameter for the decision stump features. Can be either a scalar or a vector of length ntrees, specifying the regularization parameter for each tree.
- lambda_l
(Optional) Regularization parameter for the graph Laplacian. Default is 0.05. Can be either a scalar or a vector of length ntrees, specifying the regularization parameter for each tree.
- parallel
Logical indicating whether to use parallel processing.
- num.threads
Number of threads to use for parallel processing. Default is 1. Ignored if parallel = FALSE.
- ...
Additional arguments passed to the ranger::ranger() function for fitting the random forest model.
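The argument descriptions above mention centered inputs, an optional nodeids mapping, per-tree regularization vectors, and chained lambda defaults. The following base R sketch illustrates those conventions on toy data. All object names and values are made up for illustration, and resolve_lambdas() is a hypothetical stand-in that copies only the default-argument chain from the Usage signature; none of this is taken from the package internals.

```r
# Toy inputs; every name and value below is illustrative only.
set.seed(1)
n <- 6
p <- 3
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- rnorm(n)

# Center each column of x and the response y to mean 0.
x_centered <- scale(x, center = TRUE, scale = FALSE)
y_centered <- y - mean(y)

# Symmetric 0/1 adjacency matrix on n nodes (no self-loops).
A <- matrix(0, n, n)
A[upper.tri(A)] <- rbinom(n * (n - 1) / 2, size = 1, prob = 0.5)
A <- A + t(A)

# If the rows of A are not in the same order as the rows of x and y,
# nodeids[i] gives the row of A that belongs to sample i.
nodeids <- c(3, 1, 6, 5, 2, 4)
A_in_sample_order <- A[nodeids, nodeids]  # shown only to illustrate the mapping

# A per-tree regularization parameter is a vector of length ntrees.
ntrees <- 500
lambda_stump_per_tree <- rep(c(1, 3), length.out = ntrees)

# Hypothetical stand-in that copies only the default-argument chain from
# the Usage signature; it is not part of the package.
resolve_lambdas <- function(lambda_embed = lambda_raw,
                            lambda_raw = lambda_stump,
                            lambda_stump) {
  c(embed = lambda_embed, raw = lambda_raw, stump = lambda_stump)
}

resolve_lambdas(lambda_stump = 3)                  # embed, raw, and stump are all 3
resolve_lambdas(lambda_raw = 2, lambda_stump = 3)  # embed and raw are 2, stump is 3
```

Because R evaluates default arguments lazily, supplying only lambda_stump is enough for lambda_raw and lambda_embed to inherit its value, while lambda_netcoh and lambda_stump have no default in the signature and must be given explicitly whenever they are used.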
Value
A list containing the following:
- rf_fit: The fitted random forest model object from ranger::ranger().
- nerfplus_fits: A list of fitted NeRF+ models for each tree in the random forest. Each element of the list is a fitted model object that can be used to make predictions.
- tree_infos: A list of tree information objects for each tree in the random forest.
- pre_rf_preprocessing_info: A list containing preprocessing information for the NeRF+ model; output of fit_pre_rf_preprocessing().
- regularization_params: A list containing the regularization parameters used in the NeRF+ model.
- model_info: A list containing information about the model, such as family, include_raw, include_netcoh, normalize_stump, and sample_split.
- unordered_factors: A character vector of variable names that are unordered factors.
Examples
data(example_data)
nerfplus_out <- nerfplus(
  x = example_data$x, y = example_data$y, A = example_data$A,
  lambda_netcoh = 1,
  lambda_embed = 0.1,
  lambda_raw = 2,
  lambda_stump = 3,
  family = "linear", embedding = "laplacian", sample_split = "none"
)