kfoldPredict

Predict responses for observations in cross-validated quantile regression model

Since R2025a

    Description

    predictedY = kfoldPredict(CVMdl) returns responses predicted by the cross-validated quantile regression model CVMdl. For every fold, kfoldPredict predicts the responses for validation-fold observations using a model trained on training-fold observations. CVMdl.X and CVMdl.Y contain both sets of observations.

    predictedY = kfoldPredict(CVMdl,Name=Value) specifies additional options using one or more name-value arguments. For example, you can specify the quantiles for which to return predictions.

    [predictedY,crossingIndicator] = kfoldPredict(___) additionally returns a logical vector crossingIndicator whose entries indicate whether predictions for the specified quantiles cross each other.

    Examples

    Create a cross-validated quantile regression model. Compare the predicted response values to the true response values.

    Simulate 1000 observations from the model y=1+0.05x+sin(x)/x+ϵ where:

    • x is a 1000-by-1 vector of evenly spaced values between –10 and 10.

    • ϵ is a 1000-by-1 vector of random normal errors with mean 0 and standard deviation 0.2.

    rng("default"); % For reproducibility
    n = 1000;
    x = linspace(-10,10,n)';
    y = 1 + 0.05*x + sin(x)./x + 0.2*randn(n,1);

    Create a 5-fold cross-validated quantile neural network regression model. Use the default quantile value, which corresponds to the median.

    CVMdl = fitrqnet(x,y,KFold=5)
    CVMdl = 
      RegressionPartitionedQuantileModel
        CrossValidatedModel: 'QuantileNeuralNetwork'
             PredictorNames: {'x1'}
               ResponseName: 'Y'
            NumObservations: 1000
                      KFold: 5
                  Partition: [1×1 cvpartition]
          ResponseTransform: 'none'
                  Quantiles: 0.5000
    
    
      Properties, Methods
    
    

    CVMdl is a RegressionPartitionedQuantileModel object that contains five trained CompactRegressionQuantileNeuralNetwork model objects (CVMdl.Trained). Each of the five models is trained using approximately 4/5 of the observations in x.

    Predict the median response values using the cross-validated quantile regression model. The predicted response values are the predictions on the holdout (validation) observations. In other words, the software obtains each prediction by using a model that was trained without the corresponding observation.

    predictedY = kfoldPredict(CVMdl);

    Plot the true response values and the predicted response values for the cross-validated model.

    plot(x,y,".");
    hold on
    plot(x,predictedY,".");
    xlabel("x")
    ylabel("y")
    title("Cross-Validation Predictions")
    legend(["True","Predicted"])
    hold off

    Figure: Cross-Validation Predictions. Scatter plot of the true and predicted response values (y) against x.

    The five CompactRegressionQuantileNeuralNetwork models seem generally to agree, but the predictions differ slightly in the predictor data range from 0 to 10.

    You cannot use the cross-validated model directly to make predictions on new data. If you want to predict response values for a new data set, you can train a new quantile regression model using all the data in x and then use the predict object function. For example, predict response values for each even integer between –10 and 10.

    Mdl = fitrqnet(x,y);
    xnew = (-10:2:10)';
    predictedNew = predict(Mdl,xnew)
    predictedNew = 11×1
    
        0.6360
        0.6340
        0.6320
        0.6300
        1.3421
        2.0209
        1.5462
        0.9962
        1.2118
        1.4273
        1.6429
          ⋮
    
    

    Alternatively, you can use the individual compact models in the Trained property of the cross-validated model and then combine the predictions (for example, through averaging). For example, predict average response values for each even integer between –10 and 10.

    predictions = zeros(length(xnew),CVMdl.KFold);
    for i = 1:CVMdl.KFold
        predictions(:,i) = predict(CVMdl.Trained{i},xnew);
    end
    averagePredictions = mean(predictions,2)
    averagePredictions = 11×1
    
        0.6399
        0.6332
        0.6264
        0.6215
        1.3391
        2.0521
        1.5724
        0.9853
        1.2277
        1.4360
        1.6341
          ⋮
    
    

    Create a cross-validated quantile regression model. Find the test folds that contain observations whose predictions cross each other.

    Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s. Create a table containing the predictor variables Acceleration, Cylinders, Displacement, and so on, as well as the response variable MPG.

    load carbig
    cars = table(Acceleration,Cylinders,Displacement, ...
        Horsepower,Model_Year,Origin,Weight,MPG);

    Categorize the cars based on whether they were made in the USA.

    cars.Origin = categorical(cellstr(cars.Origin));
    cars.Origin = mergecats(cars.Origin,["France","Japan",...
        "Germany","Sweden","Italy","England"],"NotUSA");

    Train a cross-validated quantile neural network regression model. Use the 0.25, 0.50, and 0.75 quantiles (that is, the lower quartile, median, and upper quartile). To improve the model fit, standardize the numeric predictors before training. Use a 3-fold cross-validation.

    rng(0,"twister") % For reproducibility
    CVMdl = fitrqnet(cars,"MPG",Quantiles=[0.25 0.5 0.75], ...
        Standardize=true,KFold=3);

    CVMdl is a RegressionPartitionedQuantileModel object.

    Determine if any of the predictions for the quantiles in CVMdl.Quantiles cross each other by using kfoldPredict. The crossingIndicator output argument contains a value of 1 (true) for any observation with quantile predictions that cross.

    [~,crossingIndicator] = kfoldPredict(CVMdl);
    sum(crossingIndicator)
    ans = 
    3
    

    In this example, three of the observations in cars have quantile predictions that cross each other.

    Find the test sets that contain the three observations.

    idx = test(CVMdl.Partition,"all");
    observations = idx(crossingIndicator,:)
    observations = 3×3 logical array
    
       1   0   0
       1   0   0
       1   0   0
    
    

    All three observations are in the first test set. Therefore, the quantile crossings in CVMdl are produced by the first compact model in the object (CVMdl.Trained{1}), because it provides the predictions for the observations in the first test set.
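
    To summarize how the crossings are distributed across folds, one option is to sum the columns of the membership matrix. The sketch below hard-codes the observations matrix displayed in this example so that it runs on its own; crossingsPerFold is an illustrative variable name, not part of the original example.

```matlab
% Rows: observations with crossing predictions; columns: test folds
% (values copied from the observations output shown in this example)
observations = logical([1 0 0; 1 0 0; 1 0 0]);

% Count the crossing observations that fall in each test fold
crossingsPerFold = sum(observations,1)
```

    A fold with a large count points to the compact model in CVMdl.Trained that is responsible for most of the crossings.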

    Input Arguments

    CVMdl: Cross-validated quantile regression model, specified as a RegressionPartitionedQuantileModel object.

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: kfoldPredict(CVMdl,Quantiles=0.5,PredictionForMissingValue=NaN) specifies to return the predictions for the 0.5 quantile (median) and use NaN predictions for observations that have missing predictor values.

    Quantiles: Quantiles for which to compute predictions, specified as a vector of values in CVMdl.Quantiles. The software returns predictions only for the quantiles specified in Quantiles.

    Example: Quantiles=[0.4 0.6]

    Data Types: single | double | char | string
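
    As a rough illustration of the Quantiles argument, the following sketch trains a cross-validated model on three quantiles and then requests predictions for only two of them. The simulated data and the variable names (x, y, predLowHigh) are assumptions chosen for illustration; the values passed to Quantiles must be among those in CVMdl.Quantiles.

```matlab
rng("default") % For reproducibility
n = 200;
x = linspace(-10,10,n)';
y = 1 + 0.05*x + 0.2*randn(n,1); % Simulated data (illustrative only)

% Train a 5-fold cross-validated model for three quantiles
CVMdl = fitrqnet(x,y,Quantiles=[0.25 0.5 0.75],KFold=5);

% Request predictions for only the lower and upper quartiles
predLowHigh = kfoldPredict(CVMdl,Quantiles=[0.25 0.75]);

% One column per requested quantile is expected
size(predLowHigh)
```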

    PredictionForMissingValue: Predicted response value to use for observations with missing predictor values, specified as "quantile", a numeric scalar, or a numeric vector.

    Value | Description
    "quantile" | kfoldPredict uses the specified quantile of the observed response values in the training-fold data as the predicted response value for observations with missing predictor values.
    Numeric scalar | kfoldPredict uses the value as the predicted response value for observations with missing predictor values. The function uses the same value for all quantiles.
    Numeric vector | The length of the vector must be equal to the number of quantiles specified by the Quantiles name-value argument. kfoldPredict uses element i in the vector as the quantile i predicted response value for observations with missing predictor values.

    Example: PredictionForMissingValue=NaN

    Data Types: single | double | char | string
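
    The scalar and vector forms of PredictionForMissingValue can be sketched as follows. The simulated two-predictor data, the positions of the missing values, and the fallback values 0.8 and 1.2 are all assumptions made for illustration, and the sketch does not claim which rows of the output receive the fallback values.

```matlab
rng("default") % For reproducibility
n = 200;
X = [linspace(-10,10,n)' randn(n,1)];   % Two simulated predictors
y = 1 + 0.05*X(:,1) + 0.2*randn(n,1);   % Simulated response
X([5 50 150],2) = NaN;                  % Introduce missing predictor values

% Train a 5-fold cross-validated model for two quantiles
CVMdl = fitrqnet(X,y,Quantiles=[0.25 0.75],KFold=5);

% Scalar: the same fallback value is used for both quantiles
predNaN = kfoldPredict(CVMdl,PredictionForMissingValue=NaN);

% Vector: one fallback value per quantile (element i for quantile i)
predVec = kfoldPredict(CVMdl,PredictionForMissingValue=[0.8 1.2]);
```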

    Output Arguments

    predictedY: Predicted responses, returned as an n-by-q numeric matrix, where n is the number of observations in CVMdl.X and q is the number of quantiles specified by the Quantiles name-value argument.

    If you use a holdout validation technique to create CVMdl (that is, if CVMdl.KFold is 1), then predictedY has NaN values for training-fold observations.
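
    The holdout behavior can be checked with a sketch along these lines. It assumes that fitrqnet accepts the Holdout name-value argument, as other Statistics and Machine Learning Toolbox fitting functions do; the simulated data and the variable names isValidation and numNaN are illustrative.

```matlab
rng("default") % For reproducibility
n = 200;
x = linspace(-10,10,n)';
y = 1 + 0.05*x + 0.2*randn(n,1); % Simulated data (illustrative only)

% Hold out 20% of the observations for validation
% (assumption: fitrqnet supports Holdout)
CVMdl = fitrqnet(x,y,Holdout=0.2);
predictedY = kfoldPredict(CVMdl);

% Logical index of the held-out (validation) observations
isValidation = test(CVMdl.Partition);

% Training-fold observations are expected to have NaN predictions
numNaN = sum(isnan(predictedY(~isValidation)))
```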

    crossingIndicator: Quantile crossing indicator, returned as a logical vector. Each entry corresponds to an observation in CVMdl.X. A value of 1 (true) indicates that the corresponding observation has predictions that cross each other. That is, two quantiles q1 and q2 exist in Quantiles such that q1 < q2 and the prediction for q1 is greater than the prediction for q2.
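
    The crossing definition can be illustrated on a small hand-made matrix. The sketch below is not necessarily how kfoldPredict computes crossingIndicator; it assumes that the columns of the hypothetical prediction matrix P are ordered by increasing quantile and that P contains no NaN values.

```matlab
% Hypothetical predictions for three observations and the quantiles
% [0.25 0.5 0.75], one column per quantile in ascending order
P = [1.0 1.5 2.0;   % No crossing
     1.0 0.9 2.0;   % 0.25 prediction exceeds 0.5 prediction
     1.0 1.5 1.2];  % 0.5 prediction exceeds 0.75 prediction

% With columns sorted by quantile, a crossing exists exactly when some
% adjacent pair of predictions decreases
manualIndicator = any(diff(P,1,2) < 0, 2)
```

    Checking adjacent columns is sufficient here: if no adjacent pair decreases, each row is nondecreasing, so no pair of quantiles can cross.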

    Algorithms

    kfoldPredict computes predictions according to the predict object function of the trained compact models in CVMdl (CVMdl.Trained). For more information, see the model-specific predict function reference pages in the following table.

    Model Type | predict Function
    Quantile linear regression model | predict
    Quantile neural network model for regression | predict
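
    The fold-by-fold procedure described above can be sketched with a manual loop over the compact models. This is a minimal illustration on simulated data, not the function's actual implementation; manualY and maxDiff are illustrative names, and the sketch assumes numeric predictor data and the default quantile.

```matlab
rng("default") % For reproducibility
n = 200;
x = linspace(-10,10,n)';
y = 1 + 0.05*x + 0.2*randn(n,1); % Simulated data (illustrative only)

CVMdl = fitrqnet(x,y,KFold=5);

% For each fold, predict the validation-fold observations with the
% compact model that was trained on the remaining observations
manualY = NaN(n,1);
for k = 1:CVMdl.KFold
    idx = test(CVMdl.Partition,k);  % Validation-fold observations
    manualY(idx) = predict(CVMdl.Trained{k},CVMdl.X(idx,:));
end

% Largest absolute difference from the kfoldPredict output
maxDiff = max(abs(manualY - kfoldPredict(CVMdl)))
```

    Under these assumptions, maxDiff would be expected to be zero or very close to it.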

    Version History

    Introduced in R2025a