Prof. Maria Tackett
Oct 24, 2022
due TODAY, 11:59pm (Thu labs)
due Tuesday, 11:59pm (Fri labs)
Office hours update:
Click here for an explanation of the sum of squares in R's ANOVA output.
See Week 09 activities.
STA 211: Mathematics of Regression
STA 240: Probability for Statistical Inference, Modeling, and Data Analysis
STA 313: Advanced Data Visualization
STA 323: Statistical Computing
STA 360: Bayesian Inference and Modern Statistical Methods
# A tibble: 186 × 14
season episode episode_n…¹ imdb_…² total…³ air_date lines…⁴ lines…⁵ lines…⁶
<dbl> <dbl> <chr> <dbl> <dbl> <date> <dbl> <dbl> <dbl>
1 1 1 Pilot 7.6 3706 2005-03-24 0.157 0.179 0.354
2 1 2 Diversity … 8.3 3566 2005-03-29 0.123 0.0591 0.369
3 1 3 Health Care 7.9 2983 2005-04-05 0.172 0.131 0.230
4 1 4 The Allian… 8.1 2886 2005-04-12 0.202 0.0905 0.280
5 1 5 Basketball 8.4 3179 2005-04-19 0.0913 0.0609 0.452
6 1 6 Hot Girl 7.8 2852 2005-04-26 0.159 0.130 0.306
7 2 1 The Dundies 8.7 3213 2005-09-20 0.125 0.160 0.375
8 2 2 Sexual Har… 8.2 2736 2005-09-27 0.0565 0.0954 0.353
9 2 3 Office Oly… 8.4 2742 2005-10-04 0.196 0.117 0.295
10 2 4 The Fire 8.4 2713 2005-10-11 0.160 0.0690 0.216
# … with 176 more rows, 5 more variables: lines_dwight <dbl>, halloween <chr>,
# valentine <chr>, christmas <chr>, michael <chr>, and abbreviated variable
# names ¹episode_name, ²imdb_rating, ³total_votes, ⁴lines_jim, ⁵lines_pam,
# ⁶lines_michael
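The models below are fit to a training set. A minimal sketch of how a training/testing split could be created (the data frame name office_episodes, the object names, and the seed are assumptions; the default 3/4 proportion is consistent with the 139 training rows shown later):

library(tidymodels)

# Split the episode-level data into training (default 3/4) and testing sets
set.seed(123)
office_split <- initial_split(office_episodes)
office_train <- training(office_split)
office_test  <- testing(office_split)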
Create a recipe that uses the newly generated variables, treats episode_name as an ID variable, and doesn't use air_date or season as predictors.

office_rec1 <- recipe(imdb_rating ~ ., data = office_train) |>
  update_role(episode_name, new_role = "id") |>
  step_rm(air_date, season) |>
  step_dummy(all_nominal_predictors()) |>
  step_zv(all_predictors())

office_rec1
Recipe
Inputs:
         role #variables
           id          1
      outcome          1
    predictor         12
Operations:
Variables removed air_date, season
Dummy variables from all_nominal_predictors()
Zero variance filter on all_predictors()
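One way to inspect the processed training data (a sketch, not necessarily how the output below was generated): prep() estimates the recipe using the training data, and bake(new_data = NULL) returns that processed training set.

# Apply the recipe steps to the training data and preview the result
office_rec1 |>
  prep() |>
  bake(new_data = NULL) |>
  glimpse()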
Rows: 139
Columns: 12
$ episode <dbl> 20, 16, 8, 7, 23, 3, 16, 21, 18, 14, 27, 28, 12, 1, 23, …
$ episode_name <fct> "Welcome Party", "Moving On", "Performance Review", "The…
$ total_votes <dbl> 1489, 1572, 2416, 1406, 2783, 1802, 2283, 2041, 1445, 14…
$ lines_jim <dbl> 0.12703583, 0.05588822, 0.09523810, 0.07482993, 0.078291…
$ lines_pam <dbl> 0.10423453, 0.10978044, 0.10989011, 0.15306122, 0.081850…
$ lines_michael <dbl> 0.0000000, 0.0000000, 0.3772894, 0.0000000, 0.3736655, 0…
$ lines_dwight <dbl> 0.07166124, 0.08782435, 0.15384615, 0.18027211, 0.135231…
$ imdb_rating <dbl> 7.2, 8.2, 8.2, 7.7, 9.1, 8.2, 8.3, 8.9, 8.0, 7.8, 8.7, 8…
$ halloween_yes <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ valentine_yes <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ christmas_yes <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0,…
$ michael_yes <dbl> 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1,…
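The workflow printed below bundles a linear regression specification with office_rec1. A minimal sketch of how it could be built (the name office_spec is an assumption; office_wflow1 matches the object used with fit_resamples() later):

# Model specification: linear regression fit with lm
office_spec <- linear_reg() |>
  set_engine("lm")

# Workflow: recipe + model specification
office_wflow1 <- workflow() |>
  add_recipe(office_rec1) |>
  add_model(office_spec)

office_wflow1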
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()
── Preprocessor ────────────────────────────────────────────────────────────────
3 Recipe Steps
• step_rm()
• step_dummy()
• step_zv()
── Model ───────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)
Computational engine: lm
Not so fast!
Resampling is only conducted on the training set; the test set is not involved. For each iteration of resampling, the data are partitioned into two subsamples: the analysis set, used to fit the model, and the assessment set, used to evaluate it.
Source: Kuhn and Silge. Tidy modeling with R.
More specifically, v-fold cross-validation is a commonly used resampling technique. Let's give an example where v = 3: randomly split your training data into 3 partitions.
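A sketch of how these three folds could be created with rsample (the seed value is an assumption; the name folds matches the object passed to fit_resamples() below):

# Split the training data into 3 folds for cross-validation
set.seed(345)
folds <- vfold_cv(office_train, v = 3)
folds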
# Function to get Adj R-sq, AIC, BIC
calc_model_stats <- function(x) {
glance(extract_fit_parsnip(x)) |>
select(adj.r.squared, AIC, BIC)
}
set.seed(456)
# Fit model and calculate statistics for each fold
office_fit_rs1 <- office_wflow1 |>
fit_resamples(resamples = folds,
control = control_resamples(extract = calc_model_stats))
office_fit_rs1
# Resampling results
# 3-fold cross-validation
# A tibble: 3 × 5
splits id .metrics .notes .extracts
<list> <chr> <list> <list> <list>
1 <split [92/47]> Fold1 <tibble [2 × 4]> <tibble [0 × 3]> <tibble [1 × 2]>
2 <split [93/46]> Fold2 <tibble [2 × 4]> <tibble [0 × 3]> <tibble [1 × 2]>
3 <split [93/46]> Fold3 <tibble [2 × 4]> <tibble [0 × 3]> <tibble [1 × 2]>
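The performance metrics averaged across the three folds can be collected with collect_metrics(); a sketch of the call that would produce a summary like the one below:

# Average RMSE and R-squared across the folds
collect_metrics(office_fit_rs1)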
# A tibble: 2 × 6
.metric .estimator mean n std_err .config
<chr> <chr> <dbl> <int> <dbl> <chr>
1 rmse standard 0.353 3 0.0117 Preprocessor1_Model1
2 rsq standard 0.539 3 0.0378 Preprocessor1_Model1
Note: These are calculated using the assessment data
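The fold-by-fold metrics shown below can be obtained by turning off summarizing; a sketch:

# Metrics for each individual fold
collect_metrics(office_fit_rs1, summarize = FALSE)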
# A tibble: 6 × 5
id .metric .estimator .estimate .config
<chr> <chr> <chr> <dbl> <chr>
1 Fold1 rmse standard 0.355 Preprocessor1_Model1
2 Fold1 rsq standard 0.525 Preprocessor1_Model1
3 Fold2 rmse standard 0.373 Preprocessor1_Model1
4 Fold2 rsq standard 0.481 Preprocessor1_Model1
5 Fold3 rmse standard 0.332 Preprocessor1_Model1
6 Fold3 rsq standard 0.610 Preprocessor1_Model1
Cross-validation RMSE stats:
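A sketch of how the RMSE values from each fold could be summarized (assuming tidymodels/tidyverse functions are loaded; column names follow the unsummarized metrics above):

# Summarize the per-fold RMSE values
collect_metrics(office_fit_rs1, summarize = FALSE) |>
  filter(.metric == "rmse") |>
  summarise(min_rmse = min(.estimate),
            max_rmse = max(.estimate),
            mean_rmse = mean(.estimate),
            sd_rmse = sd(.estimate))

The adjusted R-squared, AIC, and BIC returned by calc_model_stats() are stored in the .extracts list-column of office_fit_rs1. A sketch of one way to assemble the per-fold table shown below (the exact unnesting code is an assumption):

# Pull the calc_model_stats() tibble out of each fold and label it by fold
map_dfr(office_fit_rs1$.extracts, ~ .x$.extracts[[1]]) |>
  mutate(Fold = office_fit_rs1$id)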
# A tibble: 3 × 4
adj.r.squared AIC BIC Fold
<dbl> <dbl> <dbl> <chr>
1 0.585 70.3 101. Fold1
2 0.615 63.0 93.4 Fold2
3 0.553 77.6 108. Fold3
Note: These are based on the model fit from the analysis data
To illustrate how CV works, we used v = 3. This was useful for illustrative purposes, but v = 3 is a poor choice in practice. Values of v are most often 5 or 10; we generally prefer 10-fold cross-validation as a default.
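For example, 10-fold cross-validation on the training data could be set up analogously (the object name and seed are assumptions):

# 10-fold cross-validation, the more common default
set.seed(345)
folds_10 <- vfold_cv(office_train, v = 10)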