Main Content
Results for
Base case:
Suppose you need to do a computation many times. We are going to assume that this computation cannot be vectorized. The simplest case is to use a for loop:
number_of_elements = 1e6;
test_fcn = @(x) sqrt(x) / x;
tic
for i = 1:number_of_elements
x(i) = test_fcn(i);
end
t_forward = toc;
disp(t_forward + " seconds")
Preallocation:
This can easily be sped up by preallocating the variable that houses results:
tic
x = zeros(number_of_elements, 1);
for i = 1:number_of_elements
x(i) = test_fcn(i);
end
t_forward_prealloc = toc;
disp(t_forward_prealloc + " seconds")
In this example, preallocation speeds up the loop by a factor of about three to four (running in R2024a). Comment below if you get dramatically different results.
disp(sprintf("%.1f", t_forward / t_forward_prealloc))
Run it in reverse:
Is there a way to skip the explicit preallocation and still be fast? Indeed, there is.
clear x
tic
for i = number_of_elements:-1:1
x(i) = test_fcn(i);
end
t_backward = toc;
disp(t_backward + " seconds")
By running the loop backwards, the preallocation is implicitly performed during the first iteration and the loop runs in about the same time (within statistical noise):
disp(sprintf("%.2f", t_forward_prealloc / t_backward))
Do you get similar results when running this code? Let us know your thoughts in the comments below.
Beneficial side effect:
Have you ever had to use a for loop to delete elements from a vector? If so, keeping track of index offsets can be tricky, as deleting any element shifts all those that come after. By running the for loop in reverse, you don't need to worry about index offsets while deleting elements.