rng('default') unreliable
I have some lengthy code (too long to post here) which executes some sampling based computations. At the beginning of the code, I set
rng('default')
to ensure reproducibility of results for debugging purposes (afaik, this means a Mersenne twister with seed 0). This is the only place in the code where rng is invoked. During program execution, some quantities are sampled using normrnd, which are subsequently used as inputs for the computations. The thing is, I've noticed on multiple occasions now that running the exact same code several times does NOT always yield the exact same results.
I always have one instance of Matlab running in GUI mode. Alternatively, I can also run the code in batch mode in the background, which I do most of the time. Although I change absolutely nothing in the code, the sample values occasionally (not always) differ from run to run. I can't figure out what's going on, but since my input does not change, it must have something to do with the rng. Has someone experienced the same problem? Could it be some kind of crossfire between GUI and batch mode? Or does rng('default') involve time or some other "random" variable?
Answers (2)
rng('default') sets the random number generator to the initial state. This is a reliable procedure.
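A quick way to convince yourself of this is to reset the generator twice and compare the draws. This is only a minimal sketch: randn is used instead of normrnd so that the snippet does not need the Statistics and Machine Learning Toolbox, and the variable names are made up for illustration.

```matlab
% Reset the generator, draw, reset again, draw again.
rng('default');
a = randn(1, 5);

rng('default');
b = randn(1, 5);

% Both draws start from the same generator state, so they are identical.
same = isequal(a, b);
disp(same)
```

If same is true in an isolated test like this but the full program still differs between runs, the difference has to come from somewhere other than the generator itself.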
If your code returns different random numbers from run to run, it must contain another source of randomness. Since the effect occurs only sometimes, it does not seem to depend on the value of the current time. A time dependence would look something like this:
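% Not likely to be the problem:
% The number of rand calls depends on how fast the loop runs.
t0 = tic;
while toc(t0) < 5   % run for 5 seconds (now counts in days, so "now - itime < 5" would loop for 5 days)
    a = rand;
end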
Such dependencies on the time should be observable in all runs.
Another possibility is that the code reads different inputs because it relies on the path "D:". This refers to the last folder used on drive D and can be changed by external software. Brrr, such evil.
I think you have to go through the code and search for the source of the problem. Since the RNG is known not to produce magical output, either you are processing different inputs (without knowing it) or there is another source of entropy.
8 Comments
Jan
on 14 Jun 2022
Hi Jan,
thanks for your answer. I'd really like to figure this one out. Here are my thoughts:
- Looking at a multitude of results, it seems time can be ruled out as a factor (as you said, rng('default') should not involve time)
- I do read some inputs from the path "F:\some_folder\..." (which I am dead certain never changed during my tests). My code is also stored there. "F:" is my last local drive (I also have a "D:" drive). Could that be part of the problem (i.e. have an influence on rng)?
- Could it be that running a job in batch mode makes a difference (different rng behaviour)?
- What other sources of entropy could there possibly be? Resetting rng and then drawing the same random numbers over and over should leave no room for entropy (at least not user induced). Or are there commands other than rng which introduce a random component?
I am certain the problem is not input related because I do nothing else than running a job, looking at the results, then running the exact same job again and looking at the results again. I could just accept it as randomness (the sample values always lie in the specified range), but unexpected behaviour is a red light for me, so I'd really like to know the cause.
Jan
on 14 Jun 2022
@broken_arrow: Distinguish "D-Drive", "D:" and "D:\". The first is a drive letter, the last is the path to the root folder of disk D, and "D:" is the last used folder on drive D, which is very volatile and should therefore be strictly avoided. If your code contains a line like cd('F:'), you cannot be sure where you end up, while cd('F:\') is stable.
Relying on the current folder is a bad programming pattern, because a callback of a timer or GUI can change the current folder. So use absolute paths for all files in general.
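As a small sketch of what that looks like in practice: build the full file name once with fullfile and pass it to every load/save call. The folder and file names below are hypothetical, and tempdir is used only so that the snippet runs anywhere.

```matlab
% Hypothetical data folder and file name, for illustration only.
dataDir  = fullfile(tempdir, 'some_folder');
dataFile = fullfile(dataDir, 'inputs.mat');

if ~exist(dataDir, 'dir')
    mkdir(dataDir);
end
x = 1:3;
save(dataFile, 'x');

% Changing the current folder does not affect where the file is found,
% because dataFile is an absolute path.
oldDir = cd(tempdir);
S = load(dataFile);
cd(oldDir);
```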
But it is unlikely that your code creates a wrong output if the current folder is changed unexpectedly. A crash is more likely.
The most frequent cause of entropy is using orphaned pointers or uninitialized variables in C-Mex functions. Do you call C-mex functions?
In Matlab R2009a, the sum() command was multithreaded for inputs with more than 88999 elements, and the partial results were added in the order in which the threads finished. Because floating-point addition is not associative, this could cause different results between repeated calls, see https://www.mathworks.com/support/bugreports/532399 . Maybe you are using another function that has an equivalent bug.
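To see why the order of the partial sums matters at all, here is a tiny illustration with three doubles. It only shows the rounding effect and does not reproduce the old sum() behaviour.

```matlab
% Floating-point addition is not associative:
% the grouping of the partial sums changes the last bits of the result.
s1 = (0.1 + 0.2) + 0.3;
s2 = 0.1 + (0.2 + 0.3);

orderMatters = (s1 ~= s2);

% Print both values with full precision to make the difference visible.
fprintf('%.17g\n%.17g\n', s1, s2);
```

Differences of this kind are of the order of eps, so they would show up as tiny deviations in the results, not as completely different samples.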
It would be useful to narrow down the code, which causes the difference.
Steven Lord
on 14 Jun 2022
Let's check a couple things.
First run this command then run your code and make sure that MATLAB only stops once, immediately before the rng('default') call in your code. If it stops more than once, look at how the code is calling rng. If it calls rng('shuffle') that would be your time dependence.
dbstop in rng
Once MATLAB stops in the rng call you can use the Continue button in the Editor toolstrip or the following command to continue execution.
dbcont
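If stopping in the debugger is inconvenient (for example in a background job), a rough alternative is to query the generator settings and write them to the log. This is only a sketch: the fprintf format and the variable names are made up, and the call would have to be placed at the points in your code that you want to compare between runs.

```matlab
rng('default');

% rng with one output returns a struct with the fields Type, Seed and State.
s = rng;
fprintf('Generator: %s, seed: %d\n', s.Type, s.Seed);

% ... the sampling code would run here ...
x = randn(1, 3);

% Restoring the saved settings reproduces the same draws.
rng(s);
y = randn(1, 3);
```

If the logged type or seed differs between the GUI session and the background job, something is changing the generator before your sampling starts.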
Second, when you say you run your code in batch mode, do you mean that you launch another session of MATLAB using the -batch startup option, or that you use the batch function from Parallel Computing Toolbox? If the latter, see this documentation page.
broken_arrow
on 14 Jun 2022
Edited: broken_arrow
on 14 Jun 2022
broken_arrow
on 15 Jun 2022
" Maybe the fieldnames function should be modified to return the field names in alphabetical (or some other) order."
Structure fields actually have an order (by default the order they were created in), therefore FIELDNAMES should return them in exactly that order. This is critical when using things like STRUCT2CELL and CELL2STRUCT together with the fieldnames.
You can always reorder the fields yourself:
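For instance, a minimal sketch using orderfields (the struct contents are made up for illustration):

```matlab
% Two structs with the same fields, created in a different order.
s1 = struct('b', 2, 'a', 1);
s2 = struct('a', 1, 'b', 2);

% fieldnames follows the creation order, so the two lists differ.
f1 = fieldnames(s1);   % {'b'; 'a'}
f2 = fieldnames(s2);   % {'a'; 'b'}

% orderfields with one input sorts the fields in ASCII dictionary order.
t1 = orderfields(s1);
sameOrder = isequal(fieldnames(t1), fieldnames(s2));
```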
Steven Lord
on 15 Jun 2022
Or you could sort the output of fieldnames before deciding which field gets which random vector if you don't need or want to reorder the struct array itself.
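A sketch of that idea, assuming a hypothetical struct s whose fields each receive one random vector:

```matlab
rng('default');

% Hypothetical struct; the fields were created in the order b, a.
s = struct('b', [], 'a', []);

% Sorting the names makes the assignment independent of the creation order.
names = sort(fieldnames(s));       % {'a'; 'b'}
for k = 1:numel(names)
    s.(names{k}) = randn(1, 3);    % 'a' always receives the first draw
end

% Check: the first draw after a reset ended up in field a.
rng('default');
firstDraw = randn(1, 3);
aGotFirst = isequal(s.a, firstDraw);
```

The struct itself keeps its original field order here; only the order in which the random vectors are handed out is fixed.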
broken_arrow
on 17 Jun 2022
broken_arrow
on 15 Jun 2022
Edited: broken_arrow
on 15 Jun 2022