Problem evaluating very small numbers

19 views (last 30 days)
Marc Laub
Marc Laub on 29 Dec 2023
Commented: Steven Lord on 1 Jan 2024
Hey, following problem
I trie to calculate probabilities from a bernoulli process.
So lets say I have a pool with 2 kind of balls in it, A and B. P_A=A/(A+B) is the probability to pick A and P_B=B/(A+B) is the probability to pick B.
My question know is to calculate the probability to pick a certain composition.
So when I take a bucket and draw some balls, my expected percentage of A in the bucket is P_A.
The spread, standart deviation is given by :sigma=sqrt(var)=sqrt(P_A*P_B/N), with N beeing the sample size in the bucket.
Now I just sweep over the composition space and evaluate my propabilities with given parameters and put them into a norma distribution.
My problem now is, that sample size N is dependent on the ratio in the bucket itsself and there are ceratin ratios that are (since it is a thermodynamic problem) impossible.
This leads to the fact that compositions where the probability would be highest is impossible (sigma=0) and the only compositions that are theoretically possible have such a low probability, that their probabiity is evaluated as zero, but in fact must be larger than zero.
Then pikcing rand composition ccording to their probability leads to an error since there is no probability value larger 0...
Is there some trick to avaluate those non zero probabilities and overcome the limit of eps(0)=4.9407e-324
Just some example numbers:
Cmp= 0.96; %expected value
x=linspace(0,1,100);
N=[-1040.60913764593 -1274.45152017157 -1517.53917725789 -1795.99182909355 -2122.34693486558 -2510.06682714684 -2975.64759019587 -3540.23709594117 -4231.60030785356 -5086.84402905997 -6156.38134667049 -7509.86596572323 -9245.29486431790 -11503.3241150237 -14490.4056493614 -18517.3387573243 -24065.7714122282 -31907.5747396386 -43329.2815472576 -60577.7164494251 -87804.5311543582 -133234.395760636 -214664.544494865 -375335.735182308 -738466.463858388 -1748414.20821032 -5774747.55005333 -41795919.1425974 -90501964828.2929 72025044.7727275 8009861.69662709 2313015.83600968 972090.250808377 500079.010918636 291994.059652964 185962.451073684 126170.032691603 89826.2196141791 66420.8965345887 50639.6682593742 39593.8852638151 31619.0994776238 25708.9843417277 21230.0434065884 17769.5292145238 15050.4283306574 12881.9596492194 11129.7129415162 9697.06783111667 8513.31060620962 7525.84673350056 6694.98454205075 5990.37155119071 5388.51592935915 4871.03437008720 4423.39490718085 4034.00244586203 3693.52515537666 3394.39247493320 3130.41695426230 2896.50651023482 2688.44342683401 2502.71313027739 2336.37044190681 2186.93430415773 2052.30432191963 1930.69415199415 1820.57800282139 1720.64740916629 1629.77611481347 1546.99139540023 1471.45052919991 1402.42140865373 1339.26650328765 1281.42955245236 1228.42449663107 1179.82625816233 1135.26306309632 1094.41006084980 1056.98405150267 1022.73917534666 991.463458577286 962.976145552797 937.125784786936 913.789076401132 892.870538113866 874.303112577631 858.049933356184 844.107611471935 832.511638100216 823.344895877077 816.750981076786 812.955382999989 812.300289399375 815.304778866772 822.776840476043 836.045264946388 857.525633118955 892.577334329355 966.725416708172];
var=Cmp*(1-Cmp)/N
simga=real(sqrt(var));
P=1./(sigma*sqrt(2*pi)).*exp(-0.5*((x-Cmp)./sigma).^2);
P(30:end) are evaluated as zero, where in fact must be >0 and those values are important for me, because with those values being zero, there is no solution, wher ein fact there is a solution, just with a very small probability.
  1 Comment
Steven Lord
Steven Lord on 1 Jan 2024
So lets say I have a pool with 2 kind of balls in it, A and B. P_A=A/(A+B) is the probability to pick A and P_B=B/(A+B) is the probability to pick B.
What are typical values for A and B for your problem? It's not clear to me how the "example numbers" in your code segment are related to the problem you described. You have defined a value N, but it's also not clear to me how you can have a sample size of -1040.60913764593.
Are you drawing with or without replacement?
This leads to the fact that compositions where the probability would be highest is impossible (sigma=0) and the only compositions that are theoretically possible have such a low probability, that their probabiity is evaluated as zero, but in fact must be larger than zero.
By the infinite monkey theorem there is a non-zero probability of a monkey sitting at a typewriter and producing the complete works of William Shakespeare. But one of the examples on the orders of magnitude (for numbers) Wikipedia page is for the probability of a monkey reproducing just one of those works, Hamlet. That probability (which is almost certainly never going to happen in this universe) is much, MUCH greater than one of the probabilities Walter Roberson computed in a comments (of 1e-864000000)

Sign in to comment.

Answers (3)

Ayush
Ayush on 29 Dec 2023
I understand that you are finding your probabilities being rounded down to zero due to numerical precision issues, you might consider working with log-probabilities instead. When you need to compare probabilities or normalize them, you can do so in the log space, which can help manage issues with numerical underflow.
To calculate the log of the value you may read: https://www.mathworks.com/help/matlab/ref/log.html
Thanks,
Ayush
  1 Comment
Marc Laub
Marc Laub on 31 Dec 2023
I tried and saved my probability matrices as log-probabilities.
I then tried to spread the log-probabilities onto the whole space from -10^-308 till +10^308, instead of from -10^-308 to 1 as it would be for a normalized function. problem now is, that the function randsample intrinsic normalizes the probabilities which destroys all efford.... since it needs to take the some over all probabilities for the normaization and the sum of numbers that are spread up to +10^308 is unfortunately Inf... so all probabilities are Inf after normalization within the randperm function.....

Sign in to comment.


Hassaan
Hassaan on 29 Dec 2023
% Assuming you have some values for p_A, p_B, and N
p_A = 0.96; % Just an example value
p_B = 1 - p_A;
N = 100; % Example sample size
% Calculating variance using the provided formula
variance = p_A * p_B / N;
% If the variance is too small, consider using vpa for higher precision
variance = vpa(variance, 50); % 50 digits of precision
% Calculating standard deviation
std_dev = sqrt(variance);
% If you need to evaluate the probability density function (pdf)
% at a particular point x, you could use the normpdf function.
% However, if the std_dev is too small, this could be problematic.
% Instead, you can compute the log-pdf.
% Example x value
x = 0.95;
% Log-pdf calculation
log_pdf = -0.5 * ((x - p_A) / std_dev)^2 - log(std_dev) - 0.5 * log(2*pi);
% If you need the actual pdf value, convert back from the log scale
pdf = exp(log_pdf);
disp(pdf)
17.872927470476997409064296230165
------------------------------------------------------------------------------------------------------------------------------------------------
If you find the solution helpful and it resolves your issue, it would be greatly appreciated if you could accept the answer. Also, leaving an upvote and a comment are also wonderful ways to provide feedback.
  3 Comments
Walter Roberson
Walter Roberson on 31 Dec 2023
variance = vpa(variance, 50);
You cannot just introduce vpa at that point and expect it to improve precision.
Walter Roberson
Walter Roberson on 31 Dec 2023
Q = @(v) sym(v);
% Assuming you have some values for p_A, p_B, and N
p_A = Q(0.96); % Just an example value
p_B = 1 - p_A;
N = Q(100); % Example sample size
% Calculating variance using the provided formula
variance = p_A * p_B / N
variance = 
% Calculating standard deviation
std_dev = sqrt(variance)
std_dev = 
% If you need to evaluate the probability density function (pdf)
% at a particular point x, you could use the normpdf function.
% However, if the std_dev is too small, this could be problematic.
% Instead, you can compute the log-pdf.
% Example x value
x = Q(0.95);
% Log-pdf calculation
log_pdf = -Q(0.5) * ((x - p_A) / std_dev)^2 - log(std_dev) - Q(0.5) * log(2*Q(pi));
% If you need the actual pdf value, convert back from the log scale
pdf = exp(log_pdf);
disp(pdf)

Sign in to comment.


Walter Roberson
Walter Roberson on 31 Dec 2023
Is there some trick to avaluate those non zero probabilities and overcome the limit of eps(0)=4.9407e-324
Not without using the Symbolic Toolbox, which you have already indicated you are not willing to do.
  4 Comments
Walter Roberson
Walter Roberson on 1 Jan 2024
Q = @(v) sym(v);
Cmp = Q(0.96); %expected value
x = Q(linspace(0,1,101));
N = Q([-1040.60913764593 -1274.45152017157 -1517.53917725789 -1795.99182909355 -2122.34693486558 -2510.06682714684 -2975.64759019587 -3540.23709594117 -4231.60030785356 -5086.84402905997 -6156.38134667049 -7509.86596572323 -9245.29486431790 -11503.3241150237 -14490.4056493614 -18517.3387573243 -24065.7714122282 -31907.5747396386 -43329.2815472576 -60577.7164494251 -87804.5311543582 -133234.395760636 -214664.544494865 -375335.735182308 -738466.463858388 -1748414.20821032 -5774747.55005333 -41795919.1425974 -90501964828.2929 72025044.7727275 8009861.69662709 2313015.83600968 972090.250808377 500079.010918636 291994.059652964 185962.451073684 126170.032691603 89826.2196141791 66420.8965345887 50639.6682593742 39593.8852638151 31619.0994776238 25708.9843417277 21230.0434065884 17769.5292145238 15050.4283306574 12881.9596492194 11129.7129415162 9697.06783111667 8513.31060620962 7525.84673350056 6694.98454205075 5990.37155119071 5388.51592935915 4871.03437008720 4423.39490718085 4034.00244586203 3693.52515537666 3394.39247493320 3130.41695426230 2896.50651023482 2688.44342683401 2502.71313027739 2336.37044190681 2186.93430415773 2052.30432191963 1930.69415199415 1820.57800282139 1720.64740916629 1629.77611481347 1546.99139540023 1471.45052919991 1402.42140865373 1339.26650328765 1281.42955245236 1228.42449663107 1179.82625816233 1135.26306309632 1094.41006084980 1056.98405150267 1022.73917534666 991.463458577286 962.976145552797 937.125784786936 913.789076401132 892.870538113866 874.303112577631 858.049933356184 844.107611471935 832.511638100216 823.344895877077 816.750981076786 812.955382999989 812.300289399375 815.304778866772 822.776840476043 836.045264946388 857.525633118955 892.577334329355 966.725416708172]);
var = Cmp*(1-Cmp)./N;
sigma = real(sqrt(var));
part1 = 1./(sigma*sqrt(2*Q(pi)));
part2 = exp(-0.5*((x(:)-Cmp)./sigma).^2);
P = part1 .* part2;
nnz(isnan(double(P))) / numel(P) * 100
ans = 29.0000
29% of your solution comes out NaN
Walter Roberson
Walter Roberson on 1 Jan 2024
format long g
Q = @(v) sym(v);
Cmp = Q(0.96); %expected value
x = Q(linspace(0,1,101));
N = Q([-1040.60913764593 -1274.45152017157 -1517.53917725789 -1795.99182909355 -2122.34693486558 -2510.06682714684 -2975.64759019587 -3540.23709594117 -4231.60030785356 -5086.84402905997 -6156.38134667049 -7509.86596572323 -9245.29486431790 -11503.3241150237 -14490.4056493614 -18517.3387573243 -24065.7714122282 -31907.5747396386 -43329.2815472576 -60577.7164494251 -87804.5311543582 -133234.395760636 -214664.544494865 -375335.735182308 -738466.463858388 -1748414.20821032 -5774747.55005333 -41795919.1425974 -90501964828.2929 72025044.7727275 8009861.69662709 2313015.83600968 972090.250808377 500079.010918636 291994.059652964 185962.451073684 126170.032691603 89826.2196141791 66420.8965345887 50639.6682593742 39593.8852638151 31619.0994776238 25708.9843417277 21230.0434065884 17769.5292145238 15050.4283306574 12881.9596492194 11129.7129415162 9697.06783111667 8513.31060620962 7525.84673350056 6694.98454205075 5990.37155119071 5388.51592935915 4871.03437008720 4423.39490718085 4034.00244586203 3693.52515537666 3394.39247493320 3130.41695426230 2896.50651023482 2688.44342683401 2502.71313027739 2336.37044190681 2186.93430415773 2052.30432191963 1930.69415199415 1820.57800282139 1720.64740916629 1629.77611481347 1546.99139540023 1471.45052919991 1402.42140865373 1339.26650328765 1281.42955245236 1228.42449663107 1179.82625816233 1135.26306309632 1094.41006084980 1056.98405150267 1022.73917534666 991.463458577286 962.976145552797 937.125784786936 913.789076401132 892.870538113866 874.303112577631 858.049933356184 844.107611471935 832.511638100216 823.344895877077 816.750981076786 812.955382999989 812.300289399375 815.304778866772 822.776840476043 836.045264946388 857.525633118955 892.577334329355 966.725416708172]);
var = Cmp*(1-Cmp)./N;
sigma = real(sqrt(var));
part1 = 1./(sigma*sqrt(2*Q(pi)));
part2 = exp(-0.5*((x(:)-Cmp)./sigma).^2);
P = part1 .* part2;
minP = min(P,[],'all')
minP = 
double(minP)
ans =
0
maxP = max(P,[],'all')
maxP = 
double(maxP)
ans =
17277.7116645823
simplify(log(minP))
ans = 
double(ans)
ans =
-864300527.515557

Sign in to comment.

Products


Release

R2022b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!