Why does comparison tool show modification when the values look the same?

I'm comparing two .mat files, each containing a 2D matrix (2416 x 843) which should be identical. The left one is generated by a MATLAB script and weighs 585 KB. The right one by a Python script and weighs 550 KB. The MATLAB comparison tool highlights some values (including all NaNs, by the looks of it) as modified, though they appear to be the same (see screenshot). This only seems to happen above a certain volume of data (each value in this matrix represents the average of 9 values; the issue does not occur for averages of 8 values or fewer). This does not happen with any other files/variables in this workflow.
Does anyone have any ideas why this might be happening?
Edit to add: I have also tried loading these two variables in the command window and comparing them in a few different ways (examples below), all of which seem to suggest that MATLAB identifies them as equal.
>> sum(meanSampleAge - py_meanSampleAge, "all", "omitnan")
ans =
0
>> py_meanSampleAge(1642, 550) == meanSampleAge(1642, 550) % highlighted cell in the image above, same for others
ans =
logical
1

Answers (2)

Taylor on 18 Apr 2024
It could be a data type issue. Double-check which data type Python is saving as (MATLAB defaults to double precision). Here is an example of how sneaky data precision issues can be.
Pi is equal to pi of course
pi == pi
ans = logical
1
But what if we use different precisions of pi?
double(pi) == single(pi)
ans = logical
0
Well let's just convert that single-precision pi back to double
double(pi) == double(single(pi))
ans = logical
0
So if your data is saved by Python as single and MATLAB automatically converts it to double, the values may still not show up as "equal".
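The same round trip can be reproduced on the Python side. This is only a minimal sketch using the standard library: `to_single` is a hypothetical helper (not part of any library) that rounds a double to the nearest single-precision value and widens it back, which is assumed here to mirror `double(single(x))` in MATLAB.

```python
import math
import struct


def to_single(x: float) -> float:
    """Round a double to the nearest single-precision value, then widen it
    back to double (hypothetical helper, analogous to double(single(x)))."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


pi_double = math.pi
pi_round_trip = to_single(math.pi)

# The round-tripped value has lost the low-order bits of pi.
print(pi_double == pi_round_trip)        # False
print(abs(pi_double - pi_round_trip))    # roughly 8.7e-08

# A value that is exactly representable in single precision survives intact.
print(to_single(0.5) == 0.5)             # True
```

If the Python script stores single-precision values anywhere along the way, differences of this size (around 1e-7 relative) would be expected, which is much larger than rounding noise in double precision.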
  4 Comments
Taylor on 19 Apr 2024
Interesting. I'm honestly not entirely sure what is happening here. The NaNs being highlighted makes sense because NaN never compares equal to anything, including another NaN.
nan == nan
ans = logical
0
nan ~= nan
ans = logical
1
This is why isequaln exists. I'm not sure why the numeric values are being identified as not equal/modified. You could do more digging by inspecting the values at a specific index of the two matrices. You could also look at passing the data directly from Python to MATLAB using the MATLAB Engine in Python.
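For the Python side of the workflow, a NaN-aware comparison can be sketched as follows. `equal_nan_aware` is a hypothetical helper written for illustration, in the spirit of `isequaln`. It works on flat lists of floats rather than on matrices, so it is a sketch of the idea and not a drop-in replacement.

```python
import math


def equal_nan_aware(a, b):
    """Return True if two flat lists of floats match element by element,
    treating NaN as equal to NaN (hypothetical helper)."""
    if len(a) != len(b):
        return False
    return all(
        (math.isnan(x) and math.isnan(y)) or x == y
        for x, y in zip(a, b)
    )


left = [1.5, float("nan"), 3.0]
right = [1.5, float("nan"), 3.0]

# Plain equality on NaN is always False, in Python as in MATLAB.
print(float("nan") == float("nan"))      # False

# The NaN-aware check treats matching NaN positions as equal.
print(equal_nan_aware(left, right))      # True
```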


Abigail Schiller on 1 May 2024
Follow-up question:
In the following case (X is a 2-D matrix):
>> sum(X, 'all', 'omitnan') == nansum(nansum(X))
ans =
logical
0
>> sum(X, 'all', 'omitnan') - nansum(nansum(X))
ans =
-1.1642e-10
To my understanding, the above code contains two methods (sum(X, 'all', 'omitnan') and nansum(nansum(X))) for summing all the non-NaN values in matrix X. I thought these would yield identical values, but the results seem to differ on the order of ten to the power of minus ten (the same thing happens with equivalent methods in Python). Something similar happens with sum(X./1e6) vs sum(X)/1e6. Is this difference expected, or am I doing something wrong?
Reason I'm asking: When I compare MATLAB vs Python outputs, as in my original question, the difference is several orders of magnitude smaller, around ten to the power of minus fourteen. If this difference between the two MATLAB methods is expected, then the difference between MATLAB and Python is negligible by comparison - I can chalk it up to numerical instability and disregard it.
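A likely explanation for differences like this is that floating-point addition is not associative, so two methods that add the same numbers in a different order can round differently. The Python sketch below illustrates the effect with standard-library code only. The input values are made up and deliberately exaggerated, `naive_sum` is a hypothetical helper, and nothing here claims to reproduce the order in which `sum` or `nansum` actually accumulate.

```python
import math


def naive_sum(xs):
    """Add the values left to right in double precision (hypothetical helper)."""
    total = 0.0
    for x in xs:
        total += x
    return total


# Same four numbers, two different orders.
values = [1e16, 1.0, -1e16, 1.0]
reordered = [1e16, -1e16, 1.0, 1.0]

print(naive_sum(values))       # 1.0, the first 1.0 is lost next to 1e16
print(naive_sum(reordered))    # 2.0
print(math.fsum(values))       # 2.0, fsum tracks the lost low-order bits

# A milder, more typical case: ten copies of 0.1.
tenths = [0.1] * 10
print(naive_sum(tenths) == 1.0)                           # False
print(math.isclose(naive_sum(tenths), math.fsum(tenths))) # True
```

When results from two pipelines are expected to agree only up to rounding, a tolerance-based check such as `math.isclose` is usually a better fit than exact equality.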

Release

R2023a
