Treating NaN as a single unique value (instead of as distinct values)
In the documentation for unique, there is the following example:
Unique Values in Array Containing NaNs
A = [5 5 NaN NaN];
C = unique(A)
C =
     5   NaN   NaN
unique treats NaN values as distinct.
Is it possible to treat NaN as a single unique value, so as to have
C = 5 NaN
1 Comment
Malcolm Lidierth on 2 Jul 2012
NaNs are treated as "bigger" than +Inf by Arrays.sort() in Java, and by MATLAB's unique and sort too by the look of it, so you can just trim the trailing NaNs from the output array.
If MATLAB's NaN does not return a constant NaN bit pattern (it probably does), java.lang.Double.NaN will do. But NaNs are NaNs, so each is treated as unique even if the bit pattern is the same.
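The trimming idea in this comment can be sketched as follows. This is only a minimal illustration, assuming unique returns its output sorted with every NaN at the end (as in the example from the question); the variable name firstNaN is just for illustration.

```matlab
A = [5 5 NaN NaN];
C = unique(A);                 % sorted output; the NaNs come last
firstNaN = find(isnan(C), 1);  % index of the first NaN, empty if there is none
if ~isempty(firstNaN)
  C = C(1:firstNaN);           % keep everything up to and including one NaN
end
disp(C)
```

If the input contains no NaN, find returns an empty result and C is left untouched.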
Accepted Answer
Kye Taylor on 2 Jul 2012
Edited: Kye Taylor on 2 Jul 2012
Write this function...
function y = myUnique(x)
  y = unique(x);
  if any(isnan(y))
    y(isnan(y)) = []; % remove all nans
    y(end+1) = NaN; % add the unique one.
  end
end
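As a quick check of the steps in this function, here they are applied inline to a column vector. This is a sketch rather than part of the original answer, and the input x is made up for illustration. It suggests the approach keeps the orientation of the input, because both the deletion and the y(end+1) assignment operate along the vector.

```matlab
x = [NaN; 3; NaN; 1];    % hypothetical column input
y = unique(x);           % [1; 3; NaN; NaN]
if any(isnan(y))
  y(isnan(y)) = [];      % remove all NaNs, leaving [1; 3]
  y(end+1) = NaN;        % append a single NaN, still a column
end
disp(y)
```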
More Answers (3)
Walter Roberson on 2 Jul 2012
Stealing ideas and optimizing...
function y = myUnique(x)
  y = unique(x);
  y(isnan(y(1:end-1))) = [];
end
5 Comments
Walter Roberson on 2 Jul 2012
				Note to people trying to understand my code: it makes use of an obscure trick. When you index with a logical vector, the vector you are indexing with can be shorter than the vector being indexed. The "missing" logical values are treated as false. By only applying isnan() to the elements up to one before the end of the vector, I prevent the last element of the vector from being tested for NaN, so I am preventing that last element from being deleted. This has the effect of preserving one NaN from being deleted.
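The short-mask behavior described here can be seen in isolation with a small sketch. The vectors v and y below are made up for illustration; y stands in for a sorted result with the NaNs at the end, as unique would return it.

```matlab
v = [10 20 30 40];
mask = [true false];         % only two entries for a four-element vector
v(mask) = [];                % positions 3 and 4 have no mask entry, so they are kept
% v is now [20 30 40]

y = [1 2 NaN NaN NaN];       % sorted, NaNs last
y(isnan(y(1:end-1))) = [];   % the mask covers y(1:4) only, so y(5) survives
% y is now [1 2 NaN]
disp(v), disp(y)
```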
James Tursa on 2 Jul 2012
Edited: James Tursa on 3 Jul 2012
      Assuming the input A is double class and all the NaN values have the same underlying bit pattern (which seems to be true of the MATLAB functions):
C = typecast(unique(typecast(A,'uint64')),'double');
If you are working with single class variables then:
C = typecast(unique(typecast(A,'uint32')),'single');
The above code involves two extra data copies. If you don't want to absorb the time and memory penalty of these copies, you can use my TYPECASTX function from the File Exchange (FEX), which returns a shared data copy of the input:
C = typecastx(unique(typecastx(A,'uint64')),'double');
If you are working with single class variables then:
C = typecastx(unique(typecastx(A,'uint32')),'single');
TYPECASTX can be found here:
---------------------------------
WARNING -- WARNING:
---------------------------------
The above code should not be used because the UNIQUE function does not work with uint64 and int64 class inputs. I am leaving my post here for reference, but do not use the above code. See the discussion in the comments below.
4 Comments
James Tursa on 3 Jul 2012
FOLLOW-UP:
After stepping through the UNIQUE code, it turns out the result differences are not caused by changes in the UNIQUE code itself, but by changes in the underlying double/uint64 conversions behind the scenes. That is, both the R2010a (and prior) versions and the R2010b (and later) versions of UNIQUE convert to and from double inside the code without regard to input class (which seems to be a bug to me in both cases). But the conversion code itself has apparently changed. I get different answers with the following:
format hex
-inf
typecast(ans,'uint64')
double(ans)
uint64(ans)
On R2010a and earlier, a nearby number (likely the result of some rounding scheme in the background) is produced, not the original number. On R2010b and later, the original number is reproduced. It may be that if I had used a different number to start with, the R2010a and earlier versions would reproduce it and the R2010b and later versions would not. I don't know. I haven't had the time to fully test this out yet, and I don't know the extent of the uint64 and int64 arithmetic/conversion changes that were made.
This raises the question, however, of how many other MATLAB functions have conversions to/from double in the background that would render them buggy when used with uint64 or int64 class data.
The above was done on a 32-bit WinXP machine.
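To illustrate why a silent conversion to double and back is a problem for bit patterns stored as uint64, here is a small sketch. It does not reproduce the behavior of UNIQUE in any particular release; it only shows that a double has a 53-bit significand and so cannot hold every 64-bit integer exactly. The choice of 0.1 as the test value is arbitrary.

```matlab
b = typecast(0.1, 'uint64');    % raw 64-bit pattern of the double 0.1
r = uint64(double(b));          % convert to double and back again
changed = (r ~= b);             % true: the low-order bits were rounded away
back = typecast(r, 'double');   % reinterpret the altered bits as a double
fprintf('bits changed: %d, still 0.1: %d\n', changed, back == 0.1);
```

Any function that performs such a round trip internally would hand back a different bit pattern, and therefore a different double after the final typecast.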
James Tursa on 3 Jul 2012
Edited: James Tursa on 3 Jul 2012
FOLLOW-UP #2:
As I suspected, it wasn't hard to come up with uint64 numbers that did not work for R2010b and later. Bottom line is UNIQUE is buggy for uint64 and int64 class inputs in all versions of MATLAB as far as I can tell because of the underlying silent conversion to / from double.
Sean de Wolski on 2 Jul 2012
Replace the NaNs with an obscure number, after first checking that it is not already present in the data. This will give you the full functionality of unique.
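function [u, ia, ic] = nanunique(varargin)
  % Like unique, but collapses all NaNs into a single NaN.
  x = varargin{1};
  % Pick a random sentinel value that does not already occur in x.
  t = rand;
  while any(x(:)==t)
      t = rand;
  end
  x(isnan(x)) = t;                          % stand in for every NaN
  [u, ia, ic] = unique(x,varargin{2:end});  % forward any extra options
  u(u==t)=nan;                              % restore the single NaN
end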
Then call it with something like:
nanunique([5 5 2 7 nan nan 5])
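One thing worth noting about the sentinel approach: the position of the NaN in the output depends on where the sentinel sorts. The sketch below inlines the same steps for the example call (it is an illustration, not a replacement for the function). Since rand returns a value strictly between 0 and 1, and the other values here are all greater than 1, the NaN ends up first rather than last.

```matlab
x = [5 5 2 7 NaN NaN 5];
t = rand;                   % sentinel in the open interval (0,1)
while any(x(:) == t)        % make sure it is not already in the data
  t = rand;
end
x(isnan(x)) = t;            % x is now [5 5 2 7 t t 5]
[u, ia, ic] = unique(x);    % sorted: [t 2 5 7]
u(u == t) = NaN;            % u is now [NaN 2 5 7]
disp(u)
```

If the NaN should come last, as in the output of unique itself, one option would be to move it to the end afterwards.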
2 Comments
Sean de Wolski on 2 Jul 2012
NOTE: this gives full functionality for double precision numbers. I am not checking for cellstrs or other weird things, but this could be done easily.





