## Cosine distance range interpretation

### Louis

on 13 Dec 2013

I am trying to use the cosine distance in pdist2, and I am confused about its output. As far as I know, it should be between 0 and 1. Since MATLAB uses 1-(cosine), 1 would be the highest variability and 0 the lowest. However, the output seems to range from 0.5 to 1.5 or something like that!
Can somebody please advise me on how to interpret its output, and why it falls in that range?

### dpb

on 13 Dec 2013
Looking at the m-file, it doesn't appear to do what it says, precisely...

    ...
    case 'cos' % Cosine
        [X,Y,flag] = normalizeXY(X,Y);
    ...
    case 'cor' % Correlation
        X = bsxfun(@minus,X,mean(X,2));
        Y = bsxfun(@minus,Y,mean(Y,2));
        [X,Y,flag] = normalizeXY(X,Y);
    ...
    case 'spe'
        X = tiedrank(X')'; % treat rows as a series
        Y = tiedrank(Y')';
        X = X - (p+1)/2; % subtract off the (constant) mean
        Y = Y - (p+1)/2;
        [X,Y,flag] = normalizeXY(X,Y);
    ...
    case {'cos' 'cor' 'spe'} % Cosine, Correlation, Rank Correlation
        % This assumes that data have been appropriately preprocessed
        for i = 1:ny
            d = zeros(nx,1,outClass);
            for q = 1:p
                d = d + (X(:,q).*Y(i,q));
            end
    ...

There's some other normalization and ordering, but no cos() in sight. The only difference among the three cases here seems to be how the input values are preconditioned before the distance computation.
I don't have time at the moment to read this more thoroughly; perhaps the above will give you some clues...
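
A possible reading of the excerpt, offered as a sketch rather than a statement of what pdist2 does internally: the inner loop accumulates the dot product of each row of X with row i of Y, and for rows scaled to unit length that dot product is already the cosine of the angle between them, so no explicit cos() call is needed. The example below assumes that normalizeXY scales each row to unit Euclidean length and that the elided code then takes one minus the accumulated sum; neither step is visible in the excerpt. The data are made up for illustration.

```matlab
% Hypothetical data, not from the original question.
X = [ 1 0;    % same direction as Y
      1 1;    % 45 degrees from Y
      0 1;    % orthogonal to Y
     -1 0];   % opposite direction to Y
Y = [1 0];

% Assumption: this mirrors what normalizeXY does (unit-length rows).
Xn = bsxfun(@rdivide, X, sqrt(sum(X.^2, 2)));
Yn = bsxfun(@rdivide, Y, sqrt(sum(Y.^2, 2)));

% Dot product of unit vectors = cosine of the angle between them.
cosTheta = Xn * Yn';

% Assumption: the distance is one minus the cosine, as documented.
d = 1 - cosTheta;
disp(d)   % approximately 0, 0.2929, 1, 2
```

Under these assumptions the distance lies between 0 and 2, not between 0 and 1, because the cosine itself ranges from -1 to 1. Values above 1 correspond to angles greater than 90 degrees, which can only occur when the data contain negative components, so an observed range of roughly 0.5 to 1.5 would be consistent with this reading.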

### Roger Stafford

on 14 Dec 2013
Edited by Roger Stafford

on 14 Dec 2013