Calculate similarity between data columns in the data matrix

Hello
I have an Excel file containing recording data. The first four columns are log data:
  • Column 1 logs the stimulus IDs over time. Values such as 1, 2, 3, 4, etc., represent stimulus IDs, while 0 indicates no stimulus was presented.
  • Column 2 logs the intervals of stimulus presentation:
      • 1 = pre-stimulus interval,
      • 2 = stimulus interval,
      • 3 = post-stimulus interval.
I need to calculate the correlation of each column from column 6 to the last column with column 5, using only the rows from the stimulus interval (i.e., where the value in column 2 equals 2) and only for specific stimuli, such as Stimulus 1 and Stimulus 3 (column 1). I would like to save the correlation coefficient for each column. Could you help me implement this in MATLAB?
I have attached an example data file below.
Many thanks in advance!

 Accepted Answer

First, a vocabulary note: "similarity" and "correlation" don't mean quite the same thing. You can use the corrcoef function to get a correlation matrix, which contains the bivariate Pearson correlation coefficients for every pair of columns in the input matrix. If I'm understanding you correctly, what you want is something like this (I'm doing it for Stimulus 3, but you can adapt the code for any stimulus by replacing 3 with the desired stimulus number):
% import data from Excel file into matrix
% (column 1 is stimulus ID number, column 2 is interval ID number,
% additional columns are additional variables)
data = readmatrix('Test3.xlsx') ;
% reduced matrix, which only contains rows from data where the column 1 value is 3
% and the column 2 value is 2, i.e., only contains data for Stimulus 3
% during Interval 2 (Interval 2 is the interval during the stimulus itself)
dataDuringStimulus3 = data( data(:, 1) == 3 & data(:, 2) == 2, : ) ;
% correlation matrix for columns 5:end in that reduced matrix
% (e.g., row 1 or column 1 of this correlation matrix gives the correlations of
% variable 5 with variables 5:end; row 2 or column 2 of this correlation
% matrix gives the correlations of variable 6 with variables 5:end; etc., where
% "variable" number refers to the column number in the original data matrix)
correlationMatrixDuringStimulus3 = corrcoef( dataDuringStimulus3(:, 5:end) ) ;
% vector giving correlation of variable 5 with each subsequent variable
% (i.e., with variables 6, 7, 8, etc.) during Stimulus 3; we just extract the
% first row from the correlation matrix and omit the first value (i.e., we omit
% the correlation of variable 5 with itself, which is of course 1)
correlationOfVar5WithEachSubsequentVarDuringStimulus3 = ...
correlationMatrixDuringStimulus3(1, 2:end) ;
Here's the same thing, but for Stimulus 1 and 3 pooled together:
% import data from Excel file into matrix
% (column 1 is stimulus ID number, column 2 is interval ID number,
% additional columns are additional variables)
data = readmatrix('Test3.xlsx') ;
% reduced matrix, which only contains rows from data where the column 1 value is 1
% or 3 and the column 2 value is 2, i.e., only contains data for Stimulus 1 or 3
% during Interval 2 (Interval 2 is the interval during the stimulus itself)
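% (the parentheses around the stimulus test are needed because & takes
% precedence over | in MATLAB)
dataDuringStimulus1or3 = data( (data(:, 1) == 1 | data(:, 1) == 3) & ...
data(:, 2) == 2, : ) ;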
% correlation matrix for columns 5:end in that reduced matrix
% (e.g., row 1 or column 1 of this correlation matrix gives the correlations of
% variable 5 with variables 5:end; row 2 or column 2 of this correlation
% matrix gives the correlations of variable 6 with variables 5:end; etc., where
% "variable" number refers to the column number in the original data matrix)
correlationMatrixDuringStimulus1or3 = corrcoef( dataDuringStimulus1or3(:, 5:end) ) ;
% vector giving correlation of variable 5 with each subsequent variable
% (i.e., with variables 6, 7, 8, etc.) during Stimulus 1 or 3; we just extract the
% first row from the correlation matrix and omit the first value (i.e., we omit
% the correlation of variable 5 with itself, which is of course 1)
correlationOfVar5WithEachSubsequentVarDuringStimulus1or3 = ...
correlationMatrixDuringStimulus1or3(1, 2:end) ;
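If the correlations are wanted for each stimulus separately rather than pooled, one option is to loop over the stimulus IDs and collect one row of coefficients per stimulus. The following is only a sketch: it uses a small synthetic matrix in place of the Excel file (with the column layout assumed to match the question), and the names stimuliToAnalyze and correlationsPerStimulus and the output file name correlationsPerStimulus.csv are made up for illustration.

```matlab
% synthetic stand-in for the Excel data (assumption: columns 1-4 are log
% columns and columns 5:end are recordings, as described in the question)
stimulusID = repelem([1; 2; 3], 30) ;                 % column 1: stimulus ID
intervalID = repmat(repelem([1; 2; 3], 10), 3, 1) ;   % column 2: interval ID
rng(0) ;                                  % reproducible random numbers
signals = randn(90, 4) ;                  % columns 5:8 (made-up recordings)
signals(:, 2) = 2 * signals(:, 1) + 1 ;   % perfectly correlated with column 5
signals(:, 3) = -signals(:, 1) ;          % perfectly anti-correlated
data = [stimulusID, intervalID, zeros(90, 2), signals] ;

% stimuli to analyze, each one separately (hypothetical choice)
stimuliToAnalyze = [1 3] ;

% one row per stimulus, one column per variable 6:end
nVars = size(data, 2) - 5 ;
correlationsPerStimulus = nan(numel(stimuliToAnalyze), nVars) ;
for k = 1:numel(stimuliToAnalyze)
    % rows for this stimulus during the stimulus interval (interval 2)
    rows = data(:, 1) == stimuliToAnalyze(k) & data(:, 2) == 2 ;
    % correlation matrix of variables 5:end, then keep variable 5 vs. the rest
    R = corrcoef( data(rows, 5:end) ) ;
    correlationsPerStimulus(k, :) = R(1, 2:end) ;
end

% save the coefficients (hypothetical file name)
writematrix(correlationsPerStimulus, 'correlationsPerStimulus.csv') ;
```

Row k of correlationsPerStimulus corresponds to stimuliToAnalyze(k), and column j corresponds to variable 5 + j of the original matrix.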

More Answers (1)

Hi @EK
From my understanding, you want to calculate the correlation between a specific column ("column 5") and other columns in your dataset during specific conditions: when a stimulus is presented (interval value 2) and for certain stimulus IDs (1 and 3).
1. Select columns for stimulus IDs, intervals, and the column of interest (column 5).
% assumes data was imported as a table, e.g., data = readtable('Test3.xlsx');
dataArray = table2array(data);
stimulusID = dataArray(:, 1);
interval = dataArray(:, 2);
column5 = dataArray(:, 5);
2. Identify rows where the stimulus interval is 2 and the stimulus ID is either 1 or 3.
filterIdx = (interval == 2) & (stimulusID == 1 | stimulusID == 3);
3. Loop through each column from column 6 to the last column and calculate the correlation with column 5 for the filtered data.
numColumns = size(dataArray, 2);
correlationCoefficients = zeros(1, numColumns - 5);
for col = 6:numColumns
    columnData = dataArray(filterIdx, col);
    column5Data = column5(filterIdx);
    correlationCoefficients(col - 5) = corr(column5Data, columnData);
end
Hope this helps!
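As a side note on Step 2, the mask can be written for any list of stimulus IDs with ismember, which also avoids a precedence pitfall: in MATLAB, & binds more tightly than |, so the stimulus test must be grouped. A minimal sketch with made-up values (the variable names stimuliToAnalyze and withoutParens are only for illustration):

```matlab
% made-up log columns for seven rows
stimulusID = [0 1 1 2 3 3 4].' ;
interval   = [1 2 3 2 2 1 2].' ;

% stimuli to keep (hypothetical choice)
stimuliToAnalyze = [1 3] ;

% rows in the stimulus interval (2) whose stimulus ID is in the list
filterIdx = (interval == 2) & ismember(stimulusID, stimuliToAnalyze) ;

% same idea without grouping the stimulus test: this is evaluated as
% (interval == 2 & stimulusID == 1) | (stimulusID == 3), so it also keeps
% Stimulus 3 rows from other intervals
withoutParens = interval == 2 & stimulusID == 1 | stimulusID == 3 ;

selectedRows   = find(filterIdx).' ;      % rows 2 and 5
unintendedRows = find(withoutParens).' ;  % rows 2, 5, and 6
```

Row 6 (Stimulus 3 during interval 1) only shows up in the ungrouped version, which is why the parentheses matter.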

3 Comments

I'm guessing the OP wanted to compute the correlations for each stimulus separately, rather than pooling the Stimulus 1 and Stimulus 3 data together, though it wasn't entirely clear.
But either way, calling the corrcoef function once to obtain the full correlation matrix would be more efficient than calling the corr function repeatedly in a for-loop to compute the correlations one at a time. Also, corr requires the Statistics and Machine Learning Toolbox. So I'd suggest replacing your entire Step 3 with something like this, which produces the same result:
% correlation matrix for variable 5:end in filtered data matrix
correlationMatrix = corrcoef( dataArray(filterIdx, 5:end) ) ;
% extract row 1 from that correlation matrix (and omit first value, which is
% just the correlation of variable 5 with itself) so we have a vector of
% correlations between variable 5 and each subsequent respective variable
correlationCoefficients = correlationMatrix(1, 2:end) ;
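To check that the single-call version matches a column-by-column loop, here is a small sketch on random data. It is not the original loop: it uses pairwise corrcoef(x, y) calls in place of corr so that no toolbox is needed, and the matrix filtered is a made-up stand-in for dataArray(filterIdx, 5:end) with no missing values.

```matlab
rng(1) ;                        % reproducible random numbers
filtered = randn(50, 4) ;       % stand-in for dataArray(filterIdx, 5:end)

% one call: full correlation matrix, then first row without the diagonal
R = corrcoef(filtered) ;
viaMatrix = R(1, 2:end) ;

% loop: one pairwise correlation at a time
viaLoop = zeros(1, size(filtered, 2) - 1) ;
for col = 2:size(filtered, 2)
    Rpair = corrcoef(filtered(:, 1), filtered(:, col)) ;  % 2-by-2 matrix
    viaLoop(col - 1) = Rpair(1, 2) ;                      % off-diagonal entry
end

% largest absolute difference between the two approaches
maxDifference = max(abs(viaMatrix - viaLoop)) ;
```

On data without NaN values, maxDifference should be at the level of floating-point rounding.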
Sorry, I could not run your code. Pooling the Stimulus 1 and Stimulus 3 data together is correct.
I edited my accepted answer so it also includes a version for Stimulus 1 and 3 pooled together.



Release: R2022a

Asked: EK on 4 Dec 2024

Commented: on 7 Dec 2024
