Using histogram to find the frequency of combinations within the data set

Question

0 votes

I am new to Matlab and just started getting into generating histograms. I understand how to create a histogram that determines the frequency of values..

But how can I create a histogram that is looking for specific combinations of data?

For example I have the following data set:

someData = [1,3,4,4,2,1,1,0]

How would I be able to create a histogram that determines the frequency of one value in the data set is exactly equal to the next value within the same dataset??

In the above example the answer should be 2. (4 repeat and are adjacent to each other. 1 repeats and is adjacent to each other)

Help is MOST appreciated!

1 Comment
Show -1 older comments Hide -1 older comments

Clifford Shelton on 1 Aug 2012

Edited: Clifford Shelton on 1 Aug 2012

Open in MATLAB Online

I'm so lost...I'm thinking it may look something like this:

 for i=1:8133
      hist ((someData (i)==someData(i+1)/Total)

But I really have no clue AND I'm a beginner. Sorry if this questions seems elementary

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Image Analyst on 1 Aug 2012

0 votes

You can do this with the gray level co-occurrence matrix (GLCM). It's done by the function graycomatrix() in the Image Processing Toolbox. It's like a 2D histogram that counts how many times each gray level (numerical value) occurs next to every other gray level (value).

12 Comments
Show 10 older comments Hide 10 older comments

Image Analyst on 1 Aug 2012

Edited: Image Analyst on 1 Aug 2012

Open in MATLAB Online

You can definitely do that but with different functions, and you might have to check for each particular pairing one at a time. You can get the number of connected groups of a particular number with bwlabel like this:

[labeledArray numberOfRegions] = bwlabel(yourArray == theNumber);

numberOfRegions would be the number of times that 5's are connected to each other, and labeledArray says where in the array they occur. If you then want to find their area, like how many of groups of three 5's are there, how many groups of six 5's are there, etc. then you need to get the area of the regions like this:

measurements = regionprops(labeledArray, 'Area');
allAreas = [measurements.Area];

Or you can do it all in one line (without using bwlabel):

measurements = regionprops(yourArray == theNumber, 'Area');
allAreas = [measurements.Area];

That will give you the area (number of elements) of each connected group of 5's. This is called "connected components labeling" and it very common and straightforward. If you wanted to do something like how many combinations of [2 3 1 3] exist, then you'd first have to use a filter to find those locations, and give a logical matrix of [true true true true] when that pattern occurs and false where it doesn't occur, then you'd call regionprops on the logical matrix.

Clifford Shelton on 1 Aug 2012

Edited: Clifford Shelton on 1 Aug 2012

ok..this is very useful! I'm getting a step closer. however, I'm still a little confused about what the labeledArray is. Could you give some quick example code so I can get this through my thick skull:

Let's say:

my data set values are [1,4,5,0,1,2,2,5,7,10)

How would the syntax of the code look to know the frequency for:

1.) the number of times a value is exactly equal to its adjacent value (the answer should be 1)

2.) the number of times the adjacent value is exactly one unit more in value (the answer should be 3)

3.) the number of times here three adjacent values together have an average value of exactly 2 (the answer should be 1)

I'm a little hazy on what the "area" exactly is. I need to see an example code to sink this stuff in.

could you provide a quick example of how to create a filter to check the true,false values of a combination I am trying to find in the data set. I understand the concept theoretically, but am a little confused how to code it. Thanks for all your help!!!!

Image Analyst on 1 Aug 2012

Open in MATLAB Online

Wow, you keep making it more and more specialized and esoteric. Each of those would require different operations because each request is so different, but they can all be done. See my BlobsDemo example for labelling in my File Exchange.

For your example [1,4,5,0,1,2,2,5,7,10] would have two non-zero regions: [1 4 5] and [1 2 2 5 7 10]. If you wanted to find the 2's you'd do bwlabel(a==2) and you'd get 1 region with area 2 because there are two pixels with the value 2. Trying it with all the other values (which you can get with unique()) will get you one region for 7 and 10 with area of 1 each, and two regions for 1, 4, and 5 - each of those regions would have an area of 1 because those values are by themselves, not connected to any other elements of those values. Just type these into the command line and see what pops out:

bwlabel(yourMatrix == 1)
bwlabel(yourMatrix == 2)
bwlabel(yourMatrix == 4)
bwlabel(yourMatrix == 5)
bwlabel(yourMatrix == 7)
bwlabel(yourMatrix == 10)

Again, if you use unique() you could get all the unique values and you could put that in a loop with only one call to bwlabel inside the loop.

For your (2) I suggest you use diff() first, then regionprops(), or paly around with graycomatrix().

For (3) you'd need to run that through conv() or imfilter() first to get the average in a sliding window, then use regionprops().

Image Analyst on 3 Aug 2012

Open in MATLAB Online

For an average that moves over by 1 every time, use conv().

dataSet = [1,4,6,2,2,7,8,4,4]
slidingAverage = conv(dataSet, [1 1 1]/3, 'valid')

If you want to move in "jumps" of 3, then you'll have to use blockproc(). Here's a 2D demo for blockproc that you can probably easily adapt to 1D signals if you need to:

function blockproc_demo()
try
  clc; % Clear the command window.
  close all; % Close all figures (except those of imtool.)
  workspace; % Make sure the workspace panel is showing.
  fontSize = 20;
  % Change the current folder to the folder of this m-file.
  if(~isdeployed)
    cd(fileparts(which(mfilename)));
  end
  % Read in standard MATLAB demo image.
  grayImage = imread('cameraman.tif');
  [rows columns numberOfColorChannels] = size(grayImage);
  % Display the original image.
  subplot(2, 2, 1);
  imshow(grayImage, []);
  caption = sprintf('Original Image\n%d by %d pixels', ...
  rows, columns);
  title(caption, 'FontSize', fontSize);
  % Enlarge figure to full screen.
  set(gcf, 'Position', get(0,'Screensize'));
  set(gcf, 'name','Image Analysis Demo', 'numbertitle','off')
  % Block process the image.
  windowSize = 3;
  % Each 3x3 block will get replaced by one value.
  % Output image will be smaller by a factor of windowSize.
  myFilterHandle = @myFilter;
  blockyImage = blockproc(grayImage,[windowSize windowSize], myFilterHandle);
  [rowsP columnsP numberOfColorChannelsP] = size(blockyImage);
  % Display the processed image.
  % It is smaller, but the display routine imshow() replicates
  % the image so that it looks bigger than it really is.
  subplot(2, 2, 2);
  imshow(blockyImage, []);
  caption = sprintf('Image Processed in %d by %d Blocks\n%d by %d pixels\nCustom Box Filter', ...
  windowSize, windowSize, rowsP, columnsP);
  title(caption, 'FontSize', fontSize);
  % Now let's do it an alternate way where we use an anonymous function.
  % We'll take the standard deviation in the blocks.
  windowSize = 8;
  myFilterHandle2 = @(block_struct) ...
  std2(block_struct.data) * ones(size(block_struct.data));
  blockyImageSD = blockproc(grayImage, [windowSize windowSize], myFilterHandle2);
  [rowsSD columnsSD numberOfColorChannelsSD] = size(blockyImageSD);
  subplot(2, 2, 4);
  imshow(blockyImageSD, []);
  caption = sprintf('Image Processed in %d by %d Blocks\n%d by %d pixels\nAnonymous Standard Deviation Filter', ...
  windowSize, windowSize, rowsSD, columnsSD);
  title(caption, 'FontSize', fontSize);
  % Note: the image size of blockyImageSD is 256x256, NOT smaller.
  % That's because we're returning an 8x8 array instead of a scalar.
  uiwait(msgbox('Done with demo'));
  catch ME
    errorMessage = sprintf('Error in blockproc_demo():\n\nError Message:\n%s', ME.message);
    uiwait(warndlg(errorMessage));
  end
  return;
% Takes one 3x3 block of image data and multiplies it
% element by element by the kernel and
% returns a single value.
function singleValue = myFilter(blockStruct)
  try
    % Assign default value.
    % Will be used near sides of image (due to boundary effects),
    % or in the case of errors, etc.
    singleValue = 0;
    % Create a 2D filter.
    kernel = [0 0.2 0; 0.2 0.2 0.2; 0 0.2 0];
    % kernel = ones(blockStruct.blockSize); % Box filter.
    % Make sure filter size matches image block size.
    if any(blockStruct.blockSize ~= size(kernel))
      % If any of the dimensions don't match.
      % You'll get here near the edges,
      % if the image is not a multiple of the block size.
      % warndlg('block size does not match kernel size');
      return;
    end
    % Size matches if we get here, so we're okay.
    % Extract our block out of the structure.
    array3x3 = blockStruct.data;
    % Do the filtering. Multiply by kernel and sum.
    singleValue = sum(sum(double(array3x3) .* kernel));
  catch ME
    % Some kind of problem...
    errorMessage = sprintf('Error in myFilter():\n\nError Message:\n%s', ME.message);
    % uiwait(warndlg(errorMessage));
    fprintf(1, '%s\n', errorMessage);
  end
  return;

Clifford Shelton on 3 Aug 2012

Open in MATLAB Online

Thanks for all the example code. I'm learning A LOT through this exchange!! Alas, I'm stuck on applying your example to my data set. I understand what the blockproc does..but I am hazy on what this @myfilter business is.

Here is the code I've attempted to make:

% Block process the data.
  dataChunk = 10;
  % Each 10x1 block will get replaced by one value. 
  % Output data will be smaller by a factor of dataChunk.
  myFilterHandle = @myFilter; % have NO clue here
  groupsOfTen = blockproc(myData,[dataChunk 1], myFilterHandle);

But this causes an error about the 'fun' in the statement. Which I think is related to the @myFilter part that I have NO clue about what it means.

I'm guessing that after I'm able to get this step in the process to work, I then do conv() to get the averages of each value in the output for groupsOfTen?

Help is appreciated!

Sign in to comment.

Using histogram to find the frequency of combinations within the data set

1 Comment
Show -1 older comments Hide -1 older comments

Answers (1)

12 Comments
Show 10 older comments Hide 10 older comments

Categories

Tags

Community Treasure Hunt

Using histogram to find the frequency of combinations within the data set

1 Comment Show -1 older comments Hide -1 older comments

Answers (1)

12 Comments Show 10 older comments Hide 10 older comments

Categories

Tags

See Also

Community Treasure Hunt

1 Comment
Show -1 older comments Hide -1 older comments

12 Comments
Show 10 older comments Hide 10 older comments