Using histogram to find the frequency of combinations within the data set

I am new to Matlab and just started getting into generating histograms. I understand how to create a histogram that determines the frequency of values..
But how can I create a histogram that is looking for specific combinations of data?
For example I have the following data set:
someData = [1,3,4,4,2,1,1,0]
How would I be able to create a histogram that determines the frequency of one value in the data set is exactly equal to the next value within the same dataset??
In the above example the answer should be 2. (4 repeat and are adjacent to each other. 1 repeats and is adjacent to each other)
Help is MOST appreciated!

1 Comment

I'm so lost...I'm thinking it may look something like this:
for i=1:8133
hist ((someData (i)==someData(i+1)/Total)
But I really have no clue AND I'm a beginner. Sorry if this questions seems elementary

Sign in to comment.

Answers (1)

You can do this with the gray level co-occurrence matrix (GLCM). It's done by the function graycomatrix() in the Image Processing Toolbox. It's like a 2D histogram that counts how many times each gray level (numerical value) occurs next to every other gray level (value).

12 Comments

ok...thanks for that tip! I will look into it. What about if I wanted to see the frequency for other types of combinations in a data set.
For example, the frequency that the next value is exactly 3 units more than it's adjacent value.
Or, the frequency that the average of 3 consecutive values are 5.
I'm reading the GCLM documentation ..and I'm not sure if it can do the above examples.
Thanks for you help!
You can definitely do that but with different functions, and you might have to check for each particular pairing one at a time. You can get the number of connected groups of a particular number with bwlabel like this:
[labeledArray numberOfRegions] = bwlabel(yourArray == theNumber);
numberOfRegions would be the number of times that 5's are connected to each other, and labeledArray says where in the array they occur. If you then want to find their area, like how many of groups of three 5's are there, how many groups of six 5's are there, etc. then you need to get the area of the regions like this:
measurements = regionprops(labeledArray, 'Area');
allAreas = [measurements.Area];
Or you can do it all in one line (without using bwlabel):
measurements = regionprops(yourArray == theNumber, 'Area');
allAreas = [measurements.Area];
That will give you the area (number of elements) of each connected group of 5's. This is called "connected components labeling" and it very common and straightforward. If you wanted to do something like how many combinations of [2 3 1 3] exist, then you'd first have to use a filter to find those locations, and give a logical matrix of [true true true true] when that pattern occurs and false where it doesn't occur, then you'd call regionprops on the logical matrix.
ok..this is very useful! I'm getting a step closer. however, I'm still a little confused about what the labeledArray is. Could you give some quick example code so I can get this through my thick skull:
Let's say:
my data set values are [1,4,5,0,1,2,2,5,7,10)
How would the syntax of the code look to know the frequency for:
1.) the number of times a value is exactly equal to its adjacent value (the answer should be 1)
2.) the number of times the adjacent value is exactly one unit more in value (the answer should be 3)
3.) the number of times here three adjacent values together have an average value of exactly 2 (the answer should be 1)
I'm a little hazy on what the "area" exactly is. I need to see an example code to sink this stuff in.
could you provide a quick example of how to create a filter to check the true,false values of a combination I am trying to find in the data set. I understand the concept theoretically, but am a little confused how to code it. Thanks for all your help!!!!
Wow, you keep making it more and more specialized and esoteric. Each of those would require different operations because each request is so different, but they can all be done. See my BlobsDemo example for labelling in my File Exchange.
For your example [1,4,5,0,1,2,2,5,7,10] would have two non-zero regions: [1 4 5] and [1 2 2 5 7 10]. If you wanted to find the 2's you'd do bwlabel(a==2) and you'd get 1 region with area 2 because there are two pixels with the value 2. Trying it with all the other values (which you can get with unique()) will get you one region for 7 and 10 with area of 1 each, and two regions for 1, 4, and 5 - each of those regions would have an area of 1 because those values are by themselves, not connected to any other elements of those values. Just type these into the command line and see what pops out:
bwlabel(yourMatrix == 1)
bwlabel(yourMatrix == 2)
bwlabel(yourMatrix == 4)
bwlabel(yourMatrix == 5)
bwlabel(yourMatrix == 7)
bwlabel(yourMatrix == 10)
Again, if you use unique() you could get all the unique values and you could put that in a loop with only one call to bwlabel inside the loop.
For your (2) I suggest you use diff() first, then regionprops(), or paly around with graycomatrix().
For (3) you'd need to run that through conv() or imfilter() first to get the average in a sliding window, then use regionprops().
Ok..I'll play around with this as best as I can. Thanks for the help!!
ok...I'm stuck. The data set I'm playing with has 8134 values so it's kinda long. I'm not sure how to actually make this bwlabel convenient and useful with this large data set.
So. For example I enter the code: bwlabel(myData == 1)
and I get back the matrix of over 8000 values. So when I get this result: (0,0,0,0,0,1,0) I understand that there is the value of 1 in the 6th position of the data set.
However...with a data set of over 8,000 values manually counting all the times I visually see two 1s next to each other is too much.
So how can I quickly calculate the number of times that 1 and 1 are next to each other?
7 lines of code should do it. Try something like this (untested)
labeledImage = myData == 1;
blobMeasurements= regionprops(labeledImage, 'Area');
% Now I'll demonstrate how to select certain blobs based using the ismember function.
% Let's say that we wanted to find only those blobs
% with an area of 2.
allBlobAreas = [blobMeasurements.Area];
% Get a list of the blobs that meet our criteria and we need to keep.
allowableAreaIndexes = allBlobAreas == 2; % Take only area=2 objects.
keeperIndexes = find(allowableAreaIndexes);
% Extract only those blobs that meet our criteria, and
% eliminate those blobs that don't meet our criteria.
% Note how we use ismember() to do this.
keeperBlobsImage = ismember(labeledImage, keeperIndexes);
% Now we have an array with only the area=2 blobs.
% Count them with the bwlabel function
[labeled2 numberOfRegions] = bwlabel(keeperBlobsImage );
% numberOfRegions will be the number of times
% that there are 2 1's in a row.
ok!! I think I'm starting to get it! The example code really helps!! I've played around with it....and for some reason towards the end of your example the values zero out. But it's ok..becuase I know how to use it now for what I need.
However, I'm still at a loss about how to calculate the average values of (for example) 10 adjacent steps.
You are my hero!
If you give a longer dataset with multiple regions, rather than just one, and say which number of adjacent pairs of numbers you want then I might be able to figure out what's going wrong. You might want to look at BlobsDemo in my File Exchange: http://www.mathworks.com/matlabcentral/fileexchange/?term=authorid%3A31862. That's the basis for everything I've done in this thread. It has a 2D example with coins on a uniform background - essentially the same as your situation except your situation is 1D. Though, and I'm not sure many people know this, all the image processing functions work on 1D arrays. A 1D array (single row image) is just as valid an image as one with many rows.
Yeah..I was studying your BlobsDemo yesterday. I wouldn't have thought about using the image processing toolbox for what I'm trying to do at all! Being that I'm a beginner....I still lack the skill to look at your demo and get the eureka feeling of knowing how to apply it to my specific task.
let's say this is my data set
dataSet = (1,4,6,2,2,7,8,4,4)
and I want to calculate the average in values of every three adjacent steps. Yesterday you mentioned: "you'd need to run that through conv() or imfilter() first to get the average in a sliding window, then use regionprops()"
However, I have no clue as to how to apply your suggestion. Some example code to make it clearer would be most appreciated!
For an average that moves over by 1 every time, use conv().
dataSet = [1,4,6,2,2,7,8,4,4]
slidingAverage = conv(dataSet, [1 1 1]/3, 'valid')
If you want to move in "jumps" of 3, then you'll have to use blockproc(). Here's a 2D demo for blockproc that you can probably easily adapt to 1D signals if you need to:
function blockproc_demo()
try
clc; % Clear the command window.
close all; % Close all figures (except those of imtool.)
workspace; % Make sure the workspace panel is showing.
fontSize = 20;
% Change the current folder to the folder of this m-file.
if(~isdeployed)
cd(fileparts(which(mfilename)));
end
% Read in standard MATLAB demo image.
grayImage = imread('cameraman.tif');
[rows columns numberOfColorChannels] = size(grayImage);
% Display the original image.
subplot(2, 2, 1);
imshow(grayImage, []);
caption = sprintf('Original Image\n%d by %d pixels', ...
rows, columns);
title(caption, 'FontSize', fontSize);
% Enlarge figure to full screen.
set(gcf, 'Position', get(0,'Screensize'));
set(gcf, 'name','Image Analysis Demo', 'numbertitle','off')
% Block process the image.
windowSize = 3;
% Each 3x3 block will get replaced by one value.
% Output image will be smaller by a factor of windowSize.
myFilterHandle = @myFilter;
blockyImage = blockproc(grayImage,[windowSize windowSize], myFilterHandle);
[rowsP columnsP numberOfColorChannelsP] = size(blockyImage);
% Display the processed image.
% It is smaller, but the display routine imshow() replicates
% the image so that it looks bigger than it really is.
subplot(2, 2, 2);
imshow(blockyImage, []);
caption = sprintf('Image Processed in %d by %d Blocks\n%d by %d pixels\nCustom Box Filter', ...
windowSize, windowSize, rowsP, columnsP);
title(caption, 'FontSize', fontSize);
% Now let's do it an alternate way where we use an anonymous function.
% We'll take the standard deviation in the blocks.
windowSize = 8;
myFilterHandle2 = @(block_struct) ...
std2(block_struct.data) * ones(size(block_struct.data));
blockyImageSD = blockproc(grayImage, [windowSize windowSize], myFilterHandle2);
[rowsSD columnsSD numberOfColorChannelsSD] = size(blockyImageSD);
subplot(2, 2, 4);
imshow(blockyImageSD, []);
caption = sprintf('Image Processed in %d by %d Blocks\n%d by %d pixels\nAnonymous Standard Deviation Filter', ...
windowSize, windowSize, rowsSD, columnsSD);
title(caption, 'FontSize', fontSize);
% Note: the image size of blockyImageSD is 256x256, NOT smaller.
% That's because we're returning an 8x8 array instead of a scalar.
uiwait(msgbox('Done with demo'));
catch ME
errorMessage = sprintf('Error in blockproc_demo():\n\nError Message:\n%s', ME.message);
uiwait(warndlg(errorMessage));
end
return;
% Takes one 3x3 block of image data and multiplies it
% element by element by the kernel and
% returns a single value.
function singleValue = myFilter(blockStruct)
try
% Assign default value.
% Will be used near sides of image (due to boundary effects),
% or in the case of errors, etc.
singleValue = 0;
% Create a 2D filter.
kernel = [0 0.2 0; 0.2 0.2 0.2; 0 0.2 0];
% kernel = ones(blockStruct.blockSize); % Box filter.
% Make sure filter size matches image block size.
if any(blockStruct.blockSize ~= size(kernel))
% If any of the dimensions don't match.
% You'll get here near the edges,
% if the image is not a multiple of the block size.
% warndlg('block size does not match kernel size');
return;
end
% Size matches if we get here, so we're okay.
% Extract our block out of the structure.
array3x3 = blockStruct.data;
% Do the filtering. Multiply by kernel and sum.
singleValue = sum(sum(double(array3x3) .* kernel));
catch ME
% Some kind of problem...
errorMessage = sprintf('Error in myFilter():\n\nError Message:\n%s', ME.message);
% uiwait(warndlg(errorMessage));
fprintf(1, '%s\n', errorMessage);
end
return;
Thanks for all the example code. I'm learning A LOT through this exchange!! Alas, I'm stuck on applying your example to my data set. I understand what the blockproc does..but I am hazy on what this @myfilter business is.
Here is the code I've attempted to make:
% Block process the data.
dataChunk = 10;
% Each 10x1 block will get replaced by one value.
% Output data will be smaller by a factor of dataChunk.
myFilterHandle = @myFilter; % have NO clue here
groupsOfTen = blockproc(myData,[dataChunk 1], myFilterHandle);
But this causes an error about the 'fun' in the statement. Which I think is related to the @myFilter part that I have NO clue about what it means.
I'm guessing that after I'm able to get this step in the process to work, I then do conv() to get the averages of each value in the output for groupsOfTen?
Help is appreciated!

Sign in to comment.

Asked:

on 1 Aug 2012

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!