setdiff

(Not Recommended) Set difference for dataset array observations

The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.

Description

C = setdiff(A,B) for dataset arrays A and B returns the set of observations that are in A but not B, with repetitions removed. The observations in the dataset array C are sorted.

C = setdiff(A,B,vars) returns the set of observations that are in A but not B, considering only the variables specified in vars, with repetitions removed. The observations in the dataset array C are sorted by these variables. The values for variables not specified in vars for each observation in C are taken from the corresponding observation in A. If there are multiple observations in A that correspond to an observation in C, those values are taken from the first occurrence.
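The effect of vars can be sketched with a small example. The data here is hypothetical (the names and readings are illustrative only), and the sketch assumes the dataset type from Statistics and Machine Learning Toolbox is available. A contains two observations named CLARK with different readings, so comparing on Name alone should keep only the first of them.

```matlab
% Hypothetical data: A has two observations with the name CLARK.
SA = struct('Name', {'CLARK'; 'BROWN'; 'CLARK'}, ...
            'SystolicBP', {124; 122; 131});
SB = struct('Name', {'BROWN'; 'MARTIN'}, ...
            'SystolicBP', {140; 130});

A = struct2dataset(SA);
B = struct2dataset(SB);

% Compare observations using only the Name variable.
% BROWN appears in B, so it is excluded even though its reading differs.
C = setdiff(A,B,{'Name'});

% C has one observation, CLARK. Its SystolicBP value comes from the
% first CLARK observation in A.
disp(C)
```

Without vars, all three observations in A would differ from those in B, because the BROWN readings do not match.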

C = setdiff(A,B,vars,setOrder) returns the observations in C in the order specified by setOrder.

[C,iA] = setdiff(___) also returns the index vector iA such that C = A(iA,:). If there are repeated observations in A, then setdiff returns the index of the first occurrence. You can use any of the previous input arguments.
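As a brief sketch of how iA relates C to A, the following uses hypothetical data in which A contains a repeated observation. It assumes the dataset type from Statistics and Machine Learning Toolbox is available.

```matlab
% Hypothetical data: the first and third observations in A are identical.
SA = struct('Name', {'CLARK'; 'BROWN'; 'CLARK'}, ...
            'SystolicBP', {124; 122; 124});
SB = struct('Name', {'BROWN'; 'MARTIN'}, ...
            'SystolicBP', {122; 130});

A = struct2dataset(SA);
B = struct2dataset(SB);

[C,iA] = setdiff(A,B);

% iA points at the first occurrence of the repeated CLARK observation.
disp(iA)

% Indexing A with iA reproduces the observations in C.
D = A(iA,:);
sameNames  = isequal(D.Name, C.Name);
sameValues = isequal(D.SystolicBP, C.SystolicBP);
```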

Examples

Create a 3-by-1 structure array, and then convert subsets of it into two dataset arrays.

S(1,1).Name = 'CLARK';
S(1,1).Gender = 'M';
S(1,1).SystolicBP = 124;
S(1,1).DiastolicBP = 93;

S(2,1).Name = 'BROWN';
S(2,1).Gender = 'F';
S(2,1).SystolicBP = 122;
S(2,1).DiastolicBP = 80;

S(3,1).Name = 'MARTIN';
S(3,1).Gender = 'M';
S(3,1).SystolicBP = 130;
S(3,1).DiastolicBP = 92;

A = struct2dataset(S(1:2));
B = struct2dataset(S(2:3));

A and B have one observation in common, with last name BROWN. It is the second observation in A and the first observation in B.

Return the set difference of A and B.

[C,iA] = setdiff(A,B)
C = 
    Name             Gender       SystolicBP    DiastolicBP
    {'CLARK'}        {'M'}        124           93         

iA = 
1

The first observation in A is not present in B.
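Because the dataset data type is not recommended, the same comparison can be sketched with tables instead. This example rebuilds the data above as a table and uses the MATLAB setdiff function, which accepts tables and returns the rows of the first table that are not in the second. The names T, TA, TB, and TC are illustrative.

```matlab
% Same data as the dataset example, stored as table variables.
Name        = {'CLARK'; 'BROWN'; 'MARTIN'};
Gender      = {'M'; 'F'; 'M'};
SystolicBP  = [124; 122; 130];
DiastolicBP = [93; 80; 92];

T  = table(Name,Gender,SystolicBP,DiastolicBP);
TA = T(1:2,:);   % CLARK and BROWN
TB = T(2:3,:);   % BROWN and MARTIN

% Rows of TA that are not in TB, with the index of each row in TA.
[TC,iA] = setdiff(TA,TB);
disp(TC)
```

As in the dataset example, only the CLARK row remains, and iA is 1.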

Input Arguments

A, B: Input arrays, specified as dataset arrays.

vars: Variables to compare, specified as a string array or cell array of character vectors containing variable names, or as a vector of integers containing variable column numbers. vars indicates the variables in A and B that setdiff considers.

Specify vars as [] to use its default value of all variables.
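The different ways of specifying vars can be sketched as follows. The data is hypothetical and reduced to two variables, and the sketch assumes the dataset type from Statistics and Machine Learning Toolbox is available.

```matlab
% Hypothetical data with two variables: Name (column 1) and SystolicBP (column 2).
S = struct('Name', {'CLARK'; 'BROWN'; 'MARTIN'}, ...
           'SystolicBP', {124; 122; 130});

A = struct2dataset(S(1:2));
B = struct2dataset(S(2:3));

% Select the Name variable by name.
C1 = setdiff(A,B,{'Name'});

% Select the same variable by its column number.
C2 = setdiff(A,B,1);

% Use [] to compare on all variables (the default).
C3 = setdiff(A,B,[]);
```

With this data, all three calls leave only the CLARK observation, because BROWN matches on every variable.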

setOrder: Flag indicating the sorting order for observations in the resulting array C, specified as 'sorted' or 'stable'.

'sorted': Observations in C are in sorted order (default).
'stable': Observations in C are in the same order that they appear in A.
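The difference between the two flags can be sketched with hypothetical data in which the observations of A are not already in sorted order. The sketch assumes the dataset type from Statistics and Machine Learning Toolbox is available.

```matlab
% Hypothetical data: names in A are not in alphabetical order.
S = struct('Name', {'CLARK'; 'BROWN'; 'MARTIN'; 'ADAMS'}, ...
           'SystolicBP', {124; 122; 130; 118});

A = struct2dataset(S(1:3));   % CLARK, BROWN, MARTIN
B = struct2dataset(S(3:4));   % MARTIN, ADAMS

% Default 'sorted' order: BROWN comes before CLARK.
[Csorted,iSorted] = setdiff(A,B,[],'sorted');

% 'stable' order: observations keep their order from A (CLARK, then BROWN).
[Cstable,iStable] = setdiff(A,B,[],'stable');

disp(Csorted.Name)
disp(Cstable.Name)
```

Passing [] for vars keeps the default of comparing on all variables while still allowing setOrder to be set.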

Output Arguments

C: Observations that belong to A but not B, with repetitions removed, returned as a dataset array. C is in sorted order by default, or in the order specified by setOrder.

iA: Index vector indicating the observations from A that are in C, returned as a vector of integers. For any repeated observations in A, iA contains the index of the first occurrence.

Version History

Introduced in R2012b