union

Class: dataset

(Not Recommended) Set union for dataset array observations

The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.
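As a rough sketch of the recommended alternative, the same kind of set operation can be written with tables. The variable names ID and Score and all values below are made up for illustration.

```matlab
% Hypothetical data: two small tables with the same variables.
TA = table([3;1;2], [30;10;20], 'VariableNames', {'ID','Score'});
TB = table([2;4],   [20;40],    'VariableNames', {'ID','Score'});

% Rows that appear in both tables are kept only once.
TC = union(TA,TB);
disp(TC)
```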

Syntax

C = union(A,B)
C = union(A,B,vars)
C = union(A,B,vars,setOrder)
[C,iA,iB] = union(___)

Description

C = union(A,B) for dataset arrays A and B returns the combined set of observations from the two arrays, with repetitions removed. The observations in the dataset array C are sorted.
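A minimal sketch of this syntax, assuming two small dataset arrays built from made-up numeric data (the variable names ID and Score are illustrative only):

```matlab
% Hypothetical data: both arrays must have the same variables.
A = dataset([3;1;2], [30;10;20], 'VarNames', {'ID','Score'});
B = dataset([2;4],   [20;40],    'VarNames', {'ID','Score'});

% The observation with ID 2 and Score 20 is in both A and B,
% so the union keeps a single copy of it.
C = union(A,B);
disp(C)
size(C,1)   % number of observations in the union
```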

C = union(A,B,vars) returns the combined set of observations from the two arrays, treating observations as repetitions when they have the same combination of values in the variables specified in vars. Only one observation is kept for each unique combination. The observations in the dataset array C are sorted by those variables.

The values for variables not specified in vars for each observation in C are taken from the corresponding observation in A or B, or from A if there are common observations in both A and B. If there are multiple observations in A or B that correspond to an observation in C, those values are taken from the first occurrence.
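The following sketch illustrates the vars argument on made-up data. ID and Score are hypothetical variable names; only ID is used to decide which observations count as repetitions.

```matlab
% Hypothetical data: ID 2 appears twice in A and once in B,
% each time with a different Score.
A = dataset([1;2;2], [10;20;25], 'VarNames', {'ID','Score'});
B = dataset([2;3],   [99;30],    'VarNames', {'ID','Score'});

% Compare observations by ID only.
C = union(A,B,{'ID'});

% According to the rule described above, the Score shown for ID 2
% should come from the first matching observation in A.
disp(C)
```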

C = union(A,B,vars,setOrder) returns the observations in C in the order specified by setOrder.

[C,iA,iB] = union(___) also returns index vectors iA and iB such that C is a sorted combination of the values A(iA,:) and B(iB,:). If there are common observations in A and B, then union returns only the index from A, in iA. If there are repeated observations in A or B, then the index of the first occurrence is returned. You can use any of the previous input arguments.
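One way to check the relationship between C and the index vectors is to rebuild the union by hand. This is only a sketch on made-up numeric data; it relies on the dataset methods sortrows and double for the comparison.

```matlab
% Hypothetical data: the first and third observations of A are identical,
% and the observation with ID 1 is in both A and B.
A = dataset([3;1;3], [30;10;30], 'VarNames', {'ID','Score'});
B = dataset([1;2],   [10;20],    'VarNames', {'ID','Score'});

[C,iA,iB] = union(A,B);

% Stack the contributing observations and sort them the same way as C.
D = sortrows([A(iA,:); B(iB,:)]);

% Compare the numeric contents of the two dataset arrays.
sameContents = isequal(double(C), double(D))
```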

Input Arguments

A,B

Input dataset arrays.

vars

String array or cell array of character vectors containing variable names, or a vector of integers containing variable column numbers. vars specifies the variables that union compares when deciding whether two observations are repetitions of each other.

Specify vars as [] to use its default value of all variables.

setOrder

Flag indicating the sorting order for the observations in C. The possible values of setOrder are:

 'sorted'    Observations in C are in sorted order (default).
 'stable'    Observations in C are in the same order that they appear in A, then B.
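To compare the two orderings, the sketch below runs union on made-up data with both flags. It passes [] for vars so that all variables are compared; ID and Score are hypothetical variable names.

```matlab
% Hypothetical data: the observation with ID 1 is in both arrays.
A = dataset([3;1], [30;10], 'VarNames', {'ID','Score'});
B = dataset([2;1], [20;10], 'VarNames', {'ID','Score'});

Csorted = union(A,B);               % default order, 'sorted'
Cstable = union(A,B,[],'stable');   % order of appearance in A, then B

disp(Csorted)
disp(Cstable)
```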

Output Arguments

C

Dataset array with the combined observations of A and B, with repetitions removed. C is in sorted order (by default), or the order specified by setOrder.

iA

Index vector, indicating the observations in A that contribute to the union. iA contains the index to the first occurrence of any repeated observations in A.

iB

Index vector, indicating the observations in B that contribute to the union. If there are common observations in A and B, then union returns only the index from A, in iA. iB contains the index to the first occurrence of any repeated observations in B.

Examples


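% Load the first two sheets of the sample hospital spreadsheet as dataset arrays.
A = dataset('XLSFile',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.xlsx'));
B = dataset('XLSFile',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.xlsx'),'Sheet',2);
% Display the number of observations in each array.
[length(A) length(B)]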
ans =

14     8

The first dataset array, A, has 14 observations. The second dataset array, B, has 8 observations.

Return the union.

C = union(A,B);
length(C)
ans =

21

The union of the two dataset arrays has 21 observations, one fewer than the 22 observations in A and B combined. This indicates that one observation appears in both A and B.