KeyValueStore

Store key-value pairs for use with mapreduce

Description

The mapreduce function automatically creates a KeyValueStore object during execution and uses it to store key-value pairs added by the map and reduce functions. Although you never need to explicitly create a KeyValueStore object to use mapreduce, you do need to use the add and addmulti object functions to interact with this object in the map and reduce functions.

Creation

The mapreduce function automatically creates KeyValueStore objects during execution.

Object Functions

add         Add single key-value pair to KeyValueStore
addmulti    Add multiple key-value pairs to KeyValueStore

Examples

The following map function uses the add function to add key-value pairs one at a time to an intermediate KeyValueStore object (named intermKVStore).

function MeanDistMapFun(data, info, intermKVStore)
    distances = data.Distance(~isnan(data.Distance));
    sumLenKey = 'sumAndLength';
    sumLenValue = [sum(distances), length(distances)];
    add(intermKVStore, sumLenKey, sumLenValue);
end
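
For context, a matching reduce function might look like the following minimal sketch. The name MeanDistReduceFun and the output key 'Mean' are assumptions chosen for illustration; hasnext and getnext are the ValueIterator functions that mapreduce supplies to reduce functions. The reducer totals the [sum, count] pairs that the map function stored under 'sumAndLength' and writes one result to the output KeyValueStore with add.

```matlab
function MeanDistReduceFun(~, intermValIter, outKVStore)
    % Hypothetical reducer paired with MeanDistMapFun (illustrative only).
    % Each intermediate value is a 1-by-2 vector: [sum, count].
    sumLen = [0 0];
    while hasnext(intermValIter)
        sumLen = sumLen + getnext(intermValIter);
    end
    % Store the overall mean under an assumed key name.
    add(outKVStore, 'Mean', sumLen(1)/sumLen(2));
end
```

Assuming both functions are saved on the MATLAB path, they could be run against the airlinesmall.csv sample data set roughly as follows:

```matlab
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = 'Distance';
outds = mapreduce(ds, @MeanDistMapFun, @MeanDistReduceFun);
readall(outds)
```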

The following map function uses addmulti to add several key-value pairs to an intermediate KeyValueStore object (named intermKVStore). Note that this map function collects multiple keys in the intermKeys variable, and multiple values in the intermVals variable. This prepares a single call to addmulti to add all of the key-value pairs at once. It is a best practice to use a single call to addmulti rather than using add in a loop.

function meanArrivalDelayByDayMapper(data, ~, intermKVStore)
% Mapper function for the MeanByGroupMapReduceExample.

% Copyright 2014 The MathWorks, Inc.

% Data is an n-by-2 table: first column is the DayOfWeek and the second
% is the ArrDelay. Remove missing values first.
delays = data.ArrDelay;
day = data.DayOfWeek;
notNaN = ~isnan(delays);
day = day(notNaN);
delays = delays(notNaN);

% find the unique days in this chunk
[intermKeys,~,idx] = unique(day, 'stable');

% group delays by idx and apply the countsum function to each group
intermVals = accumarray(idx,delays,size(intermKeys),@countsum);
addmulti(intermKVStore,intermKeys,intermVals);

function out = countsum(x)
n = length(x); % count
s = sum(x); % sum
out = {[n, s]};
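
For comparison, the single addmulti call above does roughly the same work as the loop sketched below, which adds one pair per call to add. This is an illustration of the pattern the best-practice note advises against, not part of the original example; it assumes intermKeys is a numeric vector and intermVals is a cell array, as computed in the mapper.

```matlab
% Sketch only: per-pair equivalent of addmulti(intermKVStore,intermKeys,intermVals).
% Each cell of intermVals holds one [count, sum] vector produced by countsum.
for k = 1:numel(intermKeys)
    add(intermKVStore, intermKeys(k), intermVals{k});
end
```

The loop stores the same key-value pairs but makes one call into the KeyValueStore per key, which is why collecting the keys and values first and calling addmulti once is preferred.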

Extended Capabilities

Thread-Based Environment
Run code in the background using MATLAB® backgroundPool or accelerate code with Parallel Computing Toolbox™ ThreadPool.

Version History

Introduced in R2014b