Hadoop Compiler

Package MATLAB programs for deployment to Hadoop clusters as MapReduce programs

The Hadoop Compiler app will be removed in a future release. To create standalone MATLAB® MapReduce applications, or deployable archives from MATLAB map and reduce functions, use the mcc command. For details, see Compatibility Considerations.

Description

The Hadoop Compiler app packages MATLAB map and reduce functions into a deployable archive. You can incorporate the archive into a Hadoop® mapreduce job by passing it as a payload argument to a job submitted to a Hadoop cluster.

Open the Hadoop Compiler App

  • MATLAB Toolstrip: On the Apps tab, under Application Deployment, click the app icon.

  • MATLAB command prompt: Enter hadoopCompiler.

Parameters

Function for the mapper, specified as a character vector.

Function for the reducer, specified as a character vector.
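As an illustration of what the mapper and reducer can look like, the following is a minimal sketch of a map and reduce function pair written against the standard MATLAB mapreduce function signatures. The function names, the key names, and the ArrDelay variable are assumptions chosen for this example; they are not required by the app. Each function is assumed to be saved in its own file.

```matlab
% --- maxArrivalDelayMapper.m (hypothetical file name) ---
function maxArrivalDelayMapper(data, info, intermKVStore)
% data is one chunk of the input, read as a table by the datastore.
% info carries metadata about the chunk and is unused here.
% Compute the maximum of the ArrDelay variable for this chunk only.
partMax = max(data.ArrDelay);
% Emit the partial result under a single intermediate key.
add(intermKVStore, 'PartialMaxArrivalDelay', partMax);
end

% --- maxArrivalDelayReducer.m (hypothetical file name) ---
function maxArrivalDelayReducer(intermKey, intermValIter, outKVStore)
% intermValIter iterates over all values emitted for intermKey.
maxVal = -inf;
while hasnext(intermValIter)
    maxVal = max(getnext(intermValIter), maxVal);
end
% Emit the overall maximum as the final key-value pair.
add(outKVStore, 'MaxArrivalDelay', maxVal);
end
```

The names of these two functions are what you would supply as the mapper and reducer character vectors.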

A file containing a datastore representing the data to be processed, specified as a character vector.

In most cases, you start by working on a small sample dataset on a local machine that is representative of the actual dataset on the cluster. The sample dataset has the same structure and variables as the actual dataset. Creating a datastore object for the local sample dataset captures a snapshot of that structure. With access to this datastore object, a Hadoop job executing on the cluster knows how to access and process the actual dataset residing on HDFS™.
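One way to produce such a datastore file is sketched below. It assumes the airlinesmall.csv sample file that ships with MATLAB stands in for the local sample dataset; the ArrDelay variable selection and the MAT-file name infoAboutDataset.mat are hypothetical choices for this example.

```matlab
% Create a datastore for a small local sample of the data.
% 'NA' entries are treated as missing values, and only the
% ArrDelay variable is selected for processing.
ds = datastore('airlinesmall.csv', ...
    'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', 'ArrDelay', ...
    'ReadSize', 1000);

% Save the datastore object to a MAT-file. This file, not the data
% itself, is what captures the structure of the dataset.
save('infoAboutDataset.mat', 'ds');
```

The resulting MAT-file is the kind of file this parameter expects.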

Format of output from the Hadoop mapreduce job, specified as keyvalue or tabular text.

Additional parameters to configure how Hadoop executes the job, specified as a character vector. For more information, see Configuration File for Creating Deployable Archive Using the mcc Command.

Files that must be included with generated artifacts, specified as a list of files.

Settings

Flags controlling the behavior of the compiler, specified as a character vector.

Folder where files for testing are stored, specified as a character vector.

Folder where generated artifacts are stored, specified as a character vector.

Compatibility Considerations


Not recommended starting in R2020a
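As a rough sketch of the mcc-based workflow that replaces the app, the following writes a configuration file and then builds a deployable archive. The archive name, the mapper and reducer file names, and the MAT-file name are hypothetical, and those files are assumed to already exist in the current folder. The configuration keys follow the layout described in Configuration File for Creating Deployable Archive Using the mcc Command and should be verified against that topic for your release.

```matlab
% Write a configuration file describing the input datastore, the
% output format, and the map and reduce functions.
% All file and function names here are placeholders.
fid = fopen('config.txt', 'w');
fprintf(fid, 'mw.ds.in.type = tabulartext\n');
fprintf(fid, 'mw.ds.in.format = infoAboutDataset.mat\n');
fprintf(fid, 'mw.ds.out.type = keyvalue\n');
fprintf(fid, 'mw.mapper = maxArrivalDelayMapper\n');
fprintf(fid, 'mw.reducer = maxArrivalDelayReducer\n');
fclose(fid);

% Build the deployable archive with mcc. The -a option adds the
% MAT-file containing the datastore to the archive.
mcc -H -W 'hadoop:maxArrivalDelay,CONFIG:config.txt' maxArrivalDelayMapper.m maxArrivalDelayReducer.m -a infoAboutDataset.mat
```

Running the mcc command requires MATLAB Compiler.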

Introduced in R2014b