
Workflow to Incorporate MATLAB Map and Reduce Functions into a Hadoop Job

  1. Write mapper and reducer functions in MATLAB®. (See the mapper/reducer sketch after this list.)

  2. Create a MAT-file that contains a datastore describing the structure of the data and the names of the variables to analyze. The datastore can be created from a test data set that is representative of the actual data set. (See the datastore sketch after this list.)

  3. Create a text file that contains Hadoop® settings, such as the names of the mapper and reducer and the type of data being analyzed. (See the settings-file sketch after this list.)

  4. Use the mcc command to package the components. This generates a deployable archive (.ctf file) that can be incorporated into a Hadoop mapreduce job. (See the mcc sketch after this list.)

  5. Incorporate the deployable archive into a Hadoop mapreduce job using the hadoop command syntax described below.

    Execution Signature

    Key

    Letter    Description
    A         Hadoop command
    B         JAR option
    C         The standard name of the JAR file. All applications use the same JAR file, mwmapreduce.jar. The path to the JAR file is also fixed relative to the MATLAB Runtime location.
    D         The standard name of the driver. All applications use the same driver name, MWMapReduceDriver.
    E         A generic option specifying the MATLAB Runtime location as a key-value pair.
    F         The deployable archive (.ctf file) generated by mcc, passed as a payload argument to the job.
    G         Location of the input files on HDFS™.
    H         Location on HDFS where the output is written.
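
For step 1, the sketch below shows a minimal mapper/reducer pair using the standard mapreduce mapper and reducer signatures. The function names, the ArrDelay variable, and the key names are illustrative rather than part of this page, and each function would live in its own .m file.

    function maxArrivalDelayMapper(data, info, intermKVStore)
        % Mapper (illustrative): data is one block read from the datastore.
        % Compute the largest arrival delay in this block and add it to the
        % intermediate key-value store under a fixed key.
        partMax = max(data.ArrDelay);
        add(intermKVStore, 'PartialMaxArrivalDelay', partMax);
    end

    function maxArrivalDelayReducer(intermKey, intermValIter, outKVStore)
        % Reducer (illustrative): fold all partial maxima for the key into a
        % single overall maximum and write it to the output store.
        maxVal = -Inf;
        while hasnext(intermValIter)
            maxVal = max(getnext(intermValIter), maxVal);
        end
        add(outKVStore, 'MaxArrivalDelay', maxVal);
    end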
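
For step 2, a sketch of creating and saving the datastore from a representative sample of the data; the file names airlinesmall.csv and infoDs.mat are placeholders.

    % Create a datastore from a representative sample of the data, keep only
    % the variable the mapper uses, and save the datastore to a MAT-file.
    ds = datastore('airlinesmall.csv', ...
                   'TreatAsMissing', 'NA', ...
                   'SelectedVariableNames', 'ArrDelay');
    save('infoDs.mat', 'ds');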
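
For step 3, the settings file is plain text. The property names below are an assumption modeled on a representative configuration; verify the supported keys against the shipping examples before use.

    # hadoopSettings.txt (illustrative; property names are assumptions)
    mw.mapper = maxArrivalDelayMapper
    mw.reducer = maxArrivalDelayReducer
    mw.ds.in.type = tabulartext
    mw.ds.in.format = infoDs.mat
    mw.ds.out.type = keyvalue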
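
For step 4, a sketch of the packaging command. The -H and -W hadoop: options shown here are an assumption about the Hadoop-specific mcc syntax; confirm the exact flags against the mcc reference page.

    mcc -H -W 'hadoop:maxArrivalDelay,CONFIG:hadoopSettings.txt' maxArrivalDelayMapper.m maxArrivalDelayReducer.m -a infoDs.mat

Under these assumptions, the command would produce maxArrivalDelay.ctf and, as described below, a run_maxArrivalDelay.sh helper script alongside it.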
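
For step 5, the parts of the command appear in the same order as letters A through H in the key above. In this sketch the runtime path, the jar subfolder, the driver's package prefix, the mw.mcrroot property name, the archive name, and the HDFS URIs are placeholders or assumptions; only the mwmapreduce.jar file name and the MWMapReduceDriver class name come from the key.

    hadoop \
        jar <MATLAB_RUNTIME>/toolbox/mlhadoop/jar/<version>/mwmapreduce.jar \
        com.mathworks.hadoop.MWMapReduceDriver \
        -D mw.mcrroot=<MATLAB_RUNTIME> \
        maxArrivalDelay.ctf \
        hdfs://myhost:54310/datasets/airlinesmall.csv \
        hdfs://myhost:54310/results/maxArrivalDelay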

To simplify the inclusion of the deployable archive (.ctf file) into a Hadoop mapreduce job, the mcc command generates a shell script alongside the deployable archive. The shell script has the following naming convention: run_<deployableArchiveName>.sh

To run the deployable archive using the shell script, call the script from the command line.
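
A plausible shape for that invocation, assuming the generated script takes the MATLAB Runtime location followed by the same HDFS input and output arguments as the raw hadoop command; the argument order is an assumption, so check the usage notes at the top of the generated script.

    ./run_maxArrivalDelay.sh \
        <MATLAB_RUNTIME> \
        hdfs://myhost:54310/datasets/airlinesmall.csv \
        hdfs://myhost:54310/results/maxArrivalDelay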
