MATLAB with Hadoop and Spark
MATLAB leverages the computational power of distributed clusters to perform efficient computations on big data for machine learning and data mining algorithms, with applications ranging from predictive maintenance to event detection.
Algorithms prototyped in MATLAB on small sets of representative data on your workstation can then be scaled to run on your entire dataset located in HDFS, Hive, Amazon S3, or Azure Blob Storage in production without re-implementation.
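As a minimal sketch of this prototype-to-production workflow (file paths, variable names, and the HDFS namenode address below are hypothetical), the same tall-array code can run against a local sample and later against the full dataset in HDFS:

```matlab
% Prototype locally on a representative sample of the data.
ds = datastore('sensors/*.csv');           % hypothetical local sample files
t  = tall(ds);                             % tall array: deferred, chunked evaluation
avgTemp = mean(t.Temperature, 'omitnan');  % analysis code is scale-independent
avgTemp = gather(avgTemp);                 % gather triggers the actual computation

% In production, point the same datastore at the full dataset in HDFS --
% the analysis code above runs unchanged.
ds = datastore('hdfs://namenode:8020/data/sensors/*.csv');
```

Because tall arrays defer evaluation until `gather`, the analysis expressions do not change when the datastore is repointed from a local folder to a cluster file system.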
Running MATLAB on Hadoop
Running MATLAB on Hadoop involves configuring MathWorks products on the compute and edge nodes of the cluster and submitting Spark or MapReduce jobs to the cluster via the YARN scheduler. MathWorks Consultants guide you through installation and commissioning, providing installation scripts, test scripts, and documentation that allow you to repeat the process as you grow the cluster in size.
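Once the cluster is commissioned, pointing MATLAB at it is brief. The sketch below (install path and environment are site-specific assumptions) configures a Hadoop cluster object so that subsequent tall-array and mapreduce work is submitted as YARN jobs:

```matlab
% Site-specific: location of the Hadoop installation on the client machine.
setenv('HADOOP_HOME', '/usr/local/hadoop');   % hypothetical install path

% Create a cluster object describing the Hadoop/YARN cluster.
cluster = parallel.cluster.Hadoop( ...
    'HadoopInstallFolder', '/usr/local/hadoop');

% Route subsequent tall-array and mapreduce execution to the cluster.
mapreducer(cluster);
```

With `mapreducer(cluster)` set, the same analysis code that ran locally during prototyping is scheduled on the cluster via YARN.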
To extract the most value out of your data on big data platforms, careful consideration is needed as to how data should be stored and extracted for analysis in MATLAB. MathWorks Consultants work with your architects to define a data strategy for ingesting and loading data onto a Hadoop-enabled cluster, advising on appropriate file formats and extraction patterns. In addition, MathWorks Consultants help you meet additional non-functional requirements around your analysis and data processing, such as logging, handling error conditions, and recording data provenance.
A proof-of-concept application or skeleton framework can be invaluable for evaluating the benefit of a big data solution. MathWorks Consultants can work with your code and refactor it to scale for parallel computation and to execute efficiently on a Hadoop cluster. Once the code has been optimized, MathWorks Consultants come onsite to demonstrate and test the application and provide a thorough handover of the code and the techniques used, so the application can serve as a reference or starting point for you to elaborate and customize.
MathWorks Consultants can create a set of examples from your data, to demonstrate how MATLAB's tall array and map-reduce capabilities can be used to solve representative data analysis problems in your organization. Consultants can deliver onsite workshops to give engineers and data scientists hands-on experience with step-by-step and in-depth exercises.
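A representative example of the map-reduce capability mentioned above might look like the following sketch, which counts records in a dataset (the file pattern and function names are illustrative, not from the source):

```matlab
% Count the total number of records in a collection of CSV files.
ds = datastore('events/*.csv');                        % hypothetical input files
result = mapreduce(ds, @countMapper, @countReducer);   % runs locally or on a cluster
readall(result)                                        % table of key/value results

% Mapper: emit the record count for each chunk the datastore delivers.
function countMapper(data, ~, intermKVStore)
    add(intermKVStore, 'recordCount', size(data, 1));
end

% Reducer: sum the per-chunk counts for a given key.
function countReducer(key, intermValIter, outKVStore)
    total = 0;
    while hasnext(intermValIter)
        total = total + getnext(intermValIter);
    end
    add(outKVStore, key, total);
end
```

The same `mapreduce` call executes in the local MATLAB session during prototyping and against HDFS data on the cluster once a Hadoop mapreducer is configured.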
MathWorks Consultants help you to:
- Install and commission MathWorks products on a Hadoop-enabled cluster, including those using Kerberos authentication.
- Work with your teams using MathWorks products, and with your IT architects, on solutions that fit within your existing enterprise system architecture.
- Advise on a strategy to extract value from data using MATLAB's tall array and map-reduce capabilities.
- Demonstrate how to move from prototype algorithm development to full-scale deployment by working with your data to create reference applications and hands-on exercises.
- Build in-house competency and accelerate your development efforts through project-based coaching sessions and knowledge transfer.
Rory Adams is a senior consultant engineer specializing in data analysis, software development, and application deployment. He works with customers to understand and resolve their technical and business challenges with a focus on mathematical modeling, application development, parallel computing, and physical modeling. Rory holds a PhD in theoretical physics and an M.Sc. in applied mathematics from the University of Cape Town, South Africa.