Transfer Data To Amazon S3 Buckets and Access Data Using MATLAB Datastore
To work with data in the cloud, upload it to Amazon S3, then use datastores to access the data in S3 from the workers in your cluster.
For efficient file transfers to and from Amazon S3, download and install the AWS Command Line Interface tool from https://aws.amazon.com/cli/.
Specify your AWS Access Key ID, Secret Access Key, and the Region of the bucket (and Session Token, if you are using AWS temporary credentials) as system environment variables. For example, on Linux, macOS, or Unix with a Bourne-based shell:

export AWS_ACCESS_KEY_ID="YOUR_AWS_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="YOUR_AWS_SECRET_ACCESS_KEY"
export AWS_DEFAULT_REGION="us-east-1"

If you are using a C-based shell, replace export with setenv in the commands above. On Windows:

set AWS_ACCESS_KEY_ID="YOUR_AWS_ACCESS_KEY_ID"
set AWS_SECRET_ACCESS_KEY="YOUR_AWS_SECRET_ACCESS_KEY"
set AWS_DEFAULT_REGION="us-east-1"

To permanently set these environment variables, set them in your user or system environment. For MATLAB® releases prior to R2020a, use AWS_REGION instead of AWS_DEFAULT_REGION.
Create a bucket for your data, either from the AWS S3 console or with a command like the following:
aws s3 mb s3://mynewbucket
Upload your data using a command like the following:

aws s3 cp mylocaldatapath s3://mynewbucket --recursive

For example:

aws s3 cp path/to/cifar10/in/the/local/machine s3://MyExampleCloudData/cifar10/ --recursive
After creating a cloud cluster, copy your AWS credentials to your cluster workers. In MATLAB, select Parallel > Create and Manage Clusters. In the Cluster Profile Manager, select your cloud cluster profile. Scroll to the EnvironmentVariables property and add the names (names only, not the values) of the variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION. If you are using AWS temporary credentials, also add AWS_SESSION_TOKEN. Note that you must set the values of these AWS environment variables either in the shell before starting the MATLAB session, or directly in MATLAB using the setenv command, before using the cluster.
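As a sketch of the setenv approach, the credentials can be set from within the MATLAB session like this (the values shown are placeholders for your own keys, not working credentials):

```matlab
% Set AWS credentials for this MATLAB session before using the cluster.
% Replace the placeholder values with your own credentials.
setenv('AWS_ACCESS_KEY_ID','YOUR_AWS_ACCESS_KEY_ID');
setenv('AWS_SECRET_ACCESS_KEY','YOUR_AWS_SECRET_ACCESS_KEY');
setenv('AWS_DEFAULT_REGION','us-east-1');

% If you are using AWS temporary credentials, also set the session token:
% setenv('AWS_SESSION_TOKEN','YOUR_AWS_SESSION_TOKEN');
```

Because the Cluster Profile Manager lists only the variable names, the workers pick up whatever values are present in your client session when the pool starts.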
After you store your data in Amazon S3, you can use datastores to access the data from your cluster workers. Simply create a datastore pointing to the URL of the S3 bucket. For example, the following sample code shows using an imageDatastore to access an S3 bucket. Replace 's3://MyExampleCloudData/cifar10' with the URL of your S3 bucket.

imds = imageDatastore('s3://MyExampleCloudData/cifar10', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');

You can use an imageDatastore to read data from the cloud in your desktop client MATLAB, or when running code on your cluster workers, without changing your code. For details, see Work with Remote Data (MATLAB).
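For instance, once the AWS environment variables are available on the workers, the same datastore can be read inside a parallel loop. The following is a minimal sketch; the profile name 'MyCloudCluster' and the four-way partition are illustrative assumptions, not part of the original example:

```matlab
% Open a pool on the cloud cluster (profile name is a placeholder).
parpool('MyCloudCluster');

% Same datastore code as on the desktop client -- no changes needed.
imds = imageDatastore('s3://MyExampleCloudData/cifar10', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');

% Each worker reads its own portion of the S3-hosted images directly.
parfor i = 1:4
    subds = partition(imds,4,i);   % split the datastore across workers
    while hasdata(subds)
        img = read(subds);         % reads one image from the S3 bucket
    end
end
```

The partition function divides the datastore so that each worker streams a disjoint subset of files from the bucket, rather than every worker downloading the full dataset.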
For a step-by-step example showing deep learning using data stored in Amazon S3, see the white paper Deep Learning with MATLAB and Multiple GPUs.
- Copy Data from Amazon S3 Account to Your Cluster (Cloud Integrations)
- Transfer Data with Job Methods and Properties (Cloud Integrations)
- Download SSH Key Identity File (Cloud Integrations)
- Transfer Data with Standard Utilities (Cloud Integrations)
- Retrieve Data from Persisted Storage Without Starting a Cluster (Cloud Integrations)