
TallDatastore

Datastore for checkpointing tall arrays

Description

TallDatastore objects recreate tall arrays from binary files that the write function has written to disk. You can use the object to recreate the original tall array, or you can access and manage the data by specifying TallDatastore properties and using the object functions.

Creation

Create TallDatastore objects using the datastore function. For example, tds = datastore(location,"Type","tall") creates a datastore from a collection of files specified by location.
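The creation step needs existing files to point at. This minimal sketch first writes a small tall array to disk and then creates the datastore with the "Type","tall" syntax. It assumes a fresh, unique folder from tempname as the output location; that choice is illustrative and is not part of the original page.

```matlab
% Write a small tall array to a new, unique temporary folder.
% (tempname is used only so the example never touches an existing folder.)
location = tempname;
write(location, tall(rand(100,1)));

% Create a TallDatastore that references the files written above.
tds = datastore(location, "Type", "tall");

% Display the class of the resulting datastore object.
disp(class(tds))
```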

Properties


Files

Files included in the datastore, resolved as a character vector, cell array of character vectors, string scalar, or string array, where each character vector or string is a full path to a file.

The location argument of the datastore function defines the Files property when the datastore is created. The location argument can contain full paths to files on a local file system, a network file system, or a supported remote location such as Amazon S3™, Windows Azure® Blob Storage, or HDFS™. For more information, see Work with Remote Data.

The files must be either MAT-files or sequence files generated by the write function.

Example: ["C:\dir\data\file1.ext";"C:\dir\data\file2.ext"]

Example: ["s3://bucketname/path_to_files/your_file01.ext";"s3://bucketname/path_to_files/your_file02.ext"]

Data Types: char | cell | string
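This sketch shows how to inspect the resolved Files property after creating a datastore. It writes illustrative random data to a temporary folder from tempname. That folder and the data are assumptions made only so the example is self-contained.

```matlab
% Write a small tall array so there are files to inspect (illustrative data).
location = tempname;
write(location, tall(rand(100,1)));
tds = datastore(location, "Type", "tall");

% Files holds the resolved full path of every file in the datastore.
files = tds.Files;
fprintf("%d file(s) in the datastore\n", numel(files));
disp(files)
```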

FileType

File type, specified as either "mat" for MAT-files or "seq" for sequence files. By default, the type of file in the provided location determines the FileType.

Data Types: char | string
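The following sketch illustrates the default behavior, in which the files on disk determine FileType. It asks write for MAT-files explicitly through its "FileType" option and then lets datastore infer the type. The temporary folder is a hypothetical location chosen for the example.

```matlab
% Explicitly request MAT-files when writing (write also accepts "seq").
location = tempname;
write(location, tall(rand(100,1)), "FileType", "mat");

% No FileType is given to datastore, so it is inferred from the files on disk.
tds = datastore(location, "Type", "tall");
disp(tds.FileType)
```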

ReadSize

Maximum number of data rows to read in a call to the read or preview functions, specified as a positive integer. When the datastore function creates a TallDatastore, it determines and assigns the best possible value for ReadSize.
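A minimal sketch of overriding the automatically chosen ReadSize follows. After the override, each call to read returns at most the requested number of rows. The value 10 and the temporary folder are arbitrary choices made for illustration.

```matlab
location = tempname;
write(location, tall((1:1000)'));
tds = datastore(location, "Type", "tall");

% Override the automatically chosen ReadSize so each read returns at most 10 rows.
tds.ReadSize = 10;
chunk = read(tds);
disp(size(chunk))
```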

AlternateFileSystemRoots

Alternate file system root paths, specified as the name-value argument consisting of "AlternateFileSystemRoots" and a string vector or a cell array. Use "AlternateFileSystemRoots" when you create a datastore on a local machine but need to access and process the data on another machine, possibly one with a different operating system. You must also use "AlternateFileSystemRoots" to associate the root paths when both of these apply:

  • You process the data using Parallel Computing Toolbox™ and MATLAB® Parallel Server™.

  • The data is stored on your local machines, and a copy is available on cloud or cluster machines of a different platform.

  • To associate a set of root paths that are equivalent to one another, specify "AlternateFileSystemRoots" as a string vector. For example,

    ["Z:\datasets","/mynetwork/datasets"]

  • To associate multiple sets of root paths that are equivalent for the datastore, specify "AlternateFileSystemRoots" as a cell array containing multiple rows where each row represents a set of equivalent root paths. Specify each row in the cell array as either a string vector or a cell array of character vectors. For example:

    • Specify "AlternateFileSystemRoots" as a cell array of string vectors.

      {["Z:\datasets", "/mynetwork/datasets"];...
       ["Y:\datasets", "/mynetwork2/datasets","S:\datasets"]}

    • Alternatively, specify "AlternateFileSystemRoots" as a cell array of cell arrays of character vectors.

      {{'Z:\datasets','/mynetwork/datasets'};...
       {'Y:\datasets', '/mynetwork2/datasets','S:\datasets'}}

The value of "AlternateFileSystemRoots" must satisfy these conditions:

  • Contains one or more rows, where each row specifies a set of equivalent root paths.

  • Each row specifies multiple root paths and each root path must contain at least two characters.

  • Root paths are unique and are not subfolders of one another.

  • Contains at least one root path entry that points to the location of the files.

For more information, see Set Up Datastore for Processing on Different Machines or Clusters.

Example: ["Z:\datasets","/mynetwork/datasets"]

Data Types: string | cell
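The next sketch passes "AlternateFileSystemRoots" at creation time while satisfying the conditions listed above. It pairs the local parent folder of the data with a second, equivalent root. The path "/mynetwork/datasets" is hypothetical and stands in for wherever a cluster machine would see the same data. The local root comes from tempname purely for illustration.

```matlab
location = tempname;
write(location, tall(rand(100,1)));

% Local root: the parent folder that contains the written data.
localRoot = string(fileparts(location));
% Hypothetical path at which the same data would be mounted on a cluster machine.
clusterRoot = "/mynetwork/datasets";

% One row of equivalent roots; the local root points to the location of the files.
tds = datastore(location, "Type", "tall", ...
    "AlternateFileSystemRoots", [localRoot, clusterRoot]);
disp(tds.AlternateFileSystemRoots)
```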

Object Functions

hasdata           Determine if data is available to read
numpartitions     Number of datastore partitions
partition         Partition a datastore
preview           Preview subset of data in datastore
read              Read data in datastore
readall           Read all data in datastore
reset             Reset datastore to initial state
transform         Transform datastore
combine           Combine data from multiple datastores
isPartitionable   Determine whether datastore is partitionable
isSubsettable     Determine whether datastore is subsettable
isShuffleable     Determine whether datastore is shuffleable
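The reset, hasdata, and read functions are typically combined in a chunked read loop, as in this minimal sketch. The loop processes the data one block at a time, here accumulating a running sum. The data and the temporary folder are assumptions for the example.

```matlab
location = tempname;
write(location, tall((1:1000)'));
tds = datastore(location, "Type", "tall");

% Read the datastore chunk by chunk and accumulate a running sum.
reset(tds);                   % start from the first block of data
total = 0;
while hasdata(tds)
    chunk = read(tds);        % at most tds.ReadSize rows per call
    total = total + sum(chunk);
end
fprintf("Sum of all rows: %d\n", total);
```

Because each iteration holds only one chunk in memory, this pattern works even when the full data set would not fit in memory at once.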

Examples


Use TallDatastore objects to reconstruct tall arrays directly from files on disk rather than re-executing all of the commands that produced the tall array. Create a tall array and save it to disk using the write function. Then create a datastore for the written files using datastore, and convert it back to a tall array using tall.

Create a simple tall double.

t = tall(rand(500,1))
t =

  500×1 tall double column vector

    0.8147
    0.9058
    0.1270
    0.9134
    0.6324
    0.0975
    0.2785
    0.5469
      :
      :

Save the results to a new folder named Folder1.

location = fullfile(matlabroot,"toolbox","matlab","demos","Folder1");
write(location, t);
Writing tall data to folder H:\matlab\toolbox\matlab\demos\Folder1
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 0.063 sec
Evaluation completed in 0.16 sec

To recover the tall array that was written to disk, first create a new datastore that references the same folder. Then convert the datastore into a tall array.

tds = datastore(location);
t1 = tall(tds)
t1 =

  M×1 tall double column vector

    0.8147
    0.9058
    0.1270
    0.9134
    0.6324
    0.0975
    0.2785
    0.5469
      :
      :
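To confirm that the recovered array holds the same values as the original, you can gather both and compare them. This sketch repeats the example in a temporary folder from tempname instead of under matlabroot. That substitution is an assumption made so the check does not depend on write access to the MATLAB installation.

```matlab
% Recreate the example's data in a fresh temporary folder.
t = tall(rand(500,1));
location = tempname;
write(location, t);

t1 = tall(datastore(location));

% gather evaluates both tall arrays and returns in-memory copies.
[original, recovered] = gather(t, t1);
fprintf("Arrays match: %d\n", isequal(original, recovered));
```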

Version History

Introduced in R2016b