Overview of Memory-Mapping
What Is Memory-Mapping?
Memory-mapping is a mechanism that maps a portion of a file,
or an entire file, on disk to a range of addresses within an application's
address space. The application can then access files on disk in the
same way it accesses dynamic memory. This makes file reads and writes
faster than using functions such as fread and fwrite.
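As a minimal sketch of the idea, the following creates a small binary file and maps it with memmapfile. The file name is hypothetical and the file is created only for illustration; this is one way to set up a map, not the only one.

```matlab
% Create a small binary file of doubles to map (hypothetical file name).
filename = fullfile(tempdir, 'mmap_overview_demo.dat');
fid = fopen(filename, 'w');
fwrite(fid, 1:10, 'double');
fclose(fid);

% Map the whole file as a column vector of doubles.
m = memmapfile(filename, 'Format', 'double');

% Read through the map with ordinary indexing.
firstThree = m.Data(1:3);
total = sum(m.Data);

% Release the map before deleting the file.
clear m
delete(filename);
```

Here firstThree and total are ordinary MATLAB values copied out of the mapped region.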
Benefits of Memory-Mapping
The principal benefits of memory-mapping are efficiency, faster file access, the ability to share memory between applications, and more efficient coding.
Faster File Access
Accessing files via a memory map is faster than using I/O functions
such as fread and fwrite. Data are read and written using
the virtual memory capabilities that are built into the operating
system, rather than by allocating, copying into, and then deallocating
data buffers owned by the process.
MATLAB® does not access data from the disk when the map is first constructed. It only reads or writes the file on disk when a specified part of the memory map is accessed, and then it only reads that specific part. This provides faster random access to the mapped data.
Efficiency
Mapping a file into memory allows access to data in the file as if that data had been read into an array in the application's address space. Initially, MATLAB only allocates address space for the array; it does not actually read data from the file until you access the mapped region. As a result, memory-mapped files provide a mechanism by which applications can access data segments in an extremely large file without having to read the entire file into memory first.
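To illustrate accessing one segment of a file without reading the rest, the sketch below maps only ten values from the middle of a file by using the Offset and Repeat properties of memmapfile. The file name and sizes are assumptions chosen for the example; a real use case would involve a much larger file.

```matlab
% Create a stand-in data file (hypothetical name; small here for brevity).
filename = fullfile(tempdir, 'mmap_segment_demo.dat');
fid = fopen(filename, 'w');
fwrite(fid, 1:1000, 'double');
fclose(fid);

% Map only 10 doubles, starting at element 501.
% Offset is given in bytes from the start of the file; a double is 8 bytes.
m = memmapfile(filename, 'Format', 'double', ...
    'Offset', 500 * 8, 'Repeat', 10);

% Only the mapped region is reachable through m.Data.
segment = m.Data;

clear m
delete(filename);
```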
Efficient Coding Style
Memory-mapping in your MATLAB application enables you to
access file data using standard MATLAB indexing operations. Once
you have mapped a file to memory, you can read the contents of that
file using the same type of MATLAB statements used to read variables
from the MATLAB workspace. The contents of the mapped file appear
as if they were an array in the currently active workspace. You simply
index into this array to read or write the desired data from the file.
Therefore, you do not need explicit calls to the fread and fwrite
functions.
In MATLAB, if x is a memory-mapped variable with write access enabled,
and y is the data to be written to a file, then
writing to the file is as simple as
x.Data = y;
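A slightly fuller sketch of the same statement follows. It assumes a hypothetical file of five doubles, a map created with Writable set to true, and data y whose size and class match the mapped region.

```matlab
% Create a file of five doubles to write into (hypothetical file name).
filename = fullfile(tempdir, 'mmap_write_demo.dat');
fid = fopen(filename, 'w');
fwrite(fid, zeros(1, 5), 'double');
fclose(fid);

% The map must be writable for assignments to reach the file.
x = memmapfile(filename, 'Format', 'double', 'Writable', true);

y = [10; 20; 30; 40; 50];   % same size and class as the mapped data
x.Data = y;                 % write the whole mapped region
x.Data(2) = -1;             % indexed writes work as well

% Release the map, then read the file back conventionally to check it.
clear x
fid = fopen(filename, 'r');
onDisk = fread(fid, Inf, 'double');
fclose(fid);
delete(filename);
```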
Sharing Memory Between Applications
Memory-mapped files also provide a mechanism for sharing data between applications, as shown in the figure below. This is achieved by having each application map sections of the same file. You can use this feature to transfer large data sets between MATLAB and other applications.
Also, within a single application, you can map the same segment of a file more than once.
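The following sketch maps the same file twice within one MATLAB session, once with write access and once read-only. The file name and variable names are hypothetical. A value written through one map is expected to be visible through the other, because both maps refer to the same file.

```matlab
% Create a small file of bytes (hypothetical file name).
filename = fullfile(tempdir, 'mmap_shared_demo.dat');
fid = fopen(filename, 'w');
fwrite(fid, uint8(1:8), 'uint8');
fclose(fid);

% Two maps of the same file; the default Format is 'uint8'.
writer = memmapfile(filename, 'Writable', true);
reader = memmapfile(filename);

% Write through one map, then read through the other.
writer.Data(1) = uint8(200);
seenByReader = reader.Data(1);

% Release both maps before deleting the file.
clear writer reader
delete(filename);
```

The same pattern applies between applications: each one creates its own map of the shared file.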
When to Use Memory-Mapping
Just how much advantage you get from mapping a file to memory depends mostly on the size and format of the file, the way in which data in the file is used, and the computer platform you are using.
When Memory-Mapping Is Most Useful
Memory-mapping works best with binary files, and in the following scenarios:
For large files that you want to access randomly one or more times
For small files that you want to read into memory once and access frequently
For data that you want to share between applications
When you want to work with data in a file as if it were a MATLAB array
When the Advantage Is Less Significant
The following types of files do not fully use the benefits of memory-mapping:
Formatted binary files like HDF or TIFF that require customized readers are not good for memory-mapping. Describing the data contained in these files can be a very complex task. Also, you cannot access data directly from the mapped segment, but must instead create arrays to hold the data.
Text or ASCII files require that you convert the text in the mapped region to an appropriate type for the data to be meaningful. This takes up additional address space.
Files that are larger than several hundred megabytes in size consume a significant amount of the virtual address space needed by MATLAB to process your program. Mapping files of this size may result in MATLAB reporting out-of-memory errors more often. This is more likely if MATLAB has been running for some time, or if the memory used by MATLAB becomes fragmented.
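To make the text-file case concrete, the sketch below maps a short ASCII file as bytes and then converts it to numbers. The file name and contents are made up for the example. Note that both the character array and the numeric result are new arrays, separate from the mapped region.

```matlab
% Write a short ASCII file (hypothetical name and contents).
filename = fullfile(tempdir, 'mmap_text_demo.txt');
fid = fopen(filename, 'w');
fwrite(fid, uint8('3.14 2.71 1.41'), 'uint8');
fclose(fid);

% The map exposes raw bytes, not numbers.
m = memmapfile(filename);          % default Format is 'uint8'

% Convert the bytes to text, then parse the text; each step makes a copy.
txt = char(m.Data');               % row vector of characters
values = sscanf(txt, '%f');        % column vector of doubles

clear m
delete(filename);
```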
Maximum Size of a Memory Map
Due to limits set by the operating system and MATLAB, the maximum amount of data you can map with a single instance of a memory map is 2 gigabytes on 32-bit systems, and 256 terabytes on 64-bit systems. If you need to map more than this limit, you can either create separate maps for different regions of the file, or you can move the window of one map to different locations in the file.
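The second approach, moving the window of one map, can be sketched by changing the Offset property of an existing memmapfile object. The file name and the window size of 100 elements are assumptions chosen to keep the example small.

```matlab
% Create a file holding 300 doubles (hypothetical file name).
filename = fullfile(tempdir, 'mmap_window_demo.dat');
fid = fopen(filename, 'w');
fwrite(fid, 1:300, 'double');
fclose(fid);

elementsPerWindow = 100;
bytesPerElement = 8;    % size of one double

% One map that covers a single window at a time.
m = memmapfile(filename, 'Format', 'double', 'Repeat', elementsPerWindow);

% Slide the window through the file by updating Offset (in bytes).
windowSums = zeros(1, 3);
for k = 1:3
    m.Offset = (k - 1) * elementsPerWindow * bytesPerElement;
    windowSums(k) = sum(m.Data);
end

clear m
delete(filename);
```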
Byte Ordering
Memory-mapping works only with data that have the same byte
ordering scheme as the native byte ordering of your operating system.
For example, because both Linus Torvalds' Linux® and Microsoft® Windows® systems
use little-endian byte ordering, data created on a Linux system
can be read on Windows systems. You can use the computer function
to determine the native byte ordering of your current system.
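For instance, the third output of the computer function reports the byte ordering as 'L' (little-endian) or 'B' (big-endian). The variable names below are illustrative.

```matlab
% Query the native byte ordering of the current system.
[~, ~, endian] = computer;

% 'L' means little-endian, 'B' means big-endian.
isLittleEndian = strcmp(endian, 'L');
```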