Limit total GPU memory usage

Jesse Ziser on 23 Jun 2017
Edited: Joss Knight on 6 Jul 2017
I'm using the MATLAB Parallel Computing Toolbox to do work on GPUs. I need to find a way to place a limit on the total GPU memory usage of a MATLAB process.
I know about feature('GpuAllocPoolSizeKb', X), and that's what I'm using for the time being, but it only limits the pool size. To cap my total (pool + non-pool) usage at a specific number, I would have to know the maximum non-pool usage of my MATLAB code ahead of time in order to choose the right pool size, and that maximum turns out to be very difficult to estimate beforehand.
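For illustration, here is roughly what the workaround looks like (the 256 MB cap is just an example value, not a recommendation):

    % Cap the size of MATLAB's GPU memory pool at 256 MB (value is in KB).
    % This bounds only the pool; raw (non-pool) allocations remain unbounded.
    feature('GpuAllocPoolSizeKb', 256*1024);

    g = gpuDevice;   % query the currently selected GPU
    fprintf('Available GPU memory: %.0f MB\n', g.AvailableMemory/2^20);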
The reason I need to limit GPU memory usage is that I am running many simultaneous MATLAB processes on a large multicore server with many GPUs. Without a limit, the MATLAB processes sharing one GPU all compete with one another, and since MATLAB uses "lazy" garbage collection, it doesn't take long before a few processes squeeze out another, causing it to run out of memory and crash. This is completely unnecessary, since most of the memory they hold is no longer in use; it just represents freed allocations that have not yet been garbage collected.
My group is trying to determine whether the Parallel Computing Toolbox would be a good purchase for other groups in our lab too, and this issue is overwhelmingly our biggest headache so far. It has caused a great deal of frustration. Hopefully a better solution is possible!
Thank you

Answers (1)

Joss Knight on 23 Jun 2017
For several releases now, MATLAB has released variables as soon as they go out of scope, are overwritten, cleared (using clear or clearvars), or set to empty. Even if you have MATLAB R2015a or earlier, you should still find that overwriting gpuArray variables or setting them to empty aggressively releases GPU memory (back to the pool, or back to the system if the pool is full). I'd be interested to see any examples you have of gpuArrays not being released when they are no longer referenced.
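For instance, each of these should hand the device memory back promptly (a minimal sketch; the array sizes are arbitrary):

    A = gpuArray.rand(4096);     % ~128 MB of doubles on the device
    A = [];                      % setting to empty releases the memory

    B = gpuArray.rand(4096);
    clear B                      % clearing the variable releases it

    C = gpuArray.rand(4096);
    C = gpuArray.rand(2048);     % overwriting releases the previous allocation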
There is no trivial way to restrict the GPU memory available to each MATLAB process. My best suggestion would be to write a gpuArray allocator class that keeps track of all the allocations. However, for task-parallel work you may find it's just as easy to make your applications fault tolerant to GPU memory shortages: for instance, you might catch parallel:gpu:array:OOM errors and handle them (perhaps by waiting for memory to become available), as in the sketch below.
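A minimal sketch of that retry pattern (the retry count, pause length, and workload are arbitrary example values):

    maxAttempts = 10;
    for attempt = 1:maxAttempts
        try
            A = gpuArray.rand(8192);       % the allocation that may fail
            result = gather(sum(A(:)));    % the actual GPU work
            break                          % success - stop retrying
        catch err
            if strcmp(err.identifier, 'parallel:gpu:array:OOM') ...
                    && attempt < maxAttempts
                pause(5);                  % wait for memory to come available
            else
                rethrow(err);              % another error, or out of retries
            end
        end
    end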
It isn't usual to configure a machine so that multiple processes share the same GPU, which is why I don't have a better answer for you. The 'normal' approach would be to use the NVIDIA tools to restrict GPU access to a single process per node and divide your cluster or your tasks appropriately; or you can put the GPU in exclusive mode and force other MATLABs to wait for the GPU to become available if they need it.
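For reference, the exclusive-mode change is a one-liner with NVIDIA's tools (shown here invoked from MATLAB for convenience; it normally requires administrator rights):

    % Put GPU 0 into exclusive-process compute mode so that only one
    % process at a time can create a context on it (needs root/admin).
    system('nvidia-smi -i 0 -c EXCLUSIVE_PROCESS');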
The point is that not only does CUDA provide no means of partitioning memory on a per-process basis, it also has a single-process execution model: unless you're using NVIDIA's Multi-Process Service (MPS), GPU kernels from different processes are launched sequentially. Even if you are using MPS, the GPU is not really efficient unless it is fully utilised, so in a typical scenario work from each process will still effectively be serialised. This isn't really anything to do with MATLAB; there just isn't anything more than very crude system tooling to help with this. Although, when a single client MATLAB is in control of a whole node, there are some places we could do better at dividing up work - suggestions welcome!
The word, as I understand it from NVIDIA, is that they don't have any kind of virtualization technology for compute similar to what they have for graphics. I'm seeing this kind of use case more often, though, so maybe they will start to think about how to do better in that regard.
  4 Comments
Joss Knight on 6 Jul 2017
Okay, well, there's too much here to answer well in a MATLAB Answers thread. I suggest you contact support so we can investigate your problem properly.
I think I understand what you mean by 'garbage collection': you mean MATLAB releasing its pooled GPU memory. That happens whenever you call reset(gpuDevice), or deselect and reselect your GPU, e.g. with gpuDevice([]); gpuDevice(1);. Perhaps that will help you. You should be able to change the pool size safely once you've done this. Personally, however, I'd just keep the pool size low and let the processes make more raw allocations, rather than trying to tune it dynamically.
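In code, that looks something like this (a sketch; note that resetting the device invalidates any existing gpuArray variables, and 64 MB is just an example pool size):

    reset(gpuDevice);            % return all pooled memory to the system
    % ...or, equivalently, deselect and reselect the device:
    gpuDevice([]);               % deselect: releases this process's GPU memory
    gpuDevice(1);                % reselect GPU 1

    % The pool size can be retuned safely at this point.
    feature('GpuAllocPoolSizeKb', 64*1024);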
But are you certain the pool is the fundamental problem here? If enough processes try to run a GPU operation at the same time, and the data size is sufficient to make using the GPU worthwhile, then you can run out of memory regardless. Even writing your own CUDA, or using Unified Memory, you couldn't avoid that eventuality: you can't prevent one process requesting memory before another has released it.
As for your claim that you get a speed-up running GPU operations simultaneously on multiple MATLAB processes, it's difficult to respond without knowing what you're doing. A lot of the GPU functions do a significant amount of work on the CPU, so maybe the benefit is really coming from CPU parallelism.
Joss Knight on 6 Jul 2017
Edited: Joss Knight on 6 Jul 2017
I don't quite understand your last point about having one MATLAB process be in charge of a whole node. If you had multiple GPUs then you could assign each worker to a different one and they wouldn't interfere (see the sketch below). What I was getting at, admittedly from a place of complete ignorance of how your environment is managed, was having two completely separate clusters, one without GPUs and one with, sharing resources, with the latter having only one worker per node. A user wanting to run GPU code would have to request workers from the GPU-enabled cluster.
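To make the one-worker-per-GPU case concrete, a minimal sketch (assuming at least as many GPUs as workers):

    parpool(gpuDeviceCount);        % one worker per GPU
    spmd
        gpuDevice(labindex);        % worker k claims GPU k, so workers don't interfere
        A = gpuArray.rand(4096);    % this work runs on the worker's own GPU
        s = gather(sum(A(:)));
    end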
Your parfor issue doesn't sound unlikely when your workers are all sharing a GPU. The fundamental answer to your question about the "right way" to parallelize GPU computation in MATLAB is, at the moment, to author highly vectorized, data-parallel MATLAB code that keeps the GPU continuously occupied and fully utilized from the host MATLAB. Doing this from multiple processes isn't something that any environment, MATLAB or otherwise, supports well. I'm hoping that NVIDIA will eventually provide virtualization environments for compute similar to what they do for graphics, restricting each process's access to GPU resources. This is something we are actively interested in ourselves, so I hope we'll have better answers in the future.
