Running more parallel lightweight threads/processes/functions than available workers

Parallel computing question: what architectural approaches are available in MATLAB when the number of desired parallel processes exceeds the number of workers allowed in a parallel pool?

I'd like to set up a flexible set of signal processing chains for post-processing raw radar data (no real-time requirements). I have multiple streams (channels) of incoming raw data, packetized into messages by each radar transmission event. Each stream needs to pass through a multi-step signal processing chain where the output of one step feeds the input of one or more others. Each source stream needs to go through a different progression of steps, and some processing steps need to collect input messages from multiple sources.

I envision a config file defining all the processors that are needed, identifying which source streams they take as input and which output streams they will produce. This is a classic publish/subscribe network. Perhaps a master process instantiates and registers the processors and passes the messages between them. I also envision the processors running continuously, waiting for the next message to process. One reason for this approach is that some steps may require multiple messages from one or more source streams before they can produce an output message (a simple example would be a sliding median over N events). Access to a stream's history could enable this, but would be inefficient. Perhaps a class object with storage and the processor as a method would work, or a function with persistent memory.

The problem I'm having is understanding how to make this work within the confines of MATLAB's parallel processing constructs when the total number of signal processor elements is far greater than the number of workers allowed in a parallel pool. They are individually lightweight processes, but I need a lot of them running in parallel, preferably utilizing my computer's resources efficiently. Broad-stroke solution approaches are welcome.
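For concreteness, the config file could be as simple as a struct array mapping each processor to its function and its subscribed/published streams. This is only an illustration; all names and field choices here are hypothetical:

```matlab
% Illustrative pub/sub config: each entry names a processor, the function
% that implements it, and the streams it subscribes to / publishes.
config = struct( ...
    'name',    {"detector1", "medianFilt", "combiner"}, ...
    'fcn',     {@detectPulses, @slidingMedian, @combineChannels}, ...
    'inputs',  {{'rawCh1'}, {'det1'}, {'med1', 'rawCh2'}}, ...
    'outputs', {{'det1'}, {'med1'}, {'combined'}});
```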

Answers (1)

Nithin
Nithin on 31 Oct 2025
Hi @Ray,
No, you cannot run more parallel threads, processes, or functions simultaneously than there are workers in MATLAB's parallel pool. However, you can schedule more tasks than you have workers: MATLAB queues them, executes as many in parallel as there are workers, and starts the rest as workers become available.
To implement the required architecture and work within MATLAB’s constraint that only as many tasks as there are workers can run in parallel, you can use a master scheduler with "parfeval" to flexibly manage a much larger logical network of processing elements.
In this approach, you create a master process that reads your configuration file to build a dependency graph representing all processors and their required inputs and outputs. The master process maintains a queue of tasks that are ready to run, meaning all their input data is available. As workers become free, the master submits new processing tasks to the parallel pool using "parfeval", which allows you to queue far more tasks than there are workers.
Each processor can be implemented as a stateless function or a stateful object, handling any required history or multi-message logic internally. When a task completes, the master collects its output, updates the dependency graph, and checks whether any downstream processors now have all their inputs satisfied, queuing those for execution. This design efficiently utilizes available workers, dynamically schedules processing steps, and allows you to model complex publish/subscribe or DAG-style processing chains, even when the number of logical processors far exceeds the number of available parallel workers.
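A minimal sketch of that master scheduler follows. It assumes "procs" is a struct array loaded from your config file with fields "fcn" (function handle), "inputs", and "outputs" (cell arrays of stream names); those field names, and the single-output convention, are illustrative:

```matlab
% Master scheduler sketch: submit each processor once its inputs exist,
% let parfeval queue tasks beyond the worker count, and publish outputs
% as tasks complete so downstream processors become runnable.
pool      = gcp;
streams   = containers.Map;                 % stream name -> produced data
submitted = false(1, numel(procs));
futures   = parallel.FevalFuture.empty;
futProc   = [];                             % futures(i) came from procs(futProc(i))

while ~all(submitted)
    % Submit every not-yet-submitted processor whose inputs are all available.
    for k = find(~submitted)
        if all(isKey(streams, procs(k).inputs))
            args = values(streams, procs(k).inputs);
            futures(end+1) = parfeval(pool, procs(k).fcn, 1, args{:}); %#ok<AGROW>
            futProc(end+1) = k;             %#ok<AGROW>
            submitted(k) = true;
        end
    end
    % Block for the next completed task, publish its output stream, then
    % loop back to check which downstream processors are now runnable.
    [idx, out] = fetchNext(futures);
    streams(procs(futProc(idx)).outputs{1}) = out;
end
wait(futures);                              % drain any remaining queued tasks
```

This treats each processor as a one-shot task per message set; for continuously arriving messages you would re-enqueue a processor each time fresh inputs for it appear, but the queue-and-fetchNext pattern is the same.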
In summary, this approach maximizes resource utilization while supporting flexible, scalable processing chains in MATLAB. For more information, refer to the following documentation: https://www.mathworks.com/help/matlab/ref/parfeval.html
  1 Comment
Ray
Ray on 3 Nov 2025
Thanks for the thoughtful reply. If I'm understanding your solution, the master process would need to generate, pass, receive, and hold all of the data from all of the signal processor blocks for each iteration of their execution. As a simple example, say one of my processors computes a sliding average of N samples. The master process would wait until it had collected the first N outputs from the previous processor, then feed inputs 1 to N into the averaging processor (a parfeval call). The master process then gets the average back (presumably to be passed on to the next process). Then, upon receiving the next averaging input, the master process gathers inputs 2 to N+1 and feeds the averaging processor again.

I'm concerned about the I/O overhead: resending N-1 inputs on every subsequent call to the averaging processor, and moving the data from worker 1 to the master process and then to worker 2 rather than directly. In my case, each of my N data values isn't just a scalar but a ~10,000-element vector of complex doubles. I can't use a persistent variable in the processor function to store the already-passed data because the function isn't guaranteed to run on the same worker again. Passing a class object with state only masks the fact that you are copying large amounts of data to the worker on each call. If MATLAB supported shared memory across workers on the same computer, then I could imagine lightweight function calls that reference common memory blocks, but I don't see how to implement that.
I'm considering how to group processors of a subchain together to run on a single worker so that stateful functions/objects can be used locally.
Any clarifying thoughts or other suggestions are welcome.
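One way to realize the subchain-grouping idea is a long-lived parfeval task per subchain: the task owns its state (e.g., the sliding-window buffer) and exchanges messages with the client over parallel.pool.PollableDataQueue objects, so the large history never leaves the worker. A sketch, assuming a hypothetical runSubchain function and a sliding average of width N; message shapes and names are illustrative:

```matlab
% Client side: start the stateful subchain on a worker and stream messages to it.
resQ    = parallel.pool.PollableDataQueue;   % worker -> client results
f       = parfeval(@runSubchain, 0, resQ, 8);
workerQ = poll(resQ, 30);                    % first message back is the worker's queue
send(workerQ, myMessage);                    % stream messages in (one vector per event)
[avg, ok] = poll(resQ, 30);                  % poll sliding-average outputs back
send(workerQ, []);                           % empty message = shutdown signal

function runSubchain(resQ, N)
    % Create a queue on this worker and hand its handle back to the client.
    workerQ = parallel.pool.PollableDataQueue;
    send(resQ, workerQ);
    buf = [];                                % worker-local history for the window
    while true
        [msg, ok] = poll(workerQ, 1);        % wait up to 1 s for the next message
        if ~ok, continue, end
        if isempty(msg), break, end          % shutdown
        buf = [buf, msg(:)];                 % msg is one ~10000x1 complex vector
        if size(buf, 2) > N, buf(:, 1) = []; end
        if size(buf, 2) == N
            send(resQ, mean(buf, 2));        % publish the current sliding average
        end
    end
end
```

With one such task per subchain, only new messages cross process boundaries, which addresses the resend-N-1-inputs concern; the trade-off is that each long-lived subchain occupies a worker for its lifetime, so the number of concurrently running subchains is still bounded by the pool size.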


Release

R2025b
