Why does Matlab work with any spatial input dimensions in an ONNX network, but other programming languages do not?
Hello,
I work with an ONNX network to segment nerves in biological volume images. The model works with a 2.5-dimensional approach, i.e., to detect nerves in an image, additional information from 4 images above and below the target image is used from the volume. In other words, a stack of 9 images is used to segment the nerves in the middle image.
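As a rough illustration of the 2.5-dimensional input described above, the sketch below lists which slices of the volume would form the 9-channel stack for a given target slice. The function name `stack_indices` and the clamping rule at the volume boundary are assumptions made for this example; the post does not say how the first and last slices are handled.

```python
def stack_indices(z, depth, half_window=4):
    """Return the slice indices that form the input stack for target slice z.

    Indices that fall outside the volume are clamped to the nearest valid
    slice. This boundary rule is an assumption, not something from the post.
    """
    if not 0 <= z < depth:
        raise IndexError(f"target slice {z} is outside a volume of depth {depth}")
    return [min(max(k, 0), depth - 1)
            for k in range(z - half_window, z + half_window + 1)]


# Target slice 10 in a 100-slice volume: four neighbours on each side.
print(stack_indices(10, 100))  # [6, 7, 8, 9, 10, 11, 12, 13, 14]

# Near the start of the volume the missing neighbours repeat slice 0.
print(stack_indices(1, 100))   # [0, 0, 0, 0, 1, 2, 3, 4, 5]
```

Whatever boundary rule is used, the stack always has 9 entries, which matches the channel size the model insists on.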
The images in our volumes often have more than 2000x2000 pixels. The model was trained by an external person and 512x512x9 blocks (spatial, spatial, channel) were used. This person told us that the input must also be this size. So, we would first have to split our images into 512x512 tiles, segment them with the ONNX model, and then recombine them.
However, this procedure usually results in artifacts at the edges of the tiles in the nerve segmentation.
While trying a few things to solve this problem, I once accidentally gave the model a 1024x512x9 block as input. Surprisingly, this did not produce an error message, and an output with segmented nerves was generated. It also didn't look as if two blocks had been created automatically, because there was no artifact in the middle.
Then, I also tried it on the full-size images, and it worked without any problems. However, I get an error if I use blocks with a channel size other than 9 as input.
It also doesn't look like an automatic resize, as some nerve structures are way too fine and would be lost when resizing from full size to 512x512.
We have also tested this ONNX model in Mathematica and Python, but both give an error if we deviate from 512x512.
I'm also not sure if it's this ONNX model specifically, because even in the Matlab example from LINK with the peppers, I don't get any errors if I don't resize the image. (Funny note: If you resize the pepper image to 1024x1024, the predicted label will be "velvet" instead of "bell pepper.")
So my question is: What is Matlab doing there? Or is there any possibility to look at what is going on there?
Accepted Answer
More Answers (1)
Alex Taylor
on 21 Sep 2024
Edited: Alex Taylor
on 21 Sep 2024
To add to @Conor Daly's answer, I just want to note that the original U-net paper (https://arxiv.org/pdf/1505.04597) describes a tiled training, full-sized inference strategy. In the paper, they describe using multiple inference calls and a "valid" convolution strategy to avoid introducing seams at tile boundaries. This allows U-net to perform segmentation on arbitrarily large images.
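To make the seam-free tiling idea concrete, here is a minimal one-dimensional sketch. It is not the U-net implementation: it uses a single "valid" convolution layer, and the function names and the test signal are made up for this example. Each input tile is read with a halo of `len(kernel) - 1` extra samples, so the outputs of neighbouring tiles line up exactly and match a single pass over the whole signal. In a real network the halo would have to cover the receptive field of all layers together.

```python
def valid_conv1d(signal, kernel):
    """'Valid' convolution (cross-correlation form): no padding is added,
    so the output is shorter than the input by len(kernel) - 1."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def tiled_valid_conv1d(signal, kernel, out_tile):
    """Compute the same result tile by tile, using overlapping input tiles."""
    halo = len(kernel) - 1
    total_out = len(signal) - halo
    result = []
    for start in range(0, total_out, out_tile):
        stop = min(start + out_tile, total_out)
        # Outputs [start, stop) depend on input samples [start, stop + halo).
        tile = signal[start:stop + halo]
        result.extend(valid_conv1d(tile, kernel))
    return result


signal = [float((7 * i) % 11) for i in range(50)]
kernel = [1.0, -2.0, 3.0, -2.0, 1.0]

full = valid_conv1d(signal, kernel)
tiled = tiled_valid_conv1d(signal, kernel, out_tile=8)
print(full == tiled)  # True: no seams at the tile boundaries
```

Tiling without the halo (non-overlapping input tiles with padding at each tile edge) is what produces different values near the tile borders, which is one plausible source of the edge artifacts described in the question.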
Because fully convolutional networks (FCNs) like U-net are defined entirely with convolution and elementwise operations, you can also do what you described with your segmentation network and run the entire inference in a single call: convolution and elementwise operations are well-defined on a larger spatial domain. The only computational restriction for segmentation with FCNs is whether the entire inference call fits in CPU/GPU memory at one time. Otherwise you need multiple inference calls, as you described.
Tiled training tends to work best when a local amount of information is sufficient to define the segmentation (e.g. cell segmentation in microscopy), as opposed to situations where knowledge of the global scene during training is useful (e.g. in driving scenes, where the sky is generally up and the road is generally closer to the bottom of the frame).
It sounds to me like it may be valid for your segmentation network to rely on a tiled training, full-sized inference approach, and I just wanted to note that this kind of workflow is well known and often useful.
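The point about convolutions being well-defined on any spatial size can be seen in a small pure-Python sketch. This is an illustration only, not the ONNX model or any library's implementation; `conv2d_same`, `make_image`, and the weights are invented for the example. One fixed set of 9-channel 3x3 kernels is applied to inputs of several heights and widths. The output size simply follows the input size, while a different channel count is rejected because the weights are tied to 9 channels.

```python
def conv2d_same(image, weights):
    """Zero-padded 'same' convolution producing one output channel.

    image:   list of C channels, each an H x W list of lists
    weights: list of C kernels, each kh x kw with odd kh and kw
    """
    if len(image) != len(weights):
        raise ValueError(f"expected {len(weights)} channels, got {len(image)}")
    height, width = len(image[0]), len(image[0][0])
    kh, kw = len(weights[0]), len(weights[0][0])
    out = [[0.0] * width for _ in range(height)]
    for c, kernel in enumerate(weights):
        for y in range(height):
            for x in range(width):
                acc = 0.0
                for dy in range(kh):
                    for dx in range(kw):
                        yy, xx = y + dy - kh // 2, x + dx - kw // 2
                        # Positions outside the image count as zero padding.
                        if 0 <= yy < height and 0 <= xx < width:
                            acc += image[c][yy][xx] * kernel[dy][dx]
                out[y][x] += acc
    return out


def make_image(channels, height, width):
    """Build a small synthetic multi-channel image (made-up values)."""
    return [[[float((c + y + x) % 5) for x in range(width)]
             for y in range(height)]
            for c in range(channels)]


# One fixed set of weights: 9 input channels, 3x3 kernels.
weights = [[[1.0 / 81.0] * 3 for _ in range(3)] for _ in range(9)]

# The same weights work for any spatial size.
for h, w in [(8, 8), (16, 8), (20, 30)]:
    out = conv2d_same(make_image(9, h, w), weights)
    print((h, w), "->", (len(out), len(out[0])))

# A different channel count does not match the weights.
try:
    conv2d_same(make_image(3, 8, 8), weights)
except ValueError as err:
    print("channel mismatch:", err)
```

This mirrors the behaviour reported in the question: spatial size is free, but the channel size must stay at 9.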