So I am training a patch-based CNN (with patches cropped out of images), and for now, let's assume that every minibatch (size = 6) fed to the network looks like this:
[patch1_image1.png, patch2_image1.png, patch1_image2.png, patch2_image2.png, patch1_image3.png, patch2_image3.png]
where patch1_image1.png and patch2_image1.png belong to image 1,
patch1_image2.png and patch2_image2.png belong to image 2, etc.
My goal is to average the softmax scores of all the patches belonging to the same image (i.e. average the scores of patch1_image1.png and patch2_image1.png together, and so on) and get a new label prediction based on this averaged softmax. I was able to do this "manually" after training the network, which gave really promising results. However, I would like to implement this during the training process and get a new "fused" prediction at each iteration. I also don't know whether I can apply the same fusion to my validation images. Any thoughts/comments? Should I add a new average pooling layer right before the last output layer, or is there another way to do this?
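For what it's worth, here is a minimal sketch of the fusion step itself, assuming PyTorch and assuming (as in your batch layout above) that patches of the same image sit next to each other in the minibatch. The function name `fuse_patch_logits` and the `patches_per_image` parameter are just illustrative, not from any particular library:

```python
import torch
import torch.nn.functional as F

def fuse_patch_logits(logits, patches_per_image=2):
    """Average softmax scores over consecutive patches of the same image.

    logits: tensor of shape (batch, num_classes), where patches belonging
    to the same image occupy adjacent rows, as in the minibatch layout
    [patch1_image1, patch2_image1, patch1_image2, ...].
    """
    probs = F.softmax(logits, dim=1)          # per-patch softmax scores
    b, c = probs.shape
    # Group adjacent patches by image, then average within each group.
    fused = probs.view(b // patches_per_image, patches_per_image, c).mean(dim=1)
    return fused                               # shape: (num_images, num_classes)

# Minibatch of 6 patches (3 images x 2 patches), 4 classes.
logits = torch.randn(6, 4)
fused = fuse_patch_logits(logits, patches_per_image=2)
preds = fused.argmax(dim=1)                    # one fused prediction per image
```

Since this is all differentiable tensor reshaping, you could in principle apply it during training (computing the loss on `fused` against one label per image) as well as at validation time, rather than adding a pooling layer to the architecture itself.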
Thank you very much! :)