Confusion matrices are used to evaluate the quality of a classifier's output on a data set. The diagonal elements count the points for which the predicted label equals the true label, while the off-diagonal elements count the points the classifier mislabeled. The higher the diagonal values of the confusion matrix relative to the off-diagonal ones, the better, since that indicates many correct predictions.
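To make the diagonal/off-diagonal reading concrete, here is a minimal sketch in plain Python. The labels and predictions are made-up illustration data, and the helper `confusion_matrix` is a hypothetical stand-in written for this example (in practice, `sklearn.metrics.confusion_matrix` computes the same table, with rows as true labels and columns as predicted labels).

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) pairs. Rows are true labels, columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Made-up example data (assumption, not from the original answer).
y_true = ["cat", "cat", "dog", "dog", "dog", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "cat", "bird"]
labels = ["bird", "cat", "dog"]

cm = confusion_matrix(y_true, y_pred, labels)

# Diagonal entries are correct predictions; everything else is a mislabel.
correct = sum(cm[i][i] for i in range(len(labels)))
accuracy = correct / len(y_true)

for label, row in zip(labels, cm):
    print(label, row)
print("correct:", correct, "of", len(y_true))
```

Here the diagonal sums to 4 out of 6 points, and the two off-diagonal counts show a cat predicted as a dog and a dog predicted as a cat.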
Train, test, and validation confusion matrices:
They differ only in the data used to build them. The train confusion matrix compares the predicted values with the actual values on the training data; the validation and test confusion matrices do the same on the validation and test data, respectively.
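A small sketch of that idea, assuming a toy one-feature binary problem: the model is fitted once on the training split, and the same fitted model is then scored separately on each split. The data, the threshold "classifier", and the helper names (`fit_threshold`, `predict`, `confusion_matrix`) are all hypothetical and only serve the illustration.

```python
def confusion_matrix(y_true, y_pred, n_classes=2):
    """Rows are true labels, columns are predicted labels."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def fit_threshold(xs, ys):
    """Toy classifier: threshold halfway between the two class means."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2


def predict(xs, threshold):
    return [1 if x > threshold else 0 for x in xs]


# Made-up splits (assumption): each is (features, actual labels).
splits = {
    "train": ([1.0, 2.0, 3.0, 6.0, 7.0, 8.0], [0, 0, 0, 1, 1, 1]),
    "validation": ([2.5, 4.0, 5.5, 7.5], [0, 1, 0, 1]),
    "test": ([1.5, 3.5, 5.0, 9.0], [0, 0, 1, 1]),
}

# Fit on the training data only.
threshold = fit_threshold(*splits["train"])

# One confusion matrix per split, each from that split's own
# predicted and actual values.
matrices = {
    name: confusion_matrix(ys, predict(xs, threshold))
    for name, (xs, ys) in splits.items()
}

for name, cm in matrices.items():
    print(name, cm)
```

Comparing the three matrices is what makes them useful: a train matrix that is nearly diagonal next to a validation or test matrix with large off-diagonal counts suggests the model does not generalize well.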
You may also refer to the answer to this question: