Ground truth is real-world data used to train AI models and to test their outputs. Ground truth data is required for many AI applications, including automated driving and audio or speech recognition.
Ground truth data is essential for two stages in AI algorithm development:
- Model training: Ground truth data is used as training data, where the algorithm learns which features and solutions are appropriate for the specific application
- Model testing: Ground truth data is used as test data, where the trained algorithm's predictions are compared against the known labels to measure model accuracy
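The two stages can be illustrated with a minimal sketch in Python. It is an illustration under stated assumptions, not a MATLAB workflow: the sample values, the "quiet"/"loud" labels, the nearest-mean classifier, and all function names are hypothetical and chosen only to show ground truth being split into training data and test data.

```python
import random


def split_ground_truth(samples, test_fraction=0.25, seed=0):
    """Shuffle labeled samples and split them into training and test sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train_centroids(train):
    """Model training: learn one mean feature value per label."""
    sums, counts = {}, {}
    for value, label in train:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, value):
    """Return the label whose learned mean is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def accuracy(centroids, test):
    """Model testing: fraction of predictions that match the ground truth label."""
    correct = sum(predict(centroids, value) == label for value, label in test)
    return correct / len(test)


# Hypothetical ground truth: (measurement, human-assigned label) pairs.
samples = [(0.9, "quiet"), (1.1, "quiet"), (1.3, "quiet"), (0.7, "quiet"),
           (4.8, "loud"), (5.2, "loud"), (5.5, "loud"), (4.6, "loud")]

train, test = split_ground_truth(samples)
model = train_centroids(train)
print(f"test accuracy: {accuracy(model, test):.2f}")
```

The test samples are held out of training, so the reported accuracy reflects how well the model handles labeled data it has not seen.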
Ground truth data can come in many forms: image data, signal data, or text data (Figure 1). Obtaining ground truth data manually can be time-consuming; MATLAB® can expedite the process with labeler apps for image, signal, audio, and lidar applications.
How to Obtain Ground Truth Data
Ground truth labeling is required to generate ground truth data. Labeling is the process of assigning labels to raw data that characterize what that data means. The labeled output is required to train a supervised learning model, and more accurate labeling generally results in a more accurate model. Manual labeling can be time-consuming because many AI models require thousands or millions of labeled samples to generate accurate results.
The following labeler apps from MATLAB provide options to fully automate or semi-automate the labeling process, reducing the time required by manual labeling.
Image Labeler helps you label regions of interest in images, including pixel labels for semantic segmentation and bounding boxes for object detection workflows.
Using Signal Labeler, you can explore data, label attributes, regions of interest, and points through visualization and custom functions.
Lidar Labeler can create bounding boxes around 3D objects and provides automation techniques for clustering, ground plane removal, and tracking of point cloud data.
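The idea behind semi-automated labeling can be sketched in a few lines of Python. This is a generic illustration, not how the labeler apps are implemented: the threshold rule, the confidence score, and the function names are all assumptions. An automation step proposes a label for each raw sample, confident proposals are accepted, and the rest are queued for a person to label manually.

```python
def auto_label(value, threshold=3.0):
    """Hypothetical automation rule: propose a label with a crude confidence score."""
    label = "loud" if value >= threshold else "quiet"
    # Confidence grows with distance from the decision threshold, capped at 1.0.
    confidence = min(1.0, abs(value - threshold) / threshold)
    return label, confidence


def semi_automated_labeling(raw_values, min_confidence=0.5):
    """Accept confident proposals; queue the remaining samples for manual review."""
    accepted, needs_review = [], []
    for value in raw_values:
        label, confidence = auto_label(value)
        if confidence >= min_confidence:
            accepted.append((value, label))
        else:
            needs_review.append(value)
    return accepted, needs_review


accepted, needs_review = semi_automated_labeling([0.8, 2.9, 3.4, 5.1])
print("auto-labeled:", accepted)
print("needs manual review:", needs_review)
```

Raising `min_confidence` sends more samples to manual review, trading labeling time for label accuracy.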