Data Labeling Quality Control
There are a number of processes that we can use to perform quality assurance, depending on the type of data.
After discussing with the domain expert, we will coordinate with the Vulcan heroes to organise a Knowledge Sharing Session, to make sure all Vulcan heroes are well acquainted with the domain.
Annotation Consensus is the repeated annotation of exactly the same data by two or more Vulcan heroes. Labels are considered valid when their confidence factor surpasses the criteria preferred by the domain expert. The confidence factor depends on the number of Vulcan heroes assigned to the task: the more Vulcan heroes reach the same consensus, the higher the confidence value of the data set. Invalid labels are held out for subsequent investigation by the domain experts.
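As a rough illustration, the confidence factor can be treated as the share of Vulcan heroes who agree on the majority label. The sketch below is a minimal example, assuming a hypothetical acceptance threshold of 0.8 standing in for the criterion chosen by the domain expert.

```python
from collections import Counter

def consensus_label(annotations, min_confidence=0.8):
    """Return (label, confidence, is_valid) for one item labeled by several Vulcan heroes.

    `annotations` is the list of labels given to the same item, one per hero.
    Confidence here is simply the fraction of heroes agreeing on the majority label;
    the 0.8 threshold is a placeholder for the domain expert's actual criterion.
    """
    counts = Counter(annotations)
    label, votes = counts.most_common(1)[0]
    confidence = votes / len(annotations)
    return label, confidence, confidence >= min_confidence

# Example: three heroes label the same item.
print(consensus_label(["cat", "cat", "dog"]))  # ('cat', ~0.67, False) -> held out for review
print(consensus_label(["cat", "cat", "cat"]))  # ('cat', 1.0, True)   -> accepted
```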
In this scenario, every Vulcan hero receives ground truth data inserted at random, unbiased positions during labeling, to test how well each Vulcan hero performs on the data set. The results from the ground truth set are used to measure labeling quality and to provide performance feedback to the Vulcan heroes. The ground truth data set is designed in conjunction with the client's domain experts.
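A minimal sketch of how such a check could work, assuming a hypothetical 5% insertion rate and simple dictionary bookkeeping (both are illustrative choices, not the actual pipeline):

```python
import random

def build_task_queue(work_items, ground_truth_items, insert_rate=0.05, seed=0):
    """Mix hidden ground-truth items into a hero's queue at random positions."""
    rng = random.Random(seed)
    n_insert = min(len(ground_truth_items), max(1, int(len(work_items) * insert_rate)))
    queue = list(work_items) + rng.sample(ground_truth_items, n_insert)
    rng.shuffle(queue)
    return queue

def score_on_ground_truth(hero_labels, ground_truth):
    """Measure a hero's accuracy on the ground-truth items only.

    `hero_labels` maps item_id -> the hero's label; `ground_truth` maps item_id ->
    the expert label. Items not present in `ground_truth` are ordinary work items.
    """
    gt_items = [i for i in hero_labels if i in ground_truth]
    correct = sum(hero_labels[i] == ground_truth[i] for i in gt_items)
    return correct / len(gt_items) if gt_items else None
```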
The data is divided into parts; the first part of the labeled data is sent to the domain experts for feedback. We then use that feedback as a guideline to improve the labeling so that the data set meets the domain experts' expectations.
After the data set is completely labeled, we develop an AI to detect unusual labeling in the data set via unsupervised learning. Although the actual process differs by data type, roughly speaking we group the data into clusters, so that data points with the same labels are expected to cluster together. A small number of data points separated from the others could indicate either mislabeling or genuine peculiarity; either case triggers a deeper investigation by our scientists.
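One simplified way to express this check, assuming numeric feature vectors and using distance from each label's own centroid as a stand-in for whichever clustering method actually suits the data type:

```python
import numpy as np

def flag_unusual_labels(features, labels, z_threshold=3.0):
    """Flag items whose features sit unusually far from their own label's centroid.

    `features` is an (n, d) array, `labels` a length-n array of assigned labels.
    The z-score threshold is an illustrative choice; flagged items would go to
    the scientists for deeper investigation.
    """
    labels = np.asarray(labels)
    flags = np.zeros(len(labels), dtype=bool)
    for lab in np.unique(labels):
        idx = np.where(labels == lab)[0]
        centroid = features[idx].mean(axis=0)
        dists = np.linalg.norm(features[idx] - centroid, axis=1)
        if dists.std() == 0:
            continue
        z = (dists - dists.mean()) / dists.std()
        flags[idx[z > z_threshold]] = True  # far from its own cluster -> review
    return flags
```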
This is a way to test whether the labels from each Vulcan hero are consistent. We train an AI model on each Vulcan hero's labels and then cross-check each model against the labels from the other Vulcan heroes. If any model performs worse than the others, we re-examine the respective labels.
From the figure, it can be seen that the AI models from Vulcan heroes B and C perform consistently, but when tested with labels from Vulcan hero A, both of these models show exceptionally low performance, and A's model performs poorly when tested with B's and C's labels. It can therefore be concluded that A's labels differ from the others and will be subject to investigation.
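A sketch of this cross-check, assuming scikit-learn is available and using LogisticRegression purely as an illustrative base model:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def cross_check_heroes(features_by_hero, labels_by_hero):
    """Train one model per Vulcan hero's labels and test it on every other hero's labels.

    `features_by_hero` and `labels_by_hero` map hero name -> the features and labels
    for the items that hero annotated. Returns (trained_on, tested_on) -> accuracy;
    consistently low scores involving one hero point to divergent labels.
    """
    models = {
        hero: LogisticRegression(max_iter=1000).fit(features_by_hero[hero], labels_by_hero[hero])
        for hero in labels_by_hero
    }
    scores = {}
    for trained_on, model in models.items():
        for tested_on in labels_by_hero:
            if tested_on == trained_on:
                continue
            preds = model.predict(features_by_hero[tested_on])
            scores[(trained_on, tested_on)] = accuracy_score(labels_by_hero[tested_on], preds)
    return scores
```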
We use AI to assist the Vulcan heroes and make their work easier. Generally, we can apply active machine learning so that the AI helps pre-label the data set.
Active machine learning can be done by having the Vulcan heroes label about 10% of the total data set, using this subset to develop a basic AI, and letting this AI predict the rest of the data set. The AI returns a label for each remaining item along with a confidence score, so we can prioritise the data set beginning with the low-confidence items.
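A minimal sketch of this loop, again assuming numeric features and a scikit-learn-style classifier as the basic AI:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def prioritise_by_confidence(features, seed_idx, seed_labels):
    """Active-learning sketch: train on the ~10% already labeled, then rank the rest.

    `seed_idx` are indices of items the Vulcan heroes have labeled and `seed_labels`
    are their labels; LogisticRegression stands in for whatever basic model suits
    the data type. Returns the remaining indices ordered from least to most
    confident prediction, together with the model's suggested labels.
    """
    model = LogisticRegression(max_iter=1000).fit(features[seed_idx], seed_labels)
    rest = np.setdiff1d(np.arange(len(features)), seed_idx)
    probs = model.predict_proba(features[rest])
    confidence = probs.max(axis=1)   # confidence of the predicted label
    order = np.argsort(confidence)   # lowest confidence first
    return rest[order], model.predict(features[rest])[order]
```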
In addition, we can use an AI trained on similar data sets to suggest labels to the Vulcan heroes, making their work easier and improving their efficiency.