Proof of Data Quality

In Bitcoin mining, Proof of Work (PoW) is used to validate transactions and mine new Bitcoin. Technically, it is a decentralized consensus mechanism that requires network members to spend computational effort solving a cryptographic puzzle, which makes it costly for any member to manipulate the system.

Similar in concept to Bitcoin's Proof of Work, Vulcan's Proof of Data Quality (PoDQ) is a process that requires Vulcan heroes to spend time organizing, managing, and labeling raw data sets to produce high-quality labeled data sets. Once a data set has been validated, Vulcan tokens are minted and distributed to Vulcan heroes according to the quantity and quality of the data sets they produced.

The more labeled data sets Vulcan heroes produce, the more effective Vulcan's AI models can become, and higher-quality AI models allow Vulcan to generate more revenue. As a result, the amount of Vulcan tokens minted should, in theory, correlate with the quality of Vulcan's AI models and the company's revenue.

Challenges

One of the most difficult aspects of PoDQ is the sheer variety of data sets, ranging from image data sets to speech data sets to video data sets, and many more. Different data sets take varying amounts of time and effort to label. We designed PoDQ to ensure that all Vulcan heroes are fairly rewarded with Vulcan tokens based on the amount and quality of the work they do.

QT Score - Data quantity evaluation

Throughput varies widely by task. Visually impaired workers labeling speech data sets for the Text-to-Speech model can produce approximately 30 labeled items in one hour, whereas hearing impaired workers labeling image data sets for the autonomous driving model may produce approximately 2 labeled items in one hour. We use a process called “Contribution Evaluation” to calculate a ‘QT score’ (quantity score) for the data each Vulcan hero generates.

Please read “On the contribution evaluation and credit distribution of a data labeling framework” for a mathematical explanation of the calculation.

Attachment: On the Contribution Evaluation and Credit Distribution of a Data Labeling Framework.pdf (PDF, 42KB)

We begin with the monthly validation procedure, which removes unqualified data from the calculation. This step is necessary to prevent Vulcan heroes from intentionally producing shoddy data sets.

After this slashing step, the remaining verified data sets are used to determine each Vulcan hero's QT score.

Once the QT score has been calculated for each Vulcan hero, 50% of the Vulcan tokens minted that month are distributed to Vulcan heroes at the end of the month. Each hero's share is proportional to the QT score they earned in that month.

For example, in Jan 2023,

Vulcan protocol mints 1,000,000 Vulcan tokens.
There are 10,000 Vulcan heroes in total.
QT Score of Mr. A is 2.47.
Mr. A will be rewarded 50% x (2.47 x (1,000,000/10,000)) = 123.5 Vulcan tokens.
QT Score of Ms. B is 0.72.
Ms. B will be rewarded 50% x (0.72 x (1,000,000/10,000)) = 36 Vulcan tokens.
In Jan 2023, the total number of tokens distributed to all Vulcan heroes by QT score will be 500,000 tokens.
The other 500,000 tokens will be held in the Vulcan QL reserve for distribution during the data quality evaluation phase.
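
The arithmetic in the Jan 2023 example can be checked with a short Python sketch. The function name `qt_reward` and its parameters are hypothetical and used only for illustration; the formula is the one shown in the example above, not the full Contribution Evaluation calculation from the paper.

```python
from fractions import Fraction

# Assumption: 50% of each month's minted tokens are paid out by QT score,
# as stated in the example above.
QT_SHARE = Fraction(1, 2)


def qt_reward(qt_score, minted_tokens, hero_count, share=QT_SHARE):
    """Return one hero's QT payout: share x (score x (minted / heroes))."""
    # Per-hero base amount if every hero had a score of exactly 1.
    per_hero_base = Fraction(minted_tokens, hero_count)
    # Parse the score from its decimal text to avoid float rounding.
    return share * Fraction(str(qt_score)) * per_hero_base


mr_a = qt_reward("2.47", 1_000_000, 10_000)
ms_b = qt_reward("0.72", 1_000_000, 10_000)

print(float(mr_a))  # 123.5
print(float(ms_b))  # 36.0
```

Note that the 500,000 total in the example holds only if the QT scores of all 10,000 heroes average 1.0. That normalization follows from the arithmetic of the example rather than being stated explicitly.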

QL Score - Data quality evaluation

Once 50% of the Vulcan tokens minted in a given month have been allocated by QT score, the remaining 50%, which is kept in the QL reserve, is distributed to each Vulcan hero according to the quality of the data they produced. Quality is evaluated periodically (every one month to one year, depending on the evaluation process of each AI model) using “Incentive Computation” as the mechanism.

Please read “On the contribution evaluation and credit distribution of a data labeling framework” for a mathematical explanation of the calculation.

For example, in Jan-Mar 2023,

Vulcan protocol mints 1,000,000 Vulcan tokens each month.
The QL score of the Text-to-Speech data set is evaluated every 3 months.
There are 3,000 Vulcan heroes in total who produced Text-to-Speech data sets between Jan and Mar 2023.
Mr. A and Ms. B are in this group.
QL Score of Mr. A is 0.89, and the Text-to-Speech data set has a weight of 0.4.
Mr. A will be rewarded 50% x (0.89 x (3 x 0.4 x 1,000,000/3,000)) = 178 Vulcan tokens.
QL Score of Ms. B is 1.29.
Ms. B will be rewarded 50% x (1.29 x (3 x 0.4 x 1,000,000/3,000)) = 258 Vulcan tokens.
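
The Jan-Mar 2023 figures can be reproduced the same way. This is a minimal sketch under the assumptions of the example above: the function name `ql_reward` and its parameter names are hypothetical, and the formula mirrors only the example arithmetic, not the full Incentive Computation described in the paper.

```python
from fractions import Fraction


def ql_reward(ql_score, monthly_mint, months, dataset_weight, hero_count,
              share=Fraction(1, 2)):
    """Return one hero's QL payout for an evaluation period.

    Mirrors the example: share x (score x (months x weight x mint / heroes)).
    """
    # Tokens attributed to this data set type over the evaluation period.
    pool = months * Fraction(str(dataset_weight)) * monthly_mint
    # Per-hero base amount if every hero had a score of exactly 1.
    per_hero_base = pool / hero_count
    return share * Fraction(str(ql_score)) * per_hero_base


mr_a = ql_reward("0.89", 1_000_000, months=3, dataset_weight="0.4", hero_count=3_000)
ms_b = ql_reward("1.29", 1_000_000, months=3, dataset_weight="0.4", hero_count=3_000)

print(float(mr_a))  # 178.0
print(float(ms_b))  # 258.0
```

With a weight of 0.4 over 3 months, the per-hero base amount is 3 x 0.4 x 1,000,000 / 3,000 = 400 tokens, which each hero's QL score and the 50% share then scale.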

Last updated 3 years ago