Learnings and insights: Nine months of AI Job stats powered by Theta Edge Network and Lavita community
By: Pedram Hosseini, AI Lead
It’s been almost nine months since we first launched Lavita AI Jobs, and we thought it was a good time to update everyone on the progress we’ve made over this period, thanks to the great support of our amazing community, the lessons we’ve learned, and the path forward.
Lavita AI Jobs started with a mission to engage our community in building Lavita with us. At a time when compute is so valuable that some even refer to it as the currency of the future, we thought our users could start helping Lavita by contributing their computational resources and be rewarded in return. This would eventually help us build and evaluate AI models at a larger scale: models that will be safely embedded in different parts of Lavita’s AI pipeline once they reach acceptable, reliable performance.
We started by defining a simple task: fine-tuning a compact pre-trained language model for Question Answering, an important task with many downstream applications, such as building Medical AI Assistants. We chose BioBERT [1], a pre-trained biomedical language model, as our base model and created a small dataset of medical questions and answers, with train and dev splits, drawn from a random sample of the MedQA [2] dataset. The task is to fine-tune the model, with a multiple-choice question answering head on top, on the train split and evaluate it on the dev split. The performance metric is accuracy: the fraction of medical questions a fine-tuned model answers correctly. We used Hugging Face’s transformers library to implement our fine-tuning pipeline and hosted the dataset on the Hugging Face Datasets hub.
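The multiple-choice setup can be sketched roughly as follows. This is a minimal illustration, not our exact pipeline: the record layout and field names below are hypothetical, modeled on MedQA-style data. In the real pipeline, the expanded (question, choice) pairs are tokenized and scored by the fine-tuned model with a multiple-choice head via the transformers library; here we only show the data shaping and the accuracy metric.

```python
# Sketch of the multiple-choice QA task: pair the question with each
# answer option, then measure accuracy over predicted choice indices.
# NOTE: this record layout is a hypothetical example, not the exact
# schema of the hosted dataset.

def expand_choices(record):
    """A multiple-choice head scores one (question, choice) sequence
    per answer option; build those pairs from a single record."""
    return [(record["question"], choice) for choice in record["choices"]]

def accuracy(predictions, labels):
    """Fraction of questions where the predicted choice index matches
    the gold answer index."""
    correct = sum(p == g for p, g in zip(predictions, labels))
    return correct / len(labels)

record = {
    "question": "Which vitamin deficiency causes scurvy?",
    "choices": ["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"],
    "answer_idx": 2,
}

pairs = expand_choices(record)   # one sequence per answer option
print(len(pairs))                # 4
print(accuracy([2, 0, 1], [2, 0, 3]))
```

In the actual pipeline, each pair would be tokenized and the model would output one logit per choice; the argmax over those logits gives the predicted index fed to the accuracy computation.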
It’s wonderful to see that our dataset has been among the top 5 most downloaded datasets on the Hugging Face Datasets hub (link), which reflects the amazing engagement of our Theta Network and Lavita community running AI Jobs.
AI Jobs Statistics
Since the launch of Lavita AI Jobs, approximately 2.5 million jobs have been completed between July 1, 2023 and April 1, 2024, with an average of approximately 30k job submissions per month. Currently, each user can submit a job every 5 hours. The following charts show the cumulative number of submitted jobs and the number of jobs per month over this period.
These charts show a steady increase in job submissions. We also took a closer look at submissions per day to spot trends, find patterns, and investigate potential failures or issues. The following plot shows Lavita AI Job submissions per day.
We also examined the points where job submissions dropped. In January 2024, we temporarily had an issue with our cloud access tokens that prevented users from uploading their fine-tuned models. The issue was fixed, and job submissions returned to normal; we are also implementing better mechanisms for sharing access tokens to avoid similar problems in the future. On two other occasions, February 11, 2024 and March 28, 2024, we saw a reduced number of job submissions. Digging deeper, we found the issue was mostly associated with edge nodes running Windows and could be related to updates or issues our pipeline had with the Hugging Face datasets library, which we use to download the training and evaluation data. We have been collecting logs from users who encountered this issue so that we can implement proper mechanisms to address these problems.
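As a back-of-envelope check on these volumes, the 5-hour cooldown mentioned above puts a hard ceiling on how many jobs any single edge node can contribute. A quick sketch of that ceiling (the 30-day month is a simplifying assumption):

```python
# Upper bound on submissions from a single edge node under the
# 5-hour per-job cooldown described above.
COOLDOWN_HOURS = 5
max_jobs_per_day = 24 / COOLDOWN_HOURS        # 4.8 jobs/day at best
max_jobs_per_month = max_jobs_per_day * 30    # ~144 jobs/month at best
print(max_jobs_per_day, max_jobs_per_month)
```

So sustained monthly totals scale with the number of active nodes rather than with any single power user, which is why broad community participation matters.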
Edge Nodes Statistics
We’ve had 28,188 unique edge nodes with completed AI Job submissions. As the following plot shows, the majority of our edge nodes run jobs on Windows; Linux and Darwin (the Unix-based core of macOS) are the next two most frequent operating systems among edge nodes.
The Road Ahead
Our journey over the past nine months has been remarkable, and we’re encouraged by the great support of our community, with nearly 2.5 million AI Jobs completed. Looking ahead, we see two paths forward. First, since the majority of our edge nodes run on Windows, we encourage our Theta Network and Lavita community to participate with CUDA-enabled GPUs to earn even more rewards and help power the development and evaluation of various models. Here are the step-by-step instructions to enable CUDA. Second, Question Answering was just the beginning: Lavita plans to expand AI Jobs to a wider range of tasks, not only for model training and development but also for testing and evaluation, which are equally, if not more, important. Once again, we extend our gratitude to our great community for their unwavering support and active participation. Stay tuned for more developments!
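As a practical aside on the CUDA point above: before running GPU-backed jobs, it helps to confirm that a CUDA device is actually visible to the framework. A minimal check, assuming PyTorch (the backend commonly used with the transformers library) is the framework in play:

```python
import importlib.util

def cuda_available() -> bool:
    """Return True only if PyTorch is installed and it reports a
    usable CUDA device on this machine."""
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch is not installed at all
    import torch
    return torch.cuda.is_available()

print("CUDA-enabled GPU visible:", cuda_available())
```

If this prints False on a machine with an NVIDIA GPU, the usual causes are a missing or mismatched CUDA driver, or a CPU-only PyTorch build.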
References
[1] Lee, Jinhyuk, et al. “BioBERT: a pre-trained biomedical language representation model for biomedical text mining.” Bioinformatics 36.4 (2020): 1234–1240.
[2] Jin, Di, et al. “What disease does this patient have? A large-scale open domain question answering dataset from medical exams.” Applied Sciences 11.14 (2021): 6421.