Learnings and insights: Nine months of AI Job stats powered by Theta Edge Network and Lavita community

Lavita
5 min read · Apr 8, 2024

By: Pedram Hosseini, AI Lead

It’s been almost nine months since we first launched Lavita AI Jobs. We thought it was a good time to update everyone on the progress we’ve made over this period, thanks to the great support of our amazing community, the lessons we’ve learned, and the path forward.

Lavita AI Jobs was launched to enable our community to build Lavita with us. At a time when compute is so valuable that some even refer to it as the currency of the future, we thought our users could help Lavita by contributing their computational resources and get rewarded in return. These contributions help us build and test AI models that will eventually be safely embedded in different parts of Lavita’s AI pipeline once they reach an acceptable and reliable level of performance. We started by defining a simple task: fine-tuning a smaller large language model (LLM) for question answering, an important task with many downstream applications such as building medical assistants.

We chose BioBERT [1], a pre-trained biomedical language model, as our base model, and created a small dataset of medical questions and answers based on a random sample from the MedQA [2] dataset, with train and dev splits. The task is to fine-tune the model, with a multiple-choice question answering head on top, on the train split and evaluate it on the dev split. The performance metric is accuracy, i.e., the fraction of medical questions a fine-tuned model answers correctly. We implemented our fine-tuning pipeline with Hugging Face’s transformers library and hosted the dataset on the Hugging Face Datasets Hub. A minimal sketch of such a pipeline is shown below.
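
For readers curious what a pipeline like this looks like in practice, here is a minimal sketch of fine-tuning BioBERT with a multiple-choice head using the transformers Trainer. This is not our exact production pipeline: the dataset ID, column names, and hyperparameters below are illustrative assumptions.

```python
# Minimal sketch of a BioBERT multiple-choice QA fine-tuning pipeline.
# NOTE: the dataset ID, column names, and hyperparameters are illustrative
# assumptions, not the exact Lavita AI Jobs configuration.
import numpy as np
from datasets import load_dataset
from transformers import (
    AutoModelForMultipleChoice,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "dmis-lab/biobert-base-cased-v1.1"  # a public BioBERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMultipleChoice.from_pretrained(MODEL_NAME)

# Hypothetical dataset: each row has a "question", a list of answer "options",
# and an integer "label" pointing to the correct option.
dataset = load_dataset("lavita/medqa-sample")  # assumed dataset ID

def preprocess(example):
    # Pair the question with every candidate answer so the model scores each pair.
    questions = [example["question"]] * len(example["options"])
    return tokenizer(
        questions,
        example["options"],
        truncation=True,
        padding="max_length",  # fixed length keeps the default collator happy
        max_length=256,
    )

encoded = dataset.map(preprocess)

def compute_metrics(eval_pred):
    # Accuracy = fraction of questions where the highest-scoring option is correct.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="biobert-medqa",
        per_device_train_batch_size=8,
        num_train_epochs=3,
        learning_rate=3e-5,
    ),
    train_dataset=encoded["train"],
    eval_dataset=encoded["dev"],
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())  # reports accuracy on the dev split
```

In the actual jobs, steps such as downloading the data and uploading the fine-tuned model are handled by the Edge Node software; the sketch above only illustrates the fine-tuning and evaluation logic.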

Our main purpose at this stage was not necessarily to build state-of-the-art AI models, but to start engaging our users in building various models with us and to build a reliable framework that could later be leveraged for many other tasks. It’s wonderful to see that our dataset has been among the top 6 most downloaded datasets on the Hugging Face Datasets Hub (link), which shows the amazing engagement of our Theta Network and Lavita community in running AI jobs.

AI Jobs Statistics

Since the launch of Lavita AI Jobs, ~2.4 million jobs have been completed between July 1, 2023, and April 1, 2024. There have been an average of ~30k job submissions per month. Currently, each user can submit a job every 5 hours. The following charts show the cumulative number of submitted jobs and the number of jobs per month over this period.

These charts show a steady increase in the number of job submissions. We also took a closer look at job submissions per day to identify trends, spot patterns, and investigate potential failures or issues. The following plot shows Lavita AI Job submissions per day.

We then examined the points where job submissions dropped. In January 2024, we temporarily had an issue with our cloud access tokens, which prevented users from uploading their fine-tuned models. This issue was fixed and job submissions returned to normal. On two other occasions, February 11, 2024, and March 28, 2024, we also saw a reduced number of job submissions. We dug a little deeper to see why this happened. The issue was mostly associated with edge nodes running Windows and appears to be related to updates or issues our pipeline had with Hugging Face’s datasets library, which we use to download the training and evaluation data. We have been collecting logs from users who encountered this issue so that we can implement proper mechanisms to avoid it in the future.
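
As an illustration of the kind of defensive mechanism that can help here, the sketch below retries the dataset download and logs every failure. It is a generic example under assumed names (the dataset ID, log file, and retry settings are ours for illustration), not the exact mechanism used in our pipeline.

```python
# Illustrative (not the exact Lavita mechanism): retry the dataset download
# and log failures so problematic edge nodes can be diagnosed later.
import logging
import time

from datasets import load_dataset

logging.basicConfig(filename="ai_job.log", level=logging.INFO)
logger = logging.getLogger("lavita-ai-job")

def load_with_retries(dataset_id, retries=3, wait_seconds=30):
    """Try to download a Hugging Face dataset, logging each failed attempt."""
    for attempt in range(1, retries + 1):
        try:
            return load_dataset(dataset_id)
        except Exception as err:  # network/hub errors surface here
            logger.warning("Attempt %d/%d failed for %s: %s",
                           attempt, retries, dataset_id, err)
            time.sleep(wait_seconds)
    raise RuntimeError(f"Could not download {dataset_id} after {retries} attempts")

# Example usage with the hypothetical dataset ID from the earlier sketch.
dataset = load_with_retries("lavita/medqa-sample")
```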

Edge Nodes Statistics

We’ve had 28,188 unique edge nodes with completed AI Job submissions. As the following plot shows, the majority of our edge nodes run jobs on Windows. Linux and Darwin (the Unix-based core of macOS) are the other two most common operating systems among edge nodes.

Looking ahead

As we’ve shared, our journey over the past nine months has been remarkable, with more than 2.4 million AI jobs completed. Considering that the majority of our edge nodes run on Windows, we encourage our Theta Network and Lavita community to participate with CUDA-enabled GPUs. Here are the step-by-step instructions to enable CUDA. Not only can you earn more rewards, but you will also be able to contribute to building diverse models and frameworks. This will enable Lavita to tackle a wider range of tasks in future phases, enhancing our collective capability. Again, we extend our thanks to our great community for your unwavering support and active participation. Stay tuned for more opportunities!
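
If you want a quick way to confirm that your GPU is visible before running jobs, a generic check like the one below (using PyTorch, which the transformers stack builds on) can help. It is a convenience snippet, not part of the Edge Node software or the official CUDA setup instructions.

```python
# Quick, generic check that a CUDA-capable GPU is visible to PyTorch.
import torch

if torch.cuda.is_available():
    print("CUDA is available:", torch.cuda.get_device_name(0))
    print("Device count:", torch.cuda.device_count())
else:
    print("CUDA is not available; AI jobs would fall back to the CPU.")
```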

References

[1] Lee, Jinhyuk, et al. “BioBERT: a pre-trained biomedical language representation model for biomedical text mining.” Bioinformatics 36.4 (2020): 1234–1240.

[2] Jin, Di, et al. “What disease does this patient have? a large-scale open domain question answering dataset from medical exams.” Applied Sciences 11.14 (2021): 6421.
