Registration Page, Discord, Looping Deck, Final Presentation template, Team Registration form
Basic RAG locally, Basic RAG in Google Colab
Welcome to the first Women in AI Hackathon, hosted by Zilliz and sponsored by TwelveLabs, Arize AI, OmniStack, StreamNative, AWS, and Mistral.
This repo provides all the information you need for the day and serves as the starting point for your submission. Direct any questions to Stefan Webb before the day of the hackathon, and to the Discord or in-person mentors on the day.
- 8.30-9.00: Check-in, light breakfast
- 9.00-9.30: Kickoff
- 9.30-10.00: Team reveal and challenge recap
- 10.00: Let the Hacking Begin!
- 12.00-13.00: Lunch and speakers
- 13.00-17.30: More Hacking!
- 17.30: Hard submission and code freeze
- 17.30-18.00: Work on presentations
- 18.00-19.30: Showcase your project
- 19.30-20.00: Judges award prizes
There are a couple of items we recommend completing in advance of the hackathon:
If you have not already, set up a GitHub account plus the necessary Git tooling on your system. Also, join the Discord server for the hackathon and introduce yourself.
Clone this repo and set up your development environment. Your environment must allow you to develop a solution within the constraints of the prompt, that is, developing a RAG application in Python using Milvus or Zilliz Cloud.
We recommend:
Please confirm that you can run the starter notebooks on your platform:
You may also wish to confirm that you can start and use a Milvus Standalone deployment locally and access the free tier of Zilliz Cloud.
We recommend downloading in advance any datasets you wish to explore with your teammates to save time and reduce stress on the on-site WiFi.
Here are some suggested open-source datasets:
flax-sentence-embeddings/stackexchange_math_jsonl
Cohere/wikipedia-22-12-en-embeddings
justicedao/Caselaw_Access_Project_embeddings
MongoDB/tech-news-embeddings
allenai/objaverse-xl
Note
The choice of dataset and data modality is an excellent opportunity to showcase your creativity!
It may help to choose datasets whose vector embeddings have been pre-calculated, or else to calculate and save them in advance. Otherwise, you can calculate embeddings for the dataset locally during the hackathon, or use free credits provided by our sponsors to perform this embedding in the cloud.
Here are some suggested open-source embedding models for text:
Note
You are not restricted to working with text. Consider image, video, audio, 3D meshes, graphs, and other modalities. Twelve Labs offers some excellent models for video embedding and inference. See their website for more details.
We also recommend downloading in advance any foundation models you plan to use locally during the hackathon. Here are some suggested open-source general-purpose foundation models (also look for quantized versions on HuggingFace):
meta-llama/Llama-3.2-11B-Vision-Instruct
microsoft/phi-4
mistralai/Mistral-7B-Instruct-v0.3
mistralai/Pixtral-12B-2409
Qwen/Qwen2.5-14B-Instruct
And specialized fine-tuned models:
meta-llama/CodeLlama-13b-hf
meta-llama/Llama-Guard-3-1B
grounded-ai/phi3-rag-relevance-judge
grounded-ai/phi3-hallucination-judge
Important
Some foundation models on HuggingFace, for example Llama 3.x, require obtaining permission from the authors to download. It can take up to several days for permission to be granted, so we recommend that you do this in advance of the hackathon.
Note
Multimodal models offer many avenues for creativity, and a technically sophisticated solution is likely to make use of several fine-tuned models for specific parts of the pipeline.
Tip
As an alternative, see here for free credits provided by our sponsors to perform model inference.
Zilliz, AWS, and Mistral each offer a generous free tier for their cloud services.
Twelve Labs has kindly provided 10 free hours of credit for their inference service, including video foundation models.
OmniStack is providing over $500 in credits for their inference, monitoring, and deployment services.
StreamNative is offering $200 in free credits for their cloud data platform.
From 9.30 to 10am, we will reveal the team assignments. Teams comprise 3-5 hackers of varying experience and backgrounds. Of course, you may negotiate a team change with your fellow hackers if you wish, although we encourage you to pair with people you have not previously met.
After settling on your teams, please decide on a team lead and complete the Team Registration form. You will have from 10am to 5.30pm to develop a submission with your team. Before 5.30pm, push your final submission to your forked repo.
Important
After 5.30pm, no further code changes will be considered by the judges.
Additional time from 5.30-6.00pm is provided to work on your presentation (see submission instructions below). Finally, each team will make a short presentation before the judges make a decision and announce the results!
Build a retrieval-augmented generation (RAG) system for one of the following applications:
- A recommender system;
- A question-answering system for a specialized domain;
- A product review summarizer;
- A personalized job recruiter; or,
- Something of your own imagination!
Your submission must run in Python and use Milvus (any deployment type) or Zilliz Cloud as the underlying vector database. We recommend, but do not require, that your submission use Jupyter Notebook or Gradio.
You may use agentic steps in your RAG pipeline, and free credits from our sponsors are available for embedding and foundation model inference.
Note
We provide suggested RAG applications, datasets, models, etc. to give some structure to your starting point. However, we want to emphasize that these are only suggestions, so follow your creativity and passion!
Your chosen team lead submits your team's code via their fork of this GitHub repo.
- 9.30am - 10am: Have your team complete the Team Registration form, which requires:
  - your team's name and members;
  - the forked GitHub repo address for code submission; and,
  - a link to a copy of the final presentation template on Google Slides.
Important
Set the necessary permissions so that the judges have access both to your GitHub repo and the final presentation slides.
- 10am - 5.30pm: Hack, hack, hack! Submit your code via pushes to your forked GitHub repo throughout the day.
Important
Ensure your final code is submitted before 5.30pm!
- 5.30pm - 6pm: Finalize your presentation slides, saving them to your copy of the Google Slides template.
- 6pm - 7.30pm: Each team presents their project via Jupyter notebook, Gradio app, or some other way.
- 7.30pm - 8pm: Judges announce results!
The judges will rank the teams' submissions separately on 3 criteria:
- creativity;
- technical sophistication; and,
- potential business impact.
In the spirit of RAG, each team's rankings will be combined into a single score per judge with Reciprocal Rank Fusion (RRF). The per-judge score of a team is,
k = 10
score = 1 / (rank_creativity + k) + 1 / (rank_technical + k) + 1 / (rank_business + k)
where the rank terms denote the team's ranking for a given judge and criterion. The final score per team is the average of that team's scores across judges. This means that the winning team must rank highly across all 3 criteria, with a consensus across judges.
We will provide a breakdown of team scores by final score and score per criterion separately (naturally, with error bars).
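To make the scoring rule above concrete, here is a minimal sketch of how the per-judge RRF score and the final averaged score could be computed. It is an illustration only, not the judges' actual tooling. The function names, the data layout, and the example rankings are hypothetical, and it assumes ranks are 1-based (1 = best), which the description above does not state explicitly.

```python
from statistics import mean

K = 10  # constant k from the formula above
CRITERIA = ("creativity", "technical", "business")


def judge_score(ranks, k=K):
    """RRF score for one team from one judge.

    ranks maps each criterion name to the team's rank (assumed 1-based).
    """
    return sum(1.0 / (ranks[criterion] + k) for criterion in CRITERIA)


def final_scores(rankings_by_judge, k=K):
    """Average each team's per-judge RRF score across all judges.

    rankings_by_judge has the shape {judge: {team: {criterion: rank}}}.
    """
    per_team = {}
    for team_ranks in rankings_by_judge.values():
        for team, ranks in team_ranks.items():
            per_team.setdefault(team, []).append(judge_score(ranks, k))
    return {team: mean(scores) for team, scores in per_team.items()}


# Hypothetical rankings from two judges for two teams (illustrative only).
example = {
    "judge_a": {
        "team_x": {"creativity": 1, "technical": 2, "business": 1},
        "team_y": {"creativity": 2, "technical": 1, "business": 2},
    },
    "judge_b": {
        "team_x": {"creativity": 1, "technical": 1, "business": 2},
        "team_y": {"creativity": 2, "technical": 2, "business": 1},
    },
}

scores = final_scores(example)
# Teams sorted from highest to lowest final score.
leaderboard = sorted(scores, key=scores.get, reverse=True)
print(leaderboard)
```

Because each term is 1 / (rank + k), a lower (better) rank always contributes more, and with k = 10 the gap between adjacent ranks is small, so no single criterion can dominate the total.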
- First prize: $1000, $10,000 AWS credits, Zilliz Blog Opportunity, Social Mentions, Swag
- Second prize: $700, Zilliz Blog Opportunity, Social Mentions, Swag
- Third prize: $500, Social Mentions, Swag
- Top score using Mistral models: $500 Mistral credits
- Everybody: Satisfaction from a job well done!
- Milvus documentation
- Milvus Bootcamp tutorials
- Milvus notebook gallery
- Zilliz Generative AI Resource Hub
- HuggingFace Open-Source AI Cookbook
More details of our sponsors and how to use their free cloud credits are provided here.