This project provides a structured database of more than 14,000 previous year questions (PYQS) from JEE Mains. The questions are reverse engineered from API endpoints of a subscription site and cached for efficient use. It supports clustering, filtering, and rendering of questions into HTML for easy study.
- Access to 14k+ JEE Mains PYQS
- Precomputed embeddings using the intfloat/e5-large-v2 model for efficient clustering
- Cluster similar questions together based on semantic embeddings
- Apply chainable filters (by chapter, topic, year, etc.)
- Render filtered or clustered questions into HTML using themed styles
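
To make the idea of clustering on semantic embeddings concrete, here is a minimal sketch. It is not the project's clustering code, whose algorithm is not described here. It assumes a simple greedy scheme: each vector joins the first cluster whose seed vector has a cosine similarity above a threshold. The names `cosine_similarity`, `greedy_cluster`, and `toy_embeddings` are hypothetical, and the three-dimensional vectors stand in for real model embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def greedy_cluster(embeddings, threshold=0.9):
    """Group vector indices greedily (illustrative assumption, not the library's method).

    Each vector joins the first cluster whose seed (first member) is at
    least `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []  # list of lists of indices; cluster[0] is the seed
    for i, vec in enumerate(embeddings):
        for cluster in clusters:
            if cosine_similarity(vec, embeddings[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Toy stand-ins for question embeddings: two near-duplicate pairs.
toy_embeddings = [
    [1.0, 0.0, 0.1],
    [0.9, 0.1, 0.1],
    [0.0, 1.0, 0.0],
    [0.1, 0.9, 0.0],
]
clusters = greedy_cluster(toy_embeddings, threshold=0.9)
print(clusters)
```

With these toy vectors the first two and last two indices end up grouped together, which mirrors how near-identical questions would land in the same cluster.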
The core folder contains the following modules:
- cache.py – Defines the Cache class for creating and loading internal caches. Not intended for direct user interaction.
- chapter.py – Defines the Chapter class, which is stored in the DataBaseChapters cache file. Internal use only.
- data_base.py – Defines the DataBase class. This must be initialized before any operations.
- filter.py – Defines the Filter class. Provides chainable methods to filter questions and update the current set.
- question.py – Defines the Question object.
- styles.py – Contains themed HTML styles for rendering.
- pdfy.py – Provides functions to convert clusters or sets of questions into HTML.
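
The chainable-filter pattern mentioned for filter.py can be sketched as follows. This is an illustration of the general pattern only, not the library's implementation: `ToyQuestion`, `ToyFilter`, and `by_min_year` are hypothetical names, and the assumption is that each filter method narrows a stored current set and returns the filter object itself so calls can be chained.

```python
from dataclasses import dataclass


@dataclass
class ToyQuestion:
    """Hypothetical stand-in for a question record."""
    chapter: str
    topic: str
    year: int


class ToyFilter:
    """Illustrative chainable filter; every method returns self."""

    def __init__(self, questions):
        self.current_set = list(questions)

    def by_chapter(self, chapter):
        self.current_set = [q for q in self.current_set if q.chapter == chapter]
        return self  # returning self is what makes chaining possible

    def by_min_year(self, year):
        self.current_set = [q for q in self.current_set if q.year >= year]
        return self

    def get(self):
        # Return a copy so callers cannot mutate the filter's state.
        return list(self.current_set)


questions = [
    ToyQuestion("thermodynamics", "first-law", 2019),
    ToyQuestion("thermodynamics", "entropy", 2023),
    ToyQuestion("kinematics", "projectiles", 2023),
]
result = ToyFilter(questions).by_chapter("thermodynamics").by_min_year(2022).get()
print(result)
```

Because each call narrows the stored set in place, a filter built this way has to have its current set reassigned before it is reused for an unrelated query.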
- Install using pip:
pip install jee_data_base
- Clone the repository:
git clone https://github.com/HostServer001/jee_mains_pyqs_data_base
Navigate into the project directory and ensure dependencies are installed.
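
A quick way to confirm that the package is importable after installation is sketched below. It uses only the standard library; the helper name `is_installed` is hypothetical, and `jee_data_base` is the import name used in the examples in this document.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Prints True once the package has been installed in the active environment.
print(is_installed("jee_data_base"))
```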
import os
from jee_data_base import DataBase, Filter, pdfy
# Initialize database
db = DataBase()
# Initialize filter
filter = Filter(db.chapters_dict)
# Inspect available chapters
print(filter.get_possible_filter_values()["chapter"])

It's highly recommended to filter as much as possible so that your HTML files open smoothly in the browser.
It's generally best to use the cluster method together with render_cluster_to_html to produce your output, since this gives the most efficient way to practice.
render_cluster_to_html_skim is useful if you have prepared a chapter loosely and want to skim through it and get the most out of it (use it after cluster).
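
As a rough picture of what rendering clustered questions to HTML involves, here is a minimal sketch. It is not the pdfy module's code: the function name `render_clusters_to_html` is hypothetical, questions are assumed to be plain strings, and no themed styles are applied.

```python
from html import escape


def render_clusters_to_html(clusters, title):
    """Build one HTML page with a heading and an ordered list per cluster.

    `clusters` is assumed to be a list of lists of question strings.
    """
    parts = [
        f"<html><head><title>{escape(title)}</title></head><body>",
        f"<h1>{escape(title)}</h1>",
    ]
    for number, cluster in enumerate(clusters, start=1):
        parts.append(f"<h2>Cluster {number}</h2>")
        parts.append("<ol>")
        for question in cluster:
            # Escape the text so characters like < and & display correctly.
            parts.append(f"<li>{escape(question)}</li>")
        parts.append("</ol>")
    parts.append("</body></html>")
    return "\n".join(parts)


page = render_clusters_to_html(
    [["State the first law of thermodynamics.", "Define entropy."], ["Is x < y?"]],
    "Demo",
)
print(page)
```

Every question adds markup to the page, which is why narrowing the set before rendering keeps the resulting file light enough to open smoothly.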
from jee_data_base import DataBase, Filter

path = "<path where chapter folder will be created>"
chapter = "<your example chapter>"

# Load the database
db = DataBase()
# Initialize filter
filter = Filter(db.chapters_dict)
# Create HTML file
filter.render_chap_last5yrs(path, chapter, skim=False)

# Get all questions from a specific chapter in the last 3 years
questions = filter.by_chapter("thermodynamics").by_n_last_yrs(3).get()
for q in questions:
    print(q.question)

# Cluster questions by topic and render to HTML
filter.current_set = filter.by_chapter("organic-compounds").by_n_last_yrs(5).get()
cluster = filter.cluster()
pdfy.render_cluster_to_html(
    cluster,
    "organic_compounds.html",
    "Organic Compounds - Last 5 Years"
)
# You can use render_cluster_to_html_skim() instead to produce
# an HTML file suited to skimming through a chapter

def render_chapter(chapter_name: str):
    all_q = filter.by_chapter(chapter_name).by_n_last_yrs(5).get()
    os.makedirs(chapter_name, exist_ok=True)
    for topic in filter.get_possible_filter_values()["topic"]:
        # Reset to the chapter-level set before narrowing to each topic
        filter.current_set = all_q
        filter.by_topic(topic)
        cluster = filter.cluster()
        pdfy.render_cluster_to_html_skim(
            cluster,
            f"{chapter_name}/{topic}.html",
            topic
        )

render_chapter("alcohols-phenols-and-ethers")

- The output will look something like this PDF 📄
- DataBaseChapters – Contains a dictionary with chapter names as keys and Chapter objects as values.
- EmbeddingsChapters – Contains precomputed embeddings of all questions to save computation time.
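
The benefit of precomputed caches can be illustrated with a load-or-build helper. This is a sketch under stated assumptions, not the project's Cache class: the on-disk format of the project's cache files is not described here, so `pickle` is an assumption, and `load_or_build`, `expensive_build`, and the file name `toy_embeddings.pkl` are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_build(cache_path, build):
    """Return the cached value if the file exists; otherwise build and store it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    value = build()
    with cache_path.open("wb") as fh:
        pickle.dump(value, fh)
    return value


calls = []  # records how many times the expensive step actually ran


def expensive_build():
    """Stand-in for a slow step such as computing embeddings."""
    calls.append(1)
    return {"thermodynamics": [0.1, 0.2, 0.3]}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy_embeddings.pkl"
    first = load_or_build(path, expensive_build)   # builds and writes the file
    second = load_or_build(path, expensive_build)  # reads the file, no rebuild

print(first == second, len(calls))
```

The second call reads the stored file instead of recomputing, which is the saving the precomputed embeddings cache is meant to provide.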
Contributions are welcome. You can help by:
- Improving documentation
- Adding new filters or clustering strategies
- Enhancing rendering styles
- Reporting issues and suggesting features
Fork the repository, create a new branch for your changes, and submit a pull request.
This project is provided for educational purposes. Please review the repository for licensing details.