As part of the assignment, you are required to explore the IMDB dataset from Kaggle or an open-source repository of your choice. You will apply your knowledge of deep learning and PyTorch to develop a solution for a specific problem in the dataset. Use Azure ML Studio with Designer pipelines to take advantage of prebuilt components and to visualize movie categories for a recommendation system.
- Dataset Collection:
  - Choose a dataset from Kaggle or any other open-source repository. Ensure that the dataset is not overly complex and can be processed on a standard local machine.
  - A sample dataset from Kaggle is available here.
- Preprocessing:
  - Analyze the dataset and preprocess the data to make it suitable for training (e.g., normalization, encoding, splitting).
  - Apply text-cleaning techniques such as stemming, stop-word removal, and lemmatization.
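As a starting point, the cleaning and splitting steps above could be sketched as follows. This is a minimal illustration using only the standard library: the stop-word list, the `crude_stem` helper, and the `train_test_split` helper are all hypothetical stand-ins, and the suffix stripping is far cruder than a real stemmer or lemmatizer. In your own solution you would likely swap in something like NLTK's `PorterStemmer` or `WordNetLemmatizer`.

```python
import random
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "was", "and", "of", "this", "it", "to", "in"}


def crude_stem(token):
    """Strip a few common suffixes; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "ly", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, drop HTML tags and punctuation, remove stop words, stem."""
    text = re.sub(r"<[^>]+>", " ", text.lower())
    tokens = re.findall(r"[a-z]+", text)
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


print(preprocess("The movie was <br /> AMAZING and thrilling!"))
# ['movie', 'amaz', 'thrill']
```

Fixing the shuffle seed keeps the split reproducible, which makes it easier to compare runs in your report.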
- Feature Extraction:
  - Use the Bag of Words or TF-IDF vectorization technique.
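To make the two vectorization options concrete, here is a small from-scratch sketch of both. It assumes the reviews are already tokenized, and it uses the unsmoothed textbook weighting `tf * log(N / df)`. The function names are illustrative only. Library implementations such as scikit-learn's `TfidfVectorizer` apply smoothing and normalization by default, so their numbers will differ from these.

```python
import math
from collections import Counter


def build_vocabulary(tokenized_docs):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({tok for doc in tokenized_docs for tok in doc})
    return {tok: i for i, tok in enumerate(vocab)}


def bag_of_words(tokenized_docs, vocab):
    """Return one raw term-count vector per document."""
    vectors = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        vectors.append([counts[tok] for tok in vocab])
    return vectors


def tf_idf(tokenized_docs, vocab):
    """Weight term frequency by idf = log(N / df), with no smoothing."""
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(tok for doc in tokenized_docs for tok in set(doc))
    vectors = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append([
            (counts[tok] / total) * math.log(n_docs / doc_freq[tok])
            for tok in vocab
        ])
    return vectors


# Three toy "reviews", already tokenized (made up for illustration).
docs = [["good", "movie"], ["bad", "movie"], ["good", "good", "plot"]]
vocab = build_vocabulary(docs)   # {'bad': 0, 'good': 1, 'movie': 2, 'plot': 3}
bow = bag_of_words(docs, vocab)  # [[0, 1, 1, 0], [1, 0, 1, 0], [0, 2, 0, 1]]
weights = tf_idf(docs, vocab)
```

Bag of Words keeps raw counts, while TF-IDF down-weights tokens that appear in many documents, such as "movie" in this toy corpus.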
- Model Building:
  - Build a PyTorch-based deep learning model that solves the selected problem. You can use a simple neural network or experiment with architectures like CNNs or RNNs depending on your dataset.
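For the "simple neural network" option, a feed-forward classifier over fixed-size feature vectors (for example, Bag of Words or TF-IDF rows) might look like the sketch below. The class name, layer sizes, dropout rate, and the 5000-feature input width are all assumptions chosen for illustration. Adjust them to your vocabulary size and number of classes.

```python
import torch
from torch import nn


class ReviewClassifier(nn.Module):
    """Small feed-forward network over fixed-size feature vectors."""

    def __init__(self, num_features, hidden_size=64, num_classes=2, dropout=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, hidden_size),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_size, num_classes),
        )

    def forward(self, features):
        # Returns raw logits; pair with nn.CrossEntropyLoss during training.
        return self.net(features)


# Shape check with a dummy batch of 8 all-zero feature vectors.
model = ReviewClassifier(num_features=5000)
dummy_batch = torch.zeros(8, 5000)
logits = model(dummy_batch)
print(logits.shape)  # torch.Size([8, 2])
```

Running a dummy batch through the model before training is a cheap way to catch shape mismatches early.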
- Training and Evaluation:
  - Train your model, evaluate its performance, and report metrics such as accuracy and loss, along with any others relevant to your problem.
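A training and evaluation loop along these lines could serve as a template. To stay self-contained, the sketch trains on synthetic random features with a made-up labeling rule, not on real review data. The model size, learning rate, batch size, and epoch count are likewise arbitrary assumptions.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data: 200 samples, 20 features; label depends on feature 0.
features = torch.randn(200, 20)
labels = (features[:, 0] > 0).long()
train_ds = TensorDataset(features[:160], labels[:160])
val_ds = TensorDataset(features[160:], labels[160:])
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=32)

model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)


def evaluate(model, loader):
    """Return (mean loss, accuracy) over a loader without updating weights."""
    model.eval()
    total_loss, correct, count = 0.0, 0, 0
    with torch.no_grad():
        for x, y in loader:
            logits = model(x)
            total_loss += criterion(logits, y).item() * len(y)
            correct += (logits.argmax(dim=1) == y).sum().item()
            count += len(y)
    return total_loss / count, correct / count


history = {"train_loss": [], "val_loss": [], "val_acc": []}
for epoch in range(10):
    model.train()
    running = 0.0
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        running += loss.item() * len(y)
    val_loss, val_acc = evaluate(model, val_loader)
    history["train_loss"].append(running / len(train_ds))
    history["val_loss"].append(val_loss)
    history["val_acc"].append(val_acc)
    print(f"epoch {epoch + 1}: val_loss={val_loss:.3f} val_acc={val_acc:.2%}")
```

Recording per-epoch values in a `history` dictionary keeps the numbers available for the metrics and plots in your report.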
- Visualization:
  - Include visualizations such as loss curves, accuracy trends, or sample predictions.
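One possible way to draw loss and accuracy curves with Matplotlib is sketched below. The `plot_history` helper, the dictionary keys, and the output file name are assumptions. The numbers in `example_history` are placeholders that only demonstrate the call, not results from any real run.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt


def plot_history(history, out_path="training_curves.png"):
    """Plot loss and accuracy per epoch side by side and save to out_path."""
    epochs = range(1, len(history["train_loss"]) + 1)
    fig, (loss_ax, acc_ax) = plt.subplots(1, 2, figsize=(10, 4))

    loss_ax.plot(epochs, history["train_loss"], label="train")
    loss_ax.plot(epochs, history["val_loss"], label="validation")
    loss_ax.set_xlabel("epoch")
    loss_ax.set_ylabel("loss")
    loss_ax.set_title("Loss curves")
    loss_ax.legend()

    acc_ax.plot(epochs, history["val_acc"], marker="o")
    acc_ax.set_xlabel("epoch")
    acc_ax.set_ylabel("validation accuracy")
    acc_ax.set_title("Accuracy trend")

    fig.tight_layout()
    fig.savefig(out_path)
    return fig


# Placeholder numbers only, to show the call; not results from any real run.
example_history = {
    "train_loss": [0.69, 0.55, 0.42],
    "val_loss": [0.68, 0.58, 0.50],
    "val_acc": [0.55, 0.70, 0.78],
}
fig = plot_history(example_history)
```

Plotting training and validation loss on the same axes makes overfitting easy to spot, since the two curves start to diverge.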
- Documentation:
  - Document your findings, observations, and challenges in a concise report.
  - Add the report to your Git repository and share the repository URL with your instructor by submitting it in this Microsoft Form - Here