
Commit 05d47a5: AMLC project page (1 parent c5c038f)

File tree: 2 files changed (+63, -68 lines)

_projects/8_project.md

---
layout: page
title: Multimodal Price Regressor
description: Smart product price prediction - Amazon ML Challenge 2025
img: assets/img/AMLC2025-Final-drawio.png
importance: 1
category: ml_research
---

### <a href="https://github.com/RudrakshSJoshi/amlc-multimodal-mlp">[Code]</a> <a href="https://www.kaggle.com/datasets/manav2805/amazon-ml-challenge-25/data">[Data]</a>

As part of team **SPAM_LLMs**, we achieved **6th place** overall in the **Amazon ML Challenge 2025**, a national competition focused on smart product pricing. Our solution secured the **3rd position on the public leaderboard and 5th on the private leaderboard**.

This project showcases a **multimodal learning** approach to predicting product prices from text descriptions and images. By using large, pre-trained embedding models without fine-tuning, our architecture effectively processed complex, real-world e-commerce data to achieve high accuracy.

---

<div class="row">
    <div class="col-sm mt-3 mt-md-0">
        {% include figure.liquid path="assets/img/AMLC2025-Final-drawio.png" title="Solution Architecture" class="img-fluid rounded z-depth-1" %}
    </div>
</div>
<div class="caption">
    Our solution combined text and image modalities using separate networks before a final regression head.
</div>

---

## Competition Highlights

### **Key Achievements**
- **Top National Ranking**: Finished 3rd (Public LB) and 5th (Private LB), placing us among the top 6 teams in a highly competitive national challenge.
- **Advanced Multimodal Architecture**: Designed and implemented a system that processes text and image data in parallel to predict prices.
- **Strategic Model Selection**: Demonstrated that large, frozen pre-trained models (**Qwen-3-4B**, **SigLIP 2**, **DINOv3**) can outperform fine-tuned smaller models, providing a crucial performance edge.
- **Effective Data Preprocessing**: Developed a comprehensive pipeline for cleaning and normalizing noisy catalog data, which was critical for model performance.

---

## Methodology

### 1. **Business Problem**
In e-commerce, setting the optimal price for a product is vital for success. The challenge was to build a machine learning model that analyzes product details from text and images to accurately predict the price, handling complexities like brand value, specifications, and quantity.

### 2. **Data Handling & Preprocessing**
- **Transformation**: Applied a **log1p transformation** to the right-skewed price data to normalize its distribution.
- **Text Cleaning**: Split the `catalog_content` into descriptions, bullet points, and quantity. We also standardized inconsistent units (e.g., "g", "gm" to "grams") and converted numbers to words to improve embedding quality.
- **Feature Selection**: To handle large text fields, we prioritized the product description when available, otherwise falling back to the top five bullet points.
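The transformation and unit-normalization steps above can be sketched in plain Python. The unit map and regex here are illustrative assumptions, not the exact competition pipeline, and the number-to-words step is omitted for brevity:

```python
import math
import re

# Assumed unit map; the real pipeline's coverage of units may differ.
UNIT_MAP = {"g": "grams", "gm": "grams", "gms": "grams",
            "kg": "kilograms", "ml": "milliliters", "l": "liters"}

def normalize_units(text):
    """Standardize unit abbreviations after a number, e.g. '500 gm' -> '500 grams'."""
    def repl(match):
        unit = match.group(2).lower()
        return match.group(1) + " " + UNIT_MAP.get(unit, unit)
    return re.sub(r"(\d+)\s*(g|gm|gms|kg|ml|l)\b", repl, text, flags=re.IGNORECASE)

def transform_price(price):
    """log1p-normalize a right-skewed price target."""
    return math.log1p(price)

def inverse_transform(y):
    """Map a predicted log-price back to the original price scale."""
    return math.expm1(y)
```

Predicting in log space and inverting with `expm1` at the end keeps the regression target well-behaved while still producing prices on the original scale.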

### 3. **Multimodal Architecture**
- **Embedding Generation**: Used powerful, pre-trained models to generate high-quality embeddings for each modality:
    - **Text**: Qwen-3-4B
    - **Text + Image**: SigLIP 2 Giant
    - **Image**: DINOv3
- **Modality-Specific Networks**: Fed the embeddings from each modality into separate, dedicated neural networks, letting the model learn modality-specific representations before fusing the information.
- **Final Regressor**: Concatenated the outputs of the modality-specific networks and passed them to a final regression head to predict the log-transformed price.
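As an illustration of this late-fusion design, here is a minimal pure-Python sketch. The embedding dimensions, layer sizes, and random weights are toy placeholders standing in for the real learned networks over Qwen-3-4B, SigLIP 2, and DINOv3 embeddings:

```python
import random

random.seed(0)

def linear(x, w, b):
    # One dense layer: y_j = sum_i x_i * w[j][i] + b[j]
    return [sum(xi * wji for xi, wji in zip(x, row)) + bj
            for row, bj in zip(w, b)]

def relu(x):
    return [max(0.0, v) for v in x]

def init(n_out, n_in):
    w = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

# Toy embedding sizes standing in for the three frozen encoders.
DIMS = {"text": 8, "text_image": 6, "image": 4}
HIDDEN = 5

# One small projection network ("tower") per modality.
towers = {name: init(HIDDEN, dim) for name, dim in DIMS.items()}
# Regression head over the concatenated tower outputs.
head_w, head_b = init(1, HIDDEN * len(DIMS))

def predict_log_price(embeddings):
    """Late fusion: project each modality separately, concatenate, regress."""
    fused = []
    for name in DIMS:
        w, b = towers[name]
        fused.extend(relu(linear(embeddings[name], w, b)))
    return linear(fused, head_w, head_b)[0]
```

The key point is structural: each modality gets its own projection before concatenation, so the head sees comparable, modality-specific representations rather than raw embeddings of wildly different dimensionality.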

### 4. **Training and Evaluation**
- **Loss Function**: Employed a **log-based Mean Squared Error (MSE) loss**, i.e., MSE computed on the log1p-transformed prices, which is well suited to log-transformed targets.
- **Evaluation Metric**: The competition used the **Symmetric Mean Absolute Percentage Error (SMAPE)**, where a lower score is better.
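Both quantities can be sketched as follows, assuming the standard SMAPE definition scaled to percent; the competition's exact handling of edge cases (e.g., zero prices) is an assumption here:

```python
import math

def log_mse_loss(pred_log, true_price):
    """Training objective sketch: MSE on log1p-transformed prices."""
    errs = [(p - math.log1p(t)) ** 2 for p, t in zip(pred_log, true_price)]
    return sum(errs) / len(errs)

def smape(pred, true):
    """Symmetric MAPE in percent; lower is better. Pairs with both values zero are skipped."""
    terms = [abs(p - t) / ((abs(p) + abs(t)) / 2)
             for p, t in zip(pred, true) if (abs(p) + abs(t)) > 0]
    return 100.0 * sum(terms) / len(terms)
```

Note the mismatch the pipeline has to bridge: training optimizes error in log space, while the leaderboard scores SMAPE on the original price scale after inverting the transform.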

---

## Insights & Conclusion

- **Power of Large Models**: Our results showed that large, state-of-the-art pre-trained models, even without fine-tuning, can provide better feature representations than fine-tuned smaller, distilled models.
- **Multimodality is Key**: Integrating image data provided a significant signal that purely text-based models might miss, confirming the value of a multimodal approach for complex real-world problems.
- **Architecture Matters**: Processing each modality's embeddings in its own network before concatenation was a key architectural decision that improved training stability and overall performance.

---

### **Team Members**
This was a collaborative effort by **SPAM_LLMs** (IIT (ISM) Dhanbad):
- Manav Jain [[LinkedIn]](https://www.linkedin.com/in/manav-jain-05a711255/)
- Karaka Prasanth Naidu [[LinkedIn]](https://www.linkedin.com/in/prasanth-naidu-karaka-a7162019b/)
- Alok Raj [[LinkedIn]](https://www.linkedin.com/in/loki-silvres/)
- Rudraksh Sachin Joshi [[LinkedIn]](https://www.linkedin.com/in/rudraksh-sachin-joshi-75554b202/)