---
layout: page
title: Multimodal Price Regressor
description: Smart product price prediction - Amazon ML Challenge 2025
img: assets/img/AMLC2025-Final-drawio.png
importance: 1
category: ml_research
---

### <a href="https://github.com/RudrakshSJoshi/amlc-multimodal-mlp">[Code]</a> <a href="https://www.kaggle.com/datasets/manav2805/amazon-ml-challenge-25/data">[Data]</a>

As part of team **SPAM_LLMs**, we achieved **6th place overall** in the **Amazon ML Challenge 2025**, a national competition focused on smart product pricing. Our solution secured **3rd position on the public leaderboard and 5th on the private leaderboard**.

This project showcases a **multimodal learning** approach to predicting product prices from text descriptions and images. By using large, pre-trained embedding models without fine-tuning, our architecture effectively processed complex, real-world e-commerce data and achieved high accuracy.

---

<div class="row">
    <div class="col-sm mt-3 mt-md-0">
        {% include figure.liquid path="assets/img/AMLC2025-Final-drawio.png" title="Solution Architecture" class="img-fluid rounded z-depth-1" %}
    </div>
</div>
<div class="caption">
    Our solution combined text and image modalities using separate networks before a final regression head.
</div>

---

## Competition Highlights

### **Key Achievements**
- **Top National Ranking**: Finished 3rd on the public leaderboard and 5th on the private leaderboard, placing us among the top 6 teams in a highly competitive national challenge.
- **Advanced Multimodal Architecture**: Designed and implemented a system that processes text and image data in parallel to predict prices.
- **Strategic Model Selection**: Demonstrated that large, frozen pre-trained models (**Qwen-3-4B**, **SigLIP 2**, **DINOv3**) can outperform fine-tuned smaller models, providing a crucial performance edge.
- **Effective Data Preprocessing**: Developed a comprehensive pipeline for cleaning and normalizing noisy catalog data, which was critical for model performance.

---

## Methodology

### 1. **Business Problem**
In e-commerce, setting the optimal price for a product is vital for success. The challenge was to create a machine learning model that analyzes a product's text and images to accurately predict its price, handling complexities like brand value, specifications, and quantity.

### 2. **Data Handling & Preprocessing**
- **Transformation**: Applied a **log1p transformation** to the right-skewed price data to normalize its distribution.
- **Text Cleaning**: Parsed the `catalog_content` into description, bullet points, and quantity fields. We also standardized inconsistent units (e.g., "g" and "gm" to "grams") and converted numbers to words to improve embedding quality.
- **Feature Selection**: To handle large text fields, we prioritized the product description when available, otherwise defaulting to the top five bullet points.
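The transformation and unit-normalization steps above can be sketched as follows. This is a minimal illustration: `UNIT_MAP`, `normalize_units`, and `transform_price` are hypothetical names, and the real pipeline's rules (including the number-to-word conversion) were far more extensive.

```python
import math
import re

# Hypothetical unit map; the actual pipeline covered many more variants.
UNIT_MAP = {"g": "grams", "gm": "grams", "kg": "kilograms", "ml": "milliliters"}

def normalize_units(text: str) -> str:
    """Replace inconsistent unit tokens with one canonical spelled-out form."""
    pattern = r"\b(" + "|".join(UNIT_MAP) + r")\b"
    return re.sub(pattern, lambda m: UNIT_MAP[m.group(0).lower()], text, flags=re.IGNORECASE)

def transform_price(price: float) -> float:
    """log1p-transform a raw price to tame the right-skewed distribution."""
    return math.log1p(price)

def inverse_transform(log_price: float) -> float:
    """Map a prediction in log space back to the original price scale."""
    return math.expm1(log_price)
```

For example, `normalize_units("500 gm pack")` yields `"500 grams pack"`, and `inverse_transform` undoes `transform_price` exactly, so predictions can be reported on the original price scale.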

### 3. **Multimodal Architecture**
- **Embedding Generation**: Utilized powerful, pre-trained models to generate high-quality embeddings for each modality:
  - **Text**: Qwen-3-4B
  - **Text + Image**: SigLIP 2 Giant
  - **Image**: DINOv3
- **Modality-Specific Networks**: Fed the embeddings from each modality into separate, dedicated neural networks, allowing the model to learn modality-specific representations before fusing the information.
- **Final Regressor**: Concatenated the outputs of the modality-specific networks and passed them to a final regression head that predicts the log-transformed price.
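A minimal NumPy sketch of this fusion pattern is below: untrained random weights, with embedding dimensions and layer sizes that are illustrative assumptions rather than the actual model sizes.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(dim_in: int, dim_hidden: int, dim_out: int):
    """Return the forward pass of a randomly initialised two-layer MLP."""
    w1 = rng.normal(0.0, 0.02, (dim_in, dim_hidden))
    w2 = rng.normal(0.0, 0.02, (dim_hidden, dim_out))
    def forward(x):
        return np.maximum(x @ w1, 0.0) @ w2  # ReLU hidden layer
    return forward

# Illustrative embedding dimensions, not the real model sizes.
text_net  = mlp(2560, 512, 128)   # text embeddings (e.g., Qwen-3-4B)
clip_net  = mlp(1536, 512, 128)   # text+image embeddings (e.g., SigLIP 2)
image_net = mlp(1024, 512, 128)   # image embeddings (e.g., DINOv3)
head      = mlp(3 * 128, 256, 1)  # regression head on the fused features

def predict_log_price(text_emb, clip_emb, image_emb):
    """Modality-specific networks, concatenation, then a regression head."""
    fused = np.concatenate(
        [text_net(text_emb), clip_net(clip_emb), image_net(image_emb)], axis=-1
    )
    return head(fused)  # predicts log1p(price)

batch = predict_log_price(
    rng.normal(size=(4, 2560)),
    rng.normal(size=(4, 1536)),
    rng.normal(size=(4, 1024)),
)
```

Keeping the three towers separate before concatenation lets each learn a representation suited to its modality, which matched the training-stability benefit described above.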

### 4. **Training and Evaluation**
- **Loss Function**: Employed a **log-based Mean Squared Error (MSE) loss**, which is well-suited for log-transformed targets.
- **Evaluation Metric**: The competition used the **Symmetric Mean Absolute Percentage Error (SMAPE)**, where a lower score is better.
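The two quantities can be sketched as below. Note that SMAPE has several common formulations; this one uses 2·|pred − true| / (|pred| + |true|), which may differ in detail from the competition's exact definition.

```python
import numpy as np

def log_mse_loss(pred_log: np.ndarray, true_price: np.ndarray) -> float:
    """MSE computed against log1p-transformed prices (the training target)."""
    return float(np.mean((pred_log - np.log1p(true_price)) ** 2))

def smape(pred: np.ndarray, true: np.ndarray) -> float:
    """Symmetric Mean Absolute Percentage Error in percent; lower is better."""
    return float(100.0 * np.mean(2.0 * np.abs(pred - true) / (np.abs(pred) + np.abs(true))))
```

For instance, predicting 110 for a true price of 90 gives a SMAPE of 20.0, and a perfect prediction gives 0.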

---

## Insights & Conclusion

- **Power of Large Models**: Our success showed that larger, state-of-the-art pre-trained models, even without fine-tuning, can provide superior feature representations compared to fine-tuned smaller, distilled models.
- **Multimodality is Key**: Integrating image data provided a significant signal that purely text-based models might miss, confirming the value of a multimodal approach for complex real-world problems.
- **Architecture Matters**: Processing embeddings through modality-specific networks before concatenation was a key architectural decision that improved training stability and overall performance.

---

### **Team Members**
This was a collaborative effort by **SPAM_LLMs** (IIT ISM Dhanbad):
- Manav Jain [[LinkedIn]](https://www.linkedin.com/in/manav-jain-05a711255/)
- Karaka Prasanth Naidu [[LinkedIn]](https://www.linkedin.com/in/prasanth-naidu-karaka-a7162019b/)
- Alok Raj [[LinkedIn]](https://www.linkedin.com/in/loki-silvres/)
- Rudraksh Sachin Joshi [[LinkedIn]](https://www.linkedin.com/in/rudraksh-sachin-joshi-75554b202/)