Commit ebbf86d

Merge pull request #783 from hsparks-codes/feature/excel-export-and-tests

feat: Add Excel export functionality and unit tests

2 parents 31a092c + 324f09c commit ebbf86d

14 files changed: +882 −4 lines

README.md

Lines changed: 7 additions & 0 deletions

````diff
@@ -212,6 +212,10 @@ python main.py --help
 Multiple data storage methods are supported:
 - **CSV files**: data can be saved to CSV (under the `data/` directory)
 - **JSON files**: data can be saved to JSON (under the `data/` directory)
+- **Excel files**: data can be saved to formatted Excel files (under the `data/` directory) ✨ New feature
+  - Multi-sheet support (contents, comments, creators)
+  - Professional formatting (styled headers, auto column widths, borders)
+  - Easy to analyze and share
 - **Database storage**
   - Use the `--init_db` parameter to initialize the database (no other optional arguments are needed with `--init_db`)
   - **SQLite database**: lightweight, serverless, suitable for personal use (recommended)
@@ -224,6 +228,9 @@ python main.py --help
 
 ### Usage examples:
 ```shell
+# Store data in Excel (recommended for data analysis) ✨ New feature
+uv run main.py --platform xhs --lt qrcode --type search --save_data_option excel
+
 # Initialize the SQLite database (no other optional arguments are needed with '--init_db')
 uv run main.py --init_db sqlite
 # Store data in SQLite (recommended for personal users)
````

README_en.md

Lines changed: 7 additions & 0 deletions

````diff
@@ -209,6 +209,10 @@ python main.py --help
 Supports multiple data storage methods:
 - **CSV Files**: Supports saving to CSV (under `data/` directory)
 - **JSON Files**: Supports saving to JSON (under `data/` directory)
+- **Excel Files**: Supports saving to formatted Excel files (under `data/` directory) ✨ New Feature
+  - Multi-sheet support (Contents, Comments, Creators)
+  - Professional formatting (styled headers, auto-width columns, borders)
+  - Easy to analyze and share
 - **Database Storage**
   - Use the `--init_db` parameter for database initialization (when using `--init_db`, no other optional arguments are needed)
   - **SQLite Database**: Lightweight database, no server required, suitable for personal use (recommended)
@@ -221,6 +225,9 @@ Supports multiple data storage methods:
 
 ### Usage Examples:
 ```shell
+# Use Excel to store data (recommended for data analysis) ✨ New Feature
+uv run main.py --platform xhs --lt qrcode --type search --save_data_option excel
+
 # Initialize SQLite database (when using '--init_db', no other optional arguments are needed)
 uv run main.py --init_db sqlite
 # Use SQLite to store data (recommended for personal users)
````

config/base_config.py

Lines changed: 2 additions & 2 deletions

```diff
@@ -70,8 +70,8 @@
 # Set to False to keep the browser running, which is convenient for debugging
 AUTO_CLOSE_BROWSER = True
 
-# Data storage option. Four types are supported: csv, db, json, sqlite. DB is preferred because it deduplicates records.
-SAVE_DATA_OPTION = "json"  # csv or db or json or sqlite
+# Data storage option. Five types are supported: csv, db, json, sqlite, excel. DB is preferred because it deduplicates records.
+SAVE_DATA_OPTION = "json"  # csv or db or json or sqlite or excel
 
 # Browser profile directory used for the user's browser cache
 USER_DATA_DIR = "%s_user_data_dir"  # %s will be replaced by platform name
```

docs/excel_export_guide.md

Lines changed: 244 additions & 0 deletions (new file)

# Excel Export Guide

## Overview

MediaCrawler now supports exporting crawled data to formatted Excel files (.xlsx) with professional styling and multiple sheets for contents, comments, and creators.

## Features

- **Multi-sheet workbooks**: Separate sheets for Contents, Comments, and Creators
- **Professional formatting**:
  - Styled headers with blue background and white text
  - Auto-adjusted column widths
  - Cell borders and text wrapping
  - Clean, readable layout
- **Smart export**: Empty sheets are automatically removed
- **Organized storage**: Files saved to `data/{platform}/` directory with timestamps
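The styling described above is straightforward to produce with `openpyxl`; here is a minimal, standalone sketch of that kind of formatting (the sheet name, columns, and color are illustrative, not taken from the project's actual store code):

```python
from openpyxl import Workbook, load_workbook
from openpyxl.styles import Alignment, Border, Font, PatternFill, Side
from openpyxl.utils import get_column_letter

wb = Workbook()
ws = wb.active
ws.title = "Contents"
ws.append(["note_id", "title", "liked_count"])  # header row
ws.append(["123", "Example post", 100])         # one data row

# Blue header with white bold text
fill = PatternFill(start_color="4472C4", end_color="4472C4", fill_type="solid")
for cell in ws[1]:
    cell.font = Font(color="FFFFFF", bold=True)
    cell.fill = fill

# Thin borders and wrapped text on every cell
thin = Side(style="thin")
border = Border(left=thin, right=thin, top=thin, bottom=thin)
for row in ws.iter_rows():
    for cell in row:
        cell.border = border
        cell.alignment = Alignment(wrap_text=True, vertical="top")

# openpyxl has no true auto-fit, so approximate each width from the longest value
for idx in range(1, ws.max_column + 1):
    letter = get_column_letter(idx)
    longest = max(len(str(c.value)) for c in ws[letter] if c.value is not None)
    ws.column_dimensions[letter].width = longest + 2

wb.save("styled_demo.xlsx")
```

Note the width loop: since openpyxl cannot ask Excel to auto-fit, estimating from content length is the usual workaround.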
## Installation

Excel export requires the `openpyxl` library:

```bash
# Using uv (recommended)
uv sync

# Or using pip
pip install openpyxl
```

## Usage

### Basic Usage

1. **Configure Excel export** in `config/base_config.py`:

   ```python
   SAVE_DATA_OPTION = "excel"  # Change from json/csv/db to excel
   ```

2. **Run the crawler**:

   ```bash
   # Xiaohongshu example
   uv run main.py --platform xhs --lt qrcode --type search

   # Douyin example
   uv run main.py --platform dy --lt qrcode --type search

   # Bilibili example
   uv run main.py --platform bili --lt qrcode --type search
   ```

3. **Find your Excel file** in the `data/{platform}/` directory:
   - Filename format: `{platform}_{crawler_type}_{timestamp}.xlsx`
   - Example: `xhs_search_20250128_143025.xlsx`
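The filename pattern above can be sketched with a small stdlib helper (the function name is illustrative, not the project's API):

```python
from datetime import datetime

def excel_export_path(platform, crawler_type, now=None):
    """Build data/{platform}/{platform}_{crawler_type}_{timestamp}.xlsx as documented above."""
    ts = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"data/{platform}/{platform}_{crawler_type}_{ts}.xlsx"

print(excel_export_path("xhs", "search", datetime(2025, 1, 28, 14, 30, 25)))
# → data/xhs/xhs_search_20250128_143025.xlsx
```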
### Command Line Examples

```bash
# Search by keywords and export to Excel
uv run main.py --platform xhs --lt qrcode --type search --save_data_option excel

# Crawl specific posts and export to Excel
uv run main.py --platform xhs --lt qrcode --type detail --save_data_option excel

# Crawl creator profile and export to Excel
uv run main.py --platform xhs --lt qrcode --type creator --save_data_option excel
```
## Excel File Structure

### Contents Sheet

Contains post/video information:

- `note_id`: Unique post identifier
- `title`: Post title
- `desc`: Post description
- `user_id`: Author user ID
- `nickname`: Author nickname
- `liked_count`: Number of likes
- `comment_count`: Number of comments
- `share_count`: Number of shares
- `ip_location`: IP location
- `image_list`: Comma-separated image URLs
- `tag_list`: Comma-separated tags
- `note_url`: Direct link to post
- And more platform-specific fields...

### Comments Sheet

Contains comment information:

- `comment_id`: Unique comment identifier
- `note_id`: Associated post ID
- `content`: Comment text
- `user_id`: Commenter user ID
- `nickname`: Commenter nickname
- `like_count`: Comment likes
- `create_time`: Comment timestamp
- `ip_location`: Commenter location
- `sub_comment_count`: Number of replies
- And more...

### Creators Sheet

Contains creator/author information:

- `user_id`: Unique user identifier
- `nickname`: Display name
- `gender`: Gender
- `avatar`: Profile picture URL
- `desc`: Bio/description
- `fans`: Follower count
- `follows`: Following count
- `interaction`: Total interactions
- And more...
114+
115+
### vs CSV
116+
- ✅ Multiple sheets in one file
117+
- ✅ Professional formatting
118+
- ✅ Better handling of special characters
119+
- ✅ Auto-adjusted column widths
120+
- ✅ No encoding issues
121+
122+
### vs JSON
123+
- ✅ Human-readable tabular format
124+
- ✅ Easy to open in Excel/Google Sheets
125+
- ✅ Better for data analysis
126+
- ✅ Easier to share with non-technical users
127+
128+
### vs Database
129+
- ✅ No database setup required
130+
- ✅ Portable single-file format
131+
- ✅ Easy to share and archive
132+
- ✅ Works offline
133+
134+
## Tips & Best Practices

1. **Large datasets**: For very large crawls (>10,000 rows), consider using database storage instead for better performance.

2. **Data analysis**: Excel files work great with:
   - Microsoft Excel
   - Google Sheets
   - LibreOffice Calc
   - Python pandas: `pd.read_excel('file.xlsx')`

3. **Combining data**: You can merge multiple Excel files using:

   ```python
   import pandas as pd

   df1 = pd.read_excel('file1.xlsx', sheet_name='Contents')
   df2 = pd.read_excel('file2.xlsx', sheet_name='Contents')
   combined = pd.concat([df1, df2])
   combined.to_excel('combined.xlsx', index=False)
   ```

4. **File size**: Excel files are typically 2–3× larger than CSV but smaller than JSON.
## Troubleshooting

### "openpyxl not installed" error

```bash
# Install openpyxl
uv add openpyxl
# or
pip install openpyxl
```

### Excel file not created

Check that:

1. `SAVE_DATA_OPTION = "excel"` is set in the config
2. The crawler successfully collected data
3. There are no errors in the console output
4. The `data/{platform}/` directory exists

### Empty Excel file

This happens when:

- No data was crawled (check keywords/IDs)
- Login failed (check login status)
- The platform blocked requests (check IP/rate limits)
## Example Output

After running a successful crawl, you'll see:

```
[ExcelStoreBase] Initialized Excel export to: data/xhs/xhs_search_20250128_143025.xlsx
[ExcelStoreBase] Stored content to Excel: 7123456789
[ExcelStoreBase] Stored comment to Excel: comment_123
...
[Main] Excel file saved successfully
```

Your Excel file will have:

- Professional blue headers
- Clean borders
- Wrapped text for long content
- Auto-sized columns
- Separate organized sheets
## Advanced Usage

### Programmatic Access

Note that `store_content` is a coroutine, so it must be awaited inside an async function:

```python
import asyncio

from store.excel_store_base import ExcelStoreBase

async def main():
    # Create store
    store = ExcelStoreBase(platform="xhs", crawler_type="search")

    # Store data
    await store.store_content({
        "note_id": "123",
        "title": "Test Post",
        "liked_count": 100,
    })

    # Save to file
    store.flush()

asyncio.run(main())
```

### Custom Formatting

You can extend `ExcelStoreBase` to customize formatting:

```python
from store.excel_store_base import ExcelStoreBase

class CustomExcelStore(ExcelStoreBase):
    def _apply_header_style(self, sheet, row_num=1):
        # Custom header styling
        super()._apply_header_style(sheet, row_num)
        # Add your customizations here
```

## Support

For issues or questions:

- Check the [FAQ](常见问题.md)
- Open an issue on GitHub
- Join the WeChat discussion group

---

**Note**: Excel export is designed for learning and research purposes. Please respect platform terms of service and rate limits.
main.py

Lines changed: 12 additions & 0 deletions

```diff
@@ -84,6 +84,18 @@ async def main():
     crawler = CrawlerFactory.create_crawler(platform=config.PLATFORM)
     await crawler.start()
 
+    # Flush Excel data if using Excel export
+    if config.SAVE_DATA_OPTION == "excel":
+        try:
+            # Get the store instance and flush data
+            from store.xhs import XhsStoreFactory
+            store = XhsStoreFactory.create_store()
+            if hasattr(store, 'flush'):
+                store.flush()
+            print(f"[Main] Excel file saved successfully")
+        except Exception as e:
+            print(f"Error flushing Excel data: {e}")
+
     # Generate wordcloud after crawling is complete
     # Only for JSON save mode
     if config.SAVE_DATA_OPTION == "json" and config.ENABLE_GET_WORDCLOUD:
```
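The flush step added here is wired to the xhs store factory specifically and guarded with `hasattr`, since only the Excel store implements `flush`. A generalized sketch of the same guard, with stand-in store classes rather than the project's real ones:

```python
def flush_stores(stores):
    """Flush every store that supports it; mirrors the hasattr guard in the diff above."""
    flushed = []
    for store in stores:
        if hasattr(store, "flush"):
            store.flush()
            flushed.append(store)
    return flushed

class FakeExcelStore:
    """Stand-in for a store with a flush() method (e.g. an Excel-backed store)."""
    def __init__(self):
        self.saved = False
    def flush(self):
        self.saved = True

s = FakeExcelStore()
result = flush_stores([s, object()])  # the bare object() has no flush and is skipped
print(s.saved)  # → True
```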

pyproject.toml

Lines changed: 3 additions & 0 deletions

```diff
@@ -35,6 +35,9 @@ dependencies = [
     "wordcloud==1.9.3",
     "xhshow>=0.1.3",
     "pre-commit>=3.5.0",
+    "openpyxl>=3.1.2",
+    "pytest>=7.4.0",
+    "pytest-asyncio>=0.21.0",
 ]
 
 [[tool.uv.index]]
```

requirements.txt

Lines changed: 4 additions & 1 deletion

```diff
@@ -25,4 +25,7 @@ alembic>=1.16.5
 asyncmy>=0.2.10
 sqlalchemy>=2.0.43
 motor>=3.3.0
-xhshow>=0.1.3
+xhshow>=0.1.3
+openpyxl>=3.1.2
+pytest>=7.4.0
+pytest-asyncio>=0.21.0
```
