How do I use XiaoHongShuSpider? #6

@bobkingdom

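As the title says. I also tried switching to the MaFengWo spider, and it does not seem to crawl anything either. How is this supposed to be used?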
2024-09-01 23:47:53,010 - INFO - HTTP Request: GET https://www.mafengwo.cn/mdd "HTTP/1.1 301 Moved Permanently"
[ERROR][2024-09-01 23:47:53][main.py:439] - Error occurred while crawling: '__jsluid_s'
INFO: 127.0.0.1:53710 - "POST /fetch_mfw HTTP/1.1" 200 OK

@app.post("/fetch_mfw")
async def crawl_mafengwo_mdd():
    url = "https://www.mafengwo.cn/mdd"
    # proxy_gene_func = MyProxy()
    # config = SpiderConfig(proxy_gene_func=proxy_gene_func)
    config = SpiderConfig()
    # Use MaFengWoSpider (swapped in for XiaoHongShuSpider)
    spider = MaFengWoSpider(config)

    try:
        # Fetch the page content with the async method
        doc = await spider.a_crawl(url)
        logger.info(f"Successfully crawled content: {doc.page_content}")
        return doc.page_content

    except Exception as e:
        logger.error(f"Error occurred while crawling: {str(e)}")
        return {"error": str(e)}
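
One possible reading of the log above, offered as a sketch and not a confirmed diagnosis: the logged message is only the quoted string '__jsluid_s', which is what `str()` returns for a `KeyError`. That would fit a lookup of a cookie named `__jsluid_s` on a response that did not set it, for example the 301 redirect shown in the log. Whether the spider really indexes cookies this way is an assumption; the helper names below are hypothetical and are not part of the library.

```python
# Hypothetical helpers for illustration only; they are not part of the spider library.

def read_cookie_strict(cookies: dict, name: str) -> str:
    """Index the mapping directly; raises KeyError when the cookie is absent."""
    return cookies[name]


def read_cookie_safe(cookies: dict, name: str, default=None):
    """Return the cookie value, or `default` when the cookie is absent."""
    return cookies.get(name, default)


def describe_error(exc: Exception) -> str:
    """Include the exception type so a bare key name is not logged on its own."""
    return f"{type(exc).__name__}: {exc}"


# Assumption: the redirect response carried no cookies at all.
redirect_cookies = {}

try:
    read_cookie_strict(redirect_cookies, "__jsluid_s")
except Exception as e:
    bare_message = str(e)                # what the handler in the post logs
    detailed_message = describe_error(e)

print(bare_message)      # '__jsluid_s'
print(detailed_message)  # KeyError: '__jsluid_s'
```

Logging the exception type alongside the message (or calling `logger.exception(...)` inside the `except` block to get a traceback) would show where the lookup fails, which is more informative than the key name alone.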