Changelog

  • 2024/11/25: Project Initialization

Table of Contents

  1. llm-web-kit
  2. TODO
  3. Known Issues
  4. FAQ
  5. All Thanks To Our Contributors
  6. License Information
  7. Acknowledgments
  8. Citation
  9. Star History
  10. Links

llm-web-kit

Project Introduction

llm-web-kit is a Python library that ..

Key Features

  • Remove headers, footers, footnotes, page numbers, etc., to ensure semantic coherence.
  • Output text in human-readable order, suitable for single-column, multi-column, and complex layouts.

Quick Start

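from typing import Optional

from loguru import logger

from llm_web_kit.simple import extract_html_to_md


def extract(url: str, html: str) -> Optional[str]:
    """Convert an HTML page to Markdown; return None if extraction fails."""
    try:
        nlp_md = extract_html_to_md(url, html)
        # or mm_nlp_md = extract_html_to_mm_md(url, html)
        return nlp_md
    except Exception as e:
        # Log the error with its traceback, then signal failure to the caller.
        logger.exception(e)
        return None


if __name__ == "__main__":
    url = ""   # URL of the page
    html = ""  # raw HTML of the page
    markdown = extract(url, html)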
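
When converting more than one page, a small batch wrapper can be handy. The following is a minimal sketch and not part of llm-web-kit: `extract_many` is a hypothetical helper, and it takes the extractor as a parameter (for example `extract_html_to_md` from the Quick Start), assuming only that the extractor accepts `(url, html)` and returns a Markdown string.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def extract_many(
    pages: Iterable[Tuple[str, str]],
    extractor: Callable[[str, str], str],
) -> Dict[str, Optional[str]]:
    """Run `extractor` over (url, html) pairs and collect results by URL.

    Hypothetical helper for illustration only. A page whose extraction
    raises is recorded as None so one bad page does not stop the batch.
    """
    results: Dict[str, Optional[str]] = {}
    for url, html in pages:
        try:
            results[url] = extractor(url, html)
        except Exception:
            # Assumption: failures are tolerated and marked with None.
            results[url] = None
    return results
```

Passing the extractor in as an argument keeps the helper independent of any particular function name, so it can be exercised with a stub before wiring in the real extractor.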

Usage

TODO

Known Issues

FAQ

All Thanks To Our Contributors

License Information

Acknowledgments

Citation

Star History


Links