Does this work? #1

@Anonyfox

Hey, I just stumbled upon this repo, and it seems that you have ported the famous readability algorithm to Rust, using kuchiki and therefore html5ever. First: truly great!

But the algorithm seems to crash when used on real-world HTML pages. I get panics like:

   1:        0x10a9bc24c - std::sys::imp::backtrace::tracing::imp::write::hf587afb8e94ad165
   2:        0x10a9be23e - std::panicking::default_hook::{{closure}}::haf3443cb412055ce
   3:        0x10a9bdde3 - std::panicking::default_hook::h742f925bfab3bbfa
   4:        0x10a9be6f7 - std::panicking::rust_panic_with_hook::h6f06ff8d28a94df6
   5:        0x10a9be5a4 - std::panicking::begin_panic::h7b9167ba3324cfae
   6:        0x10a9be4c2 - std::panicking::begin_panic_fmt::hb5f8f1fe0fe23e28
   7:        0x10a9be427 - rust_begin_unwind
   8:        0x10a9e5e60 - core::panicking::panic_fmt::he6eb92dab4407c61
   9:        0x10a9e5eed - core::option::expect_failed::hf8bba00a6e833438
  10:        0x10a70f373 - <core::option::Option<T>>::expect::hba43ec4f65591df2
  11:        0x10a6cf697 - <std::collections::hash::map::HashMap<K, V, S> as core::ops::Index<&'a Q>>::index::he1febf3b2b851612
  12:        0x10a782795 - readability::Readability::add_info::h3257b725054a9642
  13:        0x10a782026 - readability::Readability::readify::h110ae48756961de8
  14:        0x10a781a7a - readability::Readability::parse::h69c7871f90548046
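
Frames 9 to 12 of the trace point at `HashMap`'s `Index` implementation being called from `add_info`, which panics when the key is missing. The following is a minimal sketch of that failure mode and a non-panicking alternative, not the crate's actual code: `NodeInfo`, `add_score`, `score_of`, and the `usize` node ids are all hypothetical names made up for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical per-node record; not taken from the crate.
#[derive(Debug, Default, PartialEq)]
struct NodeInfo {
    score: f32,
}

/// Adds `delta` to the score stored for `node_id`.
/// `entry(..).or_default()` inserts a default record first if the node
/// has not been seen yet, so a missing key cannot cause a panic.
fn add_score(info: &mut HashMap<usize, NodeInfo>, node_id: usize, delta: f32) {
    info.entry(node_id).or_default().score += delta;
}

/// Reads a score without panicking; returns `None` for unknown nodes.
fn score_of(info: &HashMap<usize, NodeInfo>, node_id: usize) -> Option<f32> {
    info.get(&node_id).map(|n| n.score)
}

fn main() {
    let mut info: HashMap<usize, NodeInfo> = HashMap::new();

    // Writing `info[&7]` at this point would panic, because key 7 is
    // absent and `Index` on a HashMap panics for missing keys.
    add_score(&mut info, 7, 1.5);
    add_score(&mut info, 7, 0.5);

    println!("{:?}", score_of(&info, 7)); // prints Some(2.0)
    println!("{:?}", score_of(&info, 8)); // prints None
}
```

Whether `entry` or `get` fits better depends on whether a missing node should be created on the fly or treated as "no information yet"; that is a guess about the algorithm's intent, not something the trace shows.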

Maybe this repo could also use some small polish, like publishing on crates.io and a README with a short "how to use". I just figured out that

readability::new().parse(&html_string).text_contents()

works more or less to get started, but I had tinkered with kuchiki before. Do you want some help? I might not be of much use on the algorithmic side in Rust yet, but once you have a working state of this crate I'd like to write some docs for you in exchange. What do you think?
