Merged
32 changes: 17 additions & 15 deletions AGENTS.md
Original file line number Diff line number Diff line change
@@ -9,8 +9,8 @@
- **Clarity over cleverness.** Be concise, but favour explicit over terse or
obscure idioms. Prefer code that's easy to follow.
- **Use functions and composition.** Avoid repetition by extracting reusable
logic. Prefer generators or comprehensions, and declarative code to imperative
repetition when readable.
logic. Prefer generators or comprehensions, and declarative code to
imperative repetition when readable.
- **Small, meaningful functions.** Functions must be small, clear in purpose,
single responsibility, and obey command/query segregation.
- **Clear commit messages.** Commit messages should be descriptive, explaining
@@ -25,12 +25,14 @@
("-ize" / "-yse" / "-our") spelling and grammar, with the exception of
references to external APIs.
- **Illustrate with clear examples.** Function documentation must include clear
examples demonstrating the usage and outcome of the function. Test documentation
should omit examples where the example serves only to reiterate the test logic.
- **Keep file size managable.** No single code file may be longer than 400 lines.
examples demonstrating the usage and outcome of the function. Test
documentation should omit examples where the example serves only to reiterate
the test logic.
- **Keep file size managable.** No single code file may be longer than 400
lines.
Long switch statements or dispatch tables should be broken up by feature and
constituents colocated with targets. Large blocks of test data should be moved
to external data files.
constituents colocated with targets. Large blocks of test data should be
moved to external data files.

## Documentation Maintenance

@@ -42,8 +44,8 @@
relevant file(s) in the `docs/` directory to reflect the latest state.
**Ensure the documentation remains accurate and current.**
- Documentation must use en-GB-oxendict ("-ize" / "-yse" / "-our") spelling
and grammar. (EXCEPTION: the naming of the "LICENSE" file, which
is to be left unchanged for community consistency.)
and grammar. (EXCEPTION: the naming of the "LICENSE" file, which is to be
left unchanged for community consistency.)

## Change Quality & Committing

@@ -153,19 +155,19 @@ project:
specified in `Cargo.toml` must use SemVer-compatible caret requirements
(e.g., `some-crate = "1.2.3"`). This is Cargo's default and allows for safe,
non-breaking updates to minor and patch versions while preventing breaking
changes from new major versions. This approach is critical for ensuring
build stability and reproducibility.
changes from new major versions. This approach is critical for ensuring build
stability and reproducibility.
- **Prohibit unstable version specifiers.** The use of wildcard (`*`) or
open-ended inequality (`>=`) version requirements is strictly forbidden
as they introduce unacceptable risk and unpredictability. Tilde requirements
open-ended inequality (`>=`) version requirements is strictly forbidden as
they introduce unacceptable risk and unpredictability. Tilde requirements
(`~`) should only be used where a dependency must be locked to patch-level
updates for a specific, documented reason.
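The versioning rules above correspond to these `Cargo.toml` requirement forms (a sketch with hypothetical crate names, not taken from this repository's manifest):

```toml
[dependencies]
# Caret requirement (Cargo's default): "1.2.3" means ^1.2.3,
# i.e. >=1.2.3 and <2.0.0 — minor and patch updates allowed.
some-crate = "1.2.3"

# Tilde requirement: ~1.2.3 means >=1.2.3 and <1.3.0 — patch-level
# updates only. Use solely with a specific, documented reason.
pinned-crate = "~1.2.3"

# Forbidden under the rules above:
# wild-crate = "*"       # wildcard — any version at all
# open-crate = ">=1.0"   # open-ended — accepts future breaking majors
```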

### Error Handling

- **Prefer semantic error enums**. Derive `std::error::Error` (via the
`thiserror` crate) for any condition the caller might inspect, retry, or
map to an HTTP status.
`thiserror` crate) for any condition the caller might inspect, retry, or map
to an HTTP status.
- **Use an *opaque* error only at the app boundary**. Use `eyre::Report` for
human-readable logs; these should not be exposed in public APIs.
- **Never export the opaque type from a library**. Convert to domain enums at
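The semantic-enum rule above can be sketched without any external crates; the hand-rolled `Display`/`Error` impls below are roughly what `#[derive(thiserror::Error)]` with `#[error("...")]` attributes would generate (the `FetchError` type and its variants are hypothetical, for illustration only):

```rust
use std::{error::Error, fmt};

// A semantic error enum the caller can match on, retry, or map to an
// HTTP status — as opposed to an opaque `eyre::Report`.
#[derive(Debug)]
enum FetchError {
    NotFound(String),
    Timeout,
}

impl fmt::Display for FetchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FetchError::NotFound(name) => write!(f, "resource not found: {name}"),
            FetchError::Timeout => write!(f, "request timed out"),
        }
    }
}

impl Error for FetchError {}

fn main() {
    let err = FetchError::NotFound("config.toml".into());
    // Callers can inspect the variant rather than parse a message string.
    assert!(matches!(err, FetchError::NotFound(_)));
    assert_eq!(err.to_string(), "resource not found: config.toml");
}
```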
2 changes: 1 addition & 1 deletion README.md
@@ -161,7 +161,7 @@ alongside regular Markdown tables.

See
[HTML table support for more details](docs/architecture.md#html-table-support-in-mdtablefix)
.
.

## Module structure

5 changes: 3 additions & 2 deletions src/ellipsis.rs
@@ -5,12 +5,13 @@
//! complete triple remain. Fenced code blocks and inline code spans are left
//! untouched.

use std::sync::LazyLock;

use regex::Regex;

use crate::wrap::{Token, tokenize_markdown};

static DOT_RE: std::sync::LazyLock<Regex> =
std::sync::LazyLock::new(|| Regex::new(r"\.{3,}").unwrap());
static DOT_RE: LazyLock<Regex> = lazy_regex!(r"\.{3,}", "ellipsis pattern regex should compile");

/// Replace `...` with `…` outside code spans and fences.
#[must_use]
17 changes: 10 additions & 7 deletions src/footnotes.rs
@@ -4,16 +4,19 @@
//! footnote links and rewrites the trailing numeric list into a footnote
//! block. Only the final contiguous list of footnotes is processed.

use std::sync::LazyLock;

use regex::{Captures, Regex};

static INLINE_FN_RE: std::sync::LazyLock<Regex> = std::sync::LazyLock::new(|| {
Regex::new(r"(?P<pre>^|[^0-9])(?P<punc>[.!?);:])(?P<style>[*_]*)(?P<num>\d+)(?P<boundary>\s|$)")
.unwrap()
});
static INLINE_FN_RE: LazyLock<Regex> = lazy_regex!(
r"(?P<pre>^|[^0-9])(?P<punc>[.!?);:])(?P<style>[*_]*)(?P<num>\d+)(?P<boundary>\s|$)",
"inline footnote reference pattern should compile",
);

static FOOTNOTE_LINE_RE: std::sync::LazyLock<Regex> = std::sync::LazyLock::new(|| {
Regex::new(r"^(?P<indent>\s*)(?P<num>\d+)\.\s+(?P<rest>.*)$").unwrap()
});
static FOOTNOTE_LINE_RE: LazyLock<Regex> = lazy_regex!(
r"^(?P<indent>\s*)(?P<num>\d+)\.\s+(?P<rest>.*)$",
"footnote line pattern should compile",
);

use crate::wrap::{Token, tokenize_markdown};

9 changes: 6 additions & 3 deletions src/html.rs
@@ -15,10 +15,13 @@ use regex::Regex;
use crate::wrap::is_fence;

/// Matches the start of an HTML `<table>` tag, ignoring case.
static TABLE_START_RE: LazyLock<Regex> =
LazyLock::new(|| Regex::new(r"(?i)^<table(?:\s|>|$)").unwrap());
static TABLE_START_RE: LazyLock<Regex> = lazy_regex!(
r"(?i)^<table(?:\s|>|$)",
"HTML table start pattern should compile"
);
/// Matches the end of an HTML `</table>` tag, ignoring case.
static TABLE_END_RE: LazyLock<Regex> = LazyLock::new(|| Regex::new(r"(?i)</table>").unwrap());
static TABLE_END_RE: LazyLock<Regex> =
lazy_regex!(r"(?i)</table>", "HTML table end pattern should compile");

/// Extracts the text content of a DOM node, collapsing consecutive
/// whitespace to single spaces.
7 changes: 7 additions & 0 deletions src/lib.rs
@@ -12,6 +12,13 @@
//! - `process` for stream processing.
//! - `io` for file helpers.

#[macro_export]
macro_rules! lazy_regex {
($re:expr, $msg:expr $(,)?) => {
std::sync::LazyLock::new(|| regex::Regex::new($re).expect($msg))
};
}

pub mod breaks;
pub mod ellipsis;
pub mod fences;
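The new `lazy_regex!` macro simply wraps a fallible constructor in `LazyLock::new` and `expect`s the result with a caller-supplied message. The same shape can be sketched without the `regex` dependency — the `Pattern` type below is a hypothetical stand-in for `regex::Regex`:

```rust
use std::sync::LazyLock;

// Stand-in for `regex::Regex`: any type with a fallible constructor.
#[derive(Debug)]
struct Pattern(String);

impl Pattern {
    fn new(src: &str) -> Result<Pattern, String> {
        if src.is_empty() {
            Err("empty pattern".to_string())
        } else {
            Ok(Pattern(src.to_string()))
        }
    }
}

// Same shape as the PR's `lazy_regex!`: compile lazily, panic with a
// descriptive message if the pattern is invalid.
macro_rules! lazy_pattern {
    ($src:expr, $msg:expr $(,)?) => {
        LazyLock::new(|| Pattern::new($src).expect($msg))
    };
}

static DOT: LazyLock<Pattern> =
    lazy_pattern!(r"\.{3,}", "ellipsis pattern should compile");

fn main() {
    // The constructor only runs on first dereference of the static.
    assert_eq!(DOT.0, r"\.{3,}");
}
```

Centralizing the `LazyLock`/`expect` boilerplate in one macro keeps each call site to a single line and forces every pattern to carry a compile-failure message.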
76 changes: 53 additions & 23 deletions src/wrap.rs
@@ -4,7 +4,7 @@
//! `docs/architecture.md` and uses the `unicode-width` crate for accurate
//! display calculations.

use regex::Regex;
use regex::{Captures, Regex};

static FENCE_RE: std::sync::LazyLock<Regex> =
std::sync::LazyLock::new(|| Regex::new(r"^\s*(```|~~~).*").unwrap());
@@ -18,6 +18,42 @@ static FOOTNOTE_RE: std::sync::LazyLock<Regex> =
static BLOCKQUOTE_RE: std::sync::LazyLock<Regex> =
std::sync::LazyLock::new(|| Regex::new(r"^(\s*(?:>\s*)+)(.*)$").unwrap());

struct PrefixHandler {
re: &'static std::sync::LazyLock<Regex>,
is_bq: bool,
build_prefix: fn(&Captures) -> String,
rest_group: usize,
}

impl PrefixHandler {
fn build_bullet_prefix(cap: &Captures) -> String { cap[1].to_string() }

fn build_footnote_prefix(cap: &Captures) -> String { format!("{}{}", &cap[1], &cap[2]) }

fn build_blockquote_prefix(cap: &Captures) -> String { cap[1].to_string() }
}

static HANDLERS: &[PrefixHandler] = &[
PrefixHandler {
re: &BULLET_RE,
is_bq: false,
build_prefix: PrefixHandler::build_bullet_prefix,
rest_group: 2,
},
PrefixHandler {
re: &FOOTNOTE_RE,
is_bq: false,
build_prefix: PrefixHandler::build_footnote_prefix,
rest_group: 3,
},
PrefixHandler {
re: &BLOCKQUOTE_RE,
is_bq: true,
build_prefix: PrefixHandler::build_blockquote_prefix,
rest_group: 2,
},
];

/// Markdown token emitted by [`tokenize_markdown`].
#[derive(Debug, PartialEq)]
pub enum Token<'a> {
@@ -341,7 +377,7 @@ pub fn wrap_text(lines: &[String], width: usize) -> Vec<String> {
let mut indent = String::new();
let mut in_code = false;

for line in lines {
'line_loop: for line in lines {
if FENCE_RE.is_match(line) {
flush_paragraph(&mut out, &buf, &indent, width);
buf.clear();
@@ -380,27 +416,21 @@
continue;
}

if let Some(cap) = BULLET_RE.captures(line) {
let prefix = cap.get(1).unwrap().as_str();
let rest = cap.get(2).unwrap().as_str();
handle_prefix_line(&mut out, &mut buf, &mut indent, width, prefix, rest, false);
continue;
}

if let Some(cap) = FOOTNOTE_RE.captures(line) {
let indent_part = cap.get(1).unwrap().as_str();
let label_part = cap.get(2).unwrap().as_str();
let prefix = format!("{indent_part}{label_part}");
let rest = cap.get(3).unwrap().as_str();
handle_prefix_line(&mut out, &mut buf, &mut indent, width, &prefix, rest, false);
continue;
}

if let Some(cap) = BLOCKQUOTE_RE.captures(line) {
let prefix = cap.get(1).unwrap().as_str();
let rest = cap.get(2).unwrap().as_str();
handle_prefix_line(&mut out, &mut buf, &mut indent, width, prefix, rest, true);
continue;
for handler in HANDLERS {
if let Some(cap) = handler.re.captures(line) {
let prefix = (handler.build_prefix)(&cap);
let rest = cap.get(handler.rest_group).unwrap().as_str();
handle_prefix_line(
&mut out,
&mut buf,
&mut indent,
width,
&prefix,
rest,
handler.is_bq,
);
continue 'line_loop;
}
}

if buf.is_empty() {
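The `wrap.rs` refactor above replaces three near-identical `if let Some(cap) = RE.captures(line)` branches with a static table of handlers iterated in order. The essence of that table-driven dispatch can be sketched with the matchers simplified to hypothetical string tests (the real code uses the `BULLET_RE`, `FOOTNOTE_RE`, and `BLOCKQUOTE_RE` regexes):

```rust
// One handler per Markdown prefix kind, mirroring the PrefixHandler
// struct introduced in wrap.rs.
struct PrefixHandler {
    matches: fn(&str) -> bool,
    is_bq: bool,
}

fn is_bullet(line: &str) -> bool {
    line.trim_start().starts_with("- ")
}

fn is_blockquote(line: &str) -> bool {
    line.trim_start().starts_with('>')
}

// Order matters: the first matching handler wins, just as the
// rewritten loop takes the first regex that captures the line.
static HANDLERS: &[PrefixHandler] = &[
    PrefixHandler { matches: is_bullet, is_bq: false },
    PrefixHandler { matches: is_blockquote, is_bq: true },
];

/// Returns `Some(is_blockquote)` for the first matching handler, the
/// decision the loop feeds into `handle_prefix_line`.
fn classify(line: &str) -> Option<bool> {
    HANDLERS.iter().find(|h| (h.matches)(line)).map(|h| h.is_bq)
}

fn main() {
    assert_eq!(classify("- item"), Some(false));
    assert_eq!(classify("> quote"), Some(true));
    assert_eq!(classify("plain text"), None);
}
```

Adding a fourth prefix kind now means appending one table entry rather than threading another `if let` branch (and its `continue`) through `wrap_text`; the labelled `continue 'line_loop` in the diff is needed because the `continue` now sits inside the inner handler loop.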