Commit af9f90a

table caption fixes (#52)
1 parent 7778644 commit af9f90a

23 files changed: +31501 -31413 lines changed

crates/quarto-markdown-pandoc/CLAUDE.md

Lines changed: 14 additions & 5 deletions
@@ -19,10 +19,18 @@ need to be done with the traversal helpers in traversals.rs.
 - **IMPORTANT**: When making changes to the code, ALWAYS run both `cargo check` AND `cargo test` to ensure changes compile and don't affect behavior. The test suite is fast enough to run after each change. Never skip running `cargo test` - it must always be executed together with `cargo check`.
 - **CRITICAL**: Do NOT assume changes are safe if ANY tests fail, even if they seem unrelated. Some tests require pandoc to be properly installed to pass. Always ensure ALL tests pass before and after changes.
 
-## Environment setup
+## **CRITICAL**: HOW TO DO CODING WORK IN THIS REPO
 
-- Rust toolchain is installed at `/home/claude-sandbox/.cargo/bin`
-- Pandoc is installed at `/home/claude-sandbox/local/bin`
+Whenever you start working on a coding task, follow these steps:
+
+- Make a plan to yourself.
+- The plan should include adding appropriate tests to the test suite.
+- Before implementing the feature, write the test that you think should fail, and ensure that the test fails the way you expect to!
+- Work on the plan item by item.
+- You are not done until the test you wrote passes.
+- You are not done until the test you wrote is integrated to our test suite.
+- If you run out of ideas and still can't make the test pass, do not erase the test. Report back to me and we will work on it together.
+- If in the process of writing tests you run into an unexpected parse error, store it in a separate file and report it to me. We're still improving the parser and it's possible that you will run into bugs.
 
 # Error messages
 
@@ -44,6 +52,7 @@ After changing any of the resources/error-corpus/*.{json,qmd} files, run the scr
 
 The `quarto-markdown-pandoc` binary accepts the following options:
 - `-t, --to <TO>`: Output format (default: native)
+- `-f, --from <FROM>`: Input format (default: qmd)
 - `-v, --verbose`: Verbose output
 - `-i, --input <INPUT>`: Input file (default: stdin)
 - `--loose`: Loose parsing mode
@@ -52,8 +61,8 @@ The `quarto-markdown-pandoc` binary accepts the following options:
 
 ## Instructions
 
-- in this repository, "qmd" means "quarto markdown", the dialect of markdown we are developing. Although we aim to be largely compatible with Pandoc, it is not necessarily the case that a discrepancy in the behavior is a bug.
-- the qmd format only supports the inline syntax for a link [link](./target.html), and not the reference-style syntax [link][1].
+- In this repository, "qmd" means "quarto markdown", the dialect of markdown we are developing. Although we aim to be largely compatible with Pandoc, it is not necessarily the case that a discrepancy in the behavior is a bug.
+- The qmd format only supports the inline syntax for a link [link](./target.html), and not the reference-style syntax [link][1].
 - Always strive for test documents as small as possible. Prefer a large number of small test documents instead of small number of large documents.
 - When fixing bugs, always try to isolate and fix one bug at a time.
 - When fixing bugs using tests, run the failing test before attempting to fix issues. This helps ensuring that tests are exercising the failure as expected, and fixes actually fix the particular issue.

crates/quarto-markdown-pandoc/src/filters.rs

Lines changed: 6 additions & 0 deletions
@@ -742,6 +742,12 @@ pub fn topdown_traverse_block(block: Block, filter: &mut Filter) -> Blocks {
                 },
             )]
         }
+        Block::CaptionBlock(_) => {
+            // CaptionBlock should have been removed by postprocessing
+            panic!(
+                "CaptionBlock found in filter - should have been processed during postprocessing"
+            )
+        }
     }
 }

crates/quarto-markdown-pandoc/src/pandoc/block.rs

Lines changed: 9 additions & 1 deletion
@@ -35,6 +35,7 @@ pub enum Block {
     BlockMetadata(MetaBlock),
     NoteDefinitionPara(NoteDefinitionPara),
     NoteDefinitionFencedBlock(NoteDefinitionFencedBlock),
+    CaptionBlock(CaptionBlock),
 }
 
 pub type Blocks = Vec<Block>;
@@ -144,6 +145,12 @@ pub struct NoteDefinitionFencedBlock {
     pub source_info: SourceInfo,
 }
 
+#[derive(Debug, Clone, PartialEq)]
+pub struct CaptionBlock {
+    pub content: Inlines,
+    pub source_info: SourceInfo,
+}
+
 impl_source_location!(
     // blocks
     Plain,
@@ -163,7 +170,8 @@ impl_source_location!(
     // quarto extensions
     MetaBlock,
     NoteDefinitionPara,
-    NoteDefinitionFencedBlock
+    NoteDefinitionFencedBlock,
+    CaptionBlock
 );
 
 fn make_block_leftover(node: &tree_sitter::Node, input_bytes: &[u8]) -> Block {

crates/quarto-markdown-pandoc/src/pandoc/treesitter.rs

Lines changed: 3 additions & 1 deletion
@@ -8,6 +8,7 @@ use crate::pandoc::treesitter_utils::attribute::process_attribute;
 use crate::pandoc::treesitter_utils::atx_heading::process_atx_heading;
 use crate::pandoc::treesitter_utils::backslash_escape::process_backslash_escape;
 use crate::pandoc::treesitter_utils::block_quote::process_block_quote;
+use crate::pandoc::treesitter_utils::caption::process_caption;
 use crate::pandoc::treesitter_utils::citation::process_citation;
 use crate::pandoc::treesitter_utils::code_fence_content::process_code_fence_content;
 use crate::pandoc::treesitter_utils::code_span::process_code_span;
@@ -86,6 +87,7 @@ fn get_block_source_info(block: &Block) -> &SourceInfo {
         Block::BlockMetadata(b) => &b.source_info,
         Block::NoteDefinitionPara(b) => &b.source_info,
         Block::NoteDefinitionFencedBlock(b) => &b.source_info,
+        Block::CaptionBlock(b) => &b.source_info,
     }
 }
 
@@ -733,7 +735,7 @@ fn native_visitor<T: Write>(
         }
         "pipe_table_delimiter_row" => process_pipe_table_delimiter_row(children, context),
         "pipe_table_cell" => process_pipe_table_cell(node, children, context),
-        "table_caption" => PandocNativeIntermediate::IntermediateInlines(native_inlines(children)),
+        "caption" => process_caption(node, children, context),
         "pipe_table" => process_pipe_table(node, children, context),
         "setext_h1_underline" => PandocNativeIntermediate::IntermediateSetextHeadingLevel(1),
         "setext_h2_underline" => PandocNativeIntermediate::IntermediateSetextHeadingLevel(2),
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+/*
+ * caption.rs
+ *
+ * Functions for processing caption nodes in the tree-sitter AST.
+ *
+ * Copyright (c) 2025 Posit, PBC
+ */
+
+use crate::pandoc::ast_context::ASTContext;
+use crate::pandoc::block::{Block, CaptionBlock};
+use crate::pandoc::inline::Inlines;
+use crate::pandoc::location::node_source_info_with_context;
+
+use super::pandocnativeintermediate::PandocNativeIntermediate;
+
+pub fn process_caption(
+    node: &tree_sitter::Node,
+    children: Vec<(String, PandocNativeIntermediate)>,
+    context: &ASTContext,
+) -> PandocNativeIntermediate {
+    let mut caption_inlines: Inlines = Vec::new();
+
+    for (node_name, child) in children {
+        if node_name == "inline" {
+            match child {
+                PandocNativeIntermediate::IntermediateInlines(inlines) => {
+                    caption_inlines.extend(inlines);
+                }
+                _ => panic!("Expected Inlines in caption, got {:?}", child),
+            }
+        }
+        // Skip other nodes like ":", blank_line, etc.
+    }
+
+    PandocNativeIntermediate::IntermediateBlock(Block::CaptionBlock(CaptionBlock {
+        content: caption_inlines,
+        source_info: node_source_info_with_context(node, context),
+    }))
+}
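
The collection step in `process_caption` can be tried in isolation. The sketch below is not the crate's code: `Inline`, `Intermediate`, `CaptionBlock`, and `collect_caption` are simplified, hypothetical stand-ins for `Inlines`, `PandocNativeIntermediate`, and the real `CaptionBlock`, and source locations are left out. It only illustrates the rule the new file applies: keep the inlines of children named "inline" and skip every other child.

```rust
// Hypothetical stand-ins for the crate's types; only the shapes matter here.
#[derive(Debug, Clone, PartialEq)]
enum Inline {
    Str(String),
    Space,
}

#[derive(Debug, Clone, PartialEq)]
enum Intermediate {
    Inlines(Vec<Inline>),
    // Stands in for ":" markers, blank lines, and other non-inline children.
    Marker,
}

#[derive(Debug, Clone, PartialEq)]
struct CaptionBlock {
    content: Vec<Inline>,
}

/// Collect the inlines of every child named "inline"; ignore all other children.
/// Panics if an "inline" child does not carry inlines, mirroring the diff.
fn collect_caption(children: Vec<(String, Intermediate)>) -> CaptionBlock {
    let mut content = Vec::new();
    for (name, child) in children {
        if name == "inline" {
            match child {
                Intermediate::Inlines(inlines) => content.extend(inlines),
                other => panic!("Expected Inlines in caption, got {:?}", other),
            }
        }
    }
    CaptionBlock { content }
}
```

Calling `collect_caption` with a `":"` marker child followed by one `"inline"` child yields a `CaptionBlock` holding only that child's inlines.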

crates/quarto-markdown-pandoc/src/pandoc/treesitter_utils/mod.rs

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ pub mod attribute;
 pub mod atx_heading;
 pub mod backslash_escape;
 pub mod block_quote;
+pub mod caption;
 pub mod citation;
 pub mod code_fence_content;
 pub mod code_span;

crates/quarto-markdown-pandoc/src/pandoc/treesitter_utils/postprocess.rs

Lines changed: 37 additions & 1 deletion
@@ -7,7 +7,7 @@ use crate::filters::{
     Filter, FilterReturn::FilterResult, FilterReturn::Unchanged, topdown_traverse,
 };
 use crate::pandoc::attr::{Attr, is_empty_attr};
-use crate::pandoc::block::{Block, DefinitionList, Div, Figure, Plain};
+use crate::pandoc::block::{Block, Blocks, DefinitionList, Div, Figure, Plain};
 use crate::pandoc::caption::Caption;
 use crate::pandoc::inline::{Inline, Inlines, Space, Span, Str, Superscript};
 use crate::pandoc::location::{Range, SourceInfo, empty_range, empty_source_info};
@@ -620,6 +620,42 @@ pub fn postprocess(doc: Pandoc) -> Result<Pandoc, Vec<String>> {
                     attr
                 ));
                 FilterResult(vec![], false)
+            })
+            .with_blocks(|blocks| {
+                // Process CaptionBlock nodes: attach to preceding tables or issue warnings
+                let mut result: Blocks = Vec::new();
+
+                for block in blocks {
+                    // Check if current block is a CaptionBlock
+                    if let Block::CaptionBlock(caption_block) = block {
+                        // Look for a preceding Table
+                        if let Some(Block::Table(table)) = result.last_mut() {
+                            // Attach caption to the table
+                            table.caption = Caption {
+                                short: None,
+                                long: Some(vec![Block::Plain(Plain {
+                                    content: caption_block.content.clone(),
+                                    source_info: caption_block.source_info.clone(),
+                                })]),
+                            };
+                            // Don't add the CaptionBlock to the result (it's now attached)
+                        } else {
+                            // TODO: Issue a warning/error when proper error infrastructure is ready
+                            // For now, print a warning to stderr
+                            eprintln!(
+                                "Warning: Caption found without a preceding table at {}:{}",
+                                caption_block.source_info.range.start.row + 1,
+                                caption_block.source_info.range.start.column + 1
+                            );
+                            // Remove the caption from the output (don't add to result)
+                        }
+                    } else {
+                        // Not a CaptionBlock, add it to result
+                        result.push(block);
+                    }
+                }
+
+                FilterResult(result, true)
             });
         topdown_traverse(doc, &mut filter)
     };
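
The attach-or-drop pass added to `postprocess` can be illustrated with a reduced model. This is a minimal sketch under stated assumptions, not the crate's implementation: `Block` here is a hypothetical three-variant enum with string captions, and `attach_captions` returns a count of orphaned captions where the real code prints a warning to stderr.

```rust
// Hypothetical, reduced block type; the real `Block` has many more variants
// and stores captions as `Caption { short, long }` with source locations.
#[derive(Debug, Clone, PartialEq)]
enum Block {
    Para(String),
    Table { caption: Option<String> },
    CaptionBlock(String),
}

/// Attach each caption to the block emitted just before it when that block is
/// a table; otherwise drop the caption. Returns the rewritten blocks and the
/// number of dropped (orphaned) captions.
fn attach_captions(blocks: Vec<Block>) -> (Vec<Block>, usize) {
    let mut result: Vec<Block> = Vec::new();
    let mut orphans = 0;
    for block in blocks {
        match block {
            Block::CaptionBlock(text) => {
                if let Some(Block::Table { caption }) = result.last_mut() {
                    // The caption is consumed; it never reaches `result`.
                    *caption = Some(text);
                } else {
                    // The diff warns on stderr here; the sketch only counts.
                    orphans += 1;
                }
            }
            other => result.push(other),
        }
    }
    (result, orphans)
}
```

Because a consumed caption is never pushed, the table stays last in `result`, so a second caption directly after the same table replaces the first. The same holds for the loop in the hunk above. No `CaptionBlock` survives the pass, which is the invariant the `panic!` arms in the filter and the writers rely on.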

crates/quarto-markdown-pandoc/src/writers/json.rs

Lines changed: 5 additions & 0 deletions
@@ -395,6 +395,11 @@ fn write_block(block: &Block) -> Value {
             "c": [refdef.id, write_blocks(&refdef.content)],
             "l": write_location(refdef),
         }),
+        Block::CaptionBlock(_) => {
+            panic!(
+                "CaptionBlock found in JSON writer - should have been processed during postprocessing"
+            )
+        }
     }
 }
400405

crates/quarto-markdown-pandoc/src/writers/qmd.rs

Lines changed: 22 additions & 0 deletions
@@ -680,6 +680,23 @@ fn write_table(table: &Table, buf: &mut dyn std::io::Write) -> std::io::Result<(
         }
     }
 
+    // Write caption if it exists
+    if let Some(ref long_caption) = table.caption.long {
+        if !long_caption.is_empty() {
+            writeln!(buf)?; // Blank line before caption
+            for block in long_caption {
+                // Extract inline content from Plain blocks in caption
+                if let Block::Plain(plain) = block {
+                    write!(buf, ": ")?;
+                    for inline in &plain.content {
+                        write_inline(inline, buf)?;
+                    }
+                    writeln!(buf)?;
+                }
+            }
+        }
+    }
+
     Ok(())
 }
 
@@ -1183,6 +1200,11 @@ fn write_block(block: &crate::pandoc::Block, buf: &mut dyn std::io::Write) -> st
         Block::NoteDefinitionFencedBlock(refdef) => {
             write_fenced_note_definition(refdef, buf)?;
         }
+        Block::CaptionBlock(_) => {
+            panic!(
+                "CaptionBlock found in QMD writer - should have been processed during postprocessing"
+            )
+        }
     }
     Ok(())
 }
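
The caption output format added to `write_table` (one blank line, then a `: ` prefixed line per caption block) can be sketched on its own. The helper below is hypothetical: `write_caption_lines` takes already-rendered strings, whereas the real code walks `Block::Plain` contents through `write_inline` and silently skips any other block type.

```rust
use std::io::Write;

/// Write one blank line, then each caption line as `: text`.
/// Writes nothing when there are no caption lines, matching the
/// `!long_caption.is_empty()` guard in the diff.
fn write_caption_lines(lines: &[&str], buf: &mut dyn Write) -> std::io::Result<()> {
    if lines.is_empty() {
        return Ok(());
    }
    writeln!(buf)?; // Blank line separating the table from its caption
    for line in lines {
        writeln!(buf, ": {}", line)?;
    }
    Ok(())
}
```

Writing into a `Vec<u8>` makes the output easy to inspect: for the single line "Sample table caption" the buffer holds a newline, then `: Sample table caption`, then a final newline.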
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+| Name | Age |
+|-------|-----|
+| Alice | 30 |
+| Bob | 25 |
+
+: Sample table caption
