Skip to content

Commit f0327c8

Browse files
tarlebargent0
andcommitted
Support complex figures. [API change]
Thanks and credit go to Aner Lucero, who laid the groundwork for this feature in the 2021 GSoC project. He contributed many changes, including modifications to the readers for HTML, JATS, and LaTeX, and to the HTML and JATS writers. Shared (Albert Krewinkel): - The new function `figureDiv`, exported from `Text.Pandoc.Shared`, offers a standardized way to convert a figure into a Div element. Readers (Aner Lucero): - HTML reader: `<figure>` elements are parsed as figures, with the caption taken from the respective `<figcaption>` elements. - JATS reader: The `<fig>` and `<caption>` elements are parsed into figure elements, even if the contents is more complex. - LaTeX reader: support for figures with non-image contents and for subfigures. - Markdown reader: paragraphs containing just an image are treated as figures if the `implicit_figures` extension is enabled. The identifier is used as the figure's identifier and the image description is also used as figure caption; all other attributes are treated as belonging to the image. Writers (Aner Lucero, Albert Krewinkel): - DokuWiki, Haddock, Jira, Man, MediaWiki, Ms, Muse, PPTX, RTF, TEI, ZimWiki writers: Figures are rendered like Div elements. - Asciidoc writer: The figure contents is unwrapped; each image in the the figure becomes a separate figure. - Classic custom writers: Figures are passed to the global function `Figure(caption, contents, attr)`, where `caption` and `contents` are strings and `attr` is a table of key-value pairs. - ConTeXt writer: Figures are wrapped in a "placefigure" environment with `\startplacefigure`/`\endplacefigure`, adding the features caption and listing title as properties. Subfigures are place in a single row with the `\startfloatcombination` environment. - DocBook writer: Uses `mediaobject` elements, unless the figure contains subfigures or tables, in which case the figure content is unwrapped. - Docx writer: figures with multiple content blocks are rendered as tables with style `FigureTable`; like before, single-image figures are still output as paragraphs with style `Figure` or `Captioned Figure`, depending on whether a caption is attached. - DokuWiki writer: Caption and "alt-text" are no longer combined. The alt text of a figure will now be lost in the conversion. - FB2 writer: The figure caption is added as alt text to the images in the figure; pre-existing alt texts are kept. - ICML writer: Only single-image figures are supported. The contents of figures with additional elements gets unwrapped. - HTML writer: the alt text is no longer constructed from the caption, as was the case with implicit figures. This reduces duplication, but comes at the risk of images that are missing alt texts. Authors should take care to provide alt texts for all images. Some readers, most notably the Markdown reader with the `implicit_figures` extension, add a caption that's identical to the image description. The writer checks for this and adds an `aria-hidden` attribute to the `<figcaption>` element in that case. - JATS writer: The `<fig>` and `<caption>` elements are used write figures. - LaTeX writer: complex figures, e.g. with non-image contents and subfigures, are supported. The `subfigure` template variable is set if the document contains subfigures, triggering the conditional loading of the *subcaption* package. Contants of figures that contain tables are become unwrapped, as longtable environments are not allowed within figures. - Markdown writer: figures are output as implicit figures if possible, via HTML if the `raw_html` extension is enabled, and as Div elements otherwise. - OpenDocument writer: A separate paragraph is generated for each block element in a figure, each with style `FigureWithCaption`. Behavior for single-image figures therefore remains unchanged. - Org writer: Only the first element in a figure is given a caption; additional block elements in the figure are appended without any caption being added. - RST writer: Single-image figures are supported as before; the contents of more complex images become nested in a container of type `float`. - Texinfo writer: Figures are rendered as float with type `figure`. - Textile writer: Figures are rendered with the help of HTML elements. - XWiki: Figures are placed in a group. Co-authored-by: Aner Lucero <[email protected]>
1 parent 4db5b05 commit f0327c8

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

98 files changed

+1492
-595
lines changed

cabal.project

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,3 +16,13 @@ source-repository-package
1616
type: git
1717
location: https://github.com/jgm/texmath
1818
tag: 1a77db688bd3285228299e5aeefc93d6c0d8c0b9
19+
20+
source-repository-package
21+
type: git
22+
location: https://github.com/tarleb/pandoc-types
23+
tag: f84b7359765a2798f22efe4e9457538cda7a8d4a
24+
25+
source-repository-package
26+
type: git
27+
location: https://github.com/pandoc/pandoc-lua-marshal
28+
tag: a2a97e2af78326ea7841101d4ef56e74426b66c4

data/templates/default.latex

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -293,6 +293,9 @@ $if(numbersections)$
293293
$else$
294294
\setcounter{secnumdepth}{-\maxdimen} % remove section numbering
295295
$endif$
296+
$if(subfigure)$
297+
\usepackage{subcaption}
298+
$endif$
296299
$if(beamer)$
297300
$else$
298301
$if(block-headings)$

pandoc-lua-engine/src/Text/Pandoc/Lua/Writer/Classic.hs

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -157,6 +157,12 @@ blockToCustom (CodeBlock attr str) =
157157
blockToCustom (BlockQuote blocks) =
158158
invoke "BlockQuote" (Stringify blocks)
159159

160+
blockToCustom (Figure attr (Caption _ cbody) content) =
161+
invoke "Figure"
162+
(Stringify cbody)
163+
(Stringify content)
164+
(attrToMap attr)
165+
160166
blockToCustom (Table _ blkCapt specs thead tbody tfoot) =
161167
let (capt, aligns, widths, headers, rows) = toLegacyTable blkCapt specs thead tbody tfoot
162168
aligns' = map show aligns

pandoc-lua-engine/test/sample.lua

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -295,6 +295,12 @@ function CaptionedImage(src, tit, caption, attr)
295295
end
296296
end
297297

298+
function Figure(caption, contents, attr)
299+
return '<figure' .. attributes(attr) .. '>\n' .. contents ..
300+
'\n<figcaption>' .. caption .. '</figcaption>\n' ..
301+
'</figure>'
302+
end
303+
298304
-- Caption is a string, aligns is an array of strings,
299305
-- widths is an array of floats, headers is an array of
300306
-- strings, rows is an array of arrays of strings.

pandoc-lua-engine/test/writer.custom

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -737,7 +737,8 @@ So is &lsquo;pine.&rsquo;</p>
737737
<p>From &ldquo;Voyage dans la Lune&rdquo; by Georges Melies (1902):</p>
738738

739739
<figure>
740-
<img src="lalune.jpg" id="" alt="lalune"/><figcaption>lalune</figcaption>
740+
<img src="lalune.jpg" title="Voyage dans la Lune"/>
741+
<figcaption>lalune</figcaption>
741742
</figure>
742743

743744
<p>Here is a movie <img src="movie.jpg" title=""/> icon.</p>

src/Text/Pandoc/Readers/HTML.hs

Lines changed: 13 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -25,7 +25,7 @@ module Text.Pandoc.Readers.HTML ( readHtml
2525
) where
2626

2727
import Control.Applicative ((<|>))
28-
import Control.Monad (guard, msum, mzero, unless, void)
28+
import Control.Monad (guard, mzero, unless, void)
2929
import Control.Monad.Except (throwError, catchError)
3030
import Control.Monad.Reader (ask, asks, lift, local, runReaderT)
3131
import Data.Text.Encoding.Base64 (encodeBase64)
@@ -36,6 +36,7 @@ import Data.List.Split (splitWhen)
3636
import Data.List (foldl')
3737
import qualified Data.Map as M
3838
import Data.Maybe (fromMaybe, isJust, isNothing)
39+
import Data.Either (partitionEithers)
3940
import Data.Monoid (First (..))
4041
import qualified Data.Set as Set
4142
import Data.Text (Text)
@@ -63,8 +64,8 @@ import Text.Pandoc.Options (
6364
extensionEnabled)
6465
import Text.Pandoc.Parsing hiding ((<|>))
6566
import Text.Pandoc.Shared (
66-
addMetaField, blocksToInlines', extractSpaces,
67-
htmlSpanLikeElements, renderTags', safeRead, tshow, formatCode)
67+
addMetaField, extractSpaces, htmlSpanLikeElements, renderTags',
68+
safeRead, tshow, formatCode)
6869
import Text.Pandoc.URI (escapeURI)
6970
import Text.Pandoc.Walk
7071
import Text.TeXMath (readMathML, writeTeX)
@@ -581,24 +582,15 @@ pPara = do
581582
<|> return (B.para contents)
582583

583584
pFigure :: PandocMonad m => TagParser m Blocks
584-
pFigure = try $ do
585-
TagOpen _ _ <- pSatisfy (matchTagOpen "figure" [])
586-
skipMany pBlank
587-
let pImg = (\x -> (Just x, Nothing)) <$>
588-
(pInTag TagsOmittable "p" pImage <* skipMany pBlank)
589-
pCapt = (\x -> (Nothing, Just x)) <$> do
590-
bs <- pInTags "figcaption" block
591-
return $ blocksToInlines' $ B.toList bs
592-
pSkip = (Nothing, Nothing) <$ pSatisfy (not . matchTagClose "figure")
593-
res <- many (pImg <|> pCapt <|> pSkip)
594-
let mbimg = msum $ map fst res
595-
let mbcap = msum $ map snd res
596-
TagClose _ <- pSatisfy (matchTagClose "figure")
597-
let caption = fromMaybe mempty mbcap
598-
case B.toList <$> mbimg of
599-
Just [Image attr _ (url, tit)] ->
600-
return $ B.simpleFigureWith attr caption url tit
601-
_ -> mzero
585+
pFigure = do
586+
TagOpen tag attrList <- pSatisfy $ matchTagOpen "figure" []
587+
let parser = Left <$> pInTags "figcaption" block <|>
588+
(Right <$> block)
589+
(captions, rest) <- partitionEithers <$> manyTill parser (pCloses tag <|> eof)
590+
-- Concatenate all captions together
591+
return $ B.figureWith (toAttr attrList)
592+
(B.simpleCaption (mconcat captions))
593+
(mconcat rest)
602594

603595
pCodeBlock :: PandocMonad m => TagParser m Blocks
604596
pCodeBlock = try $ do

src/Text/Pandoc/Readers/JATS.hs

Lines changed: 11 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -38,7 +38,6 @@ import Text.TeXMath (readMathML, writeTeX)
3838
import qualified Data.Set as S (fromList, member)
3939
import Data.Set ((\\))
4040
import Text.Pandoc.Sources (ToSources(..), sourcesToText)
41-
import qualified Data.Foldable as DF
4241

4342
type JATS m = StateT JATSState m
4443

@@ -232,29 +231,17 @@ parseBlock (Elem e) =
232231
terms' <- mapM getInlines terms
233232
items' <- mapM getBlocks items
234233
return (mconcat $ intersperse (str "; ") terms', items')
235-
parseFigure =
236-
-- if a simple caption and single graphic, we emit a standard
237-
-- implicit figure. otherwise, we emit a div with the contents
238-
case filterChildren (named "graphic") e of
239-
[g] -> do
240-
capt <- case filterChild (named "caption") e of
241-
Just t -> mconcat .
242-
intersperse linebreak <$>
243-
mapM getInlines
244-
(filterChildren (const True) t)
245-
Nothing -> return mempty
246-
247-
let figAttributes = DF.toList $
248-
("alt", ) . strContent <$>
249-
filterChild (named "alt-text") e
250-
251-
return $ simpleFigureWith
252-
(attrValue "id" e, [], figAttributes)
253-
capt
254-
(attrValue "href" g)
255-
(attrValue "title" g)
256-
257-
_ -> divWith (attrValue "id" e, ["fig"], []) <$> getBlocks e
234+
parseFigure = do
235+
capt <- case filterChild (named "caption") e of
236+
Just t -> mconcat . intersperse linebreak <$>
237+
mapM getInlines (filterChildren (const True) t)
238+
Nothing -> return mempty
239+
contents <- getBlocks e
240+
241+
return $ figureWith
242+
(attrValue "id" e, [], [])
243+
(simpleCaption $ plain capt)
244+
contents
258245
parseFootnoteGroup = do
259246
forM_ (filterChildren (named "fn") e) $ \fn -> do
260247
let id' = attrValue "id" fn

src/Text/Pandoc/Readers/LaTeX.hs

Lines changed: 29 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -33,6 +33,7 @@ import Data.Maybe (fromMaybe, maybeToList)
3333
import qualified Data.Set as Set
3434
import Data.Text (Text)
3535
import qualified Data.Text as T
36+
import Data.Either (partitionEithers)
3637
import Skylighting (defaultSyntaxMap)
3738
import System.FilePath (addExtension, replaceExtension, takeExtension)
3839
import Text.Collate.Lang (renderLang)
@@ -1011,8 +1012,8 @@ environments = M.union (tableEnvironments blocks inline) $
10111012
, ("letter", env "letter" letterContents)
10121013
, ("minipage", env "minipage" $
10131014
skipopts *> spaces *> optional braced *> spaces *> blocks)
1014-
, ("figure", env "figure" $ skipopts *> figure)
1015-
, ("subfigure", env "subfigure" $ skipopts *> tok *> figure)
1015+
, ("figure", env "figure" $ skipopts *> figure')
1016+
, ("subfigure", env "subfigure" $ skipopts *> tok *> figure')
10161017
, ("center", divWith ("", ["center"], []) <$> env "center" blocks)
10171018
, ("quote", blockQuote <$> env "quote" blocks)
10181019
, ("quotation", blockQuote <$> env "quotation" blocks)
@@ -1164,37 +1165,33 @@ letterContents = do
11641165
_ -> mempty
11651166
return $ addr <> bs -- sig added by \closing
11661167

1167-
figure :: PandocMonad m => LP m Blocks
1168-
figure = try $ do
1168+
figure' :: PandocMonad m => LP m Blocks
1169+
figure' = try $ do
11691170
resetCaption
1170-
blocks >>= addImageCaption
1171-
1172-
addImageCaption :: PandocMonad m => Blocks -> LP m Blocks
1173-
addImageCaption = walkM go
1174-
where go p@(Para [Image attr@(_, cls, kvs) _ (src, tit)])
1175-
| not ("fig:" `T.isPrefixOf` tit) = do
1176-
st <- getState
1177-
case sCaption st of
1178-
Nothing -> return p
1179-
Just (Caption _mbshort bs) -> do
1180-
let mblabel = sLastLabel st
1181-
let attr' = case mblabel of
1182-
Just lab -> (lab, cls, kvs)
1183-
Nothing -> attr
1184-
case attr' of
1185-
("", _, _) -> return ()
1186-
(ident, _, _) -> do
1187-
num <- getNextNumber sLastFigureNum
1188-
setState
1189-
st{ sLastFigureNum = num
1190-
, sLabels = M.insert ident
1191-
[Str (renderDottedNum num)] (sLabels st) }
1192-
1193-
return $ SimpleFigure attr'
1194-
(maybe id removeLabel mblabel
1195-
(blocksToInlines bs))
1196-
(src, tit)
1197-
go x = return x
1171+
innerContent <- many $ try (Left <$> label) <|> (Right <$> block)
1172+
let content = walk go $ mconcat $ snd $ partitionEithers innerContent
1173+
st <- getState
1174+
let caption' = case sCaption st of
1175+
Nothing -> B.emptyCaption
1176+
Just capt -> capt
1177+
let mblabel = sLastLabel st
1178+
let attr = case mblabel of
1179+
Just lab -> (lab, [], [])
1180+
Nothing -> nullAttr
1181+
case mblabel of
1182+
Nothing -> pure ()
1183+
Just lab -> do
1184+
num <- getNextNumber sLastFigureNum
1185+
setState
1186+
st { sLastFigureNum = num
1187+
, sLabels = M.insert lab [Str (renderDottedNum num)] (sLabels st)
1188+
}
1189+
return $ B.figureWith attr caption' content
1190+
1191+
where
1192+
-- Remove the `Image` caption b.c. it's on the `Figure`
1193+
go (Para [Image attr _ target]) = Plain [Image attr [] target]
1194+
go x = x
11981195

11991196
coloredBlock :: PandocMonad m => Text -> LP m Blocks
12001197
coloredBlock stylename = try $ do

src/Text/Pandoc/Readers/LaTeX/Math.hs

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -214,7 +214,8 @@ addQed bs =
214214
qedSign = B.str "\xa0\x25FB"
215215

216216
italicize :: Block -> Block
217-
italicize x@(Para [Image{}]) = x -- see #6925
217+
italicize x@(Para [Image{}]) = x -- see #6925
218+
italicize x@(Plain [Image{}]) = x -- ditto
218219
italicize (Para ils) = Para [Emph ils]
219220
italicize (Plain ils) = Plain [Emph ils]
220221
italicize x = x

src/Text/Pandoc/Readers/Markdown.hs

Lines changed: 12 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1046,7 +1046,7 @@ para = try $ do
10461046
[Image attr figCaption (src, tit)]
10471047
| extensionEnabled Ext_implicit_figures exts
10481048
, not (null figCaption) -> do
1049-
B.simpleFigureWith attr (B.fromList figCaption) src tit
1049+
implicitFigure attr (B.fromList figCaption) src tit
10501050

10511051
_ -> constr inlns
10521052

@@ -1077,6 +1077,17 @@ para = try $ do
10771077
plain :: PandocMonad m => MarkdownParser m (F Blocks)
10781078
plain = fmap B.plain . trimInlinesF <$> inlines1
10791079

1080+
implicitFigure :: Attr -> Inlines -> Text -> Text -> Blocks
1081+
implicitFigure (ident, classes, attribs) capt url title =
1082+
let alt = case "alt" `lookup` attribs of
1083+
Just alt' -> B.text alt'
1084+
_ -> capt
1085+
attribs' = filter ((/= "alt") . fst) attribs
1086+
figattr = (ident, mempty, mempty)
1087+
caption = B.simpleCaption $ B.plain capt
1088+
figbody = B.plain $ B.imageWith ("", classes, attribs') url title alt
1089+
in B.figureWith figattr caption figbody
1090+
10801091
--
10811092
-- raw html
10821093
--

0 commit comments

Comments
 (0)