'remove_field' option: also remove fldChar nodes by kdevkdev · Pull Request #707 · davidgohel/officer

kdevkdev · 2026-01-13T10:12:02Z

In docx_summary(), when remove_fields is true and detailed is true:

Recent changes cause empty nodes to be converted to paragraphs containing NA when using docx_summary(), so each one appears as a row in the returned data frame with run_content_text set to NA. I don't think this is the desired behaviour.

See code in docx_runs_content_information() in fortify_docx.R.

Therefore, the empty fldChar nodes also need to be removed: since they are now converted to NA, leaving them in place makes NAs appear in run_content_text when using docx_summary().
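To illustrate the idea outside of officer, here is a minimal sketch using xml2 on a hand-written paragraph. The XML snippet, the field instruction "ADDIN EXAMPLE_FIELD" and the XPath expression are assumptions made for this illustration; this is not the code from fortify_docx.R, only a picture of why runs holding fldChar markers carry no text and have to be dropped together with the field instruction.

```r
library(xml2)

# Hypothetical paragraph mimicking a complex field:
# begin marker, instruction text, separate marker, visible result, end marker.
xml <- paste0(
  '<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">',
  '<w:r><w:t>See </w:t></w:r>',
  '<w:r><w:fldChar w:fldCharType="begin"/></w:r>',
  '<w:r><w:instrText>ADDIN EXAMPLE_FIELD</w:instrText></w:r>',
  '<w:r><w:fldChar w:fldCharType="separate"/></w:r>',
  '<w:r><w:t>[1,2]</w:t></w:r>',
  '<w:r><w:fldChar w:fldCharType="end"/></w:r>',
  '</w:p>'
)
p <- read_xml(xml)

# Remove every run that only carries field machinery: the instruction text
# and the (textless) fldChar markers. Removing instrText alone would leave
# three empty runs behind, which is the situation described above.
field_runs <- xml_find_all(p, "//w:r[w:fldChar or w:instrText]")
xml_remove(field_runs)

# Text of the runs that are left
remaining <- xml_text(xml_find_all(p, "//w:r"))
print(remaining)
```

In this sketch only the two runs with visible text survive, which mirrors the two rows one would hope to see in run_content_text.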

I have also attached a .docx with an example field to try:
example.docx

kdevkdev · 2026-01-13T10:37:49Z

BTW great work with providing run data, potentially really useful!

davidgohel · 2026-01-14T21:39:59Z

Thank you, using your example shows the issue you detected!

remove_fields = TRUE does not work as expected:

library(tibble)
library(officer)
curl::curl_download("https://github.com/user-attachments/files/24586594/example.docx", "example.docx")
doc <- read_docx("example.docx")
z1 <- docx_summary(doc, preserve = TRUE, remove_fields = FALSE, detailed = TRUE)
z2 <- docx_summary(doc, preserve = TRUE, remove_fields = TRUE, detailed = TRUE)
as_tibble(z1)
#> # A tibble: 6 × 37
#>   doc_index content_type run_index run_content_index run_content_text image_path
#>       <int> <chr>            <int>             <int> <chr>            <chr>     
#> 1         1 paragraph            1                 1 " "              <NA>      
#> 2         1 paragraph            2                 1  <NA>            <NA>      
#> 3         1 paragraph            3                 1 "ADDIN ZOTERO_I… <NA>      
#> 4         1 paragraph            4                 1  <NA>            <NA>      
#> 5         1 paragraph            5                 1 "[1,2]"          <NA>      
#> 6         1 paragraph            6                 1  <NA>            <NA>      
#> # ℹ 31 more variables: field_code <chr>, footnote_text <chr>, link <chr>,
#> #   link_to_bookmark <chr>, bookmark_start <chr>, character_stylename <chr>,
#> #   sz <int>, sz_cs <int>, font_family_ascii <chr>, font_family_eastasia <chr>,
#> #   font_family_hansi <chr>, font_family_cs <chr>, bold <lgl>, italic <lgl>,
#> #   underline <lgl>, color <chr>, shading <chr>, shading_color <chr>,
#> #   shading_fill <chr>, paragraph_stylename <chr>, keep_with_next <lgl>,
#> #   align <chr>, level <int>, num_id <int>, table_index <int>, row_id <int>, …
as_tibble(z2)
#> # A tibble: 5 × 37
#>   doc_index content_type run_index run_content_index run_content_text image_path
#>       <int> <chr>            <int>             <int> <chr>            <chr>     
#> 1         1 paragraph            1                 1 " "              <NA>      
#> 2         1 paragraph            2                 1  <NA>            <NA>      
#> 3         1 paragraph            3                 1  <NA>            <NA>      
#> 4         1 paragraph            4                 1 "[1,2]"          <NA>      
#> 5         1 paragraph            5                 1  <NA>            <NA>      
#> # ℹ 31 more variables: field_code <chr>, footnote_text <chr>, link <chr>,
#> #   link_to_bookmark <chr>, bookmark_start <chr>, character_stylename <chr>,
#> #   sz <int>, sz_cs <int>, font_family_ascii <chr>, font_family_eastasia <chr>,
#> #   font_family_hansi <chr>, font_family_cs <chr>, bold <lgl>, italic <lgl>,
#> #   underline <lgl>, color <chr>, shading <chr>, shading_color <chr>,
#> #   shading_fill <chr>, paragraph_stylename <chr>, keep_with_next <lgl>,
#> #   align <chr>, level <int>, num_id <int>, table_index <int>, row_id <int>, …

Created on 2026-01-14 with reprex v2.1.1

With your fix, we will see the expected output:

library(tibble)
library(officer)
curl::curl_download("https://github.com/user-attachments/files/24586594/example.docx", "example.docx")
doc <- read_docx("example.docx")
z1 <- docx_summary(doc, preserve = TRUE, remove_fields = FALSE, detailed = TRUE)
z2 <- docx_summary(doc, preserve = TRUE, remove_fields = TRUE, detailed = TRUE)
as_tibble(z1)
#> # A tibble: 6 × 37
#>   doc_index content_type run_index run_content_index run_content_text image_path
#>       <int> <chr>            <int>             <int> <chr>            <chr>     
#> 1         1 paragraph            1                 1 " "              <NA>      
#> 2         1 paragraph            2                 1  <NA>            <NA>      
#> 3         1 paragraph            3                 1 "ADDIN ZOTERO_I… <NA>      
#> 4         1 paragraph            4                 1  <NA>            <NA>      
#> 5         1 paragraph            5                 1 "[1,2]"          <NA>      
#> 6         1 paragraph            6                 1  <NA>            <NA>      
#> # ℹ 31 more variables: field_code <chr>, footnote_text <chr>, link <chr>,
#> #   link_to_bookmark <chr>, bookmark_start <chr>, character_stylename <chr>,
#> #   sz <int>, sz_cs <int>, font_family_ascii <chr>, font_family_eastasia <chr>,
#> #   font_family_hansi <chr>, font_family_cs <chr>, bold <lgl>, italic <lgl>,
#> #   underline <lgl>, color <chr>, shading <chr>, shading_color <chr>,
#> #   shading_fill <chr>, paragraph_stylename <chr>, keep_with_next <lgl>,
#> #   align <chr>, level <int>, num_id <int>, table_index <int>, row_id <int>, …
as_tibble(z2)
#> # A tibble: 2 × 37
#>   doc_index content_type run_index run_content_index run_content_text image_path
#>       <int> <chr>            <int>             <int> <chr>            <chr>     
#> 1         1 paragraph            1                 1 " "              <NA>      
#> 2         1 paragraph            2                 1 "[1,2]"          <NA>      
#> # ℹ 31 more variables: field_code <chr>, footnote_text <chr>, link <chr>,
#> #   link_to_bookmark <chr>, bookmark_start <chr>, character_stylename <chr>,
#> #   sz <int>, sz_cs <int>, font_family_ascii <chr>, font_family_eastasia <chr>,
#> #   font_family_hansi <chr>, font_family_cs <chr>, bold <lgl>, italic <lgl>,
#> #   underline <lgl>, color <chr>, shading <chr>, shading_color <chr>,
#> #   shading_fill <chr>, paragraph_stylename <chr>, keep_with_next <lgl>,
#> #   align <chr>, level <int>, num_id <int>, table_index <int>, row_id <int>, …

Created on 2026-01-14 with reprex v2.1.1

'remove_field' option: also remove fldChar nodes to avoid NA's appear…

9e1b53d

…ing in 'run_content_text' using doc_summary

davidgohel merged commit e9f3956 into davidgohel:master Jan 14, 2026
0 of 4 checks passed

davidgohel merged 1 commit into davidgohel:master from kdevkdev:master
