
'remove_field' option: also remove fldChar nodes #707

Merged
davidgohel merged 1 commit into davidgohel:master from kdevkdev:master
Jan 14, 2026


@kdevkdev
Contributor

This concerns docx_summary() when remove_fields = TRUE and detailed = TRUE.

Recent changes cause empty nodes to be converted to paragraphs containing NA when using docx_summary(), meaning that each one is represented by a row in the returned data frame with run_content_text set to NA. I don't think this is the desired behaviour.

See the code in docx_runs_content_information() in fortify_docx.R.

The empty fldChar nodes therefore need to be removed as well; otherwise they are converted to NA and NAs appear in run_content_text in the output of docx_summary().
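To illustrate the idea (this is not the code from this PR, only a minimal sketch using xml2 on a hand-written paragraph), a complex field in WordprocessingML is a sequence of runs: a fldChar "begin" run, an instrText run holding the field code, a fldChar "separate" run, the visible result, and a fldChar "end" run. The XML string, the "ADDIN EXAMPLE" field code and the variable names below are all made up for the example. Removing only instrText would leave three empty runs behind, which is where the NA rows come from.

```r
library(xml2)

# Hypothetical paragraph mimicking a complex field such as a citation.
xml_txt <- paste0(
  '<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">',
  '<w:r><w:t xml:space="preserve">See </w:t></w:r>',
  '<w:r><w:fldChar w:fldCharType="begin"/></w:r>',
  '<w:r><w:instrText xml:space="preserve">ADDIN EXAMPLE</w:instrText></w:r>',
  '<w:r><w:fldChar w:fldCharType="separate"/></w:r>',
  '<w:r><w:t>[1,2]</w:t></w:r>',
  '<w:r><w:fldChar w:fldCharType="end"/></w:r>',
  '</w:p>'
)
para <- read_xml(xml_txt)

# Remove the field code AND the field delimiters (begin/separate/end).
xml_remove(xml_find_all(para, ".//w:instrText | .//w:fldChar"))

# Drop the runs that no longer have any child element.
xml_remove(xml_find_all(para, ".//w:r[not(*)]"))

# Text of the runs that remain: the leading text and the field result.
remaining <- xml_text(xml_find_all(para, ".//w:r"))
print(remaining)
```

In this sketch only the leading text run and the field result run survive, which mirrors the two rows expected from docx_summary() once the fldChar nodes are removed along with the field code.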

Attached also a .docx with an example field to try:
example.docx

@kdevkdev
Contributor Author

BTW great work with providing run data, potentially really useful!

@davidgohel
Owner

davidgohel commented Jan 14, 2026

Thank you, your example reproduces the issue you detected!

Without the fix, remove_fields = TRUE does not work as expected: the field code row is dropped, but the rows with NA in run_content_text remain:

library(tibble)
library(officer)
curl::curl_download("https://github.com/user-attachments/files/24586594/example.docx", "example.docx")
doc <- read_docx("example.docx")
z1 <- docx_summary(doc, preserve = TRUE, remove_fields = FALSE, detailed = TRUE)
z2 <- docx_summary(doc, preserve = TRUE, remove_fields = TRUE, detailed = TRUE)
as_tibble(z1)
#> # A tibble: 6 × 37
#>   doc_index content_type run_index run_content_index run_content_text image_path
#>       <int> <chr>            <int>             <int> <chr>            <chr>     
#> 1         1 paragraph            1                 1 " "              <NA>      
#> 2         1 paragraph            2                 1  <NA>            <NA>      
#> 3         1 paragraph            3                 1 "ADDIN ZOTERO_I… <NA>      
#> 4         1 paragraph            4                 1  <NA>            <NA>      
#> 5         1 paragraph            5                 1 "[1,2]"          <NA>      
#> 6         1 paragraph            6                 1  <NA>            <NA>      
#> # ℹ 31 more variables: field_code <chr>, footnote_text <chr>, link <chr>,
#> #   link_to_bookmark <chr>, bookmark_start <chr>, character_stylename <chr>,
#> #   sz <int>, sz_cs <int>, font_family_ascii <chr>, font_family_eastasia <chr>,
#> #   font_family_hansi <chr>, font_family_cs <chr>, bold <lgl>, italic <lgl>,
#> #   underline <lgl>, color <chr>, shading <chr>, shading_color <chr>,
#> #   shading_fill <chr>, paragraph_stylename <chr>, keep_with_next <lgl>,
#> #   align <chr>, level <int>, num_id <int>, table_index <int>, row_id <int>, …
as_tibble(z2)
#> # A tibble: 5 × 37
#>   doc_index content_type run_index run_content_index run_content_text image_path
#>       <int> <chr>            <int>             <int> <chr>            <chr>     
#> 1         1 paragraph            1                 1 " "              <NA>      
#> 2         1 paragraph            2                 1  <NA>            <NA>      
#> 3         1 paragraph            3                 1  <NA>            <NA>      
#> 4         1 paragraph            4                 1 "[1,2]"          <NA>      
#> 5         1 paragraph            5                 1  <NA>            <NA>      
#> # ℹ 31 more variables: field_code <chr>, footnote_text <chr>, link <chr>,
#> #   link_to_bookmark <chr>, bookmark_start <chr>, character_stylename <chr>,
#> #   sz <int>, sz_cs <int>, font_family_ascii <chr>, font_family_eastasia <chr>,
#> #   font_family_hansi <chr>, font_family_cs <chr>, bold <lgl>, italic <lgl>,
#> #   underline <lgl>, color <chr>, shading <chr>, shading_color <chr>,
#> #   shading_fill <chr>, paragraph_stylename <chr>, keep_with_next <lgl>,
#> #   align <chr>, level <int>, num_id <int>, table_index <int>, row_id <int>, …

Created on 2026-01-14 with reprex v2.1.1

With your fix, we will see the expected output:

library(tibble)
library(officer)
curl::curl_download("https://github.com/user-attachments/files/24586594/example.docx", "example.docx")
doc <- read_docx("example.docx")
z1 <- docx_summary(doc, preserve = TRUE, remove_fields = FALSE, detailed = TRUE)
z2 <- docx_summary(doc, preserve = TRUE, remove_fields = TRUE, detailed = TRUE)
as_tibble(z1)
#> # A tibble: 6 × 37
#>   doc_index content_type run_index run_content_index run_content_text image_path
#>       <int> <chr>            <int>             <int> <chr>            <chr>     
#> 1         1 paragraph            1                 1 " "              <NA>      
#> 2         1 paragraph            2                 1  <NA>            <NA>      
#> 3         1 paragraph            3                 1 "ADDIN ZOTERO_I… <NA>      
#> 4         1 paragraph            4                 1  <NA>            <NA>      
#> 5         1 paragraph            5                 1 "[1,2]"          <NA>      
#> 6         1 paragraph            6                 1  <NA>            <NA>      
#> # ℹ 31 more variables: field_code <chr>, footnote_text <chr>, link <chr>,
#> #   link_to_bookmark <chr>, bookmark_start <chr>, character_stylename <chr>,
#> #   sz <int>, sz_cs <int>, font_family_ascii <chr>, font_family_eastasia <chr>,
#> #   font_family_hansi <chr>, font_family_cs <chr>, bold <lgl>, italic <lgl>,
#> #   underline <lgl>, color <chr>, shading <chr>, shading_color <chr>,
#> #   shading_fill <chr>, paragraph_stylename <chr>, keep_with_next <lgl>,
#> #   align <chr>, level <int>, num_id <int>, table_index <int>, row_id <int>, …
as_tibble(z2)
#> # A tibble: 2 × 37
#>   doc_index content_type run_index run_content_index run_content_text image_path
#>       <int> <chr>            <int>             <int> <chr>            <chr>     
#> 1         1 paragraph            1                 1 " "              <NA>      
#> 2         1 paragraph            2                 1 "[1,2]"          <NA>      
#> # ℹ 31 more variables: field_code <chr>, footnote_text <chr>, link <chr>,
#> #   link_to_bookmark <chr>, bookmark_start <chr>, character_stylename <chr>,
#> #   sz <int>, sz_cs <int>, font_family_ascii <chr>, font_family_eastasia <chr>,
#> #   font_family_hansi <chr>, font_family_cs <chr>, bold <lgl>, italic <lgl>,
#> #   underline <lgl>, color <chr>, shading <chr>, shading_color <chr>,
#> #   shading_fill <chr>, paragraph_stylename <chr>, keep_with_next <lgl>,
#> #   align <chr>, level <int>, num_id <int>, table_index <int>, row_id <int>, …

Created on 2026-01-14 with reprex v2.1.1

@davidgohel davidgohel merged commit e9f3956 into davidgohel:master Jan 14, 2026
0 of 4 checks passed