Column naming for empty string and duplicate NA label #6795

Divendra2006 · 2025-02-03T21:08:15Z

This PR addresses two issues: empty strings in column names, which are replaced with "empty_string", and duplicate column names, which are made unique.

Handling Empty Strings in Column Names

Added a function handle_empty_strings to replace empty strings in column names with "empty_string".

Ensuring Unique Column Names

Added a function ensure_unique_names to ensure column names are unique by appending a suffix if duplicates are found.
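
The code for these helpers is not reproduced in this thread, so the following is only a minimal base-R sketch of what the two descriptions above could amount to. The function names come from the PR description; their bodies, and the use of `make.unique` for the suffix, are assumptions made for illustration.

```r
# Hypothetical bodies; only the function names come from the PR description.
handle_empty_strings = function(x) {
  # replace zero-length names with the placeholder proposed in the PR
  x[!is.na(x) & !nzchar(x)] = "empty_string"
  x
}

ensure_unique_names = function(x) {
  # base R appends ".1", ".2", ... to repeated values (assumed suffix scheme)
  make.unique(x)
}

cols = c("a", "", "NA", "NA")
new_cols = ensure_unique_names(handle_empty_strings(cols))
print(new_cols)
```

Here the empty name becomes "empty_string" and the second "NA" label receives a numeric suffix, so all four names end up distinct.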

Improved fill.default Handling

When fun.aggregate is used and fill is NULL, missing values in the reshaped data need to be filled with a default value. This change handles that case explicitly so that fill.default is computed correctly and used to fill missing values, improving consistency and preventing errors.
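
For context, `?dcast` documents that when `fill` is NULL and cells are missing, the fill value is obtained by calling `fun.aggregate` on a zero-length vector. The base-R sketch below illustrates only that idea; `default_fill` is a hypothetical helper, not the code from this PR or from `R/fcast.R`.

```r
# Hypothetical helper: derive a fill value by aggregating an empty vector
# of the same type as the value column.
default_fill = function(fun.aggregate, value) {
  fun.aggregate(value[0L])
}

v = c(1.5, 2.5, 4)
fill_sum = default_fill(sum, v)     # sum of an empty numeric vector
fill_len = default_fill(length, v)  # length of an empty numeric vector
print(list(sum = fill_sum, length = fill_len))
```

Both aggregations yield zero on empty input, which is why missing cells in a summed or counted reshape come out as 0 rather than NA.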

Added a check to ensure the length of names matches the length of the vector before setting the names attribute.

If any improvement is needed in the code, please tell me.

codecov · 2025-02-03T21:15:41Z

Codecov Report

Attention: Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 98.63%. Comparing base (f9cf2a1) to head (e52de3e).

Files with missing lines	Patch %	Lines
R/fcast.R	50.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6795      +/-   ##
==========================================
- Coverage   98.64%   98.63%   -0.02%     
==========================================
  Files          79       79              
  Lines       14642    14646       +4     
==========================================
+ Hits        14444    14446       +2     
- Misses        198      200       +2

tdhock

Before asking for review, please click the "Files changed" tab and make sure that only a minimal set of changes relevant to the PR appears.

Here there are many irrelevant changes that should be reverted before review (added empty lines and removed comments).

tdhock · 2025-02-04T12:51:24Z

R/fcast.R

    if (is.function(dat[[i]]))
      stopf("Column [%s] not found or of unknown type.", deparse(x))
  }
+


please undo addition of empty lines

tdhock · 2025-02-04T12:51:51Z

R/fcast.R

  subset = m[["subset"]][[2L]]
  if (!is.null(subset)) {
    if (is.name(subset)) subset = as.call(list(quote(`(`), subset))
-    idx = which(eval(subset, data, parent.frame())) # any advantage thro' secondary keys?


please undo

Divendra2006 · 2025-02-18T04:41:06Z

@tdhock, please let me know if any change or improvement is needed.
If the PR looks good, I would be grateful if you could merge it at your earliest convenience.

tdhock

please fix

tdhock · 2025-02-18T10:02:25Z

R/fcast.R

    value.var = names(data)[ncol(data)]
  lvals = value_vars(value.var, names(data))
  valnames = unique(unlist(lvals))
+  valnames = handle_empty_strings(valnames)


please avoid re-writing the same variable (valnames) which can be confusing.
Either use unique names or don't use multiple lines/variables.

Also are these helper functions used only once? If so please delete the helper functions, and just use the code here instead of in a separate function. (helper functions should only be introduced if the same code is used in more than one place, to avoid repetition)

Divendra2006

Yes, these helper functions were used only once, so I removed them and used the code directly instead of in separate functions.

tdhock · 2025-02-18T10:03:32Z

R/fcast.R

      lhs = lhs_; rhs = rhs_
    }
    maplen = lengths(mapunique)
-    idx = do.call(CJ, mapunique)[map, 'I' := .I][["I"]] # TO DO: move this to C and avoid materialising the Cross Join.


please undo all deletions which are not relevant to your PR.

Click "Files changed" tab in github, and make sure there are only changes relevant to your PR.

Divendra2006

I undid all the deletions and made only minimal changes to the code that are relevant to the PR.

Divendra2006 · 2025-02-18T19:57:14Z

@tdhock, I made all the suggested changes. Is there any further improvement needed in my code?

aitap · 2025-02-19T11:35:04Z

The original issue #5605 asked for dcast to return a column with an empty string as its name, not "empty_string". On the other hand, an empty name might be problematic because the empty symbol is reserved for R's missing argument marker, so both `` and as.name("") refuse to work:

> .Internal(inspect(alist(a=)$a)) # the missing argument marker is a symbol
@56224d249db0 01 SYMSXP g0c0 [MARK,REF(6910)] [missing argument]
> as.character(alist(a=)$a) # and its text content is empty
[1] ""
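
The same point can be checked without `.Internal`. This sketch uses only base R and wraps the failing call in `tryCatch` so that the script runs to completion; the variable names are illustrative.

```r
# as.name("") cannot create a symbol with an empty name
res = tryCatch(as.name(""), error = function(e) e)
is_error = inherits(res, "error")
print(is_error)

# the missing argument marker converts to an empty string
marker_text = as.character(alist(a=)$a)
print(marker_text)
```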

tdhock · 2025-02-19T13:20:13Z

I think @aitap is right, we should probably not be encouraging empty-string column names. An empty string is not allowed as a variable name when constructing a list or a data.table.

> list(""="foo")
Error: attempt to use zero-length variable name
> data.table(""=1)
Error: attempt to use zero-length variable name

you can create a column name which is empty string but you can't extract it using [[

> setnames(data.table(x=1),"")[[""]]
NULL
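
A base-R analogue of this behavior, as a sketch: a plain list can also carry an empty name, but `[[""]]` does not match it (per `?Extract`, empty-string indices match no names), so the element is reachable only by position. The variable names are illustrative.

```r
# setNames() can assign an empty name even though list(""=1) cannot
l = setNames(list(1), "")
by_name = l[[""]]   # an empty string never matches a name
by_pos  = l[[1L]]   # positional access still works
print(names(l))
print(is.null(by_name))
print(by_pos)
```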

Please close PR if you agree.

column naming for empty string and duplicate NA label

f820068

Divendra2006 requested a review from tdhock as a code owner February 3, 2025 21:08

tdhock requested changes Feb 4, 2025

changes made

6a4fe65

Divendra2006 force-pushed the data branch from 33ce92e to 6a4fe65 Compare February 4, 2025 13:14

tdhock requested changes Feb 18, 2025

Divendra2006 and others added 3 commits February 19, 2025 01:06

changes made

5a47ce0

changes made

e1fd079

Merge branch 'master' into data

0586ffd

Divendra2006 added 2 commits February 19, 2025 18:05

changes made

b414ddc

changes

e52de3e

tdhock closed this Feb 19, 2025

tdhock mentioned this pull request Feb 19, 2025

dcast outputs column name V1 for empty string #5605

Closed
