@Divendra2006

Fixes #5605

This PR handles empty strings in column names by replacing them with "empty_string", and ensures that column names are unique.

Handling Empty Strings in Column Names

Added a function handle_empty_strings to replace empty strings in column names with "empty_string".

Ensuring Unique Column Names

Added a function ensure_unique_names to ensure column names are unique by appending a suffix if duplicates are found.
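
The two helpers described above could look roughly like the sketch below. The PR's actual implementations are not shown in this thread, so the function bodies here are assumptions for illustration only; they rely on base R's nzchar and make.unique, and the suffix format is whatever make.unique produces rather than what the PR chose.

```r
# Hypothetical sketch: the bodies are assumptions, not the PR's code.

# Replace empty strings in a character vector of names with "empty_string".
handle_empty_strings = function(nm) {
  nm[!nzchar(nm)] = "empty_string"
  nm
}

# Make names unique by appending a numeric suffix to duplicates
# (base R's make.unique appends ".1", ".2", ... to repeats).
ensure_unique_names = function(nm) make.unique(nm)

nm = c("x", "", "x", "")
fixed = ensure_unique_names(handle_empty_strings(nm))
print(fixed)
```

Applying the replacement before the de-duplication matters: two empty names both become "empty_string" first, and only then does the second one receive a suffix.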

Improved fill.default Handling

Explicitly handled fill.default when fun.aggregate is used and fill is NULL.
When fun.aggregate is used and fill is NULL, missing values in the reshaped data need to be filled with a default value. This change ensures that fill.default is computed correctly and used to fill missing values, improving consistency and preventing errors.
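
For context, here is a small sketch of the documented dcast behaviour this section refers to: when fill is NULL and cells are missing, fun.aggregate is applied to a zero-length vector to obtain the fill value. The toy table and column names are made up for illustration, and the code exercises released data.table rather than the change proposed in this PR.

```r
library(data.table)

# Toy data (hypothetical): id 2 has no row for grp "b",
# so that cell is missing after reshaping.
DT = data.table(id = c(1L, 1L, 2L), grp = c("a", "b", "a"), v = c(10, 20, 30))

# fill is left as NULL while fun.aggregate is supplied.
wide = dcast(DT, id ~ grp, value.var = "v", fun.aggregate = sum)
print(wide)

# The documented default fill for this case is fun.aggregate applied
# to a zero-length vector, here sum(numeric(0)).
default_fill = sum(numeric(0))
print(default_fill)
```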

Added a check to ensure the length of names matches the length of the vector before setting the names attribute.

If the code needs any improvement, please let me know.

@Divendra2006 requested a review from tdhock as a code owner on February 3, 2025, 21:08
@codecov

codecov bot commented Feb 3, 2025

Codecov Report

Attention: Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 98.63%. Comparing base (f9cf2a1) to head (e52de3e).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| R/fcast.R | 50.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6795      +/-   ##
==========================================
- Coverage   98.64%   98.63%   -0.02%     
==========================================
  Files          79       79              
  Lines       14642    14646       +4     
==========================================
+ Hits        14444    14446       +2     
- Misses        198      200       +2     

Member

@tdhock left a comment

Before asking for review, please click the "Files changed" tab and make sure that only a minimal set of changes relevant to the PR appears.

Here there are many irrelevant changes that should be reverted before review (added empty lines and removed comments).

R/fcast.R Outdated
if (is.function(dat[[i]]))
stopf("Column [%s] not found or of unknown type.", deparse(x))
}

Member

please undo addition of empty lines

subset = m[["subset"]][[2L]]
if (!is.null(subset)) {
if (is.name(subset)) subset = as.call(list(quote(`(`), subset))
idx = which(eval(subset, data, parent.frame())) # any advantage thro' secondary keys?
Member

please undo

@Divendra2006
Author

@tdhock, please let me know if any change or improvement is needed.
If the PR looks good, I would be grateful if you could merge it at your earliest convenience.

Member

@tdhock left a comment

please fix

R/fcast.R Outdated
value.var = names(data)[ncol(data)]
lvals = value_vars(value.var, names(data))
valnames = unique(unlist(lvals))
valnames = handle_empty_strings(valnames)
Member

please avoid re-writing the same variable (valnames) which can be confusing.
Either use unique names or don't use multiple lines/variables.

Also are these helper functions used only once? If so please delete the helper functions, and just use the code here instead of in a separate function. (helper functions should only be introduced if the same code is used in more than one place, to avoid repetition)

Author

Yes, these helper functions were used only once, so I removed them and used the code directly instead of in separate functions.

lhs = lhs_; rhs = rhs_
}
maplen = lengths(mapunique)
idx = do.call(CJ, mapunique)[map, 'I' := .I][["I"]] # TO DO: move this to C and avoid materialising the Cross Join.
Member

please undo all deletions which are not relevant to your PR.

Click "Files changed" tab in github, and make sure there are only changes relevant to your PR.

Author

I undid all the deletions and kept only the minimal changes that are relevant to the PR.

@Divendra2006
Author

@tdhock, I made all the suggested changes. Is any further improvement needed in my code?

@aitap
Member

aitap commented Feb 19, 2025

The original issue #5605 asked for making dcast return a column with an empty string as a name, not "empty_string". On the other hand, an empty name might be problematic because the empty symbol is reserved for R's missing argument marker, so both `` and as.name("") refuse to work:

> .Internal(inspect(alist(a=)$a)) # the missing argument marker is a symbol
@56224d249db0 01 SYMSXP g0c0 [MARK,REF(6910)] [missing argument]
> as.character(alist(a=)$a) # and its text content is empty
[1] ""
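
The point about the empty symbol can also be checked without .Internal. This is a minimal base R sketch; the variable names are illustrative, and the exact error text is not shown because it depends on the session's locale.

```r
# The missing-argument marker is the empty symbol; its text content is "".
marker_text = as.character(alist(a = )$a)

# as.name("") signals an error instead of returning that symbol;
# capture the condition message rather than letting it stop the script.
msg = tryCatch(as.name(""), error = function(e) conditionMessage(e))

print(marker_text)
print(msg)
```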

@tdhock
Member

tdhock commented Feb 19, 2025

I think @aitap is right: we should probably not be encouraging column names that are empty strings. An empty string is not allowed as a variable name when constructing a list or a data.table.

> list(""="foo")
Error: attempt to use zero-length variable name
> data.table(""=1)
Error: attempt to use zero-length variable name

You can create a column whose name is an empty string, but you can't extract it using [[:

> setnames(data.table(x=1),"")[[""]]
NULL
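
The same lookup rule holds for plain lists in base R, where an empty-string index never matches any name, not even an empty one. The sketch below uses a base list rather than a data.table, as an assumption-free way to show that such an element stays reachable by position only.

```r
# A list whose only element is named "" (setNames accepts it).
x = setNames(list(1), "")

by_name = x[[""]]   # name lookup with "" finds nothing
by_pos  = x[[1L]]   # positional lookup still works

print(names(x))
print(by_name)
print(by_pos)
```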

Please close PR if you agree.

Successfully merging this pull request may close these issues.

dcast outputs column name V1 for empty string

3 participants