Skip to content

Conversation

@aitap
Copy link
Member

@aitap aitap commented Aug 28, 2024

These are the remaining easy pickings for #6180.

Costs:

  • With verbose = TRUE, assign now only prints whether REFCNT() is >= 1 and >= 2, not its value directly.
  • Rf_charIsASCII is only an "eapi" function, not a bona-fide "api" function, and it is currently only in R-devel. But it doesn't raise NOTEs.

NAMED/REFCNT version information gathered here (shameless plug).

Since data.table now depends on R >= 3.3, the backports are no longer
needed. Moreover, MAYBE_SHARED is currently a function, while
MAYBE_REFERENCED expands to !NO_REFERENCES (which is a function).

In debugging output, show MAYBE_REFERENCED (NAMED > 0) instead of NAMED.
@MichaelChirico
Copy link
Member

thanks for the great resource!

would you mind

  1. splitting this into two PRs? id rather wait for Rf_isASCII to make a release before depending on it
  2. adding a NOTE to NEWS & citing yourself?

re:LEVELS, is there a performance penalty for the new approach?

@github-actions
Copy link

github-actions bot commented Aug 28, 2024

Comparison Plot

Generated via commit d626df2

Download link for the artifact containing the test results: ↓ atime-results.zip

Time taken to finish the standard R installation steps: 11 minutes and 21 seconds

Time taken to run atime::atime_pkg on the tests: 5 minutes and 45 seconds

@aitap
Copy link
Member Author

aitap commented Aug 28, 2024

re:LEVELS, is there a performance penalty for the new approach?

Currently, IS_ASCII(x) expands to LEVELS(x) & 64, so a function call and a bitmask test. With marked encodings, a call to getCharCE() and an equality test should be similar in performance. Rf_charIsASCII performs the same bitmask test inside a function call (where LEVELS expands to access internal SEXPREC fields). I think it should all be similar. If you have any pointers, I can try to write a performance test.

Will split and update the PR soon-ish. Thanks for the comments!

aitap added 2 commits August 28, 2024 18:26
getCharCE appeared in R-2.7, making it possible to check for strings
_marked_ as UTF-8 or Latin-1.

There is no marking as ASCII, so fixing IS_ASCII will have to wait for R
>= 4.5.
@MichaelChirico MichaelChirico merged commit fc9b328 into master Aug 28, 2024
@MichaelChirico MichaelChirico deleted the nonapi_b_gone branch August 28, 2024 16:42
@tdhock
Copy link
Member

tdhock commented Aug 29, 2024

nice NEWS item!
This is a non-trivial contribution that merits adding Ivan to Authors@R in DESCRIPTION as role="ctb"
Ivan can you please do that in a new PR?

@aitap aitap mentioned this pull request Aug 29, 2024
MichaelChirico added a commit that referenced this pull request Aug 30, 2024
* Drop direct use of NAMED

Since data.table now depends on R >= 3.3, the backports are no longer
needed. Moreover, MAYBE_SHARED is currently a function, while
MAYBE_REFERENCED expands to !NO_REFERENCES (which is a function).

In debugging output, show MAYBE_REFERENCED (NAMED > 0) instead of NAMED.

* Almost drop direct use of LEVELS

getCharCE appeared in R-2.7, making it possible to check for strings
_marked_ as UTF-8 or Latin-1.

There is no marking as ASCII, so fixing IS_ASCII will have to wait for R
>= 4.5.

* NEWS entry for NAMED, LEVELS reduction

* fine-tune NEWS

---------

Co-authored-by: Michael Chirico <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants