Skip to content

Remove use of deprecated start_index from Categorify#1134

Merged
nv-alaiacano merged 4 commits intoNVIDIA-Merlin:mainfrom
oliverholworthy:categorify-remove-start-index
Jun 6, 2023
Merged

Remove use of deprecated start_index from Categorify#1134
nv-alaiacano merged 4 commits intoNVIDIA-Merlin:mainfrom
oliverholworthy:categorify-remove-start-index

Conversation

@oliverholworthy
Copy link
Contributor

Goals ⚽

Fix use of Categorify to support new version of NVTabular

Implementation Details 🚧

@oliverholworthy oliverholworthy added the chore Maintenance for the repository label Jun 5, 2023
@oliverholworthy oliverholworthy added this to the Merlin 23.06 milestone Jun 5, 2023
@oliverholworthy oliverholworthy self-assigned this Jun 5, 2023
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@github-actions
Copy link

github-actions bot commented Jun 5, 2023

Documentation preview

https://nvidia-merlin.github.io/models/review/pr-1134

"source": [
"%%time\n",
"item_features_names = ['f_' + str(col) for col in [47, 68]]\n",
"cat_features = [['item_id', 'purchase_id']] + item_features_names >> nvt.ops.Categorify(start_index=1)\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be we can add a note for the user something like that:

Categorify op maps nulls to `1`, OOVs to `2` and the frequent categories are encoded starting from `3`. Please note that we reserve `0` for padding value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can I propose merging this fix to get things working then you can update the copy?

@oliverholworthy
Copy link
Contributor Author

This is the error that is printed out in gpu-ci. Looks like something to do with the horovod init. I wonder if we should be the horovod init side-effect (from the merlin.models.tf import)

--------------------------------------------------------------------------
Sorry!  You were supposed to get help about:
    mpi_init:startup:internal-failure
But I couldn't open the help file:
    /build-result/hpcx-v2.13-gcc-inbox-ubuntu20.04-cuda11-gdrcopy2-nccl2.12-x86_64/ompi/share/openmpi/help-mpi-runtime.txt: No such file or directory.  Sorry!
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[6bbe81b6bdba:01522] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

@oliverholworthy oliverholworthy marked this pull request as ready for review June 6, 2023 09:22
@nv-alaiacano nv-alaiacano merged commit 6350fef into NVIDIA-Merlin:main Jun 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

chore Maintenance for the repository

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants