[wip] Corefud v1.3 #1502
Open
Jemoka wants to merge 6 commits into dev from corefud_v1.3
b9638ac corefud 1.3 corpus support (Jemoka)
2650440 model changes to support underscore innference (Jemoka)
c0fa7cc inference processor for coref (Jemoka)
009c31c fixes for zero coref inference (Jemoka)
359a2e5 small debugging patches to support empty node prediction (Jemoka)
0efcfb4 Merge remote-tracking branch 'origin/dev' into corefud_v1.3 (Jemoka)
in general, is this always on? i would think there will be datasets that don't have zeros
in general, reporting this shouldn't hurt, since all we'll have in that case is that all of `doc["is_zero"]` is `False`. Hence, that will give us 100% zeros accuracy, and not break any logging. Do you think we should handle those cases differently? The tricky part is that we currently have no way to tell if a dataset has no zeros, or if a batch has no zeros (which is quite likely, since zeros are relatively rare).
in that case it doesn't matter too much, although i would think a higher level part of the routine could also look at the whole dataset and check if it has zeros or not. but not a big deal
Sounds good; I would err on the side of "no" just because having "100% zeros accuracy" is still technically correct + involves less post-processing. Your call though.
well, no strong opinions except that `/ z_total` is probably not ideal in the case of `z_total == 0`
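One way to address the `z_total == 0` concern is to guard the division. This is a minimal sketch, assuming `z_correct` and `z_total` are plain integer counters; the helper name `zeros_accuracy` and the choice to return `None` for "nothing to score" are hypothetical and not taken from this PR:

```python
from typing import Optional


def zeros_accuracy(z_correct: int, z_total: int) -> Optional[float]:
    """Return z_correct / z_total, or None when there is nothing to score.

    Hypothetical helper for illustration; not part of the PR.
    """
    if z_total == 0:
        # No is_zero entries were seen, so the ratio is undefined.
        return None
    return z_correct / z_total
```

A caller could then skip logging the metric when the result is `None` instead of raising `ZeroDivisionError`.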
We have:
so in this case the only situation in which `z_total` would be zero is the one where the number of elements in `doc["is_zero"]` is zero for the entire corpus (i.e., the corpus has no length); this would be a bad state and is not usually possible.
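To make the counting described in this thread concrete, here is a rough sketch that is not the PR's code. It assumes each document contributes a gold and a predicted list of booleans (standing in for `doc["is_zero"]` and the model output), that `z_total` counts every element, and that `z_correct` counts every elementwise match. The function name `count_zeros` and the input layout are hypothetical:

```python
from typing import Iterable, List, Tuple


def count_zeros(
    docs: Iterable[Tuple[List[bool], List[bool]]]
) -> Tuple[int, int]:
    """Accumulate (z_correct, z_total) over (gold, pred) pairs.

    Assumption: accuracy is elementwise, so a correctly predicted
    False (no zero) counts as correct just like a correctly predicted True.
    """
    z_correct = 0
    z_total = 0
    for gold, pred in docs:
        z_correct += sum(g == p for g, p in zip(gold, pred))
        z_total += len(gold)
    return z_correct, z_total


docs = [
    ([False, False, False], [False, False, False]),  # no zeros, all correct
    ([False, True], [False, False]),                 # one missed zero
]
print(count_zeros(docs))  # (4, 5)
```

Under these assumptions, a corpus with no zeros at all yields `z_correct == z_total` whenever the model predicts no zeros, and `z_total` is 0 only when there are no elements to score.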
ah, will z_correct include documents correctly predicted to have 0 zeros?