Deduplicate CC-News-EN #2

@mam10eks

Description

The CIKM 2020 resource paper "CC-News-En: A Large English News Corpus" introduced a new English news corpus. The goal of this ticket is to run the deduplication pipeline on that corpus.
