Conversation

Collaborator

@karpnv karpnv commented Nov 10, 2023

Common Crawl dataset preprocessing

karpnv and others added 30 commits September 12, 2023 04:28
Signed-off-by: Nikolay Karpov <[email protected]>
karpnv and others added 30 commits March 19, 2024 09:32
* YouTube German config and new processors

Signed-off-by: Sasha Meister <[email protected]>

* Added Merge Manifests processor

Signed-off-by: Sasha Meister <[email protected]>

* Clean de.yaml pipeline config

Signed-off-by: Sasha Meister <[email protected]>

* Fix Lang2Iso

Signed-off-by: Sasha Meister <[email protected]>

* fix typo

* fix empty list error - IndexError: list index out of range

* Added requirements.txt

Signed-off-by: Sasha Meister <[email protected]>

* Fixed paths for audio TN

Signed-off-by: Sasha Meister <[email protected]>

* Updated requirements.txt

Signed-off-by: Sasha Meister <[email protected]>

---------

Signed-off-by: Sasha Meister <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
* YouTube German config and new processors

Signed-off-by: Sasha Meister <[email protected]>

* Added Merge Manifests processor

Signed-off-by: Sasha Meister <[email protected]>

* Clean de.yaml pipeline config

Signed-off-by: Sasha Meister <[email protected]>

* Fix Lang2Iso

Signed-off-by: Sasha Meister <[email protected]>

* fix typo

* fix empty list error - IndexError: list index out of range

* Added requirements.txt

Signed-off-by: Sasha Meister <[email protected]>

* Fixed paths for audio TN

Signed-off-by: Sasha Meister <[email protected]>

* Updated requirements.txt

Signed-off-by: Sasha Meister <[email protected]>

* New processors for calculating metrics: WER, CER, edge CER, len diff ratio

Signed-off-by: Sasha Meister <[email protected]>

* Update utils.py

* Update aggregate_segments.py

* Update aggregate_segments.py

* Update aggregate_segments.py

---------

Signed-off-by: Sasha Meister <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>
Co-authored-by: Sasha Meister <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
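
For readers unfamiliar with the metrics named in the commits above, here is a minimal sketch of how WER and CER are commonly computed from an edit distance. It is not the code from this pull request: the function names `edit_distance`, `wer`, and `cer` are hypothetical, and the edge CER and len diff ratio processors are not sketched because the thread does not define them.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        # Assumption: an empty reference is rejected rather than scored.
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(reference, hypothesis) / len(reference)
```

Both rates can exceed 1.0 when the hypothesis is much longer than the reference, which is why manifest filters usually apply a threshold to them instead of treating them as percentages.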

4 participants