Commit 828a750

fix tests (failed because of certs, and fixed jiwer version) (#134)

* fix tests (failed because of certs, and fixed jiwer version)
* add comments and address doc problems
* update link in docs to mirror
* switch to https
* add cert manually
* update all links in the file
* cert manual download to correct file
* download another cert
* update all certs
* add ignoring of CORAAL link check
* http to https

---------

Signed-off-by: George Zelenfroind <[email protected]>
1 parent 22a6bfe commit 828a750

5 files changed, 18 additions and 7 deletions

.github/workflows/tests.yml

Lines changed: 6 additions & 0 deletions
@@ -75,14 +75,20 @@ jobs:
           pip install nemo-toolkit[asr,nlp]==1.23.0
           pip install nemo_text_processing
           pip install -r requirements/huggingface.txt
+          pip install certifi #this needed to avoid problems with certificates [COORAL]
+          export SSL_CERT_FILE=$(python -m certifi)
           python -m pip cache purge
+
 
       - name: Run all tests
         env:
           AWS_SECRET_KEY: ${{ secrets.AWS_SECRET_KEY }}
           AWS_ACCESS_KEY: ${{ secrets.AWS_ACCESS_KEY }}
           CLEAN_UP_TMP_PATH: 1
         run: |
+          wget https://uit.stanford.edu/sites/default/files/2023/10/11/incommon-rsa-ca2.pem #downloading cert manually [for CORAL]
+          sudo cp incommon-rsa-ca2.pem /usr/local/share/ca-certificates/incommon-rsa-server-ca-2.crt # [cert for CORAL]
+          sudo update-ca-certificates # [cert for CORAL]
           set -o pipefail # this will make sure next line returns non-0 exit code if tests fail
           python -m pytest tests/ --junitxml=pytest.xml --ignore=tests/test_tts_sdp_end_to_end.py --cov-report=term-missing:skip-covered --cov=sdp --durations=30 -rs | tee pytest-coverage.txt
 
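The workflow change above points `SSL_CERT_FILE` at the certifi bundle and registers an extra CA certificate with the system store. As a rough illustration of how a Python client could pick up such a bundle explicitly, here is a minimal sketch. The helper names `resolve_ca_bundle` and `make_ssl_context` are hypothetical and are not part of this commit; the sketch assumes that `certifi` may or may not be installed.

```python
import os
import ssl
from typing import Optional


def resolve_ca_bundle() -> Optional[str]:
    """Pick a CA bundle path, or None to fall back to the system default.

    Hypothetical helper: prefers SSL_CERT_FILE, then certifi if installed.
    """
    env_path = os.environ.get("SSL_CERT_FILE")
    if env_path:
        return env_path
    try:
        import certifi  # optional dependency, assumed installed via pip
    except ImportError:
        return None
    return certifi.where()


def make_ssl_context() -> ssl.SSLContext:
    """Build a verifying TLS context from the resolved bundle."""
    # cafile=None makes the ssl module load the system default CA store.
    return ssl.create_default_context(cafile=resolve_ca_bundle())
```

Such a context could then be passed as `urllib.request.urlopen(url, context=make_ssl_context())`. Note that in GitHub Actions a variable set with `export` inside one `run` step is not visible in later steps unless it is also written to `$GITHUB_ENV`.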

dataset_configs/english/coraal/config.yaml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@ documentation: |
1818
This config performs the following data processing.
1919
2020
1. Downloads CORAAL data based on the
21-
`official file list <http://lingtools.uoregon.edu/coraal/coraal_download_list.txt>`_.
21+
`official file list <https://lingtools.uoregon.edu/coraal/coraal_download_list.txt>`_. #Official mirror link
2222
There are a couple of errors in the links there, which are fixed in our code.
2323
2. Drops all utterances which contain only pauses. Set ``drop_pauses=False`` to undo.
2424
3. Groups all consecutive segments from the same speaker until 20 seconds duration

docs/src/conf.py

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -189,3 +189,8 @@ def setup(app):
189189
]
190190
# nitpick_ignore_regex = [('py:class', '*')]
191191

192+
#adding this especially for coraal, temporary
193+
linkcheck_ignore = [
194+
r'https://lingtools\.uoregon\.edu/coraal/coraal_download_list\.txt',
195+
]
196+
# https://lingtools.uoregon.edu/coraal/coraal_download_list.txt
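The `linkcheck_ignore` entries are regular expressions, which is why the dots in the URL are escaped. The sketch below checks which URIs the pattern from this hunk would cover. It assumes the link checker matches each pattern from the start of the URI, in the style of `re.match`; the helper `is_ignored` is hypothetical and only mimics that behaviour.

```python
import re

# Pattern copied from the conf.py change above.
LINKCHECK_IGNORE = [
    r'https://lingtools\.uoregon\.edu/coraal/coraal_download_list\.txt',
]


def is_ignored(uri, patterns=LINKCHECK_IGNORE):
    """Return True if any pattern matches the URI from its beginning."""
    return any(re.match(pattern, uri) for pattern in patterns)
```

Under that assumption, only the `https` form of the file list is skipped, so a leftover `http` link to the same file would still be checked.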

requirements/main.txt

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@ python-docx
1818
pydub
1919
dask
2020
distributed
21-
21+
jiwer>=3.1.0,<4.0.0
2222
# toloka-kit # Temporarily disabled due to Toloka's technical pause; keep as reference for past and future API support
2323
# for some processers, additionally https://github.com/NVIDIA/NeMo is required
2424
# for some processers, additionally nemo_text_processing is required
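The new `jiwer>=3.1.0,<4.0.0` pin accepts any 3.x release from 3.1.0 onward and rejects the next major version. The sketch below illustrates that range with plain tuple comparison. It is a simplification that handles only dotted numeric versions; pip itself follows the full PEP 440 rules, and the function names here are hypothetical.

```python
def parse_version(text):
    """Turn '3.1' or '3.1.0' into a three-part integer tuple."""
    parts = [int(part) for part in text.split(".")]
    # Pad short versions so that '3.1' compares equal to '3.1.0'.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def in_jiwer_range(version):
    """Check a version against the >=3.1.0,<4.0.0 range from the diff."""
    return (3, 1, 0) <= parse_version(version) < (4, 0, 0)
```

For example, `in_jiwer_range("3.1.0")` is true, while `in_jiwer_range("3.0.4")` and `in_jiwer_range("4.0.0")` are both false.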

sdp/processors/datasets/coraal/create_initial_manifest.py

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -31,15 +31,15 @@ def get_coraal_url_list():
3131
There are a few mistakes in the official url list that are fixed here.
3232
Can be overridden by tests to select a subset of urls.
3333
"""
34-
dataset_url = "http://lingtools.uoregon.edu/coraal/coraal_download_list.txt"
34+
dataset_url = "https://lingtools.uoregon.edu/coraal/coraal_download_list.txt"
3535
urls = []
3636
for file_url in urllib.request.urlopen(dataset_url):
3737
file_url = file_url.decode('utf-8').strip()
3838
# fixing known errors in the urls
39-
if file_url == 'http://lingtools.uoregon.edu/coraal/les/2021.07/LES_metadata_2018.10.06.txt':
40-
file_url = 'http://lingtools.uoregon.edu/coraal/les/2021.07/LES_metadata_2021.07.txt'
41-
if file_url == 'http://lingtools.uoregon.edu/coraal/vld/2021.07/VLD_metadata_2018.10.06.txt':
42-
file_url = 'http://lingtools.uoregon.edu/coraal/vld/2021.07/VLD_metadata_2021.07.txt'
39+
if file_url == 'https://lingtools.uoregon.edu/coraal/les/2021.07/LES_metadata_2018.10.06.txt':
40+
file_url = 'https://lingtools.uoregon.edu/coraal/les/2021.07/LES_metadata_2021.07.txt'
41+
if file_url == 'https://lingtools.uoregon.edu/coraal/vld/2021.07/VLD_metadata_2018.10.06.txt':
42+
file_url = 'https://lingtools.uoregon.edu/coraal/vld/2021.07/VLD_metadata_2021.07.txt'
4343
urls.append(file_url)
4444
return urls
4545
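The two URL corrections in this hunk follow the same pattern: an exact string comparison followed by a substitution. One possible way to express the same idea is a lookup table, sketched below. The names `KNOWN_URL_FIXES` and `fix_url_lines` are hypothetical and not part of the commit, and the function takes the already-fetched lines as input so it can be exercised without network access.

```python
# Known-bad URL -> corrected URL, copied from the diff above.
KNOWN_URL_FIXES = {
    'https://lingtools.uoregon.edu/coraal/les/2021.07/LES_metadata_2018.10.06.txt':
        'https://lingtools.uoregon.edu/coraal/les/2021.07/LES_metadata_2021.07.txt',
    'https://lingtools.uoregon.edu/coraal/vld/2021.07/VLD_metadata_2018.10.06.txt':
        'https://lingtools.uoregon.edu/coraal/vld/2021.07/VLD_metadata_2021.07.txt',
}


def fix_url_lines(lines):
    """Decode raw byte lines and replace any URL listed in KNOWN_URL_FIXES."""
    urls = []
    for raw in lines:
        url = raw.decode('utf-8').strip()
        # Exact match only: URLs not in the table pass through unchanged.
        urls.append(KNOWN_URL_FIXES.get(url, url))
    return urls
```

Because the comparison is exact, a correction only applies when the file list serves the URL with the same scheme and path as the table key.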
