Commit dd08038

build: add test for integration with unstructured to CI (#125)
Adds a CI test that checks out the latest version of unstructured, installs this ref of unstructured-inference along with the other unstructured dependencies, and runs the ingest tests. The goal is to catch errors or changes that cause issues downstream before we bump the inference version in unstructured.
1 parent 5419dbc commit dd08038

1 file changed: +48 -2 lines changed
.github/workflows/ci.yml

@@ -40,7 +40,7 @@ jobs:
     needs: setup
     steps:
     - uses: actions/checkout@v3
-    - uses: actions/cache@v3
+    - uses: actions/cache/restore@v3
       id: virtualenv-cache
       with:
         path: .venv
@@ -70,7 +70,7 @@ jobs:
     needs: [setup, lint]
     steps:
     - uses: actions/checkout@v3
-    - uses: actions/cache@v3
+    - uses: actions/cache/restore@v3
       id: virtualenv-cache
       with:
         path: |
@@ -94,6 +94,52 @@ jobs:
         make test
         make check-coverage
 
+  test_ingest:
+    strategy:
+      matrix:
+        python-version: ["3.8","3.9","3.10"]
+    runs-on: ubuntu-latest
+    env:
+      NLTK_DATA: ${{ github.workspace }}/nltk_data
+    needs: lint
+    steps:
+    - name: Checkout unstructured repo for integration testing
+      uses: actions/checkout@v3
+      with:
+        repository: 'Unstructured-IO/unstructured'
+    - name: Checkout this repo
+      uses: actions/checkout@v3
+      with:
+        path: inference
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v4
+      with:
+        python-version: ${{ matrix.python-version }}
+    - name: Test
+      env:
+        GH_READ_ONLY_ACCESS_TOKEN: ${{ secrets.GH_READ_ONLY_ACCESS_TOKEN }}
+        SLACK_TOKEN: ${{ secrets.SLACK_TOKEN }}
+        DISCORD_TOKEN: ${{ secrets.DISCORD_TOKEN }}
+      run: |
+        python${{ matrix.python-version }} -m venv .venv
+        source .venv/bin/activate
+        [ ! -d "$NLTK_DATA" ] && mkdir "$NLTK_DATA"
+        make install-ci
+        pip install -e inference/
+        sudo apt-get update
+        sudo apt-get install -y libmagic-dev poppler-utils libreoffice pandoc
+        sudo add-apt-repository -y ppa:alex-p/tesseract-ocr5
+        sudo apt-get install -y tesseract-ocr
+        tesseract --version
+        make install-ingest-s3
+        make install-ingest-azure
+        make install-ingest-discord
+        make install-ingest-github
+        make install-ingest-gitlab
+        make install-ingest-slack
+        make install-ingest-wikipedia
+        ./test_unstructured_ingest/test-ingest.sh
+
   changelog:
     runs-on: ubuntu-latest
     steps:
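
For anyone who wants to approximate the `test_ingest` job outside CI, the steps in its `run` block could be sketched roughly as below. This is only an illustration under assumptions, not part of the commit: `INFERENCE_DIR`, `DRY_RUN`, and the `run` helper are hypothetical names, the system packages (tesseract, poppler, and so on) are assumed to be installed already, `python3` stands in for the matrix interpreter, and `mkdir -p` replaces the job's directory check. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Rough local approximation of the test_ingest job (a sketch, not the CI job itself).
# Hypothetical names introduced here: INFERENCE_DIR, DRY_RUN, run().
set -eo pipefail

# Assumed location of a local unstructured-inference checkout.
INFERENCE_DIR="${INFERENCE_DIR:-$PWD/unstructured-inference}"
# Mirrors the job-level NLTK_DATA setting, relative to the current directory.
export NLTK_DATA="${NLTK_DATA:-$PWD/nltk_data}"
# DRY_RUN=1 (the default) prints each command without executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

ingest_steps() {
  # Check out unstructured, the downstream consumer being tested.
  run git clone https://github.com/Unstructured-IO/unstructured.git
  run cd unstructured
  run python3 -m venv .venv
  run source .venv/bin/activate
  run mkdir -p "$NLTK_DATA"
  # Install unstructured's CI dependencies, then the local inference ref on top.
  run make install-ci
  run pip install -e "$INFERENCE_DIR"
  # Same connector extras the job installs, in the same order.
  for connector in s3 azure discord github gitlab slack wikipedia; do
    run make "install-ingest-$connector"
  done
  run ./test_unstructured_ingest/test-ingest.sh
}

ingest_steps
```

Setting `DRY_RUN=0` would execute the commands for real. The job passes `GH_READ_ONLY_ACCESS_TOKEN`, `SLACK_TOKEN`, and `DISCORD_TOKEN` to this step, so a real run presumably needs equivalent credentials in the environment.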
