Commit 70a45c7

Merge pull request #107 from fossology/feat/package/use-poetry
feat(python): use poetry as package manager
2 parents 5071444 + c36cd31 commit 70a45c7

14 files changed, +2055 -233 lines changed

.github/ISSUE_TEMPLATE.md

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+<!-- SPDX-FileCopyrightText: © Fossology contributors
+SPDX-License-Identifier: GPL-2.0-only
+-->
+
+<!-- Search if the Issue do not already exists in the issues (https://github.com/fossology/atarashi/issues). -->
+
+## Description
+
+Please describe your situation in few words here.
+
+### How to reproduce
+
+For a bug: Describe the bug and list the steps you used when the issue occurred.
+
+For an enhancement or new feature: Describe your needs/expected results.
+
+### Screenshots
+
+If applicable, add screenshots to help explain your problem.
+
+### Versions
+
+* Last commit id on master:
+
+
+### Logs
+
+Any logs (if any) generated in the process of reproducing the issue.

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+<!-- SPDX-FileCopyrightText: © Fossology contributors
+SPDX-License-Identifier: GPL-2.0-only
+-->
+
+<!-- Search if the bug do not already exists in the issues (https://github.com/fossology/atarashi/issues). -->
+<!-- Please refer to CONTRIBUTING.md (https://github.com/fossology/fossology/blob/master/CONTRIBUTING.md)
+before creating the pull request to make sure you follow all the standards. -->
+
+## Description
+
+Please describe the changes in your pull request in few words here.
+
+### Changes
+
+List the changes done to fix a bug or introducing a new feature.
+
+## How to test
+
+Describe the steps required to test the changes proposed in the pull request.
+
+Please consider using the closing keyword if the pull request is proposed to
+fix an issue already created in the repository
+(https://help.github.com/articles/closing-issues-using-keywords/)

.github/depandabot.yml

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+# SPDX-FileCopyrightText: © 2025 Siemens AG
+# SPDX-FileContributor: Kaushlendra Pratap <kaushlendra-pratap.singh@siemens.com>
+
+# SPDX-License-Identifier: GPL-2.0-only
+
+version: 2
+updates:
+  - package-ecosystem: "pip"
+    directory: "/"
+    schedule:
+      interval: "weekly"
+  - package-ecosystem: "github-actions"
+    directory: "/"
+    schedule:
+      interval: "weekly"
+  - package-ecosystem: "docker"
+    directories:
+      - "/"
+    groups:
+      docker:
+        applies-to: security-updates
+        patterns: [ "*" ]
+    schedule:
+      interval: "weekly"

.github/workflows/build-test.yml

Lines changed: 25 additions & 21 deletions
@@ -1,48 +1,52 @@
-# SPDX-FileCopyrightText: 2022 Gaurav Mishra <mishra.gaurav@siemens.com>
-# SPDX-License-Identifier: GPL-2.0
+# SPDX-FileCopyrightText: 2025 Gaurav Mishra <mishra.gaurav@siemens.com>
+# SPDX-FileCopyrightText: 2025 Kaushlendra Pratap <kaushlendra-pratap.singh@siemens.com>
+# SPDX-License-Identifier: GPL-2.0-only

 name: Build and test packages

 on:
-  - "pull_request"
-  - "push"
+  push:
+    branches: [master]
+  pull_request:
+    branches: [master]

 jobs:
   build:

     strategy:
       matrix:
-        python: [3.5, 3.6, 3.7, 3.8, 3.9]
+        python: ['3.10', '3.11', '3.12']

     runs-on: ubuntu-latest
+    env:
+      PYTHONDONTWRITEBYTECODE: "1"

     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v4

       - name: Setup python
-        uses: actions/setup-python@v2
+        uses: actions/setup-python@v5
         with:
           python-version: ${{ matrix.python }}
-          architecture: 'x64'

       - name: Install build dependencies
         run: |
-          python3 -m pip install --upgrade pip
-          python3 -m pip install --requirement requirements.txt
+          python3 -m pip install --upgrade pip poetry

       - name: Build and install
         run: |
-          python3 setup.py build
-          python3 -m pip install .
+          poetry install
+          poetry run preprocess
+          poetry build

       - name: Test
         run: |
-          atarashi -h
-          atarashi -a wordFrequencySimilarity ./atarashi/atarashii.py
-          atarashi -a DLD ./atarashi/atarashii.py
-          atarashi -a tfidf -s ScoreSim ./atarashi/atarashii.py
-          atarashi -a tfidf -s CosineSim ./atarashi/atarashii.py
-          atarashi -a Ngram -s CosineSim ./atarashi/atarashii.py
-          atarashi -a Ngram -s DiceSim ./atarashi/atarashii.py
-          atarashi -a Ngram -s BigramCosineSim ./atarashi/atarashii.py
-          atarashi -a Ngram -s BigramCosineSim ./atarashi/agents
+          poetry run atarashi -h
+          poetry run atarashi -a wordFrequencySimilarity ./atarashi/atarashii.py
+          poetry run atarashi -a DLD ./atarashi/atarashii.py
+          poetry run atarashi -a tfidf -s ScoreSim ./atarashi/atarashii.py
+          poetry run atarashi -a tfidf -s CosineSim ./atarashi/atarashii.py
+          poetry run atarashi -a Ngram -s CosineSim ./atarashi/atarashii.py
+          poetry run atarashi -a Ngram -s DiceSim ./atarashi/atarashii.py
+          poetry run atarashi -a Ngram -s BigramCosineSim ./atarashi/atarashii.py
+          poetry run atarashi -a Ngram -s BigramCosineSim ./atarashi/agents
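
To repeat the workflow's Test step locally, the same invocations can be generated from a small table of agent and similarity combinations. This is only a sketch: `smoke_commands` and `run_all` are hypothetical helper names that are not part of atarashi, and running them assumes `poetry` is on the PATH and the script is started from the repository root after `poetry install` and `poetry run preprocess`.

```python
import subprocess

TARGET_FILE = "./atarashi/atarashii.py"

# (agent, similarity, target) combinations copied from the workflow's Test step.
CASES = [
    ("wordFrequencySimilarity", None, TARGET_FILE),
    ("DLD", None, TARGET_FILE),
    ("tfidf", "ScoreSim", TARGET_FILE),
    ("tfidf", "CosineSim", TARGET_FILE),
    ("Ngram", "CosineSim", TARGET_FILE),
    ("Ngram", "DiceSim", TARGET_FILE),
    ("Ngram", "BigramCosineSim", TARGET_FILE),
    ("Ngram", "BigramCosineSim", "./atarashi/agents"),
]


def smoke_commands(cases=CASES):
    """Build the argv lists for the help call plus one call per case."""
    base = ["poetry", "run", "atarashi"]
    commands = [base + ["-h"]]
    for agent, similarity, target in cases:
        cmd = base + ["-a", agent]
        if similarity is not None:
            cmd += ["-s", similarity]
        commands.append(cmd + [target])
    return commands


def run_all(commands):
    """Run every command in order and stop at the first non-zero exit code."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all(smoke_commands())` would execute the nine commands in the same order as the workflow.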

.github/workflows/release-publish.yml

Lines changed: 8 additions & 7 deletions
@@ -11,23 +11,24 @@ jobs:
     runs-on: ubuntu-latest

     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v4

       - name: Setup python
-        uses: actions/setup-python@v2
+        uses: actions/setup-python@v5
         with:
-          python-version: '3.8'
-          architecture: 'x64'
+          python-version: '3.11'

       - name: Install build dependencies
-        run: python3 -m pip install --upgrade wheel --requirement requirements.txt
+        run: python3 -m pip install --upgrade poetry

       - name: Build packages
-        run: python3 setup.py sdist bdist_wheel
+        run: |
+          poetry install
+          poetry run preprocess
+          poetry build

       - name: Upload Packages to PyPI
         uses: pypa/gh-action-pypi-publish@v1.4.1
         with:
           user: __token__
           password: ${{ secrets.PYPI_API_TOKEN }}
-

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -8,6 +8,10 @@ __pycache__/

 TestFiles/

+# Processed files
+atarashi/data/licenses/processedLicenses.csv
+atarashi/data/Ngram_keywords.json
+
 # Distribution / packaging
 .Python
 build/

README.md

Lines changed: 8 additions & 8 deletions
@@ -19,8 +19,9 @@ https://fossology.github.io/atarashi

 ### Requirements

-- Python >= v3.5
-- pip >= 18.1
+- Python >= v3.10
+- pip >= 25.0
+- poetry >= 2.0.0

 ## Steps for Installation

@@ -32,21 +33,20 @@ https://fossology.github.io/atarashi

 #### Source install

-- `pip install .`
+- ```shell
+  poetry install
+  poetry run preprocess
+  ```
 - It will download all dependencies required and trigger build as well.
 - Build will generate 3 new files in your current directory
   1. `data/Ngram_keywords.json`
   2. `licenses/<SPDX-version>.csv`
   3. `licenses/processedList.csv`
 - These files will be placed to their appropriate places by the install script.

-### Installing just dependencies
-
-- `pip install -r requirements.txt`
-
 ### Build (optional)

-- `$ python3 setup.py build`
+- `poetry build`

 ## How to run


atarashi/agents/tfidf.py

Lines changed: 2 additions & 2 deletions
@@ -78,7 +78,7 @@ def __tfidfsumscore(self, inputFile):

     all_documents = self.licenseList['processed_text'].tolist()
     all_documents.append(processedData1)
-    sklearn_tfidf = TfidfVectorizer(min_df=0, use_idf=True, smooth_idf=True,
+    sklearn_tfidf = TfidfVectorizer(min_df=1, use_idf=True, smooth_idf=True,
                                     sublinear_tf=True, tokenizer=tokenize,
                                     vocabulary=processedData)

@@ -115,7 +115,7 @@ def __tfidfcosinesim(self, inputFile):
     startTime = time.time()

     all_documents = self.licenseList['processed_text'].tolist()
-    sklearn_tfidf = TfidfVectorizer(min_df=0, max_df=0.10, use_idf=True, smooth_idf=True,
+    sklearn_tfidf = TfidfVectorizer(min_df=1, max_df=0.10, use_idf=True, smooth_idf=True,
                                     sublinear_tf=True, tokenizer=tokenize)

     all_documents_matrix = sklearn_tfidf.fit_transform(all_documents).toarray()
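
Both hunks change `min_df` from the integer 0 to the integer 1. The commit does not state why; a plausible reason (an assumption here) is that recent scikit-learn releases validate constructor parameters and no longer accept an integer `min_df` below 1. As an absolute count, `min_df=1` keeps every term that occurs in at least one document, which is every term seen while fitting, so the lower bound filters nothing in either version. The pure-Python sketch below imitates the document-frequency cut-off to show this; `filter_vocabulary` is a hypothetical helper and a simplification, not scikit-learn's implementation.

```python
from collections import Counter


def document_frequency(docs):
    """Count in how many documents each whitespace-separated term occurs."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.split()))
    return df


def filter_vocabulary(docs, min_df=1, max_df=1.0):
    """Keep terms whose document frequency lies within [min_df, max_df].

    Integers are treated as absolute document counts, floats as a
    proportion of the number of documents.
    """
    n_docs = len(docs)
    low = min_df if isinstance(min_df, int) else min_df * n_docs
    high = max_df if isinstance(max_df, int) else max_df * n_docs
    df = document_frequency(docs)
    return sorted(term for term, count in df.items() if low <= count <= high)


docs = ["mit license text", "gpl license text", "apache license"]

# A lower bound of 0 or 1 documents keeps the same vocabulary.
same = filter_vocabulary(docs, min_df=0) == filter_vocabulary(docs, min_df=1)

# A real lower bound starts removing rare terms.
common = filter_vocabulary(docs, min_df=2)

# An upper bound, like max_df=0.10 in the second hunk, removes frequent terms.
rare = filter_vocabulary(docs, max_df=0.5)
```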

atarashi/atarashii.py

Lines changed: 3 additions & 3 deletions
@@ -18,7 +18,7 @@
 with this program; if not, write to the Free Software Foundation, Inc.,
 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 """
-from pkg_resources import resource_filename
+from importlib_resources import files
 import argparse
 import errno
 import json
@@ -121,8 +121,8 @@ def main():
   Calls atarashii_runner for each file in the folder/ repository specified by user
   Prints the Input path and the JSON output from atarashii_runner
   '''
-  defaultProcessed = resource_filename("atarashi", "data/licenses/processedLicenses.csv")
-  defaultJSON = resource_filename("atarashi", "data/Ngram_keywords.json")
+  defaultProcessed = str(files("atarashi.data.licenses").joinpath("processedLicenses.csv"))
+  defaultJSON = str(files("atarashi.data").joinpath("Ngram_keywords.json"))
   parser = argparse.ArgumentParser()
   parser.add_argument("inputPath", help="Specify the input file/directory path to scan")
   parser.add_argument("-l", "--processedLicenseList", required=False,
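
The hunk above replaces `pkg_resources.resource_filename` with the `files()` API. A minimal sketch of the same lookup pattern follows, with two assumptions made to keep it runnable without atarashi installed: it uses the standard-library `importlib.resources` (which offers `files()` on Python 3.9 and later) instead of the `importlib_resources` backport imported in the commit, and it looks up a file in the standard-library `json` package. `resource_path` is a hypothetical helper name.

```python
from importlib.resources import files


def resource_path(package: str, name: str) -> str:
    """Return the location of a data file shipped inside a package.

    files() returns a traversable object for the package directory and
    joinpath() addresses a file below it. Converting the result to str
    gives a usable filesystem path only when the package is installed
    unpacked on disk, not when it is imported from a zip archive.
    """
    return str(files(package).joinpath(name))


# Stand-in for files("atarashi.data").joinpath("Ngram_keywords.json").
path = resource_path("json", "__init__.py")
```

For packages that may be imported from an archive, `importlib.resources.as_file` can be used to obtain a temporary real path instead of calling `str()` directly.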

atarashi/license/license_merger.py

Lines changed: 3 additions & 14 deletions
@@ -73,8 +73,8 @@ def license_merger(licenseList, requiredlicenseList, verbose=0):
         spdx_compatible_shortname, case=False, regex=False).any():
       # SPDX style short name match
       continue
-    licenses_merge = licenses_merge.append(
-      licenses.loc[idx], ignore_index=True, sort=False)
+    licenses_merge = pd.concat([licenses_merge,
+      licenses.loc[idx]], ignore_index=True, sort=False)

   if verbose > 0:
     print("Licenses to Merge", len(licenses_merge))
@@ -100,18 +100,7 @@ def license_merger(licenseList, requiredlicenseList, verbose=0):

   requiredlicenses = requiredlicenses.drop_duplicates(
     subset='shortname').sort_values(by=['shortname']).reset_index(drop=True)
-  indexesToDrop = []
-  for idx, row in requiredlicenses.iterrows():
-    if len(requiredlicenses.loc[requiredlicenses['shortname'] == \
-                                row['shortname'] + '-only']['deprecated'] == \
-           True) > 0:
-      indexesToDrop.append(idx)
-    if row['shortname'].endswith('+') and \
-       len(requiredlicenses.loc[requiredlicenses['shortname'] == \
-                                row['shortname'][:-1] + \
-                                "-or-later" ]['deprecated'] == True) > 0:
-      indexesToDrop.append(idx)
-  requiredlicenses.drop(indexesToDrop, inplace=True)
+  requiredlicenses = requiredlicenses[requiredlicenses.deprecated == False]
   requiredlicenses.to_csv(str(requiredlicenseList), index=False, encoding='utf-8')

   return str(Path(os.path.abspath(requiredlicenseList)))
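
The second hunk changes which rows are dropped, not only how. As read from the removed lines, the old loop dropped a license when a sibling named `<shortname>-only` existed, or, for names ending in `+`, when `<shortname without +>-or-later` existed; because it took `len()` of a boolean comparison, the sibling's `deprecated` value did not affect the result. The new line drops every row whose `deprecated` flag is true. The sketch below restates both rules over plain dictionaries so the difference can be checked without pandas. The function names and the sample rows are hypothetical, and `Example-1.0` is an invented identifier.

```python
def drop_superseded(licenses):
    """Old rule: drop X if X-only exists, and X+ if X-or-later exists."""
    names = {lic["shortname"] for lic in licenses}
    kept = []
    for lic in licenses:
        name = lic["shortname"]
        if name + "-only" in names:
            continue
        if name.endswith("+") and name[:-1] + "-or-later" in names:
            continue
        kept.append(lic)
    return kept


def drop_deprecated(licenses):
    """New rule: keep only rows whose deprecated flag is false."""
    return [lic for lic in licenses if not lic["deprecated"]]


# Hypothetical sample rows; only shortname and deprecated matter here.
sample = [
    {"shortname": "Example-1.0", "deprecated": True},
    {"shortname": "GPL-2.0", "deprecated": True},
    {"shortname": "GPL-2.0+", "deprecated": True},
    {"shortname": "GPL-2.0-only", "deprecated": False},
    {"shortname": "GPL-2.0-or-later", "deprecated": False},
    {"shortname": "MIT", "deprecated": False},
]

old_names = [lic["shortname"] for lic in drop_superseded(sample)]
new_names = [lic["shortname"] for lic in drop_deprecated(sample)]
```

With these rows the two rules differ only for `Example-1.0`: it is deprecated but has no `-only` or `-or-later` sibling, so the old rule keeps it and the new rule drops it.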
