Commit 1c12d62

Merge pull request #13 from zkoppert/only-fetch-matching-repos
Only retrieve repos with matching topic in org
2 parents 49ed71d + 97b2176 commit 1c12d62

4 files changed, with 16 additions and 21 deletions

.env-example

Lines changed: 1 addition & 0 deletions
@@ -1,2 +1,3 @@
 GH_TOKEN=' '
 TOPIC='inner-source'
+ORGANIZATION=' '

README.md

Lines changed: 2 additions & 1 deletion
@@ -10,8 +10,9 @@ This project creates a `repos.json` that can be utilized by the [SAP InnerSource
 ## Usage

 1. Copy `.env-example` to `.env`
-1. Fill out the `.env` file with a token from a machine user that only has access to the org to scan
+1. Fill out the `.env` file with a _token_ from a user that has access to the organization to scan (listed below)
 1. Fill out the `.env` file with the exact _topic_ name you are searching for
+1. Fill out the `.env` file with the exact _organization_ that you want to search in
 1. Run `python3 ./crawler.py`, which will create a `repos.json` file containing the relevant metadata for the GitHub repos for the given _topic_
 1. Copy `repos.json` to your instance of the [SAP-InnerSource-Portal][SAP-InnerSource-Portal] and launch the portal as outlined in their installation instructions
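
The usage steps above depend on three settings in `.env`: `GH_TOKEN`, `TOPIC`, and `ORGANIZATION`. As a minimal sketch (not part of this commit), the values could be validated before the crawler runs, since `.env-example` ships with `' '` placeholders that would otherwise reach the search call unchanged. The helper name `load_crawler_settings` is hypothetical, and the sketch assumes `load_dotenv()` has already populated the process environment.

```python
import os


def load_crawler_settings(env=None):
    """Return the crawler settings as a dict, raising if any is missing or blank.

    Hypothetical helper: `env` defaults to os.environ, which is assumed to
    have been populated by python-dotenv's load_dotenv() beforehand.
    """
    env = os.environ if env is None else env
    settings = {}
    for name in ("GH_TOKEN", "TOPIC", "ORGANIZATION"):
        # Strip whitespace so the ' ' placeholder from .env-example counts as unset.
        value = (env.get(name) or "").strip()
        if not value:
            raise ValueError(f"{name} is not set; add it to your .env file")
        settings[name] = value
    return settings
```

Passing a plain dict instead of relying on `os.environ` makes the check easy to exercise without touching the real environment.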

crawler.py

Lines changed: 13 additions & 19 deletions
@@ -6,7 +6,6 @@

 import github3
 from dotenv import load_dotenv
-from ratelimiter import RateLimiter

 if __name__ == "__main__":

@@ -17,27 +16,22 @@
     # Auth to GitHub.com
     gh = github3.login(token=os.getenv("GH_TOKEN"))

-    # Get all repos from organization
-    all_repos = gh.repositories()
-    rate_limiter = RateLimiter(max_calls=10, period=1)
-    repo_list = []
     # Set the topic
     topic = os.getenv("TOPIC")
+    organization = os.getenv("ORGANIZATION")
+
+    # Get all repos from organization
+    search_string = "org:" + organization + " topic:" + topic
+    all_repos = gh.search_repositories(search_string)
+    repo_list = []

-    with rate_limiter:
-        for repo in all_repos:
-            if repo is not None:
-                try:
-                    repo_topic = repo.topics()
-                except Exception:
-                    print("skipping 404")
-                else:
-                    if topic in repo_topic.names:
-                        print("{0}".format(repo))
-                        full_repository = repo.refresh()
-                        # TODO: #7 For each resulting project add a key _InnerSourceMetadata
-                        # Add stuff here about innersource.json data before appending to list
-                        repo_list.append(full_repository.as_dict())
+    for repo in all_repos:
+        if repo is not None:
+            # TODO: #7 For each resulting project add a key _InnerSourceMetadata
+            print("{0}".format(repo.repository))
+            full_repository = repo.repository.refresh()
+            # Add stuff here about innersource.json data before appending to list
+            repo_list.append(repo.as_dict())

     # Write each repository to a repos.json file
     with open("repos.json", "w") as f:
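
The change above replaces a per-repository `topics()` lookup with a single search query of the form `org:<organization> topic:<topic>`, so GitHub does the filtering on the server. That is presumably why the client-side rate limiter could be dropped. The following is a rough sketch of the same idea factored into two functions. The names `build_search_query` and `collect_matching_repos` are hypothetical and do not appear in the commit. `gh` is assumed to be an authenticated github3.py client, or any object whose `search_repositories(query)` yields results exposing `as_dict()`.

```python
def build_search_query(organization, topic):
    """Build the search qualifier string, e.g. 'org:my-org topic:inner-source'."""
    return f"org:{organization} topic:{topic}"


def collect_matching_repos(gh, organization, topic):
    """Return one dict per search result for repos in the org tagged with the topic.

    Assumption: `gh` behaves like the object returned by github3.login(),
    i.e. gh.search_repositories(query) yields results that have as_dict().
    """
    query = build_search_query(organization, topic)
    return [
        result.as_dict()
        for result in gh.search_repositories(query)
        if result is not None
    ]
```

Keeping the query construction separate from the API call makes it possible to check the qualifier string with a stub client, without a token or network access.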

requirements.txt

Lines changed: 0 additions & 1 deletion
@@ -1,3 +1,2 @@
 github3.py==1.3.0
 python-dotenv==0.15.0
-ratelimiter==1.2.0.post0
