Commit 59c5001

added delete_discussions to README and fixed up some python syntax
1 parent 8b3fc70 commit 59c5001

3 files changed, 79 additions and 6 deletions

utils/stackoverflow/README.md

Lines changed: 75 additions & 0 deletions
@@ -377,6 +377,81 @@ The script provides detailed feedback including:
 3. Use `--force` only in automated scripts where confirmation isn't possible
 4. Monitor the logs for any failures during bulk operations
 
+## Discussion Deletion (delete_discussions.py)
+A utility script for deleting specific GitHub Discussions based on Stack Overflow question IDs. This tool is particularly useful for cleaning up failed migrations or removing specific discussions that need to be re-migrated.
+
+### Requirements
+* Python 3.x
+* Dependencies listed in requirements.txt
+* GitHub App with appropriate permissions (Contents, Discussions, Metadata)
+
+### Setup
+1. Install the required dependencies:
+```
+pip install -r requirements.txt
+```
+2. Set up GitHub App authentication by setting these environment variables:
+```
+export GHD_INSTALLATION_ID=your_installation_id
+export GHD_APP_ID=your_github_app_id
+export GHD_PRIVATE_KEY=/path/to/your/private-key.pem
+```
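
A quick pre-flight check can catch a missing variable before the script contacts GitHub. The following is a minimal sketch that assumes only the three variable names listed in the Setup step; the helper name `missing_github_app_env` is hypothetical and is not part of the repository.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the Setup step of the README.
REQUIRED_VARS = ("GHD_INSTALLATION_ID", "GHD_APP_ID", "GHD_PRIVATE_KEY")


def missing_github_app_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_github_app_env()` with no arguments inspects the current process environment; an empty list means all three variables are set.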
+
+### Usage
+```
+python delete_discussions.py --repo OWNER/REPO --category CATEGORY_NAME --question-ids ID_FILE [options]
+```
+
+#### Parameters
+- `--repo` (required): GitHub repository in format owner/name
+- `--category` (required): GitHub Discussion category name to search in
+- `--question-ids` (required): File containing Stack Overflow question IDs to delete (one per line)
+- `--input`: Input JSON file containing Stack Overflow data (default: questions_answers_comments.json)
+- `--api-delay`: Minimum seconds between API calls (default: 1.0)
+- `--dry-run`: Show what would be deleted without actually deleting
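
The documented flags map naturally onto an `argparse` parser. The script's real argument parsing is not part of this commit, so the sketch below is an assumption that mirrors only the parameter list above (names, required flags, and defaults); `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative, not the script's own)."""
    parser = argparse.ArgumentParser(prog="delete_discussions.py")
    parser.add_argument("--repo", required=True,
                        help="GitHub repository in format owner/name")
    parser.add_argument("--category", required=True,
                        help="GitHub Discussion category name to search in")
    parser.add_argument("--question-ids", required=True,
                        help="File with Stack Overflow question IDs, one per line")
    parser.add_argument("--input", default="questions_answers_comments.json",
                        help="Input JSON file containing Stack Overflow data")
    parser.add_argument("--api-delay", type=float, default=1.0,
                        help="Minimum seconds between API calls")
    parser.add_argument("--dry-run", action="store_true",
                        help="Show what would be deleted without deleting")
    return parser
```

`argparse` converts the dashes in `--question-ids`, `--api-delay`, and `--dry-run` to underscores, so the parsed values are read as `args.question_ids`, `args.api_delay`, and `args.dry_run`.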
+
+#### Examples
+
+**Preview what would be deleted (recommended first step):**
+```bash
+python delete_discussions.py --repo bcgov/gh-discussions-lab --category "Q&A" --question-ids failed_questions.txt --dry-run
+```
+
+**Delete discussions for specific question IDs:**
+```bash
+python delete_discussions.py --repo bcgov/gh-discussions-lab --category "Q&A" --question-ids failed_questions.txt
+```
+
+**Use custom input file and API delay:**
+```bash
+python delete_discussions.py --repo bcgov/gh-discussions-lab --category "Q&A" --question-ids failed_questions.txt --input custom_questions.json --api-delay 2.0
+```
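
The `--api-delay` value is described as a minimum gap between API calls. One way to enforce such a gap is sketched below. This is not the repository's `RateLimiter` (its implementation is not shown in this commit); `MinDelay` is a hypothetical class, and the injectable `clock` and `sleep` parameters exist only so the behavior can be checked without real waiting.

```python
import time


class MinDelay:
    """Enforce a minimum interval between calls (hypothetical helper)."""

    def __init__(self, delay_seconds: float, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay_seconds
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self) -> float:
        """Sleep just long enough to honor the delay; return the seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept
```

Calling `wait()` immediately before each API request keeps consecutive requests at least `delay_seconds` apart, while requests that are already far enough apart proceed without sleeping.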
+
+### Question ID File Format
+Create a text file with one question ID per line. Comments (lines starting with #) are ignored:
+```
+# Failed question IDs from populate_discussion.py run
+# Lines starting with # are ignored
+
+1354
+1320
+1321
+1285
+```
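
The file format above can be parsed in a few lines. The sketch below is illustrative only: the body of the script's own `load_question_ids_from_file` is not shown in this commit, so `parse_question_ids` is a hypothetical stand-in. It assumes blank lines are skipped along with comments, and it lets `int()` raise `ValueError` on any other non-numeric line.

```python
from typing import Iterable, List


def parse_question_ids(lines: Iterable[str]) -> List[int]:
    """Parse one question ID per line, skipping blank lines and # comments."""
    ids: List[int] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        ids.append(int(line))  # raises ValueError on malformed input
    return ids


sample = """\
# Failed question IDs from populate_discussion.py run
# Lines starting with # are ignored

1354
1320
1321
1285
"""
print(parse_question_ids(sample.splitlines()))  # [1354, 1320, 1321, 1285]
```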
+
+
+### Output
+The script provides the following feedback:
+- Number of question IDs loaded and found in SO data
+- Per-discussion processing status with titles
+- Progress updates during deletion
+- Summary statistics (deleted, not found, errors)
+- All operations logged to `delete_discussions.log`
+
+### Safety Warnings
+⚠️ **DESTRUCTIVE OPERATION**: This script permanently deletes GitHub discussions and all their comments. This action cannot be undone.
+
+
 ## URL Validation with Playwright (validate_urls_playwright.py)
 A browser-based URL validation tool that checks if URLs return HTTP 301 redirects. This script is used for validating redirects to GitHub that require SSO authentication, which cannot be tested programmatically with standard HTTP libraries.

utils/stackoverflow/delete_discussions.py

Lines changed: 3 additions & 5 deletions
@@ -2,7 +2,7 @@
 """
 Delete GitHub Discussions based on Stack Overflow question IDs.
 
-This script reads a list of question IDs, finds the corresponding questions in the
+This script reads a list of SO question IDs, finds the corresponding questions in the
 questions_answers_comments.json file, looks up the GitHub discussions by title,
 and deletes them along with all associated comments.
 """
@@ -16,7 +16,7 @@
 
 from populate_discussion_helpers import RateLimiter, GitHubAuthManager, GraphQLHelper
 from populate_discussion import (
-    load_json, decode_html_entities, get_category_id, find_discussion_by_title
+    load_json, decode_html_entities, get_category_id, find_discussion_by_title, Category
 )
 
 # Setup logging
@@ -39,8 +39,6 @@
 logger = logging.getLogger(__name__)
 
 
-Category = namedtuple('Category', ['id', 'name'])
-
 def load_question_ids_from_file(file_path: str) -> List[int]:
     """Load question IDs from a text file (one ID per line)."""
     question_ids = []
@@ -295,7 +293,7 @@ def main():
     error_count = 0
 
     for question_id, question in questions_map.items():
-        title = decode_html_entities(question.get('title'))
+        title = decode_html_entities(question.get('title') or "")
         logger.info(f"Processing question {question_id}: {title}")
 
         try:
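
The `or ""` guard in the last hunk matters when a question has no `title` key, or has the key with a null value: `question.get('title')` then returns `None`. The sketch below shows the failure mode using the standard library's `html.unescape` as a stand-in for `decode_html_entities`. That function's implementation is not shown in this commit, so its behavior on `None` is an assumption, and `decode_title` is a hypothetical helper.

```python
import html


def decode_title(question: dict) -> str:
    """Stand-in for decode_html_entities(question.get('title') or "")."""
    return html.unescape(question.get("title") or "")


print(decode_title({"title": "Docker &amp; SSO"}))  # Docker & SSO
print(repr(decode_title({"title": None})))          # ''
print(repr(decode_title({})))                       # ''

# Without the guard, a None title reaches the decoder and raises.
try:
    html.unescape(None)
except TypeError as exc:
    print(f"unguarded call fails: {exc}")
```

With the guard, a question without a title is processed with an empty string rather than raising before the `try:` block is reached.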

utils/stackoverflow/populate_discussion.py

Lines changed: 1 addition & 1 deletion
@@ -881,7 +881,7 @@ def main():
             question_id = question['question_id'] if question else "Unknown ID"
             logger.error(f"Error processing question_id {question_id} question #{i+1}: {e}")
             continue
-
+    logger.info("Completed processing questions.")
 
 
 class TagsToIgnore:
