[SCRIPT]: Add script to delete projects in bulk #124

Open
oscarhearsawho wants to merge 1 commit into main from oscar/scripts/delete-projects-in-bulk

@oscarhearsawho

What?

Ever needed to clean up some of those pesky projects hanging around in the Semgrep UI, but don't fancy manually clicking the delete button for several hours? 🥱

With this all new bulk delete script, your project cleanup has never been easier! 🥳

How?

This script hits our API's `DELETE - Delete project` endpoint once for each project listed in a .csv file you provide.
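For anyone skimming, the core loop can be sketched roughly as follows. This is a minimal sketch, not the PR's code: `delete_all` and the injected `delete_project(name) -> (success, message)` callable are hypothetical names (the tuple shape is borrowed from the script's `return True, "deleted"` line), and writing the log as CSV is an assumption.

```python
import csv
import io
from typing import Callable, Iterable, List, TextIO, Tuple

# Hypothetical callable shape: delete_project(name) -> (success, message),
# mirroring the `return True, "deleted"` line in the PR's delete function.
DeleteFn = Callable[[str], Tuple[bool, str]]


def delete_all(names: Iterable[str], delete_project: DeleteFn, log_file: TextIO) -> List[Tuple[str, bool, str]]:
    """Try to delete each project in turn and write one CSV log row per attempt."""
    writer = csv.writer(log_file)
    writer.writerow(["project_name", "success", "message"])
    results = []
    for name in names:
        ok, message = delete_project(name)
        writer.writerow([name, ok, message])
        results.append((name, ok, message))
    return results


if __name__ == "__main__":
    # Fake deleter for a dry run: any name containing "keep" is reported as a failure.
    def fake_delete(name: str) -> Tuple[bool, str]:
        ok = "keep" not in name
        return ok, "deleted" if ok else "error"

    log = io.StringIO()
    delete_all(["org/old-repo", "org/keep-me"], fake_delete, log)
    print(log.getvalue())
```

Passing the delete function in keeps the loop testable without touching a real deployment, which is why the demo uses a fake deleter.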

Check out the README file for more details! 👇

Contributor

@armchairlinguist left a comment

Since this script uses the API, let's put it in the api directory as well, instead of in its own directory. We can start a README that applies to that directory with your content and move stuff over from the main README, I think.

# ------------------------------------------------------------

ORGSLUG = "yourorgsluggoeshere" # Replace with your organization slug (found in Settings > Identifiers)
BEARER_TOKEN = "yourkeygoeshere" # Replace with your bearer token (generate one in Settings > Tokens)
Contributor

This needs to be an env var or extracted from the local settings.yml - we should not publish scripts that encourage hardcoding tokens.

If you look in the other API scripts, there are usage patterns you can follow for both retrieving the token and getting the deployment information using the token (instead of requiring a hardcode there too).
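A minimal sketch of the env-var-first approach, assuming the token lives in `SEMGREP_APP_TOKEN` or in an `api_token:` line of `~/.semgrep/settings.yml` (where `semgrep login` stores it), and that `GET https://semgrep.dev/api/v1/deployments` returns `{"deployments": [{"slug": ...}]}`. The helper names are hypothetical, and the settings.yml handling is a naive line scan rather than a YAML parser; the existing API scripts in the repo may already have helpers worth reusing instead.

```python
import json
import os
import urllib.request
from pathlib import Path
from typing import Mapping, Optional

# Assumptions: the token is in SEMGREP_APP_TOKEN, or `semgrep login` has written
# an "api_token: <value>" line to ~/.semgrep/settings.yml.
SETTINGS_PATH = Path.home() / ".semgrep" / "settings.yml"
DEPLOYMENTS_URL = "https://semgrep.dev/api/v1/deployments"


def get_token(env: Mapping[str, str], settings_path: Path = SETTINGS_PATH) -> Optional[str]:
    """Prefer the env var; otherwise scan settings.yml for an api_token line."""
    token = env.get("SEMGREP_APP_TOKEN")
    if token:
        return token
    if settings_path.exists():
        # Naive line scan to stay stdlib-only; a YAML parser would be more robust.
        for line in settings_path.read_text().splitlines():
            key, _, value = line.partition(":")
            if key.strip() == "api_token" and value.strip():
                return value.strip().strip("'\"")
    return None


def get_deployment_slug(token: str) -> str:
    """Ask the API which deployment the token belongs to instead of hardcoding ORGSLUG."""
    request = urllib.request.Request(DEPLOYMENTS_URL, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    # Assumed response shape: {"deployments": [{"slug": "...", ...}]}
    return data["deployments"][0]["slug"]


if __name__ == "__main__":
    token = get_token(os.environ)
    if token is None:
        raise SystemExit("Set SEMGREP_APP_TOKEN or run `semgrep login` first")
    print(get_deployment_slug(token))
```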

Comment on lines +15 to +17
# ------------------------------------------------------------
# NO EDITING BELOW THIS LINE
# ------------------------------------------------------------
Contributor

Suggested change
# ------------------------------------------------------------
# NO EDITING BELOW THIS LINE
# ------------------------------------------------------------

Comment on lines +74 to +78
confirmation = input("Enter Y/N >>> ")

if confirmation.lower() != 'y':
    print("\nOperation cancelled by user\n")
    return
Contributor

Suggested change
confirmation = input("Enter Y/N >>> ")
if confirmation.lower() != 'y':
    print("\nOperation cancelled by user\n")
    return
confirmation = input("Enter y to proceed >>> ")
if confirmation.lower() != 'y':
    print("\nOperation cancelled by user\n")
    return

I'm a little dubious about doing bespoke prompting in a script like this generally, but definitely if it's used, the prompt needs to provide accurate instructions.

Contributor

It is, no? `lower()` will ensure Y matches y.

Contributor

Yep, but the prompt should just say "enter y" if that's the desired behavior, since anything else results in not proceeding. It doesn't need to specify y/n, and there's no reason to use a different case than the one being targeted. There's a fairly common Unix-y convention where a capitalized option indicates the default, so lowercase is generally clearer when there is no default.
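One way to make the prompt and the check agree, sketched with hypothetical helper names: the prompt names the single accepted answer, and anything else cancels. Note that `is_confirmed` also strips surrounding whitespace, which the original `lower()` check does not do.

```python
def is_confirmed(answer: str) -> bool:
    """Only an explicit "y" (any case, surrounding whitespace ignored) proceeds."""
    return answer.strip().lower() == "y"


def confirm_deletion(count: int) -> bool:
    # The prompt names the single accepted answer; anything else cancels.
    answer = input(f"About to delete {count} projects. Enter y to proceed >>> ")
    return is_confirmed(answer)


if __name__ == "__main__":
    if not confirm_deletion(3):
        print("\nOperation cancelled by user\n")
```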

This script deletes projects in bulk from the deployment specified by ORGSLUG by looping over a CSV of project names and calling the `DELETE - Delete project` endpoint for each one. Once complete, it generates a log of what was deleted and of any errors (it also prints the real-time responses in your CLI).

## How to run
To run the script, first create an `input.csv` file and populate it with the names of all the projects you want to delete. See the included `input.csv.example` file for the expected format.
Contributor

Why does this require a CSV if it's just taking a list of names? Also, it would be nice for this filename to be customizable (and the output filename too). We usually use argparse to do CLI arguments in our scripts - there are some good examples of this in the repo if you want to use it.

Contributor

Could also take the list on stdin, then the file can be named whatever and we don't have to implement args about it at all.
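A sketch covering both suggestions: an optional positional file argument via `argparse`, falling back to stdin when it is omitted. `read_project_names` and `parse_args` are hypothetical names. The sketch expects one project name per line with no header row, so an existing `input.csv` header would need to be dropped.

```python
import argparse
import sys
from typing import Iterable, Iterator, List, Optional


def read_project_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield one project name per non-blank line, reading lazily."""
    for line in lines:
        name = line.strip()
        if name:
            yield name


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Delete Semgrep projects in bulk")
    # Optional file path; when omitted (None), names are read from stdin.
    parser.add_argument("input", nargs="?", default=None,
                        help="file with one project name per line (default: stdin)")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    if args.input is None:
        names = list(read_project_names(sys.stdin))
    else:
        with open(args.input) as file:
            names = list(read_project_names(file))
    print(f"{len(names)} project names read")
```

With `script.py` standing in for the real filename, this runs as `python script.py names.txt` or `cat names.txt | python script.py`. One catch: when names arrive on stdin, a later `input()` confirmation prompt hits end-of-file instead of waiting for the user, so a stdin mode would need a non-interactive way to confirm.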

You can use the `GET - List all projects` endpoint on the API to get these, but this will only return **scanned** projects, if you want to delete unscanned projects in bulk, you'll need to contact Semgrep Support to do this for you.
Contributor

Suggested change
You can use the `GET - List all projects` endpoint on the API to get these, but this will only return **scanned** projects, if you want to delete unscanned projects in bulk, you'll need to contact Semgrep Support to do this for you.
You can use the `GET - List all projects` endpoint on the API to get these, but this will only return **scanned** projects, if you want to delete unscanned projects in bulk, you'll need to contact Semgrep Support to do this for you.

It would be nice for the script to provide the option to both get and delete projects or to take an input file and delete. Not a requirement for approval, but I think it would improve the UX.
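A sketch of the "get" half, assuming the `GET - List all projects` endpoint lives at `/api/v1/deployments/{slug}/projects` and returns `{"projects": [{"name": ...}, ...]}`. Pagination is not handled here and should be checked against the API reference; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import List

# Assumed endpoint for `GET - List all projects`; pagination is not handled here.
PROJECTS_URL = "https://semgrep.dev/api/v1/deployments/{slug}/projects"


def project_names_from_response(payload: dict) -> List[str]:
    """Extract project names from a decoded list-projects response."""
    # Assumed response shape: {"projects": [{"name": "...", ...}, ...]}
    return [project["name"] for project in payload.get("projects", [])]


def list_project_names(slug: str, token: str) -> List[str]:
    """Fetch the deployment's (scanned) projects and return just their names."""
    url = PROJECTS_URL.format(slug=urllib.parse.quote(slug, safe=""))
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return project_names_from_response(json.load(response))
```

The returned names could be written to a file, pruned by hand, and fed back into the delete step, which keeps a human review between listing and deleting.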


Now that you've got the data, you need to set up the config at the top of the script: add your organization slug to `ORGSLUG` and your deployment's token (which must be authorised for the API) to `BEARER_TOKEN`.
Contributor

Per my comment on the code, these instructions will also need an update.


Then, once that's done you're good to go!
Contributor

Suggested change
Then, once that's done you're good to go!

response = requests.delete(url, headers=headers)
if response.status_code == 200:
    print(f"Successfully deleted project: {project_name}")
    return True, "deleted"
Contributor

I'm wondering if it's really useful to produce both an output file and printed logs. I'd generally expect one or the other, or something like a file and then a summary output like "Successfully deleted X projects, failed to delete Y projects".
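A sketch of the file-plus-summary option: per-project detail stays in the log file, and the CLI prints a single closing line. `summarize` is a hypothetical helper taking `(project_name, success)` pairs.

```python
from typing import Iterable, Tuple


def summarize(results: Iterable[Tuple[str, bool]]) -> str:
    """Collapse per-project outcomes into a single closing line for the CLI."""
    deleted = failed = 0
    for _project_name, ok in results:
        if ok:
            deleted += 1
        else:
            failed += 1
    return f"Successfully deleted {deleted} projects, failed to delete {failed} projects"


if __name__ == "__main__":
    print(summarize([("org/a", True), ("org/b", False), ("org/c", True)]))
```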

def count_projects_in_csv():
    try:
        with open('input.csv', 'r') as file:
            return sum(1 for row in csv.reader(file) if row) - 1
Contributor

I'd consider using enumerate(file) here - I'm not sure at what point readlines becomes unwieldy, but we do know some folks have a lot of projects.
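For what it's worth, iterating over a file object (directly, through `enumerate(file)`, or through `csv.reader(file)` as in the generator above) already reads one line at a time; it is `file.readlines()` that builds the whole list in memory. A minimal sketch of a streaming count that also avoids returning -1 for an empty file (`count_projects` is a hypothetical name):

```python
from typing import Iterable


def count_projects(lines: Iterable[str], has_header: bool = True) -> int:
    """Count non-blank lines one at a time, optionally excluding a header row."""
    count = 0
    for line in lines:
        if line.strip():
            count += 1
    # Only subtract the header when there was at least one line, so an empty file gives 0.
    if has_header and count:
        count -= 1
    return count


if __name__ == "__main__":
    # A file object is a lazy line iterator, so memory use stays flat for large inputs.
    with open("input.csv", "r") as file:
        print(count_projects(file))
```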


CD to the scripts directory (`bulk-delete-projects`) and run it with the below command:
Contributor

Suggested change
CD to the scripts directory (`bulk-delete-projects`) and run it with the below command:
In the directory where the script is saved, run it with the following command:

"""
url = API_ENDPOINT.format(
    deployment_slug=ORGSLUG,
    project_name=project_name
Contributor

Will we need to URL-encode the project name here? Project names almost always contain slashes and occasionally contain spaces.
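A sketch of encoding each path segment with `urllib.parse.quote(..., safe="")`, which escapes `/` as `%2F` and spaces as `%20`. The `API_ENDPOINT` value below is a hypothetical stand-in, since the script's actual template isn't shown in this diff, and `build_delete_url` is a hypothetical helper.

```python
from urllib.parse import quote

# Hypothetical stand-in; the script's real API_ENDPOINT value isn't shown in this diff.
API_ENDPOINT = "https://semgrep.dev/api/v1/deployments/{deployment_slug}/projects/{project_name}"


def build_delete_url(deployment_slug: str, project_name: str) -> str:
    """Fill the endpoint template with percent-encoded path segments."""
    # safe="" makes quote() escape "/" as %2F, so "org/repo" stays one path
    # segment; spaces become %20.
    return API_ENDPOINT.format(
        deployment_slug=quote(deployment_slug, safe=""),
        project_name=quote(project_name, safe=""),
    )


if __name__ == "__main__":
    print(build_delete_url("acme", "acme/my repo"))
```

Whether the API expects `%2F` or a literal slash inside the project name is worth confirming against the API reference or a throwaway project before a bulk run.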


3 participants