API Example (Python): User Dataset with multiple entire projects with merged #1874
if os.path.isfile(API_TOKEN_FILENAME):
    with open(API_TOKEN_FILENAME, "r") as f:
        API_TOKEN = f.readlines()[0].strip()
    print("Using existing token", API_TOKEN)
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
Copilot Autofix
AI 5 days ago
In general, to fix clear-text logging of sensitive information you should avoid writing secrets (passwords, tokens, private keys, etc.) directly to logs. If logging is desired for debugging, you can instead log non-sensitive metadata (such as whether a token was loaded, its source, or a truncated/masked version) that does not allow reconstruction or misuse of the secret.
Here, the problematic code is the print("Using existing token", API_TOKEN) after loading the token from .token. The script does not need to reveal the token value; it only needs to acknowledge that an existing token is being used. The best fix that preserves behavior is to change this print to omit the token value and instead log a neutral message like "Using existing token from .token" or just "Using existing token.". No additional imports or methods are required; we just adjust the message at line 223 in api-examples/download-user-datasest-with-merged-objects.py to stop including API_TOKEN.
@@ -220,7 +220,7 @@
 if os.path.isfile(API_TOKEN_FILENAME):
     with open(API_TOKEN_FILENAME, "r") as f:
         API_TOKEN = f.readlines()[0].strip()
-    print("Using existing token", API_TOKEN)
+    print(f"Using existing token from {API_TOKEN_FILENAME}")
 else:
     print(f"Fetching token with {API_TOKEN_EMAIL}")
     # This is the payload that you need to send to /tokens to get an active API_TOKEN
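
If some confirmation of which token was loaded is still useful for debugging, the truncated/masked variant mentioned in the autofix text could be sketched as follows. `mask_secret` is a hypothetical helper, not part of the example script or of any library, and it assumes that revealing the last four characters of a long token is acceptable.

```python
def mask_secret(secret: str, visible: int = 4) -> str:
    """Return a masked form of `secret` that keeps only the last `visible` characters."""
    # Hide short secrets entirely so the mask never reveals most of the value.
    if visible <= 0 or len(secret) <= visible * 2:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Hypothetical token value, for illustration only.
print("Using existing token", mask_secret("abcd1234efgh5678"))
```

Omitting the value altogether, as the suggested diff does, remains the simpler option; masking only makes sense when the suffix helps tell several tokens apart.
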

        API_TOKEN = f.readlines()[0].strip()
    print("Using existing token", API_TOKEN)
else:
    print(f"Fetching token with {API_TOKEN_EMAIL}")
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
Copilot Autofix
AI 5 days ago
In general, to fix clear-text logging of sensitive information, remove the sensitive value from log messages or mask/redact it before logging. Logs should contain only what is necessary for observability (e.g., that an action occurred), not private identifiers.
Here, the only problematic usage is in the print(f"Fetching token with {API_TOKEN_EMAIL}") statement. The best fix that preserves functionality is to change the log message so that it no longer includes the email address at all. The script only needs to inform the user that it is fetching a token; including the email is not required for the code to work or for debugging. So we should replace that line with a generic message such as print("Fetching token") or, if you want a hint that user configuration is involved, something like print("Fetching token using configured email") without interpolating the actual email value.
Concretely, in api-examples/download-user-datasest-with-merged-objects.py, around line 225 in the else branch where the token file does not yet exist, replace:
    print(f"Fetching token with {API_TOKEN_EMAIL}")
with a version that omits API_TOKEN_EMAIL, e.g.:
    print("Fetching token")
No new methods or imports are needed; this is a straightforward change to the log message.
@@ -222,7 +222,7 @@
         API_TOKEN = f.readlines()[0].strip()
     print("Using existing token", API_TOKEN)
 else:
-    print(f"Fetching token with {API_TOKEN_EMAIL}")
+    print("Fetching token")
     # This is the payload that you need to send to /tokens to get an active API_TOKEN
     API_TOKEN_BODY = {"email": API_TOKEN_EMAIL, "is_activated": True}
     token = request_api("tokens", body=API_TOKEN_BODY, method="POST")
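
Where the address is still useful as a hint, the mask/redact option mentioned in the autofix text could look like the sketch below. `redact_email` is a hypothetical helper, not part of the example script; it assumes that showing the first character of the local part and the full domain is acceptable in your logs.

```python
def redact_email(email: str) -> str:
    """Keep the first character of the local part and the domain; hide the rest."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        # Not a recognizable address: reveal nothing.
        return "<redacted>"
    return f"{local[0]}***@{domain}"


# Hypothetical address, for illustration only.
print(f"Fetching token with {redact_email('researcher@example.org')}")
```
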

    print(f"Saving token to {API_TOKEN_FILENAME}")
    with open(API_TOKEN_FILENAME, "w") as f:
        f.writelines(API_TOKEN)
Check failure
Code scanning / CodeQL
Clear-text storage of sensitive information High
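
No autofix is shown for this alert. One common mitigation, sketched here under stated assumptions, is to create the token file so that only the current user can read it. `save_token` is a hypothetical helper, not code from this PR. Permission bits are only meaningfully enforced on POSIX systems, and the token is still stored unencrypted, so this narrows exposure rather than eliminating it.

```python
import os


def save_token(path: str, token: str) -> None:
    """Write `token` to `path`, readable and writable by the owner only."""
    # Create (or truncate) the file with mode 0o600 instead of the default mode.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(token)
    # The mode passed to os.open only applies to newly created files,
    # so tighten the permissions of a pre-existing file as well.
    os.chmod(path, 0o600)
```

In the example script this would stand in for the `open(API_TOKEN_FILENAME, "w")` block, e.g. `save_token(API_TOKEN_FILENAME, API_TOKEN)`.
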

    method="POST",
)

print(f"Check your email {API_TOKEN_EMAIL} for the dataset download notification.")
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
Copilot Autofix
AI 5 days ago
In general, to fix clear‑text logging of sensitive information, you should prevent sensitive values (passwords, tokens, emails, etc.) from being included in log messages. This can be done by either removing the sensitive data from the message, masking/redacting it, or replacing it with a generic placeholder that preserves functionality (e.g., “your configured email”) without exposing the exact value.
For this specific case, the simplest and least disruptive fix is to change the print call on line 288 so that it no longer interpolates API_TOKEN_EMAIL. The functional behavior of the script is to inform the user that they should check their email for a notification; this purpose is satisfied without echoing the actual address. We can rephrase the message to something like: Check your email for the dataset download notification. or, if necessary, “Check the email address you configured for the dataset download notification.” This change requires editing only that single line; no new imports, methods, or definitions are needed.
Concretely:
- In api-examples/download-user-datasest-with-merged-objects.py, replace the print statement at line 288 to remove {API_TOKEN_EMAIL}.
- Keep all surrounding logic (dataset creation, etc.) unchanged.
@@ -285,7 +285,7 @@
     method="POST",
 )
 
-print(f"Check your email {API_TOKEN_EMAIL} for the dataset download notification.")
+print("Check your email for the dataset download notification.")
 
 # 4. (Optional) Download Your Dataset
 # NOTE: As an alternative to email notification, you can download your processed dataset via the API

UPDATE: Per discussion, these methods are no longer used and have been removed from the example.

# This is all boilerplate to make it easier to make api calls
# API_RESOURCES is pulled from the list shown on https://api.scpca.alexslemonade.org/v1/
API_BASE = "http://localhost:8000/v1/"  # TODO: Temporarily point to localhost for testing
🗒️ The value of API_BASE will be updated to the production API, and the TODO will be removed before merging.
…ozomione/1850-api-example-user-dataset-with-mergd-projects
…s, and simplify get_data and 2. Prepare Your Dataset flow (e.g., remove variables for boolean flags)
…ctly in 2. Prepare Your Dataset
I've applied your feedback and made the following changes to simplify the flows:
This PR is ready for another look. Thank you, David!
Issue Number
Closes #1850
Stacked PR of #1868
Purpose/Implementation Notes
This PR adds a new API Example file demonstrating how to create, process, and download a User Dataset with multiple projects including merged objects.
Types of changes
Functional tests
The implementation is tested via localhost (*excluding the processing API call for now).
Checklist
Screenshots
N/A