API Example (Python): User Dataset with multiple entire projects with merged by nozomione · Pull Request #1874 · AlexsLemonade/scpca-portal

nozomione · 2026-03-12T21:05:43Z

Issue Number

Stacked PR of #1868

Purpose/Implementation Notes

This PR adds a new API Example file demonstrating how to create, process, and download a User Dataset with multiple projects including merged objects.

Types of changes

New feature (non-breaking change which adds functionality)

Functional tests

The implementation is tested via localhost (excluding the processing API call for now).

Checklist

Lint and unit tests pass locally with my changes
I have added tests that prove my fix is effective or that my feature works
I have added necessary documentation (if appropriate)
Any dependent changes have been merged and published in downstream modules

Screenshots

N/A

api-examples/download-user-datasest-with-merged-objects.py

+if os.path.isfile(API_TOKEN_FILENAME):
+    with open(API_TOKEN_FILENAME, "r") as f:
+        API_TOKEN = f.readlines()[0].strip()
+        print("Using existing token", API_TOKEN)


In general, to fix clear-text logging of sensitive information you should avoid writing secrets (passwords, tokens, private keys, etc.) directly to logs. If logging is desired for debugging, you can instead log non-sensitive metadata (such as whether a token was loaded, its source, or a truncated/masked version) that does not allow reconstruction or misuse of the secret.

Here, the problematic code is the print("Using existing token", API_TOKEN) after loading the token from .token. The script does not need to reveal the token value; it only needs to acknowledge that an existing token is being used. The best fix that preserves behavior is to change this print to omit the token value and instead log a neutral message like "Using existing token from .token" or just "Using existing token.". No additional imports or methods are required; we just adjust the message at line 223 in api-examples/download-user-datasest-with-merged-objects.py to stop including API_TOKEN.
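The masked-logging alternative mentioned above could look roughly like the sketch below. It assumes the token is a plain string stored on the first line of a file. `mask_secret` and `load_token` are hypothetical helper names used only for illustration; they are not part of the example script.

```python
import os


def mask_secret(secret: str, visible: int = 4) -> str:
    """Return a masked form of `secret` showing at most the last `visible` characters."""
    if visible <= 0 or len(secret) <= visible:
        # Too short to reveal anything safely: mask everything.
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


def load_token(path: str):
    """Read the token from the first line of `path`, or return None if the file is missing."""
    if not os.path.isfile(path):
        return None
    with open(path, "r") as f:
        return f.readline().strip()


if __name__ == "__main__":
    token = load_token(".token")
    if token is not None:
        # Acknowledge the token without echoing its value.
        print("Using existing token from .token")
```

For an example script, the neutral message is probably the simpler choice. A masked value only helps when several tokens need to be told apart while debugging.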

api-examples/download-user-datasest-with-merged-objects.py

+        API_TOKEN = f.readlines()[0].strip()
+        print("Using existing token", API_TOKEN)
+else:
+    print(f"Fetching token with {API_TOKEN_EMAIL}")


In general, to fix clear-text logging of sensitive information, remove the sensitive value from log messages or mask/redact it before logging. Logs should contain only what is necessary for observability (e.g., that an action occurred), not private identifiers.

Here, the only problematic usage is in the print(f"Fetching token with {API_TOKEN_EMAIL}") statement. The best fix that preserves functionality is to change the log message so that it no longer includes the email address at all. The script only needs to inform the user that it is fetching a token; including the email is not required for the code to work or for debugging. So we should replace that line with a generic message such as print("Fetching token") or, if you want a hint that user configuration is involved, something like print("Fetching token using configured email") without interpolating the actual email value.

Concretely, in api-examples/download-user-datasest-with-merged-objects.py, around line 225 in the else branch where the token file does not yet exist, replace:

print(f"Fetching token with {API_TOKEN_EMAIL}")

with a version that omits API_TOKEN_EMAIL, e.g.:

print("Fetching token")

No new methods or imports are needed; this is a straightforward change to the log message.

api-examples/download-user-datasest-with-merged-objects.py

+
+    print(f"Saving token to {API_TOKEN_FILENAME}")
+    with open(API_TOKEN_FILENAME, "w") as f:
+        f.writelines(API_TOKEN)


api-examples/download-user-datasest-with-merged-objects.py

+        method="POST",
+    )
+
+print(f"Check your email {API_TOKEN_EMAIL} for the dataset download notification.")


In general, to fix clear‑text logging of sensitive information, you should prevent sensitive values (passwords, tokens, emails, etc.) from being included in log messages. This can be done by either removing the sensitive data from the message, masking/redacting it, or replacing it with a generic placeholder that preserves functionality (e.g., “your configured email”) without exposing the exact value.

For this specific case, the simplest and least disruptive fix is to change the print call on line 288 so that it no longer interpolates API_TOKEN_EMAIL. The functional behavior of the script is to inform the user that they should check their email for a notification; this purpose is satisfied without echoing the actual address. We can rephrase the message to something like: Check your email for the dataset download notification. or, if necessary, “Check the email address you configured for the dataset download notification.” This change requires editing only that single line; no new imports, methods, or definitions are needed.

Concretely:

In api-examples/download-user-datasest-with-merged-objects.py, replace the print statement at line 288 to remove {API_TOKEN_EMAIL}.

Keep all surrounding logic (dataset creation, etc.) unchanged.
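If some hint of the configured address is still wanted, the masking option mentioned above might be sketched as follows. `mask_email` is a hypothetical helper, not something defined in the example script, and the domain remains visible, so the generic message is still the more conservative fix.

```python
def mask_email(email: str) -> str:
    """Keep the first character of the local part and the domain; mask the rest."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        # Not a recognizable address: reveal nothing.
        return "<redacted>"
    return f"{local[0]}{'*' * (len(local) - 1)}@{domain}"


if __name__ == "__main__":
    # Placeholder address for illustration only.
    print(f"Check your email ({mask_email('jane.doe@example.com')}) for the notification.")
```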

nozomione · 2026-03-12T21:07:57Z

~~@davidsmejia , currently there are two implementation approaches included for the custom get_data method:~~

Option 1: Use the projects endpoint only (Version 1: get_data)
- ~~Query a list of projects via request_api~~
- ~~Populate data locally using sample ID fields (e.g., modality_samples, multiplexed_samples) via get_data~~
Option 2: Use both the projects and samples endpoints (Version 2: get_data_by_samples)
- ~~Query a list of projects to retrieve project IDs via request_api~~
- ~~Make a second request to the samples endpoint using the fetched project IDs to populate data via get_data, which internally calls request_api~~

~~I'd like to hear your thoughts. Thank you, David!~~

UPDATE: Per discussion, these methods are no longer used and removed from the example.

nozomione · 2026-03-12T21:41:48Z

api-examples/download-user-datasest-with-merged-objects.py

+
+# This is all boilerplate to make it easier to make api calls
+# API_RESOURCES is pulled from the list shown on https://api.scpca.alexslemonade.org/v1/
+API_BASE = "http://localhost:8000/v1/"  # TODO: Temporarily point to localhost for testing


🗒️ The value of API_BASE will be updated to the production API and TODO will be removed before merging.
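One way to avoid editing the constant when switching between local testing and production is to read an override from the environment. This is only a sketch of that idea, not what the PR does: the `SCPCA_API_BASE` variable name and the `resolve_api_base` helper are assumptions made for illustration, and the production URL is the one cited in the script's own comment.

```python
import os

# Production URL as listed in the comment of the example script.
PRODUCTION_API_BASE = "https://api.scpca.alexslemonade.org/v1/"


def resolve_api_base(environ=os.environ) -> str:
    """Return the API base URL, preferring a SCPCA_API_BASE override if set.

    SCPCA_API_BASE is a hypothetical variable name chosen for this sketch.
    """
    base = environ.get("SCPCA_API_BASE", PRODUCTION_API_BASE)
    # Normalize so that resource names can be appended directly.
    return base if base.endswith("/") else base + "/"


if __name__ == "__main__":
    print("API base:", resolve_api_base())
```

With this approach, local testing would set `SCPCA_API_BASE=http://localhost:8000/v1/` in the shell while the committed default stays on production.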

nozomione · 2026-03-16T22:04:21Z

I've applied your feedback and made the following changes to simplify the flows:

- Removed both get_data and get_data_by_samples and directly populate a data dictionary in the 2. Prepare Your Dataset section
- Removed the logic for multiplexed samples
- Removed variables for boolean flags (e.g., includes_merged, includes_bulk)

This PR is ready for another look. Thank you, David!

add API Example download-user-dataset-with-merged-objects.py

3d5e4be

nozomione self-assigned this Mar 12, 2026

nozomione requested a review from davidsmejia as a code owner March 12, 2026 21:05

nozomione added the API label Mar 12, 2026

github-advanced-security bot found potential problems Mar 12, 2026

(fix) correct the indentation of nested elseif

5ecfff0

vercel bot deployed to Preview March 12, 2026 21:40 View deployment

nozomione commented Mar 12, 2026

nozomione changed the base branch from dev to nozomone/1849-refactor-project-sample-endpoints March 12, 2026 21:47

(edit) improve comments and fix typos

bdb5de7

vercel bot deployed to Preview March 13, 2026 12:40 View deployment

(minor) improved the commentted block for clarity

4628339

vercel bot deployed to Preview March 13, 2026 15:30 View deployment

nozomione added 2 commits March 16, 2026 16:58

Merge branch 'nozomone/1849-refactor-project-sample-endpoints' into n…

9a250af

…ozomione/1850-api-example-user-dataset-with-mergd-projects

(edit) remove multiplexed logics, remove Version 2 get_data_by_sample…

381d92b

…s, and simplify get_data and 2. Prepare Your Dataset flow (e.g., remove variables for boolean flags)

vercel bot deployed to Preview March 16, 2026 21:29 View deployment

(edit) remove the get_data helper and populate a data dictionary dire…

086f73c

…ctly in 2. Prepare Your Dataset

vercel bot deployed to Preview March 16, 2026 21:58 View deployment

nozomione requested review from davidsmejia and removed request for davidsmejia March 16, 2026 22:09

@@ -285,7 +285,7 @@
                     method="POST",
                 )
-            print(f"Check your email {API_TOKEN_EMAIL} for the dataset download notification.")
+            print("Check your email for the dataset download notification.")
             # 4. (Optional) Download Your Dataset
             # NOTE: As an alternative to email notification, you can download your processed dataset via the API

nozomione wants to merge 7 commits into nozomone/1849-refactor-project-sample-endpoints from nozomione/1850-api-example-user-dataset-with-mergd-projects

@@ -222,7 +222,7 @@
                     API_TOKEN = f.readlines()[0].strip()
                     print("Using existing token", API_TOKEN)
             else:
-                print(f"Fetching token with {API_TOKEN_EMAIL}")
+                print("Fetching token")
                 # This is the payload that you need to send to /tokens to get an active API_TOKEN
                 API_TOKEN_BODY = {"email": API_TOKEN_EMAIL, "is_activated": True}
                 token = request_api("tokens", body=API_TOKEN_BODY, method="POST")
