
API Example (Python): User Dataset with multiple entire projects with merged#1874

Open
nozomione wants to merge 7 commits into nozomone/1849-refactor-project-sample-endpoints from nozomione/1850-api-example-user-dataset-with-mergd-projects

Conversation

@nozomione
Member

@nozomione nozomione commented Mar 12, 2026

Issue Number

Closes #1850

Stacked PR of #1868

Purpose/Implementation Notes

This PR adds a new API Example file demonstrating how to create, process, and download a User Dataset with multiple projects including merged objects.

Types of changes

  • New feature (non-breaking change which adds functionality)

Functional tests

The implementation is tested via localhost (excluding the processing API call for now).

Checklist

  • Lint and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in downstream modules

Screenshots

N/A

@nozomione nozomione self-assigned this Mar 12, 2026
@nozomione nozomione requested a review from davidsmejia as a code owner March 12, 2026 21:05
@nozomione nozomione added the API label Mar 12, 2026
if os.path.isfile(API_TOKEN_FILENAME):
    with open(API_TOKEN_FILENAME, "r") as f:
        API_TOKEN = f.readlines()[0].strip()
        print("Using existing token", API_TOKEN)

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs sensitive data (password) as clear text.

Copilot Autofix

AI 5 days ago

In general, to fix clear-text logging of sensitive information you should avoid writing secrets (passwords, tokens, private keys, etc.) directly to logs. If logging is desired for debugging, you can instead log non-sensitive metadata (such as whether a token was loaded, its source, or a truncated/masked version) that does not allow reconstruction or misuse of the secret.

Here, the problematic code is the print("Using existing token", API_TOKEN) after loading the token from .token. The script does not need to reveal the token value; it only needs to acknowledge that an existing token is being used. The best fix that preserves behavior is to change this print to omit the token value and instead log a neutral message like "Using existing token from .token" or just "Using existing token.". No additional imports or methods are required; we just adjust the message at line 223 in api-examples/download-user-datasest-with-merged-objects.py to stop including API_TOKEN.

Suggested changeset 1
api-examples/download-user-datasest-with-merged-objects.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/api-examples/download-user-datasest-with-merged-objects.py b/api-examples/download-user-datasest-with-merged-objects.py
--- a/api-examples/download-user-datasest-with-merged-objects.py
+++ b/api-examples/download-user-datasest-with-merged-objects.py
@@ -220,7 +220,7 @@
 if os.path.isfile(API_TOKEN_FILENAME):
     with open(API_TOKEN_FILENAME, "r") as f:
         API_TOKEN = f.readlines()[0].strip()
-        print("Using existing token", API_TOKEN)
+        print(f"Using existing token from {API_TOKEN_FILENAME}")
 else:
     print(f"Fetching token with {API_TOKEN_EMAIL}")
     # This is the payload that you need to send to /tokens to get an active API_TOKEN
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
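
If some indication of which token was loaded is still useful while debugging, the masking approach mentioned in the autofix explanation above could look roughly like this. It is a minimal sketch: `mask_secret` is a hypothetical helper that is not part of the example script, and the token value is a placeholder. Whether a static analyzer accepts a masked value is not confirmed anywhere in this thread.

```python
def mask_secret(secret: str, visible: int = 4) -> str:
    """Return `secret` with all but the last `visible` characters replaced by '*'."""
    if visible <= 0 or len(secret) <= visible:
        # Too short to reveal any part safely: mask every character.
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Placeholder value for illustration only; not a real API token.
API_TOKEN = "0123456789abcdef"
print(f"Using existing token {mask_secret(API_TOKEN)}")
```

Omitting the value entirely, as the autofix patch does, remains the simpler choice when no debugging hint is needed.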
        API_TOKEN = f.readlines()[0].strip()
        print("Using existing token", API_TOKEN)
else:
    print(f"Fetching token with {API_TOKEN_EMAIL}")

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs sensitive data (password) as clear text.

Copilot Autofix

AI 5 days ago

In general, to fix clear-text logging of sensitive information, remove the sensitive value from log messages or mask/redact it before logging. Logs should contain only what is necessary for observability (e.g., that an action occurred), not private identifiers.

Here, the only problematic usage is in the print(f"Fetching token with {API_TOKEN_EMAIL}") statement. The best fix that preserves functionality is to change the log message so that it no longer includes the email address at all. The script only needs to inform the user that it is fetching a token; including the email is not required for the code to work or for debugging. So we should replace that line with a generic message such as print("Fetching token") or, if you want a hint that user configuration is involved, something like print("Fetching token using configured email") without interpolating the actual email value.

Concretely, in api-examples/download-user-datasest-with-merged-objects.py, around line 225 in the else branch where the token file does not yet exist, replace:

print(f"Fetching token with {API_TOKEN_EMAIL}")

with a version that omits API_TOKEN_EMAIL, e.g.:

print("Fetching token")

No new methods or imports are needed; this is a straightforward change to the log message.

Suggested changeset 1
api-examples/download-user-datasest-with-merged-objects.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/api-examples/download-user-datasest-with-merged-objects.py b/api-examples/download-user-datasest-with-merged-objects.py
--- a/api-examples/download-user-datasest-with-merged-objects.py
+++ b/api-examples/download-user-datasest-with-merged-objects.py
@@ -222,7 +222,7 @@
         API_TOKEN = f.readlines()[0].strip()
         print("Using existing token", API_TOKEN)
 else:
-    print(f"Fetching token with {API_TOKEN_EMAIL}")
+    print("Fetching token")
     # This is the payload that you need to send to /tokens to get an active API_TOKEN
     API_TOKEN_BODY = {"email": API_TOKEN_EMAIL, "is_activated": True}
     token = request_api("tokens", body=API_TOKEN_BODY, method="POST")
EOF
Unable to commit as this autofix suggestion is now outdated

print(f"Saving token to {API_TOKEN_FILENAME}")
with open(API_TOKEN_FILENAME, "w") as f:
    f.writelines(API_TOKEN)

Check failure

Code scanning / CodeQL

Clear-text storage of sensitive information High

This expression stores sensitive data (password) as clear text.
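
No autofix is offered for this finding. One common mitigation, sketched below, is to keep the token file but create it with owner-only permissions. `save_token` is a hypothetical helper, not part of the example script, and the token value is a placeholder. This limits who can read the file on POSIX systems; it does not encrypt the token, so the value is still stored as clear text and the check may continue to fire.

```python
import os
import stat
import tempfile


def save_token(path: str, token: str) -> None:
    """Write `token` to `path`, readable and writable by the file owner only."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    # The mode argument applies only when the file is newly created.
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(token)
    # Tighten permissions as well in case the file already existed.
    os.chmod(path, 0o600)


# Demonstration in a temporary directory with a placeholder token.
with tempfile.TemporaryDirectory() as tmp_dir:
    token_path = os.path.join(tmp_dir, ".token")
    save_token(token_path, "placeholder-token")
    mode = stat.S_IMODE(os.stat(token_path).st_mode)
    print(f"Saved token with mode {oct(mode)}")
```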
        method="POST",
    )

print(f"Check your email {API_TOKEN_EMAIL} for the dataset download notification.")

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs sensitive data (password) as clear text.

Copilot Autofix

AI 5 days ago

In general, to fix clear‑text logging of sensitive information, you should prevent sensitive values (passwords, tokens, emails, etc.) from being included in log messages. This can be done by either removing the sensitive data from the message, masking/redacting it, or replacing it with a generic placeholder that preserves functionality (e.g., “your configured email”) without exposing the exact value.

For this specific case, the simplest and least disruptive fix is to change the print call on line 288 so that it no longer interpolates API_TOKEN_EMAIL. The functional behavior of the script is to inform the user that they should check their email for a notification; this purpose is satisfied without echoing the actual address. We can rephrase the message to something like: Check your email for the dataset download notification. or, if necessary, “Check the email address you configured for the dataset download notification.” This change requires editing only that single line; no new imports, methods, or definitions are needed.

Concretely:

  • In api-examples/download-user-datasest-with-merged-objects.py, replace the print statement at line 288 to remove {API_TOKEN_EMAIL}.
  • Keep all surrounding logic (dataset creation, etc.) unchanged.
Suggested changeset 1
api-examples/download-user-datasest-with-merged-objects.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/api-examples/download-user-datasest-with-merged-objects.py b/api-examples/download-user-datasest-with-merged-objects.py
--- a/api-examples/download-user-datasest-with-merged-objects.py
+++ b/api-examples/download-user-datasest-with-merged-objects.py
@@ -285,7 +285,7 @@
         method="POST",
     )
 
-print(f"Check your email {API_TOKEN_EMAIL} for the dataset download notification.")
+print("Check your email for the dataset download notification.")
 
 # 4. (Optional) Download Your Dataset
 # NOTE: As an alternative to email notification, you can download your processed dataset via the API
EOF
Unable to commit as this autofix suggestion is now outdated
@nozomione
Member Author

nozomione commented Mar 12, 2026

@davidsmejia, there are currently two implementation approaches included for the custom get_data method:

  • Option 1: Use the projects endpoint only (Version 1: get_data)
    • Query a list of projects via request_api
    • Populate data locally using sample ID fields (e.g., modality_samples, multiplexed_samples) via get_data
  • Option 2: Use both the projects and samples endpoints (Version 2: get_data_by_samples)
    • Query a list of projects to retrieve project IDs via request_api
    • Make a second request to the samples endpoint using the fetched project IDs to populate data via get_data, which internally calls request_api

I'd like to hear your thoughts. Thank you, David!

UPDATE: Per discussion, these methods are no longer used and have been removed from the example.


# This is all boilerplate to make it easier to make api calls
# API_RESOURCES is pulled from the list shown on https://api.scpca.alexslemonade.org/v1/
API_BASE = "http://localhost:8000/v1/" # TODO: Temporarily point to localhost for testing

🗒️ The value of API_BASE will be updated to the production API and TODO will be removed before merging.
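
One way to avoid editing the constant between local testing and merging is to default to the production URL and allow an override from the environment. This is only a sketch of that idea, not what the PR does: the `SCPCA_API_BASE` variable name and the `resolve_api_base` helper are hypothetical, and the production URL is the one shown in the code comment above.

```python
import os
from typing import Mapping

# Production base URL, as listed in the example's own comment.
PRODUCTION_API_BASE = "https://api.scpca.alexslemonade.org/v1/"


def resolve_api_base(environ: Mapping[str, str] = os.environ) -> str:
    """Return the API base URL, preferring the hypothetical SCPCA_API_BASE override."""
    base = environ.get("SCPCA_API_BASE", PRODUCTION_API_BASE)
    # Normalize to a trailing slash so resource names can be appended directly.
    return base if base.endswith("/") else base + "/"


API_BASE = resolve_api_base()
print(f"Using API base {API_BASE}")
```

For local testing the override could then be set in the shell, for example to http://localhost:8000/v1/, without touching the script.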

@nozomione nozomione changed the base branch from dev to nozomone/1849-refactor-project-sample-endpoints March 12, 2026 21:47
…ozomione/1850-api-example-user-dataset-with-mergd-projects
…s, and simplify get_data and 2. Prepare Your Dataset flow (e.g., remove variables for boolean flags)
@nozomione
Member Author

nozomione commented Mar 16, 2026

I've applied your feedback and made the following changes to simplify the flows:

  • Removed both get_data and get_data_by_samples and directly populate a data dictionary in the 2. Prepare Your Dataset section
  • Removed the logic for multiplexed samples
  • Removed variables for boolean flags (e.g., includes_merged, includes_bulk)

This PR is ready for another look. Thank you, David!

@nozomione nozomione requested review from davidsmejia and removed request for davidsmejia March 16, 2026 22:09