Implement user information storage and encryption in graphDBdataAccess #1497
base: dev
Conversation
| "password": encrypted_password or "", | ||
| "database": credentials.database or "neo4j", | ||
| } | ||
|
|
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information (High)
Copilot Autofix (AI) · 2 days ago
In general, the fix is to ensure that no sensitive information (passwords, tokens, secrets) is written to logs or standard output. If debugging is required, log only non-sensitive metadata (e.g., that encryption/decryption succeeded) rather than the secret values themselves.

For this specific case in backend/src/graphDB_dataAccess.py, the best fix without changing existing functionality is:

- Remove or replace the three debug print statements that output the plain password (credentials.password), the encrypted password (encrypted_password), and the decrypted password (d_password).
- Keep the integrity check (the call to decrypt_text_aes_gcm) so that behavior is unchanged, but only log a generic success/failure message if needed.
- Use the existing logging facility instead of print if confirmation that encryption/decryption worked is still wanted, but without including any credential content.

Concretely:

- Around lines 697–700:
  - Delete the three print lines.
  - Optionally retain the decryption call to verify integrity and wrap it in a try/except if decryption failure should be treated as an error. However, as per the current snippet, the decryption is already inside the outer try and will be covered by the existing except Exception as e, so it is enough to keep the decryption call and remove the prints.
- No new imports or helper methods are required.
```diff
@@ -694,10 +694,8 @@
             "database": credentials.database or "neo4j",
         }

-        print("Plain Password:", credentials.password)
-        print("Encrypted Password:", encrypted_password)
-        d_password = decrypt_text_aes_gcm(encrypted_password, passphrase)  # Test decryption to ensure integrity
-        print("Decrypted Password:", d_password)
+        # Test decryption to ensure integrity without logging sensitive data
+        d_password = decrypt_text_aes_gcm(encrypted_password, passphrase)

         # Use MERGE to either create or update user info
         query = """
```
Pull request overview
This PR updates the Neo4j connection-check flow to pass around a Neo4jCredentials object and adds functionality intended to persist user DB connection details (including an encrypted password) via graphDBdataAccess.
Changes:

- Refactors connection_check_and_get_vector_dimensions to accept a credentials object instead of individual parameters.
- Adds AES-GCM encryption/decryption helpers and a new save_user_information() method to persist user DB details (a rough sketch of such helpers follows this list).
- Updates the /connect endpoint to use the refactored connection check and to call save_user_information().
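This is not the repository's actual implementation, but a minimal sketch of what passphrase-derived AES-256-GCM helpers of this shape typically look like. The function names encrypt_text_aes_gcm/decrypt_text_aes_gcm and the PBKDF2-based key derivation are taken from the review comments below; the salt/nonce layout and iteration count are assumptions:

```python
import os
import base64
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def _derive_key(passphrase: str, salt: bytes, iterations: int = 200_000) -> bytes:
    # Derive a 32-byte AES-256 key from an arbitrary passphrase
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=iterations)
    return kdf.derive(passphrase.encode("utf-8"))

def encrypt_text_aes_gcm(plaintext: str, passphrase: str) -> str:
    salt = os.urandom(16)
    nonce = os.urandom(12)
    key = _derive_key(passphrase, salt)
    ciphertext = AESGCM(key).encrypt(nonce, plaintext.encode("utf-8"), None)
    # Persist salt + nonce with the ciphertext so decryption can re-derive the key
    return base64.b64encode(salt + nonce + ciphertext).decode("ascii")

def decrypt_text_aes_gcm(token: str, passphrase: str) -> str:
    raw = base64.b64decode(token)
    salt, nonce, ciphertext = raw[:16], raw[16:28], raw[28:]
    key = _derive_key(passphrase, salt)
    return AESGCM(key).decrypt(nonce, ciphertext, None).decode("utf-8")
```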
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| backend/src/main.py | Updates helper wrapper signature to pass credentials object through. |
| backend/src/graphDB_dataAccess.py | Adds encryption helpers, refactors connection-check signature, and adds user info persistence logic. |
| backend/score.py | Updates /connect to call the refactored connection-check method and persist user info. |
Comments suppressed due to low confidence (1)
backend/src/graphDB_dataAccess.py:275
- The docstring for connection_check_and_get_vector_dimensions still documents the old parameter list (database, email, uri). Since the method now takes a single credentials object, the Args section is misleading and should be updated to reflect the new input shape.
```python
def connection_check_and_get_vector_dimensions(self, credentials):
    """
    Get the vector index dimension from database and application configuration and DB connection status
    Args:
        uri: URI of the graph to extract
        userName: Username to use for graph creation ( if None will use username from config file )
        password: Password to use for graph creation ( if None will use password from config file )
        db_name: db_name is database name to connect to graph db
    Returns:
```
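A possible rewrite of that docstring to match the single-argument signature; the field names on the credentials object are taken from other snippets in this review, and the Returns wording is an assumption:

```python
def connection_check_and_get_vector_dimensions(self, credentials):
    """
    Get the vector index dimension from database and application configuration and DB connection status.

    Args:
        credentials: Neo4jCredentials object carrying uri, userName, password,
            database and (optionally) email for the target Neo4j instance.

    Returns:
        dict: connection status details plus the detected vector index dimension.
    """
```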
```python
result = await asyncio.to_thread(graph_DB_dataAccess.connection_check_and_get_vector_dimensions, credentials)
await asyncio.to_thread(graph_DB_dataAccess.save_user_information, credentials)
gcs_cache = get_value_from_env("GCS_FILE_CACHE","False","bool")
```
Copilot AI · Feb 9, 2026
save_user_information() returns a status dict on failure (e.g., missing ENCRYPTION_PASSPHRASE or missing email), but the result is ignored here, so /connect can return Success even though user info wasn’t persisted. Consider checking the returned status and either (a) fail the request, or (b) include a warning/message in the response when persistence fails.
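A minimal sketch of option (b), assuming save_user_information returns a dict whose "status" key indicates failure (the exact return shape is inferred from this comment, not confirmed):

```python
persist_result = await asyncio.to_thread(graph_DB_dataAccess.save_user_information, credentials)
if isinstance(persist_result, dict) and persist_result.get("status") == "Failed":
    # Surface the persistence problem instead of silently returning Success
    result["message"] = "Connected, but user information could not be saved."
    result["user_info_error"] = persist_result.get("error", "unknown")
```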
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.
```diff
 TOKEN_TRACKER_DB_DATABASE="" #Mandatory if TRACK_TOKEN_USAGE is true
-AUTHENTICATION_REQUIRED="" #OPTIONAL- Default_Value = "False" -- Enable authentication for API access
\ No newline at end of file
+AUTHENTICATION_REQUIRED="" #OPTIONAL- Default_Value = "False" -- Enable authentication for API access
+ENCRYPTION_PASSPHRASE="" #REQUIRED- Default_Value = "Randomly generated string" -- Passphrase used for encrypting sensitive data in the database, should be 32 bytes base64 encoded for AES-256-GCM
\ No newline at end of file
```
Copilot AI · Feb 9, 2026
The ENCRYPTION_PASSPHRASE description says it should be “32 bytes base64 encoded for AES-256-GCM”, but the implementation derives a key from an arbitrary passphrase via PBKDF2 and does not base64-decode this value. Update this comment to match actual behavior, or update the code to accept/decode a base64 key if that’s the intended contract.
```diff
-ENCRYPTION_PASSPHRASE="" #REQUIRED- Default_Value = "Randomly generated string" -- Passphrase used for encrypting sensitive data in the database, should be 32 bytes base64 encoded for AES-256-GCM
+ENCRYPTION_PASSPHRASE="" #REQUIRED- Default_Value = "Randomly generated string" -- Passphrase used for encrypting sensitive data in the database; it is processed via PBKDF2 to derive the AES-256-GCM encryption key, so use a strong, randomly generated passphrase
```
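To illustrate why an arbitrary passphrase is acceptable under the behavior this suggestion describes: the key is derived with PBKDF2 rather than base64-decoded. This is a sketch, not the repository's code; the salt and iteration count here are placeholders:

```python
import secrets
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

# Any strong random string works as ENCRYPTION_PASSPHRASE; no base64 contract is needed
passphrase = secrets.token_urlsafe(32)

# The backend derives a 32-byte AES-256 key from whatever string is supplied
kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=b"illustrative-salt", iterations=100_000)
key = kdf.derive(passphrase.encode("utf-8"))
assert len(key) == 32  # AES-256 key size regardless of passphrase format
```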
```diff
 def connection_check_and_get_vector_dimensions(graph, credentials):
     graph_DB_dataAccess = graphDBdataAccess(graph)
-    return graph_DB_dataAccess.connection_check_and_get_vector_dimensions(database, email, uri)
+    return graph_DB_dataAccess.connection_check_and_get_vector_dimensions(credentials)
```
Copilot AI · Feb 9, 2026
connection_check_and_get_vector_dimensions was refactored to accept (graph, credentials), but there are still call sites using the previous (graph, database, email, uri) signature (e.g. backend/score.py later calls it with 4 args). This will raise a TypeError at runtime. Either keep this wrapper backward-compatible (accept both signatures) or update all call sites to pass a Neo4jCredentials object.
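One way to keep the wrapper backward compatible while call sites are migrated; this is a sketch that assumes Neo4jCredentials can be constructed with keyword arguments matching the fields referenced elsewhere in this review:

```python
def connection_check_and_get_vector_dimensions(graph, credentials, email=None, uri=None):
    """Accept either a Neo4jCredentials object (new style) or the legacy
    positional arguments (graph, database, email, uri)."""
    graph_DB_dataAccess = graphDBdataAccess(graph)
    if isinstance(credentials, str):
        # Legacy call: the second positional argument was the database name
        credentials = Neo4jCredentials(uri=uri, userName=None, password=None,
                                       database=credentials, email=email)
    return graph_DB_dataAccess.connection_check_and_get_vector_dimensions(credentials)
```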
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
```python
# Prepare MERGE params; we store db info on User node keyed by email
params = {
    "email": credentials.email,
    "db_url": credentials.uri or "",
    "username": credentials.userName or "",
    "password": encrypted_password or "",
    "database": credentials.database or "neo4j",
}
```
Copilot AI · Feb 9, 2026
save_user_information always keys the MERGE on email, but email is optional when AUTHENTICATION_REQUIRED is false. In that mode this will either fail (Neo4j cannot MERGE on a null property) or create inconsistent user records, and it also diverges from existing user lookup logic (e.g., get_user_embedding_model / track_token_usage merge by db_url when auth is off). Align the MERGE key with AUTHENTICATION_REQUIRED (email vs db_url) and normalize values similarly to track_token_usage.
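A sketch of aligning the MERGE key with AUTHENTICATION_REQUIRED, roughly following the pattern this comment describes; get_value_from_env appears elsewhere in this PR, but the Cypher and normalization here are illustrative rather than a copy of track_token_usage:

```python
auth_required = get_value_from_env("AUTHENTICATION_REQUIRED", "False", "bool")

if auth_required and credentials.email:
    # Authenticated mode: one User node per email address
    merge_clause = "MERGE (u:User {email: $email})"
else:
    # Anonymous mode: fall back to keying the record by connection URL
    merge_clause = "MERGE (u:User {db_url: $db_url})"

query = f"""
{merge_clause}
SET u.db_url = $db_url,
    u.username = $username,
    u.password = $password,
    u.database = $database,
    u.updated_at = timestamp()
"""
```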
backend/src/graphDB_dataAccess.py (Outdated)
```python
finally:
    if graph:
        graph.close()
```
Copilot AI · Feb 9, 2026
Connection cleanup uses graph.close(), but elsewhere in the codebase Neo4jGraph connections are closed via graph._driver.close() after checking graph._driver._closed (e.g., backend/src/main.py:759-761). If Neo4jGraph doesn’t expose close(), this will raise in the finally block and mask the original error. Consider following the existing pattern and closing the underlying driver safely.
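Following the existing pattern this comment points to (main.py closes the underlying driver after checking _driver._closed), a sketch of a safe cleanup helper; the private attribute names come from that reference and may differ across langchain-neo4j versions:

```python
import logging

def close_graph_safely(graph):
    """Close the underlying Neo4j driver, mirroring the pattern used in main.py."""
    if graph is None:
        return
    try:
        driver = getattr(graph, "_driver", None)
        # Only close when a driver exists and has not already been closed
        if driver is not None and not getattr(driver, "_closed", True):
            driver.close()
    except Exception:
        logging.exception("Failed to close Neo4j driver cleanly.")
```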
```python
result = await asyncio.to_thread(graph_DB_dataAccess.connection_check_and_get_vector_dimensions, credentials)
await asyncio.to_thread(graph_DB_dataAccess.save_user_information, credentials)
```
Copilot AI · Feb 9, 2026
connection_check_and_get_vector_dimensions was refactored to take a single credentials argument (and is now also called as a method). There is still a call site in this file (/backend_connection_configuration) that invokes it with the old signature (graph, database, None, uri), which will raise a TypeError at runtime. Please update that endpoint to pass a Neo4jCredentials object (or switch it to graphDBdataAccess(...).connection_check_and_get_vector_dimensions(credentials)).
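A sketch of what the updated call in /backend_connection_configuration might look like; the endpoint's local variable names (uri, username, password, database) are assumptions, and Neo4jCredentials is assumed to accept keyword arguments:

```python
credentials = Neo4jCredentials(uri=uri, userName=username, password=password,
                               database=database, email=None)
graph_DB_dataAccess = graphDBdataAccess(graph)
result = graph_DB_dataAccess.connection_check_and_get_vector_dimensions(credentials)
```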
```python
kdf = PBKDF2HMAC(
    algorithm=hashes.SHA256(),
    length=32,
    salt=salt,
    iterations=iterations,
    backend=default_backend()
)
```
Copilot AI · Feb 9, 2026
PBKDF2HMAC no longer requires (and in newer cryptography versions deprecates/removes) the explicit backend=default_backend() argument. Keeping this can break installs depending on the pinned cryptography version. Consider removing the default_backend import and the backend=... parameter to keep the KDF call forward-compatible.
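The forward-compatible form of that KDF call, as a self-contained sketch (the salt and iteration values here are placeholders, not the PR's):

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

salt = os.urandom(16)    # placeholder; the PR supplies its own salt
iterations = 100_000     # placeholder iteration count

# No backend argument: modern cryptography releases select the backend automatically
kdf = PBKDF2HMAC(
    algorithm=hashes.SHA256(),
    length=32,
    salt=salt,
    iterations=iterations,
)
key = kdf.derive(b"example passphrase")
```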
…n handling in graphDBdataAccess
No description provided.