


@jpvajda commented Nov 25, 2025

PR Summary

Fix: String|String[] parameter URL encoding for WebSocket clients

Issue: #629

Problem:
When a list was passed to the keyterm, keywords, replace, or search parameters of a WebSocket client, the SDK converted the whole list to its string representation and URL-encoded that as a single value (e.g., keyterm=%5B%27TERM1%27%2C+%27TERM2%27%5D, which decodes to ['TERM1', 'TERM2']) instead of emitting one query parameter per item (e.g., keyterm=TERM1&keyterm=TERM2).
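Both URL shapes can be reproduced with the standard library. This is only a sketch: the SDK's own serialization code is not part of this description, so `urllib.parse.urlencode` stands in here to show that the broken value is simply `str()` of the list, percent-encoded, while `doseq=True` produces the intended repeated parameters.

```python
from urllib.parse import urlencode

params = {"model": "nova-3", "keyterm": ["TERM1", "TERM2"]}

# Without doseq, urlencode calls str() on the list and encodes the result
# as one value: the same shape reported in the issue.
broken = urlencode(params)

# With doseq=True, each element of a sequence becomes its own key=value pair.
fixed = urlencode(params, doseq=True)

print(broken)  # model=nova-3&keyterm=%5B%27TERM1%27%2C+%27TERM2%27%5D
print(fixed)   # model=nova-3&keyterm=TERM1&keyterm=TERM2
```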

Solution:

  • Widened the type signatures to accept typing.Union[str, typing.Sequence[str]] (still wrapped in Optional)
  • Added a runtime type check: a plain str is sent as a single value, and any other sequence is iterated
  • Each list item is now emitted as its own query parameter
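The three changes above amount to one small pattern applied per parameter. The sketch below is a hypothetical standalone version, not the SDK's generated code: `add_query_param` and `build_query` are invented names, and it assumes the query is collected as a list of `(name, value)` pairs and then passed to `urllib.parse.urlencode`.

```python
from typing import List, Optional, Sequence, Tuple, Union
from urllib.parse import urlencode

QueryParams = List[Tuple[str, str]]


def add_query_param(
    params: QueryParams,
    name: str,
    value: Optional[Union[str, Sequence[str]]],
) -> None:
    """Hypothetical helper: append one (name, value) pair per item; skip None."""
    if value is None:
        return
    if isinstance(value, str):
        # A plain string is one value, not a sequence of characters.
        params.append((name, value))
    else:
        # Any other sequence becomes one query parameter per element.
        for item in value:
            params.append((name, item))


def build_query(
    keyterm: Optional[Union[str, Sequence[str]]] = None,
    search: Optional[Union[str, Sequence[str]]] = None,
) -> str:
    """Hypothetical stand-in for a connect() method's query construction."""
    params: QueryParams = [("model", "nova-3")]
    add_query_param(params, "keyterm", keyterm)
    add_query_param(params, "search", search)
    return urlencode(params)


print(build_query(keyterm="TERM1"))             # model=nova-3&keyterm=TERM1
print(build_query(keyterm=["TERM1", "TERM2"]))  # model=nova-3&keyterm=TERM1&keyterm=TERM2
print(build_query(search=[]))                   # model=nova-3
```

Single strings keep their old single-parameter form, which is what keeps the change backward compatible; an empty list simply contributes no parameter.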

Files Changed:

  • src/deepgram/listen/v1/client.py - Fixed: keyterm, keywords, replace, search
  • src/deepgram/listen/v1/raw_client.py - Fixed: keyterm, keywords, replace, search
  • src/deepgram/listen/v2/client.py - Fixed: keyterm
  • src/deepgram/listen/v2/raw_client.py - Fixed: keyterm

Testing:

  • ✅ All 467 unit tests pass
  • ✅ Validated with reproduction scripts showing correct URL encoding
  • ✅ Backward compatible (single strings still work)

Note: REST/media API clients already handled this correctly; only WebSocket clients were affected.

Reproduction script used for testing:

"""
Simple validation test - captures actual WebSocket URLs and validates encoding
"""
import os
import sys
from unittest.mock import patch, MagicMock
from deepgram import DeepgramClient

# Get API key
api_key = os.getenv("DEEPGRAM_API_KEY", "test_key")
client = DeepgramClient(api_key=api_key)

print("="*80)
print("SDK FIX VALIDATION - URL ENCODING TEST")
print("="*80)

results = {}

def test_parameter(test_name, connect_func, expected_url_pattern):
    """Test a parameter and validate the URL encoding"""
    print(f"\n{test_name}")
    print("-"*80)

    captured_url = None

    # Mock the websocket to capture the URL
    with patch('websockets.sync.client.connect') as mock_connect:
        mock_protocol = MagicMock()
        mock_connect.return_value.__enter__ = lambda self: mock_protocol
        mock_connect.return_value.__exit__ = lambda self, *args: None

        try:
            connect_func()
        except Exception as e:
            print(f"Error during connection: {e}")

        # Read the URL passed to websocket connect even if the body raised
        # after connect() had already been called
        if mock_connect.called:
            captured_url = mock_connect.call_args[0][0]

    if not captured_url:
        print(f"❌ FAIL - No URL captured")
        results[test_name] = False
        return

    print(f"Captured URL: {captured_url}")
    print(f"Expected pattern: {expected_url_pattern}")

    # Check if the URL contains the expected pattern
    if expected_url_pattern in captured_url:
        print(f"✅ PASS - URL encoding is correct")
        results[test_name] = True
    else:
        print(f"❌ FAIL - URL encoding is incorrect")
        results[test_name] = False

# TEST 1: v1 keyterm with list
def test1():
    with client.listen.v1.connect(model="nova-3", keyterm=["TERM1", "TERM2"]) as conn:
        pass

test_parameter(
    "TEST 1: v1 keyterm=['TERM1', 'TERM2']",
    test1,
    "keyterm=TERM1&keyterm=TERM2"
)

# TEST 2: v1 keywords with list
def test2():
    with client.listen.v1.connect(model="nova-2", keywords=["word1", "word2"]) as conn:
        pass

test_parameter(
    "TEST 2: v1 keywords=['word1', 'word2']",
    test2,
    "keywords=word1&keywords=word2"
)

# TEST 3: v1 replace with list
def test3():
    with client.listen.v1.connect(model="nova-3", replace=["old1:new1", "old2:new2"]) as conn:
        pass

test_parameter(
    "TEST 3: v1 replace=['old1:new1', 'old2:new2']",
    test3,
    "replace=old1%3Anew1&replace=old2%3Anew2"
)

# TEST 4: v1 search with list
def test4():
    with client.listen.v1.connect(model="nova-3", search=["term1", "term2"]) as conn:
        pass

test_parameter(
    "TEST 4: v1 search=['term1', 'term2']",
    test4,
    "search=term1&search=term2"
)

# TEST 5: v2 keyterm with list
def test5():
    with client.listen.v2.connect(
        model="flux-general-en",
        encoding="linear16",
        sample_rate="16000",
        keyterm=["TERM1", "TERM2"]
    ) as conn:
        pass

test_parameter(
    "TEST 5: v2 keyterm=['TERM1', 'TERM2']",
    test5,
    "keyterm=TERM1&keyterm=TERM2"
)

# SUMMARY
print("\n" + "="*80)
print("RESULTS")
print("="*80)

all_passed = all(results.values())
for test_name, passed in results.items():
    status = "✅ PASS" if passed else "❌ FAIL"
    print(f"{status} - {test_name}")

print("="*80)
if all_passed:
    print("🎉 ALL TESTS PASSED")
    sys.exit(0)
else:
    print("❌ SOME TESTS FAILED")
    sys.exit(1)
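The script's check (`expected_url_pattern in captured_url`) depends on parameter order and on exactly how characters such as `:` are percent-encoded. A sketch of a stricter alternative decodes the query string with `urllib.parse.parse_qs` and compares the list of values per parameter. `query_values` is a hypothetical helper, and the sample URLs below are illustrative inputs, not captured output.

```python
from typing import List
from urllib.parse import parse_qs, urlsplit


def query_values(url: str, name: str) -> List[str]:
    """Hypothetical helper: all decoded values of query parameter `name`, in URL order."""
    return parse_qs(urlsplit(url).query).get(name, [])


# Illustrative URLs in the shape the script captures (host and path are examples only).
fixed_url = "wss://api.deepgram.com/v1/listen?model=nova-3&replace=old1%3Anew1&replace=old2%3Anew2"
broken_url = "wss://api.deepgram.com/v1/listen?model=nova-3&keyterm=%5B%27TERM1%27%2C+%27TERM2%27%5D"

# Repeated parameters decode to a list, regardless of whether ':' was sent as %3A.
replace_values = query_values(fixed_url, "replace")

# The stringified-list bug decodes to a single value, so a list comparison fails clearly.
broken_values = query_values(broken_url, "keyterm")

print(replace_values)  # ['old1:new1', 'old2:new2']
print(broken_values)   # ["['TERM1', 'TERM2']"]
```

Inside test_parameter, the pass/fail condition could then compare, for example, query_values(captured_url, "keyterm") with ["TERM1", "TERM2"] instead of matching a substring.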



Summary by CodeRabbit

  • New Features
    • Multi-value listen parameters: on the WebSocket listen clients, keyterm, keywords, replace, and search (V1) and keyterm (V2) now accept a list of values as well as a single string, with each value sent as a separate query parameter.



@jpvajda requested a review from lukeocodes as a code owner on November 25, 2025 at 22:15

coderabbitai bot commented Nov 25, 2025

Walkthrough

The changes extend four query parameters (keyterm, keywords, replace, search) in V1 listen clients and keyterm in V2 listen clients to accept either a single string or a sequence of strings. Parameters are expanded into multiple query entries when sequences are provided, maintaining backward compatibility with single-string inputs.

Changes

  • V1 listen clients: type expansion and multi-value handling
    Files: src/deepgram/listen/v1/client.py, src/deepgram/listen/v1/raw_client.py
    Widened the keyterm, keywords, replace, and search parameter types from Optional[str] to Optional[Union[str, Sequence[str]]] in the connect methods of V1Client/RawV1Client and AsyncV1Client/AsyncRawV1Client. Updated query parameter construction to iterate over sequences and add one entry per value.
  • V2 listen clients: keyterm expansion
    Files: src/deepgram/listen/v2/client.py, src/deepgram/listen/v2/raw_client.py
    Widened the keyterm parameter type from Optional[str] to Optional[Union[str, Sequence[str]]] in the connect methods of V2Client/RawV2Client and AsyncV2Client/AsyncRawV2Client. Updated parameter serialization to handle both single strings and sequences, adding one keyterm entry per item in a sequence.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • All changes follow a consistent, repetitive pattern across four files
  • Type signature widening is mechanical and straightforward
  • Parameter iteration logic is consistent and non-complex
  • No substantial new control flow or error-handling changes


Suggested reviewers

  • lukeocodes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title clearly and specifically describes the main change: fixing URL encoding issues with string or string array parameters, which directly addresses the core problem resolved in the changeset.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required threshold of 80.00%.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.

@coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
src/deepgram/listen/v2/client.py (1)

46-47: Multi-value keyterm handling is correct and backward compatible; docstring slightly out of sync

The widened type, combined with the `if isinstance(keyterm, str): ... else: for term in keyterm:` pattern, correctly:

  • preserves single-string behavior,
  • expands sequences into repeated keyterm=... entries, and
  • avoids accidentally iterating over characters for plain strings.
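The `isinstance(..., str)` guard matters because Python's `str` is itself a `Sequence[str]`, so the widened annotation alone cannot tell one term from many. This standalone sketch (not the SDK's code) shows what iterating a plain string would produce without the guard.

```python
from collections.abc import Sequence

value = "TERM1"

# A str passes the Sequence check, so the str check has to come first.
is_sequence = isinstance(value, Sequence)

# Without the guard, iterating a string yields one parameter per character.
unguarded = [("keyterm", item) for item in value]

# With the guard, a plain string stays a single parameter.
if isinstance(value, str):
    guarded = [("keyterm", value)]
else:
    guarded = [("keyterm", item) for item in value]

print(is_sequence)  # True
print(unguarded)    # [('keyterm', 'T'), ('keyterm', 'E'), ('keyterm', 'R'), ('keyterm', 'M'), ('keyterm', '1')]
print(guarded)      # [('keyterm', 'TERM1')]
```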

One minor nit: the keyterm docstrings still say typing.Optional[str], which no longer matches the annotated type. If these docstrings are not entirely generator-owned, consider updating them (or the generator) to reflect Optional[Union[str, Sequence[str]]].

Also applies to: 102-107, 161-162, 217-222

src/deepgram/listen/v2/raw_client.py (1)

34-35: Raw V2 keyterm multi-value support matches client behavior; consider docstring alignment

The raw sync/async V2 connect methods now mirror the high-level client behavior:

  • strings yield a single keyterm query param,
  • sequences yield multiple keyterm params, one per item.

This fixes the list-encoding issue while maintaining compatibility for existing string callers. As with the client, the docstrings for keyterm still advertise typing.Optional[str]; if these docs aren’t purely generated, consider updating them (or the generator) to describe the Optional[Union[str, Sequence[str]]] input.

Also applies to: 90-95, 138-139, 194-199

src/deepgram/listen/v1/client.py (1)

56-57: V1 multi-value handling for keyterm/keywords/replace/search looks good; docs lag the types

The updated signatures and query-param logic for these four fields in both V1Client.connect and AsyncV1Client.connect correctly:

  • Preserve old behavior for single strings.
  • Expand sequences into repeated query params (e.g., search=foo&search=bar).
  • Avoid treating strings as sequences of characters via the isinstance(..., str) guard.
  • Keep sync/async behavior consistent.

The effective behavior change for empty lists (now emitting no param rather than a stringified list) is an improvement on the previous broken encoding.

One small suggestion: the parameter docstrings for keyterm, keywords, replace, and search still say typing.Optional[str]. If feasible, consider updating the generator or docs to reflect Optional[Union[str, Sequence[str]]] so users can see the multi-value capability in the generated documentation.

Also applies to: 66-68, 169-180, 197-210, 283-285, 293-295, 396-407, 424-437

src/deepgram/listen/v1/raw_client.py (1)

37-38: Raw V1 multi-value parameter behavior is consistent and correct; docstrings could be updated

The raw v1 sync and async connect methods now:

  • Accept Optional[Union[str, Sequence[str]]] for keyterm, keywords, replace, and search.
  • Correctly emit one query param per value when a sequence is provided, while keeping legacy single-string usage intact.
  • Mirror the high-level V1 client behavior so there’s no divergence between raw and wrapped APIs.

As elsewhere, the docstrings for these parameters still show typing.Optional[str]. If you control the generator or these docs, it would be a small win to update them to reflect the union type so that SDK users discover the new multi-value support from the documentation as well.

Also applies to: 47-49, 150-161, 178-191, 243-245, 253-255, 356-367, 384-397

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 78ac7b7 and a2346e4.

📒 Files selected for processing (4)
  • src/deepgram/listen/v1/client.py (8 hunks)
  • src/deepgram/listen/v1/raw_client.py (8 hunks)
  • src/deepgram/listen/v2/client.py (4 hunks)
  • src/deepgram/listen/v2/raw_client.py (4 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2024-07-01T19:17:04.194Z
Learnt from: dvonthenen
Repo: deepgram/deepgram-python-sdk PR: 426
File: deepgram/clients/listen/v1/__init__.py:36-43
Timestamp: 2024-07-01T19:17:04.194Z
Learning: Unused imports in `deepgram/clients/listen/v1/__init__.py` are retained to maintain backward compatibility and should not be flagged for removal in reviews.

Applied to files:

  • src/deepgram/listen/v1/client.py
  • src/deepgram/listen/v1/raw_client.py
📚 Learning: 2024-10-09T02:19:46.087Z
Learnt from: dvonthenen
Repo: deepgram/deepgram-python-sdk PR: 426
File: deepgram/clients/listen/v1/rest/options.py:12-12
Timestamp: 2024-10-09T02:19:46.087Z
Learning: Unused imports in `deepgram/clients/listen/v1/rest/options.py` are retained to maintain backwards compatibility.

Applied to files:

  • src/deepgram/listen/v1/client.py
📚 Learning: 2024-07-01T19:21:39.778Z
Learnt from: dvonthenen
Repo: deepgram/deepgram-python-sdk PR: 426
File: deepgram/clients/listen/v1/websocket/__init__.py:8-8
Timestamp: 2024-07-01T19:21:39.778Z
Learning: Unused imports in `deepgram/clients/listen/v1/websocket/__init__.py` are retained to maintain backward compatibility and should not be flagged for removal in reviews.

Applied to files:

  • src/deepgram/listen/v1/raw_client.py
📚 Learning: 2025-06-22T17:02:32.416Z
Learnt from: lukeocodes
Repo: deepgram/deepgram-python-sdk PR: 543
File: tests/unit_test/test_unit_authentication.py:67-87
Timestamp: 2025-06-22T17:02:32.416Z
Learning: In the Deepgram Python SDK, DeepgramClient("arbitrary-string") should be treated as an API key for backward compatibility with existing code patterns.

Applied to files:

  • src/deepgram/listen/v2/raw_client.py

