Description
Bug Description
When configuring the YouTube Data source connector with multiple channel_ids, the videos, video, and comments streams return no data. The streams only work correctly when a single channel ID is provided.
Environment
- Connector: source-youtube-data
- Connector Version: 0.0.45
- Airbyte Platform: Self-hosted (Airbyte OSS)
Steps to Reproduce
- Create a YouTube Data source with multiple channel IDs:
  {
    "credentials": { "auth_method": "api_key", "api_key": "YOUR_API_KEY" },
    "channel_ids": ["UCJr72fY4cTaNZv7WPbvjaSw", "UC8lxnUR_CzruT2KA6cb7p0Q"]
  }
- Run a sync
- Observe that the videos and video streams return 0 records
Expected Behavior
All configured channels should have their videos discovered and synced.
Actual Behavior
- With 1 channel: the videos and video streams work correctly
- With 2+ channels: the videos and video streams return no data
Root Cause Analysis
Looking at the manifest.yaml, the videos stream passes the entire channel_ids array to the YouTube Search API's channelId parameter:
videos:
  type: DeclarativeStream
  name: videos
  retriever:
    requester:
      path: search
      request_parameters:
        channelId: "{{ config.channel_ids }}" # BUG: passes array
However, the YouTube Data API v3 Search endpoint only accepts a single channelId value, not an array. When an array is passed, the API likely rejects or ignores the invalid input.
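To make the failure mode concrete, here is a minimal Python sketch of what the query string could look like when a whole list is interpolated into one parameter. It assumes the template renders the list through its string form, which has not been verified against the connector; `build_query` is a hypothetical helper and not part of the Airbyte CDK.

```python
from urllib.parse import urlencode

channel_ids = ["UCJr72fY4cTaNZv7WPbvjaSw", "UC8lxnUR_CzruT2KA6cb7p0Q"]


def build_query(channel_id_value):
    # Hypothetical helper: builds the query string sent to the search endpoint.
    # str() mimics the ASSUMED template behavior of rendering the value as text.
    return urlencode({"type": "video", "channelId": str(channel_id_value)})


# Whole list interpolated as a single value (assumed behavior of the buggy manifest).
broken_query = build_query(channel_ids)

# One query per channel, which is what a partition router would produce.
per_channel_queries = [build_query(cid) for cid in channel_ids]

print(broken_query)
for query in per_channel_queries:
    print(query)
```

Under that assumption, the broken query carries a URL-encoded Python-style list literal as the channelId, which does not match any real channel, while each per-channel query carries exactly one ID.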
This is in contrast to the channels stream which correctly uses a ListPartitionRouter to iterate over each channel ID:
channels:
  retriever:
    partition_router:
      type: ListPartitionRouter
      values: "{{ config.channel_ids }}"
      request_option:
        field_name: id
Suggested Fix
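Add a ListPartitionRouter to the videos stream (and to the channel_comments stream, which has the same issue), and drop channelId from request_parameters so that the router injects one channel ID per request: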
videos:
  type: DeclarativeStream
  name: videos
  retriever:
    type: SimpleRetriever
    requester:
      $ref: "#/definitions/base_requester"
      path: search
      http_method: GET
      request_parameters:
        type: video
    record_selector:
      type: RecordSelector
      extractor:
        type: DpathExtractor
        field_path:
          - items
          - "*"
          - id
    paginator:
      type: DefaultPaginator
      page_token_option:
        type: RequestOption
        inject_into: request_parameter
        field_name: pageToken
      page_size_option:
        type: RequestOption
        inject_into: request_parameter
        field_name: maxResults
      pagination_strategy:
        type: CursorPagination
        page_size: 50
        cursor_value: "{{ response.nextPageToken }}"
        stop_condition: "{{ not response.get('nextPageToken') }}"
    partition_router:
      type: ListPartitionRouter
      values: "{{ config.channel_ids }}"
      cursor_field: channel_id
      request_option:
        type: RequestOption
        inject_into: request_parameter
        field_name: channelId
Affected Streams
- videos - passes array to channelId parameter
- video - depends on videos as parent stream, so also broken
- comments - depends on videos as parent stream, so also broken
- channel_comments - same bug pattern with allThreadsRelatedToChannelId
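The per-channel iteration that the suggested fix relies on can be sketched in plain Python. This is only an illustration of the idea (one partition per channel ID, cursor pagination inside each partition), not the Airbyte CDK implementation; `fake_search`, `read_videos`, and the data in `FAKE_PAGES` are hypothetical stand-ins.

```python
from typing import Callable, Iterator, List

# Fake paginated responses keyed by (channel_id, page_token); illustrative data only.
FAKE_PAGES = {
    ("chan_a", None): {"items": [{"id": "a1"}, {"id": "a2"}], "nextPageToken": "p2"},
    ("chan_a", "p2"): {"items": [{"id": "a3"}]},
    ("chan_b", None): {"items": [{"id": "b1"}]},
}


def fake_search(params: dict) -> dict:
    """Stand-in for the search endpoint; unknown keys yield an empty page."""
    key = (params["channelId"], params.get("pageToken"))
    return FAKE_PAGES.get(key, {"items": []})


def read_videos(channel_ids: List[str], search: Callable[[dict], dict]) -> Iterator[dict]:
    # Outer loop: one partition per channel ID, so channelId is always a single value.
    for channel_id in channel_ids:
        page_token = None
        while True:
            params = {"type": "video", "channelId": channel_id, "maxResults": 50}
            if page_token:
                params["pageToken"] = page_token
            response = search(params)
            for item in response.get("items", []):
                # Tag each record with the partition it came from.
                yield {"channel_id": channel_id, **item}
            # Stop when the response carries no nextPageToken.
            page_token = response.get("nextPageToken")
            if not page_token:
                break


records = list(read_videos(["chan_a", "chan_b"], fake_search))
for record in records:
    print(record)
```

Because the pagination state is reset for every channel, each partition is read to completion before the next one starts, and the child streams (video, comments) would then receive video IDs from all configured channels.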
Workaround
Create separate Airbyte sources for each YouTube channel instead of using a single source with multiple channel IDs.
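For anyone applying the workaround to many channels, a small script can derive the single-channel configurations from the original one. This is a sketch only: `split_by_channel` is a hypothetical helper, and how the resulting configs are loaded into Airbyte (UI, API, or otherwise) is left open.

```python
import copy
import json

multi_channel_config = {
    "credentials": {"auth_method": "api_key", "api_key": "YOUR_API_KEY"},
    "channel_ids": ["UCJr72fY4cTaNZv7WPbvjaSw", "UC8lxnUR_CzruT2KA6cb7p0Q"],
}


def split_by_channel(config: dict) -> list:
    """Return one config per channel ID, leaving the input config untouched."""
    configs = []
    for channel_id in config["channel_ids"]:
        single = copy.deepcopy(config)
        single["channel_ids"] = [channel_id]
        configs.append(single)
    return configs


single_channel_configs = split_by_channel(multi_channel_config)
for cfg in single_channel_configs:
    print(json.dumps(cfg))
```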
Additional Context
- YouTube Data API Search documentation
- StackOverflow discussion confirming channelId only accepts single value
Internal Tracking: https://github.com/airbytehq/oncall/issues/11141