Commit 73290de

feat: add spike investigation for StreamThreadException in Bing Ads
- Document root cause analysis of UTF-8 decoding error with GZIP data
- Identify issue in CompositeRawDecoder parser selection logic
- Outline investigation areas and proposed fixes for concurrent source framework
- Reference issue #8301 with campaign_labels stream error

Co-Authored-By: unknown <>
1 parent 1c9049a commit 73290de

1 file changed: +68 -0 lines changed

SPIKE_INVESTIGATION.md

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
# Spike Investigation: StreamThreadException in Bing Ads Source

## Issue Summary
- **Issue**: [#8301](https://github.com/airbytehq/oncall/issues/8301) - StreamThreadException in Bing Ads source
- **Error**: `'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte`
- **Stream**: `campaign_labels`
- **Root Cause**: GZIP-compressed data being treated as UTF-8 text

## Analysis

### Error Context
From Christo's clarification in the issue:
```
Exception while syncing stream campaign_labels: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
```

The byte `0x8b` at position 1 is the second byte of the GZIP magic number (`0x1f 0x8b`), which indicates that compressed data is being passed to a UTF-8 decoder. The leading `0x1f` is a valid single-byte UTF-8 character, so decoding only fails at the second byte.

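The failure mode can be reproduced in isolation with the standard library. The sketch below is not connector code, and the sample payload is invented for illustration. It compresses a small CSV string and then decodes the raw bytes as UTF-8 without decompressing them first.

```python
import gzip

# Invented sample payload; real bulk reports are much larger.
payload = "Type,Status,Id\nCampaign Label,Active,1\n".encode("utf-8")
compressed = gzip.compress(payload)

# Every GZIP stream starts with the two magic bytes 0x1f 0x8b.
magic = compressed[:2]
print(magic.hex())

try:
    # Decoding the compressed bytes directly mimics the suspected bug.
    compressed.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)
```

The exception reports the offending byte and its offset, which matches the byte and position quoted in the issue.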
### Technical Investigation

#### 1. Bing Ads Connector Configuration
- Uses `GzipDecoder` with `CsvDecoder` for bulk streams
- Encoding: `utf-8-sig`
- Stream: `campaign_labels` with `DownloadEntities: ["CampaignLabels"]`

#### 2. Concurrent Source Framework
- `StreamThreadException` wraps exceptions from concurrent processing
- `CompositeRawDecoder` handles response decoding with multiple parsers
- `GzipParser` decompresses GZIP data before passing to inner parsers

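The intended order of operations (decompress, then decode as `utf-8-sig`, then parse CSV) can be sketched with the standard library alone. This is an illustration of the layering, not the CDK's implementation: `parse_gzipped_csv` and the sample data are hypothetical names introduced here.

```python
import csv
import gzip
import io
from typing import Dict, Iterator


def parse_gzipped_csv(raw: bytes, encoding: str = "utf-8-sig") -> Iterator[Dict[str, str]]:
    """Hypothetical helper: decompress first, then decode text, then parse rows."""
    with gzip.GzipFile(fileobj=io.BytesIO(raw)) as decompressed:
        # The text layer only ever sees decompressed bytes.
        text = io.TextIOWrapper(decompressed, encoding=encoding, newline="")
        yield from csv.DictReader(text)


# Invented sample with a leading byte order mark, which utf-8-sig strips.
sample = gzip.compress("\ufeffType,Id\nCampaign Label,1\n".encode("utf-8"))
rows = list(parse_gzipped_csv(sample))
print(rows)
```

If the outer decompression step is skipped, the text layer receives the compressed bytes and fails in the way described in the issue.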
#### 3. Root Cause Analysis
The issue occurs in the concurrent source framework when:
1. A GZIP-compressed response is received
2. The parser selection logic fails to detect the GZIP content encoding
3. The compressed data (starting with `0x1f 0x8b`) is passed directly to the UTF-8 decoder
4. The UTF-8 decoder fails with the observed error
5. The exception is wrapped in `StreamThreadException`

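The five steps above can be sketched as a toy model. Everything here is an assumption made for illustration: `select_parser_by_header`, `read_stream`, and `WrappedStreamError` are hypothetical stand-ins and do not reproduce the CDK's classes or its actual selection logic.

```python
import gzip
from typing import Callable, Mapping

Parser = Callable[[bytes], str]


def parse_gzip(body: bytes) -> str:
    return gzip.decompress(body).decode("utf-8-sig")


def parse_text(body: bytes) -> str:
    return body.decode("utf-8-sig")


def select_parser_by_header(headers: Mapping[str, str]) -> Parser:
    """Hypothetical selector that trusts Content-Encoding and nothing else."""
    # A plain dict is case-sensitive; real HTTP header lookups are not.
    if headers.get("Content-Encoding", "").lower() == "gzip":
        return parse_gzip
    return parse_text  # fallback when the header is missing (step 2)


class WrappedStreamError(Exception):
    """Hypothetical stand-in for the wrapping exception in step 5."""


def read_stream(headers: Mapping[str, str], body: bytes) -> str:
    try:
        return select_parser_by_header(headers)(body)
    except Exception as exc:
        raise WrappedStreamError(f"Exception while syncing stream: {exc}") from exc


body = gzip.compress(b"Type,Id\nCampaign Label,1\n")
decoded = read_stream({"Content-Encoding": "gzip"}, body)  # header present

try:
    read_stream({}, body)  # header absent: compressed bytes reach the text decoder
except WrappedStreamError as exc:
    failure = exc
    print(exc)
```

In this model the only difference between the working and failing calls is the response header, which is why the parser selection logic is the first area to examine.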
## Proposed Investigation Areas

### 1. Parser Selection Logic
- Examine the `CompositeRawDecoder._select_parser()` method
- Check header-based parser selection for GZIP content
- Investigate concurrent source integration with declarative decoders

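One direction worth evaluating is to inspect the body's leading bytes when the header does not settle the question. The sketch below is an assumption about what such a check could look like, not a description of existing CDK behavior; `select_parser_name` and `decode_body` are hypothetical names.

```python
import gzip
from typing import Mapping

GZIP_MAGIC = b"\x1f\x8b"


def select_parser_name(headers: Mapping[str, str], body: bytes) -> str:
    """Hypothetical selection: pick "gzip" only when the body carries the magic bytes."""
    # The header alone is not trusted: an HTTP client may already have
    # decompressed the body while leaving Content-Encoding in place.
    if body.startswith(GZIP_MAGIC):
        return "gzip"
    return "text"


def decode_body(headers: Mapping[str, str], body: bytes, encoding: str = "utf-8-sig") -> str:
    if select_parser_name(headers, body) == "gzip":
        body = gzip.decompress(body)
    return body.decode(encoding)


text = "Type,Id\nCampaign Label,1\n"
compressed = gzip.compress(text.encode("utf-8"))

from_sniffed = decode_body({}, compressed)
from_header = decode_body({"Content-Encoding": "gzip"}, compressed)
from_plain = decode_body({}, text.encode("utf-8"))
already_inflated = decode_body({"Content-Encoding": "gzip"}, text.encode("utf-8"))
```

Sniffing is safe for UTF-8 text because `0x8b` can never start a UTF-8 sequence, so valid text cannot begin with `0x1f 0x8b`. Whether this belongs in the decoder or in the connector manifest is a question for the investigation.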
### 2. Error Handling
- Review exception propagation in concurrent processing
- Check if GZIP decompression errors are properly handled
- Examine fallback mechanisms for parser failures

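As a reference point for the decompression check, the sketch below shows which standard-library exceptions `gzip.decompress` can raise and how they could be converted into one descriptive error. `DecompressionError` and `safe_gunzip` are hypothetical names, and the message format is only a suggestion.

```python
import gzip
import zlib


class DecompressionError(Exception):
    """Hypothetical error type that carries context about the bad payload."""


def safe_gunzip(body: bytes) -> bytes:
    try:
        return gzip.decompress(body)
    except (OSError, EOFError, zlib.error) as exc:
        # OSError covers gzip.BadGzipFile (not a GZIP stream, bad checksum),
        # EOFError covers truncated streams, zlib.error covers corrupt data.
        raise DecompressionError(
            f"could not decompress {len(body)} bytes starting with "
            f"{body[:2].hex()!r}: {exc}"
        ) from exc


good = gzip.compress(b"Type,Id\nCampaign Label,1\n")
restored = safe_gunzip(good)

errors = []
for bad in (b"Type,Id\n", good[:-4]):  # plain text, then a truncated stream
    try:
        safe_gunzip(bad)
    except DecompressionError as exc:
        errors.append(exc)
```

Including the leading bytes in the message makes it easier to tell an uncompressed payload from a damaged one when reading sync logs.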
### 3. Integration Points
- Analyze how `ConcurrentDeclarativeSource` handles bulk streams
- Check if declarative decoders are properly integrated with the concurrent framework
- Investigate state management during concurrent processing

## Next Steps

1. Create test cases to reproduce the issue
2. Implement parser selection improvements
3. Add better error handling for GZIP decompression
4. Test with the Bing Ads `campaign_labels` stream
5. Validate that the fix doesn't break other connectors

## Files to Investigate
- `airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py`
- `airbyte_cdk/sources/concurrent_source/concurrent_read_processor.py`
- `airbyte_cdk/sources/declarative/concurrent_declarative_source.py`
- Bing Ads manifest configuration for bulk streams
