Commit f08ebad

Author: Seelam Balaji Nikitha
Added transformation function with unit test cases
1 parent 6504148 commit f08ebad

11 files changed: +802 -0 lines changed

.DS_Store

6 KB
Binary file not shown.

.github/copilot-instructions.md

Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
# Airbyte Python CDK AI Development Guide

This guide provides essential context for AI agents working with the Airbyte Python CDK codebase.

## Project Overview

The Airbyte Python CDK is a framework for building Source Connectors for the Airbyte data integration platform. It provides components for:

- HTTP API connectors (REST, GraphQL)
- Declarative connectors using manifest files
- File-based source connectors
- Vector database destinations
- Concurrent data fetching

## Key Architectural Concepts

### Core Components

- **Source Classes**: Implement the `Source` interface in `airbyte_cdk.sources.source`. Base implementations include:
  - `AbstractSource` - Base class for Python sources
  - `DeclarativeSource` - For low-code connectors defined via manifest files
  - `ConcurrentSource` - For high-throughput parallel data fetching

- **Streams**: Core abstraction for data sources (`airbyte_cdk.sources.streams.Stream`). Key types:
  - `HttpStream` - Base class for HTTP API streams
  - `DefaultStream` - Used with declarative sources
  - Concurrent streams in `airbyte_cdk.sources.streams.concurrent`

### Data Flow
1. Sources expose one or more Stream implementations
2. Streams define schema, state management, and record extraction
3. Records flow through the Airbyte protocol via standardized message types
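
The three steps above can be sketched with plain Python stand-ins. This is a hypothetical illustration that does not use the CDK's real classes: `ToyStream`, `ToySource`, `read`, and the dict-shaped messages are assumptions made for the example. Real connectors emit the protocol models from `airbyte_cdk.models`.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


class ToyStream:
    """Hypothetical stand-in for a stream: a name, a schema, and records."""

    def __init__(self, name: str, rows: List[Dict[str, Any]]) -> None:
        self.name = name
        self._rows = rows

    def get_json_schema(self) -> Dict[str, Any]:
        # Step 2: a stream describes the shape of its records.
        return {"type": "object", "properties": {"id": {"type": "integer"}}}

    def read_records(self) -> Iterable[Dict[str, Any]]:
        # Step 2: a stream extracts records (here, from memory).
        yield from self._rows


class ToySource:
    """Hypothetical stand-in for a source exposing one or more streams."""

    def streams(self) -> List[ToyStream]:
        # Step 1: a source exposes its streams.
        return [ToyStream("users", [{"id": 1}, {"id": 2}])]


def read(source: ToySource) -> Iterator[Dict[str, Any]]:
    # Step 3: wrap every record in a typed message envelope.
    for stream in source.streams():
        for record in stream.read_records():
            yield {"type": "RECORD", "record": {"stream": stream.name, "data": record}}


messages = list(read(ToySource()))
for message in messages:
    print(json.dumps(message))
```

Each printed line is one JSON message, which is the general shape in which records leave a connector.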

## Development Conventions

### Testing Patterns

- Unit tests use pytest with the scenarios pattern (`unit_tests/sources/**/test_*.py`)
- Mock HTTP responses with `HttpMocker` and response builders
- Standard test suite base classes live in `airbyte_cdk.test.standard_tests`
- Use `@pytest.mark.parametrize` for test variations

### Source Implementation

- Prefer declarative manifests using `SourceDeclarativeManifest` for simple API connectors
- Extend base classes for custom logic:
  ```python
  import logging
  from typing import Any, List, Mapping, Optional, Tuple

  from airbyte_cdk.sources import AbstractSource
  from airbyte_cdk.sources.streams import Stream

  class MySource(AbstractSource):
      def check_connection(self, logger: logging.Logger, config: Mapping[str, Any]) -> Tuple[bool, Optional[Any]]:
          # Verify credentials/connectivity; return (True, None) on success
          return True, None

      def streams(self, config: Mapping[str, Any]) -> List[Stream]:
          # MyStream is a placeholder for your own Stream subclass
          return [MyStream(config)]
  ```

### State Management

- Use `ConnectorStateManager` for handling incremental sync state
- Implement cursor fields in streams for incremental syncs
- State is persisted as JSON-serializable objects
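
As a rough illustration of the cursor idea, the sketch below filters records by a cursor field and produces a JSON-serializable state object. It deliberately avoids `ConnectorStateManager`: the function `read_incremental`, the `updated_at` cursor field, and the state shape are assumptions for the example, not the CDK's API.

```python
import json
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]
State = Dict[str, str]


def read_incremental(
    records: List[Record], state: State, cursor_field: str = "updated_at"
) -> Tuple[List[Record], State]:
    """Return records newer than the saved cursor, plus the updated state."""
    last_seen = state.get(cursor_field, "")
    # ISO-8601 date strings in the same format sort chronologically as text.
    fresh = [r for r in records if r[cursor_field] > last_seen]
    new_cursor = max([last_seen] + [r[cursor_field] for r in fresh])
    return fresh, ({cursor_field: new_cursor} if new_cursor else {})


rows = [
    {"id": 1, "updated_at": "2023-01-10"},
    {"id": 2, "updated_at": "2023-01-15"},
    {"id": 3, "updated_at": "2023-01-20"},
]

first_batch, state = read_incremental(rows, state={})
# Round-trip through JSON to show that the state can be persisted as-is.
state = json.loads(json.dumps(state))
second_batch, state_after = read_incremental(rows, state)
```

The first call returns every row because no cursor is saved yet. The second call returns nothing, since all rows are at or before the saved cursor.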

## Common Workflows

### Building a New Connector

1. Start with the [Connector Builder UI](https://docs.airbyte.com/connector-development/connector-builder-ui/overview)
2. For complex cases, use the low-code CDK with manifest files
3. Write a custom Python implementation only when necessary

### Testing

```bash
pytest unit_tests/  # Run all tests
pytest unit_tests/sources/my_connector/  # Test specific connector
```

### Dependencies

- Manage with Poetry (`pyproject.toml`)
- Core requirements locked in `poetry.lock`
- Optional features via extras in `pyproject.toml`

## Integration Points

- Airbyte Protocol: Messages must conform to protocol models in `airbyte_cdk.models`
- External APIs: Use `HttpStream` with proper rate limiting
- Vector DBs: Implement destination logic using `destinations.vector_db_based`

## Key Files

- `airbyte_cdk/sources/abstract_source.py`: Base source implementation
- `airbyte_cdk/sources/streams/http/http.py`: HTTP stream base class
- `airbyte_cdk/sources/declarative/`: Low-code CDK components
- `unit_tests/sources/`: Test examples and patterns
Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
"""Unit tests for cleaning transforms."""
import pytest
from airbyte_cdk.utils.transforms.cleaning import (
    to_lower,
    strip_whitespace,
    squash_whitespace,
    normalize_unicode,
    remove_punctuation,
    map_values,
    cast_numeric,
)

def test_to_lower():
    """Test string lowercasing function."""
    # Test normal cases
    assert to_lower("Hello") == "hello"
    assert to_lower("HELLO") == "hello"
    assert to_lower("HeLLo") == "hello"

    # Test with spaces and special characters
    assert to_lower("Hello World!") == "hello world!"
    assert to_lower("Hello123") == "hello123"

    # Test empty and None
    assert to_lower("") == ""
    assert to_lower(None) is None

def test_strip_whitespace():
    """Test whitespace stripping function."""
    # Test normal cases
    assert strip_whitespace(" hello ") == "hello"
    assert strip_whitespace("hello") == "hello"

    # Test with tabs and newlines
    assert strip_whitespace("\thello\n") == "hello"
    assert strip_whitespace(" hello\n world ") == "hello\n world"

    # Test empty and None
    assert strip_whitespace(" ") == ""
    assert strip_whitespace("") == ""
    assert strip_whitespace(None) is None

def test_squash_whitespace():
    """Test whitespace squashing function."""
    # Test normal cases
    assert squash_whitespace("hello world") == "hello world"
    assert squash_whitespace(" hello world ") == "hello world"

    # Test with tabs and newlines
    assert squash_whitespace("hello\n\nworld") == "hello world"
    assert squash_whitespace("hello\t\tworld") == "hello world"
    assert squash_whitespace("\n hello \t world \n") == "hello world"

    # Test empty and None
    assert squash_whitespace(" ") == ""
    assert squash_whitespace("") == ""
    assert squash_whitespace(None) is None

def test_normalize_unicode():
    """Test unicode normalization function."""
    # Test normal cases
    assert normalize_unicode("hello") == "hello"

    # Test composed characters
    assert normalize_unicode("café") == "café"  # Composed 'é'

    # Test decomposed characters
    decomposed = "cafe\u0301"  # 'e' with combining acute accent
    assert normalize_unicode(decomposed) == "café"  # Should normalize to composed form

    # Test different normalization forms
    assert normalize_unicode("café", form="NFD") != normalize_unicode("café", form="NFC")

    # Test empty and None
    assert normalize_unicode("") == ""
    assert normalize_unicode(None) is None

def test_remove_punctuation():
    """Test punctuation removal function."""
    # Test normal cases
    assert remove_punctuation("hello, world!") == "hello world"
    assert remove_punctuation("hello.world") == "helloworld"

    # Test with multiple punctuation marks
    assert remove_punctuation("hello!!! world???") == "hello world"
    assert remove_punctuation("hello@#$%world") == "helloworld"

    # Test with unicode punctuation
    assert remove_punctuation("hello—world") == "helloworld"
    assert remove_punctuation("«hello»") == "hello"

    # Test empty and None
    assert remove_punctuation("") == ""
    assert remove_punctuation(None) is None

def test_map_values():
    """Test value mapping function."""
    mapping = {"a": 1, "b": 2, "c": 3}

    # Test normal cases
    assert map_values("a", mapping) == 1
    assert map_values("b", mapping) == 2

    # Test with default value
    assert map_values("x", mapping) is None
    assert map_values("x", mapping, default=0) == 0

    # Test with different value types
    mixed_mapping = {1: "one", "two": 2, None: "null"}
    assert map_values(1, mixed_mapping) == "one"
    assert map_values(None, mixed_mapping) == "null"

def test_cast_numeric():
    """Test numeric casting function."""
    # Test successful casts
    assert cast_numeric("123") == 123
    assert cast_numeric("123.45") == 123.45
    assert cast_numeric(123) == 123
    assert cast_numeric(123.45) == 123.45

    # Test integers vs floats
    assert isinstance(cast_numeric("123"), int)
    assert isinstance(cast_numeric("123.45"), float)

    # Test empty values
    assert cast_numeric(None) is None
    assert cast_numeric("", on_error="none") is None  # Need to specify on_error="none" to get None for empty string
    assert cast_numeric(" ", on_error="none") is None  # Need to specify on_error="none" to get None for whitespace

    # Test empty values with default behavior (on_error="ignore")
    assert cast_numeric("") == ""
    assert cast_numeric(" ") == " "

    # Test error handling modes
    non_numeric = "abc"
    assert cast_numeric(non_numeric, on_error="ignore") == non_numeric
    assert cast_numeric(non_numeric, on_error="none") is None
    assert cast_numeric(non_numeric, on_error="default", default=0) == 0

    # Test error raising
    with pytest.raises(Exception):
        cast_numeric(non_numeric, on_error="raise")
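
The transform module itself is not shown in this part of the diff. As a reading aid, here is a minimal sketch of `squash_whitespace`, `normalize_unicode`, `remove_punctuation`, and `cast_numeric` that would satisfy the assertions above. The function bodies, the `NFC` default, the choice to strip symbol characters as well as punctuation, and the `ValueError` raised for `on_error="raise"` are all assumptions inferred from the tests, not the committed implementation.

```python
import re
import unicodedata
from typing import Any, Optional


def squash_whitespace(s: Optional[str]) -> Optional[str]:
    """Collapse runs of whitespace into one space and trim the ends."""
    if s is None:
        return None
    return re.sub(r"\s+", " ", s).strip()


def normalize_unicode(s: Optional[str], form: str = "NFC") -> Optional[str]:
    """Normalize to the given Unicode form (the NFC default is an assumption)."""
    if s is None:
        return None
    return unicodedata.normalize(form, s)


def remove_punctuation(s: Optional[str]) -> Optional[str]:
    """Drop characters whose Unicode category is punctuation (P*) or symbol (S*)."""
    if s is None:
        return None
    return "".join(ch for ch in s if unicodedata.category(ch)[0] not in ("P", "S"))


def cast_numeric(value: Any, on_error: str = "ignore", default: Any = None) -> Any:
    """Cast to int or float; on failure, follow the on_error mode."""
    if value is None:
        return None
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value
    try:
        text = str(value).strip()
        try:
            return int(text)
        except ValueError:
            return float(text)
    except ValueError:
        if on_error == "raise":
            raise
        if on_error == "none":
            return None
        if on_error == "default":
            return default
        return value  # "ignore": hand back the input unchanged
```

Symbols are included in `remove_punctuation` because the test input `"hello@#$%world"` contains `$`, which Unicode classifies as a currency symbol rather than punctuation.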
Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
"""Unit tests for date transforms."""
from datetime import datetime

from airbyte_cdk.utils.transforms.date import (
    try_parse_date,
    extract_date_parts,
    floor_to_month,
    ceil_to_month,
)

def test_try_parse_date():
    """Test date parsing function."""
    # Test with datetime object
    dt = datetime(2023, 1, 15)
    assert try_parse_date(dt) == dt

    # Test with non-date object
    assert try_parse_date("2023-01-15") is None
    assert try_parse_date(123) is None
    assert try_parse_date(None) is None

def test_extract_date_parts():
    """Test date parts extraction function."""
    # Test with valid datetime
    dt = datetime(2023, 1, 15)  # Sunday
    parts = extract_date_parts(dt)
    assert parts["year"] == 2023
    assert parts["month"] == 1
    assert parts["day"] == 15
    assert parts["dow"] == 6  # Sunday is 6

    # Test with invalid input
    parts = extract_date_parts(None)
    assert all(v is None for v in parts.values())

    parts = extract_date_parts("not a date")
    assert all(v is None for v in parts.values())

def test_floor_to_month():
    """Test floor to month function."""
    # Test normal cases
    dt = datetime(2023, 1, 15)
    assert floor_to_month(dt) == datetime(2023, 1, 1)

    dt = datetime(2023, 12, 31)
    assert floor_to_month(dt) == datetime(2023, 12, 1)

    # Test first day of month
    dt = datetime(2023, 1, 1)
    assert floor_to_month(dt) == dt

    # Test with invalid input
    assert floor_to_month(None) is None
    assert floor_to_month("not a date") is None

def test_ceil_to_month():
    """Test ceil to month function."""
    # Test normal cases
    dt = datetime(2023, 1, 15)
    assert ceil_to_month(dt) == datetime(2023, 2, 1)

    # Test end of year
    dt = datetime(2023, 12, 15)
    assert ceil_to_month(dt) == datetime(2024, 1, 1)

    # Test first day of month
    dt = datetime(2023, 1, 1)
    assert ceil_to_month(dt) == datetime(2023, 2, 1)

    # Test with invalid input
    assert ceil_to_month(None) is None
    assert ceil_to_month("not a date") is None
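
The date helpers under test are likewise not visible here. The sketch below shows one way the four functions could behave so that the assertions above hold. It is inferred from the tests only: the bodies, the zeroing of the time fields, and the key set returned for invalid input are assumptions, not the committed code. Two behaviors the tests do pin down are that `try_parse_date` returns `None` for strings, and that `ceil_to_month` on the first day of a month still moves to the next month.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def try_parse_date(value: Any) -> Optional[datetime]:
    """Return value if it is already a datetime, else None (strings are not parsed)."""
    return value if isinstance(value, datetime) else None


def extract_date_parts(value: Any) -> Dict[str, Optional[int]]:
    """Split a datetime into year, month, day, and day of week."""
    dt = try_parse_date(value)
    if dt is None:
        return {"year": None, "month": None, "day": None, "dow": None}
    # datetime.weekday() counts Monday as 0, so Sunday is 6.
    return {"year": dt.year, "month": dt.month, "day": dt.day, "dow": dt.weekday()}


def floor_to_month(value: Any) -> Optional[datetime]:
    """Return midnight on the first day of the datetime's month."""
    dt = try_parse_date(value)
    if dt is None:
        return None
    return dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def ceil_to_month(value: Any) -> Optional[datetime]:
    """Return midnight on the first day of the following month."""
    start = floor_to_month(value)
    if start is None:
        return None
    # Roll December over into January of the following year.
    if start.month == 12:
        return start.replace(year=start.year + 1, month=1)
    return start.replace(month=start.month + 1)
```

Building `ceil_to_month` on top of `floor_to_month` keeps the day fixed at 1, so incrementing the month can never produce an invalid date such as February 30.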
