@ChristoGrab
Collaborator

@ChristoGrab ChristoGrab commented Jun 30, 2025

What

As a prerequisite for migrating Google Sheets to manifest-only, we need to add the unidecode dependency to the CDK.

Unidecode lets us transliterate Unicode text into plain ASCII characters. In other words, it converts characters with accents, diacritics, or non-Latin scripts into the closest possible ASCII representation:

"kožušček" → "kozuscek"
"北亰" → "Bei Jing"

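For illustration, here is a minimal sketch of the transliteration described above. The helper name `to_ascii` is hypothetical and not part of the CDK. It calls `unidecode` when the package is installed; the stdlib fallback is an assumption added for this sketch only, handles accented Latin letters but not scripts such as Chinese, and is not how unidecode works internally.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Transliterate text to ASCII, preferring unidecode when it is installed."""
    try:
        from unidecode import unidecode  # third-party: pip install Unidecode
    except ImportError:
        # Rough stdlib fallback (an assumption for this sketch, not unidecode's
        # algorithm): decompose accented letters, then drop anything non-ASCII.
        decomposed = unicodedata.normalize("NFKD", text)
        return decomposed.encode("ascii", "ignore").decode("ascii")
    return unidecode(text)


print(to_ascii("kožušček"))  # kozuscek
```
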
Summary by CodeRabbit

  • Chores
    • Added a deprecated dependency to support a specific migration, with plans for future removal.

Important

Auto-merge enabled.

This PR is set to merge automatically when all requirements are met.

@github-actions github-actions bot added the chore label Jun 30, 2025
@ChristoGrab ChristoGrab marked this pull request as ready for review June 30, 2025 19:36
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds the unidecode dependency to the Python CDK so that the Google Sheets source can transliterate Unicode into ASCII.

  • Introduces unidecode (^1.3.8) in pyproject.toml under core dependencies
  • Notes usage in the comment for the Google Sheets connector

@coderabbitai
Contributor

coderabbitai bot commented Jun 30, 2025

📝 Walkthrough

The pyproject.toml file was updated to add the unidecode dependency with version ^1.3.8 in the main [tool.poetry.dependencies] section. A comment was included indicating that this dependency is deprecated, should not be used directly, and is temporarily added to support the source-google-sheets migration, with plans to replace it by anyascii. Additionally, unidecode was added to the DEP002 unused dependencies list with the same cautionary note.
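
As a rough sketch, the change described above might look like the fragment below. The version constraint and section name `[tool.poetry.dependencies]` come from this walkthrough; the exact comment wording, the neighbouring entries, and the `[tool.deptry.per_rule_ignores]` table name for the DEP002 list are assumptions, so check the actual diff before relying on them.

```toml
[tool.poetry.dependencies]
# (other dependencies omitted)
# Deprecated: do not use directly. Added temporarily to support the
# source-google-sheets migration; to be replaced by anyascii.
unidecode = "^1.3.8"

# Assumed location of the DEP002 (unused dependency) ignore list.
[tool.deptry.per_rule_ignores]
DEP002 = [
    # (other entries omitted)
    "unidecode",  # Deprecated; only here for the source-google-sheets migration.
]
```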

Changes

File(s) | Change Summary
pyproject.toml | Added unidecode (version ^1.3.8) to main dependencies and DEP002 unused dependencies list, with deprecation notes.

Sequence Diagram(s)

Not applicable for this change, as it only involves a dependency addition.

Would you like to also add a test or documentation update so that future maintainers know why unidecode is required? Wdyt?


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dc835ef and f96781f.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • pyproject.toml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pyproject.toml
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Check: source-shopify
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)

@github-actions

github-actions bot commented Jun 30, 2025

PyTest Results (Fast)

3 685 tests  ±0   3 674 ✅ ±0   6m 15s ⏱️ -1s
    1 suites ±0      11 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 4823c6a. ± Comparison against base commit 8659a21.

♻️ This comment has been updated with latest results.

@github-actions

github-actions bot commented Jun 30, 2025

PyTest Results (Full)

3 688 tests   3 677 ✅  18m 0s ⏱️
    1 suites     11 💤
    1 files        0 ❌

Results for commit 4823c6a.

♻️ This comment has been updated with latest results.

@ChristoGrab ChristoGrab marked this pull request as draft June 30, 2025 20:12
@ChristoGrab
Collaborator Author

Probably not needed since the anyascii library has similar functionality. Leaving in draft for now but will likely close this PR.

@ChristoGrab ChristoGrab closed this Jul 1, 2025
@ChristoGrab
Collaborator Author

ChristoGrab commented Jul 3, 2025

Psych! Reopening because regression tests showed that unidecode and anyascii handle emojis differently. Since we create schema properties from sheet cell names in Google Sheets, stream properties with emojis in them could be altered, which would be a breaking change. We are going to include the dep in the CDK to unblock the migration, with a comment that it should not be used outside Google Sheets and is intended to be deprecated (this would probably make a good L3 issue).
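
To make the breaking-change concern concrete, here is a minimal sketch of how one might check which header-derived property names would change when swapping transliterators. Everything in it is hypothetical: `changed_property_names` is not a CDK function, and `drop_non_ascii` and `tag_non_ascii` are toy stand-ins, not the real behavior of unidecode or anyascii.

```python
from typing import Callable, Dict, List, Tuple


def changed_property_names(
    headers: List[str],
    old_transliterate: Callable[[str], str],
    new_transliterate: Callable[[str], str],
) -> Dict[str, Tuple[str, str]]:
    """Map each header to (old_name, new_name) when the two results differ."""
    changed = {}
    for header in headers:
        old_name = old_transliterate(header)
        new_name = new_transliterate(header)
        if old_name != new_name:
            changed[header] = (old_name, new_name)
    return changed


# Toy stand-ins for two transliterators (NOT the real library behavior).
def drop_non_ascii(text: str) -> str:
    return "".join(ch for ch in text if ord(ch) < 128)


def tag_non_ascii(text: str) -> str:
    return "".join(ch if ord(ch) < 128 else ":emoji:" for ch in text)


headers = ["name", "status ✅", "total"]
print(changed_property_names(headers, drop_non_ascii, tag_non_ascii))
```

Any header that shows up in the result would produce a different property name after the swap, which is the kind of difference the regression tests surfaced.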

@ChristoGrab ChristoGrab reopened this Jul 3, 2025
@ChristoGrab ChristoGrab marked this pull request as ready for review July 3, 2025 17:20
@ChristoGrab
Collaborator Author

ChristoGrab commented Jul 3, 2025

Just noting that the amplitude standard test failure is unrelated:

AirbyteErrorTraceMessage(message='During the sync, the following streams did not sync successfully: average_session_length: MessageRepresentationAirbyteTracedErrors("\'GET\' request to \'https://amplitude.com/api/2/sessions/average?start=20240701&end=20240715\' failed with status code \'403\' and error message: \'Date too far back\'

EDIT: Just merged master, error should be gone now as we commented out this connector pending investigation

Contributor

@aldogonzalez8 aldogonzalez8 left a comment

APPROVED

@ChristoGrab ChristoGrab enabled auto-merge (squash) July 3, 2025 17:52
@ChristoGrab ChristoGrab merged commit e24160e into main Jul 3, 2025
26 checks passed
@ChristoGrab ChristoGrab deleted the christo/unidecode-dep branch July 3, 2025 18:06