complete implementation of open ai text embedding with test #new #34700
jrmccluskey merged 28 commits into apache:master from
Conversation
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment `assign set of reviewers`
Assigning reviewers. If you would like to opt out of this review, comment
R: @jrmccluskey for label python. Available commands:
The PR bot will only process comments in the main thread (not review comments).
Hey there! I'll try to get a thorough review pass on your PR this afternoon; however, at a quick glance, this seems like a good candidate for inheriting from the new RemoteModelHandler base class I introduced a few weeks ago. Would you be interested in tweaking your implementation to use this class? It'll streamline your code since it handles the client-side throttling work in the parent class.
Let me check first.
jrmccluskey left a comment:
Very good starting point for this, just needs some polish and a few additions.
  def load_model(self):
    # Create the client just before it's needed during pipeline execution
    if self.api_key:
      client = open_ai.OpenAI(
This import is missing. The package is also `openai`, based on the official library docs: https://platform.openai.com/docs/libraries?language=python

Same as below, it is fixed; it will be shown in an upcoming commit.
    boolean indication whether or not the exception is a Server Error (5xx) or
    a RateLimitError (429) error.
    """
    return isinstance(exception, (RateLimitError, APIError))
Need an import for these exceptions.

For some unknown reason some of the imports are missing; I just added those imports. They will be shown in an upcoming commit.
sdks/python/apache_beam/ml/transforms/embeddings/open_ai_test.py
Outdated
MLTransformOutputT = TypeVar('MLTransformOutputT')

# Default batch size for OpenAI calls
_BATCH_SIZE = 20  # OpenAI can handle larger batches than Vertex
This could likely be handled in the model handler instead of being hard-coded here
@jrmccluskey I don't know why some of the import statements are missing. Is it because of the formatting?
@jrmccluskey
For the linting you should be able to
The formatting check can catch other things, but the failure in this case is just yapf again.
@jrmccluskey I'm trying to fix the linting error, but I'm not having any luck.
@jrmccluskey The lint error is fixed, but now there's an error in the Prism runner that's unrelated to OpenAI. What should I do next? This error was not occurring before.
Hi @jrmccluskey,

Hi @jrmccluskey,
You should be able to ignore the Prism and YAML failures; those are generally flaky and not impacted by anything here. The EmbeddingsManager class is effectively a composite PTransform that produces a RunInference transform with a more traditional model handler, and I cannot see a reason why that wouldn't work with the remote handler implementation. Not having a model for image embeddings is fine since you've clearly labeled the class as a text embedding model; we can always add image / multimodal implementations later as APIs become available.
@jrmccluskey All important tests are completed, thanks.
Sorry, now it is working.
Okay.

@jrmccluskey Thanks!
    organization: Optional[str] = None,
    dimensions: Optional[int] = None,
    user: Optional[str] = None,
    batch_size: Optional[int] = None,
This is misleading, since you're setting a single batch size value and then taking it as the max. I'd recommend exposing the min and max batch sizes separately (or just taking them as kwargs).
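The reviewer's suggestion could look roughly like the sketch below: separate `min_batch_size` / `max_batch_size` constructor arguments with sensible defaults, replacing the single `batch_size` that was silently treated as a maximum. The class and constant names (`EmbeddingConfigSketch`, `_DEFAULT_MAX_BATCH`) are illustrative assumptions, not the PR's actual identifiers.

```python
_DEFAULT_MIN_BATCH = 1
_DEFAULT_MAX_BATCH = 20  # the value previously hard-coded as _BATCH_SIZE


class EmbeddingConfigSketch:
  def __init__(self, min_batch_size=None, max_batch_size=None, **kwargs):
    self.min_batch_size = (
        _DEFAULT_MIN_BATCH if min_batch_size is None else min_batch_size)
    self.max_batch_size = (
        _DEFAULT_MAX_BATCH if max_batch_size is None else max_batch_size)
    if self.min_batch_size > self.max_batch_size:
      raise ValueError("min_batch_size must not exceed max_batch_size")
    # Remaining kwargs would be forwarded to the base EmbeddingsManager.
    self.kwargs = kwargs


cfg = EmbeddingConfigSketch(max_batch_size=100)
assert (cfg.min_batch_size, cfg.max_batch_size) == (1, 100)
```

Exposing both bounds keeps the caller's intent unambiguous and lets the batching layer (rather than a module-level constant) decide how to size requests.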
I'm not sure what you're asking about for enrichment. Can you clarify?
Actually, I want to ask: is there any need for a Feast IO connector?

@jrmccluskey
jrmccluskey left a comment:
Sorry about the delay getting back to this, I think you've gotten it to a good place to merge! Thank you!

My pleasure. Can I ask what else is needed in the case of OpenAI?
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
- Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
- Update `CHANGES.md` with noteworthy changes.

See the Contributor Guide for more tips on how to make the review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.