          💅 Update text2vec-azure-openai to utilize isAzure: true flag and mark resourceName + deploymentId as optional
          #196
1952e48 to 6b07490
    …rk `resourceName` + `deploymentId` as optional This relates to the changes in weaviate/weaviate#5776
6b07490 to 9184e62
Great to see you again! Thanks for the contribution.
Thanks for the contribution! Left a few comments, mainly around housekeeping; otherwise the PR looks great 😁
/** Will automatically be set to true. You don't need to set this manually. */
isAzure?: true;
If this is always true, does it still need the ? operator? If not, can we remove it?
So the `true` is always set internally by the `text2VecAzureOpenAI` function, and the user should not need to pass it. But due to how the types are structured in general, I could not easily 1) remove it completely from the config object, nor 2) remove the optional operator, since then the user would be required to supply `isAzure: true` manually 🤔
Ahhh, okay I see now. I think this makes sense if the user makes use of the .azureOpenAI method but part of the API is still to allow users to work with the raw types if they so wish. As such, if a user did:
generative: {
  name: 'generative-openai',
  config: {
    deploymentId: config.deploymentId,
    resourceName: config.resourceName,
    baseURL: config.baseURL,
  }
}
then the type system would allow it, since isAzure is optional, yet the runtime would interpret this as isAzure: undefined, which is a falsy value.
I like the idea of introducing the isAzure flag to the TS client, but I think it may be better placed as a pure internal, i.e. not exposed in the user types, wdyt?
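To make the concern above concrete, here is a minimal sketch of the gap between the type check and the runtime value. `AzureConfigSketch` is a hypothetical, simplified stand-in and not the client's actual config type; the values are placeholders.

```typescript
// Hypothetical, simplified stand-in for the client's config type.
interface AzureConfigSketch {
  deploymentId?: string;
  resourceName?: string;
  baseURL?: string;
  isAzure?: true; // optional, so omitting it still type-checks
}

// A user working with the raw types can leave the flag out entirely.
const rawConfig: AzureConfigSketch = {
  deploymentId: 'my-deployment',
  resourceName: 'my-resource',
};

// At runtime the missing flag reads as undefined, which is falsy.
const treatedAsAzure: boolean = Boolean(rawConfig.isAzure);
console.log(rawConfig.isAzure, treatedAsAzure);
```

The compiler accepts `rawConfig` without complaint, so nothing warns the user that the Azure code path would not be selected.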
My thinking is that isAzure should be removed from the ...Config types and instead be set by the client's runtime itself, depending on the name of the module. This would most likely require adding generative-azure-openai, alongside text2vec-azure-openai, which is then parsed appropriately in the collection creation logic.
There we'd have some boolean clauses to determine whether the module is an Azure one, based on the name, and then inject isAzure: true into the config accordingly. IMO, this would be the most consistent for the client/server relationship, as I'm sure there will be future refactoring of the server that changes this behaviour. Then we'd only break the internal relationship rather than the public API.
We already do something similar here, wdyt about extending this logic as described above?
If you'd rather not then that's fine, I can add it to my backlog 😁 Also, sorry for the spaghetti of the collection.create method, I've not had the chance to refactor it into a better structure 😅
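A rough sketch of the name-based approach described in this comment, under stated assumptions: `ModuleSketch`, `AZURE_TO_SERVER_NAME`, and `withAzureFlag` are hypothetical names, and the mapping of the Azure module names onto `text2vec-openai` and `generative-openai` is assumed for illustration rather than taken from the client's collection creation code.

```typescript
// Hypothetical shape of a module entry as the user supplies it.
type ModuleSketch = { name: string; config?: Record<string, unknown> };

// Assumed mapping from client-side Azure names to the names sent to the server.
const AZURE_TO_SERVER_NAME = new Map<string, string>([
  ['text2vec-azure-openai', 'text2vec-openai'],
  ['generative-azure-openai', 'generative-openai'],
]);

// Inject isAzure: true only when the module name marks it as an Azure module.
function withAzureFlag(module: ModuleSketch): ModuleSketch {
  const serverName = AZURE_TO_SERVER_NAME.get(module.name);
  if (serverName === undefined) {
    return module; // not an Azure module, pass through unchanged
  }
  return {
    name: serverName,
    config: { ...(module.config ?? {}), isAzure: true },
  };
}

const resolved = withAzureFlag({
  name: 'generative-azure-openai',
  config: { baseURL: 'https://example.invalid' },
});
console.log(resolved);
```

With this shape the flag never appears in the public types, so a later server-side change would only touch the internal mapping.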
Got it - this definitely makes more sense 👍 I didn't look into this part.
I'm drowning a bit in other work right now, but I should be able to look into this in more detail hopefully next week or so 🙏
If I get round to it this week, I'll ping you on here to let you know. Thanks for your help so far!
…passed at all
Since the only indicator for the azureOpenAI config is now the isAzure: true flag, which is set in the vectorizer setup directly, no config object is necessary for it.

With this adjustment, devs can use the `text2VecAzureOpenAI` vectorizer without specifying `deploymentId` or `resourceName` upfront for their collection.
Instead, they may provide the headers `X-Azure-Deployment-Id` and `X-Azure-Resource-Name` in their requests to set these.
Internally, using `text2VecAzureOpenAI` will set an `isAzure: true` flag for the OpenAI vectorizer, so it understands that the Azure logic must be used.
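As an illustration of the behaviour described above, the following sketch shows a vectorizer helper called with no config, plus the two headers supplied separately. `azureVectorizerSketch` and `azureHeadersSketch` are hypothetical helpers, not the client's real API, and the `text2vec-openai` module name and the placeholder values are assumptions; only the header names and the `isAzure: true` flag come from the description.

```typescript
// Hypothetical options type: every field is optional, mirroring the PR's intent.
interface AzureVectorizerOptionsSketch {
  deploymentId?: string;
  resourceName?: string;
  baseURL?: string;
}

// Hypothetical stand-in for the vectorizer helper; the flag is set internally.
function azureVectorizerSketch(options: AzureVectorizerOptionsSketch = {}) {
  return {
    name: 'text2vec-openai', // assumed module name
    config: { ...options, isAzure: true },
  };
}

// Hypothetical helper that builds the per-request headers named in the description.
function azureHeadersSketch(deploymentId: string, resourceName: string): Record<string, string> {
  return {
    'X-Azure-Deployment-Id': deploymentId,
    'X-Azure-Resource-Name': resourceName,
  };
}

// No deploymentId or resourceName is fixed at collection setup time.
const vectorizer = azureVectorizerSketch();
// The values are provided with the requests instead.
const headers = azureHeadersSketch('my-deployment', 'my-resource');
console.log(vectorizer, headers);
```

How the headers are attached to requests depends on the client's connection options, which this sketch deliberately leaves out.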