Conversation

@ShoaibMajidDar
Contributor

When processing documents in Arabic, the expected Arabic text was not returned correctly.
I added an encoder argument to the whisper function in LLMWhisperClient. Interpreting the response as UTF-8 resolved the issue and handled the Arabic text correctly. The encoder defaults to ISO-8859-1 but can now be adjusted as needed.
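For illustration, the symptom can be reproduced without the client at all. This is a minimal sketch, not the code from this PR: `decode_body` is a hypothetical helper and the sample string is made up. It only shows why UTF-8 bytes read as ISO-8859-1 come out garbled, and why passing the right encoding fixes that.

```python
# Hypothetical helper for illustration only; it is not part of LLMWhisperClient.
def decode_body(raw_bytes: bytes, encoding: str = "ISO-8859-1") -> str:
    """Decode raw response bytes using the given encoding."""
    return raw_bytes.decode(encoding)


arabic = "مرحبا بالعالم"          # made-up sample text
raw = arabic.encode("utf-8")      # bytes as a UTF-8 server would send them

garbled = decode_body(raw)                    # default ISO-8859-1: mojibake
correct = decode_body(raw, encoding="utf-8")  # explicit UTF-8: original text

print(garbled == arabic)  # False
print(correct == arabic)  # True
```

Because ISO-8859-1 maps every byte to a character, the wrong decode never raises an error; it silently produces the wrong text, which is why the problem only shows up for non-Latin scripts.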

@hari-kuriakose
Contributor

@ShoaibMajidDar Thanks for the contribution!
This would really help. Just a couple of minor suggestions, and the rest looks good.

Contributor

@jaseemjaskp left a comment

LGTM

@chandrasekharan-zipstack
Collaborator

Ideally, we need to forward the encoding in the request headers, so that it is understood by LLMWhisperer itself and is subsequently handled by the requests library. The current approach handles UTF-8 correctly, which should cover most use cases. Any requirement to support other encoding schemes will be properly tackled in the client and server in the future.
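As a rough sketch of the header-driven approach, assuming the server were to declare a `charset` parameter in a `Content-Type` header: the client could read the encoding from the header instead of taking it as an argument. `charset_from_content_type` is a hypothetical helper, and nothing here describes how LLMWhisperer currently behaves.

```python
from email.message import Message


# Hypothetical helper: pick the charset declared in a Content-Type header,
# falling back to a default when none is declared.
def charset_from_content_type(content_type: str, default: str = "ISO-8859-1") -> str:
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset in lower case, or failobj.
    return msg.get_content_charset(failobj=default)


raw = "مرحبا".encode("utf-8")  # made-up sample payload

declared = charset_from_content_type("text/plain; charset=UTF-8")
fallback = charset_from_content_type("text/plain")

text = raw.decode(declared)
```

With the charset declared by the server, the caller no longer has to know the encoding in advance; the explicit argument then only matters as an override.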

@hari-kuriakose
Contributor

@chandrasekharan-zipstack Agree, let's take up the improvements as required later.

@hari-kuriakose hari-kuriakose merged commit 2929529 into Zipstack:main Oct 30, 2024
1 check failed

4 participants