[extensions][py][hf] 2/n ASR model parser impl (#780)
Conversation
Force-pushed from b156c92 to aa6bf10.
```python
if TYPE_CHECKING:
    from aiconfig import AIConfigRuntime
```
Totally separate discussion: do we have to do this?
from before, this was needed. Keeping it for now since we don't have automated tests running
extensions/HuggingFace/python/src/aiconfig_extension_hugging_face/__init__.py
...e/python/src/aiconfig_extension_hugging_face/local_inference/automatic_speech_recognition.py
Force-pushed from 27ec5ea to d14669c.
Added a couple of changes:
1.5. AIConfig for this will look slightly different. See updated testplan.

rebase, and fix init.py
```python
        f"Error: {str(e)}. Please specify a kind and value for the attachment data."
    )
    # TODO: once previous todo gets resolved, modify this line to `.kind` instead of
    # `.data.get("kind")`. It will be a pydantic base class.
    if not attachment.data.get("kind") == "file_uri":
```
We'll need to update the client rendering and prompt schema format to support this. And we'll need to be consistent with this type in the image-to-text parser as well
Can we just have this be a string and migrate to the object with kind and value afterwards? My concerns here are:
- client rendering is currently adding attachments as {data: , mime_type}
- image-to-text parser is currently not doing this {kind/value} stuff and that inconsistency is going to cause a lot more work on the client
Discussed offline with @rholinshead; reverted the change to the input data type. For ease of use we will have data as just a string and implement structured input later. Updated the testplan to show this as well.
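After the revert, `attachment.data` is a plain string (a file URI) rather than a `{kind, value}` object. A minimal sketch of the validation this implies — the class and helper names here are illustrative, not the exact merged code:

```python
from dataclasses import dataclass

# Hypothetical stand-in for the aiconfig Attachment type; field names mirror
# the snippets quoted in this PR thread.
@dataclass
class Attachment:
    data: object
    mime_type: str = "audio/mpeg"

def validate_attachment(attachment: Attachment) -> str:
    """Return the attachment's file URI, raising on the error cases discussed above."""
    if attachment.data is None:
        raise ValueError("Please specify data for the attachment.")
    if not isinstance(attachment.data, str):
        raise ValueError("Attachment data must be a plain string (file URI) for now.")
    return attachment.data
```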
```python
def _get_device(self) -> str:
    if torch.cuda.is_available():
        return "cuda"
    # MPS backend is not supported for all ASR models. Seen when spinning up a
    # default ASR pipeline, which uses facebook/wav2vec2-base-960h (55bb623).
    return "cpu"
```
```python
def get_output_text(self, response: dict[str, Any]) -> str:
    raise NotImplementedError("get_output_text is not implemented for HuggingFaceAutomaticSpeechRecognition")
```
Why are we not implementing?
added implementation
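The added implementation is collapsed in this thread; a minimal sketch, assuming the output is the ExecuteResult-shaped dict quoted later in this PR (`{"output_type": ..., "data": ...}`) — not necessarily the exact resolved code:

```python
from typing import Any

def get_output_text(response: dict[str, Any]) -> str:
    """Pull the transcription string out of an execute_result-style output."""
    if response.get("output_type") == "execute_result":
        data = response.get("data")
        if isinstance(data, str):
            return data
    return ""
```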
```python
supported_keys = {
    # inputs
    "return_timestamps",
    "generate_kwargs",
    "max_new_tokens",
}
```
There are actually way more, because we're using `generate_kwargs`, which takes in the generalized text-generation params. Pls add this to the task for unifying this in the prompt schema, and link to the task here.
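For context, these keys typically drive a filter over the prompt's model settings before they're passed to the pipeline. A sketch of that pattern — the helper name `refine_completion_params` is an assumption based on the sibling HF parsers, not code shown in this PR:

```python
from typing import Any

# Drop any model settings the ASR pipeline call would reject; only keys in
# supported_keys are forwarded. Generalized text-generation params would go
# through "generate_kwargs" instead.
def refine_completion_params(model_settings: dict[str, Any]) -> dict[str, Any]:
    supported_keys = {"return_timestamps", "generate_kwargs", "max_new_tokens"}
    return {k: v for k, v in model_settings.items() if k in supported_keys}
```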
...e/python/src/aiconfig_extension_hugging_face/local_inference/automatic_speech_recognition.py
...e/python/src/aiconfig_extension_hugging_face/local_inference/automatic_speech_recognition.py
```python
output = ExecuteResult(
    **{
        "output_type": "execute_result",
        "data": result.get("text"),
```
So the response is always guaranteed to have the text attribute? I think it would be safe just to have another line above saying `text_output: str = result.get("text") if isinstance(result, dict) and "text" in result else result`, or whatever, just to future-proof.
The problem is that without knowing what the result type is, we can't know for sure; better to be safe and ensure it can't break.
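The guarded extraction suggested above can be sketched as a small helper (the function name is illustrative; transformers' ASR pipeline normally returns a `{"text": ...}` dict, but this guards against other shapes):

```python
from typing import Any

def extract_text(result: Any) -> str:
    """Future-proof extraction of the transcription text from the pipeline result."""
    if isinstance(result, dict) and "text" in result:
        return result["text"]
    # Fall back to stringifying whatever the pipeline returned.
    return str(result)
```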
```python
device = self._get_device()
if pipeline_creation_data.get("device", None) is None:
    pipeline_creation_data["device"] = device
self.pipelines[model_name] = pipeline(task="automatic-speech-recognition", **pipeline_creation_data)
```
nit: no change, just annoying that they called this automatic-speech-recognition instead of audio-to-text like everything else they did smh
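The snippet above follows a lazy-init-and-cache pattern: build the ASR pipeline once per model name and reuse it. A testable sketch of just that pattern, with `make_pipeline` standing in for `transformers.pipeline` so no weights are downloaded (the class name is hypothetical):

```python
from typing import Any, Callable

class PipelineCache:
    """Create each automatic-speech-recognition pipeline once and cache it by model name."""

    def __init__(self, make_pipeline: Callable[..., Any]):
        self._make = make_pipeline  # stand-in for transformers.pipeline
        self.pipelines: dict[str, Any] = {}

    def get(self, model_name: str, **creation_data: Any) -> Any:
        if model_name not in self.pipelines:
            # Mirrors the _get_device() default shown above; callers can override.
            creation_data.setdefault("device", "cpu")
            self.pipelines[model_name] = self._make(
                task="automatic-speech-recognition", model=model_name, **creation_data
            )
        return self.pipelines[model_name]
```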
```python
    """


class HuggingFaceAutomaticSpeechRecognition(ParameterizedModelParser):
```
nit: add "Transformer" suffix at end to meet same format as the others
rossdanlm left a comment:
Pls link relevant comments to relevant issues so we can track them in the future.
Force-pushed from f2ee469 to edb182c.
Setting up the ASR parser class

Model Parser for the Automatic Speech Recognition task on Hugging Face. Decisions made while implementing:
- manual impl to parse input attachments
- threw exceptions on every unexpected step. Not sure if this is the direction we want to go with this.
- This diff does not implement serialize() for the model parser (will implement in a diff on top).

## Testplan
Created an mp3 file that says "hi". Used aiconfig to run ASR on it.

| <img width="922" alt="Screenshot 2024-01-09 at 7 14 47 PM" src="https://github.com/lastmile-ai/aiconfig/assets/141073967/fe68751d-e20b-41d9-9da5-cc9a32859cba"> | <img width="1461" alt="Screenshot 2024-01-09 at 5 54 33 PM" src="https://github.com/lastmile-ai/aiconfig/assets/141073967/78063a3e-2b9a-4a39-80d9-ef28a7d706cf"> |
| ------------- | ------------- |
Force-pushed from edb182c to 6dc0b11.
Rebase. Discussed with @rossdanlm offline; will fix forward any potentially remaining and pertaining issues. Tasks re: the Hugging Face model parsers have been opened.
Stack created with Sapling. Best reviewed with ReviewStack.