
[extensions][py][hf] 2/n ASR model parser impl#780

Merged
Ankush-lastmile merged 2 commits into main from pr780 on Jan 10, 2024

Conversation

@Ankush-lastmile (Contributor) commented Jan 5, 2024

[extensions][py][hf] 2/n ASR model parser impl

Model Parser for the Automatic Speech Recognition task on huggingface.

Decisions made while implementing:

  • Manual implementation to parse input attachments.
    • Threw exceptions on every unexpected step; not sure if this is the direction we want to go with this.
  • This diff does not implement serialize() for the model parser (will be implemented in a diff on top).

Testplan

Created an MP3 file that says "hi" and used aiconfig to run ASR on it.

(Two screenshots: "Screenshot 2024-01-09 at 7 14 47 PM" and "Screenshot 2024-01-09 at 5 54 33 PM".)

Stack created with Sapling. Best reviewed with ReviewStack.

@Ankush-lastmile changed the title from "2/n model parser impl" to "[extensions][py][hf] 2/n ASR model parser impl" on Jan 5, 2024
@Ankush-lastmile force-pushed the pr780 branch 3 times, most recently from b156c92 to aa6bf10, on January 9, 2024 17:19
@Ankush-lastmile marked this pull request as ready for review on January 9, 2024 17:28
Comment on lines +9 to +11
```python
if TYPE_CHECKING:
    from aiconfig import AIConfigRuntime
```
Contributor:
Totally separate discussion: do we have to do this?

Contributor Author (@Ankush-lastmile):

From before, this was needed. Keeping it for now since we don't have automated tests running.
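For context, the `TYPE_CHECKING` guard discussed above is the standard way to keep a type annotation without paying for (or breaking on) the import at runtime, e.g. when it would be circular. A minimal sketch of the pattern; the `run` function and its body are illustrative, not the parser's actual code:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by static type checkers (mypy, pyright), never at runtime,
    # so a circular or heavyweight import is avoided.
    from aiconfig import AIConfigRuntime


def run(config: "AIConfigRuntime") -> str:
    # The annotation is a string, so AIConfigRuntime need not exist at runtime.
    return type(config).__name__
```

At runtime any object can be passed, since the annotation is never resolved unless a type checker is looking.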

@Ankush-lastmile force-pushed the pr780 branch 2 times, most recently from 27ec5ea to d14669c, on January 9, 2024 21:24
@Ankush-lastmile (Contributor Author) commented Jan 9, 2024

Added a couple of changes:

  1. Defined AttachmentInputDataWithStringValue. This will need to be incorporated into the AIConfig SDK, but for now we will define it inside this model parser and enforce it here. TODO: Task Support Multi Modal Input data types #829
     • The AIConfig for this will look slightly different; see the updated testplan.
  2. Updated the model parser to allow customizing pipeline creation. Note: for now this customizability is only possible the first time the pipeline is created in memory.
  3. Added refine completion params handling to the model parser.
  4. Added a check for device in config settings.
  5. Handled multiple outputs, as well as outputs with metadata.
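To make item 1 concrete, here is a hedged sketch of the shape AttachmentInputDataWithStringValue might take. The thread later notes the real type should be a pydantic base class; this sketch uses a stdlib dataclass purely to illustrate the `kind`/`value` structure, and the set of valid kinds is an assumption:

```python
from dataclasses import dataclass

# Assumed set of kinds; the actual SDK type may accept more.
VALID_KINDS = {"file_uri", "base64"}


@dataclass
class AttachmentInputDataWithStringValue:
    kind: str   # how `value` should be interpreted, e.g. "file_uri"
    value: str  # the URI or encoded payload itself

    def __post_init__(self) -> None:
        # Enforce the structure here, since the SDK does not yet define it.
        if self.kind not in VALID_KINDS:
            raise ValueError(f"Unexpected attachment kind: {self.kind!r}")
```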

@Ankush-lastmile (Contributor Author):
rebase, and fix `__init__.py`

```python
        f"Error: {str(e)}. Please specify a kind and value for the attachment data."
    )
    # TODO: once the previous TODO is resolved, change this line to `.kind` instead of `.data.get("kind")`. It will be a pydantic base class.
    if not attachment.data.get("kind") == "file_uri":
```
Member:

We'll need to update the client rendering and prompt schema format to support this. And we'll need to be consistent with this type in the image-to-text parser as well

Member:

Can we just have this be a string and migrate to the object with kind and value afterwards? My concerns here are:

  • client rendering is currently adding attachments as {data: , mime_type}
  • image-to-text parser is currently not doing this {kind/value} stuff and that inconsistency is going to cause a lot more work on the client

Contributor Author (@Ankush-lastmile):

Discussed offline with @rholinshead; reverted the change to the input data type. For ease of use we will have data as just a string and implement structured input later.

@Ankush-lastmile (Contributor Author):

Discussed offline with @rholinshead; reverted the change to the input data type. For ease of use we will have data as just a string and implement structured input later.

updated testplan to show as well

Comment on lines +117 to +121
```python
def _get_device(self) -> str:
    if torch.cuda.is_available():
        return "cuda"
    # The mps backend is not supported for all ASR models. Seen when spinning up
    # a default ASR pipeline, which uses facebook/wav2vec2-base-960h (55bb623)
    return "cpu"
```
@rossdanlm (Contributor) commented Jan 9, 2024:

Good comment!

```python
    return "cpu"

def get_output_text(self, response: dict[str, Any]) -> str:
    raise NotImplementedError("get_output_text is not implemented for HuggingFaceAutomaticSpeechRecognition")
```
Contributor:

Why are we not implementing?

Contributor Author (@Ankush-lastmile):

added implementation
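The merged implementation is not shown in this thread; as a hedged sketch, a typical `get_output_text` in the other parser impls reads the text back out of an `ExecuteResult`'s `data` field. The `ExecuteResult` below is a stand-in dataclass, not the actual aiconfig class:

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ExecuteResult:
    # Stand-in for aiconfig's ExecuteResult output type.
    output_type: str
    data: Any


def get_output_text(output: ExecuteResult) -> str:
    # Return the transcribed text if this is a normal execution result,
    # otherwise an empty string.
    if output.output_type == "execute_result" and isinstance(output.data, str):
        return output.data
    return ""
```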

Comment on lines +217 to +244
```python
supported_keys = {
    # inputs
    "return_timestamps",
    "generate_kwargs",
    "max_new_tokens",
}
```
Contributor:

There are actually way more, because we're using generate_kwargs, which takes in the generalized text generation params. Pls add this to the task for unifying this in the prompt schema, and link to the task here.
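The supported-keys filtering being reviewed here can be sketched as follows. The helper name `refine_completion_params` and the exact behavior are assumptions based on the summary above, not the parser's actual code:

```python
from typing import Any

# Keys the ASR pipeline call understands (per the snippet above); note
# generate_kwargs itself nests many more text-generation params.
SUPPORTED_KEYS = {"return_timestamps", "generate_kwargs", "max_new_tokens"}


def refine_completion_params(settings: dict[str, Any]) -> dict[str, Any]:
    # Drop unrelated settings (model name, device, etc.) before invoking
    # the pipeline, keeping only recognized completion params.
    return {k: v for k, v in settings.items() if k in SUPPORTED_KEYS}
```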

```python
output = ExecuteResult(
    **{
        "output_type": "execute_result",
        "data": result.get("text"),
```
Contributor:

So the response is always guaranteed to have the text attribute? I think it would be safer to add another line above, e.g. `text_output: str = result.get("text") if isinstance(result, dict) and "text" in result else result`, just to future-proof.

Contributor:

The problem is that without knowing what the result type is, we can't know for sure; better to be safe and ensure it can't break.
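The defensive extraction being suggested can be written as a small helper, sketched here; it accepts either a dict carrying a "text" key or a bare value, so an unexpected pipeline return type can't raise. This is an illustration of the suggestion, not the merged implementation:

```python
from typing import Any


def extract_text(result: Any) -> str:
    # Normal transformers ASR pipeline output: {"text": "..."}.
    if isinstance(result, dict) and "text" in result:
        return str(result["text"])
    # Fall back to stringifying whatever we got, rather than raising.
    return str(result)
```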

```python
device = self._get_device()
if pipeline_creation_data.get("device", None) is None:
    pipeline_creation_data["device"] = device
self.pipelines[model_name] = pipeline(task="automatic-speech-recognition", **pipeline_creation_data)
```
Contributor:

nit: no change, just annoying that they called this automatic-speech-recognition instead of audio-to-text like everything else they did smh
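The caching behavior around this snippet, where pipeline customization only takes effect on first creation, can be sketched in isolation. Names here (`get_pipeline`, `factory`) are illustrative, not the parser's actual API:

```python
from typing import Any, Callable

# Cache of pipelines keyed by model name, mirroring self.pipelines above.
_pipelines: dict[str, Any] = {}


def get_pipeline(model_name: str, factory: Callable[..., Any], **creation_kwargs: Any) -> Any:
    # creation_kwargs only take effect the first time the pipeline is built;
    # later calls return the cached instance and ignore new kwargs.
    if model_name not in _pipelines:
        _pipelines[model_name] = factory(model=model_name, **creation_kwargs)
    return _pipelines[model_name]
```

This matches the note in the change summary that customizability is only possible the first time the pipeline is created in memory.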

```python
"""


class HuggingFaceAutomaticSpeechRecognition(ParameterizedModelParser):
```
Contributor:

nit: add "Transformer" suffix at end to meet same format as the others

Contributor Author (@Ankush-lastmile):

updated

@rossdanlm (Contributor) left a comment:

pls link relevant comments to relevant issues so we can track in the future

@Ankush-lastmile (Contributor Author):

  • Added some comments.
  • Renamed HuggingFaceAutomaticSpeechRecognition to HuggingFaceAutomaticSpeechRecognitionTransformer.
  • Added get_output_text impl (copied from other model parser impls).

Ankush Pala <ankush@lastmileai.dev> added 2 commits on January 9, 2024 19:18
Setting up the asr parser class
Model Parser for the Automatic Speech Recognition task on huggingface.


Decisions made while implementing:
- manual impl to parse input attachments
  - threw exceptions on every unexpected step. Not sure if this is the direction we want to go with this.
- This diff does not implement serialize() for the model parser (will implement in a diff on top)

## Testplan

Created an mp3 file that says "hi". Used aiconfig to run asr on it.

| <img width="922" alt="Screenshot 2024-01-09 at 7 14 47 PM" src="https://github.com/lastmile-ai/aiconfig/assets/141073967/fe68751d-e20b-41d9-9da5-cc9a32859cba"> | <img width="1461" alt="Screenshot 2024-01-09 at 5 54 33 PM" src="https://github.com/lastmile-ai/aiconfig/assets/141073967/78063a3e-2b9a-4a39-80d9-ef28a7d706cf"> |
| ------------- | ------------- |
@Ankush-lastmile (Contributor Author) commented Jan 10, 2024

rebase.

Discussed with @rossdanlm offline; will fix forward any remaining issues. Tasks re: the huggingface model parsers have been opened.
