Node.js Tasks: Vision, Audio, and Open-Source Models

This folder contains hands-on tasks to complete in Node.js. Each task uses a different model modality from the GitHub Models Marketplace.

Task 04: Image Captioning (Vision Model)

Goal: Use a vision model to generate a caption for an image.
Steps:
1. Write a Node.js script that sends an image to a vision model (e.g., GPT-4V).
2. Input: Any image file (e.g., 'sample-image.jpg').
3. Output: The generated caption.
4. Reference: SDK Options

Task 05: Speech-to-Text (Audio Model)

Goal: Use an audio model to transcribe speech from an audio file.
Steps:
1. Write a Node.js script that sends an audio file to a speech-to-text model (e.g., Whisper).
2. Input: Any audio file (e.g., 'sample-audio.mp3').
3. Output: The transcribed text.
4. Reference: SDK Options

Task 06: Open-Source Model (Llama/Mistral)

Goal: Use an open-source model to generate text based on a prompt.
Steps:
1. Write a Node.js script that calls an open-source model (e.g., Mistral) for text generation.
2. Input: Any prompt string.
3. Output: The generated text.
4. Reference: SDK Options

Complete these tasks, upload your solutions to GitHub, and submit your repository link using the form in task/README.md to claim your course certificate.
