Node.js Tasks: Vision, Audio, and Open-Source Models

This folder contains three hands-on tasks to complete in Node.js, each using a different model modality (vision, audio, and open-source text generation) from the GitHub Models Marketplace.

Task 04: Image Captioning (Vision Model)

  • Goal: Use a vision model to generate a caption for an image.
  • Steps:
    1. Write a Node.js script that sends an image to a vision model (e.g., GPT-4V).
    2. Input: Any image file (e.g., 'sample-image.jpg').
    3. Output: The generated caption.
    4. Reference: SDK Options

Task 05: Speech-to-Text (Audio Model)

  • Goal: Use an audio model to transcribe speech from an audio file.
  • Steps:
    1. Write a Node.js script that sends an audio file to a speech-to-text model (e.g., Whisper).
    2. Input: Any audio file (e.g., 'sample-audio.mp3').
    3. Output: The transcribed text.
    4. Reference: SDK Options

Task 06: Open-Source Model (Llama/Mistral)

  • Goal: Use an open-source model to generate text based on a prompt.
  • Steps:
    1. Write a Node.js script that calls an open-source model (e.g., Mistral) for text generation.
    2. Input: Any prompt string.
    3. Output: The generated text.
    4. Reference: SDK Options

Complete these tasks, upload your solutions to GitHub, and submit your repository link using the form in task/README.md to claim your course certificate.