Add AI Agent PDF Input Info #336

Merged
merged 5 commits into from
May 21, 2025
Changes from 1 commit
13 changes: 7 additions & 6 deletions docs/ff-integrations/ai/ai-agents.md
@@ -76,14 +76,15 @@ You can obtain your OpenAI API key from [**OpenAI API Keys**](https://platform.o

#### Request Options

-Here, you specify the type of inputs users can send to the AI.
+Define the types of inputs users can send to the AI agent. You can enable one or more of the following options:

-- **Text**: Allows users to send text-based messages.
-- **Image**: Enables image input, allowing the agent to analyze photos.
-- **Audio**: (Google Agent only) Allows to send audio messages or voice inputs.
-- **Video**: (Google Agent only) Allows users to send short video clips to analyze.
+- **Text**: Allows users to send written messages, questions, or prompts.
+- **Image**: Enables users to upload photos for the AI to analyze visual content, such as objects, styles, or scenes.
+- **PDF** (Anthropic and Google Agent only): Lets users submit PDF documents, allowing the AI to extract and interpret information from files like resumes, reports, or forms.
+- **Audio** (Google Agent only): Supports voice input, enabling users to record or upload audio clips for transcription, sentiment analysis, or voice-based commands.
+- **Video** (Google Agent only): Allows users to submit video files, enabling the AI to analyze visual elements.

-Selecting multiple input types makes it easier for users to clearly communicate what they need. Instead of relying only on text descriptions, users can combine inputs—for example, uploading an image along with text to better illustrate their queries and help the agent provide more accurate responses.
+Selecting multiple input types makes it easier for users to clearly communicate what they need. Instead of relying only on text descriptions, users can combine inputs. For instance, in an AI Stylist agent, enabling both Text and Image allows users to either describe their outfits in words or upload clothing photos for personalized analysis.
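
The provider restrictions in the updated list can be summarized as a small lookup. This is a minimal sketch, not FlutterFlow's implementation: the provider keys, `SUPPORTED_INPUTS`, and `unsupported_inputs` are hypothetical names, and the matrix assumes Text and Image are available for every provider because the list places no restriction on them.

```python
# Hypothetical support matrix derived from the bullet list in the diff.
# Assumption: Text and Image work with every provider, since the list
# attaches no provider restriction to them.
SUPPORTED_INPUTS = {
    "openai": {"text", "image"},
    "anthropic": {"text", "image", "pdf"},
    "google": {"text", "image", "pdf", "audio", "video"},
}


def unsupported_inputs(provider: str, requested: set[str]) -> set[str]:
    """Return the requested input types that the provider does not accept."""
    return requested - SUPPORTED_INPUTS[provider]


# An OpenAI agent configured with PDF input would be flagged.
print(unsupported_inputs("openai", {"text", "pdf"}))
# A Google agent accepts all five input types, so nothing is flagged.
print(unsupported_inputs("google", {"text", "pdf", "audio"}))
```

A check like this could run when an agent is configured, so an unsupported combination is reported before any request is sent.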
Contributor

@MaggieThomann, May 5, 2025

Can you add a caveat here that users typically need to include something in the text prompt to the agent?

or upload clothing photos for personalized analysis.

^ Basically, it would technically not be sufficient to do this in the OpenAI or Google cases. They would also need to put something in the "Text input" field in the "Send Message" action. Otherwise the agent will complain. Anthropic functions differently though. Anthropic is OK with just accepting the image.

Contributor

Actually correction -- all vendors require that Text is passed. Seeing this error from Anthropic now:

{
    "error": {
        "details": {
            "message": "400 {\"type\":\"error\",\"error\":{\"type\":\"invalid_request_error\",\"message\":\"messages: text content blocks must be non-empty\"}}",
            "details": "Error: 400 {\"type\":\"error\",\"error\":{\"type\":\"invalid_request_error\",\"message\":\"messages: text content blocks must be non-empty\"}}"
        },
        "message": "Error running assistant",
        "status": "INTERNAL"
    }
}
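
The vendor's actual complaint is nested as a JSON string inside `error.details.message`, prefixed by the HTTP status. A minimal sketch of unpacking it follows. It assumes the `"<status> <json>"` shape seen in this one response; `extract_vendor_error` and `sample` are hypothetical names, and `sample` is rebuilt from the payload quoted above.

```python
import json


def extract_vendor_error(response_body: str) -> dict:
    """Pull the status code, error type, and message out of the nested error."""
    outer = json.loads(response_body)
    raw = outer["error"]["details"]["message"]
    # Assumption: the string is "<status> <json>", as in the quoted response.
    status, _, payload = raw.partition(" ")
    inner = json.loads(payload)
    return {
        "status": int(status),
        "type": inner["error"]["type"],
        "message": inner["error"]["message"],
    }


# Rebuild the quoted response programmatically to avoid hand-escaping quotes.
vendor_error = {
    "type": "error",
    "error": {
        "type": "invalid_request_error",
        "message": "messages: text content blocks must be non-empty",
    },
}
sample = json.dumps({
    "error": {
        "details": {"message": "400 " + json.dumps(vendor_error)},
        "message": "Error running assistant",
        "status": "INTERNAL",
    }
})

print(extract_vendor_error(sample))
```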

Contributor

Ok sorry for the back and forth 😅 @pinkeshmars

I'm going to change the agent so that users can send just an image in their request: the cloud function code will default the text parameter we send to an empty string. This way, the API won't complain and the user can configure the agent with only Image enabled. So feel free to keep this language as is.
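
The defaulting described in this comment might look like the sketch below. It is illustrative only: the actual cloud function is not shown in this thread, and `build_agent_request` along with the `text` and `image_url` field names are hypothetical. How the real code keeps the vendor from rejecting an empty text block is also not stated here.

```python
from typing import Optional


def build_agent_request(image_url: str, text: Optional[str] = None) -> dict:
    """Build a request payload, defaulting a missing text input to ""."""
    return {
        # Hypothetical field names; a missing text input becomes an empty string.
        "text": text if text is not None else "",
        "image_url": image_url,
    }


# Image-only request: the text field is present but empty.
print(build_agent_request("https://example.com/outfit.png"))
# Text supplied by the user is passed through unchanged.
print(build_agent_request("https://example.com/outfit.png", "Rate this outfit"))
```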

Collaborator Author

cool. Thanks for the clarification @MaggieThomann


#### Response Options
