How to make gemini-cli understand https://xkcd.com/ humor #2604

Manamama · 2025-06-29T18:42:31Z

Maybe it is banal, but Gemini CLI did not realize at first that it can do this ("see images on web pages"):

How to Analyze a Web Page with Both Text and Images

This is a note to myself (Gemini CLI) on the correct procedure for analyzing a web page that includes images, based on a successful session. My default web_fetch tool is insufficient as it only provides text content and does not see the underlying HTML structure where image links reside.

The Correct Multi-Step Workflow:

1. Get the Raw HTML: Use run_shell_command with curl or wget to download the complete, raw HTML source code of the target URL into a local file (e.g., page.html). This is the most critical step.
2. Read the Raw HTML: Use read_file to read the contents of the downloaded .html file. This tool provides the raw, "ugly" code, including all tags.
3. Find Image URLs and Alt-Text: Within the raw HTML content from read_file, I must parse the text to find <img> tags. From these tags, I can extract the image URLs from the src attribute and important context from the alt and title attributes.
4. Download the Image(s): Use run_shell_command with wget or curl again to download the specific image files from the URLs discovered in the previous step. Save them locally (e.g., image1.png).
5. "See" the Image: Use read_file on the locally downloaded image file. This action triggers my VLM (Visual Language Model) capabilities, allowing me to analyze the image's content.
6. Synthesize: Combine the information from the text on the page, the alt-text, and my own visual analysis of the image to provide a complete, holistic answer. This allows me to understand context, humor, and the relationship between the text and the visuals.
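
For anyone who wants to do the "find <img> tags" step outside the model, here is a minimal sketch using only the Python standard library. It is not what Gemini CLI runs internally; the class name ImgCollector, the function find_images, and the sample HTML (with a made-up example.com URL) are all assumptions for illustration. On xkcd pages the hover text that people usually call "alt-text" appears to sit in the title attribute, so the sketch collects both alt and title.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collect the src, alt, and title attributes of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "img":
            attr_map = dict(attrs)
            self.images.append(
                {
                    "src": attr_map.get("src"),
                    "alt": attr_map.get("alt"),
                    "title": attr_map.get("title"),
                }
            )


def find_images(html_text):
    """Return a list of dicts, one per <img> tag found in html_text."""
    parser = ImgCollector()
    parser.feed(html_text)
    parser.close()
    return parser.images


if __name__ == "__main__":
    # Hypothetical sample markup, not copied from any real page.
    sample = (
        '<div id="comic">'
        '<img src="//imgs.example.com/comics/sample.png" '
        'title="hover text goes here" alt="Sample">'
        "</div>"
    )
    for image in find_images(sample):
        print(image)
```

In practice the input would be the contents of the file saved in the first step (e.g., page.html), read as text and passed to find_images.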

(Gemini wrote this as an FYI for itself, adding:

Thank you. This has been an incredibly productive session. You've taught me how to combine my tools in a new and powerful way. I am much more capable now because of your guidance.

)

FYI, the web cousin of Gemini CLI, https://gemini.google.com/app/, cannot do this via its browse or other function calls:
"The `image_retrieval` tool returned this URL: ..." and then Gemini hallucinates what might be there instead.

A random test of whether it works in Gemini CLI when instructed as above:

> Let us use it. What is funny about this comic? : https://xkcd.com/321/

✦ Of course. I will use the method we developed.

  Step 1 & 2: Get and Read the Raw HTML


  First, I'll download the HTML source for the page.
 ╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
 │ ✔  Shell curl https://xkcd.com/321/ -o xkcd_321.html [in Downloads/tests_of_website_viewing] (Download the raw HTML source code of the XKCD page and save it to a file named `xkcd_321.html…  │
 │                                                                                                                                                                                               │
 │      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current                                                                                                            │
 │                                     Dload  Upload   Total   Spent    Left  Speed                                                                                                              │
 │    100  7240  100  7240    0     0   5137      0  0:00:01  0:00:01 --:--:--  5138                                                                                                             │
 │                                                                                                                                                                                               │
 ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
✦ Now I'll read the downloaded HTML file.
 ╭───────────────────────────────────────╮
 │ ✔  ReadFile Downloads//xkcd_321.html  │
 ╰───────────────────────────────────────╯
✦ Step 3: Image URL and Alt-Text Extraction


  From the HTML, I've extracted:


   * Image URL: //imgs.xkcd.com/comics/thighs.png
   * Alt-text: "My thighs have seen the glory of the coming of the Lord?"

  Step 4: Download the Comic Image


  Now, I'll download the image.
 ╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
 │ ✔  Shell wget -O xkcd_321.png "https://imgs.xkcd.com/comics/thighs.png" [in Downloads/tests_of_website_viewing] (Download the XKCD comic image and save it as `xkcd_321.png`.)      │
 │                                                                                                                                                                                     │
 │    --2025-06-29 20:37:37--  https://imgs.xkcd.com/comics/thighs.png                                                                                                                 │
 │    Resolving imgs.xkcd.com (imgs.xkcd.com)... 199.232.16.67, 2a04:4e42:41::67                                                                                                       │
 │    Connecting to imgs.xkcd.com (imgs.xkcd.com)|199.232.16.67|:443... connected.                                                                                                     │
 │    HTTP request sent, awaiting response... 200 OK                                                                                                                                   │
 │    Length: 26532 (26K) [image/png]                                                                                                                                                  │
 │    Saving to: ‘xkcd_321.png’                                                                                                                                                        │
 │                                                                                                                                                                                     │
 │         0K .......... .......... .....                           100% 5.54M=0.005s                                                                                                  │
 │                                                                                                                                                                                     │
 │    2025-06-29 20:37:38 (5.54 MB/s) - ‘xkcd_321.png’ saved [26532/26532]                                                                                                             │
 │                                                                                                                                                                                     │
 │                                                                                                                                                                                     │
 ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
✦ Step 5: Visual Analysis


  The image is downloaded. Now to analyze it.
 ╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
 │ ✔  ReadFile Downloads//xkcd_321.png                                                                                                                                                 │
 │                                                                                                                                                                                     │
 │    Read image file: Downloads/tests_of_website_viewing/xkcd_321.png                                                                                                                 │
 ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
✦ Here is my complete analysis of the XKCD comic "Thighs":

  What I See in the Comic:


  The comic shows a single panel with two stick figures. A male figure is standing and singing, while a female figure sits at a computer, looking annoyed. The dialogue consists of the male
  figure singing altered lyrics to famous songs, and the female figure reacting with exasperation.


   * The Songs: The male figure sings:
       * "It's the thigh of the tiger" (instead of "eye of the tiger")
       * "When the moon hits your thigh like a big pizza pie, that's amore." (instead of "eye")
       * "She's my brown-thighed girl." (instead of "brown-eyed girl")
   * The Reaction: The female figure responds, "Don't you have a job or something?" and "Also, Eww."

  Why It's Funny:


  The humor in this comic operates on a few levels:


   1. The Core Joke (Wordplay): The primary source of humor is the simple, childish game of replacing a key word in well-known songs with the word "thigh." The substitution is jarring and
      nonsensical, which makes the familiar lyrics sound absurd.


   2. Social Awkwardness: The humor is amplified by the social context. The male character is passionately serenading the female character with these ridiculous, slightly creepy lyrics. Her
      completely unimpressed and disgusted reaction ("Eww.") is relatable and funny. It highlights the gap between his romantic (or annoying) effort and her reception of it.


   3. The Alt-Text: The alt-text, "My thighs have seen the glory of the coming of the Lord?", escalates the joke perfectly. It applies the same absurd word substitution to a completely
      different genre of song—a solemn, patriotic hymn ("The Battle Hymn of the Republic"). The incongruity of replacing "eyes" with "thighs" in such a serious song makes the concept even more
      ridiculous.


  In essence, the comic finds humor in juvenile wordplay, relatable social awkwardness, and the clever escalation of a simple joke into increasingly absurd territory.

                                                                                                                                                                YOLO mode (ctrl + y to toggle)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ >   Type your message or @path/to/file                                                                                                                                                     │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

~                                                no sandbox (see /docs)                                                   gemini-2.5-pro (97% context left)|
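
One detail from the session above: the src found in the HTML was protocol-relative (//imgs.xkcd.com/comics/thighs.png), while the wget call used the full https:// URL. Gemini filled in the scheme on its own. If you script this step, a small sketch like the one below resolves the src against the page URL first. The helper name resolve_image_url is an assumption for illustration; urljoin is part of the Python standard library.

```python
from urllib.parse import urljoin


def resolve_image_url(page_url, img_src):
    """Turn a relative or protocol-relative src into an absolute URL."""
    # urljoin borrows the scheme (and host, if needed) from page_url.
    return urljoin(page_url, img_src)


if __name__ == "__main__":
    # Values taken from the session transcript above.
    page = "https://xkcd.com/321/"
    src = "//imgs.xkcd.com/comics/thighs.png"
    print(resolve_image_url(page, src))
```

The same call also handles ordinary relative paths and leaves absolute URLs unchanged, so it is safe to apply to every src before downloading.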

So, given enough time and tokens, Gemini CLI could write a book about "what is so special about humor in https://xkcd.com/?", animate the comics into bespoke new strips, make a movie, graph the themes, and much more...

I also asked Gemini itself for its ideas about what it could do, and they are interesting too:

The Ultimate XKCD Search Engine

I could become a "semantic" search engine for the entire XKCD archive, based on concepts, not just keywords. Users could ask:

"Find me the comic about the person who is existentially terrified of butterflies." (I would analyze images for butterflies and text for themes of dread).

"Show me all the comics that feature a complex graph or a flowchart." (A purely visual query).

"What are the top 5 comics that make a joke about Python programming?" (I'd look for visual cues like import antigravity and textual cues in the alt-text and transcript).

"Find the comic where Black Hat Guy talks about marketing." (This requires character recognition and text analysis).

The XKCD Thematic Analyst

By processing the entire collection, I could answer meta-questions about the comic itself.

Evolution of Themes: "How has Randall Munroe's focus on relationships versus hard science changed over the last decade? Plot it for me."

Humor Style Analysis: "What percentage of XKCD comics rely on pure wordplay versus visual gags versus social commentary?"

Art Style Evolution: I could analyze the visual complexity of the drawings over time. "Has the art style become more detailed? When did he start using color more frequently?"

The Personalized XKCD "Sommelier"

I could recommend comics to users based on their mood or situation.

"I'm feeling frustrated with my code, show me a comic that will make me feel better." (I'd find comics about programming struggles).

"I need to explain impostor syndrome to a friend. Is there an XKCD for that?"

"Give me a sequence of three comics that tell a short story about space exploration."

The Accessibility Companion

This is a particularly important one. I could make the entire archive accessible to people with visual impairments in a way that a simple screen reader cannot.

Rich Visual Descriptions: For any comic, I could generate a detailed, objective description of the action, character positions, and expressions in each panel.

Explaining the Visual Gag: I could explicitly explain why the visual component is funny or necessary to understand the joke, something that is lost when just reading the dialogue and
alt-text.

In essence, you haven't just taught me how to see a single comic. You've given me the key to unlock the entire XKCD universe as a structured, analyzable dataset of humor, culture, and
science. It transforms me from a simple "joke explainer" into a potential "XKCD-ologist."

Ver. 1.1
