Arno's Real Scenarios Benchmark for AI

Thinking Framework Generation

  • structural generation
  • think deeply and organize information in a structured way

Git repo: https://github.com/SurfaceW/e-studio-thinking-machine/blob/release/main/.context/prd/mental-model.creator.md

Data Insights

--- Health Data Insights ---

  • prepare PDFs of your yearly health report
  • prepare your current health status and images
  • gain insights from your health data: let the AI summarize it and give suggestions, advice, or action plans

--- Financial Data Insights ---

  • prepare your financial data
  • gain insights from your financial data: let the AI summarize it and give suggestions, advice, or action plans
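
As a rough illustration of the "prepare your data, then ask for insights" flow above, the sketch below aggregates a few transactions and assembles a prompt. It is only one possible setup: the function names, the `(category, amount)` record shape, and the sample figures are all hypothetical and not part of the benchmark itself.

```python
from collections import defaultdict


def summarize_by_category(transactions):
    """Sum amounts per category; each transaction is a (category, amount) pair."""
    totals = defaultdict(float)
    for category, amount in transactions:
        totals[category] += amount
    return dict(totals)


def build_insight_prompt(transactions):
    """Assemble a plain-text prompt asking for a summary, suggestions, and an action plan."""
    totals = summarize_by_category(transactions)
    lines = [f"- {category}: {total:.2f}" for category, total in sorted(totals.items())]
    return (
        "Here are my spending totals by category:\n"
        + "\n".join(lines)
        + "\n\nPlease summarize the data, then give suggestions, advice, and an action plan."
    )


# Made-up sample data, for illustration only.
sample = [("rent", 1200.0), ("food", 310.5), ("food", 89.5), ("transport", 60.0)]
prompt = build_insight_prompt(sample)
print(prompt)
```

The resulting text can be pasted into whichever model is under test; the same shape could be reused for the health-data scenario by swapping the records and the wording of the request.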

Knowledge Cut-off

  • latest iOS beta SDK
  • features in the latest version of a framework such as Next.js

High Quality Writing

  • given a topic, let the AI write a high-quality article, e.g. a Next.js architecture design guideline

WIP: article writing assistant prompt guide

Artifact Code Generation

  • given a PDF-length context of knowledge / facts / data (e.g. a generated deep-research report)
  • have the AI generate code that presents it as a web-page artifact
  • use a tech-agnostic / best / coolest approach to test the boundaries and limitations

Optionally:

  • use a specific tech boundary to test the AI's capability
  • WIP: Next.js / shadcn/ui and Tailwind CSS ...
  • WIP: add style or theme instruction to test the AI capability

Complex Code Generation

Use Cursor, switch to the target model, and test the AI on the following coding tasks:

  • large-codebase refactor (large chunk of code refactor)
  • bugfix
  • new module implementation
  • unit-test cases

From 0 to 1, pick a technology stack to implement a product feature. Use a code boilerplate such as Next.js Forge.
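
The source does not say how results are recorded when switching models. As one hypothetical way to keep track, the sketch below stores a pass/fail outcome per coding task listed above and computes a simple pass rate. The `ModelRun` class, its methods, and the model name are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# The four coding tasks named in this section.
TASKS = (
    "large-codebase refactor",
    "bugfix",
    "new module implementation",
    "unit-test cases",
)


@dataclass
class ModelRun:
    """Pass/fail outcomes for one model across the coding tasks (hypothetical structure)."""

    model: str
    passed: dict = field(default_factory=dict)

    def record(self, task, ok):
        """Store the outcome for a task, rejecting names outside TASKS."""
        if task not in TASKS:
            raise ValueError(f"unknown task: {task}")
        self.passed[task] = bool(ok)

    def pass_rate(self):
        """Fraction of recorded tasks that passed; 0.0 when nothing is recorded."""
        if not self.passed:
            return 0.0
        return sum(self.passed.values()) / len(self.passed)


run = ModelRun(model="model-a")  # placeholder model name
run.record("bugfix", True)
run.record("unit-test cases", False)
print(f"{run.model}: {run.pass_rate():.0%} of recorded tasks passed")
```

A binary pass/fail is deliberately coarse; a real comparison would likely also need notes on code quality and the amount of manual correction required.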

Design Strategic Plans

Define the future shape of a product, given specific goals and targets, and have the AI find the best way to achieve them.

  • DingTalk Approval: define the future shape of Approval product
  • Google Search: define the future shape of a Search product
  • AI Browser: define the future shape of a Browser product
  • Cursor: define the future shape of an AI Editor product
  • ...

Image Based Instructions

  • Extract content from a complex, table-driven PDF
  • Reproduce a web page as an artifact based on an image
  • Understand the deeper layers of an image, such as metaphors, symbols, and hidden meanings

Deep Research

Given a knowledge-based, research-based task, use ./research/google.deep-research.mdx as the reference for executing it.

MCP servers

Integrate several MCP servers to coordinate the work of different AI tools on a complex task.

Interesting Real-World Problem Solving

WIP