Table of Contents

model size and requirements

Infrastructure

  • dan jeffries ai infra landscape https://ai-infrastructure.org/why-we-started-the-aiia-and-what-it-means-for-the-rapid-evolution-of-the-canonical-stack-of-machine-learning/

  • bananadev cold boot problem https://twitter.com/erikdunteman/status/1584992679330426880?s=20&t=eUFvLqU_v10NTu65H8QMbg

  • replicate.com

  • banana.dev

  • huggingface.co

  • lambdalabs.com

  • astriaAI

  • cost of chatgpt - https://twitter.com/tomgoldsteincs/status/1600196981955100694

    • A 3-billion-parameter model can generate a token in about 6ms on an A100 GPU
    • a 175-billion-parameter model should take about 350ms on an A100 GPU to print out a single word
    • You would need five 80GB A100 GPUs just to load the model and text. ChatGPT cranks out about 15-20 words per second. If it uses A100s, that could be done on an 8-GPU server (a likely choice on Azure cloud)
    • On Azure cloud, each A100 card costs about $3 an hour. That's about $0.0003 per word generated.
    • The model usually responds to queries with ~30 words, which adds up to about 1 cent per query.
    • If an average user makes 10 queries per day, it's reasonable to estimate that ChatGPT serves ~10M queries per day.
    • At that volume, the cost of running ChatGPT is an estimated $100K per day, or $3M per month.
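The arithmetic in the thread can be sketched directly. All constants below are the tweet's estimates, not measured values; the 17.5 words/sec midpoint of the quoted 15-20 range is an assumption.

```python
# Back-of-the-envelope ChatGPT serving cost, following the thread's assumptions.
gpu_cost_per_hour = 3.00    # Azure A100, $/hour (tweet's figure)
gpus_per_server = 8         # one 8xA100 server assumed to host the model
words_per_second = 17.5     # assumed midpoint of the quoted 15-20 words/sec

server_cost_per_hour = gpu_cost_per_hour * gpus_per_server  # $24/hour
words_per_hour = words_per_second * 3600                    # 63,000 words/hour
cost_per_word = server_cost_per_hour / words_per_hour       # ~$0.0004/word

words_per_query = 30            # typical response length per the thread
queries_per_day = 10_000_000    # ~10M queries/day estimate

cost_per_query = cost_per_word * words_per_query  # ~1 cent/query
daily_cost = cost_per_query * queries_per_day     # ~$100K+/day

print(f"cost/word: ${cost_per_word:.5f}")
print(f"cost/query: ${cost_per_query:.4f}")
print(f"daily cost: ${daily_cost:,.0f}")
```

The midpoint assumption lands slightly above the thread's round numbers (~$114K/day rather than $100K), which is the expected sensitivity of this kind of estimate.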
  • the top-performing GPT-175B model has 175 billion parameters, which total at least 320GB (counting multiples of 1024) of storage in half-precision (FP16) format, leading it to require at least five A100 GPUs with 80GB of memory each for inference. https://arxiv.org/pdf/2301.00774.pdf
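The paper's memory figure follows from the parameter count; a minimal sketch of that calculation (FP16 byte width and 80GB A100 capacity as stated above):

```python
import math

# FP16 memory footprint of a 175B-parameter model, weights only.
params = 175e9
bytes_per_param = 2  # FP16 = 16 bits = 2 bytes

weights_bytes = params * bytes_per_param
weights_gib = weights_bytes / 1024**3  # ~326 GiB ("at least 320GB")

a100_memory_gib = 80
gpus_needed = math.ceil(weights_gib / a100_memory_gib)  # 5 A100s just for weights

print(f"{weights_gib:.0f} GiB -> {gpus_needed} x 80GB A100s")
```

Note this counts weights alone; activations and the KV cache push the real requirement higher, which is why the paper says "at least" five GPUs.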

  • And training itself isn’t cheap. PaLM is 540 billion parameters in size, “parameters” referring to the parts of the language model learned from the training data. A 2020 study pegged the expenses for developing a text-generating model with only 1.5 billion parameters at as much as $1.6 million. And to train the open source model Bloom, which has 176 billion parameters, it took three months using 384 Nvidia A100 GPUs; a single A100 costs thousands of dollars. https://techcrunch.com/2022/12/30/theres-now-an-open-source-alternative-to-chatgpt-but-good-luck-running-it/

  • Bloom requires a dedicated PC with around eight A100 GPUs. Cloud alternatives are pricey, with back-of-the-envelope math finding the cost of running OpenAI’s text-generating GPT-3 — which has around 175 billion parameters — on a single Amazon Web Services instance to be around $87,000 per year.

    • https://bdtechtalks.com/2020/09/21/gpt-3-economy-business-model/
    • Lambda Labs calculated the computing power required to train GPT-3 based on projections from GPT-2. According to the estimate, training the 175-billion-parameter neural network requires 3.114E23 FLOPs (floating-point operations), which would theoretically take 355 years on a V100 GPU server with 28 TFLOPS capacity and would cost $4.6 million at $1.5 per hour.
    • We can't know the exact cost of the research without more information from OpenAI, but one expert estimated it to be somewhere between 1.5 and five times the cost of training the final model. This would put the cost of research and development between $11.5 million and $27.6 million, plus the overhead of parallel GPUs.
    • According to OpenAI's whitepaper, GPT-3 uses half-precision floating-point variables at 16 bits per parameter. This means the model would require at least 350 GB of VRAM just to load the model and run inference at a decent speed. This is the equivalent of at least 11 Tesla V100 GPUs with 32 GB of memory each. At approximately $9,000 apiece, this would raise the cost of the GPU cluster to at least $99,000, plus several thousand dollars more for RAM, CPU, SSD drives, and power supply. A good baseline would be Nvidia's DGX-1 server, which is specialized for deep learning training and inference. At around $130,000, DGX-1 is short on VRAM (8×16 GB), but has all the other components for solid performance on GPT-3.
    • "We don't have the numbers for GPT-3, but can use GPT-2 as a reference. A 345M-parameter GPT-2 model only needs around 1.38 GB to store its weights in FP32. But running inference with it in TensorFlow requires 4.5 GB VRAM. Similarly, a 774M GPT-2 model only needs 3.09 GB to store weights, but 8.5 GB VRAM to run inference," he said. This would possibly put GPT-3's VRAM requirements north of 400 GB.

Based on what we know, it would be safe to say the hardware costs of running GPT-3 would be between $100,000 and $150,000 without factoring in other costs (electricity, cooling, backup, etc.).

Alternatively, if run in the cloud, GPT-3 would require something like Amazon’s p3dn.24xlarge instance, which comes packed with 8xTesla V100 (32 GB), 768 GB RAM, and 96 CPU cores, and costs $10-30/hour depending on your plan. That would put the yearly cost of running the model at a minimum of $87,000.
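The quoted Lambda Labs and AWS numbers can be reproduced from the figures above; a sketch, using only the stated estimates:

```python
# Reproducing the Lambda Labs training estimate quoted above.
total_flops = 3.114e23  # estimated training compute for GPT-3
v100_flops = 28e12      # 28 TFLOPS sustained on the V100 server
price_per_hour = 1.5    # $/hour for that server

seconds = total_flops / v100_flops
years_single_server = seconds / (3600 * 24 * 365)  # ~353 years on one server
training_cost = seconds / 3600 * price_per_hour    # ~$4.6M if parallelized

# And the cloud-inference figure: a p3dn.24xlarge at the $10/hour low end,
# running year-round.
yearly_inference_cost = 10 * 24 * 365  # $87,600/year

print(f"{years_single_server:.0f} years, ${training_cost/1e6:.1f}M to train")
print(f"${yearly_inference_cost:,}/year to serve")
```

The small gap versus the quoted "355 years" comes from rounding in the original estimate; the cost figures match.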

  1. Efficiently Scaling Transformer Inference
    1. Transcending Scaling Laws with 0.1% Extra Compute

training is synchronous (centralized) and is just a matter of exaflops https://twitter.com/AliYeysides/status/1605258835974823954?s=20 nuclear fusion accelerates exaflops

computer requirements to train gpt4 https://twitter.com/matthewjbar/status/1605328925789278209?s=46&t=fAgqJB7GXbFmnqQPe7ss6w

human equivalent

human brain math https://twitter.com/txhf/status/1613239816770191361?s=20

microsoft openai cluster

openai triton vs nvidia cuda

https://twitter.com/pommedeterre33/status/1614927584030081025?s=46&t=HS-dlJsERZX6hEyAlfF5sw

Distributed work

Optimization

inference

https://lilianweng.github.io/posts/2023-01-10-inference-optimization/ scaling up inference

https://textsynth.com/ Fabrice Bellard's project provides access to large language or text-to-image models such as GPT-J, GPT-Neo, M2M100, CodeGen, and Stable Diffusion through a REST API and a playground. They can be used, for example, for text completion, question answering, classification, chat, translation, and image generation. TextSynth employs custom inference code to get faster inference (and hence lower costs) on standard GPUs and CPUs.

hardware issues

see also the Asianometry YouTube video

cost trends - wright's law

  • We believe the cost to train a neural net will fall 2.5x per year through 2030. AND we expect budgets to continue to balloon, doubling annually at least through 2025. Combine the two: Neural net capability should increase by ~5,000x by 2025
  • https://twitter.com/wintonARK/status/1557768036169314304?s=20
  • https://ark-invest.com/wrights-law
    • Moore’s Law – named after Gordon Moore for his work in 1965 – focuses on cost as a function of time. Specifically, it states that the number of transistors on a chip would double every two years. Wright’s Law on the other hand forecasts cost as a function of units produced.
  • OpenAI scaling on compute https://openai.com/blog/ai-and-compute/
    • Before 2012: It was uncommon to use GPUs for ML, making any of the results in the graph difficult to achieve.
    • 2012 to 2014: Infrastructure to train on many GPUs was uncommon, so most results used 1-8 GPUs rated at 1-2 TFLOPS for a total of 0.001-0.1 pfs-days.
    • 2014 to 2016: Large-scale results used 10-100 GPUs rated at 5-10 TFLOPS, resulting in 0.1-10 pfs-days. Diminishing returns on data parallelism meant that larger training runs had limited value.
    • 2016 to 2017: Approaches that allow greater algorithmic parallelism, such as huge batch sizes, architecture search, and expert iteration, along with specialized hardware such as TPUs and faster interconnects, have greatly increased these limits, at least for some applications.
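The compounding claim in the first bullet above can be checked with a quick sketch; the five-year horizon is an assumption to match the "through 2025" framing:

```python
# Wright's-law compounding: training cost falls 2.5x/year while budgets
# double annually, so effective compute (a capability proxy) grows
# 2.5 * 2 = 5x per year.
cost_decline_per_year = 2.5
budget_growth_per_year = 2.0
years = 5  # assumed horizon for "by 2025"

capability_multiplier = (cost_decline_per_year * budget_growth_per_year) ** years
print(capability_multiplier)  # 3125.0
```

That gives ~3,100x over five years, the same order of magnitude as the tweet's ~5,000x; the exact quoted figure presumably uses a slightly different horizon or rates.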

ai product stacks

example

  • https://twitter.com/ramsri_goutham/status/1604763395798204416?s=20
    • Here is how we bootstrapped 3 AI startups with positive unit economics -
    1. Development - Google Colab
    2. Inference - serverless GPU providers (tiyaro.ai, modal.com, and nlpcloud)
    3. AI Backend logic - AWS Lambdas
    4. Semantic Search - free-to-start vector DBs (e.g. pinecone.io)
    5. Deployment - Vercel + Supabase