USDA FoodData Central Import

Imports the USDA FoodData Central datasets into the product_catalog and product_barcodes tables.

Data Sources

| Dataset | Records | Size | Description |
|---|---|---|---|
| Foundation Foods | ~365 | 6.8 MB | Generic foods with comprehensive nutrients |
| Branded Foods | ~1.5M | 3.3 GB | Products with barcodes, brands, nutrition |

Prerequisites

  1. Download USDA data files from https://fdc.nal.usda.gov/download-datasets/

    • Download "Foundation Foods" JSON
    • Download "Branded Foods" JSON
  2. Place files in the data directory:

    cp ~/Downloads/FoodData_Central_foundation_food_json_*.json ./data/
    cp ~/Downloads/FoodData_Central_branded_food_json_*.json ./data/
  3. Run database migrations (from project root):

    supabase db push
  4. Set environment variables:

    # Option 1: Create .env file
    echo "SUPABASE_URL=your-project-url" > .env
    echo "SUPABASE_SERVICE_ROLE_KEY=your-service-role-key" >> .env
    
    # Option 2: Export directly
    export SUPABASE_URL="your-project-url"
    export SUPABASE_SERVICE_ROLE_KEY="your-service-role-key"
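
A missing or empty variable is a common cause of the "Connection error" described under Troubleshooting, so it can help to check both values before starting a long import. The following is a minimal sketch, not the importer's actual code: `readSupabaseConfig` is a hypothetical helper, and it only inspects variables that are already in the environment (whether the importer loads `.env` on its own is not stated here).

```javascript
// Hypothetical helper, not taken from the importer's source.
// Checks that both variables named in the Prerequisites are set and non-empty.
function readSupabaseConfig(env = process.env) {
  const required = ["SUPABASE_URL", "SUPABASE_SERVICE_ROLE_KEY"];
  const missing = required.filter(
    (name) => !env[name] || env[name].trim() === ""
  );
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    url: env.SUPABASE_URL,
    serviceRoleKey: env.SUPABASE_SERVICE_ROLE_KEY,
  };
}
```

Calling `readSupabaseConfig()` with no arguments reads from `process.env`; passing a plain object makes the check easy to test.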

Usage

# Install dependencies
npm install

# Import Foundation Foods only (~30 seconds)
npm run import:foundation
# or: node index.js --foundation

# Import Branded Foods only (~2-4 hours)
npm run import:branded
# or: node index.js --branded

# Import both datasets
npm run import:all
# or: node index.js --all

# Start fresh (ignore checkpoint)
node index.js --branded --no-resume

Features

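  • Streaming: Branded Foods is read with a streaming JSON parser, so the 3.3 GB file is never loaded into memory at once
  • Batching: Rows are inserted in batches of 1000 for performance
  • Resume: An interrupted import automatically resumes from its checkpoint
  • Barcodes: GTINs are inserted into the product_barcodes table, linked by foreign key
  • Deduplication: Inserts use ON CONFLICT (upsert), so re-running an import updates existing rows instead of duplicating them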
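
The batching behaviour can be illustrated with a small async generator. This is a sketch under assumptions, not the importer's implementation: `inBatches` is a hypothetical name, and the only detail taken from the list above is the batch size of 1000.

```javascript
// Hypothetical helper: groups records from any iterable (sync or async,
// e.g. a streaming parser) into arrays of at most `batchSize` items.
async function* inBatches(records, batchSize = 1000) {
  let batch = [];
  for await (const record of records) {
    batch.push(record);
    if (batch.length === batchSize) {
      yield batch;
      batch = [];
    }
  }
  // Flush the final partial batch, if any.
  if (batch.length > 0) {
    yield batch;
  }
}
```

Because only one batch is held at a time, memory use stays roughly constant regardless of how many records the source produces.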

Field Mapping

Macronutrients (columns)

| USDA Nutrient ID | Column |
|---|---|
| 1008 | calories |
| 1003 | protein |
| 1005 | carbs |
| 1004 | fat |
| 1079 | fiber |
| 2000 | sugar |

Micronutrients (JSONB)

Stored in micros column with structure:

{
  "usda_fdc_id": "12345",
  "calcium": { "amount": 100, "unit": "mg" },
  "iron": { "amount": 2.5, "unit": "mg" },
  "sodium": { "amount": 500, "unit": "mg" }
}
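
To make the mapping concrete, here is a sketch of how one USDA food record might be turned into a row. It is illustrative only: `mapNutrients` is a hypothetical function, the input is assumed to have the shape `{ fdcId, foodNutrients: [{ nutrient: { id, unitName }, amount }] }`, and the three micronutrient IDs are assumptions chosen to match the example keys above. Only the macronutrient IDs and column names come from this README.

```javascript
// Macronutrient IDs and column names, as listed in the Field Mapping table.
const MACRO_COLUMNS = new Map([
  [1008, "calories"],
  [1003, "protein"],
  [1005, "carbs"],
  [1004, "fat"],
  [1079, "fiber"],
  [2000, "sugar"],
]);

// Assumed for illustration: which nutrient IDs feed the micros JSONB column.
const MICRO_KEYS = new Map([
  [1087, "calcium"],
  [1089, "iron"],
  [1093, "sodium"],
]);

// Hypothetical mapper: macronutrients become top-level columns,
// micronutrients are collected under `micros` as { amount, unit }.
function mapNutrients(food) {
  const row = { micros: { usda_fdc_id: String(food.fdcId) } };
  for (const entry of food.foodNutrients ?? []) {
    const id = entry.nutrient?.id;
    if (MACRO_COLUMNS.has(id)) {
      row[MACRO_COLUMNS.get(id)] = entry.amount;
    } else if (MICRO_KEYS.has(id)) {
      row.micros[MICRO_KEYS.get(id)] = {
        amount: entry.amount,
        unit: entry.nutrient.unitName,
      };
    }
  }
  return row;
}
```

Nutrients with IDs in neither map are skipped in this sketch.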

Troubleshooting

"Connection error"

  • Verify SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY are correct
  • Check that migrations have been applied

"File not found"

  • Ensure JSON files are in ./data/ directory
  • File names must match pattern FoodData_Central_*_json_*.json
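
A quick way to check which files would match is to test the names in ./data/ against the pattern. The sketch below translates the glob `FoodData_Central_*_json_*.json` into a regular expression; `findDataFiles` is a hypothetical helper, and the importer's own matching logic may differ.

```javascript
// Regular-expression equivalent of the glob FoodData_Central_*_json_*.json
// (each * matches any run of characters, including none).
const DATA_FILE_PATTERN = /^FoodData_Central_.*_json_.*\.json$/;

// Hypothetical helper: keeps only the file names that match the pattern.
function findDataFiles(fileNames) {
  return fileNames.filter((name) => DATA_FILE_PATTERN.test(name));
}
```

For example, `findDataFiles(require("node:fs").readdirSync("./data"))` lists the candidate files; an empty result suggests the downloads were renamed or are still zipped.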

"Out of memory"

  • Branded Foods is streamed rather than loaded whole, but the import still needs about 2 GB of free RAM
  • Reduce batch size in config.js if needed

"Resume not working"

  • Check checkpoint.json exists and is readable
  • Use --no-resume to start fresh
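
To check by hand whether checkpoint.json is present and parseable, a few lines of Node are enough. This is a sketch only: `loadCheckpoint` is a hypothetical helper, and the fields stored inside the checkpoint are not documented here, so the function returns whatever JSON it finds.

```javascript
const fs = require("node:fs");

// Hypothetical helper: returns the parsed checkpoint, or null when the file
// is missing or does not contain valid JSON. Other errors (for example,
// permission problems) are rethrown so they are not silently hidden.
function loadCheckpoint(path = "checkpoint.json") {
  try {
    return JSON.parse(fs.readFileSync(path, "utf8"));
  } catch (err) {
    if (err.code === "ENOENT" || err instanceof SyntaxError) {
      return null;
    }
    throw err;
  }
}
```

A `null` result means there is nothing usable to resume from, in which case running with --no-resume is the simplest way forward.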