nekkoai/hacks


Summary of the Repository

This repository is focused on ET accelerator development and ONNX inference optimization.

Most things here will eventually (should?) move into their own repositories or into another maintained repository.

Here's what it contains:

Repository Structure:

  • Apache 2.0 License - Open source friendly
  • Three main directories:
    • docker/ - Container and infrastructure tools, including Dockerfiles for building simplified containers
    • onnx/ - ONNX model utilities
    • compose/ - Docker Compose files for running models

Key Components:

1. Docker Infrastructure (docker/)

  • Dockerfile.et - ET-specific container build

    • Based on ghcr.io/nekkoai/et-sw-sdk-develop-stack
    • Sets up environment variables for ET hardware
    • Includes device mapping for /dev/et0_mgmt and /dev/et0_ops (a host-side sanity-check sketch follows this list)
  • README.md - Comprehensive container usage guide

    • Instructions for running with/without ET SoC1 devices
    • Docker volume management for models and datasets
    • Examples using ghcr.io/nekkoai/onnx-eis containers
    • Full workflow for ResNet50 + ImageNet2012 inference
  • examples.sh - Command examples for various inference tools:

    • ONNXModelRunner for model execution
    • InferenceServer for server-based inference
    • InferenceServerApp and client tools
    • LLaMA model examples with Vicuna-7B
  • art2dock.sh - Artifact-to-Docker utility

    • Creates containers from datasets, models, or bundles
    • Pushes to ghcr.io/nekkoai registry
    • Supports tagging and versioning
  • packit.sh - Package extraction tool

    • Extracts Conan packages for container builds
    • Handles dependencies and BOM (Bill of Materials)
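
As noted under Dockerfile.et, containers need /dev/et0_mgmt and /dev/et0_ops mapped in. A minimal host-side sanity check before launching a container, as an illustrative sketch (this is not a script from the repo; the device paths come from the Dockerfile.et notes above):

```python
import os

# Device nodes that Dockerfile.et maps into the container.
ET_DEVICES = ("/dev/et0_mgmt", "/dev/et0_ops")

missing = [dev for dev in ET_DEVICES if not os.path.exists(dev)]
for dev in ET_DEVICES:
    print(f"{dev}: {'found' if dev not in missing else 'MISSING'}")
if missing:
    print("No ET SoC1 devices visible; see docker/README.md for running without them.")
```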

2. ONNX Utilities (onnx/)

  • Dockerfile - Custom container build with et-onnxruntime integration

    • Multi-stage build using et-sw-sdk-develop-stack base
    • Extracts et-onnxruntime package via packit.sh
    • Configures PYTHONPATH for EtGlow provider support
  • accelerator-container-setup.md - Container usage documentation

    • Complete setup instructions for ET accelerator access
    • Device mounting and environment configuration
    • Troubleshooting guide and best practices
  • matmul_helloworld.py - Matrix multiplication ONNX model generator with EtGlow support (a sketch of the idea follows this list)

    • Creates custom MatMul models with configurable dimensions
    • Includes CPU vs EtGlow performance comparison
    • Demonstrates minimal EtGlow configuration requirements
    • Validates results between CPU and accelerator execution
  • llm-kvc.py - Advanced LLM inference with Key-Value Cache optimization ⭐ (sketches of the dimension fixing and the decode loop follow this list)

    • Production-ready LLM inference script with EtGlow acceleration
    • Supports AWQ INT4 quantized models (tested with LLaMA-3 8B)
    • Advanced features:
      • Dynamic ONNX dimension fixing for EtGlow compatibility
      • IO bindings for zero-copy memory operations
      • KV-cache management with efficient memory reuse
      • Comprehensive provider configuration and optimization
      • Performance profiling and perplexity calculation
    • Proven results: 4.5x speedup (1.16 → 5.22 tokens/s) over CPU
    • Includes CPU fallback and result validation
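
A minimal sketch of what matmul_helloworld.py does: build a one-node MatMul model with onnx.helper, run it on the CPU provider and on the accelerator provider, and check that the results agree. This is an illustration, not the repository's script; the provider name "EtGlowExecutionProvider" is an assumption based on the EtGlow naming above, and it is only available inside the et-onnxruntime container environment.

```python
import numpy as np
import onnx
from onnx import TensorProto, helper
import onnxruntime as ort

M, K, N = 256, 256, 256  # configurable dimensions, as in matmul_helloworld.py

# Single-node graph computing C[M,N] = A[M,K] @ B[K,N].
graph = helper.make_graph(
    [helper.make_node("MatMul", ["A", "B"], ["C"])],
    "matmul_helloworld",
    [helper.make_tensor_value_info("A", TensorProto.FLOAT, [M, K]),
     helper.make_tensor_value_info("B", TensorProto.FLOAT, [K, N])],
    [helper.make_tensor_value_info("C", TensorProto.FLOAT, [M, N])],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 17)])
onnx.checker.check_model(model)

a = np.random.rand(M, K).astype(np.float32)
b = np.random.rand(K, N).astype(np.float32)

def run(providers):
    sess = ort.InferenceSession(model.SerializeToString(), providers=providers)
    return sess.run(["C"], {"A": a, "B": b})[0]

cpu_out = run(["CPUExecutionProvider"])
# "EtGlowExecutionProvider" is an assumed name; falls back to CPU if absent.
et_out = run(["EtGlowExecutionProvider", "CPUExecutionProvider"])

# Validate accelerator results against the CPU reference.
np.testing.assert_allclose(cpu_out, et_out, rtol=1e-3, atol=1e-3)
print("CPU and accelerator results match")
```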
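
On the "dynamic ONNX dimension fixing" point: llm-kvc.py fixes dynamic dimensions for EtGlow compatibility, i.e. rewrites symbolic dimensions such as batch size or sequence length into concrete values before the model reaches the accelerator. A generic sketch using the standard onnx protobuf API; the file name and the dimension names/sizes are placeholders:

```python
import onnx

FIXED_DIMS = {"batch_size": 1, "sequence_length": 256}  # placeholder values

model = onnx.load("model.onnx")  # placeholder path
for value_info in list(model.graph.input) + list(model.graph.output):
    for dim in value_info.type.tensor_type.shape.dim:
        if dim.dim_param in FIXED_DIMS:
            # Assigning dim_value replaces the symbolic dim_param
            # (they share a protobuf oneof field).
            dim.dim_value = FIXED_DIMS[dim.dim_param]
onnx.checker.check_model(model)
onnx.save(model, "model_fixed.onnx")
```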
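
And a sketch of the KV-cache decode loop that llm-kvc.py builds on: each step feeds only the newly generated token plus the cached key/value tensors from earlier steps, so attention over the prompt is never recomputed. The input/output naming (input_ids, past_key_values.<i>.key/value, present.<i>.key/value) is a common ONNX LLM export convention and an assumption here, as are the shapes; real exports usually also take attention_mask and position_ids, omitted for brevity.

```python
import numpy as np
import onnxruntime as ort

N_LAYERS, N_KV_HEADS, HEAD_DIM = 32, 8, 128  # LLaMA-3 8B-style KV shapes (assumed)

sess = ort.InferenceSession("llama3-8b-awq.onnx",  # hypothetical model path
                            providers=["CPUExecutionProvider"])

past = [f"past_key_values.{i}.{p}" for i in range(N_LAYERS) for p in ("key", "value")]
present = [f"present.{i}.{p}" for i in range(N_LAYERS) for p in ("key", "value")]

def generate(prompt_ids, max_new_tokens=32):
    # Start with zero-length caches; the first run processes the whole prompt.
    cache = {name: np.zeros((1, N_KV_HEADS, 0, HEAD_DIM), np.float16) for name in past}
    ids = np.array([prompt_ids], dtype=np.int64)
    tokens = list(prompt_ids)
    for _ in range(max_new_tokens):
        outs = sess.run(["logits"] + present, {"input_ids": ids, **cache})
        # KV-cache handoff: this step's "present" tensors become the next
        # step's "past", so earlier tokens are never re-attended from scratch.
        cache = dict(zip(past, outs[1:]))
        tokens.append(int(np.argmax(outs[0][0, -1])))   # greedy next token
        ids = np.array([[tokens[-1]]], dtype=np.int64)  # feed only the new token
    return tokens
```

llm-kvc.py layers IO bindings on top of this handoff so the present-to-past reuse is zero-copy, per the feature list above.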

Technologies Used:

  • Docker for containerization
  • ONNX for model format and inference
  • ET SoC1 accelerator hardware
  • Conan package management
  • Python for model utilities
  • Shell scripting for automation

Purpose:

This repository is a development toolkit for working with ET accelerators and ONNX inference, providing:

  1. Container infrastructure for reproducible environments
  2. Workflow automation for model deployment
  3. Testing utilities for accelerator performance
  4. Integration examples with various inference servers

About

Random hacks of kindness
