This repository is focused on ET accelerator development and ONNX inference optimization.
Most of what's here will (or should) eventually move into its own repository or into another maintained repository.
Here's what it contains:
- Apache 2.0 License - Open source friendly
- Two main directories:
- `Dockerfile.et` - ET-specific container build
  - Based on `ghcr.io/nekkoai/et-sw-sdk-develop-stack`
  - Sets up environment variables for ET hardware
  - Includes device mapping for `/dev/et0_mgmt` and `/dev/et0_ops` (see the launch sketch below)
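The device mapping is the piece most likely to trip people up: the container only sees the accelerator if both device nodes are passed through at `docker run` time. A minimal launch-helper sketch, assuming the device paths above; the image tag `et-dev:latest` is a placeholder, not a name from this repo:

```python
# Hypothetical launcher for an image built from Dockerfile.et.
import pathlib
import subprocess

DEVICES = ["/dev/et0_mgmt", "/dev/et0_ops"]

def run_et_container(image: str = "et-dev:latest") -> None:
    # Fail early if the host doesn't expose the ET SoC1 device nodes.
    missing = [d for d in DEVICES if not pathlib.Path(d).exists()]
    if missing:
        raise RuntimeError(f"ET device nodes missing on host: {missing}")
    cmd = ["docker", "run", "--rm", "-it"]
    for dev in DEVICES:
        cmd += ["--device", dev]   # pass the accelerator nodes into the container
    subprocess.run(cmd + [image], check=True)

if __name__ == "__main__":
    run_et_container()
```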
- `README.md` - Comprehensive container usage guide
  - Instructions for running with and without ET SoC1 devices
  - Docker volume management for models and datasets
  - Examples using `ghcr.io/nekkoai/onnx-eis` containers
  - Full workflow for ResNet50 + ImageNet2012 inference (see the inference sketch below)
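Once the model and dataset volumes are mounted, the inference half of that workflow boils down to a few lines of onnxruntime. A hedged sketch, assuming the model sits at `/models/resnet50.onnx` inside the container and that the provider registers as `EtGlowExecutionProvider` (both assumptions, not verified against the README):

```python
# Minimal ResNet50 inference sketch; paths and provider name are assumptions.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "/models/resnet50.onnx",
    providers=["EtGlowExecutionProvider", "CPUExecutionProvider"])

# Stand-in input; a real run would feed preprocessed ImageNet2012 images.
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
logits = sess.run(None, {sess.get_inputs()[0].name: batch})[0]
print("Top-1 class:", int(np.argmax(logits)))
```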
- `examples.sh` - Command examples for various inference tools:
  - `ONNXModelRunner` for model execution
  - `InferenceServer` for server-based inference
  - `InferenceServerApp` and client tools
  - LLaMA model examples with Vicuna-7B
- `art2dock.sh` - Artifact-to-Docker utility
  - Creates containers from datasets, models, or bundles
  - Pushes to the `ghcr.io/nekkoai` registry (the tag-and-push pattern is sketched below)
  - Supports tagging and versioning
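At its core, publishing an artifact image is a tag-and-push against the registry. A sketch of that pattern; the helper name and example tag are hypothetical, not art2dock.sh's actual interface:

```python
# Hypothetical helper mirroring the tag-and-push pattern art2dock.sh automates.
import subprocess

def push_artifact_image(local_tag: str, version: str,
                        registry: str = "ghcr.io/nekkoai") -> str:
    remote = f"{registry}/{local_tag}:{version}"
    subprocess.run(["docker", "tag", local_tag, remote], check=True)  # retag locally
    subprocess.run(["docker", "push", remote], check=True)            # push to ghcr.io
    return remote

# e.g. push_artifact_image("imagenet2012-val", "v1")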
- `packit.sh` - Package extraction tool
  - Extracts Conan packages for container builds (see the sketch below)
  - Handles dependencies and BOM (Bill of Materials)
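Conan itself can do the extraction step: a deployer copies a package's binaries out of the local cache into a plain directory that a Dockerfile can COPY from. A sketch assuming Conan 2.x and its built-in full_deploy deployer; the package reference is a placeholder, and packit.sh's actual flags may differ:

```python
# Hypothetical sketch: deploy a Conan package's files to ./deploy for a
# container build, then dump the dependency graph as a rough BOM.
import subprocess

REF = "et-onnxruntime/1.0"   # placeholder package reference

# Copy the package binaries out of the Conan cache into ./deploy.
subprocess.run(["conan", "install", f"--requires={REF}",
                "--deployer=full_deploy", "--output-folder=deploy"], check=True)

# Record the resolved dependency graph as a BOM stand-in.
with open("bom.json", "w") as f:
    subprocess.run(["conan", "graph", "info", f"--requires={REF}",
                    "--format=json"], check=True, stdout=f)
```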
- `Dockerfile` - Custom container build with et-onnxruntime integration
  - Multi-stage build using the et-sw-sdk-develop-stack base
  - Extracts the et-onnxruntime package via packit.sh
  - Configures PYTHONPATH for EtGlow provider support (a quick verification snippet follows)
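A quick way to confirm the PYTHONPATH wiring worked inside the container is to ask onnxruntime which providers it sees; the provider name below is assumed from this repo's EtGlow naming, so adjust it to whatever your build registers:

```python
# Sanity check that et-onnxruntime is importable and exposes its provider.
import onnxruntime as ort

providers = ort.get_available_providers()
print("Available providers:", providers)
assert "EtGlowExecutionProvider" in providers, "et-onnxruntime not on PYTHONPATH?"
```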
- `accelerator-container-setup.md` - Container usage documentation
  - Complete setup instructions for ET accelerator access
  - Device mounting and environment configuration
  - Troubleshooting guide and best practices
- `matmul_helloworld.py` - Matrix multiplication ONNX model generator with ETGlow support (sketched below)
  - Creates custom MatMul models with configurable dimensions
  - Includes CPU vs ETGlow performance comparison
  - Demonstrates minimal ETGlow configuration requirements
  - Validates results between CPU and accelerator execution
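The core of such a generator fits on a page. A minimal sketch of the same idea, assuming the provider registers as `EtGlowExecutionProvider`; on a machine without the accelerator the second session will fail at creation, which is where the script's CPU-only path comes in:

```python
# Build a single-MatMul ONNX model, then time CPU vs EtGlow and compare results.
import time
import numpy as np
import onnx
from onnx import helper, TensorProto
import onnxruntime as ort

def make_matmul_model(m: int, k: int, n: int, path: str = "matmul.onnx") -> str:
    A = helper.make_tensor_value_info("A", TensorProto.FLOAT, [m, k])
    B = helper.make_tensor_value_info("B", TensorProto.FLOAT, [k, n])
    Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [m, n])
    node = helper.make_node("MatMul", ["A", "B"], ["Y"])
    graph = helper.make_graph([node], "matmul_helloworld", [A, B], [Y])
    model = helper.make_model(graph)
    onnx.checker.check_model(model)
    onnx.save(model, path)
    return path

def run(path: str, providers: list[str], a, b):
    sess = ort.InferenceSession(path, providers=providers)
    start = time.perf_counter()
    (y,) = sess.run(None, {"A": a, "B": b})
    return y, time.perf_counter() - start

if __name__ == "__main__":
    path = make_matmul_model(1024, 1024, 1024)
    a = np.random.rand(1024, 1024).astype(np.float32)
    b = np.random.rand(1024, 1024).astype(np.float32)
    y_cpu, t_cpu = run(path, ["CPUExecutionProvider"], a, b)
    y_et, t_et = run(path, ["EtGlowExecutionProvider", "CPUExecutionProvider"], a, b)
    np.testing.assert_allclose(y_cpu, y_et, rtol=1e-3)  # validate CPU vs accelerator
    print(f"CPU {t_cpu:.4f}s vs EtGlow {t_et:.4f}s")
```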
- `llm-kvc.py` - Advanced LLM inference with Key-Value Cache optimization ⭐
  - Production-ready LLM inference script with ETGlow acceleration
  - Supports AWQ INT4 quantized models (tested with LLaMA-3 8B)
  - Advanced features (the first two are sketched below):
    - Dynamic ONNX dimension fixing for ETGlow compatibility
    - IO bindings for zero-copy memory operations
    - KV-cache management with efficient memory reuse
    - Comprehensive provider configuration and optimization
    - Performance profiling and perplexity calculation
  - Proven results: 4.5x speedup (1.16 → 5.22 tokens/s) over CPU
  - Includes CPU fallback and result validation
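First, dimension fixing: ahead-of-time compilers like ETGlow typically want static shapes, so symbolic dims are pinned to concrete values before the session is created. A minimal version of that idea; the dim names and file paths here are assumptions, not llm-kvc.py's actual ones:

```python
# Pin symbolic input/output dims (e.g. "batch_size", "sequence_length") to
# fixed values so the graph compiles with static shapes.
import onnx

def fix_dynamic_dims(path_in: str, path_out: str, overrides: dict[str, int]) -> None:
    model = onnx.load(path_in)
    for vi in list(model.graph.input) + list(model.graph.output):
        for dim in vi.type.tensor_type.shape.dim:
            if dim.dim_param in overrides:
                dim.dim_value = overrides[dim.dim_param]  # replaces the symbolic dim
    onnx.checker.check_model(model)
    onnx.save(model, path_out)

# Placeholder file names and dim names:
fix_dynamic_dims("llama3-awq.onnx", "llama3-awq-fixed.onnx",
                 {"batch_size": 1, "sequence_length": 2048})
```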
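Second, the IO-binding pattern: instead of passing numpy arrays through `run()` on every decode step (which copies each KV tensor), the cache tensors are bound once as OrtValues and ping-ponged between past and present across steps. A sketch under assumed tensor names (`past_key_values.*` / `present.*`, the common ONNX LLM export convention) and an assumed cache shape:

```python
# IO-binding + KV-cache sketch; names, shapes, and layer count are assumptions,
# not taken from llm-kvc.py.
import numpy as np
import onnxruntime as ort

NUM_LAYERS = 32                  # assumed for LLaMA-3 8B
KV_SHAPE = (1, 8, 2048, 128)     # (batch, kv_heads, max_seq, head_dim): assumed

sess = ort.InferenceSession(
    "llama3-awq-fixed.onnx",     # placeholder path
    providers=["EtGlowExecutionProvider", "CPUExecutionProvider"])

def kv_buffers() -> dict:
    """One persistent OrtValue per layer/key/value tensor."""
    return {(l, k): ort.OrtValue.ortvalue_from_numpy(
                np.zeros(KV_SHAPE, dtype=np.float16))
            for l in range(NUM_LAYERS) for k in ("key", "value")}

past, present = kv_buffers(), kv_buffers()
binding = sess.io_binding()

def bind_step(input_ids: np.ndarray) -> None:
    binding.clear_binding_inputs()
    binding.clear_binding_outputs()
    binding.bind_cpu_input("input_ids", input_ids)  # masks etc. omitted for brevity
    for (l, k), buf in past.items():
        binding.bind_ortvalue_input(f"past_key_values.{l}.{k}", buf)
    for (l, k), buf in present.items():
        binding.bind_ortvalue_output(f"present.{l}.{k}", buf)
    binding.bind_output("logits")                   # let ORT allocate the logits

# Per decode step: bind, run, then swap the buffers so the freshly written
# "present" tensors become the next step's "past" (memory is reused, not copied):
#   bind_step(np.array([[next_token]], dtype=np.int64))
#   sess.run_with_iobinding(binding)
#   past, present = present, past
```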
Key technologies:
- Docker for containerization
- ONNX for model format and inference
- ET SoC1 accelerator hardware
- Conan package management
- Python for model utilities
- Shell scripting for automation
Overall, this is a development toolkit for working with ET accelerators and ONNX inference, providing:
- Container infrastructure for reproducible environments
- Workflow automation for model deployment
- Testing utilities for accelerator performance
- Integration examples with various inference servers