
Scivity/autonomous-science-stack


🔬 Autonomous Science Stack

A curated list of 150+ tools and libraries for building self-driving laboratories and autonomous research platforms.

From GPU compute to lab hardware, from multi-agent orchestration to experiment verification: everything you need to build systems that do science autonomously.

Maintained by Scivity Labs



Quick Links

Category Description
🧪 Experiment Orchestration & Workflow Schedulers, DAGs, and workflow engines that drive long-running scientific pipelines.
🤖 Multi-Agent Frameworks Libraries for building coordinated agent systems that plan, act, and reason.
🔬 Self-Driving / Autonomous Labs Reference projects and companies operating real autonomous laboratories.
⚙️ Hardware & Lab Automation Drivers, protocols, and robotics frameworks that bridge code to physical instruments.
🖥️ GPU Compute Platforms On-demand and reserved GPU infrastructure for training and inference.
📊 ML Experiment Tracking Metrics, artifact, and hyperparameter tracking for ML workflows.
🔍 Verification & Reproducibility Data versioning, validation, and environment-pinning tools.
📚 Scientific Knowledge Management Literature APIs, reference managers, and citation graph tools.
📄 Scientific Data Extraction Parsers for PDFs, tables, equations, and scientific figures.
🧠 LLM for Science Models and agent systems purpose-built for scientific discovery.
🧬 Scientific Simulation & Modeling Physics-based simulators used as in-silico filters in autonomous-science loops.
🎯 Bayesian Optimization & Active Learning Decision core of closed-loop autonomous experimentation; picks the next experiment under a budget.
🔗 RL for Scientific Discovery Reinforcement learning libraries applicable to scientific search problems.
🗄️ Vector Databases & Embeddings Stores and models for semantic retrieval over scientific corpora.
📡 Data Pipeline & Event Systems Message brokers and streaming systems for lab and compute events.
🛡️ Safety & Guardrails for Autonomous Systems Policy enforcement, prompt-injection defense, and output validation.
📈 Monitoring & Observability Metrics, logs, traces, and LLM-specific observability.

🧪 Experiment Orchestration & Workflow

Orchestrators schedule long-running, failure-prone scientific workflows: think multi-day simulations, instrument sweeps, and data ingestion pipelines. Good ones handle retries, artifact passing, and dynamic task graphs without forcing you into a rigid DSL. Pick one that matches your infra: Kubernetes-native, Python-native, or bioinformatics-specialized.

Library Description Link
Prefect Python workflow orchestrator with dynamic task graphs, typed results, and work-pool-based deployment. github.com/PrefectHQ/prefect
Apache Airflow DAG-based scheduler with a large operator ecosystem, used widely for batch data pipelines. github.com/apache/airflow
Dagster Asset-oriented orchestrator with typed inputs and outputs, software-defined assets, and testable pipelines. github.com/dagster-io/dagster
Nextflow Dataflow workflow language widely used in bioinformatics, with first-class HPC and container support. github.com/nextflow-io/nextflow
Snakemake Python-like rules engine for reproducible scientific workflows, common in genomics. github.com/snakemake/snakemake
Metaflow Netflix's human-centric framework for data science workflows with versioning and AWS/K8s backends. github.com/Netflix/metaflow
Argo Workflows Kubernetes-native workflow engine where each step runs in its own container. github.com/argoproj/argo-workflows
Luigi Spotify's Python module for building complex batch pipelines with dependency resolution. github.com/spotify/luigi
Kedro Python framework for reproducible, modular data science pipelines with a catalog abstraction. github.com/kedro-org/kedro
Flyte Kubernetes-native orchestrator for typed, reproducible ML and data pipelines. github.com/flyteorg/flyte
Temporal Durable execution platform for long-running workflows with retries and versioning as primitives. github.com/temporalio/temporal
Kubeflow Pipelines ML workflow system for portable, scalable pipelines on Kubernetes. github.com/kubeflow/pipelines
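
To make "retries" and "artifact passing" concrete, here is a minimal standard-library sketch of what an orchestrator does at its core. It is not how any of the tools above are implemented; `run_dag`, the task names, and the simulated transient failure are all hypothetical.

```python
from graphlib import TopologicalSorter


def run_dag(tasks, deps, max_retries=2):
    """Run tasks in dependency order, retrying each failed task.

    tasks: name -> callable(upstream_results: dict) -> result
    deps:  name -> set of upstream task names
    """
    results = {}
    for name in TopologicalSorter(deps).static_order():
        # Artifact passing: each task only sees its declared upstream results.
        upstream = {d: results[d] for d in deps.get(name, ())}
        for attempt in range(max_retries + 1):
            try:
                results[name] = tasks[name](upstream)
                break
            except Exception:
                if attempt == max_retries:
                    raise  # retries exhausted: surface the failure
    return results


# Hypothetical pipeline; "simulate" fails once, then succeeds on retry.
calls = {"simulate": 0}


def simulate(upstream):
    calls["simulate"] += 1
    if calls["simulate"] == 1:
        raise RuntimeError("transient instrument timeout")
    return upstream["prepare"] * 2


tasks = {
    "prepare": lambda up: 21,
    "simulate": simulate,
    "report": lambda up: f"result={up['simulate']}",
}
deps = {"prepare": set(), "simulate": {"prepare"}, "report": {"simulate"}}

results = run_dag(tasks, deps)
print(results["report"])  # result=42
```

Real orchestrators add persistence, scheduling, and backoff on top of this loop, which is most of the reason to adopt one rather than roll your own.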

🤖 Multi-Agent Frameworks

Multi-agent frameworks coordinate LLM-driven workers that plan, call tools, and hand off tasks. In an autonomous-science context they sit between the orchestrator and the lab, turning a research goal into concrete experiments. The tradeoff is control vs. flexibility: graph-based frameworks are easier to debug, role-based ones move faster.

Library Description Link
LangGraph Graph-based agent runtime from LangChain with checkpoints, streaming, and human-in-the-loop primitives. github.com/langchain-ai/langgraph
CrewAI Role-based multi-agent framework that assembles specialized agents into crews with shared tasks. github.com/crewAIInc/crewAI
AutoGen Microsoft's framework for multi-agent conversations with pluggable model backends and code execution. github.com/microsoft/autogen
Swarms Multi-agent orchestration framework with hierarchical, concurrent, and mixture-of-agents structures. github.com/kyegomez/swarms
CAMEL Research-oriented multi-agent framework focused on role-playing agents and agent-scaling studies. github.com/camel-ai/camel
Agno Stateless Python agent runtime with memory, knowledge, and a focus on low-latency instantiation. github.com/agno-agi/agno
smolagents Hugging Face's minimal agent library centered on code-writing agents. github.com/huggingface/smolagents
PydanticAI Typed agent framework that uses Pydantic schemas for inputs, outputs, and tool calls. github.com/pydantic/pydantic-ai
OpenAI Agents SDK OpenAI's Python SDK for building agents with handoffs, guardrails, and tracing. github.com/openai/openai-agents-python
Claude Agent SDK Anthropic's SDK for building agents on top of Claude with hooks, subagents, and custom tools. github.com/anthropics/claude-agent-sdk-python
BeeAI Framework IBM-backed framework for building agents in Python and TypeScript with MCP and A2A support. github.com/i-am-bee/beeai-framework
Atomic Agents Lightweight framework built around typed input/output schemas and composable atomic agents. github.com/BrainBlend-AI/atomic-agents
Langroid Multi-agent Python framework with first-class tools, vector stores, and message-provenance logging. github.com/langroid/langroid
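
One reason graph-based runtimes are easier to debug is that every handoff is an explicit edge you can log. The sketch below illustrates that idea in plain Python; it does not use the API of any framework listed here, and `run_graph`, the node functions, and the router are hypothetical stubs standing in for LLM or tool calls.

```python
def run_graph(nodes, router, state, start, max_steps=10):
    """Walk a graph of agent steps until the router returns None."""
    current, trace = start, []
    for _ in range(max_steps):
        trace.append(current)            # explicit, inspectable execution path
        state = nodes[current](state)    # each node returns a new state dict
        current = router(current, state)
        if current is None:
            break
    return state, trace


# Stub "agents": a real system would call a model or a tool in each one.
def plan(state):
    return {**state, "plan": f"measure {state['goal']}"}


def act(state):
    return {**state, "attempts": state.get("attempts", 0) + 1}


def review(state):
    return {**state, "approved": state["attempts"] >= 2}


def router(node, state):
    if node == "plan":
        return "act"
    if node == "act":
        return "review"
    # After review: loop back to "act" until the result is approved.
    return None if state["approved"] else "act"


nodes = {"plan": plan, "act": act, "review": review}
final, trace = run_graph(nodes, router, {"goal": "pH"}, "plan")
print(trace)  # ['plan', 'act', 'review', 'act', 'review']
```

The `max_steps` cap is a deliberate design choice: a review loop that never approves should stop on its own instead of spending lab or API budget indefinitely.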

🔬 Self-Driving / Autonomous Labs

These are operational self-driving laboratories and companies: not libraries, but reference points for what a working autonomous lab looks like. Some publish papers, some ship APIs, a few run cloud labs that anyone can submit experiments to. Study their architectures when designing your own.

Project Description Link
A-Lab (LBNL / Berkeley) Autonomous lab for inorganic solid-state synthesis combining robotics, ML, and literature data from the Ceder group. ceder.berkeley.edu
Emerald Cloud Lab Commercial cloud lab in Austin with 200+ instruments controlled remotely via a Wolfram-based command language. emeraldcloudlab.com
Strateos Automation-as-a-service cloud lab for drug discovery with programmatic APIs and hybrid on-prem deployments. strateos.com
Chemify Glasgow-based Chemputation facility that compiles digital code into physical organic-chemistry syntheses. chemify.io
Arctoris Oxford-based fully automated drug discovery platform running assay cascades on the Ulysses robotic system. arctoris.com
LabGenius London-based closed-loop platform for therapeutic antibody discovery built around the EVA robot. labgeniustx.com
Kebotix Boston-area self-driving lab for materials discovery combining generative models with robotic synthesis. kebotix.com
Atinary Bayesian-optimization and SDLabs software for closed-loop R&D, integrable with external automation. atinary.com
PNNL Autonomous Science DOE program applying autonomous-lab methods across chemistry, biology, and energy storage at PNNL. pnnl.gov/autonomous-science
Lila Sciences Flagship Pioneering company building closed-loop AI Science Factories across life, chemical, and materials science. lila.ai
Argonne Polybot Autonomous robotic platform for electronic-polymer discovery at Argonne's Center for Nanoscale Materials. anl.gov polybot

⚙️ Hardware & Lab Automation

Everything physical bolts onto this layer: pipetting robots, plate readers, mass specs, flow reactors. The libraries here speak protocols (OPC UA, MQTT, VISA, SiLA2) or wrap vendor SDKs. Expect to write glue code; there is no universal driver.

Library Description Link
opcua-asyncio Asyncio-based OPC UA client and server for Python, common in industrial and lab PLC integration. github.com/FreeOpcUa/opcua-asyncio
PyVISA Python bindings to the VISA standard for controlling test and measurement instruments over GPIB, USB, or serial. github.com/pyvisa/pyvisa
paho-mqtt (Python) Eclipse Paho MQTT client for Python, used for lightweight pub/sub between lab devices and controllers. github.com/eclipse/paho.mqtt.python
Eclipse Mosquitto Open-source MQTT broker often deployed as the message bus inside automated labs. github.com/eclipse/mosquitto
LabVIEW National Instruments' graphical programming environment for instrument control and DAQ. ni.com/labview
Opentrons Python API and firmware for the OT-2 and Flex liquid-handling robots. github.com/Opentrons/opentrons
PyLabRobot Hardware-agnostic Python SDK for liquid handlers, plate readers, heater-shakers, and scales. github.com/PyLabRobot/pylabrobot
SiLA 2 Python Python implementation of the SiLA 2 gRPC-based lab instrument interoperability standard. gitlab.com/SiLA2/sila_python
Labman Automation Commercial systems integrator that built Berkeley's A-Lab synthesis robot and similar custom platforms. labmanautomation.com
ROS 2 Robot Operating System with DDS-based middleware for real-time control of robotic cells and mobile platforms. github.com/ros2
Bluesky Python experiment-control framework for synchrotrons and scattering beamlines, used across NSLS-II. github.com/bluesky/bluesky
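
Because there is no universal driver, glue code usually starts with a small interface of your own that each vendor wrapper implements. The following is a sketch under that assumption; `Instrument`, `FakeThermometer`, and `average_reading` are hypothetical names, and the fake device stands in for a real driver so the example runs without hardware.

```python
from typing import Protocol


class Instrument(Protocol):
    """Minimal interface the rest of the lab code depends on."""

    def read(self) -> float: ...


class FakeThermometer:
    """Stand-in for a vendor SDK or protocol wrapper, for tests and dry runs."""

    def __init__(self, readings):
        self._readings = iter(readings)

    def read(self) -> float:
        return next(self._readings)


def average_reading(instrument: Instrument, n: int) -> float:
    """Works with any object that has a compatible read() method."""
    return sum(instrument.read() for _ in range(n)) / n


avg = average_reading(FakeThermometer([20.0, 21.0, 22.0]), 3)
print(avg)  # 21.0
```

Keeping experiment logic behind an interface like this also lets an agent rehearse a protocol against fakes before it is allowed to touch a real instrument.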

🖥️ GPU Compute Platforms

On-demand GPU platforms spare you from standing up Kubernetes just to run a fine-tune or a large batch inference job. They differ on cold-start latency, GPU availability (H100, B200, MI300), and whether they expose raw VMs or serverless functions. For autonomous-science pipelines, the serverless models integrate cleanly into workflow engines.

Platform Description Link
RunPod Community and secure GPU cloud with on-demand pods and serverless endpoints that scale to zero. runpod.io
Modal Serverless Python platform that provisions GPU containers from decorators in under a second. modal.com
Lambda GPU cloud focused on H100 and Blackwell clusters with pre-installed ML frameworks and bare metal. lambda.ai
Vast.ai Marketplace for renting community GPUs at spot-style prices. vast.ai
CoreWeave Hyperscale NVIDIA-specialized cloud offering H100/B200 clusters and high-throughput networking. coreweave.com
Paperspace (DigitalOcean) GPU notebooks and droplets folded into DigitalOcean's AI platform. paperspace.com
Lightning AI Studio-based cloud workspaces on H100/H200 with persistent environments and multi-GPU training. lightning.ai
Anyscale Managed Ray platform for distributed training, batch inference, and serving on GPU clusters. anyscale.com
Together AI GPU cloud and inference API focused on open models, fine-tuning, and low-latency serving. together.ai
Replicate Hosted inference for open-source models with a simple HTTP API and per-second billing. replicate.com

📊 ML Experiment Tracking

Tracking tools log metrics, parameters, artifacts, and code versions so experiments are reproducible and comparable. For autonomous science, the key feature is headless logging from long-running agents, not just interactive notebooks. Many of these integrate with the orchestrators listed above.

Library Description Link
MLflow Open-source tracking, model registry, and deployment framework with a large integration ecosystem. github.com/mlflow/mlflow
Weights & Biases Hosted tracking, artifact, and sweep service with deep SDK integration across frameworks. github.com/wandb/wandb
Neptune.ai Tracker built for foundation-model training with per-layer metrics and long-run logging. github.com/neptune-ai/neptune-client
Comet Experiment tracking and LLM evaluation platform with open-source Opik for LLM observability. comet.com
Aim Self-hosted open-source tracker with a fast UI for comparing thousands of runs. github.com/aimhubio/aim
ClearML End-to-end platform spanning experiment tracking, data management, agents, and serving. github.com/clearml/clearml
Sacred Lightweight Python library for configuring, organizing, and logging reproducible experiments. github.com/IDSIA/sacred
DVC Git-based data and model versioning with experiment tracking and pipeline DAGs. github.com/iterative/dvc
TensorBoard TensorFlow's local metric and graph visualization tool, also used with PyTorch and JAX. github.com/tensorflow/tensorboard
Optuna Hyperparameter optimization framework with pruning, distributed trials, and tracker integrations. github.com/optuna/optuna
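
"Headless logging" means a run can record parameters and metrics with no UI or notebook attached. The sketch below shows the idea with an append-only JSONL file; it is not the API of any tracker above, and `RunLogger`, `read_metric`, the run file name, and the logged values are hypothetical.

```python
import json
import tempfile
import time
from pathlib import Path


class RunLogger:
    """Append-only JSONL logger that needs no UI or notebook session."""

    def __init__(self, path, params):
        self.path = Path(path)
        self._write({"type": "params", "params": params})

    def _write(self, record):
        record["ts"] = time.time()  # wall-clock timestamp for later ordering
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def log_metric(self, name, value, step):
        self._write({"type": "metric", "name": name, "value": value, "step": step})


def read_metric(path, name):
    """Return the logged values of one metric, in file order."""
    with Path(path).open(encoding="utf-8") as f:
        records = [json.loads(line) for line in f]
    return [r["value"] for r in records if r["type"] == "metric" and r["name"] == name]


with tempfile.TemporaryDirectory() as tmp:
    run_file = Path(tmp) / "run-0001.jsonl"  # hypothetical run ID
    logger = RunLogger(run_file, {"temperature_c": 80, "catalyst": "Pd"})
    for step, y in enumerate([0.41, 0.55, 0.62]):
        logger.log_metric("yield", y, step)
    yields = read_metric(run_file, "yield")

print(yields)  # [0.41, 0.55, 0.62]
```

Opening the file per write is slow but means a crashed agent loses at most the record it was writing, which matters more than throughput for multi-day runs.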

🔍 Verification & Reproducibility

Autonomous systems generate more data, configurations, and artifacts than humans can audit. These tools version datasets, validate schemas, check distributions, and pin environments so a run from six months ago still reproduces. Pair a data-versioning tool with a validation framework; each alone is half the story.

Library Description Link
Pachyderm Kubernetes-native pipelines with immutable data versioning and lineage across pipeline stages. github.com/pachyderm/pachyderm
lakeFS Git-like branching and time travel over object stores for data-lake version control. github.com/treeverse/lakeFS
Great Expectations Declarative data-quality framework with expectations, profiling, and automated docs. github.com/great-expectations/great_expectations
Evidently AI Library for data and ML model validation, drift detection, and monitoring reports. github.com/evidentlyai/evidently
Deepchecks Open-source testing for data and ML models covering integrity, drift, and performance. github.com/deepchecks/deepchecks
pytest Python testing framework with fixtures, parametrization, and a plugin ecosystem. github.com/pytest-dev/pytest
Hypothesis Property-based testing for Python that auto-generates edge-case inputs. github.com/HypothesisWorks/hypothesis
Nix Purely functional package manager for reproducible, declarative environments. github.com/NixOS/nix
GNU Guix Reproducible, transactional package manager with a focus on scientific workflows. guix.gnu.org
Pixi Rust-based cross-platform package manager built on the conda ecosystem with workspace lockfiles. github.com/prefix-dev/pixi
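
To illustrate why versioning and validation belong together, here is a minimal sketch that fingerprints a dataset and checks it against a schema. It is a toy stand-in for the tools above, not their API; `fingerprint`, `validate`, the schema format, and the sample rows are all assumptions.

```python
import hashlib
import json


def fingerprint(rows):
    """Content hash of a dataset: same data gives the same ID, regardless of key order."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate(rows, schema):
    """Check each field's type and range; schema maps field -> (type, low, high)."""
    errors = []
    for i, row in enumerate(rows):
        for field, (typ, lo, hi) in schema.items():
            value = row.get(field)
            if not isinstance(value, typ):
                errors.append(f"row {i}: {field} is not {typ.__name__}")
            elif not lo <= value <= hi:
                errors.append(f"row {i}: {field}={value} outside [{lo}, {hi}]")
    return errors


SCHEMA = {"temp_c": (float, -80.0, 200.0), "ph": (float, 0.0, 14.0)}
rows = [{"temp_c": 25.0, "ph": 7.1}, {"temp_c": 25.0, "ph": 15.2}]

version = fingerprint(rows)
errors = validate(rows, SCHEMA)
print(version[:12], errors)
```

The hash tells you which data a result came from; the validation tells you whether that data was worth using. A versioned dataset with an impossible pH is reproducible and still wrong.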

📚 Scientific Knowledge Management

Before running an experiment, an autonomous agent needs to know what's already been tried. This category covers literature APIs, citation graphs, and reference managers agents can query programmatically. Coverage, freshness, and rate limits vary, so most real systems combine two or three sources.

Tool Description Link
Semantic Scholar API Free API from Allen AI over hundreds of millions of papers, with paper search, citations, and recommendations. api.semanticscholar.org
arXiv API Official query API over arXiv's preprint corpus, returning Atom/XML metadata. info.arxiv.org/help/api
OpenAlex Open catalog of the global research system with a free, no-auth REST API over works, authors, and venues. openalex.org
Zotero Open-source reference manager with a web API, browser connectors, and a Python client. github.com/zotero/zotero
Paperpile Cloud-based reference manager for Google Docs and the web with an API for library access. paperpile.com
Connected Papers Graph-based visual explorer that surfaces similar papers via co-citation and bibliographic coupling. connectedpapers.com
Elicit AI research assistant with paper search, data extraction, and systematic-review workflows. elicit.com
Consensus Evidence-focused search engine over peer-reviewed literature with a Consensus Meter for agreement. consensus.app
ScholarAI Research assistant over 200M+ papers and patents with citation generation and Zotero sync. scholarai.io
Scite Smart Citations platform classifying citations as supporting, contrasting, or mentioning. scite.ai
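
Combining sources means reconciling duplicate records. A common approach is to key on a normalized DOI and fill gaps from each source in priority order; the sketch below assumes that approach. The record shapes, field names, and DOIs are made up for illustration and do not reflect any API's real response format.

```python
def normalize_doi(doi):
    """Lowercase a DOI and strip common URL or scheme prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def merge_records(*sources):
    """Merge records by DOI; earlier sources win, later ones fill missing fields."""
    merged = {}
    for source in sources:
        for rec in source:
            key = normalize_doi(rec["doi"])
            entry = merged.setdefault(key, {"doi": key})
            for field, value in rec.items():
                if field != "doi" and value is not None:
                    entry.setdefault(field, value)
    return list(merged.values())


# Hypothetical results from two literature sources.
source_a = [{"doi": "10.1234/example.1", "title": "Example paper", "citations": None}]
source_b = [
    {"doi": "https://doi.org/10.1234/EXAMPLE.1", "title": "Example Paper (preprint)", "citations": 12},
    {"doi": "doi:10.1234/example.2", "title": "Another example"},
]

papers = merge_records(source_a, source_b)
print(len(papers))  # 2
```

Records without a DOI (many preprints) need a fallback key such as a normalized title, which this sketch leaves out.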

📄 Scientific Data Extraction

Most scientific knowledge is trapped in PDFs with tables, equations, and figures. These parsers extract structured content: some use layout models, some use LLMs, some are classic rule-based engines tuned for scientific papers. For high-throughput ingestion, benchmark on your actual corpus before committing.

Library Description Link
Docling IBM/LF AI open-source document parser with unified DocTags output for PDFs, slides, and images. github.com/docling-project/docling
PyMuPDF Python bindings to MuPDF for PDF parsing, rendering, and text and image extraction. github.com/pymupdf/PyMuPDF
Crawl4AI LLM-friendly web crawler that outputs clean markdown and structured data for ingestion. github.com/unclecode/crawl4ai
GROBID Java machine-learning library that structures scholarly PDFs into TEI XML with high accuracy on references. github.com/kermitt2/grobid
Marker Fast PDF and office-doc to Markdown/JSON converter with optional LLM-assisted refinement. github.com/datalab-to/marker
Nougat Meta's vision transformer for converting scientific PDFs to Markdown with equation support. github.com/facebookresearch/nougat
tabula-py Maintained Python wrapper around Tabula for extracting tables from text-based PDFs. github.com/chezou/tabula-py
Camelot Python library focused on extracting tables from PDFs with stream and lattice parsers. github.com/camelot-dev/camelot
Unstructured ETL library for partitioning, cleaning, and chunking 25+ document types for LLM pipelines. github.com/Unstructured-IO/unstructured
pdfplumber Pure-Python PDF parser with fine-grained access to characters, tables, and layout metadata. github.com/jsvine/pdfplumber

🧠 LLM for Science

These are models and agent systems specifically designed for scientific reasoning, hypothesis generation, or domain-specific tasks (chemistry, biology, medicine). Some are open weights, some are research prototypes with released code, some are closed APIs. Expect the frontier to shift every few months.

Model / System Description Link
Sakana AI Scientist v2 End-to-end agentic system that generates ideas, runs experiments, and drafts papers via agentic tree search. github.com/SakanaAI/AI-Scientist-v2
FunSearch DeepMind method pairing LLM program search with an evaluator, used to find new cap-set and bin-packing solutions. github.com/google-deepmind/funsearch
ChemCrow LangChain-based chemistry agent with 18 tool integrations for synthesis planning and molecule property lookup. github.com/ur-whitelab/chemcrow-public
Coscientist GPT-4-driven autonomous chemistry agent from the Gomes group, demonstrated on palladium cross-couplings. github.com/gomesgroup/coscientist
DARWIN Open 7B foundation model fine-tuned on physics, chemistry, and materials-science literature. github.com/MasterAI-EAM/Darwin
Galactica Meta's 120B scientific language model (2022); the public demo was withdrawn days after release over hallucination concerns. Kept here as a historical reference, with weights on Hugging Face. github.com/paperswithcode/galai
SciBERT BERT variant trained on 1.14M scientific papers from Semantic Scholar, still used as a scientific-NLP baseline. github.com/allenai/scibert
BioGPT Microsoft biomedical GPT pretrained on PubMed abstracts, available via Hugging Face Transformers. github.com/microsoft/BioGPT
Med-PaLM Google Research family of medical LLMs evaluated on USMLE-style and clinical-reasoning benchmarks. research.google med-palm
ESM3 EvolutionaryScale's multimodal protein model for joint sequence, structure, and function generation. github.com/evolutionaryscale/esm

🧬 Scientific Simulation & Modeling

LLM-driven science agents reach wet labs faster when they pre-screen candidates in silico. The libraries here solve physics-based problems (molecular dynamics, electronic structure, finite-element PDEs, reaction kinetics) cheaply enough to rank thousands of hypotheses before committing reagent or instrument time in an autonomous loop.

Library Description Link
LAMMPS Classical molecular dynamics engine scaling from single workstations to GPU supercomputers. github.com/lammps/lammps
GROMACS Biomolecular MD package optimized for proteins, lipids, and drug-binding free-energy work. github.com/gromacs/gromacs
PySCF Python quantum chemistry package covering Hartree-Fock, DFT, MP2, and coupled cluster. github.com/pyscf/pyscf
Psi4 Open-source quantum chemistry program with a Python API for electronic-structure methods. github.com/psi4/psi4
ASE Python toolkit wrapping 30+ DFT and MD calculators behind a uniform Atoms object. gitlab.com/ase/ase
RDKit Cheminformatics toolkit for molecular descriptors, substructure search, and SMARTS filters. github.com/rdkit/rdkit
OpenMM GPU-accelerated biomolecular MD engine with a first-class Python scripting API. github.com/openmm/openmm
FEniCS / DOLFINx Python FEM framework for solving PDEs via variational forms in UFL. github.com/FEniCS/dolfinx
Cantera Chemical kinetics, thermodynamics, and transport library for reactors and combustion. github.com/Cantera/cantera
deal.II C++ finite element library with Python bindings for adaptive-mesh PDE solvers. github.com/dealii/dealii
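
The in-silico filter pattern reduces to "score everything cheaply, send only the top few to the lab". Here is a minimal sketch of that step; `cheap_score` is a made-up stand-in for a real simulation, and the candidate temperatures and the 350 optimum are arbitrary.

```python
import heapq


def screen(candidates, score, k):
    """Keep the k best candidates according to a cheap in-silico score."""
    return heapq.nlargest(k, candidates, key=score)


def cheap_score(temperature_c):
    """Hypothetical surrogate: pretend predicted yield peaks at 350 degrees C."""
    return -(temperature_c - 350) ** 2


candidates = list(range(300, 401, 10))  # 300, 310, ..., 400
shortlist = screen(candidates, cheap_score, k=3)
print(shortlist[0])  # 350
```

In practice the score would come from one of the simulators above, and `k` would be set by how many physical experiments the budget allows per round.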

🎯 Bayesian Optimization & Active Learning

BO/AL is the decision core of closed-loop autonomous experimentation: given what has been observed so far, pick the next experiment to run under a limited budget. The libraries split along two lines: Gaussian-process-based frameworks (BoTorch, GPyTorch, Emukit, Trieste) that shine on smooth, low-dimensional design spaces, and algorithm-agnostic optimizers (Nevergrad, SMAC3, Hyperopt, Vizier) that scale to high-dimensional or conditional search spaces.

Library Description Link
BoTorch PyTorch library for Bayesian optimization with Monte Carlo acquisition functions. github.com/pytorch/botorch
Ax Meta platform built on BoTorch for adaptive experimentation and closed-loop tuning. github.com/facebook/Ax
GPyTorch Scalable Gaussian processes in PyTorch; the GP backbone underneath BoTorch. github.com/cornellius-gp/gpytorch
Emukit Decision-making toolbox covering BO, experimental design, and sensitivity analysis. github.com/EmuKit/emukit
Hyperopt Distributed hyperparameter optimization using Tree-structured Parzen Estimators. github.com/hyperopt/hyperopt
SMAC3 Sequential model-based algorithm configuration for AutoML and black-box tuning. github.com/automl/SMAC3
Nevergrad Gradient-free optimization platform with evolutionary and population-based algorithms. github.com/facebookresearch/nevergrad
Trieste TensorFlow/GPflow-based BO library with trust-region and multi-fidelity strategies. github.com/secondmind-labs/trieste
Vizier Open-source release of Google's internal black-box optimization service. github.com/google/vizier
scikit-activeml Active learning library with pool-based query strategies on scikit-learn models. github.com/scikit-activeml/scikit-activeml
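
The "pick the next experiment under a budget" loop can be shown without a Gaussian process. The sketch below swaps in the simpler UCB1 bandit rule over a fixed set of discrete conditions; it is not Bayesian optimization proper and not the method of any library above. `TRUE_YIELD` is a made-up, noise-free stand-in for the lab.

```python
import math


def ucb1_choose(counts, means, t, c=2.0):
    """Pick the condition with the highest optimistic estimate (UCB1 rule)."""
    # Try every untested condition once before trusting any estimate.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [m + math.sqrt(c * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def run_campaign(experiment, n_arms, budget):
    """Closed loop: choose a condition, run it, update the running mean."""
    counts, means = [0] * n_arms, [0.0] * n_arms
    for t in range(1, budget + 1):
        arm = ucb1_choose(counts, means, t)
        y = experiment(arm)
        counts[arm] += 1
        means[arm] += (y - means[arm]) / counts[arm]  # incremental mean
    return counts, means


# Hypothetical ground truth: yield of three reaction conditions.
TRUE_YIELD = [0.2, 0.5, 0.8]
counts, means = run_campaign(lambda arm: TRUE_YIELD[arm], n_arms=3, budget=30)
print(counts)
```

The square-root bonus shrinks as a condition is tested more, so the loop keeps revisiting uncertain options while spending most of the budget on the best one. A GP-based acquisition function plays the same role over continuous design spaces.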

🔗 RL for Scientific Discovery

Reinforcement learning fits scientific problems where the agent must choose sequential experiments under uncertainty: active learning, Bayesian optimization loops, molecule design, reaction planning. The libraries below are general-purpose RL frameworks used as substrates in science RL work.

Library Description Link
Gymnasium Maintained fork of OpenAI Gym with the standard RL environment API and a large environment registry. github.com/Farama-Foundation/Gymnasium
Stable-Baselines3 PyTorch implementations of standard RL algorithms with a consistent, production-friendly API. github.com/DLR-RM/stable-baselines3
CleanRL Single-file PyTorch RL implementations used widely for reproducible benchmarks and teaching. github.com/vwxyzjn/cleanrl
TRL Hugging Face library for RLHF, DPO, PPO, and GRPO fine-tuning of language models. github.com/huggingface/trl
Ray RLlib Distributed RL library inside Ray with a large algorithm zoo and offline-RL support. github.com/ray-project/ray
Tianshou PyTorch RL library with modular agents, on-policy and off-policy algorithms, and offline RL. github.com/thu-ml/tianshou
Sample Factory High-throughput async PPO library optimized for single-machine, multi-GPU training. github.com/alex-petrenko/sample-factory
PettingZoo Multi-agent environment API and environment zoo from the Farama Foundation. github.com/Farama-Foundation/PettingZoo
PFRL PyTorch-based deep RL library from Preferred Networks with a suite of modern algorithms. github.com/pfnet/pfrl
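
Framing an experiment as an RL problem mostly means wrapping it in an environment with `reset` and `step`. The sketch below mirrors the return shapes used by Gymnasium (`reset` returns `(observation, info)`, `step` returns `(observation, reward, terminated, truncated, info)`) but imports nothing, so it runs anywhere. `TitrationEnv` and its toy chemistry are hypothetical.

```python
class TitrationEnv:
    """Toy environment: add base until the pH reaches a target."""

    def __init__(self, target=7, max_steps=10):
        self.target = target
        self.max_steps = max_steps
        self.ph = 3
        self.steps = 0

    def reset(self, seed=None):
        self.ph, self.steps = 3, 0
        return self.ph, {}

    def step(self, action):
        # action: 0 = add nothing, 1 = add one unit of base (+1 pH in this toy).
        self.ph += action
        self.steps += 1
        terminated = self.ph == self.target                       # goal reached
        truncated = not terminated and self.steps >= self.max_steps  # out of budget
        reward = 1.0 if terminated else -0.1                      # small cost per step
        return self.ph, reward, terminated, truncated, {}


env = TitrationEnv()
obs, info = env.reset()
total_reward, done = 0.0, False
while not done:
    action = 1 if obs < env.target else 0  # trivial hand-written policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print(obs, env.steps)  # 7 4
```

Separating `terminated` from `truncated` matters for science tasks: running out of reagent budget is not the same outcome as reaching the target, and a learner should not treat them alike.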

🗄️ Vector Databases & Embeddings

Retrieval-augmented agents need a vector store for papers, protocols, lab notes, and prior results. Pick based on scale and operational model: some run embedded, some are managed services, some are Postgres extensions. For embeddings, science-tuned models often beat general-purpose ones on domain corpora.

Tool Description Link
Qdrant Rust-based vector database with filtering, quantization, and a managed cloud option. github.com/qdrant/qdrant
Weaviate Vector database with built-in hybrid search, modules for embeddings, and GraphQL API. github.com/weaviate/weaviate
Milvus High-scale open-source vector database with GPU-accelerated index types. github.com/milvus-io/milvus
Chroma Embedded and server vector database focused on developer ergonomics for RAG apps. github.com/chroma-core/chroma
Pinecone Managed serverless vector database with hybrid search and metadata filtering. pinecone.io
FAISS Meta's library for efficient similarity search and clustering of dense vectors. github.com/facebookresearch/faiss
LanceDB Serverless vector database built on the Lance columnar format for multimodal search. github.com/lancedb/lancedb
pgvector PostgreSQL extension that adds vector types, HNSW/IVF indexing, and ANN search. github.com/pgvector/pgvector
sentence-transformers Python library for dense sentence and passage embeddings on top of Hugging Face models. github.com/UKPLab/sentence-transformers
BAAI bge-m3 Multi-lingual multi-function embedding model that handles dense, sparse, and ColBERT-style retrieval. huggingface.co/BAAI/bge-m3
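
Underneath every vector store is nearest-neighbor search over embeddings. The sketch below does exact, brute-force cosine search in plain Python to show what the tools above approximate at scale; the three-dimensional vectors and document IDs are made up, and real embeddings have hundreds of dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, index, k):
    """Exact search: score every stored vector, return the k closest IDs."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical embeddings for three lab documents.
index = {
    "protocol-a": [1.0, 0.0, 0.0],
    "note-b": [0.0, 1.0, 0.0],
    "paper-c": [0.9, 0.1, 0.0],
}
hits = top_k([1.0, 0.0, 0.0], index, k=2)
print(hits)  # ['protocol-a', 'paper-c']
```

Brute force is fine up to tens of thousands of vectors. Beyond that, the approximate indexes (HNSW, IVF) in the tools above trade a little recall for large speedups.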

📡 Data Pipeline & Event Systems

Lab instruments, compute jobs, and agents emit events that need routing, buffering, and durable replay. These brokers and streaming systems form the nervous system of a distributed autonomous lab. Choose on durability guarantees and operational fit, not raw throughput.

Tool Description Link
NATS High-performance cloud-native messaging system with JetStream for durable streams and KV. github.com/nats-io/nats-server
Apache Kafka Distributed log-based event streaming platform with an enormous connector ecosystem. github.com/apache/kafka
RabbitMQ AMQP-centric broker for traditional messaging patterns like work queues and routing. github.com/rabbitmq/rabbitmq-server
Redis Streams Append-only log data structure built into Redis with consumer groups. redis.io streams
Apache Pulsar Pub/sub and queue system with tiered storage and geo-replication built in. github.com/apache/pulsar
ZeroMQ Embeddable messaging library for low-latency socket-level patterns. github.com/zeromq/libzmq
Celery Python distributed task queue with a broker-based backend, common in lab automation backends. github.com/celery/celery
Apache Flink Stateful stream-processing engine with exactly-once semantics and event-time windows. github.com/apache/flink
Apache Beam Unified programming model for batch and streaming pipelines across multiple runners. github.com/apache/beam
Redpanda Kafka-API-compatible streaming platform written in C++ with no ZooKeeper dependency. github.com/redpanda-data/redpanda
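
"Durable replay" comes down to an append-only log plus a per-consumer offset that only advances on commit. The in-memory sketch below shows that contract; it is not the API of any broker above, it persists nothing to disk, and `EventLog` and the event payloads are hypothetical.

```python
class EventLog:
    """In-memory append-only log with per-consumer committed offsets."""

    def __init__(self):
        self._events = []
        self._offsets = {}

    def publish(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def poll(self, consumer, max_events=10):
        """Return events after the consumer's last committed offset."""
        start = self._offsets.get(consumer, 0)
        return self._events[start:start + max_events]

    def commit(self, consumer, count):
        """Advance the offset only after the events were fully processed."""
        self._offsets[consumer] = self._offsets.get(consumer, 0) + count

    def replay(self, from_offset=0):
        """Re-read history, for example to rebuild state after a bug fix."""
        return list(self._events[from_offset:])


log = EventLog()
for reading in ({"plate": 1, "od600": 0.12}, {"plate": 2, "od600": 0.31}, {"plate": 3, "od600": 0.08}):
    log.publish(reading)

first = log.poll("analyzer", max_events=2)
retry = log.poll("analyzer", max_events=2)  # consumer "crashed" before commit: same batch again
log.commit("analyzer", len(retry))
rest = log.poll("analyzer")
print(len(first), len(rest))  # 2 1
```

Redelivery of the uncommitted batch is at-least-once delivery, so consumers that drive hardware should be idempotent or deduplicate by event ID.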

🛡️ Safety & Guardrails for Autonomous Systems

When agents can call tools that move real money, compounds, or lab hardware, you need input filters, output validators, and policy engines. This category ranges from prompt-injection detectors to structured-output validators to full red-teaming toolkits. Layer several; no single tool covers all failure modes.

Tool Description Link
Guardrails AI Python framework for structured output validation and input/output guards via reusable validators. github.com/guardrails-ai/guardrails
NeMo Guardrails NVIDIA toolkit for adding programmable rails to LLM apps via the Colang dialog language. github.com/NVIDIA/NeMo-Guardrails
LLM Guard Protect AI's prompt and output scanner covering toxicity, PII, secrets, and prompt injection. github.com/protectai/llm-guard
NVIDIA Garak LLM vulnerability scanner that probes for jailbreaks, hallucination, and prompt injection. github.com/NVIDIA/garak
DeepTeam LLM red-teaming framework with 40+ vulnerability scanners and OWASP-aligned test suites. github.com/confident-ai/deepteam
Lakera Guard Commercial real-time API for detecting prompt injection, jailbreaks, and data exfiltration. lakera.ai
Invariant Rule-based contextual guardrails for LLM and MCP tool-calling deployed as a proxy. github.com/invariantlabs-ai/invariant
LlamaFirewall Meta's agent guardrail framework combining PromptGuard 2, alignment checks, and CodeShield. github.com/meta-llama/PurpleLlama
PyRIT Microsoft's Python risk-identification toolkit for automated adversarial testing of generative AI. github.com/microsoft/PyRIT
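
The simplest policy layer is a deterministic check that sits between the agent and the hardware, independent of any model. Here is a minimal sketch of one; the tool names, the `max_volume_ul` limit, and the call format are assumptions for illustration, not any tool's real schema.

```python
# Hypothetical policy: only these tools may be called, with these limits.
POLICY = {
    "dispense": {"max_volume_ul": 1000},
    "read_plate": {},
}


def check_tool_call(call, policy=POLICY):
    """Return (allowed, reason) for a proposed tool call."""
    name, args = call.get("tool"), call.get("args", {})
    if name not in policy:
        return False, f"tool {name!r} is not allowlisted"
    limit = policy[name].get("max_volume_ul")
    if limit is not None:
        volume = args.get("volume_ul")
        # Reject missing, non-numeric, non-positive, or over-limit volumes.
        if isinstance(volume, bool) or not isinstance(volume, (int, float)):
            return False, "volume_ul must be a number"
        if not 0 < volume <= limit:
            return False, f"volume_ul must be in (0, {limit}]"
    return True, "ok"


ok = check_tool_call({"tool": "dispense", "args": {"volume_ul": 50}})
too_much = check_tool_call({"tool": "dispense", "args": {"volume_ul": 5000}})
unknown = check_tool_call({"tool": "open_valve", "args": {}})
print(ok, too_much[0], unknown[0])  # (True, 'ok') False False
```

An allowlist fails closed: a tool the policy has never heard of is rejected by default, which is the behavior you want when the caller is a language model.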

📈 Monitoring & Observability

Beyond traditional infra monitoring, autonomous-science systems need traces across agent steps, tool calls, and LLM responses. The first group here covers generic infra observability; the second is LLM-specific. You want both: one tells you the cluster is healthy, the other tells you the agent is sane.

Tool Description Link
Prometheus Pull-based time-series monitoring system with a powerful query language and alerting. github.com/prometheus/prometheus
Grafana Open-source dashboarding tool that plots metrics, logs, and traces across datasources. github.com/grafana/grafana
Loki Horizontally scalable log aggregation system with a label-based query model similar to Prometheus. github.com/grafana/loki
OpenTelemetry CNCF standard for traces, metrics, and logs with SDKs and collectors across languages. github.com/open-telemetry
Jaeger CNCF distributed tracing platform for investigating latency in microservice architectures. github.com/jaegertracing/jaeger
Helicone Open-source LLM observability and AI gateway with tracing, costs, and prompt management. github.com/Helicone/helicone
LangSmith LangChain's hosted platform for agent tracing, evaluation, and prompt management. langchain.com/langsmith
Arize Phoenix OpenTelemetry-based AI observability with tracing, eval, datasets, and experiments. github.com/Arize-ai/phoenix
Opik Comet's open-source LLM observability tool with tracing, automated evaluations, and dashboards. github.com/comet-ml/opik
Langfuse Self-hostable LLM engineering platform with tracing, prompt management, and datasets. github.com/langfuse/langfuse
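
A trace is a tree of timed spans, one per agent step, model call, or tool call. The sketch below records such a tree with a context manager to show the shape of the data; it is not the OpenTelemetry API or that of any tool above, and `span`, `SPANS`, and the attribute names are hypothetical.

```python
import time
from contextlib import contextmanager

SPANS = []   # finished spans, in completion order
_stack = []  # names of currently open spans, innermost last


@contextmanager
def span(name, **attrs):
    """Time a block of work and record which span it ran inside."""
    record = {"name": name, "parent": _stack[-1] if _stack else None, "attrs": attrs}
    _stack.append(name)
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["duration_s"] = time.perf_counter() - start
        _stack.pop()
        SPANS.append(record)


with span("agent_step", step=1):
    with span("llm_call", model="stub"):
        pass  # a real system would call the model here
    with span("tool_call", tool="read_plate"):
        pass  # and the instrument here

print([s["name"] for s in SPANS])  # ['llm_call', 'tool_call', 'agent_step']
```

Children finish before their parent, which is why the outer span is recorded last. The module-level stack is not safe across threads or async tasks; production tracers use context variables for that.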

⭐ Star History

Star History Chart


🙌 Contributing

Contributions are welcome. See CONTRIBUTING.md for the rules on what belongs, what doesn't, and how to submit a PR.


📜 License

Released under the MIT License.


Built and maintained by Scivity Labs, building the operating system for autonomous science.

If you find this useful, please ⭐ star the repository.
