Log Detective

A tool, service and RHEL process integration to analyze logs using a Large Language Model (LLM) and a Drain template miner.

The service that explains logs is available here: https://logdetective.com/explain

Note: if you are looking for the code of the logdetective.com website, it is in github.com/fedora-copr/logdetective-website.

Command Line Tool

Installation

Fedora 41+

dnf install logdetective

From the PyPI repository

The logdetective project is published on PyPI and can be installed with pip.

First, ensure that the necessary dependencies for the llama-cpp-python project are installed. For Fedora, install gcc-c++:

dnf install gcc-c++

Then, install the logdetective project using pip:

pip install logdetective

Local repository install

Clone this repository and install with pip:

pip install .

Usage

To analyze a log file, run logdetective with the following arguments:

Required arguments:

file: The path or URL of the log file to be analyzed.

Optional arguments:

-M, --model MODEL_NAME (default: "fedora-copr/granite-3.2-8b-instruct-GGUF"): The path or Hugging Face repository name of the language model used for analysis. Write Hugging Face models as namespace/repo_name. Because Log Detective uses llama.cpp, the model should be in the GGUF format. If the model is already on your machine, the download is skipped.
-F, --filename-suffix SUFFIX (default: Q4_K.gguf): Suffix of the model file to use. This selects one of the available quantizations when the model comes from a Hugging Face repository.
-C, --n-clusters N (default: 8): Number of clusters for Drain to organize log chunks into. This only makes sense when you are summarizing with Drain.
-n, --no-stream: Print the full response at once, instead of token-by-token.
-v, --verbose: Increase output verbosity. Can be used multiple times (-v, -vv, -vvv) for different debug levels.
-q, --quiet: Suppress all output except the explanation.
--prompts PROMPTS (DEPRECATED, replaced by --prompts-config): Path to prompt configuration file.
--prompts-config PROMPTS (default logdetective/prompts.yml): Path to prompt configuration file.
--prompt-templates TEMPLATE_DIR (default logdetective/prompts): Path to prompt template directory. Prompts must be valid Jinja templates, and system prompts must include field system_time.
--temperature NUM (default 0.8): Temperature for inference. Higher temperatures lead to more creative, random responses.
--skip-snippets SNIPPETS (default logdetective/skip_snippets.yml): Path to patterns for skipping snippets.
--csgrep: Use csgrep to process the log. Requires csgrep to be installed separately.
--mib_limit NUMBER (default 300): Limits the size (in MiB) of the request (if submitting raw files) or of the file (if submitting via URL) for analyze endpoints. Logs or requests exceeding this limit are rejected.

Examples:

Analyzing a log via URL or stored locally:

logdetective https://example.com/logs.txt
logdetective ./data/logs.txt

Examples of using different models. Note the use of --filename-suffix (or -F) option, useful for models that were quantized:

logdetective https://example.com/logs.txt --model QuantFactory/Meta-Llama-3-8B-Instruct-GGUF --filename-suffix Q5_K_S.gguf
logdetective https://kojipkgs.fedoraproject.org//work/tasks/3367/131313367/build.log --model 'fedora-copr/granite-3.2-8b-instruct-GGUF' -F Q4_K_M.gguf

Example of altered prompts:

cp -r ~/.local/lib/python3.13/site-packages/logdetective/prompts ~/my-prompts
vi ~/my-prompts/system_prompt.j2 # edit the system prompt there to better fit your needs
logdetective https://kojipkgs.fedoraproject.org//work/tasks/3367/131313367/build.log --prompt-templates ~/my-prompts

Example of altered prompts (Deprecated):

cp ~/.local/lib/python3.13/site-packages/logdetective/prompts.yml ~/my-prompts.yml
vi ~/my-prompts.yml # edit the prompts there to better fit your needs
logdetective https://kojipkgs.fedoraproject.org//work/tasks/3367/131313367/build.log --prompts ~/my-prompts.yml

Note that streaming with some models (notably Meta-Llama-3) is broken; this can be worked around with the --no-stream option:

logdetective https://example.com/logs.txt --model QuantFactory/Meta-Llama-3-8B-Instruct-GGUF --filename-suffix Q5_K_M.gguf --no-stream
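When Log Detective is driven from a script, the same command lines can be assembled programmatically. The following is a minimal sketch that uses only the flags listed above; the helper name build_command is hypothetical and not part of the project.

```python
import subprocess


def build_command(log, model=None, filename_suffix=None, no_stream=False):
    """Assemble a logdetective command line (hypothetical helper)."""
    cmd = ["logdetective", log]
    if model:
        cmd += ["--model", model]
    if filename_suffix:
        cmd += ["--filename-suffix", filename_suffix]
    if no_stream:
        # Print the full response at once instead of token-by-token.
        cmd.append("--no-stream")
    return cmd


cmd = build_command(
    "https://example.com/logs.txt",
    model="QuantFactory/Meta-Llama-3-8B-Instruct-GGUF",
    filename_suffix="Q5_K_M.gguf",
    no_stream=True,
)
print(" ".join(cmd))
# To actually run it (requires logdetective to be installed):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues with URLs and model names.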

Choice of LLM

While Log Detective is compatible with a wide range of LLMs, it does require an instruction tuned model to function properly.

Whether or not the model has been trained to work with instructions can be determined by examining the model card, or simply by checking if it has instruct in its name.

When deployed as a server, Log Detective uses /chat/completions API as defined by OpenAI. The API must support both system and user roles, in order to properly work with a system prompt.

Configuration fields system_role and user_role can be used to set role names for APIs with non-standard roles.

Note: If no system role is available, both fields can be set to the same value. This concatenates the system and standard prompts, which may have a negative impact on the coherence of the response.
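The role handling can be illustrated with a short sketch. This is not Log Detective's actual implementation: the function name build_messages and the blank-line separator used when both roles coincide are assumptions made for illustration.

```python
def build_messages(system_prompt, user_prompt,
                   system_role="system", user_role="user"):
    """Build a /chat/completions message list with configurable role names.

    Hypothetical helper: when both roles are identical, the two prompts are
    merged into a single message (the separator is an assumption).
    """
    if system_role == user_role:
        # No separate system role available: send one combined message.
        return [{"role": user_role,
                 "content": f"{system_prompt}\n\n{user_prompt}"}]
    return [
        {"role": system_role, "content": system_prompt},
        {"role": user_role, "content": user_prompt},
    ]


messages = build_messages("You analyze build logs.", "Explain this log.")
merged = build_messages("You analyze build logs.", "Explain this log.",
                        system_role="user", user_role="user")
print(messages)
print(merged)
```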

Real Example

Let's have a look at a real-world example. Log Detective can work with any logs, though we optimize it for RPM build logs.

We're going to analyze a failed build of a Python-based library that happened in the Fedora Koji build system:

$ logdetective https://kojipkgs.fedoraproject.org//work/tasks/8157/117788157/build.log
Explanation:
[Child return code was: 0] : The rpm build process executed successfully without any errors until the 'check' phase.

[wamp/test/test_wamp_component_aio.py::test_asyncio_component] : Pytest found
two tests marked with '@pytest.mark.asyncio' but they are not async functions.
This warning can be ignored unless the tests are intended to be run
asynchronously.

[wamp/test/test_wamp_component_aio.py::test_asyncio_component_404] : Another
Pytest warning for the same issue as test_asyncio_component.

[-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html] :
This line is not related to the error, but it is a reminder to refer to Pytest
documentation for handling warnings.

[=========================== short test summary info
============================] : This section shows the summary of tests that
were executed. It shows the number of tests passed, failed, skipped,
deselected, and warnings.

[FAILED wamp/test/test_wamp_cryptosign.py::TestSigVectors::test_vectors] : A
failed test is reported with the name of the test file, the name of the test
method, and the name of the test case that failed. In this case,
TestSigVectors::test_vectors failed.

[FAILED
websocket/test/test_websocket_protocol.py::WebSocketClientProtocolTests::test_auto_ping]
: Another failed test is reported with the same format as the previous test. In
this case, it is WebSocketClientProtocolTests::test_auto_ping that failed.

[FAILED websocket/test/test_websocket_protocol.py::WebSocketServerProtocolTests::test_interpolate_server_status_template]
: A third failed test is reported with the same format as the previous tests.
In this case, it is
WebSocketServerProtocolTests::test_interpolate_server_status_template that
failed.

[FAILED websocket/test/test_websocket_protocol.py::WebSocketServerProtocolTests::test_sendClose_reason_with_no_code]
: Another failed test is reported. This time it is
WebSocketServerProtocolTests::test_sendClose_reason_with_no_code.

[FAILED websocket/test/test_websocket_protocol.py::WebSocketServerProtocolTests::test_sendClose_str_reason]
: Another failed test is reported with the same test file and test method name,
but a different test case name: test_sendClose_str_reason.

[==== 13 failed, 195 passed, 64 skipped, 13 deselected, 2 warnings in 6.55s
=====] : This is the summary of all tests that were executed, including the
number of tests that passed, failed, were skipped, deselected, or produced
warnings. In this case, there were 13 failed tests among a total of 211 tests.

[error: Bad exit status from /var/tmp/rpm-tmp.8C0L25 (%check)] : An error
message is reported indicating that the 'check' phase of the rpm build process
failed with a bad exit status.

It looks like a wall of text, similar to any log. The main difference is that the most significant lines of the log file are wrapped in [ ] : and followed by a textual explanation generated by a local LLM.

Contributing

Contributions are welcome! Please submit a pull request if you have any improvements or new features to add. Make sure your changes pass all existing tests before submitting. For bigger code changes, please consult us first by creating an issue.

We are always looking for more annotated snippets that will increase the quality of Log Detective's results. Contributions can be made on our website: https://logdetective.com/

Log Detective performs several inference queries while evaluating a log file. Prompts are stored in a separate file (more info below: https://github.com/fedora-copr/logdetective?tab=readme-ov-file#system-prompts). If you have an idea for improvements to our prompts, please open a PR and we'd be happy to test it out.

To develop Log Detective, you should fork this repository, clone your fork, and install dependencies using pip:

git clone https://github.com/yourusername/logdetective.git
cd logdetective
pip install .

Make changes to the code as needed and run pre-commit.

Tests

Tests for code used by the server must be placed in the ./tests/server/ path, while tests for general code must be in the ./tests/base/ path.

tox is used to manage tests. Please install the tox package for your distribution and run:

tox

This will create a virtual environment with dependencies and run all the tests. For more information follow the tox help.

The tox environments for base and server tests are separate, and each installs different dependencies. To run a specific environment, execute tox like this:

tox run -e style # run flake8
tox run -e lint # run pylint
tox run -e pytest_base # run base tests
tox run -e pytest_server # run server tests

To run the server test suite, you will need the PostgreSQL client utilities:

dnf install postgresql

Visual Studio Code testing with podman/docker-compose

In Containerfile, add debugpy as a dependency

-RUN pip3 install llama_cpp_python==0.2.85 sse-starlette starlette-context \
+RUN pip3 install llama_cpp_python==0.2.85 sse-starlette starlette-context debugpy\

Rebuild server image with new dependencies

make rebuild-server

Forward debugging port in docker-compose.yaml for server service.

     ports:
       - "${LOGDETECTIVE_SERVER_PORT:-8080}:${LOGDETECTIVE_SERVER_PORT:-8080}"
+      - "${VSCODE_DEBUG_PORT:-5678}:${VSCODE_DEBUG_PORT:-5678}"

Add the debugpy code to the logdetective file where you want execution to stop first.

+import debugpy
+debugpy.listen(("0.0.0.0", 5678))
+debugpy.wait_for_client()

Prepare the .vscode/launch.json configuration for Visual Studio Code (at least the following configuration is needed)

{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python Debugger: Remote Attach",
      "type": "debugpy",
      "request": "attach",
      "connect": {
        "host": "localhost",
        "port": 5678
      },
      "pathMappings": [
        {
          "localRoot": "${workspaceFolder}",
          "remoteRoot": "/src"
        }
      ]
    }
  ]
}

Run the server

podman-compose up server

Run the Visual Studio Code debug configuration named Python Debugger: Remote Attach

Visual Studio Code CLI debugging

When debugging the CLI application, the ./scripts/debug_runner.py script can be used as a stand-in for the stub script created during package installation.

Using launch.json, or a similar alternative, arguments can be specified for testing.

Example:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Debug Installed Module",
            "type": "debugpy",
            "request": "launch",
            "console": "integratedTerminal",
            "program": "${workspaceFolder}/scripts/debug_runner.py",
            "args": ["<URL_OF_A_LOG>"]
        }
    ]
}

Server

To set up the FastAPI server locally, you need a PostgreSQL database and an inference server. We base our service on a llama.cpp image that we build ourselves: https://quay.io/repository/logdetective/inference. Even for development, we strongly encourage you to use containers; see Containerfile and Containerfile.llama.cpp. We also provide a compose file to run the containerized servers. Note that the inference provider can be replaced with any OpenAI API compatible server. Check the dev and prod files for specific configuration details.

The simplest setup:

Make sure your MODELS_PATH environment variable points to a directory with your local LLM files. You can either edit the value in .env, create a symlink ln -s /directory/with/your/llms ./models, or:
```
$ export MODELS_PATH=/path/to/models/
$ ll $MODELS_PATH
-rw-r--r--. 1 tt tt 3.9G apr 10 17:18  granite-4.0-h-tiny-Q8_0.gguf
```
Then run podman-compose up (or podman-compose up -d to detach from your current terminal).

If requests to your local deployment time out (you can check what happens in the containers with podman logs), try increasing the llm_api_timeout value in server/config.yml. If you get nginx timeouts, try setting or increasing the timeouts in server/nginx_dev.conf.template:
```
    server {
    listen ${INFERENCE_PROXY_PORT};
+   proxy_connect_timeout 300s;
+   proxy_send_timeout 300s;
+   proxy_read_timeout 300s;
    location / {
        proxy_pass http://inference_backend;
        proxy_set_header Host $host;
    }
```

You can then use POST requests (via browser, curl or http provided by the httpie package).

Here we analyze one log file submitted via URL:

curl --header "Content-Type: application/json" --request POST \
     --data '{"url": "https://address.of.your.log/some-path-example.log"}' \
     http://localhost:8080/analyze

http POST :8080/analyze url=https://address.of.your.log/some-path-example.log
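The same request can be prepared from Python using only the standard library. This is a minimal sketch: the helper name build_analyze_request is hypothetical, and the base URL assumes the default local port used in the examples above.

```python
import json
import urllib.request


def build_analyze_request(log_url, base="http://localhost:8080",
                          endpoint="/analyze"):
    """Prepare a JSON POST request for an analyze endpoint (hypothetical helper)."""
    body = json.dumps({"url": log_url}).encode("utf-8")
    return urllib.request.Request(
        base + endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_analyze_request(
    "https://address.of.your.log/some-path-example.log")
print(request.get_method(), request.full_url)
# Sending the request requires a running server:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Passing endpoint="/analyze/staged" targets the staged endpoint with the same payload.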

We can also submit multiple files using the files field. This applies to all analyze* endpoints. At the moment, only the first file is analyzed; analysis of multiple logs is planned. Using curl is trickier if you want to embed the contents of local files, but it can be achieved with jq.

jq -n --arg name "build.log" --rawfile content "path/to/your/build.log" \
'{files: [{name: $name, content: $content}]}' | \
curl --header "Content-Type: application/json" \
     --request POST \
     --data @- \
     http://localhost:8080/analyze

http :8080/analyze \
    files[0][name]='build.log' files[0][content]='@path/to/your/build.log' \
    files[1][name]='another.log' files[1][content]='@path/to/another.log'
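For comparison, the files payload produced by the jq pipeline can also be built in Python. This is a sketch under the assumption that the payload has exactly the shape shown in the jq example (a files list of name and content pairs); the helper name build_files_payload is hypothetical, and a temporary file stands in for a real build log.

```python
import json
import tempfile
from pathlib import Path


def build_files_payload(paths):
    """Embed local log files into a payload for the analyze endpoints."""
    return {
        "files": [
            {
                "name": Path(path).name,
                # Replace undecodable bytes rather than failing on odd logs.
                "content": Path(path).read_text(encoding="utf-8",
                                                errors="replace"),
            }
            for path in paths
        ]
    }


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "build.log"
    log.write_text("error: Bad exit status\n", encoding="utf-8")
    payload = build_files_payload([log])

body = json.dumps(payload)  # request body to POST with Content-Type: application/json
print(body)
```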

For more accurate responses, you can use the /analyze/staged endpoint. It first submits snippets to the model for individual analysis; the model outputs are then used to construct the final prompt. This takes substantially longer than plain /analyze.

curl --header "Content-Type: application/json" --request POST --data '{"url": "https://address.of.your.log/some-path-example.log"}' http://localhost:8080/analyze/staged

If the MODELS_PATH variable is not set, ./models is mounted inside the container by default.

The model can be downloaded from our Hugging Face repository with:

curl -L -o models/granite-3.2-8b-instruct-v0.3.Q4_K.gguf https://huggingface.co/fedora-copr/granite-3.2-8b-instruct-GGUF/resolve/main/ggml-model-Q4_K.gguf

Note that before any log or its snippets are sent to the LLM for analysis, they are redacted. Log Detective removes certain personal information, such as emails and GPG fingerprints, from logs before calling the LLM. The LLM should be aware of this fact and factor it into its responses.
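To give an idea of what such redaction can look like, here is a minimal sketch. Both regular expressions and the placeholder strings are assumptions made for illustration; the rules Log Detective actually applies may differ.

```python
import re

# Assumed patterns, for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# A full GPG fingerprint written without spaces is 40 hexadecimal characters.
GPG_FINGERPRINT = re.compile(r"\b[0-9A-Fa-f]{40}\b")


def redact(text):
    """Replace emails and 40-character hex fingerprints with placeholders."""
    text = EMAIL.sub("<EMAIL>", text)
    return GPG_FINGERPRINT.sub("<GPG_FINGERPRINT>", text)


line = "Signed by jane@example.com, key 0123456789ABCDEF0123456789ABCDEF01234567"
print(redact(line))
```

Note that a 40-character hexadecimal pattern also matches other identifiers of the same shape, such as SHA-1 commit hashes, so a real implementation has to weigh precision against recall.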

Filtering snippet analysis by relevance

When using the /analyze/staged API, it is possible to filter analyzed snippets by their estimated relevance, submitting only those with the highest measure of relevance for final analysis.

Note: This feature requires an LLM provider with support for JSON structured output. Smaller models, even though technically capable of providing structured output, may not be able to appropriately estimate snippet relevance.

Filtering is disabled by default and must be enabled by setting the top_k_snippets field in the general section of the server configuration. The value indicates the number of snippets with the highest estimated relevance that will be submitted for final analysis.

Example:

general:
  devmode: False
  packages:
    - .*
  excluded_packages:
    - ^redhat-internal-.*
  top_k_snippets: 10

If all snippets are rated the same, filtering is skipped and a warning is raised in the logs. Values higher than the total number of snippets, as set by max_clusters in the extractor section of the config, also result in filtering being skipped.
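The selection rules can be sketched as follows. This illustrates the described behavior and is not the server's code: the function name filter_top_k, the (text, relevance) pair representation, and the choice to keep the surviving snippets in their original log order are all assumptions.

```python
def filter_top_k(snippets, k):
    """Keep the k snippets with the highest relevance.

    snippets is a list of (text, relevance) pairs in log order.
    Filtering is skipped when all ratings are equal or when k exceeds
    the number of snippets.
    """
    ratings = {relevance for _, relevance in snippets}
    if k > len(snippets) or len(ratings) <= 1:
        return list(snippets)  # filtering skipped
    # Rank indices by descending relevance; sorted() is stable, so ties
    # are resolved in favor of snippets that appear earlier in the log.
    ranked = sorted(range(len(snippets)), key=lambda i: -snippets[i][1])
    keep = sorted(ranked[:k])  # restore log order (assumption)
    return [snippets[i] for i in keep]


snippets = [("a", 10), ("b", 90), ("c", 50), ("d", 70)]
print(filter_top_k(snippets, 2))
```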

Generate a new database revision with alembic

Modify the database models (logdetective/server/database/models/).

Generate a new database revision with the command:

Warning: this command will start up a new server and shut it down when the operation completes.

CHANGE="A change comment" make alembic-generate-revision

Our production instance

Our FastAPI server and model inference server run through podman-compose on an Amazon AWS instance. The VM is provisioned by an ansible playbook.

You can control the server through:

cd /root/logdetective
podman-compose -f docker-compose-prod.yaml ...

The /root directory contains valuable data. If moving to a new instance, please backup the whole directory and transfer it to the new instance.

To run containers with Nvidia GPU support, you need to generate a CDI specification, which can be done with:

nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

The HTTPS certificate is generated with:

certbot certonly --standalone -d logdetective01.fedorainfracloud.org

Certificates need to be placed into the location specified by the LOGDETECTIVE_CERTDIR env var, and the service should then be restarted.

Querying statistics

You can query request, response, and emoji statistics via the metrics endpoints. They return JSON data with a time_series array containing metric objects with metric, timestamps, and values fields. Metrics endpoints accept GET requests and have the form /metrics/ENDPOINT_TYPE/QUERY_TYPE?parameter=value:

ENDPOINT_TYPE: analyze, analyze-staged, or analyze-gitlab.
QUERY_TYPE:

requests returns how many requests the server received at the given endpoint.
responses returns average response times during the time period.
emojis returns ALL emoji reactions. This data is collected only for analyze-gitlab events, so the ENDPOINT_TYPE in the URL is ignored when querying for emojis.
all retrieves all of the above. If QUERY_TYPE is left empty, it defaults to all.

parameter=value specifies the most recent period for which metrics are returned. If unspecified, the query defaults to the last 2 days.

parameter is one of hours, days, or weeks.
value is a positive integer.
The parameter type also controls the granularity of the response: ?days=2 produces a time series with at most 2 entries, while ?hours=48 produces a time series with at most 48 entries.

Examples:

http GET :8080/metrics/analyze/requests
http GET :8080/metrics/analyze-staged/responses
curl "http://localhost:8080/metrics/analyze-staged/responses"
curl "http://localhost:8080/metrics/analyze-gitlab/emojis?days=5"
curl "http://localhost:8080/metrics/analyze-staged/responses?hours=24"
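The URL scheme can also be captured in a small helper that validates its inputs before building the query. This is a sketch only: the function name metrics_url is hypothetical, and the base URL assumes a local deployment on port 8080.

```python
from urllib.parse import urlencode

ENDPOINT_TYPES = {"analyze", "analyze-staged", "analyze-gitlab"}
QUERY_TYPES = {"requests", "responses", "emojis", "all"}
PERIOD_UNITS = {"hours", "days", "weeks"}


def metrics_url(endpoint_type, query_type="all", unit=None, value=None,
                base="http://localhost:8080"):
    """Build a metrics URL of the form /metrics/ENDPOINT_TYPE/QUERY_TYPE."""
    if endpoint_type not in ENDPOINT_TYPES:
        raise ValueError(f"unknown endpoint type: {endpoint_type}")
    if query_type not in QUERY_TYPES:
        raise ValueError(f"unknown query type: {query_type}")
    url = f"{base}/metrics/{endpoint_type}/{query_type}"
    if unit is None:
        return url  # the server then defaults to the last 2 days
    if unit not in PERIOD_UNITS or not isinstance(value, int) or value <= 0:
        raise ValueError("period must be a positive number of hours, days or weeks")
    return f"{url}?{urlencode({unit: value})}"


print(metrics_url("analyze-gitlab", "emojis", "days", 5))
print(metrics_url("analyze", "requests"))
```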

System Prompts

Prompts are defined as Jinja templates and placed in the location specified by the --prompt-templates option of the CLI utility, or by the LOGDETECTIVE_PROMPT_TEMPLATES environment variable of the container service. Further, optional configuration can be provided in the prompts.yml configuration file.

All system prompt templates must include a place for the system_time variable.

If a references list is defined in prompts.yml, templates must also include handling for a list of references.

Example:

{% if references %}
## References:

    {% for reference in references %}
    * {{ reference.name }} : {{ reference.link }}
    {% endfor %}
{% endif %}

Deprecated:

Prompt templates used by Log Detective are stored in the prompts.yml file. It is possible to modify the file in place, or provide your own. In the CLI, you can override the prompt templates location using the --prompts option, while in the container service deployment the LOGDETECTIVE_PROMPTS environment variable is used instead.

Prompts must be compatible with Python format string syntax, with replacement fields, marked by curly braces {}, left for insertion of snippets.

The number of replacement fields in new prompts must be the same as in the originals, although their positions may differ.
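One way to check a customized prompt against this rule is to count the replacement fields with the standard library. The two prompt strings below are invented for illustration and are not the project's real prompts; the helper name count_replacement_fields is likewise hypothetical.

```python
from string import Formatter


def count_replacement_fields(template):
    """Count the {} replacement fields in a Python format string."""
    return sum(
        1
        for _, field_name, _, _ in Formatter().parse(template)
        if field_name is not None
    )


# Hypothetical prompts, for illustration only.
original = "Analyze these snippets:\n{}\nThen summarize the log:\n{}"
custom = "Log summary first:\n{}\nSnippets to explain:\n{}"

same_count = count_replacement_fields(original) == count_replacement_fields(custom)
print(same_count)
print(custom.format("summary text", "snippet text"))
```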

Skip Snippets

Certain log chunks may not contribute to the analysis of the problem under any circumstances. Using the Skip Snippets feature, users can specify regular expressions matching such log chunks, along with a simple description.

Patterns to be skipped must be defined in a YAML file as a dictionary, where the key is a description and the value is a regular expression. For example:

child_exit_code_zero: "Child return code was: 0"

Special care must be taken not to write a regular expression which may match too many chunks, or which may be evaluated as data structure by the yaml parser.

An example of a valid pattern definition file, logdetective/skip_snippets.yml, can be used as a starting point and is used as the default if no other definition is provided.
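How such a dictionary might be applied can be sketched in a few lines. This is an illustration rather than the project's implementation: the patterns are given as an inline dictionary instead of being loaded from YAML, the function name filter_chunks is hypothetical, and the use of re.search (matching anywhere in the chunk) is an assumption.

```python
import re

# Same shape as the YAML file: description -> regular expression.
skip_snippets = {
    "child_exit_code_zero": "Child return code was: 0",
}


def filter_chunks(chunks, patterns):
    """Drop every chunk that matches at least one skip pattern."""
    compiled = [re.compile(regex) for regex in patterns.values()]
    return [
        chunk for chunk in chunks
        if not any(regex.search(chunk) for regex in compiled)
    ]


chunks = [
    "Child return code was: 0",
    "error: Bad exit status from /var/tmp/rpm-tmp.8C0L25 (%check)",
]
print(filter_chunks(chunks, skip_snippets))
```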

Extracting snippets with csgrep

When working with logs containing messages from GCC, it can be beneficial to employ an additional extractor based on the csgrep tool, to ensure that the messages are kept intact. Since csgrep is not available as a Python package, it must be installed separately, with a package manager or from source.

The binary is available as part of the csdiff package on Fedora.

dnf install csdiff

When working with the Log Detective CLI, the csgrep extractor can be activated using the --csgrep option. In server mode, the csgrep field in the extractor config needs to be set to true.

csgrep: true

Both options are disabled by default, and an error is produced if the option is used but csgrep is not present in the $PATH.

The container images are built with csdiff installed.

License

This project is licensed under the Apache-2.0 License. See the LICENSE file for details.
