Background
MsgStartInference and MsgFinishInference are too slow in production. Nodes should process a block within 1-2 seconds so that block time stays below 6 seconds. To handle 1000 inferences in a block, we need to record 1000 MsgStartInference, 1000 MsgFinishInference, and 100-200 MsgValidation transactions, so each transaction must be processed in under 1ms. Although they are quite fast in tests, in production with a large state they take 10-20ms, and on some nodes 50ms or more.
Two main areas were identified that contribute most of the transaction time:
- Signature validation (57% of FinishInference and 63% of StartInference)
- Stats query and recording (40% of FinishInference and 30% of StartInference)
Download the profiling file:
https://drive.google.com/file/d/1yxY91lzMHxv_MeloAxW1zczcpbkBjZ0t/
and run:
go tool pprof -http=:8080 /Users/davidliberman/Downloads/pprof.inferenced.samples.cpu.001.pb.gz
then choose the flame graph view to explore.
Screen recording: https://drive.google.com/file/d/1yxDaJllxCQ-l3ZO6ZuBb5bTEUgZ5t7Yu/view?usp=sharing
Signature validation can be significantly optimized, reducing the number of signatures to be validated in most scenarios by 5x (from 5 signatures to just 1). See #608, which is now implemented by @DimaOrekhovPS.
Stats query and recording was designed to make it easier to query usage statistics for inference operations by storing this data on-chain. However, it is too heavy for on-chain operations and should be removed. In the end, we shouldn't read or write any large state record in MsgStartInference, MsgFinishInference, or MsgValidation.
SetInference (including the second time it is executed in HandleInferenceComplete) accounts for:
- 10% of FinishInference
- 12% of StartInference
- 14% of Validation

Within SetInference:
- 33% is logging
- 38% is SetOrUpdateInferenceStatsByEpoch
- 22% is SetOrUpdateInferenceStatusByTime w/o logging
HandleInferenceComplete, excluding SetInference, accounts for 16% of FinishInference and 4% of StartInference (as it is rare for StartInference to come second). Within it:
- 20% is logging
- 45% is 2x GetEpochGroupData
- 5% is GetEpochIndex
- 10% is SetEpochGroupData
- 20% is SetParticipant/GetParticipants w/o logging
ProcessInferencePayment accounts for 14% of FinishInference and 12% of StartInference. Within it:
- 63% is logging
- 18% is SetParticipant/2x GetParticipant w/o logging
- 9% is Add/GetTokenomicsData
Tasks:
- I.1. Measure this transaction on mainnet with INFO-level logging turned off. Check whether it decreases duration by 15% for FinishInference and 13% for StartInference. Report the results before moving forward. If we confirm the difference, do I.2.
- I.2. Test whether the result changes if we write logs to files rather than stdout. When the results are measured, report them before moving forward. If that works, do the final clean implementation. If it didn't work, do I.3 (see more details below).
- I.3. Otherwise, move most of the logs in StartInference and FinishInference (except one per transaction) to DEBUG. Measure and report the results.
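The intuition behind I.1 and I.3 can be checked in isolation. The sketch below (Python, purely illustrative; the node itself is not Python, and `time_logging` is a hypothetical helper) times the same number of log calls with the logger threshold at INFO versus WARNING, showing how much of the cost disappears when records are filtered before formatting:

```python
import io
import logging
import time

def time_logging(level, n=50_000):
    """Time n INFO-level log calls with the logger threshold set to `level`."""
    logger = logging.getLogger(f"bench-{level}")
    logger.setLevel(level)
    logger.propagate = False
    handler = logging.StreamHandler(io.StringIO())
    logger.addHandler(handler)
    start = time.perf_counter()
    for i in range(n):
        logger.info("inference %d processed", i)
    elapsed = time.perf_counter() - start
    logger.removeHandler(handler)
    return elapsed

# At WARNING the INFO calls are filtered before any formatting happens;
# at INFO each call is formatted and written to the handler's stream.
info_cost = time_logging(logging.INFO)
filtered_cost = time_logging(logging.WARNING)
```

A filtered-out call costs roughly one level check, which is why moving chatty logs to DEBUG (I.3) recovers most of the logging overhead without removing the log lines from the code.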
More details for I.2
- Question: where is most of the output coming from, and which part of the code (which file/module) produces it?
- For terminal, Docker, or k8s, we can build a wrapper using redirect_stdout from contextlib.
Important: when redirecting to a file via >, full buffering kicks in (buffer ~8KB). The writes still go through stdout, and flushing can be delayed for minutes. This can significantly slow down the entire process.
After using a wrapper (writing to a file with explicit buffering):
• You control the buffer size yourself (even 1MB).
• No Docker daemon involvement in I/O.
• Ability to do batched writes and use a dedicated writer thread.
Bottom line: it won’t magically become “16x faster,” but the main bottleneck, passing through stdout, will be removed completely. The wrapper will give you the maximum performance your filesystem can provide.
Something like:

```python
from contextlib import redirect_stdout

def run_with_buffered_file(func, log_path, buffer_size=1024 * 64):
    """Temporarily redirect stdout to a file with explicit buffering."""
    with open(log_path, 'w', buffering=buffer_size, encoding='utf-8') as f:
        with redirect_stdout(f):
            func()
```
used like this:

```python
def my_task():
    print("this goes to the file, not to the docker logs")
    # Subprocess output will NOT be captured unless you redirect it explicitly.

run_with_buffered_file(my_task, "/app/logs/task.log", buffer_size=128 * 1024)
```
A high-load example for maximum speed; in production it sustains more than 10k lines/sec:
```python
import queue
import threading
import time
from contextlib import redirect_stdout

class AsyncFileWriter:
    def __init__(self, filepath, buffer_size=1024 * 1024, max_queue=10000):
        self.file = open(filepath, 'a', buffering=buffer_size, encoding='utf-8')
        self.queue = queue.Queue(maxsize=max_queue)
        self.running = True
        threading.Thread(target=self._writer, daemon=True).start()

    def write(self, s):
        try:
            self.queue.put_nowait(s)
        except queue.Full:
            # Memory overflow protection: fall back to a synchronous
            # write in the calling thread.
            self.file.write(s)

    def flush(self):
        self.file.flush()

    def _writer(self):
        while self.running:
            try:
                s = self.queue.get(timeout=0.1)
                self.file.write(s)
            except queue.Empty:
                continue

    def close(self):
        self.running = False
        time.sleep(0.2)  # Give the writer thread time to drain the queue.
        self.file.close()

def run_async_logging(func, log_path):
    writer = AsyncFileWriter(log_path, buffer_size=256 * 1024)
    with redirect_stdout(writer):
        func()
    writer.close()
```
Never do this:

```dockerfile
CMD python app.py > /logs/app.log
```

It kills performance and adds two unnecessary pipes.
Do this instead:

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY app.py .
# PYTHONUNBUFFERED=1 (equivalent to the -u flag) disables Python stdout
# buffering, but we don't rely on it: we control file buffering ourselves.
# Kept as a safety measure.
ENV PYTHONUNBUFFERED=1
CMD ["python", "app.py"]
```
Then, in app.py, wrap the entry point:

```python
import sys
from contextlib import redirect_stdout

if __name__ == "__main__":
    with open("/var/log/app/out.log", "a", buffering=64 * 1024, encoding="utf-8") as f:
        tee = Tee(sys.stdout, f)  # duplicate output to stdout and the file
        with redirect_stdout(tee):
            main()
```
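The snippet above relies on a `Tee` helper that the issue does not define. A minimal sketch might look like this (assumption: only `write()` and `flush()` are needed, which is all `redirect_stdout` requires for `print()`-style output):

```python
class Tee:
    """Minimal file-like object that duplicates writes to several streams."""

    def __init__(self, *streams):
        self.streams = streams

    def write(self, s):
        # Fan the text out to every underlying stream (e.g. stdout + file).
        for stream in self.streams:
            stream.write(s)
        return len(s)

    def flush(self):
        for stream in self.streams:
            stream.flush()
```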
Result:
• docker logs: you see logs immediately (via sys.stdout).
• The file /var/log/app/out.log is written with a 64KB buffer.
• No Docker daemon involvement in file I/O.
Pay attention:
- A high-load, max-speed production example outputs >10k lines/sec, showing an 8 to 16x difference.
- A simple wrapper will already solve the problem and should give a solid 5 to 6x speedup.
- stdout is the slowest path and it holds a lock until the operation completes.
- An unbuffered file will be slower than stdout, because every print becomes a write() to disk.
- A buffered file will be much faster, because it only writes when the 64KB buffer fills.
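The last two points can be demonstrated with a small micro-benchmark (illustrative sketch; `time_writes` is a hypothetical helper, and absolute numbers depend on the filesystem). Unbuffered mode issues one `write()` syscall per line, while a 64KB buffer turns thousands of lines into a handful of syscalls:

```python
import os
import tempfile
import time

def time_writes(buffering, n=50_000, line=b"a log line of typical length\n"):
    """Write n lines with the given buffering mode and return elapsed seconds."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    start = time.perf_counter()
    # buffering=0 (fully unbuffered) is only allowed in binary mode:
    # every write() then goes straight to a syscall.
    with open(path, "wb", buffering=buffering) as f:
        for _ in range(n):
            f.write(line)
    elapsed = time.perf_counter() - start
    os.remove(path)
    return elapsed

unbuffered = time_writes(0)        # one syscall per line
buffered = time_writes(64 * 1024)  # a syscall roughly every 64KB
```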
Important
This issue is part of a five-issue series on StartInference and FinishInference, numbered [0/4] through [4/4].
These tasks can be completed independently of each other by different contributors.
However, this specific task requires maintaining and operating a node on mainnet in order to test and validate the result.
All five issues [0/4], [1/4], [2/4], [3/4], [4/4] in this series must be completed as part of the v0.2.11 upgrade, which is scheduled for the week of February 23. After the v0.2.11 upgrade, these tasks will no longer be relevant, because a different solution can/will be proposed.