Hello all,

I see that moving log data around is the primary use case for Vector, but I've been wondering if it could also be used to ingest clickstream events into Kafka, and then from Kafka into ClickHouse. The pipeline would roughly be:

clients -> load balancer -> Vector (`http_server` source -> `kafka` sink) -> Kafka -> Vector (`kafka` source -> `clickhouse` sink) -> ClickHouse
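A minimal sketch of what the two legs could look like in Vector's TOML config. The component names, broker address, topic, and ClickHouse database/table below are illustrative placeholders, not settings from an actual deployment:

```toml
# Leg 1: accept clickstream events over HTTP and publish them to Kafka.
[sources.clickstream_http]
type = "http_server"
address = "0.0.0.0:3010"            # same port as the stress test below
decoding.codec = "json"             # parse each POST body as JSON

[sinks.to_kafka]
type = "kafka"
inputs = ["clickstream_http"]
bootstrap_servers = "kafka-1:9092"  # illustrative broker address
topic = "clickstream"               # illustrative topic name
encoding.codec = "json"

# Leg 2 (typically a separate Vector deployment): consume from Kafka
# and write batches to ClickHouse.
[sources.from_kafka]
type = "kafka"
bootstrap_servers = "kafka-1:9092"
group_id = "vector-clickhouse"      # illustrative consumer group
topics = ["clickstream"]
decoding.codec = "json"

[sinks.to_clickhouse]
type = "clickhouse"
inputs = ["from_kafka"]
endpoint = "http://clickhouse:8123" # illustrative ClickHouse HTTP endpoint
database = "analytics"
table = "clickstream_events"
```

Splitting the two legs across separate Vector deployments would let the HTTP ingestion tier scale independently of the ClickHouse writer, with Kafka buffering between them.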
The `http_server` endpoint wouldn't be publicly exposed directly. In a production setup we'd have a load balancer in front of the Vector processes and some sort of WAF in front of everything for security.
We're trying to avoid building and maintaining our own HTTP-to-Kafka solution, which is why we're interested in Vector. My main questions:
1. Has anyone experimented with or deployed something like this with Vector and could share their experience?
2. Is this a "bad" use of the `http_server` source? I can't seem to find a reason why it would be.
I also ran some quick stress tests with `hey` using the following settings:

```sh
# 100K requests, 250 connections, HTTP/2 enabled, posting a single log entry as JSON.
$ hey -n 100000 -c 250 -h2 -m POST -D fake_log.txt -T application/json http://127.0.0.1:3010/
```
Performance was great, but a few hundred requests came back non-200 with the error below, and increasing concurrency produces more of them. Could Vector be hitting an open-connection limit? I couldn't find a way to inspect or tune this in the source settings.
```
[1] Post "http://127.0.0.1:3010/": read tcp 127.0.0.1:61797->127.0.0.1:3010: read: connection reset by peer
```
Edit 1: I tested with only the `blackhole` sink and got the same results.
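For reference, that isolation test amounts to pointing the source at a `blackhole` sink, something like the snippet below (reusing the hypothetical `clickstream_http` source from the sketch above; the sink name and interval are illustrative):

```toml
# Isolation test: discard events instead of sending them to Kafka,
# to rule the kafka sink out as the cause of the connection resets.
[sinks.devnull]
type = "blackhole"
inputs = ["clickstream_http"]
print_interval_secs = 10  # periodically log how many events were discarded
```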
Edit 2: I ran `hey` from a container on the same Docker network and saw no issues even with 500 connections. The resets were probably an artifact of pushing that many connections from the host into the container (likely the host-to-container port-forwarding path) rather than anything in Vector.