Describe the bug
Dragonfly crashes when executing scripts with the --lua_auto_async=true flag active.
Without the flag everything runs fine.
To Reproduce
- Start dragonfly with this command:
dragonfly --cache_mode=false --lua_auto_async=true --table_growth_margin 0.1
- Optionally load the script
- Execute the script with EVAL/EVALSHA
Expected behavior
I expect the script to run just like it does without the lua_auto_async flag, only faster.
Environment (please complete the following information):
- OS: MacOS 26.2
- Kernel: Darwin MacBook-Pro-von-Heiko.local 25.2.0 Darwin Kernel Version 25.2.0: Tue Nov 18 21:07:05 PST 2025; root:xnu-12377.61.12~1/RELEASE_ARM64_T6020 arm64
- Containerized?: Docker Compose
- Dragonfly Version: tested on 1.35.1 and 1.36.0
Reproducible Code Snippet
Sorry, I have not yet been able to reduce this to a simple reproducible snippet.
Additional context
I tried manually replacing the couple of redis.call invocations with redis.acall, but acall seems to have a different return value, and I was unable to find documentation around acall and apcall. If I just replace call with apcall, the script errors with:
attempt to perform arithmetic on a function value
for this line:
actually_deleted_rules = actually_deleted_rules + redis.acall("HDEL", account_key, table.unpack(rules_to_delete))

By default there is no crash output from DF. When I add the flag --vmodule=main_service=1,set_family=2, I get this crash report:
*** SIGSEGV received at time=1769763295 on cpu 0 ***
PC: @ 0xaaaac350f134 (unknown) util::fb2::detail::Scheduler::Preempt()
@ 0xaaaac39436c4 464 absl::lts_20250512::AbslFailureSignalHandler()
@ 0xffffa12e37a0 4960 (unknown)
@ 0xaaaac2c90570 224 util::fb2::EventCount::await<>()
@ 0xaaaac2d5fcc8 176 dfly::ScriptMgr::UpdateScriptCaches()
@ 0xaaaac2d634dc 224 dfly::ScriptMgr::Insert[abi:cxx11]()
@ 0xaaaac2d18cbc 656 dfly::Service::Eval()
@ 0xaaaac333bff0 208 dfly::CommandId::Invoke()
@ 0xaaaac2d0fc8c 80 dfly::Service::InvokeCmd()
@ 0xaaaac2d10a94 512 dfly::Service::DispatchCommand()
@ 0xaaaac34bb150 416 facade::Connection::DispatchSingle()
@ 0xaaaac34bb61c 304 facade::Connection::ParseRedis()
@ 0xaaaac34bc26c 1632 facade::Connection::IoLoop()
@ 0xaaaac34bc8ec 272 facade::Connection::ConnectionFlow()
@ 0xaaaac34bdb64 368 facade::Connection::HandleRequests()
@ 0xaaaac3520bc4 560 util::ListenerInterface::RunSingleConnection()
@ 0xaaaac35211c0 192 boost::context::detail::fiber_entry<>()
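For reference, the partial replacement I experimented with looks roughly like this. I am assuming here that acall/apcall are fire-and-forget and do not hand back the usual command reply (which would explain the arithmetic error above), so only the calls whose reply the script ignores are swapped:

-- Keep the synchronous call where the reply is actually used:
actually_deleted_rules = actually_deleted_rules
    + redis.call("HDEL", account_key, table.unpack(rules_to_delete))
-- These replies are ignored, so (assuming acall simply does not return the reply)
-- they are candidates for the async variant:
redis.acall("HSET", invalidated_account_key, table.unpack(deletion_timestamps))
redis.acall("EXPIRE", invalidated_account_key, invalidate_info_ttl)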
I use ioredis with ioredis.defineCommand. This means it usually uses EVALSHA, except when the script has not been loaded yet, in which case it uses EVAL.
This is the script I'm trying to execute. In high-load scenarios it is called with up to 6k accounts and 400-500 entities, resulting in about 2-3 million hash entries being created:
---@diagnostic disable: undefined-global
-- KEYS[1..M] = per-account keys (BR<accountId1>, IBR<accountId1>, BR<accountId2>, IBR<accountId2>, ...)
--
-- ARGV[1] = invalidateInfoTTL (seconds, e.g. "60"; "0" to skip)
-- ARGV[2..] = BR-Cache keys to invalidate, one string per account:
-- format: "field␟field␟field..."
--
-- Return: deletedCount
local invalidate_info_ttl = tonumber(ARGV[1])
local separator = string.char(0x1f) -- ASCII unit separator; ChatGPT is very sure that this is the best unit separator ever
local account_count = #KEYS / 2
-- check if the correct number of arguments are provided
if #ARGV ~= 1 + account_count then
return redis.error_reply("ARGV must be: invalidateInfoTtl, then exactly one blob per account")
end
-- Determine the timestamp that the rules were invalidated at (aka now)
local redis_time = redis.call("TIME")
local now_us = redis_time[1] * 1000000 + redis_time[2]
local actually_deleted_rules = 0
-- parses the string "field␟field␟field"
-- into the array [field, field, field]
local function parse_fields(blob)
local fields = {}
local blen = #blob
local pos = 1
while pos <= blen do
local next_sep = string.find(blob, separator, pos, true)
local token_end = next_sep and (next_sep - 1) or blen
fields[#fields + 1] = string.sub(blob, pos, token_end)
pos = next_sep and (next_sep + 1) or (blen + 1)
end
return fields
end
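-- Example (illustrative values only; \31 is the 0x1f separator defined above):
--   parse_fields("a\31b\31c")  -->  { "a", "b", "c" }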
for i = 1, account_count do
-- Name of the hash that contains the cached results for the current account
-- Format: BR<accountId>
local account_key = KEYS[2*i - 1]
-- Name of the hash that contains the timestamps for recently invalidated
-- rules for the current account. Format: IBR<accountId>
local invalidated_account_key = KEYS[2*i]
-- names of rules to delete for the current account.
-- Format: "field␟field␟field..."
local deleted_rules_for_account = ARGV[1 + i]
-- For each account: delete the rules and record when the deletion happened
if deleted_rules_for_account and #deleted_rules_for_account > 0 then
local rules_to_delete = parse_fields(deleted_rules_for_account)
-- Fast path: one field
if #rules_to_delete == 1 then
actually_deleted_rules = actually_deleted_rules + redis.call("HDEL", account_key, rules_to_delete[1])
redis.call("HSET", invalidated_account_key, rules_to_delete[1], now_us)
else
-- Bulk path: multiple fields
actually_deleted_rules = actually_deleted_rules + redis.call("HDEL", account_key, table.unpack(rules_to_delete))
local deletion_timestamps = {}
for j = 1, #rules_to_delete do
local field_index = (j - 1) * 2 + 1
deletion_timestamps[field_index] = rules_to_delete[j]
deletion_timestamps[field_index + 1] = now_us
end
redis.call("HSET", invalidated_account_key, table.unpack(deletion_timestamps))
end
-- The info about when rules were invalidated is only relevant temporarily.
-- Make sure they expire at some point.
if invalidate_info_ttl and invalidate_info_ttl > 0 then
redis.call("EXPIRE", invalidated_account_key, invalidate_info_ttl)
end
end
end
return actually_deleted_rules
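For context, this is roughly how a caller lays out the keys and per-account blobs; the account ids and rule names below are invented purely for illustration:

-- Illustrative argument layout for a call covering two accounts
-- (SEP is the same 0x1f separator the script uses)
local SEP = string.char(0x1f)

local keys = { "BR1001", "IBR1001", "BR2002", "IBR2002" }
local argv = {
    "60",                                             -- ARGV[1]: invalidateInfoTTL in seconds
    table.concat({ "ruleA", "ruleB", "ruleC" }, SEP), -- blob for account 1001
    table.concat({ "ruleX" }, SEP),                   -- blob for account 2002
}
-- The script HDELs ruleA/ruleB/ruleC from BR1001 and ruleX from BR2002,
-- records now_us for each deleted rule in IBR1001 / IBR2002, sets a 60s TTL
-- on both IBR hashes, and returns the total HDEL count.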