background:
I have had my own directory jumping tool (called "sd" for "switch directory") since about 2011, for a long time in full ignorance of other tools (mea culpa). but I have now finally looked at the scoring algorithm of established tools, notably z and zoxide (which seem to "rule" :)), and discovered that they do something quite different. I post this here in the hope that it might be interesting for your project one way or another. I am just interested in the technical discussion (intentionally not dropping a link to my tool here...). my current understanding (please correct me if I am wrong) is:
the frecency approach used by zoxide and z reduces each directory's history to two numbers: a rank counter (raw visit frequency) and a last-access timestamp. this is compact and fast but lossy in a specific way: it cannot distinguish between 100 visits concentrated last week plus one now, and 100 visits spread over three months plus one now. The aggregate rank is identical in both cases, and the timestamps differ only minimally, so the two histories score essentially identically.
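to make the lossy case concrete, here is a minimal sketch of an aggregate (rank, timestamp) scorer. the bucket multipliers (x4 within an hour, x2 within a day, /2 within a week, /4 otherwise) are what I recall from the zoxide docs, so treat them as an assumption; `Entry`, `visit`, `aggregate_score` and `replay` are made-up names, and rank aging is not modelled at all.

```python
from dataclasses import dataclass

HOUR, DAY, WEEK = 3600, 86400, 604800  # seconds


@dataclass
class Entry:
    rank: float = 0.0         # raw visit counter
    last_access: float = 0.0  # wall-clock time of the latest visit


def visit(entry, now):
    """Record one visit: bump the counter, overwrite the timestamp."""
    entry.rank += 1
    entry.last_access = now


def aggregate_score(entry, now):
    """Bucketed frecency (multipliers are an assumption, see lead-in)."""
    age = now - entry.last_access
    if age < HOUR:
        return entry.rank * 4
    if age < DAY:
        return entry.rank * 2
    if age < WEEK:
        return entry.rank / 2
    return entry.rank / 4


def replay(visit_times):
    """Collapse a full visit history into a single aggregate entry."""
    entry = Entry()
    for t in visit_times:
        visit(entry, t)
    return entry


now = 90 * DAY
# 100 visits packed into the last week, plus one visit right now
burst = [now - WEEK + i * (WEEK / 100) for i in range(100)] + [now]
# 100 visits spread evenly over three months, plus one visit right now
spread = [i * (now / 100) for i in range(100)] + [now]

score_burst = aggregate_score(replay(burst), now)
score_spread = aggregate_score(replay(spread), now)
```

both histories collapse to the same rank and the same last-access time, so `score_burst` and `score_spread` come out equal even though the visit patterns are very different.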
I have implemented a different approach: retain the full sequence of cd events (a history file, if you like) up to a configurable limit (say 10,000) and derive scores directly from that sequence. a power-law aging kernel is used to compute an age-weighted sum over all visits, and that sum is the score. apart from the width of the window over which the sum is computed (typically 500-1000 events), the exponent controlling the kernel shape then becomes a single tuneable parameter that spans continuously from pure frequency (exponent near zero) to near-pure recency (large exponent), without changing the underlying model.
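a minimal sketch of such a kernel, under assumptions: the exact kernel form is not spelled out above, so the weight `age ** -exponent` (with age counted in events, 1 for the most recent one) is just one plausible choice, and `power_law_scores` is a hypothetical name.

```python
from collections import defaultdict


def power_law_scores(history, window=1000, exponent=1.0):
    """Age-weighted visit sum per directory over the last `window` events.

    `history` is the list of visited paths, oldest first. Age is measured
    in events (1 = most recent), not in seconds.
    """
    recent = history[-window:]
    n = len(recent)
    scores = defaultdict(float)
    for i, path in enumerate(recent):
        age = n - i                       # 1 for the newest event
        scores[path] += age ** -exponent  # assumed power-law weight
    return dict(scores)


history = ["/a", "/a", "/a", "/b"]
by_frequency = power_law_scores(history, exponent=0.0)  # every visit weighs 1
by_recency = power_law_scores(history, exponent=10.0)   # newest visit dominates
```

with exponent 0 every visit counts the same, so `/a` (three visits) beats `/b`; with a large exponent the single most recent visit to `/b` outweighs all three older visits to `/a`.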
a second difference worth considering is the time base. the wall-clock aging used by z/zoxide means directories decay during periods of inactivity: a week's holiday degrades your scores regardless of whether your navigation habits have actually changed. using the cd event count as the clock instead means scores only change due to the user's directory navigation, which seems more faithful to the intent of frecency.
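the two clocks can be contrasted in a few lines. this is only an illustration: the wall-clock buckets are the same assumed multipliers as commonly described for zoxide, the event weight is an assumed power law, and both function names are made up.

```python
HOUR, DAY, WEEK = 3600, 86400, 604800  # seconds


def wall_clock_weight(age_seconds):
    """Recency multiplier keyed on elapsed seconds (assumed buckets)."""
    if age_seconds < HOUR:
        return 4.0
    if age_seconds < DAY:
        return 2.0
    if age_seconds < WEEK:
        return 0.5
    return 0.25


def event_weight(age_events, exponent=1.0):
    """Recency weight keyed on the number of cd events since the visit."""
    return (age_events + 1) ** -exponent


# a directory visited 10 minutes and 5 cd events ago ...
before_holiday = (wall_clock_weight(10 * 60), event_weight(5))
# ... and the same directory after a week away with no cd events at all
after_holiday = (wall_clock_weight(10 * 60 + WEEK), event_weight(5))
```

the wall-clock weight drops from 4.0 to 0.25 over the idle week, while the event-based weight is unchanged because no navigation happened.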
a third difference is how pruning interacts with scoring. threshold-based pruning as used in zoxide couples the two concerns: a directory that scores low gets dropped permanently. separating them (scoring from a sliding event window, pruning by a simple FIFO cutoff of the log file) means a directory is only lost when it has genuinely had no visits within the retained history, not because its score happened to cross a threshold during an inactive period.
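the FIFO side of this can be sketched with a bounded queue. `EventLog` and its methods are hypothetical names for illustration, and a real tool would persist the log to a file rather than keep it in memory.

```python
from collections import deque


class EventLog:
    """Bounded cd-event history: the oldest events fall off the front."""

    def __init__(self, max_events=10_000):
        # deque(maxlen=...) discards the oldest item once the log is full
        self.events = deque(maxlen=max_events)

    def record(self, path):
        self.events.append(path)

    def known_dirs(self):
        # a directory is known as long as at least one visit is retained,
        # independent of whatever score it currently has
        return set(self.events)


log = EventLog(max_events=3)
for path in ["/a", "/b", "/c"]:
    log.record(path)
known_before = log.known_dirs()
log.record("/d")  # pushes the only retained visit to /a out of the log
known_after = log.known_dirs()
```

here `/a` disappears only because its last retained visit left the log, not because of any score threshold.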
curious whether these tradeoffs have been considered and whether there are reasons the aggregate approach is preferred beyond implementation simplicity (which of course is always a valid argument).
while I naturally do not suggest that zoxide should fundamentally change its scoring algorithm, I wonder whether a switch from wall-clock timestamping to cd-event timestamping could be offered: zoxide does use sum_over_ranks anyway and it seems it would be possible to store as the "timestamp" of each dir the value of that sum at the time of the last visit. it would require rephrasing the aging penalty in terms of "number of elapsed events" (the difference between the current rank sum and the "timestamp rank sum") but would remove the wall-clock time influence altogether. would this be desirable for zoxide or would it be bad?
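a rough sketch of what I mean, written in Python for brevity and not based on zoxide's actual code: `Store`, `Entry`, `stamp` and the half-life penalty are all hypothetical. it also assumes the rank sum only ever grows; as far as I understand, zoxide rescales ranks once the sum passes its maximum, in which case the stored stamps would have to be rescaled too, or a separate monotonic event counter used instead.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    rank: float = 0.0
    stamp: float = 0.0  # value of the global rank sum at the last visit


@dataclass
class Store:
    entries: dict = field(default_factory=dict)
    total_rank: float = 0.0  # stands in for sum_over_ranks (never rescaled here)

    def visit(self, path):
        entry = self.entries.setdefault(path, Entry())
        entry.rank += 1
        self.total_rank += 1
        entry.stamp = self.total_rank  # event-count "timestamp"

    def elapsed_events(self, path):
        return self.total_rank - self.entries[path].stamp

    def score(self, path, half_life=100.0):
        # hypothetical penalty: halve the rank every `half_life` elapsed events
        entry = self.entries[path]
        return entry.rank * 0.5 ** (self.elapsed_events(path) / half_life)


store = Store()
for path in ["/a", "/a", "/a", "/b"]:
    store.visit(path)
```

`store.elapsed_events("/a")` is 1 and `store.elapsed_events("/b")` is 0, and neither changes until the next cd event, however much wall-clock time passes in between.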