@antiguru
Member

Previously, the log action would receive an empty container when we wanted
to report progress but had no new data. We'd only send the empty container
when we didn't have any other data to send. This makes it hard for users to
distinguish between a flush and a regular data update. The distinction can
matter: after a flush we might not be called again for a while, whereas
after a data update it's very likely that the logger will receive more data
or see a flush.

This change alters the signature of the action to accept a `&mut Option<C>`
(where `C` is `CB::Container`), and we pass `Some(container)` on data, and
`None` on flush.
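
As a rough illustration of the new shape, here is a minimal sketch of an action that tells data apart from a flush. It does not use the real logging API: the `Observed` enum, the `demo_action` function, and the choice of `Vec<u64>` as the container type are all assumptions made for this example.

```rust
use std::time::Duration;

/// What the action saw on one call. Hypothetical helper type for this sketch.
#[derive(Debug, PartialEq)]
enum Observed {
    /// A container with this many records arrived.
    Data(usize),
    /// No container arrived: the logger is flushing.
    Flush,
}

/// Drives a closure with the new `&mut Option<C>` shape, using `Vec<u64>` as `C`.
fn demo_action() -> Vec<Observed> {
    let mut seen = Vec::new();
    let mut action = |_elapsed: &Duration, container: &mut Option<Vec<u64>>| match container {
        // `Some(container)`: regular data, more calls are likely to follow.
        Some(data) => seen.push(Observed::Data(data.len())),
        // `None`: a flush, we might not be called again for a while.
        None => seen.push(Observed::Flush),
    };

    let elapsed = Duration::from_millis(5);
    action(&elapsed, &mut Some(vec![1, 2, 3]));
    action(&elapsed, &mut None);
    seen
}
```

With the old shape, the second call would have had to pass an empty container, which the action could not tell apart from a data update that happened to carry no records.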

Clients using the logging API need to change their implementation, as both
vectors and `Option` offer an `iter` function, but with different results.
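
The pitfall can be sketched with plain standard-library types. The function names below are hypothetical and `Vec<u64>` stands in for the container. A body that only changes its parameter type may still compile, but `iter` on the `Option` visits the container itself (zero or one item) rather than its records.

```rust
/// Old-style body: `Vec::iter` visits every record in the container.
fn count_records(container: &mut Vec<u64>) -> usize {
    container.iter().count()
}

/// Naive port: same `.iter()` call, but `Option::iter` yields the container
/// itself, so this counts 0 or 1 regardless of how many records there are.
fn count_after_naive_port(container: &mut Option<Vec<u64>>) -> usize {
    container.iter().count()
}

/// Adjusted port: flatten the optional container to reach the records again.
fn count_records_ported(container: &mut Option<Vec<u64>>) -> usize {
    container.iter().flatten().count()
}
```

If the body uses the records in a way that depends on their type, the compiler will catch the mismatch; bodies that only count or debug-print items are the ones likely to change behavior silently.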

Signed-off-by: Moritz Hoffmann [email protected]

We don't regularly drop the inner logger, so one additional flush doesn't
justify the added complexity.

Signed-off-by: Moritz Hoffmann <[email protected]>
`publish_batch` accepts a mutable reference to an option instead of a
mutable reference to a container.

Signed-off-by: Moritz Hoffmann <[email protected]>
Member

@frankmcsherry frankmcsherry left a comment

This all looks good to me. I left a few comments, but they are roughly nits, and can be followed up on later if we conclude they are worth it!

    let mut c = Some(std::mem::take(container));
    (self.action)(&elapsed, &mut c);
    if let Some(mut c) = c {
        c.clear();
Member

No strong opinion, but is this clearing new behavior? Does it e.g. prevent passing back owned data, into which the logger can write? Again, really no strong opinion, but just checking whether the force-clear is new, and whether it is intentional.

Member Author

It was purely defensive. Thinking about it, it should be the container builder's responsibility to enforce what needs to be true about a container after extracting or finishing it, so it doesn't make sense to have the clear call here.


    self.dirty = false;
    // Send no container to indicate flush.
    (self.action)(&elapsed, &mut None);
Member

A nit, but one of the reasons to take a `&mut Option<_>` rather than an `Option<&mut _>` is to allow the `None` call to pass back resources, and at least with `Push` the intent is that you keep calling this as long as you get a non-`None` back. We don't have to do that here, and I'm probably massively over-thinking this, but I wanted to call out the gap.
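
To make the difference concrete, here is a small sketch using only standard-library types. The `Recycler` type and its `accept` method are hypothetical and are not part of the logging or `Push` APIs. The point is that a `&mut Option<_>` gives the callee a slot it can write into even when the caller passed `None`, which an `Option<&mut _>` cannot offer.

```rust
/// Hypothetical callee that keeps one spare buffer around.
struct Recycler {
    spare: Option<Vec<u64>>,
}

impl Recycler {
    /// Accepts data (`Some`) or a flush (`None`) through a mutable slot.
    fn accept(&mut self, slot: &mut Option<Vec<u64>>) {
        match slot.take() {
            // Data: take ownership, then keep the cleared allocation as a spare.
            Some(mut data) => {
                data.clear();
                self.spare = Some(data);
            }
            // Flush: the slot is empty, so use it to pass the spare back.
            None => *slot = self.spare.take(),
        }
    }
}

/// Sends one buffer, then flushes, and reports what came back on the flush.
fn demo_recycle() -> (bool, usize) {
    let mut recycler = Recycler { spare: None };

    let mut buffer = Vec::with_capacity(16);
    buffer.extend([1u64, 2, 3]);
    let mut slot = Some(buffer);
    recycler.accept(&mut slot);

    let mut flush = None;
    recycler.accept(&mut flush);
    (flush.is_some(), flush.map_or(0, |b| b.capacity()))
}
```

With an `Option<&mut Vec<u64>>` parameter, the `None` branch would have nowhere to put the spare, so the allocation could only be returned through some other channel.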

Member Author

I see your point, but I'm wondering if this is actually true! Or put differently, is it something we'd like to be true, or is it something that we had at some point and forgot about?

The reason I'm asking is because most (all?) places where we call `Push::done`, we don't have a loop to drain resources. If I recall correctly, the only place where we loop is in Differential merge batchers to drain the stash of allocations once we're done merging chains.

We could change `Push`'s `done` function to look like this instead:

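    fn done(&mut self) {
        // Start with no container, which signals "done" to the pusher.
        let mut container = None;
        loop {
            self.push(&mut container);
            // Keep calling until the pusher no longer hands anything back.
            if container.is_none() { break; }
        }
    }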
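
For readers who want to experiment with the idea, here is a self-contained sketch built on a toy `Push` trait. It is not Timely's actual trait, and `StashSink` is a hypothetical implementor. One deliberate difference from the snippet above, which is an assumption about the intended behavior: each iteration starts from a fresh `None`, so a buffer that was handed back is dropped rather than pushed again as data.

```rust
/// Toy stand-in for a push interface; not Timely's actual `Push` trait.
trait Push<T> {
    fn push(&mut self, element: &mut Option<T>);

    /// Signals completion and drains any resources the pusher hands back.
    fn done(&mut self) {
        loop {
            // Assumption: begin every round with `None` so returned buffers
            // are not re-submitted as data.
            let mut slot = None;
            self.push(&mut slot);
            if slot.is_none() {
                break;
            }
            // `slot` holds a returned buffer here and is dropped at scope end.
        }
    }
}

/// Hypothetical sink that stashes emptied buffers and returns one per flush.
struct StashSink {
    stash: Vec<Vec<u64>>,
    received: usize,
    flushes: usize,
}

impl Push<Vec<u64>> for StashSink {
    fn push(&mut self, element: &mut Option<Vec<u64>>) {
        match element.take() {
            Some(mut data) => {
                self.received += data.len();
                data.clear();
                self.stash.push(data);
            }
            None => {
                self.flushes += 1;
                *element = self.stash.pop();
            }
        }
    }
}

/// Pushes two buffers, calls `done`, and reports (records, flush calls, stash size).
fn demo_done() -> (usize, usize, usize) {
    let mut sink = StashSink { stash: Vec::new(), received: 0, flushes: 0 };
    sink.push(&mut Some(vec![1, 2]));
    sink.push(&mut Some(vec![3]));
    sink.done();
    (sink.received, sink.flushes, sink.stash.len())
}
```

In this sketch the sink sees one flush call per stashed buffer plus a final call that returns nothing, which is the "keep calling as long as you get a non-`None` back" pattern from the review comment.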

Signed-off-by: Moritz Hoffmann <[email protected]>
@antiguru antiguru merged commit 291de98 into TimelyDataflow:master Jan 16, 2025
7 checks passed
@antiguru antiguru deleted the logger_flush branch January 16, 2025 10:44
@github-actions github-actions bot mentioned this pull request Jan 15, 2025