```rust
use timely::dataflow::operators::{ToStream, Inspect};

fn main() {
    timely::example(|scope| {
        (0..10).to_stream(scope)
               .inspect(|x| println!("seen: {:?}", x));
    });
}
```
This program gives us a bit of a flavor for what a timely dataflow program might look like, including a bit of what Rust looks like, without getting too bogged down in weird stream processing details. Not to worry; we will do that in just a moment!

If we run the program up above, we see it print out the numbers zero through nine.
```ignore
Echidnatron% cargo run --example simple
seen: 0
seen: 1
seen: 2
seen: 3
seen: 4
seen: 5
seen: 6
seen: 7
seen: 8
seen: 9
Echidnatron%
```
Why would we want to make our life so complicated? The main reason is that we can make our program *reactive*, so that we can run it without knowing ahead of time the data we will use, and it will respond as we produce new data.

Timely dataflow means to capture a large number of idioms, so it is a bit tricky to wrap together one example that shows off all of its features, but let's look at something that shows off some core functionality to give a taste.
The following complete program initializes a timely dataflow computation, in which participants can supply a stream of numbers which are exchanged between the workers based on their value. Workers print to the screen when they see numbers. You can also find this as [`examples/hello.rs`](https://github.com/TimelyDataflow/timely-dataflow/blob/master/examples/hello.rs) in the [timely dataflow repository](https://github.com/TimelyDataflow/timely-dataflow/tree/master/examples).
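The program itself looks roughly like the following sketch; this is a reconstruction in the spirit of `examples/hello.rs`, so consult the repository for the authoritative version.

```rust
use timely::dataflow::{InputHandle, ProbeHandle};
use timely::dataflow::operators::{Input, Exchange, Inspect, Probe};

fn main() {
    // Initializes and runs a timely dataflow computation from command-line arguments.
    timely::execute_from_args(std::env::args(), |worker| {
        let index = worker.index();
        let mut input = InputHandle::<u64, u64>::new();
        let mut probe = ProbeHandle::new();

        // Create a new input, exchange records between workers by value, and inspect the output.
        worker.dataflow(|scope| {
            scope.input_from(&mut input)
                 .exchange(|x| *x)
                 .inspect(move |x| println!("worker {}:\thello {}", index, x))
                 .probe_with(&mut probe);
        });

        // Introduce data, and step the worker until each round is complete.
        for round in 0..10u64 {
            if index == 0 {
                input.send(round);
            }
            input.advance_to(round + 1);
            while probe.less_than(input.time()) {
                worker.step();
            }
        }
    }).unwrap();
}
```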
We can run this program in a variety of configurations: with just a single worker thread, with one process and multiple worker threads, and with multiple processes each with multiple worker threads.
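As a sketch, those configurations correspond to command-line arguments along these lines; the `-w`, `-n`, and `-p` flags are how I recall timely's argument parsing, and the repository documents the authoritative set.

```ignore
Echidnatron% cargo run --example hello                   # one process, one worker thread
Echidnatron% cargo run --example hello -- -w2            # one process, two worker threads
Echidnatron% cargo run --example hello -- -n2 -p0 -w2    # process 0 of 2, two workers each
Echidnatron% cargo run --example hello -- -n2 -p1 -w2    # process 1 of 2, two workers each
```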
To try this out yourself, first clone the timely dataflow repository using `git`:
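```ignore
Echidnatron% git clone https://github.com/TimelyDataflow/timely-dataflow
Echidnatron% cd timely-dataflow
```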
We can check out the examples `examples/capture_send.rs` and `examples/capture_recv.rs` to see a paired use of capture and receive demonstrating the generality.
The `capture_send` example creates a new TCP connection for each worker, which it wraps and uses as an `EventPusher`. Timely dataflow takes care of all the serialization and stuff like that (warning: it uses abomonation, so this is not great for long-term storage).
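The sending side looks roughly like the following sketch; the port scheme (`8000 + worker.index()`) and the type annotations are assumptions on my part, so see `examples/capture_send.rs` for the real thing.

```rust
use std::net::TcpStream;
use timely::dataflow::operators::ToStream;
use timely::dataflow::operators::capture::{Capture, EventWriter};

fn main() {
    timely::execute_from_args(std::env::args(), |worker| {
        // Each worker opens its own TCP connection to the receiving process.
        let addr = format!("127.0.0.1:{}", 8000 + worker.index());
        let send = TcpStream::connect(addr).expect("connection failed");

        // Capture the stream of numbers into the connection, wrapped as an EventPusher.
        worker.dataflow::<u64, _, _>(|scope| {
            (0..10u64)
                .to_stream(scope)
                .capture_into(EventWriter::new(send))
        });
    }).unwrap();
}
```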
The `capture_recv` example is more complicated, because we may have a different number of workers replaying the stream than initially captured it.
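For reference, the receiving side looks roughly like this sketch; again, the ports, type parameters, and argument handling are assumptions rather than a verbatim copy of `examples/capture_recv.rs`.

```rust
use std::net::TcpListener;
use timely::dataflow::operators::Inspect;
use timely::dataflow::operators::capture::{EventReader, Replay};

fn main() {
    timely::execute_from_args(std::env::args(), |worker| {
        // The number of workers that captured the stream, supplied as an argument.
        let source_peers: usize = std::env::args().nth(1)
            .expect("must supply source peers").parse().expect("invalid number");

        // Each worker takes responsibility for a disjoint subset of the captured
        // streams, listens for the corresponding connections, and wraps each
        // accepted stream as an EventReader.
        let replayers = (0..source_peers)
            .filter(|i| i % worker.peers() == worker.index())
            .map(|i| TcpListener::bind(format!("127.0.0.1:{}", 8000 + i)).unwrap())
            .collect::<Vec<_>>()
            .into_iter()
            .map(|l| l.incoming().next().unwrap().unwrap())
            .map(|r| EventReader::<u64, u64, _>::new(r))
            .collect::<Vec<_>>();

        // Replay the captured streams into a new dataflow and print what arrives.
        worker.dataflow::<u64, _, _>(|scope| {
            replayers
                .replay_into(scope)
                .inspect(|x| println!("replayed: {:?}", x));
        });
    }).unwrap();
}
```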
Almost all of the code up above is assigning responsibility for the replaying between the workers we have (from `worker.peers()`). We partition responsibility for `0 .. source_peers` among the workers, create `TcpListener`s to handle the connection requests, wrap them in `EventReader`s, and then collect them up as a vector. The workers have collectively partitioned the incoming captured streams between themselves.
Finally, each worker just uses the list of `EventReader`s as the argument to `replay_into`, and we get the stream magically transported into a new dataflow, in a different process, with a potentially different number of workers.
If you want to try it out, make sure to start up the `capture_recv` example first (otherwise the connections will be refused for `capture_send`) and specify the expected number of source workers, modifying the number of receive workers if you like. Here we are expecting five source workers, and distributing them among three receive workers (to make life complicated):
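```ignore
# A sketch of the two invocations, assuming capture_recv takes the expected
# number of source workers as its first argument (as in the sketch above).
Echidnatron% cargo run --example capture_recv -- 5 -w3    # in one shell
Echidnatron% cargo run --example capture_send -- -w5      # in another shell
```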
The `ExchangeData` trait is more complicated, and is established in the `communication/` module. The trait is a synonym for
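```rust
// An approximation from memory; the exact set of supertraits differs between
// timely versions, and `Data` here is timely's own data trait.
pub trait ExchangeData: Data + serde::Serialize + for<'a> serde::Deserialize<'a> + 'static { }
impl<T> ExchangeData for T
    where T: Data + serde::Serialize + for<'a> serde::Deserialize<'a> + 'static { }
```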
where `serde` is Rust's most popular serialization and deserialization crate. A great many types implement these traits. If your type does not, you should add these decorators to its definition:
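```rust
// The serde derive macros; `MyType` is just a placeholder for your own type.
#[derive(Serialize, Deserialize)]
struct MyType {
    name: String,
    value: u64,
}
```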
You must include the `serde` crate and, if you are not on Rust 2018, the `serde_derive` crate as well.
The downside to this is that deserialization will always involve a clone of the data, which has the potential to adversely impact performance. For example, if you have structures that contain lots of strings, timely dataflow will create allocations for each string even if you do not plan to use all of them.
Let's imagine you would like to play around with a tree data structure as something you might send around in timely dataflow. I've written the following candidate example:
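A sketch of what that candidate might look like (the original listing may differ in details):

```rust
use timely::dataflow::operators::{ToStream, Map, Inspect};

// A candidate type to send around: a tree with some data at each node.
struct TreeNode<D> {
    data: D,
    children: Vec<TreeNode<D>>,
}

impl<D> TreeNode<D> {
    fn new(data: D) -> Self {
        Self { data, children: Vec::new() }
    }
}

fn main() {
    timely::example(|scope| {
        (0..10u64).to_stream(scope)
                  .map(|x| TreeNode::new(x))
                  .inspect(|x| println!("seen: {:?}", x));
    });
}
```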
This doesn't work. You'll probably get two errors, that `TreeNode` doesn't implement `Clone`, nor does it implement `Debug`. Timely data types need to implement `Clone`, and our attempt to print out the trees requires an implementation of `Debug`. We can create these implementations by decorating the `struct` declaration like so:
```rust
#[derive(Debug, Clone)]
struct TreeNode<D> {
    data: D,
    children: Vec<TreeNode<D>>,
}

impl<D> TreeNode<D> {
    fn new(data: D) -> Self {
        Self { data, children: Vec::new() }
    }
}
```
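With the derives in place, suppose we next try to route the trees between workers by calling `exchange` on the stream, along the lines of this sketch (the `exchange` call and its key function are illustrative additions, not the book's exact listing):

```rust
use timely::dataflow::operators::{ToStream, Map, Inspect, Exchange};

fn main() {
    // TreeNode is the derived type defined above.
    timely::example(|scope| {
        (0..10u64).to_stream(scope)
                  .map(|x| TreeNode::new(x))
                  // route each tree to a worker chosen by its root data
                  .exchange(|x| x.data)
                  .inspect(|x| println!("seen: {:?}", x));
    });
}
```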
We get a new error. A not especially helpful error. It says that it cannot find an `exchange` method, or more specifically that one exists but it doesn't apply to our type at hand. This is because the data need to satisfy the `ExchangeData` trait but do not. It would be better if this were clearer in the error messages, I agree.
Communication in timely dataflow starts from the `timely_communication` crate. This crate includes not only communication, but is actually where we start up the various worker threads and establish their identities. As in timely dataflow, everything starts by providing a per-worker closure, but this time we are given only a channel allocator as an argument.
Before continuing, I want to remind you that this is the *internals* section; you could write your code against this crate if you really want, but one of the nice features of timely dataflow is that you don't have to. You can use a nice higher level layer, as discussed previously in the document.
That being said, let's take a look at the example from the `timely_communication` documentation, which is not brief but shouldn't be wildly surprising either.
There is only a limited amount of configuration you can currently do in a timely dataflow computation, and it all lives in the `initialize::Configuration` type. This type is a simple enumeration of three ways a timely computation could run:
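Roughly, the variants look like this; the field names and exact shapes here are recalled from memory and may differ between timely versions:

```rust
// Approximate shape of the enumeration; treat the field names and exact
// types as assumptions rather than the precise definition.
pub enum Configuration {
    /// One worker thread in this process, no communication.
    Thread,
    /// The given number of worker threads in this single process.
    Process(usize),
    /// Multiple processes, each with some number of worker threads.
    Cluster {
        threads: usize,          // worker threads per process
        process: usize,          // index of this process
        addresses: Vec<String>,  // network addresses of all processes
        report: bool,            // whether to report connection progress
    },
}
```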
A dataflow graph hosts some number of operators. For progress tracking, these operators are simply identified by their index. Each operator has some number of *input ports*, and some number of *output ports*. The dataflow operators are connected by connecting each input port to a single output port (typically of another operator). Each output port may be connected to multiple distinct input ports (a message produced at an output port is to be delivered to all attached input ports).
In timely dataflow progress tracking, we identify output ports by the type `Source` and input ports by the type `Target`, as from the progress coordinator's point of view, an operator's output port is a *source* of timestamped data, and an operator's input port is a *target* of timestamped data. Each source and target can be described by their operator index and then an operator-local index of the corresponding port. The use of distinct types helps us avoid mistaking input and output ports.
The structure of the dataflow graph can be described by a list of all of the connections in the graph, a `Vec<(Source, Target)>`. From this, we could infer the number of operators and their numbers of input and output ports, as well as enumerate all of the connections themselves.
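As a small illustration, using placeholder definitions (the real `Source` and `Target` types live in timely's `progress` module and may name their fields differently):

```rust
// Placeholder definitions: an output port (Source) and an input port (Target),
// each named by an operator index and an operator-local port index.
#[derive(Clone, Copy, Debug)]
pub struct Source { pub operator: usize, pub port: usize }
#[derive(Clone, Copy, Debug)]
pub struct Target { pub operator: usize, pub port: usize }

// A three-operator chain, described edge by edge: operator 0's output port 0
// feeds operator 1's input port 0, and operator 1's output port 0 feeds
// operator 2's input port 0.
fn example_edges() -> Vec<(Source, Target)> {
    vec![
        (Source { operator: 0, port: 0 }, Target { operator: 1, port: 0 }),
        (Source { operator: 1, port: 0 }, Target { operator: 2, port: 0 }),
    ]
}
```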
At this point we have the structure of a dataflow graph. We can draw a circle for each operator, a stub for each input and output port, and edges connecting the output ports to their destination input ports. Importantly, we have names for every location in the dataflow graph, which will either be a `Source` or a `Target`.