
Streams and Handshakes

Julian Kemmerer edited this page Dec 7, 2024 · 11 revisions

WORK IN PROGRESS

TODO

starting with dump of Discord conversation:


Streams (Data and Valid Signals)

you had asked about what PipelineC has to offer in terms of a stream abstraction, and working with things like AXI4-Stream

So I wrote up some example code with plenty of comments here https://github.com/JulianKemmerer/PipelineC/blob/master/examples/stream_io.c

And I want to walk through some of it here

Starting off we can define a C struct that looks like AXIS (you might do something like this with vhdl record or verilog struct)

    typedef struct my_axis_32_t{
      uint8_t data[4];
      uint1_t keep[4];
      uint1_t last;
      // TODO user field
    }my_axis_32_t;

Immediately you can see that this is not flexible: if you need a different width AXIS, or one with or without a tuser of a specific size, then you need to define another struct.

The closest thing to a stream abstraction in PipelineC is a helper for the 'data and valid' parts of a valid-ready handshake (like that used in AXIS)

imagine a struct like

    typedef struct my_axis_32_t_stream_t{
      my_axis_32_t data;
      uint1_t valid;
    }my_axis_32_t_stream_t;

That is what is handled for you by the DECL_STREAM_TYPE macro, along with the helper stream(type) macro that names the resulting _stream_t type

    // ex.
    DECL_STREAM_TYPE(my_axis_32_t)
    ...
    stream(my_axis_32_t) axis_in; // axis_in.data is a my_axis_32_t
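
For intuition, the effect of these two macros can be imitated in plain C with token pasting. This is only a sketch under assumptions: `SKETCH_DECL_STREAM_TYPE`, `sketch_stream` and `sketch_last_beat` are hypothetical names, `uint8_t` stands in for PipelineC's `uint1_t`, and the real macro definitions shipped with PipelineC may well differ.

```c
#include <assert.h>
#include <stdint.h>

/* Plain-C stand-in for the AXIS struct above (uint8_t models uint1_t). */
typedef struct my_axis_32_t {
  uint8_t data[4];
  uint8_t keep[4];
  uint8_t last;
} my_axis_32_t;

/* Hypothetical: declare <type>_stream_t as payload plus a valid flag. */
#define SKETCH_DECL_STREAM_TYPE(type_t) \
  typedef struct type_t##_stream_t {    \
    type_t data;                        \
    uint8_t valid;                      \
  } type_t##_stream_t;

/* Hypothetical: spell the generated type name from the payload type. */
#define sketch_stream(type_t) type_t##_stream_t

SKETCH_DECL_STREAM_TYPE(my_axis_32_t)

/* Build a valid end-of-packet element carrying n bytes (1..4). */
sketch_stream(my_axis_32_t) sketch_last_beat(const uint8_t *bytes, int n) {
  assert(n >= 1 && n <= 4);
  sketch_stream(my_axis_32_t) s = {{0}, {0}, 0};
  s.valid = 0;
  for (int i = 0; i < n; i++) {
    s.data.data[i] = bytes[i];
    s.data.keep[i] = 1;
  }
  s.data.last = 1;
  s.valid = 1;
  return s;
}
```

The point is only that `sketch_stream(my_axis_32_t)` and `my_axis_32_t_stream_t` name the same type, so the payload is always reached through `.data` and the flag through `.valid`.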

Ready Signal

You may have noticed the missing 'ready' signal for flow control. We did not mix the ready signal into the data and valid stream(my_axis_32_t) because it goes in the opposite direction. PipelineC doesn't have an equivalent of SystemVerilog modports, with in and out inside one variable, etc.

so instead you will see separate inputs and outputs for data+valid vs ready when writing a module/function:

    typedef struct my_func_out_t{
      // Outputs from module
      //  Output .data and .valid stream
      stream(my_axis_32_t) axis_out;
      //  Output ready for input axis stream
      uint1_t ready_for_axis_in;
    }my_func_out_t;
    my_func_out_t my_func(
      // Inputs to module
      //  Input .data and .valid stream
      stream(my_axis_32_t) axis_in,
      //  Input ready for output axis stream
      uint1_t ready_for_axis_out
    );

^A function that takes as inputs data+valid stream incoming AXIS and a flag for outgoing AXIS ready

and outputs data+valid stream of outgoing AXIS and a flag for incoming AXIS ready
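
To make the port directions concrete, here is a plain-C model of the simplest possible block with this shape: a combinational pass-through. It is a sketch, not PipelineC code. The names `axis32_t`, `passthrough` and `transfer_happens` are made up for illustration, and `uint8_t` stands in for `uint1_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Plain-C stand-ins for the PipelineC types (uint8_t models uint1_t). */
typedef struct axis32_t {
  uint8_t data[4];
  uint8_t keep[4];
  uint8_t last;
} axis32_t;

typedef struct axis32_stream_t {
  axis32_t data;
  uint8_t valid;
} axis32_stream_t;

typedef struct passthrough_out_t {
  axis32_stream_t axis_out;  /* forward direction: data + valid */
  uint8_t ready_for_axis_in; /* backward direction: ready */
} passthrough_out_t;

/* Hypothetical block with the same port shape as my_func:
   data+valid flows input -> output, ready flows output -> input. */
passthrough_out_t passthrough(axis32_stream_t axis_in,
                              uint8_t ready_for_axis_out) {
  assert(ready_for_axis_out <= 1); /* models a 1-bit signal */
  passthrough_out_t o;
  o.axis_out = axis_in;
  o.ready_for_axis_in = ready_for_axis_out;
  return o;
}

/* A word crosses an interface in a cycle only if valid and ready are both 1. */
bool transfer_happens(axis32_stream_t s, uint8_t ready) {
  return s.valid && ready;
}
```

With `ready_for_axis_out` held at 0 the input side sees `ready_for_axis_in == 0` as well, so a valid input word simply waits.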

The generated output VHDL for such a module looks like

    entity my_func_0CLK_262c3538 is
    port(
      clk : in std_logic;
      CLOCK_ENABLE : in unsigned(0 downto 0);
      axis_in : in my_axis_32_t_stream_t;
      ready_for_axis_out : in unsigned(0 downto 0);
      return_output : out my_func_out_t
    );
    end my_func_0CLK_262c3538;

Ready Feedback Pragma

Next we will look at the PipelineC code to wire together two of those modules into a chained dataflow: -> first -> second ->

Let's talk about just the feed-forward part first, that is, just the data and valid parts of the stream, not the feedback ready signal.

a sketch of

input axis -> first instance -> second instance -> output axis

my_func_out_t func0 = my_func( input_axis, );

my_func_out_t func1 = my_func( func0.axis_out, );

output_axis = func1.axis_out

Notice the output of the first instance is the input to the second, and the output of the second is connected to the final output.

Adding in the ready signals requires some use of the PipelineC specific #pragma FEEDBACK, meaning: the first time you read from this wire, the value is from feedback, from the last point where the wire was assigned.

In practice it looks like the variable has no/zero value but is still used; you can just pretend the variable has its correct feedback value when 'running this in your head':

With the ready signals included:

    // Input stream into first instance
    uint1_t ready_for_func0_axis_out; // Note: FEEDBACK not assigned a value yet
    #pragma FEEDBACK ready_for_func0_axis_out
    my_func_out_t func0 = my_func(
      input_axis,
      ready_for_func0_axis_out
    );
    uint1_t ready_for_input_axis = func0.ready_for_axis_in;

    // Output of first instance into second
    uint1_t ready_for_func1_axis_out = ready_for_output_axis;
    my_func_out_t func1 = my_func(
      func0.axis_out,
      ready_for_func1_axis_out
    );
    // Note: FEEDBACK assigned here
    ready_for_func0_axis_out = func1.ready_for_axis_in;

Notice the #pragma FEEDBACK ready_for_func0_axis_out and the later ready_for_func0_axis_out = func1.ready_for_axis_in; together they mean the ready input into the first instance is driven by the ready output of the second instance.

^which has a block diagram like so: [image: block diagram]
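
Plain C has no FEEDBACK pragma, so a software model of the same two-stage chain has to resolve the ready signals from the output side backwards before updating any registers. The sketch below does that for two hypothetical single-register stages whose ready is combinational (ready when empty or when downstream is ready). It is not what my_func does in the example file. `word_stream_t`, `chain_state_t` and `chain_cycle` are made-up names, and the payload is shortened to one `uint32_t`.

```c
#include <assert.h>
#include <stdint.h>

typedef struct word_stream_t {
  uint32_t data;
  uint8_t valid; /* models uint1_t */
} word_stream_t;

/* One register per stage: reg0 is 'func0', reg1 is 'func1'. */
typedef struct chain_state_t {
  word_stream_t reg0;
  word_stream_t reg1;
} chain_state_t;

typedef struct chain_out_t {
  word_stream_t output_axis;
  uint8_t ready_for_input_axis;
} chain_out_t;

/* Model one clock cycle of input -> func0 -> func1 -> output. */
chain_out_t chain_cycle(chain_state_t *s, word_stream_t input_axis,
                        uint8_t ready_for_output_axis) {
  assert(ready_for_output_axis <= 1);
  chain_out_t o;
  o.output_axis = s->reg1;

  /* Ready is resolved backwards: func1's ready output feeds func0. */
  uint8_t ready_for_func1_in = !s->reg1.valid || ready_for_output_axis;
  uint8_t ready_for_func0_axis_out = ready_for_func1_in; /* the feedback wire */
  uint8_t ready_for_func0_in = !s->reg0.valid || ready_for_func0_axis_out;
  o.ready_for_input_axis = ready_for_func0_in;

  /* Clock edge: update reg1 first so it sees the old value of reg0. */
  if (ready_for_func1_in) s->reg1 = s->reg0;
  if (ready_for_func0_in) s->reg0 = input_axis;
  return o;
}
```

In this model the ready path runs combinationally through both stages, which is the kind of long path a skid buffer is meant to break.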

Full Example

The final two parts of the example https://github.com/JulianKemmerer/PipelineC/blob/master/examples/stream_io.c are: 1) filling in what the my_func block actually does (not super critical, but the demo shows how to make a skid buffer to avoid having ready as a critical path through the block), and 2) the final hooks to build / compile this design / make some VHDL to synthesize and use on an FPGA.
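
For readers who have not met a skid buffer before, here is a generic one modeled in plain C. This is a textbook-style sketch and not the code from stream_io.c. All names (`skid_state_t`, `skid_cycle`, and so on) are hypothetical, and the payload is reduced to a single `uint32_t`. The key property is that `ready_for_axis_in` depends only on registered state, never on `ready_for_axis_out` in the same cycle.

```c
#include <assert.h>
#include <stdint.h>

typedef struct word_stream_t {
  uint32_t data;
  uint8_t valid; /* models uint1_t */
} word_stream_t;

typedef struct skid_state_t {
  word_stream_t out_reg;  /* drives the output stream */
  word_stream_t skid_reg; /* catches one word when the output stalls */
} skid_state_t;

typedef struct skid_out_t {
  word_stream_t axis_out;
  uint8_t ready_for_axis_in;
} skid_out_t;

/* Model one clock cycle of a skid buffer. */
skid_out_t skid_cycle(skid_state_t *s, word_stream_t axis_in,
                      uint8_t ready_for_axis_out) {
  assert(ready_for_axis_out <= 1);
  skid_out_t o;
  /* Outputs come straight from registers. */
  o.axis_out = s->out_reg;
  o.ready_for_axis_in = !s->skid_reg.valid;

  uint8_t in_xfer = axis_in.valid && o.ready_for_axis_in;
  uint8_t out_xfer = s->out_reg.valid && ready_for_axis_out;

  if (out_xfer || !s->out_reg.valid) {
    /* Output register is free: refill from skid first, else from input. */
    if (s->skid_reg.valid) {
      s->out_reg = s->skid_reg;
      s->skid_reg.valid = 0;
    } else if (in_xfer) {
      s->out_reg = axis_in;
    } else {
      s->out_reg.valid = 0;
    }
  } else if (in_xfer) {
    /* Output is stalled but we had promised ready: park the word. */
    s->skid_reg = axis_in;
  }
  return o;
}
```

When the consumer stalls, the buffer accepts exactly one more word into `skid_reg` and then drops its own ready on the following cycle, so no word is lost and the ready signal is cut into register-to-register hops.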

notice earlier how the function returning a struct in C my_func_out_t my_func(... turned into a VHDL module outputting a record return_output : out my_func_out_t

Similarly, we could try to use structs as outputs on the top level of the final VHDL, but that's not particularly friendly across tools and/or when mixing into a Verilog flow.

So instead you can declare top level ports like so, using Xilinx AXIS style names with simple uint types:

    DECL_INPUT(uint32_t, s_axis_tdata)
    DECL_INPUT(uint4_t, s_axis_tkeep)
    DECL_INPUT(uint1_t, s_axis_tlast)
    DECL_INPUT(uint1_t, s_axis_tvalid)
    DECL_OUTPUT(uint1_t, s_axis_tready)
    DECL_OUTPUT(uint32_t, m_axis_tdata)
    DECL_OUTPUT(uint4_t, m_axis_tkeep)
    DECL_OUTPUT(uint1_t, m_axis_tlast)
    DECL_OUTPUT(uint1_t, m_axis_tvalid)
    DECL_INPUT(uint1_t, m_axis_tready)

After those inputs are declared we can define a space to wire together the multiple instances

    #pragma PART "xc7a35ticsg324-1l" // Artix 7 35T (Arty)
    #pragma MAIN_MHZ top 100.0
    void top(){
      ... code using s_axis_tdata, m_axis_tdata, etc here
    }

notice that function top has no inputs or outputs (since they were separately declared as global variables)

The final rendered VHDL, if run with --top my_two_instances, ends up looking like

    entity my_two_instances is
    port(
      clk_100p0 : in std_logic;
      s_axis_tdata_val_input : in unsigned(31 downto 0);
      s_axis_tkeep_val_input : in unsigned(3 downto 0);
      s_axis_tlast_val_input : in unsigned(0 downto 0);
      s_axis_tvalid_val_input : in unsigned(0 downto 0);
      s_axis_tready_return_output : out unsigned(0 downto 0);
      m_axis_tdata_return_output : out unsigned(31 downto 0);
      m_axis_tkeep_return_output : out unsigned(3 downto 0);
      m_axis_tlast_return_output : out unsigned(0 downto 0);
      m_axis_tvalid_return_output : out unsigned(0 downto 0);
      m_axis_tready_val_input : in unsigned(0 downto 0)
    );

and you would be free to instantiate that my_two_instances in some external flow

Generally you can refer to the getting started info on how to use the output: https://github.com/JulianKemmerer/PipelineC/wiki/Running-the-Tool

So in full, with the connection to the s/m_axis top level global signals, the two instances of the function, along with some type massaging from uint to/from arrays, looks like so

    void top(){
      // Connect top level input ports to local stream type variables
      //  Input stream data
      stream(my_axis_32_t) input_axis;
      UINT_TO_BYTE_ARRAY(input_axis.data.data, 4, s_axis_tdata)
      UINT_TO_BIT_ARRAY(input_axis.data.keep, 4, s_axis_tkeep)
      input_axis.data.last = s_axis_tlast;
      input_axis.valid = s_axis_tvalid;
      //  Output stream ready
      uint1_t ready_for_output_axis = m_axis_tready;

      // Input stream into first instance
      uint1_t ready_for_func0_axis_out; // Note: FEEDBACK not assigned a value yet
      #pragma FEEDBACK ready_for_func0_axis_out
      my_func_out_t func0 = my_func(
        input_axis,
        ready_for_func0_axis_out
      );
      uint1_t ready_for_input_axis = func0.ready_for_axis_in;

      // Output of first instance into second
      uint1_t ready_for_func1_axis_out = ready_for_output_axis;
      my_func_out_t func1 = my_func(
        func0.axis_out,
        ready_for_func1_axis_out
      );
      // Note: FEEDBACK assigned here
      ready_for_func0_axis_out = func1.ready_for_axis_in;

      // Connect top level output ports from local stream type variables
      //  Output stream data
      m_axis_tdata = uint8_array4_le(func1.axis_out.data.data); // Array to uint
      m_axis_tkeep = uint1_array4_le(func1.axis_out.data.keep); // Array to uint
      m_axis_tlast = func1.axis_out.data.last;
      m_axis_tvalid = func1.axis_out.valid;
      //  Input stream ready
      s_axis_tready = ready_for_input_axis;
    }
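
The 'type massaging' between a flat uint port and the arrays inside the struct can be pictured with the plain-C sketch below. It assumes little-endian ordering (element 0 is the least significant byte or bit), which is a guess from the `_le` suffix on `uint8_array4_le`. Whether `UINT_TO_BYTE_ARRAY` uses the same ordering is also an assumption. The `sketch_` function names are made up and are not the PipelineC helpers.

```c
#include <assert.h>
#include <stdint.h>

/* uint32 -> 4 bytes, element 0 = least significant byte (assumed order). */
void sketch_u32_to_bytes_le(uint8_t out[4], uint32_t x) {
  for (int i = 0; i < 4; i++) {
    out[i] = (uint8_t)((x >> (8 * i)) & 0xFFu);
  }
}

/* 4 bytes -> uint32, inverse of the function above. */
uint32_t sketch_bytes_le_to_u32(const uint8_t in[4]) {
  uint32_t x = 0;
  for (int i = 0; i < 4; i++) {
    x |= (uint32_t)in[i] << (8 * i);
  }
  return x;
}

/* 4-bit keep value -> 4 single-bit flags, element 0 = bit 0. */
void sketch_u4_to_bits_le(uint8_t out[4], uint8_t x) {
  assert(x < 16); /* models a uint4_t */
  for (int i = 0; i < 4; i++) {
    out[i] = (uint8_t)((x >> i) & 1u);
  }
}

/* 4 single-bit flags -> 4-bit keep value, inverse of the function above. */
uint8_t sketch_bits_le_to_u4(const uint8_t in[4]) {
  uint8_t x = 0;
  for (int i = 0; i < 4; i++) {
    assert(in[i] <= 1); /* each element models a uint1_t */
    x |= (uint8_t)(in[i] << i);
  }
  return x;
}
```

Under this assumed ordering, a tdata of 0x11223344 puts 0x44 in data[0], and a tkeep of 0x5 marks bytes 0 and 2 as kept.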

In this case, because of the FEEDBACK in the top() function, there isn't room to add autopipelining to this design (ex. if you needed to do a bunch of math on a stream, ex. any crypto).

That requires a slightly different design style to connect an autopipeline into one of these streams with data, valid, ready (can talk about that when we get to it; if curious, look at #include "global_func_inst.h" and the macro GLOBAL_VALID_READY_PIPELINE_INST).

Handshakes (Data, Valid, and Ready Signals)

Put together a way of describing things that hides the separate FEEDBACK ready signal, which can be confusing sometimes.

https://github.com/JulianKemmerer/PipelineC/blob/master/examples/handshake_io.c

To explain what's been done inside that handshake_io.c demo, it's easiest to talk about what has changed from the previous stream_io.c

In addition to the old DECL_STREAM_TYPE(my_axis_32_t) these were also added

    DECL_HANDSHAKE_TYPE(my_axis_32_t)
    DECL_HANDSHAKE_INST_TYPE(my_axis_32_t, my_axis_32_t) // out type, in type // needed for my_axis_32_t some_func(my_axis_32_t)

declaring types for 'handshakes' that include ready signals

The second 'INST_TYPE' macro is specific to declaring signals for a module of the form in type -> module -> out type: one input handshake/stream into the module/function and one output handshake/stream out of it.

handshake.h comes with hs_in/out helper macros to be used like

hs_out(my_axis_32_t) my_func( hs_in(my_axis_32_t) inputs );

ex.

    hs_out(my_axis_32_t) outputs;
    outputs.stream_out = ...;
    outputs.ready_for_stream_in = ...;

and similar for input side

Finally getting to the top() function where the main dataflow is specified

    DECL_INPUT(uint32_t, s_axis_tdata)
    DECL_INPUT(uint4_t, s_axis_tkeep)
    DECL_INPUT(uint1_t, s_axis_tlast)
    DECL_INPUT(uint1_t, s_axis_tvalid)
    DECL_OUTPUT(uint1_t, s_axis_tready)
    DECL_OUTPUT(uint32_t, m_axis_tdata)
    DECL_OUTPUT(uint4_t, m_axis_tkeep)
    DECL_OUTPUT(uint1_t, m_axis_tlast)
    DECL_OUTPUT(uint1_t, m_axis_tvalid)
    DECL_INPUT(uint1_t, m_axis_tready)
    void top(){
      ...
    }

The first thing to see is that making an instance of your my_func is no longer simply 'use the function in code'. Instead, helper macros like so exist to make instances to be wired up later

    // func0: my_axis_32_t my_func(my_axis_32_t)
    DECL_HANDSHAKE_INST(func0, my_axis_32_t, my_func, my_axis_32_t)
    // func1: my_axis_32_t my_func(my_axis_32_t)
    DECL_HANDSHAKE_INST(func1, my_axis_32_t, my_func, my_axis_32_t)

Most important is the simplified syntax for connecting the input stream -> func0 -> func1 -> output stream data flow

    // Input stream into first instance
    //  func0 input handshake = input_axis, s_axis_tready
    HANDSHAKE_FROM_STREAM(func0, input_axis, s_axis_tready)

    // Output of first instance into second
    //  func1 input handshake = func0 output handshake
    HANDSHAKE_CONNECT(func1, func0)

    // Output stream from second instance
    stream(my_axis_32_t) output_axis;
    //  output_axis, m_axis_tready = func1 output handshake
    STREAM_FROM_HANDSHAKE(output_axis, m_axis_tready, func1)
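
As a mental model only, a 'connect' between two instances amounts to two assignments in opposite directions: data+valid goes forward and ready goes backward. The plain-C sketch below illustrates that idea. It is not how the PipelineC macros are implemented. `hs_inst_sketch_t` and `hs_connect_sketch` are hypothetical, and the field names `stream_in` and `ready_for_stream_out` are assumed by symmetry with the `stream_out` and `ready_for_stream_in` fields mentioned above.

```c
#include <assert.h>
#include <stdint.h>

typedef struct word_stream_t {
  uint32_t data;
  uint8_t valid; /* models uint1_t */
} word_stream_t;

/* Hypothetical bundle of all handshake wires around one instance. */
typedef struct hs_inst_sketch_t {
  /* Inputs to the instance */
  word_stream_t stream_in;
  uint8_t ready_for_stream_out;
  /* Outputs from the instance */
  word_stream_t stream_out;
  uint8_t ready_for_stream_in;
} hs_inst_sketch_t;

/* dst input handshake = src output handshake. */
void hs_connect_sketch(hs_inst_sketch_t *dst, hs_inst_sketch_t *src) {
  assert(dst != src); /* connecting an instance to itself is not modeled */
  dst->stream_in = src->stream_out;                     /* forward: data + valid */
  src->ready_for_stream_out = dst->ready_for_stream_in; /* backward: ready */
}
```

The argument order mirrors HANDSHAKE_CONNECT(func1, func0): the downstream instance comes first and the upstream instance second.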


So if this interests you, happy to explain more and help guide through using this very new handshake stuff

idea is to easily use HANDSHAKE_CONNECT for wiring up a data flow

maybe can use it for an ethernet demo refresh

eventually can get into handshakes that are more than one input -> func -> one output and get into 'split' and 'join' of multiple streams/handshakes
