
Automatically Generated Functionality

Julian Kemmerer edited this page Mar 24, 2022 · 87 revisions

This isn't PipelineC++, folks. One step at a time.

I want PipelineC code to be compatible with regular C compilers so that your hardware can be functionally verified easily. Some extra work is needed, such as running main in a loop to mimic clock cycles, but it can be done. I mostly don't want to add new language features to standard C, or change languages completely, unless I have a parser and software compiler to support them. I would love some help in this area.
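
As a rough illustration of "running main in a loop to mimic clock cycles", here is a minimal sketch in plain C. The `counter` function and the `run_cycles` harness are hypothetical names made up for this example, and `<stdint.h>` stands in for the generated "uintN_t.h" header. The static local plays the role of a register, and each call represents one clock cycle.

```c
#include <stdint.h>

// Hypothetical hardware function: an 8-bit counter with an enable input.
// The static local models a register that keeps its value between cycles.
uint8_t counter(uint8_t enable)
{
  static uint8_t count = 0;  // register state
  uint8_t out = count;       // output is the value held before this cycle's update
  if (enable) {
    count = (uint8_t)(count + 1);
  }
  return out;
}

// Software stand-in for the clock: one call per simulated cycle.
// Returns the output observed on the last simulated cycle.
uint8_t run_cycles(unsigned cycles)
{
  uint8_t last = 0;
  for (unsigned i = 0; i < cycles; i++) {
    last = counter(1);
  }
  return last;
}
```

Because the register is a static local, state carries over between calls to `run_cycles`, just as it would across clock cycles in hardware.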

With that said, some concepts just beg to be implemented with template types and so really benefit from being implemented as auto generated C code.

Top Level IO + Modules + Registers + Processes, etc

See this documentation.

u/intN_t Types

  • Any 'N' is supported, ex. uint13_t
  • Generated typedefs in header files "uintN_t.h" and "intN_t.h"

Bit Manipulation

Unions can be complicated since they can rely on the layout of bytes in memory, and PipelineC has no memory model. So for now there are auto generated bit manipulation functions for built-in types. These functions are implemented as raw VHDL.

  • Bit select
    • uint<Y-X+1>_t <type_prefix>_Y_X(<type>)
    • Ex. uint16_t uintX_15_0(uintX_t data); // data[15:0], select bits 15 down to 0
  • Bit concatenation
    • uintX_t <type_prefix_0>_<type_prefix_1>(<type0>, <type1>)
    • Result size X is sum of input sizes
    • Ex. uint18_t uint14_uint4(uint14_t x, uint4_t y); // Upper 14 bits and lower 4 bits of a uint18_t
  • Bit duplication
    • uint<Y>_t <type_prefix>_X(<type>)
    • Result width is X times the input width
    • Ex. uint16_t uint4_4(uint4_t x); // Repeat a uint4_t 4 times to form uint16_t
  • Rotate left/right
    • uint<size>_t rot[l|r]<size>_<amount>(uint<size>_t x)
    • Ex. uint64_t rotl64_7(uint64_t x); // Rotate the uint64_t value to the left by 7
  • Bit assignment
    • base_t <base_type_prefix>_<assignment_type_prefix>_X(<base>, <assignment>)
    • Assign data at bit position X in the base value. Result width is same as base.
    • Ex. uint64_t uint64_uint16_2(uint64_t in, uint16_t x); // in[17:2] = x
  • Float SEM construction
    • Only 32b float supported right now, can easily add other widths
    • float float_uint<exponent_bits>_uint<mantissa_bits>(sign, exponent, mantissa)
    • Ex. float float_uint8_uint23(uint1_t sign, uint8_t exponent, uint23_t mantissa);
  • Float uint32_t construction
    • Interpret uint32_t as a 32b float
    • float float_uint32(uint32_t data);
  • Byte swap
    • Swap the byte ordering of an unsigned value
    • uintN_t bswap_<bit_width>(input)
    • Ex. uint32_t bswap_32(uint32_t input);
  • Array to unsigned
    • Concatenate elements of an array to form a single value
    • Element ordering (typically bytes) is either 'big endian' or 'little endian'
    • uintY_t uintX_arrayN_[be|le](uintX_t input[N])
    • Result width is N times input width
    • Ex. uint64_t uint8_array8_be/le(uint8_t x[8]);
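
For checking results in software, the behavior of a few of these functions can be modeled in plain C. The sketch below is not the generated VHDL. It uses `<stdint.h>` types wide enough to hold each value, picks `uint64_t` as a concrete `uintX_t` for the bit select, and assumes that in the array functions element 0 is the most significant byte for 'be' and the least significant byte for 'le'.

```c
#include <stdint.h>
#include <string.h>

// Bit select, data[15:0]
uint16_t uint64_15_0(uint64_t data)
{
  return (uint16_t)(data & 0xFFFFu);
}

// Rotate a 64-bit value left by 7
uint64_t rotl64_7(uint64_t x)
{
  return (x << 7) | (x >> (64 - 7));
}

// Bit assignment, in[17:2] = x
uint64_t uint64_uint16_2(uint64_t in, uint16_t x)
{
  uint64_t mask = (uint64_t)0xFFFFu << 2;
  return (in & ~mask) | ((uint64_t)x << 2);
}

// Bit duplication: repeat a 4-bit value 4 times (only the low 4 bits of x are used)
uint16_t uint4_4(uint8_t x)
{
  uint16_t nib = x & 0xFu;
  return (uint16_t)((nib << 12) | (nib << 8) | (nib << 4) | nib);
}

// Byte swap of a 32-bit value
uint32_t bswap_32(uint32_t v)
{
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Array to unsigned. ASSUMPTION: x[0] becomes the most significant byte.
uint64_t uint8_array8_be(const uint8_t x[8])
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++) {
    r = (r << 8) | x[i];
  }
  return r;
}

// Array to unsigned. ASSUMPTION: x[0] becomes the least significant byte.
uint64_t uint8_array8_le(const uint8_t x[8])
{
  uint64_t r = 0;
  for (int i = 7; i >= 0; i--) {
    r = (r << 8) | x[i];
  }
  return r;
}

// Interpret a uint32_t bit pattern as a 32b float (assumes float is IEEE-754 single)
float float_uint32(uint32_t data)
{
  float f;
  memcpy(&f, &data, sizeof f);
  return f;
}
```

Narrow types such as `uint4_t` do not exist in `<stdint.h>`, so the model masks inputs to the intended width instead.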

Bit Math

Little helper functions for common 'math' operations. These functions are implemented in PipelineC.

  • Negate
    • int<N+1>_t <type_prefix>_negate(uintN_t input)
    • Ex. int32_t uint31_negate(uint31_t input); // Negate adds a sign bit to the 31b unsigned value
  • Absolute value
    • uintN_t <type_prefix>_abs(intN_t input)
    • Ex. uint32_t int32_abs(int32_t input); // Absolute value removes sign bit
  • N Mux
    • Binary tree of multiplexers selecting a single value from N values
    • <type> <type_prefix>_muxN(select, input0, input1, ... input<N-1>);
    • Ex. uint8_t uint8_mux4(uint2_t select, uint8_t input0, uint8_t input1, uint8_t input2, uint8_t input3);
  • Count zeros starting from upper/left bits
    • <count_type> count0s_<type_prefix>(type)
    • Ex. uint3_t count0s_uint7(uint7_t data); // Max zeros possible is 7
  • N Binary Operations
    • Binary tree of binary operations.
    • Only AND, OR, SUM(add) supported right now
    • Only float and u/intN_t types supported.
    • <output_type> <type_prefix>_<operation>N(input0, input1, ... input<N-1>);
    • Ex. uint5_t uint3_sum3(uint3_t input0, uint3_t input1, uint3_t input2);
      • uint3 max = 7, 7*3 = 21, stored in uint5_t
    • Arrays are supported as well: Ex. uint5_t uint3_array_sum3(uint3_t input[3]);
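
The helpers above can be modeled in plain C to cross-check results in software. This is a sketch of the described behavior, not the generated PipelineC. Wider `<stdint.h>` types stand in for `uint31_t`, `uint7_t`, `uint3_t` and similar, with masking to the intended width.

```c
#include <stdint.h>

// Negate: 31-bit unsigned in, 32-bit signed out (only the low 31 bits are used)
int32_t uint31_negate(uint32_t input)
{
  return -(int32_t)(input & 0x7FFFFFFFu);
}

// Absolute value, done in unsigned arithmetic so INT32_MIN does not overflow
uint32_t int32_abs(int32_t input)
{
  uint32_t u = (uint32_t)input;
  return (input < 0) ? (0u - u) : u;
}

// 4-input mux written as a two-level binary tree of 2:1 selections
uint8_t uint8_mux4(uint8_t select, uint8_t input0, uint8_t input1,
                   uint8_t input2, uint8_t input3)
{
  uint8_t lo = (select & 1u) ? input1 : input0;
  uint8_t hi = (select & 1u) ? input3 : input2;
  return (select & 2u) ? hi : lo;
}

// Count zeros of a 7-bit value starting from the upper bit (bit 6)
uint8_t count0s_uint7(uint8_t data)
{
  uint8_t n = 0;
  for (int bit = 6; bit >= 0; bit--) {
    if ((data >> bit) & 1u) {
      break;
    }
    n++;
  }
  return n;
}

// Sum of three 3-bit values; the worst case 7*3 = 21 needs 5 bits
uint8_t uint3_sum3(uint8_t input0, uint8_t input1, uint8_t input2)
{
  return (uint8_t)(((input0 & 7u) + (input1 & 7u)) + (input2 & 7u));
}
```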

Fixed Size Array Types

A fixed size array type can be generated for any type by including a specific header file. This is mostly to support returning fixed size arrays from C functions. Ex.

typedef struct point
{
  uint8_t x;
  uint8_t y;
} point;
#include "point_array_N_t.h"
// Types like this are generated for you
typedef struct point_array_3_t
{
  point data[3];
} point_array_3_t;
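
To show why the wrapper struct is useful, here is a small sketch of a function that returns three points by value. `make_diagonal` is a hypothetical function invented for this example, and the `point_array_3_t` typedef is written out by hand here in place of including the generated header.

```c
#include <stdint.h>

typedef struct point
{
  uint8_t x;
  uint8_t y;
} point;

// Hand-written stand-in for the type described as generated by "point_array_N_t.h"
typedef struct point_array_3_t
{
  point data[3];
} point_array_3_t;

// Hypothetical function: C cannot return a bare array, but it can return
// a struct that wraps a fixed size array.
point_array_3_t make_diagonal(uint8_t start)
{
  point_array_3_t rv;
  for (int i = 0; i < 3; i++) {
    rv.data[i].x = (uint8_t)(start + i);
    rv.data[i].y = (uint8_t)(start + i);
  }
  return rv;
}
```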

Casting to and from bytes

Any type can be converted to and from byte arrays by including a specific header file. This is mostly to support moving C structs from software C to PipelineC buffers. Both PipelineC code and regular C code (i.e. using pointers instead of fixed size buffers) are generated. Ex.

typedef struct point ... ;
#include "point_bytes_t.h"

// A header like this is generated for you
#define point_bytes_t uint8_t_array_2_t  // 2 bytes, auto gen fixed size array struct
#define point_size_t uint2_t // 0-2
point_bytes_t point_to_bytes(point x);
point bytes_to_point(point_bytes_t bytes);
// And similar functions using pointers for real C code
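
For intuition, the conversion functions for `point` might behave like the hand-written sketch below. The byte order is an assumption made for this example (fields in declaration order, x then y); the actual generated layout is not stated on this page.

```c
#include <stdint.h>

typedef struct point
{
  uint8_t x;
  uint8_t y;
} point;

// Hand-written stand-ins for the generated array type and define
typedef struct uint8_t_array_2_t
{
  uint8_t data[2];
} uint8_t_array_2_t;
#define point_bytes_t uint8_t_array_2_t

// ASSUMPTION: fields are serialized in declaration order, x then y
point_bytes_t point_to_bytes(point x)
{
  point_bytes_t bytes;
  bytes.data[0] = x.x;
  bytes.data[1] = x.y;
  return bytes;
}

// Inverse of point_to_bytes under the same ordering assumption
point bytes_to_point(point_bytes_t bytes)
{
  point rv;
  rv.x = bytes.data[0];
  rv.y = bytes.data[1];
  return rv;
}
```

Whatever the real ordering is, the property worth checking is the round trip: converting to bytes and back should return the original struct.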

Clock Crossings

Clock crossing code is generated by including a specifically named header file. Ex.

message_t in_msg;
#include "clock_crossing/in_msg.h"

The READ and WRITE function signatures generated in that file depend on how and where the READ and WRITE functions are used. See clock domain crossing documentation here.

RAMs

  • Described as a global variable (or static local) array with an element type and dimensions
  • Ex. elem_t the_ram[8][8][8][8]; // 4096 elements
    • The four 3b indices/addresses are concatenated to form a 12b address
  • Access to the RAM takes the form of global functions acting on that storage data
    • elem_t <var_name>_<RAM_type>(address0,...,addressN, write_data, write_enable)
    • Input is read/write information and output is read data
    • RAM types:
      • RAM_SP_RF_0: Single port, read first, 0 clock
      • RAM_SP_RF_2: Single port, read first, 2 clock (in and out registers)
      • RAM_DP_RF_2: Dual port, read first, 2 clock (in and out registers)
    • Ex. elem_t the_ram_RAM_SP_RF_2(uint3_t addr0, uint3_t addr1, uint3_t addr2, uint3_t addr3, elem_t write_data, uint1_t write_enable);
      • Reads and writes complete, and read data is returned, on the second iteration/call of the function (the _2 in the name).
    • Single Port:
      One 36 Kb primitive fits the 4096 x 8b = 32 Kb of storage in the above example (elem_t=uint8_t)
      Report Cell Usage: 
      +------+-----------+------+
      |      |Cell       |Count |
      +------+-----------+------+
      |1     |BUFG       |     1|
      |2     |RAMB36E1_1 |     1|
      |3     |FDRE       |    29|
      |4     |IBUF       |    22|
      |5     |OBUF       |     8|
      +------+-----------+------+
      
      #include "uintN_t.h"
      #define elem_t uint8_t
      #pragma MAIN_MHZ main 100.0
      elem_t main(uint3_t addr0, uint3_t addr1, uint3_t addr2, uint3_t addr3, elem_t write_data, uint1_t write_enable)
      {
        static elem_t the_ram[8][8][8][8]; // 4096 elements
        return the_ram_RAM_SP_RF_2(addr0, addr1, addr2, addr3, write_data, write_enable);
      }
    • Dual Port:
      #include "uintN_t.h"
      #define elem_t uint8_t
      #pragma MAIN_MHZ main 100.0
      elem_t main(
        uint3_t addr_r0, uint3_t addr_r1, uint3_t addr_r2, uint3_t addr_r3, // Read port
        uint3_t addr_w0, uint3_t addr_w1, uint3_t addr_w2, uint3_t addr_w3, elem_t write_data, uint1_t write_enable) // Write port
      {
        static elem_t the_ram[8][8][8][8]; // 4096 elements
        return the_ram_RAM_DP_RF_2(addr_r0, addr_r1, addr_r2, addr_r3,
                addr_w0, addr_w1, addr_w2, addr_w3, write_data, write_enable);
      }
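
Two of the ideas above, address concatenation and "read first" behavior, can be sketched in plain C. This is a software model only, not the generated RAM. It assumes addr0 is the most significant index (which matches C's row-major array layout) and models just the zero-latency single-port case, so no input or output registers are included. The names `flat_addr` and `ram_sp_rf_0_model` are made up for this example.

```c
#include <stdint.h>

#define elem_t uint8_t

// Concatenate four 3b indices into one 12b address.
// ASSUMPTION: addr0 is the most significant index.
uint16_t flat_addr(uint8_t addr0, uint8_t addr1, uint8_t addr2, uint8_t addr3)
{
  return (uint16_t)(((addr0 & 7u) << 9) | ((addr1 & 7u) << 6) |
                    ((addr2 & 7u) << 3) | (addr3 & 7u));
}

// Zero-latency, single-port, read-first model:
// the value returned is the old contents, then the write (if enabled) happens.
elem_t ram_sp_rf_0_model(uint8_t addr0, uint8_t addr1, uint8_t addr2, uint8_t addr3,
                         elem_t write_data, uint8_t write_enable)
{
  static elem_t the_ram[8][8][8][8]; // 4096 elements, zero-initialized
  elem_t read_data = the_ram[addr0 & 7u][addr1 & 7u][addr2 & 7u][addr3 & 7u];
  if (write_enable) {
    the_ram[addr0 & 7u][addr1 & 7u][addr2 & 7u][addr3 & 7u] = write_data;
  }
  return read_data;
}
```

Writing to an address and reading it in the same call returns the previous contents; the new value is only visible on a later call.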

DSPs

By default the PipelineC tool will infer pipelined multipliers from the * operator. In the generated VHDL, the multiply operator is surrounded by input and output registers as needed, allowing the synthesis tool to infer as much DSP primitive pipelining as possible. This default can be changed globally with the --mult command line option. See -h for more information.

To disable inferred multipliers for a single function and use an FPGA fabric implementation instead, set the multiplier style to fabric with the FUNC_MULT_STYLE pragma. Ex.

#pragma FUNC_MULT_STYLE mult fabric  // default: infer
uint32_t mult(uint16_t x, uint16_t y)
{
  return x * y;
}
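
A side note for anyone compiling a function like this with a regular C compiler for verification: on platforms where int is 32 bits, both uint16_t operands are promoted to signed int, so a product such as 65535 * 65535 overflows. The sketch below (with the hypothetical name `mult_sw`) widens the operands first. Whether the PipelineC tool itself needs the casts is not something this page states.

```c
#include <stdint.h>

// Software-only variant: widen before multiplying so the full
// 32-bit product is computed in unsigned arithmetic.
uint32_t mult_sw(uint16_t x, uint16_t y)
{
  return (uint32_t)x * (uint32_t)y;
}
```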

Operator Overloading

See the operator name constants near the top of C_TO_LOGIC.py. Make sure to use argument names like left, right, and expr. Finally, multiplies are distinguished by whether they use inferred hard DSP blocks or are implemented in FPGA fabric; see the examples below.

// An example user type
typedef struct complex
{
  float re;
  float im;
}complex;

// Override the '+' operator
complex BIN_OP_PLUS_complex_complex(complex left, complex right)
{
  complex rv;
  rv.re = left.re + right.re;
  rv.im = left.im + right.im;
  return rv;
}
complex BIN_OP_PLUS_complex_float(complex left, float right)
{
  complex rv;
  rv.re = left.re + right;
  rv.im = left.im + right;
  return rv;
}
// Override the '*' operator
// (a + bi)(c + di) = (ac - bd) + (ad + bc)i
// Implementation to use when inferred DSPs for multiplies are used
complex BIN_OP_INFERRED_MULT_complex_complex(complex left, complex right)
{
  complex rv;
  rv.re = (left.re * right.re) - (left.im * right.im);
  rv.im = (left.re * right.im) + (left.im * right.re);
  return rv;
}
// Implementation to use when multipliers are implemented in logic/FPGA fabric (same)
complex BIN_OP_MULT_complex_complex(complex left, complex right)
{
  complex rv;
  rv.re = (left.re * right.re) - (left.im * right.im);
  rv.im = (left.re * right.im) + (left.im * right.re);
  return rv;
}
// Override the '!' operator
complex UNARY_OP_NOT_complex(complex expr)
{
  complex rv;
  rv.re = expr.re * -1.0;
  rv.im = expr.im * -1.0;
  return rv;
}
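
As a usage sketch: in PipelineC source the overload would presumably be triggered by writing `a + b` on two `complex` values. Regular C has no operators on structs, so this plain C version calls the overload by name. `add_example` is a hypothetical caller added only for illustration.

```c
// An example user type
typedef struct complex
{
  float re;
  float im;
} complex;

// Override the '+' operator
complex BIN_OP_PLUS_complex_complex(complex left, complex right)
{
  complex rv;
  rv.re = left.re + right.re;
  rv.im = left.im + right.im;
  return rv;
}

// Hypothetical caller. In PipelineC this body would presumably be `return a + b;`.
// Here the overload is called by name so the file compiles as regular C.
complex add_example(complex a, complex b)
{
  return BIN_OP_PLUS_complex_complex(a, b);
}
```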

Pragmas

See the pragmas page.

VHDL Escape Hatch

Write raw HDL if you must.
