C Puzzle
Imagine a sequence of zeros and ones: 1,1,0,1,0,0,1 etc
Binary bytes (ASCII characters) have been encoded into the sequence based on some given integer constant N (ex. 1, 2, etc.) as follows:
- The sequence begins with 2N or more 1's
- Upon seeing the first 1->0 transition skip the next 3N-1 values
- The value arrived at is the first and least significant bit in the byte
- The next seven more-significant bits of the byte are obtained by skipping every 2N-1 values
- After the 8th bit completing the byte, repeat the above for next byte, beginning with looking for the 1->0 transition
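To make the encoding rules above concrete, here is a minimal sketch of the opposite direction: producing the sequence for one byte. The helper name encode_byte and its parameters are hypothetical (they do not appear in c_puzzle.c), and the framing is an assumption inferred from the test vectors on this page: the 1->0 transition starts a run of 2N zeros, and each data bit is held for 2N values, least significant bit first. Unlike the_function, this helper is not bound by the puzzle rules, so it uses an array parameter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper, not part of c_puzzle.c.
// Writes the encoded form of one byte into out[] and returns the number of
// values written, or 0 if the arguments are invalid or out[] is too small.
// Assumption: 2N zeros follow the leading 1's, then each of the eight data
// bits is held for 2N values, least significant bit first.
size_t encode_byte(uint8_t byte, uint32_t n, uint32_t leading_ones,
                   bool out[], size_t out_size)
{
    size_t hold = 2u * (size_t)n; // Values per bit period
    size_t needed = (size_t)leading_ones + hold + (8u * hold);
    size_t count = 0;

    if ((n >= 1u) && (leading_ones >= hold) && (needed <= out_size)) {
        // The sequence begins with 2N or more 1's
        for (uint32_t i = 0; i < leading_ones; i += 1) {
            out[count] = 1;
            count += 1;
        }
        // 2N zeros, the first of which is the 1->0 transition
        for (size_t i = 0; i < hold; i += 1) {
            out[count] = 0;
            count += 1;
        }
        // Eight data bits, LSB first, each held for 2N values
        for (uint32_t bit = 0; bit < 8u; bit += 1) {
            bool value = (bool)((byte >> bit) & 1u);
            for (size_t i = 0; i < hold; i += 1) {
                out[count] = value;
                count += 1;
            }
        }
    }
    return count;
}
```

With this layout, the value reached after skipping 3N-1 values past the transition falls inside the first data bit, and each further skip of 2N-1 values lands at the same position inside the next bit.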
Fill in this C function such that it returns zero/NULL until a full eight bit byte/character is parsed.
- The function is run once for each element in the sequence in order
- Only add code inside the function
- Only return a completed non-zero/null character once
- Use a single return statement at the end of the function
- No global variables, no pointers, pass by value only
- No libraries, no OS functionality, etc
- Use += 1 or -= 1 instead of ++ or --
- Use bool | u/int_t types
- Hint: static local variables maintain state
uint8_t the_function(bool seq_val){
uint8_t return_value = 0;
// CODE HERE
// One return statement at end of function
return return_value;
}
You can compile and run your version of the_function (along with a working solution) as shown in c_puzzle.c:
Set #define N to either 1 or 2. The input_values array encodes the two characters "Hi".
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#define N 2 // Some integer ex. 1,2
#define SKIP3 ((3*N)-1)
#define SKIP2 ((2*N)-1)
uint8_t the_function(bool seq_val){
// Hint: static uint8_t char_buffer; // Buffer to hold eight bits
uint8_t return_value = 0;
// CODE HERE
// One return statement at end of function
return return_value;
}
// Wrapper function for unrelated reasons...
void testbench(){
// These input_values sequences encode "Hi" with N=1 or N=2
#if N==1
// N=1
#define TEST_SIZE 45
bool input_values[TEST_SIZE] = {
1, // 2*N=2 or more 1's to start
1, // Ex. 3
1,
//
0, // The 1->0 transition \/
0, // Skip 3*N - 1=2
// 'H' = 0x48 = 0b01001000
0, // Skipped
0, // Bit[0]
//
0, // Skip 2*N - 1=1
0, // Bit[1]
//
0, // Skip 2*N - 1=1
0, // Bit[2]
//
1, // Skip 2*N - 1=1
1, // Bit[3]
//
0, // Skip 2*N - 1=1
0, // Bit[4]
//
0, // Skip 2*N - 1=1
0, // Bit[5]
//
1, // Skip 2*N - 1=1
1, // Bit[6]
//
0, // Skip 2*N - 1=1
0, // Bit[7]
//
1, // 2*N=2 or more 1's
1, // Ex. 3
1,
//
0, // The 1->0 transition \/
0, // Skip 3*N - 1=2
// 'i' = 0x69 = 0b01101001
1, // Skipped
1, // Bit[0]
//
0, // Skip 2*N - 1=1
0, // Bit[1]
//
0, // Skip 2*N - 1=1
0, // Bit[2]
//
1, // Skip 2*N - 1=1
1, // Bit[3]
//
0, // Skip 2*N - 1=1
0, // Bit[4]
//
1, // Skip 2*N - 1=1
1, // Bit[5]
//
1, // Skip 2*N - 1=1
1, // Bit[6]
//
0, // Skip 2*N - 1=1
0, // Bit[7]
//
1, // 2*N=2 or more 1's
1, //
1
};
#endif
#if N==2
// N=2
#define TEST_SIZE 87
bool input_values[TEST_SIZE] = {
1, // 2*N=4 or more 1's to start
1, // Ex. 5
1,
1,
1,
//
0, // The 1->0 transition \/
0, // Skip 3*N - 1=5
0, // Skipped
0, // Skipped
// 'H' = 0x48 = 0b01001000
0, // Skipped
0, // Skipped
0, // Bit[0]
0, // Skip 2*N - 1=3
//
0, // Skipped
0, // Skipped
0, // Bit[1]
0, // Skip 2*N - 1=3
//
0, // Skipped
0, // Skipped
0, // Bit[2]
0, // Skip 2*N - 1=3
//
1, // Skipped
1, // Skipped
1, // Bit[3]
1, // Skip 2*N - 1=3
//
0, // Skipped
0, // Skipped
0, // Bit[4]
0, // Skip 2*N - 1=3
//
0, // Skipped
0, // Skipped
0, // Bit[5]
0, // Skip 2*N - 1=3
//
1, // Skipped
1, // Skipped
1, // Bit[6]
1, // Skip 2*N - 1=3
//
0, // Skipped
0, // Skipped
0, // Bit[7]
0,
//
1, // 2*N=4 or more 1's
1, // Ex. 5
1,
1,
1,
//
0, // The 1->0 transition \/
0, // Skip 3*N - 1=5
0, // Skipped
0, // Skipped
// 'i' = 0x69 = 0b01101001
1, // Skipped
1, // Skipped
1, // Bit[0]
1, // Skip 2*N - 1=3
//
0, // Skipped
0, // Skipped
0, // Bit[1]
0, // Skip 2*N - 1=3
//
0, // Skipped
0, // Skipped
0, // Bit[2]
0, // Skip 2*N - 1=3
//
1, // Skipped
1, // Skipped
1, // Bit[3]
1, // Skip 2*N - 1=3
//
0, // Skipped
0, // Skipped
0, // Bit[4]
0, // Skip 2*N - 1=3
//
1, // Skipped
1, // Skipped
1, // Bit[5]
1, // Skip 2*N - 1=3
//
1, // Skipped
1, // Skipped
1, // Bit[6]
1, // Skip 2*N - 1=3
//
0, // Skipped
0, // Skipped
0, // Bit[7]
0,
//
1, // 2*N=4 or more 1's
1, // Ex. 5
1,
1,
1
};
#endif
static uint32_t seq_index = 0;
uint8_t the_char = the_function(input_values[seq_index]);
if(the_char != 0){
printf("the_char = %c\n", the_char);
}
seq_index += 1;
}
// gcc c_puzzle.c -o c_puzzle && ./c_puzzle
int main(int argc, char** argv)
{
int i = 0;
while(i<TEST_SIZE)
{
testbench();
i++;
}
return 0;
}
Expected output:
the_char = H
the_char = i
Why is this a page on a hardware description language wiki?
Read this and see if it sounds familiar at all 😏
This puzzle is the same problem found in digital universal asynchronous receiver/transmitters (UARTs). N is one half of one bit period, so 2N is the UART bit period. Skipping 3N values after the 1->0 transition moves 1.5 bit periods ahead, into the center of the first data bit, for sampling (the minus 1 is for counting/off-by-one reasons). Units are clock cycles (ex. 100MHz = 10ns per tick, per run of the function).
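As a rough illustration of those units, the sketch below derives N from a clock frequency and a baud rate. The helper name half_bit_ticks is hypothetical, and the 115200 baud figure used in the note after it is an assumed example value, not something taken from c_puzzle.c.

```c
#include <stdint.h>

// Hypothetical helper: N in clock ticks = half of one UART bit period,
// rounded to the nearest tick. Returns 0 if baud is 0.
uint32_t half_bit_ticks(uint32_t clk_hz, uint32_t baud)
{
    uint32_t n = 0;
    if (baud != 0u) {
        // Bit period in ticks is clk_hz / baud, so N = clk_hz / (2 * baud)
        uint32_t divisor = 2u * baud;
        n = (clk_hz + (divisor / 2u)) / divisor;
    }
    return n;
}
```

For the 100MHz clock mentioned above and an assumed 115200 baud link, half_bit_ticks(100000000, 115200) gives N = 434, which would make SKIP3 = 1301 and SKIP2 = 867.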
C code in c_puzzle.c written to solve this puzzle can be used with the PipelineC tool (see comments in code) to simulate and synthesize for FPGAs + ASICs.
Looking at the examples/c_puzzle.c/the_function/pipeline_map.gv.png dataflow graph you can see how each operation maps to which locations in the C code. The size of each operation's 'box'/node in the graph is scaled to the time+space requirement of the computation ~= combinatorial delay x a rough estimate of size based on number of IO wires (TODO measuring/optimizing for fewer registers/LUTs).
This solution was kindly submitted by Discord|GitHub user siddhpant and was not intended to target hardware, so it makes a great experiment for evaluating 'what is natural for software folks to write' vs. 'what hardware it makes':
Dataflow:
In the above dataflow graph, some of the larger and first items you might see are:
- BIN_OP_SL_uint8_t_uint32_t line 112 4.4ns
- BIN_OP_PLUS_int32_t_int33_t line 109 2.9ns
After synthesizing for an example Xilinx Artix 7 part, Vivado logs produced by the tool show how the_function has a maximum operating frequency of ~142 MHz:
Report Cell Usage:
+------+-------+------+
| |Cell |Count |
+------+-------+------+
|1 |CARRY4 | 43|
|2 |LUT1 | 5|
|3 |LUT2 | 2|
|4 |LUT4 | 18|
|5 |LUT5 | 40|
|6 |LUT6 | 55|
|7 |FDRE | 113|
+------+-------+------+
Source: the_function_0CLK_a6fd0288/bit_pos_reg[4]/C
Destination: return_output_output_reg_reg[0]/R
Data Path Delay: 6.522ns (logic 2.414ns (37.013%) route 4.108ns (62.987%))
Logic Levels: 9 (CARRY4=6 LUT4=1 LUT5=1 LUT6=1)
The reported critically long delay path is from the bit_pos register to the return output. This can be confirmed by tracing the path in the dataflow from the bit_pos | NOW current value of the bit_pos register through the noted-as-large BIN_OP_SL_uint8_t_uint32_t line 112 4.4ns node all the way to the right OUT | return_output return value of the function.
Based on the results above for the original solution, these are the recommended changes for a better hardware result:
- Don't use a variable shifter <<, ex. letter |= seq_val << bit_pos;
  - Instead of shifting seq_val by some 'run-time' value = bit_pos, it uses fewer resources to shift by a constant amount for each bit (one shift per bit in letter).
- Use smaller width integer types
  - Of course on a CPU this barely matters, but in hardware every bit counts: counter sizes can be reduced to as small as N allows.
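The first recommendation can be sketched as follows. This is one possible way to avoid the run-time shift amount (a shift register that always shifts by constants), not necessarily what any of the submitted solutions do. The function names insert_bit_variable and insert_bit_constant are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

// Variable shift: bit_pos is only known at run time, so hardware needs a
// shifter that can handle every possible shift amount.
uint8_t insert_bit_variable(uint8_t letter, bool seq_val, uint8_t bit_pos)
{
    return (uint8_t)(letter | ((uint8_t)seq_val << bit_pos));
}

// Constant shift alternative: move the byte right by one and place the new
// bit at position 7. After eight LSB-first bits the byte is in order.
// Both shift amounts (1 and 7) are compile-time constants.
uint8_t insert_bit_constant(uint8_t letter, bool seq_val)
{
    return (uint8_t)((letter >> 1) | ((uint8_t)seq_val << 7));
}
```

Feeding the eight bits of a character LSB first through either function, starting from letter = 0, yields the same byte. The constant version also needs no bit_pos input, although a counter is still required to know when the eighth bit has arrived.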
This solution was kindly submitted by Discord user can.l, as an almost adversarial example of very concise C code. The code was broken apart into multiple lines for PipelineC + readability reasons.
Dataflow:
After synthesizing for an example Xilinx Artix 7 part, Vivado logs produced by the tool show how the_function has a maximum operating frequency of ~223 MHz:
Report Cell Usage:
+------+-------+------+
| |Cell |Count |
+------+-------+------+
|1 |CARRY4 | 2|
|2 |LUT1 | 1|
|3 |LUT2 | 10|
|4 |LUT3 | 5|
|5 |LUT4 | 5|
|6 |LUT5 | 15|
|7 |LUT6 | 22|
|8 |FDRE | 33|
+------+-------+------+
Source: the_function_0CLK_f98a8a12/b_reg[0]/C
Destination: the_function_0CLK_f98a8a12/s_reg[2]/D
Data Path Delay: 4.429ns (logic 1.917ns (43.283%) route 2.512ns (56.717%))
Logic Levels: 6 (CARRY4=2 LUT2=1 LUT4=1 LUT5=1 LUT6=1)
FMAX ~= 223MHz
This next solution was submitted by PoignardAzur and has the highest estimated operating frequency (i.e. shortest critical path length) of all solutions thus far.
Dataflow:
After synthesizing for an example Xilinx Artix 7 part, Vivado logs produced by the tool show how the_function has a maximum operating frequency of ~311 MHz:
Report Cell Usage:
+------+-----+------+
| |Cell |Count |
+------+-----+------+
|1 |LUT2 | 3|
|2 |LUT3 | 7|
|3 |LUT4 | 12|
|4 |LUT5 | 11|
|5 |LUT6 | 15|
|6 |FDRE | 35|
+------+-----+------+
Source: the_function_0CLK_8f4d4fc5/remaining_reg[5]/C
Destination: return_output_output_reg_reg[0]/D
Data Path Delay: 3.158ns (logic 0.999ns (31.634%) route 2.159ns (68.366%))
Logic Levels: 3 (LUT4=1 LUT6=2)
FMAX ~= 311MHz
The critical path is shown to be from the remaining count storage register to the returned output character (a generated output register).
Here is a short ranking of operations by speed, fastest first. Prefer selecting from the top of the list:
- Constant amount bit shifts / selecting some constant Nth bit / constant index local variable array/struct access
- Bitwise operators (&,|,etc)
- Multiplexing (if-else logic)
- Integer add/sub
- Integer mult
- Variable amount bit shifting (barrel shifters)
- Random array access
- Float mult
- Float add
- Integer divide
- Float divide
Dataflow:
In the above dataflow graph, some of the larger and first items you might see are:
- BIN_OP_MINUS_uint8_t_uint1_t line 88 2.3ns
- BIN_OP_EQ_uint8_t_uint3_t line 78 2.3ns
This version of the_function has a maximum operating frequency of ~290 MHz:
Report Cell Usage:
+------+-----+------+
| |Cell |Count |
+------+-----+------+
|1 |LUT3 | 1|
|2 |LUT4 | 3|
|3 |LUT5 | 5|
|4 |LUT6 | 14|
|5 |FDRE | 29|
+------+-----+------+
Source: the_function_0CLK_83e31706/counter_reg[0]/C
Destination: return_output_output_reg_reg[0]/R
Data Path Delay: 2.925ns (logic 0.875ns (29.915%) route 2.050ns (70.085%))
Logic Levels: 2 (LUT4=1 LUT6=1)
The reported critically long delay path is from the counter register to the returned output. This can be confirmed by tracing the path in the dataflow from the counter | NOW current value of the counter register through the noted-as-large BIN_OP_MINUS_uint8_t_uint1_t line 88 2.3ns node all the way to the right OUT | return_output return value of the function.







