Here are my solutions to the "Advent of code" challenge of 2021 implemented in bash. See https://adventofcode.com/2021
I tried to implement "smart" bash solutions (see "Algorithmic tricks" at the end) that rely on fast GNU/Linux utilities like grep, sed, sort... (the way I use bash in real life), but most of the time my goal was readability and compliance with shellcheck, not terseness or efficiency. I am an old (retired) proficient bash programmer, so my goal here is not to learn bash, but rather to un-learn some dirty habits accumulated over the years and force myself to code in a "shellcheck-friendly" modern way. Only once (for day 24) did I resort to having the bash script generate C code to compile a brute-force calculation part.
Although I invented, designed and implemented a commercial programming language (the SML - System Management Language - in the Bull ISM Network Management platform), and worked professionally with "real" languages, I grew in love with bash because of the intellectual challenges it poses to write efficient code, making mundane tasks exciting, and because it (nearly) never breaks backwards compatibility: code that runs now will still run in 20 years and more... And since I retired in 2021 I now have time to play with things, and so discovered the "Advent of Code" challenge.
Usage:
- Everything below is in the `days/` subdirectory, with no further subdirectories: all files are in this one directory. I hate navigating a maze of small directories, all alike.
- Day NN solutions are the scripts `dNN-1.sh` and `dNN-2.sh` for problems 1 and 2, e.g. `d05-2.sh`
- Run them with the input data file as an optional argument (defaults to `dNN.input`, e.g. `d14.input`)
- Input data is in `dNN.input`. Note that the data may be different for different accounts on the web site.
- The small sample input from the problem text is in `dNN.example`
- So running all the codes can be done by: `for i in d*.sh; do ./$i; done` (or via `EXECALL`)
- There is nearly no error checking, to keep the code small and readable. We only check for non-obvious errors.
- Alternate solutions are sometimes left for reference in files `dNN-otherM.sh`, e.g: `d06-other2.sh`
Notes:
- These scripts need bash version 4.4+, and GNU linux utilities, so they will not run on non-GNU unix systems such as vanilla MacOS or FreeBSD.
- Scripts should pass shellcheck: run `CHECKALL`, or: `for i in d*.sh; do shellcheck -f gcc $i; done`
- When temporary files are used, they are cleaned by a `clean` function called on any exit with `trap 0`
- If we detect errors, they are listed on stderr with the function `err`
- Simple regression tests, embedded in the scripts as comments `#TEST: input-name expected-value`, can be run with the command `TESTALL`, invoked as `./TESTALL`.
Style:
- We use `(( ... ))` and `[[ ... ]]` rather than `let` and `[ ... ]`
- We put all variable evaluations in double quotes: `"$foo"`
- We use `$( ... )` and never backquotes
- We avoid forking subprocesses by using `bar < <(foo)` instead of `foo | bar`. This way `bar` is not run in a separate variable scope from `foo` (see the small demo below).
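For readers less familiar with this idiom, here is a small demo of why it matters (illustrative only, not taken from the scripts):

```bash
count=0
printf 'a\nb\n' | while read -r _; do ((count++)); done
echo "$count"   # prints 0: the loop ran in a subshell, its increments are lost
count=0
while read -r _; do ((count++)); done < <(printf 'a\nb\n')
echo "$count"   # prints 2: the loop ran in the current shell, the variable is kept
```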
At the start of the scripts, some convenience code can be found, used as needed.
It was generated, along with the files for day N, by running: `MAKEDAY N`
Set the variable `in` to the input file argument, with `dNN.input` as default for the scripts `dNN-1.sh` and `dNN-2.sh`, and exit in error if the file does not exist: `in="${1:-${0%-[0-9].*}.input}"; [[ -e $in ]] || exit 1`
Raise a fatal error: `err(){ echo "***ERROR: $*" >&2; exit 1;}`
Use temporary files `$tmp`, or any prefixed by `$tmp.`, and delete them on any exit: `tmp=tmp.$$; clean(){ rm -f "$tmp" "$tmp".*;}; trap clean 0`
Optional regression tests are declared via `#TEST: input-name expected-value` comments.
- The classic ones: `grep`, `sed`, `sort`, `uniq`, `tr`. Note that the perl-compatible `grep -oP` is only available in GNU grep, and is very useful in bash scripts.
- `rev` reverses the characters of each line: `echo a1 b2 c3 | rev` prints `3c 2b 1a`
- https://adventofcode.com/ The "advent of code" (aka AOC) website
- https://github.com/Bogdanp/awesome-advent-of-code Bogdan Popa list of AOC-related resources and solutions
- https://www.reddit.com/r/adventofcode/ The subreddit to discuss AOC
- Big inputs for AOC 2021: a collection of huge inputs to stress your solutions
Author: (c) 2021 Colas Nahaboo, https://colas.nahaboo.net License: free to use, under the MIT License.
These are the execution times in seconds of the second exercise of each day on its full input, measured with the script `BENCHALL -l`.
| test | time | bar graph of times (logarithmic) |
|---|---|---|
| d01 | 0.025 | ------------- |
| d02 | 0.005 | ------ |
| d03 | 0.086 | ------------------- |
| d04 | 0.238 | ~~~~~~~~~~~~~~~~~~~~~~~ |
| d05 | 0.727 | ++++++++++++++++++++++++++++ |
| d06 | 0.052 | ----------------- |
| d07 | 7.717 | ====================================== |
| d08 | 0.226 | ~~~~~~~~~~~~~~~~~~~~~~~ |
| d09 | 0.395 | ~~~~~~~~~~~~~~~~~~~~~~~~~ |
| d10 | 0.108 | ~~~~~~~~~~~~~~~~~~~~ |
| d11 | 0.370 | ~~~~~~~~~~~~~~~~~~~~~~~~~ |
| d12 | 12.202 | ######################################## |
| d13 | 0.591 | +++++++++++++++++++++++++++ |
| d14 | 0.092 | ------------------- |
| d15 | 989.442 | ########################################################### |
| d16 | 0.171 | ~~~~~~~~~~~~~~~~~~~~~~ |
| d17 | 12.453 | ######################################## |
| d18 | 78.641 | ################################################ |
| d19 | 16.839 | ########################################## |
| d20 | 64.393 | ################################################ |
| d21 | 8.028 | ======================================= |
| d22 | 520.855 | ######################################################### |
| d23 | 1078.000 | ############################################################ |
| d24 | 2494.000 | ############################################################### |
| d25 | 61.705 | ############################################### |
Legend:
0s < ------ < 0.1s < ~~~~~~ < 0.5s < ++++++ < 1s < ====== < 10s < ######
For some problems, solving the puzzle the straightforward way is too slow in Bash. So I have used algorithmic tricks in some solutions to stay under a minute of execution time, and if possible under a second. I have commented them in the scripts, but here they are, collected for reference.
I solve all the problems myself first, without looking for any hint, but afterwards, I often look for other interesting solutions on the web, that I list under the label "Also:".
To split a binary number into an array `digits` of its bits, in reverse (little-endian) order, I use: `read -a digits < <(rev < "$in" | sed -e 's/./\0 /g')`
- `rev` to reverse the characters of the input string, which is the binary number
- `sed` to split the number into a space-separated list of bits
- `read -a` to create an array of these bits
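A quick demo of the same pipeline on a literal number (using `&`, the standard whole-match reference, in the sed replacement):

```bash
read -a digits < <(echo 10110 | rev | sed -e 's/./& /g')
echo "${digits[0]} ${digits[4]}"   # prints "0 1": digits[0] is the lowest-order bit
```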
Naive first version, using files (d04-files-1.sh and d04-files-2.sh)
In a textual representation of a board as lines of space-separated numbers, detecting an empty row is just grep-ing for empty lines. I thus used files containing the board followed by its "inverted" version, each line being a column instead of a row, so that looking for empty rows or columns is done by a grep of empty lines.
I also pad the lines of numbers by spaces before and after, so I can perform a sed to remove the drawn number N in all files by replacing {space}N{space} by {space} without taking into account the special case of the first or last number in the line.
The code is generic and has variables to adjust the size of boards.
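A small illustration of this padding idea (the file name and board rows are just an example, taken from the puzzle statement):

```bash
printf '%s\n' ' 22 13 17 11 0 ' ' 8 2 23 4 24 ' > board.txt    # rows padded with spaces
for N in 22 13 17 11 0; do sed -i "s/ $N / /" board.txt; done  # "draw" five numbers
grep -c '^ *$' board.txt   # prints 1: the first row is now blank, i.e. fully drawn
```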
Current fast version
I then concatenated all the tables in a big array `rows` in memory instead of having files. Each board thus occupied 10 consecutive elements of it: 5 strings for its rows and 5 for its columns. The strings were in the form `<index-in-rows: n1 n2 n3 n4 n5 >` so that I could apply the ultrafast methods to:
- remove the drawn number N on all the boards: `rows=("${rows[@]// $N / }")`
- check on all the boards for a cleared row or column: `[[ "${rows[@]}" =~ \<([[:digit:]]+):\ *\> ]]`, with the index of the matching row in `rows[]` being thus in `${BASH_REMATCH[1]}`
Also: I have seen on the Reddit post Big inputs for AOC 2021 a smart hack by "p_tseng":
For each board, figure out what time it wins at. By building a map of called number -> time it's called, you can determine this very quickly. For each board, take maximum across each row/column, take minimum of those maxima.
Also: "einarjon" has a much faster bash solution, by doing everything in memory with arrays and strings, and replacement in strings without changing their lengths. It inspired me to come up with my current fast version.
Creating a (huge) 2-dimensional array and plotting the lines in it would be too slow in bash, so we just list the points of the lines one by one, each on a line of a sequential file. The points that are part of more than one line are then simply the coordinates that appear multiple times in the file! So a simple classic `sort | uniq -d` can find them quickly. This is a huge gain in efficiency, both in time and space, for data with few lines sparse in a big space, as the input files are.
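A tiny illustration of the principle (the coordinates below are made up; the real scripts generate them from the input lines):

```bash
{ printf '%s\n' 0,9 1,9 2,9      # the points of a first line
  printf '%s\n' 1,9 1,8 1,7      # the points of a second line crossing it at 1,9
} | sort | uniq -d | wc -l       # prints 1: one point is covered by more than one line
```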
There is no way we can solve this problem with an exponential algorithm, especially with the 256 steps of the second exercise. So we use a linear approach where we do not actually generate a representation of individual fishes, but just keep track of the number of newborns that will be born on each day in the future. When we add a fish, either at the start or because it is born on that day, we just increment the count of future newborns at the days its timer will reach zero.
We can then just iterate on the days and add the newborn fishes of each day this way. Details in the comments of `d06-1.sh`.
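Here is a minimal sketch of this counting idea (not the actual `d06-1.sh`, just the principle described above, run on the sample input):

```bash
days=256
declare -ai born=()   # born[d] = number of fishes born on day d
total=0
add_initial_fish(){   # $1 = timer of one fish present at day 0
  local t=$1 d
  ((total++))
  # it first spawns on day t+1, then every 7 days
  for ((d = t + 1; d <= days; d += 7)); do ((born[d]++)); done
}
IFS=, read -ra timers <<< "3,4,3,1,2"
for t in "${timers[@]}"; do add_initial_fish "$t"; done
for ((day = 1; day <= days; day++)); do
  n=${born[day]:-0}
  ((total += n))
  # each newborn has timer 8: it first spawns 9 days after its birth, then every 7 days
  for ((d = day + 9; d <= days; d += 7)); do ((born[d] += n)); done
done
echo "$total"   # 26984457539 for this sample after 256 days
```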
I have kept some attempts that were correct but too slow, for reference. They quickly provided prototypes, unusable for the "production" use of tackling the 256 steps in reasonable time, but simple enough to give correct answers to test the validity of the final solution.
- `d06-other1.sh`: the straightforward version, naively implementing the explanations.
- `d06-other2.sh`: a version trying to capitalize on the speed of grep, with a full representation of the fishes in a file, but inverted.
Also: Most people on reddit used a similar approach of only having a set of counters, but instead of having one count per day like me, they just keep for all the timer values (0 to 8) the count of how many fishes have this timer. And they "rotate the bins" each day. See the solutions megathread for day 6
Nothing special, but:
Also: "throwaway7824365346" has written a math paper on the problem: https://www.reddit.com/r/adventofcode/comments/rawxad/2021_day_7_part_2_i_wrote_a_paper_on_todays/
Also: "einarjon" has a much faster bash solution
The second part was quite tricky. The trick was to detect the four digits 1, 4, 7 and 8, which have unique "word" lengths (2, 4, 3 and 7 segments), and then distinguish the digits with lengths 5 and 6 by their intersections with (the number of characters not in) the 1 and 4 digits.
Also: I didn't realise that the part before the vertical bar consisted of exactly the 10 digits. With this insight, "mnufat17" designed this nice solution. My solution worked for a variable number of inputs, and is the same as the first comment on the above post by "bunceandbean".
The naive approach of expanding the polymer is exponential.
Since the computation at each step is done pair by pair, independently, we can instead, at each step, compute the new set of pairs and the counts of the letters used in the polymer, which linearizes the solution.
`d14-1.sh` is the naive, non-scalable approach, and `d14-2.sh` implements the efficient linear one.
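As an illustration, here is a minimal sketch of this pair-counting idea with associative arrays (the rule list is truncated to three rules of the example; the real `d14-2.sh` reads them from the input and differs in detail):

```bash
declare -A pair=() letter=() newpair=() rule=()
template=NNCB
rule[NN]=C; rule[NC]=B; rule[CB]=H            # illustrative subset of the rules
for ((i = 0; i < ${#template} - 1; i++)); do ((pair[${template:i:2}]++)); done
for ((i = 0; i < ${#template}; i++)); do ((letter[${template:i:1}]++)); done
for ((step = 1; step <= 10; step++)); do
  newpair=()
  for p in "${!pair[@]}"; do
    z=${rule[$p]:-}
    if [[ -z $z ]]; then ((newpair[$p] += pair[$p])); continue; fi
    # each pair XY with rule XY -> Z becomes the pairs XZ and ZY, adding one Z per occurrence
    ((newpair[${p:0:1}$z] += pair[$p]))
    ((newpair[$z${p:1:1}] += pair[$p]))
    ((letter[$z] += pair[$p]))
  done
  pair=(); for p in "${!newpair[@]}"; do pair[$p]=${newpair[$p]}; done
done
# the answer is then (max letter count) - (min letter count)
```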
The problem is interesting because it requires processing the snailfish numbers both as flat strings (for the add, reduce, and split operations) and as hierarchical trees (for the magnitude computation). I started implementing a general tree structure in `d18-1.sh`, with string data on a heap, (wrongly) anticipating that part 2 would consist in tree processing. But no, so I kept the tree+heap structure in `d18-1.sh` for reference, and implemented a simple one-shot tree walk for computing magnitudes in `d18-2.sh`, without actually building the tree structure, to save some time (about 8%) and complexity.
I used a distance that was the concatenation of 3 distances that are independent of rotations and translations: the sum of the distances on the axes (aka manhattan), the sum of their squares, and their product. This way I could find the matching pairs via their distances.
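For illustration, here is a sketch of such a pair signature (the function name is made up; the two points are the first two beacons of scanner 0 in the puzzle's example):

```bash
sig(){ # $1..$3 = x1 y1 z1, $4..$6 = x2 y2 z2: print "manhattan.sum-of-squares.product"
  local dx=$(( $1 > $4 ? $1 - $4 : $4 - $1 ))
  local dy=$(( $2 > $5 ? $2 - $5 : $5 - $2 ))
  local dz=$(( $3 > $6 ? $3 - $6 : $6 - $3 ))
  echo "$((dx + dy + dz)).$((dx*dx + dy*dy + dz*dz)).$((dx * dy * dz))"
}
sig 404 -588 -901 528 -643 409   # prints 1489.1734501.8934200
```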
It was a trick question, as the "image enhancement algorithm" started with a 0 in the example, but with a 1 (a #) in the input. Thus all the points far away from the image would be surrounded by 0s, and the algorithm would set them to 1 on the next step, then back to 0 on the step after, if the result was not to be infinite. So the computation has to "pad" around the image with sufficient space to "simulate" this infinity, and only compute the parts that may have an impact on the "main" image after N steps.
And since the main image's area of influence grows by 1 at each step, the final area in which to count the pixels will be the original image plus N pixels on each side. We also need to add N more pixels of padding to take into account the possible effect of the far-away pixels. It means that when we process an image of size S, we insert it into a blank image of size (S + 2 * 2 * steps), apply the algorithm `steps` times, and count the pixels only in an area of size (S + 2 * steps).
I implemented part 1 with a bands + painting algorithm:
- I only consider points at the border of any cube, and I work in a coordinate system of only these points. I call XYZ this system of "bands" between cube borders, and xyz the original coordinates.
- I then consider the boot orders as layers of paint, so I only peek at the latest ones to get the state of a point.

I implemented `d22-1.sh` by explicitly creating the XYZ space with origin -50, so that all XYZ coordinates, being positive, could be used as array indexes. Then I made a variant `d22-1-alt1.sh` without this constraint, to anticipate the second part. Alas, this same algorithm in `d22-1-alt1.sh` did not scale enough to tackle the full input. I thus coded `d22-2.sh` with the intersection-of-cuboids method described by "aexl" in the reddit megathread for Day 22: "Algorithm: Parse the input cuboids. Then start with an empty list (clist) of cuboids. For each cuboid from the input, calculate the intersections with the cuboids in the clist. If the intersection is non-empty, add it to the clist with inverted sign w.r.t. the current cuboid in clist. If that cuboid from the input is set on, add it to the clist as well. Then add together all the volumes (cuboids with a negative sign will have a negative volume)."
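Here is a toy sketch of this signed-intersection method (not the actual `d22-2.sh`: function and variable names are made up, and it only runs the first two steps of the puzzle's example):

```bash
declare -a clist=()   # entries are "sign x1 x2 y1 y2 z1 z2"
add_step(){ # $1 = on|off, $2..$7 = x1 x2 y1 y2 z1 z2 (inclusive ranges)
  local c s x1 x2 y1 y2 z1 z2 ix1 ix2 iy1 iy2 iz1 iz2
  local -a new=()
  for c in "${clist[@]}"; do
    read -r s x1 x2 y1 y2 z1 z2 <<< "$c"
    ix1=$(( $2 > x1 ? $2 : x1 )); ix2=$(( $3 < x2 ? $3 : x2 ))
    iy1=$(( $4 > y1 ? $4 : y1 )); iy2=$(( $5 < y2 ? $5 : y2 ))
    iz1=$(( $6 > z1 ? $6 : z1 )); iz2=$(( $7 < z2 ? $7 : z2 ))
    # a non-empty intersection is added with the opposite sign of the existing cuboid
    if ((ix1 <= ix2 && iy1 <= iy2 && iz1 <= iz2)); then
      new+=("$((-s)) $ix1 $ix2 $iy1 $iy2 $iz1 $iz2")
    fi
  done
  clist+=("${new[@]}")
  [[ $1 == on ]] && clist+=("1 $2 $3 $4 $5 $6 $7")
  return 0
}
add_step on 10 12 10 12 10 12
add_step on 11 13 11 13 11 13
total=0
for c in "${clist[@]}"; do
  read -r s x1 x2 y1 y2 z1 z2 <<< "$c"
  ((total += s * (x2 - x1 + 1) * (y2 - y1 + 1) * (z2 - z1 + 1)))
done
echo "$total"   # prints 46, the number of lit cubes after these two steps
```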
This was the hardest puzzle for me. But I took the opportunity to properly implement the A-Star graph search algorithm, something that I had never done. I thus tried to write quite generic code that can be used for any room depth.
What I ended up doing is to pre-compute the topological properties of the graph of the possible moves between states, with "rules" attached to positions. For instance, to find neighbor states, I create a table `mapnexts` that, for each position, gives the list of possible moves, each being a path (a list) of places, with its associated cost (the path length). To find the places deeper in the room, I have a `deeper_rooms` table, etc...
I represent a state as a comma-separated list of {pod class},{place number} always ordered, such as A14,A17,A19,A22,B7,B9,B13,B16,C8,C12,C18,C21,D10,D11,D15,D20 for the example. I hesitated with a more compact representation such as ...........BCBDCADCA, which may have been better or worse. But I do not feel the courage to try to re-code it this way.
As I had a hard time debugging my code (I was really feeling the pain of the lack of structures in bash) I resorted to downloading a solution (in python), not looking at its code, but using it to generate test cases for evaluating my code against.
- I first implemented a straightforward interpreter of the machine code in `d24-1-alt1.sh`, but it was too slow to complete in months.
- Then I translated the machine code into bash arithmetic expressions, and implemented crude symbolic optimisations based on the max and min possible values of the symbols, in `d24-1-alt2.sh`. Surprisingly, it was even slower than before. However, the bash arithmetic is exactly the same as the C one, so it was trivial to just use the computed bash expressions to create a C version. Alas, it was still too slow: the expression was 1 megabyte of text, I guess too much to process efficiently.
- I had to understand the input to use its peculiarities, as I was not going to solve the general case. I found that the program was actually a series of 14 distinct, nearly identical sections of 18 instructions, each one working only on the current input digit of the model number and on the Z value of the previous section. I thus separated the code into 14 similar simple computation steps of the form `z[n+1] = function-of(z[n], input-digit[n])`. Still too slow in bash, but this time the C translation was able to complete in less than an hour, so I decided to stop there.
- More optimisations could be done by abstracting the code further into some stack machine as is done in the reddit thread, but I will leave this as an exercise to the reader.
In the end, one could say I cheated by using C code, but it is debatable: prototyping in shell and rewriting only the too-slow parts in C is a traditional unix mixed-programming practice, and moreover the C code uses the exact arithmetic syntax of the bash version, just copied untranslated. In a way I just used C as a ... bash compiler. I have left the generated C code, `d24.c`, in the sources so that you can see what it looks like, but it is re-generated, compiled and run on the fly by `d24-1.sh` and `d24-2.sh`.
This Advent of Code was quite fun, and quite instructive. What I learned:
- Coding in bash is not so bad, even if it can border on insanity at times :-).
- Modern bash features are often overlooked but very useful.
- Passing shellcheck should be a mandatory goal for every bash programmer. I resented it at first since I thought it was adding unnecessary syntactic sugar to my code, until I realized that this was a symptom that my coding style was the problem.
- Use `[[...]]` for strings and `((...))` for integers over any of the legacy constructs like `[...]`, `test`, etc. The code is then much cleaner and safer (and a tad faster), as you do not have to quote as much. E.g: `[[ -z $foo ]]` instead of `[ -z "$foo" ]`, or even `((i=j))` instead of `i="$j"`
  - but, this makes traditional debugging with `set -x` less useful, as the values of variables are not displayed anymore. E.g. if j is 2, the tracing of `i="$j"` shows `i=2` whereas the tracing of `((i=j))` only shows `((i=j))`. This could be where a bash debugger would be useful, but I know only one, bashdb, and it does not seem updated frequently; I could however find a beta version that I could compile with bash 5.1. The `trap DEBUG` trick can also be useful.
  - So, I now tend to write `i=$((j+k))` while developing code, and maybe later switch to the slightly more efficient `((i = j+k))` for production.
- I avoided arrays in bash because I found out they were abysmally slow when first introduced, to the point that managing data in files with grep, sed, ... was actually faster than using arrays. But not anymore! Bash arrays should now be used as much as possible.
- A lot of bash operations can "map" on arrays. For instance `"${tab[@]//x/y}"` will string-replace x by y in all the elements of the array tab, and is super fast.
- Use arrays rather than the classic way of representing lists in bash as space-separated (or tab-separated) substrings in a string.
- Bash can have typed variables: integer ones via `declare -i` or `local -i`, and using them makes your code safer.
- Bash functions can be passed variables by name, useful for efficiency (to avoid copying big arrays or strings) and to provide multiple return values by modifying the passed variables (see the small sketch after this list). But it cannot recurse, as this is passing by name, not by reference.
- Working with arrays makes using `$(...)` impractical, as commands are executed in a subshell and cannot access arrays anymore to update them in the parent shell. So I tend to pass the return value(s) of a function into global variables of the same name. E.g. instead of `x=$(foo)`, I write `foo; x="$foo"`. Or pass variables by name to set them if I want to return multiple results.
- To parse a space-separated string, the fastest is using `set` to map the elements into the positional parameters `$1`, `$2`, ...; then come the `${string#* }` and `${string% *}` operators, closely followed by `read`, the full `[[ $string =~ ([-[:digit:]]+)[[:space:]]... ]]` being 3 times slower. And if possible, using indexes is even faster: `${string:i:j}`.
- Use the faster `$(< filename)` instead of `$(cat filename)`.
- To copy an associative array A1 to A2 in bash 4.4+, do: `A1_def=$(declare -p A1) && declare -A A2="${A1_def#*=}"`
- To access more than 9 parameters in a function, use braces: `$10` won't work, but `${10}` does.
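As an illustration of the pass-by-name point above, here is a minimal sketch using a bash 4.3+ nameref (the function name is made up for the example):

```bash
append_sum(){ # $1 = the *name* of an integer array variable; appends the sum of its elements
  local -n _arr=$1   # nameref: _arr aliases the caller's variable, nothing is copied
  local _s=0 _v
  for _v in "${_arr[@]}"; do ((_s += _v)); done
  _arr+=("$_s")      # modifies the caller's array in place
}
nums=(1 2 3)
append_sum nums
echo "${nums[@]}"    # prints: 1 2 3 6
```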