Commit b3a9964

aryx and claude committed
docs: add bytecode generation chapter explanations in Shell.nw
Add %claude: blocks for status handling (exits(), truestatus, pipeline status, rc vs bash design), simple commands transition, AST/bytecode diagram, fork optimization, redirection section (Redir struct, doredir reversal, bytecode scoping, Xwrite, Xpopredir), and pipe section (4-process ASCII diagram, Xpipewait flow). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 933e45b commit b3a9964

1 file changed

+185
-18
lines changed

shells/Shell.nw

Lines changed: 185 additions & 18 deletions
Original file line number, diff line number, diff line change
@@ -4351,12 +4351,14 @@ if(p==nil)
43514351
% was $$ in bash
43524352
%claude:
43534353
The exit status of the last command is stored in the
4354-
[[\$status]] variable. [<setstatus()>] sets it, and
4354+
[[$status]] variable. [<setstatus()>] sets it, and
43554355
[<getstatus()>] and [<truestatus()>] read it.
43564356
%
43574357
In [[rc]], a status is a string (not an integer like in the
43584358
Bourne shell). An empty string or the string [["0"]] means success;
43594359
anything else means failure.
4360+
This mirrors the \plan system call [[exits()]] (see \book{Kernel}),
4361+
which takes a string argument rather than a numeric exit code.
43604362

43614363
<<function [[setstatus]]>>=
43624364
void
@@ -4375,6 +4377,44 @@ getstatus(void)
43754377
}
43764378
@
43774379

4380+
%claude:
4381+
The [[|]] check in [<truestatus()>] below deserves explanation.
4382+
When a pipeline like [[ls | wc]] finishes, [[rc]] combines the
4383+
exit statuses of all commands into a single string separated
4384+
by [[|]], for example [["0|0"]] if both succeed or
4385+
[["error|0"]] if the first fails.
4386+
So [<truestatus()>] considers a status ``true'' if it contains
4387+
only [[0]] and [[|]] characters---meaning every stage of the
4388+
pipeline succeeded.
4389+
%
4390+
Why not reduce the pipeline status to a single value earlier?
4391+
Because keeping the full string lets the user inspect [[$status]]
4392+
and see exactly which stage failed (e.g., [["0|error|0"]]
4393+
tells you the middle command had a problem).
4394+
With numeric exit codes (as in \unix), reducing is easy: just
4395+
take the maximum or the last non-zero value.
4396+
But with string statuses, there is no natural way to merge
4397+
[["can't open"]] and [["0"]] into a single meaningful string,
4398+
so [[rc]] preserves them all.
4399+
In [[bash]], [[$?]] only reports the exit status of the
4400+
{\em last} command in the pipeline---a failure in an earlier
4401+
stage is silently lost.
4402+
[[bash]] later added the [[PIPESTATUS]] array to work around
4403+
this, but it requires explicit use.
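
The rule described above (a status is true when it contains only [[0]] and [[|]] characters) can be sketched in a few lines of C. This is an illustration only, not the body of [<truestatus()>]: the name [[status_is_true]] is hypothetical, and the status is passed as an argument here rather than read with [<getstatus()>].

```c
#include <stdbool.h>

/* Hypothetical helper, not rc's actual truestatus(): returns true when
 * the status string consists only of '0' and '|' characters.
 * The empty string (plain success) also qualifies. */
bool
status_is_true(const char *s)
{
    for(; *s; s++)
        if(*s != '0' && *s != '|')
            return false;
    return true;
}
```

Under this sketch, [["0|0"]] counts as success while [["0|error|0"]] does not, since any character other than [[0]] or [[|]] marks a failed stage.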
4404+
4405+
%claude:
4406+
This is another example of [[rc]] getting the design right
4407+
from the start, while [[bash]] accumulates workarounds:
4408+
pipeline statuses are visible by default in [[$status]]
4409+
(vs.\ [[PIPESTATUS]] added later in [[bash]]),
4410+
all variables are automatically exported via [[/env/]]
4411+
(vs.\ the error-prone [[export]] command),
4412+
[[if not]] avoids the parsing ambiguity of [[else]]
4413+
(vs.\ the heavyweight [[if...then...fi]] syntax),
4414+
and string exit statuses naturally carry diagnostic information
4415+
(vs.\ opaque numeric codes that require consulting
4416+
a manual to decode).
4417+
43784418
<<function [[truestatus]]>>=
43794419
bool
43804420
truestatus(void)
@@ -4465,6 +4505,12 @@ havewaitpid(int pid)
44654505
% or builtins or functions
44664506
\section{Simple commands}
44674507

4508+
%claude:
4509+
Now that we have seen the helper infrastructure---argument
4510+
stack management, process status, and subprocess tracking---we
4511+
can describe the bytecode generation and interpretation
4512+
for simple commands, which is the core of the shell.
4513+
44684514
\subsection{Bytecode generation}
44694515

44704516
% SIMPLE node built in simplemung()
@@ -4527,6 +4573,26 @@ else if(p->f==Xword || p->f==Xdelhere)
45274573
efree((++p)->s);
45284574
@
45294575

4576+
%yoann: ASCII diagram for a simple command where we can see the AST with ARGLIST, WORDS, etc. on the left and the corresponding generated code on the right
4577+
%claude:
4578+
For example, [[echo hello world]] produces the following
4579+
AST and bytecodes:
4580+
\begin{verbatim}
4581+
AST:                       codebuf:
4582+
4583+
      SIMPLE               Xmark
4584+
        |                  Xword "world"
4585+
     ARGLIST               Xword "hello"
4586+
     /     \               Xword "echo"
4587+
 ARGLIST  "world"          Xsimple
4588+
  /    \
4589+
"echo" "hello"
4590+
\end{verbatim}
4591+
%
4592+
Note how the arguments are emitted in reverse order
4593+
([[ARGLIST]] emits [[c1]] before [[c0]]), so that after
4594+
[<Xword>] prepends each word to the list, [[argv]] ends
4595+
up in the correct order: [[echo]], [[hello]], [[world]].
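
To see why reverse emission combined with prepending yields the right order, here is a small C sketch. The [[Word]] type and the [[prepend]] and [[build_argv]] functions are hypothetical stand-ins; the word list used by [[rc]] itself differs in detail.

```c
#include <stdlib.h>

/* Hypothetical word list; rc's own word struct differs in detail. */
typedef struct Word Word;
struct Word {
    const char *s;
    Word *next;
};

/* Prepend s to the list, as the text says Xword does with argv. */
Word*
prepend(const char *s, Word *list)
{
    Word *w = malloc(sizeof *w);
    if(w == NULL)
        abort();
    w->s = s;
    w->next = list;
    return w;
}

/* Replay the bytecode order from the diagram: world, hello, echo. */
Word*
build_argv(void)
{
    Word *argv = NULL;
    argv = prepend("world", argv);
    argv = prepend("hello", argv);
    argv = prepend("echo", argv);
    return argv;
}
```

Walking the list returned by [[build_argv()]] from its head visits [[echo]], [[hello]], [[world]], which is the order the command expects.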
45304596

45314597
\subsection{[[Xsimple()]]}
45324598

@@ -4746,7 +4812,7 @@ When the command name starts with [[/]], [[./]], [[../]],
47464812
or [[#]] (device path in \plan), it is treated as an
47474813
absolute path and no search is performed.
47484814
Otherwise, [<searchpath()>] returns the list of directories
4749-
in [[\$path]] to try.
4815+
in [[$path]] to try.
47504816
%
47514817
[<Execute()>] then iterates over these directories,
47524818
prepending each to the command name and calling [[exec()]].
@@ -4898,6 +4964,18 @@ Waitfor(int pid, bool _persist)
48984964

48994965
% it is an optimisation then? mv to advanced topics?
49004966
% when have exit in codebuf?
4967+
%claude:
4968+
If the next bytecode after [<Xsimple>] is [<Xexit>]
4969+
(possibly preceded by [<Xpopredir>] calls), the shell
4970+
is about to exit anyway, so forking a child and waiting
4971+
for it is pointless. In that case, [<Xsimple>] calls
4972+
[<execexec()>] directly, replacing the current process
4973+
with the command. This is the same optimization that
4974+
the [[exec]] builtin provides, applied automatically.
4975+
This matters in practice for scripts like [[rc -c 'ls']]:
4976+
without this optimization, [[rc]] would fork, exec [[ls]]
4977+
in the child, wait, and then exit---wasting a process.
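
The look-ahead test described above could be sketched as follows. This is a simplified model under stated assumptions: the bytecodes are represented by a hypothetical [[enum Op]] rather than by the function pointers a real code vector holds, and [[exit_is_next]] is an illustrative name, not the [[exitnext()]] function itself.

```c
#include <stdbool.h>

/* Hypothetical opcodes standing in for rc's bytecode entries. */
enum Op { XSIMPLE, XPOPREDIR, XEXIT, XWORD };

/* Returns true if, starting at pc, only Xpopredir bytecodes separate
 * us from an Xexit. The caller must ensure the array contains some
 * opcode other than XPOPREDIR so that the scan terminates. */
bool
exit_is_next(const enum Op *pc)
{
    while(*pc == XPOPREDIR)
        pc++;
    return *pc == XEXIT;
}
```

When the check succeeds, the shell has nothing left to do after the command, so replacing the current process is safe.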
4978+
49014979
<<[[Xsimple()]] if exitnext()>>=
49024980
if(exitnext()){
49034981
/* fork and wait is redundant */
@@ -4945,7 +5023,7 @@ case ';':
49455023

49465024
%claude:
49475025
[[&&]] and [[||]] use a forward jump technique.
4948-
For [[cmd1 \&\& cmd2]], the compiler emits:
5026+
For [[cmd1 && cmd2]], the compiler emits:
49495027
the bytecodes of [[cmd1]], then [<Xtrue>] followed by a
49505028
placeholder jump offset, then the bytecodes of [[cmd2]].
49515029
After emitting [[cmd2]], [<stuffdot()>] patches the placeholder
@@ -5037,7 +5115,7 @@ Xbang(void)
50375115
\subsection{String matching}
50385116

50395117
%claude:
5040-
The [[~]] operator pushes two word lists onto [[argv]]
5118+
The \verb+~+ operator pushes two word lists onto [[argv]]
50415119
(the subject and the patterns), then calls [<Xmatch>].
50425120
Two [<Xmark>] bytecodes are needed because both the subject
50435121
and the patterns can be multi-word (e.g., if they come from
@@ -5159,6 +5237,20 @@ p->redir = p->startredir = runq ? runq->redir : nil;
51595237

51605238
\subsubsection{[[Redir]]}
51615239

5240+
%claude:
5241+
The [[Redir]] structure records a pending redirection: a [[type]]
5242+
(e.g., [[ROPEN]] for a regular file redirection), a [[from]] file
5243+
descriptor (the one that was opened), and a [[to]] file descriptor
5244+
(the one that should be replaced, such as 1 for stdout).
5245+
The thread's [[redir]] field points to a linked list of these records,
5246+
built up as the interpreter encounters redirection bytecodes.
5247+
%
5248+
The [[startredir]] field in [<start()>] deserves attention:
5249+
when a new thread is created, it inherits its parent's redirection
5250+
stack (so subshells see the parent's redirections), but
5251+
[[startredir]] remembers where the inherited portion ends.
5252+
This way, if an error occurs, the thread can pop only the redirections
5253+
it added, without disturbing the parent's state.
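
The role of [[startredir]] can be illustrated with a minimal sketch. The [[Thread]] type below keeps only the two fields discussed above, and [[push_redir]] and [[pop_own_redirs]] are hypothetical helper names; the real structures carry more fields (such as the redirection [[type]]).

```c
#include <stdlib.h>

typedef struct Redir Redir;
struct Redir {
    int from, to;
    Redir *next;        /* older redirection */
};

/* Hypothetical, stripped-down thread: only the two fields discussed. */
typedef struct Thread Thread;
struct Thread {
    Redir *redir;       /* top of the redirection stack */
    Redir *startredir;  /* first inherited record; never popped by us */
};

/* Prepend a new record to the thread's redirection stack. */
void
push_redir(Thread *t, int from, int to)
{
    Redir *r = malloc(sizeof *r);
    if(r == NULL)
        abort();
    r->from = from;
    r->to = to;
    r->next = t->redir;
    t->redir = r;
}

/* Pop only the records this thread pushed; returns how many. */
int
pop_own_redirs(Thread *t)
{
    int n = 0;
    while(t->redir != t->startredir){
        Redir *r = t->redir;
        t->redir = r->next;
        free(r);
        n++;
    }
    return n;
}
```

Because the loop stops at [[startredir]], the records inherited from the parent are left untouched.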
51625254
<<struct [[Redir]]>>=
51635255
struct Redir {
51645256
// enum<redirection_kind_bis>
@@ -5202,6 +5294,16 @@ pushredir(int type, int from, int to)
52025294

52035295
\subsubsection{[[doredir()]]}
52045296

5297+
%claude:
5298+
Because [<pushredir()>] prepends each new record to the front of the
5299+
list, the list is in reverse order relative to the command line.
5300+
For [[cmd > foo >[2=1]]], the list has [[>[2=1]]] first, then [[> foo]].
5301+
But the user expects the redirections to be applied left-to-right:
5302+
first redirect stdout to [[foo]], then duplicate fd~2 from fd~1.
5303+
%
5304+
[<doredir()>] solves this by recursing to the end of the list
5305+
before performing any [[dup()]] calls, so the deepest (oldest)
5306+
redirection executes first---restoring the original left-to-right order.
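
The recurse-first trick can be shown in isolation. In the sketch below, the [[label]] field and the [[apply_in_order]] function are hypothetical; recording a label stands in for the [[dup()]] call that [<doredir()>] performs.

```c
#include <stddef.h>

typedef struct Redir Redir;
struct Redir {
    const char *label;   /* hypothetical: text of the redirection */
    Redir *next;         /* older redirection */
};

/* Visit the list oldest-first by recursing before acting.
 * out receives the labels in application order, starting at index n;
 * the return value is the index after the last label stored. */
int
apply_in_order(const Redir *r, const char **out, int n)
{
    if(r == NULL)
        return n;
    n = apply_in_order(r->next, out, n);
    out[n] = r->label;   /* stands in for the dup() call */
    return n + 1;
}
```

For a list whose head is [[>[2=1]]] and whose tail is [[> foo]], the output array holds [[> foo]] first, matching the left-to-right order of the command line.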
52055307
% execexec -> <>
52065308
<<function [[doredir]]>>=
52075309
void
@@ -5242,6 +5344,16 @@ case ROPEN:
52425344

52435345
\subsubsection{Bytecode generation}
52445346

5347+
%claude:
5348+
The bytecode for a [[REDIR]] node first evaluates the filename
5349+
(with globbing), then emits the appropriate redirection bytecode
5350+
(e.g., [<Xwrite>] for [[>]]), followed by the file descriptor number.
5351+
After the redirected command ([[c1]]) finishes, [<Xpopredir>] cleans
5352+
up by closing the opened file descriptor and removing the [[Redir]]
5353+
record from the stack.
5354+
This scoping ensures that in [[cmd1 > foo; cmd2]],
5355+
the redirection applies only to [[cmd1]]---once [[Xpopredir]]
5356+
runs, [[cmd2]] sees the original file descriptors.
52455357
<<[[outcode()]] cases>>=
52465358
case REDIR:
52475359
emitf(Xmark);
@@ -5265,6 +5377,13 @@ case REDIR:
52655377

52665378
\subsubsection{[[Xwrite()]]}
52675379

5380+
%claude:
5381+
[<Xwrite()>] handles the [[>]] redirection at runtime.
5382+
It pops the filename from the argv stack, opens (or creates) the file
5383+
with [<Creat()>], and pushes an [[ROPEN]] record mapping the newly
5384+
opened file descriptor to the target fd (typically~1 for stdout,
5385+
read from the bytecode stream).
5386+
The actual [[dup()]] will happen later in [<doredir()>].
52685387
<<[[outcode()]] when REDIR case, switch redirection type cases>>=
52695388
case WRITE:
52705389
emitf(Xwrite);
@@ -5300,6 +5419,14 @@ Xwrite(void)
53005419
@
53015420

53025421

5422+
%claude:
5423+
[<Xpopredir()>] is the cleanup counterpart: it pops the top [[Redir]]
5424+
record and, if it was an [[ROPEN]], closes the file descriptor
5425+
that was opened for the redirection.
5426+
This is important because the opened fd (from [[Creat()]] or [[Open()]])
5427+
is a temporary---after [[doredir()]] has [[dup()]]'d it onto the
5428+
target fd in the child process, the parent still holds the original,
5429+
and it must be closed to avoid leaking file descriptors.
53035430
<<function [[Xpopredir]]>>=
53045431
void
53055432
Xpopredir(void)
@@ -5437,19 +5564,59 @@ turfredir();
54375564
%\subsubsection{Trace of a pipe}
54385565
%claude:
54395566
Pipes are arguably the most important operator in a shell.
5440-
For [[cmd1 | cmd2]], three processes are involved:
5441-
the parent (the [[rc]] interpreter), a child running [[cmd1]],
5442-
and the parent running [[cmd2]] in a new thread.
5567+
For [[cmd1 | cmd2]], four processes end up being involved:
5568+
the parent [[rc]], a child [[rc]] that interprets [[cmd1]]'s
5569+
bytecodes, and then each side fork+exec's the actual command
5570+
(e.g., [[ls]] and [[wc]]).
54435571
%
5444-
The left side ([[cmd1]]) runs in a forked child process.
5445-
The right side ([[cmd2]]) runs in the parent's new thread.
5572+
The left side ([[cmd1]]) runs in a forked child [[rc]] process.
5573+
The right side ([[cmd2]]) runs in the parent [[rc]]'s new thread.
54465574
This asymmetry means that the right side can modify
54475575
the shell's state (e.g., variable assignments), while
54485576
the left side cannot.
54495577
%
54505578
The compiled bytecodes contain two jump offsets: one for
54515579
the right side's code, and one for the parent's [<Xpipewait>].
54525580

5581+
%claude:
5582+
Here is the overall picture for [[ls | wc -l]]:
5583+
\begin{verbatim}
5584+
                   pipe(pfd)
5585+
                       |
5586+
                    fork()
5587+
                   /      \
5588+
      child rc                parent rc
5589+
   close(pfd[PRD])         close(pfd[PWR])
5590+
   redir: PWR->1           redir: PRD->0
5591+
         |                       |
5592+
    start thread            start thread
5593+
      for "ls"              for "wc -l"
5594+
         |                       |
5595+
      Xsimple:                Xsimple:
5596+
     fork+exec               fork+exec
5597+
         |                       |
5598+
  +------+------+         +------+------+
5599+
  | process "ls"|         | process "wc"|
5600+
  | stdout=pfd  |         | stdin=pfd   |
5601+
  +------+------+         +------+------+
5602+
         |                       |
5603+
    wait for ls             wait for wc
5604+
         |                       |
5605+
       Xexit                  Xreturn
5606+
  (child rc exits)         (thread done)
5607+
                                 |
5608+
                            Xpipewait:
5609+
                         wait for child rc
5610+
                          concat statuses
5611+
\end{verbatim}
5612+
%
5613+
The child runs [[cmd1]]'s bytecodes and ends with [[Xexit]]
5614+
(it must terminate, not return to the parent's thread stack).
5615+
The parent creates a new thread for [[cmd2]] that ends with
5616+
[[Xreturn]], then falls through to [<Xpipewait>] which waits
5617+
for the child process and concatenates both exit statuses
5618+
into a pipe status string like [["0|0"]].
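
As a rough illustration of the final concatenation step, the sketch below joins two stage statuses with [[|]]. The function [[join_status]] is a hypothetical name, and the use of a caller-supplied buffer is an assumption made for this example; it does not reproduce the code in [<Xpipewait>].

```c
#include <stdio.h>

/* Hypothetical helper: write "left|right" into buf (at most n bytes,
 * always NUL-terminated when n > 0). Returns the length the full
 * string would have, as snprintf does. */
int
join_status(char *buf, size_t n, const char *left, const char *right)
{
    return snprintf(buf, n, "%s|%s", left, right);
}
```

With the left stage first, a failure in [[cmd1]] shows up at the start of the string, as in [["error|0"]].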
5619+
54535620
% when do % ls | wc -l, what happens?
54545621
% first bootstrapped thread t1,
54555622
% then read, compile, and start new thread t2 for that (which when it will
@@ -5708,7 +5875,7 @@ The child's stdin is redirected to [[/dev/null]]
57085875
and the child is put in a new note group ([[RFNOTEG]])
57095876
so that interrupts sent to the shell do not kill it.
57105877
%
5711-
The parent records the child's PID in [[\$apid]]
5878+
The parent records the child's PID in [[$apid]]
57125879
(for ``asynchronous PID''), allowing the user to wait for
57135880
it later or send signals to it.
57145881

@@ -6190,7 +6357,7 @@ bool fnchanged;
61906357
% * Xfn(name){... Xreturn} define function
61916358
% * Xdelfn(name) delete function definition
61926359
%claude:
6193-
When defining a function with [[fn foo \{body\}]],
6360+
When defining a function with \verb+fn foo {body}+,
61946361
[<Xfn>] stores the current code vector (via [<codecopy>])
61956362
in the variable named [[foo]].
61966363
This is why bytecode vectors are reference-counted:
@@ -6295,11 +6462,11 @@ When [<Xsimple()>] detects that the command name matches
62956462
a function, it calls [<execfunc()>].
62966463
This function pops the command arguments from [[argv]],
62976464
starts a new thread with the function's bytecodes,
6298-
and binds the arguments to a local [[\$*]] variable.
6465+
and binds the arguments to a local [[$*]] variable.
62996466
%
63006467
This is how function arguments work in [[rc]]:
6301-
inside a function, [[\$*]] contains the arguments passed
6302-
to the function, shadowing the script's global [[\$*]].
6468+
inside a function, [[$*]] contains the arguments passed
6469+
to the function, shadowing the script's global [[$*]].
63036470

63046471
<<[[Xsimple()]] other locals>>=
63056472
var *v;
@@ -6461,8 +6628,8 @@ It looks up the variable name and pushes its value
64616628
(a list of words) onto the argument stack.
64626629
%
64636630
A subtle feature: if the variable name is a number
6464-
(e.g., [[\$1]], [[\$2]]), [<Xdol()>] treats it as a positional
6465-
parameter, indexing into [[\$*]] instead of looking up
6631+
(e.g., [[$1]], [[$2]]), [<Xdol()>] treats it as a positional
6632+
parameter, indexing into [[$*]] instead of looking up
64666633
a named variable.
64676634

64686635
<<[[outcode()]] cases>>=
@@ -6559,14 +6726,14 @@ Xcount(void)
65596726

65606727

65616728
%claude:
6562-
Subscripting ([[\$x(2)]] or [[\$x(2-5)]]) extracts elements
6729+
Subscripting ([[$x(2)]] or [[$x(2-5)]]) extracts elements
65636730
from a list variable by index or range.
65646731
[<subwords()>] does the heavy lifting: it parses each subscript
65656732
(which can be a single number or a [[n-m]] range, where a bare
65666733
[[n-]] means ``from~n to the end''), walks [[n-1]] links into
65676734
the variable's value list, and calls [<copynwords()>] to extract
65686735
the slice. The recursion on [[sub->next]] handles multiple
6569-
subscripts like [[\$x(1 3 5)]], accumulating results in reverse
6736+
subscripts like [[$x(1 3 5)]], accumulating results in reverse
65706737
so the final list comes out in the right order.
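
The subscript forms described above (a single number, an [[n-m]] range, or a bare [[n-]]) can be parsed with a short helper. This is a sketch under stated assumptions: [[parse_subscript]] is a hypothetical function, and rejecting out-of-range subscripts with an error is a choice made for this example, which may differ from what [<subwords()>] does.

```c
#include <stdlib.h>

/* Hypothetical helper: parse one subscript ("n", "n-m", or "n-")
 * against a list of len elements. Indices are 1-based. Stores the
 * inclusive range in *lo and *hi; returns 0 on success, -1 if the
 * subscript is malformed or out of range (an assumption of this
 * sketch, not necessarily rc's behavior). */
int
parse_subscript(const char *sub, int len, int *lo, int *hi)
{
    char *end;
    long n = strtol(sub, &end, 10);
    long m;

    if(end == sub || n < 1)
        return -1;
    if(*end == '\0')
        m = n;                 /* single index */
    else if(*end == '-' && end[1] == '\0')
        m = len;               /* "n-": through the end of the list */
    else if(*end == '-'){
        const char *p = end + 1;
        m = strtol(p, &end, 10);
        if(end == p || *end != '\0')
            return -1;
    }else
        return -1;
    if(m < n || m > len)
        return -1;
    *lo = (int)n;
    *hi = (int)m;
    return 0;
}
```

Given the resulting range, extracting the slice amounts to skipping [[lo-1]] links and copying [[hi-lo+1]] words, which matches the walk-then-copy description above.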
65716738

65726739
<<[[outcode()]] cases>>=
