@@ -4351,12 +4351,14 @@ if(p==nil)
43514351% was $$ in bash
43524352%claude:
43534353The exit status of the last command is stored in the
4354- [[\ $status]] variable. [<setstatus()>] sets it, and
4354+ [[$status]] variable. [<setstatus()>] sets it, and
43554355[<getstatus()>] and [<truestatus()>] read it.
43564356%
43574357In [[rc]], a status is a string (not an integer like in the
43584358Bourne shell). An empty string or the string [["0"]] means success;
43594359anything else means failure.
4360+ This mirrors the \plan system call [[exits()]] (see \book{Kernel}),
4361+ which takes a string argument rather than a numeric exit code.
43604362
43614363<<function [[setstatus]]>>=
43624364void
@@ -4375,6 +4377,44 @@ getstatus(void)
43754377}
43764378@
43774379
4380+ %claude:
4381+ The [[|]] check in [<truestatus()>] below deserves explanation.
4382+ When a pipeline like [[ls | wc]] finishes, [[rc]] combines the
4383+ exit statuses of all commands into a single string separated
4384+ by [[|]], for example [["0|0"]] if both succeed or
4385+ [["error|0"]] if the first fails.
4386+ So [<truestatus()>] considers a status ``true'' if it contains
4387+ only [[0]] and [[|]] characters---meaning every stage of the
4388+ pipeline succeeded.
4389+ %
4390+ Why not reduce the pipeline status to a single value earlier?
4391+ Because keeping the full string lets the user inspect [[$status]]
4392+ and see exactly which stage failed (e.g., [["0|error|0"]]
4393+ tells you the middle command had a problem).
4394+ With numeric exit codes (as in \unix), reducing is easy: just
4395+ take the maximum or the last non-zero value.
4396+ But with string statuses, there is no natural way to merge
4397+ [["can't open"]] and [["0"]] into a single meaningful string,
4398+ so [[rc]] preserves them all.
4399+ In [[bash]], [[$?]] only reports the exit status of the
4400+ {\em last} command in the pipeline---a failure in an earlier
4401+ stage is silently lost.
4402+ [[bash]] later added the [[PIPESTATUS]] array to work around
4403+ this, but it requires explicit use.
4404+
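The character test described above can be sketched in isolation.
This is a minimal illustration, not the chunk from [[rc]] itself: the name
[[status_is_true]] is hypothetical, and the status is passed as a parameter
rather than read from the shell's internal state.

```c
#include <stdbool.h>

/* Hypothetical stand-in for truestatus(): the status string is a
 * parameter here instead of rc's internal status variable. */
bool
status_is_true(const char *s)
{
    /* Any character other than '0' or '|' marks a failed stage. */
    for(; *s != '\0'; s++)
        if(*s != '|' && *s != '0')
            return false;
    return true;    /* also covers the empty string */
}
```

Under this sketch, [[""]], [["0"]], and [["0|0"]] count as success, while
[["error|0"]] and [["0|error|0"]] count as failure.
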
4405+ %claude:
4406+ This is another example of [[rc]] getting the design right
4407+ from the start, while [[bash]] accumulates workarounds:
4408+ pipeline statuses are visible by default in [[$status]]
4409+ (vs.\ [[PIPESTATUS]] added later in [[bash]]),
4410+ all variables are automatically exported via [[/env/]]
4411+ (vs.\ the error-prone [[export]] command),
4412+ [[if not]] avoids the parsing ambiguity of [[else]]
4413+ (vs.\ the heavyweight [[if...then...fi]] syntax),
4414+ and string exit statuses naturally carry diagnostic information
4415+ (vs.\ opaque numeric codes that require a manual page to
4416+ decode).
4417+
43784418<<function [[truestatus]]>>=
43794419bool
43804420truestatus(void)
@@ -4465,6 +4505,12 @@ havewaitpid(int pid)
44654505% or builtins or functions
44664506\section{Simple commands}
44674507
4508+ %claude:
4509+ Now that we have seen the helper infrastructure---argument
4510+ stack management, process status, and subprocess tracking---we
4511+ can describe the bytecode generation and interpretation
4512+ for simple commands, which is the core of the shell.
4513+
44684514\subsection{Bytecode generation}
44694515
44704516% SIMPLE node built in simplemung()
@@ -4527,6 +4573,26 @@ else if(p->f==Xword || p->f==Xdelhere)
45274573 efree((++p)->s);
45284574@
45294575
4576+ %yoann: ASCII diagram for a simple command where we can see the AST with ARGLIST, WORDS, etc. on the left and the corresponding generated code on the right
4577+ %claude:
4578+ For example, [[echo hello world]] produces the following
4579+ AST and bytecodes:
4580+ \begin{verbatim}
4581+ AST: codebuf:
4582+
4583+ SIMPLE Xmark
4584+ | Xword "world"
4585+ ARGLIST Xword "hello"
4586+ / \ Xword "echo"
4587+ ARGLIST "world" Xsimple
4588+ / \
4589+ "echo" "hello"
4590+ \end{verbatim}
4591+ %
4592+ Note how the arguments are emitted in reverse order
4593+ ([[ARGLIST]] emits [[c1]] before [[c0]]), so that after
4594+ [<Xword>] prepends each word to the list, [[argv]] ends
4595+ up in the correct order: [[echo]], [[hello]], [[world]].
45304596
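The effect of reverse emission combined with prepending can be checked with
a small stand-alone sketch. The [[Word]] type and the [[prepend]] and
[[build_argv]] helpers below are hypothetical simplifications, not the
structures used by [[rc]].

```c
#include <stdlib.h>

typedef struct Word Word;
struct Word {
    const char *s;
    Word *next;
};

/* Prepend a word to the list, as the text says Xword does. */
Word*
prepend(const char *s, Word *list)
{
    Word *w = malloc(sizeof *w);
    if(w == NULL)
        abort();
    w->s = s;
    w->next = list;
    return w;
}

/* Simulate the sequence Xword "world", Xword "hello", Xword "echo". */
Word*
build_argv(void)
{
    Word *argv = NULL;
    argv = prepend("world", argv);
    argv = prepend("hello", argv);
    argv = prepend("echo", argv);
    return argv;
}
```

Walking the list returned by [[build_argv]] from the head visits
[[echo]], [[hello]], [[world]], in that order.
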
45314597\subsection{[[Xsimple()]]}
45324598
@@ -4746,7 +4812,7 @@ When the command name starts with [[/]], [[./]], [[../]],
47464812or [[#]] (device path in \plan), it is treated as an
47474813absolute path and no search is performed.
47484814Otherwise, [<searchpath()>] returns the list of directories
4749- in [[\ $path]] to try.
4815+ in [[$path]] to try.
47504816%
47514817[<Execute()>] then iterates over these directories,
47524818prepending each to the command name and calling [[exec()]].
@@ -4898,6 +4964,18 @@ Waitfor(int pid, bool _persist)
48984964
48994965% it is an optimisation then? mv to advanced topics?
49004966% when have exit in codebuf?
4967+ %claude:
4968+ If the next bytecode after [<Xsimple>] is [<Xexit>]
4969+ (possibly preceded by [<Xpopredir>] calls), the shell
4970+ is about to exit anyway, so forking a child and waiting
4971+ for it is pointless. In that case, [<Xsimple>] calls
4972+ [<execexec()>] directly, replacing the current process
4973+ with the command. This is the same optimization that
4974+ the [[exec]] builtin provides, applied automatically.
4975+ This matters in practice for scripts like [[rc -c 'ls']]:
4976+ without this optimization, [[rc]] would fork, exec [[ls]]
4977+ in the child, wait, and then exit---wasting a process.
4978+
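The lookahead described above can be sketched as follows. This is only an
illustration under simplifying assumptions: the [[Op]] enumeration and the
[[exit_is_next]] function are hypothetical, and the real bytecode stream
holds function pointers and operands rather than plain tags.

```c
#include <stdbool.h>

/* Hypothetical opcode tags standing in for rc's bytecodes. */
enum Op { OP_SIMPLE, OP_POPREDIR, OP_EXIT, OP_OTHER };

/* Return true if, starting at pc, only pop-redirection opcodes
 * separate us from an exit opcode. */
bool
exit_is_next(const enum Op *code, int pc, int len)
{
    while(pc < len && code[pc] == OP_POPREDIR)
        pc++;
    return pc < len && code[pc] == OP_EXIT;
}
```

When such a test succeeds, nothing but cleanup remains after the command,
which is what makes replacing the shell process safe.
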
49014979<<[[Xsimple()]] if exitnext()>>=
49024980if(exitnext()){
49034981 /* fork and wait is redundant */
@@ -4945,7 +5023,7 @@ case ';':
49455023
49465024%claude:
49475025[[&&]] and [[||]] use a forward jump technique.
4948- For [[cmd1 \&\& cmd2]], the compiler emits:
5026+ For [[cmd1 && cmd2]], the compiler emits:
49495027the bytecodes of [[cmd1]], then [<Xtrue>] followed by a
49505028placeholder jump offset, then the bytecodes of [[cmd2]].
49515029After emitting [[cmd2]], [<stuffdot()>] patches the placeholder
@@ -5037,7 +5115,7 @@ Xbang(void)
50375115\subsection{String matching}
50385116
50395117%claude:
5040- The [[~]] operator pushes two word lists onto [[argv]]
5118+ The \verb+~+ operator pushes two word lists onto [[argv]]
50415119(the subject and the patterns), then calls [<Xmatch>].
50425120Two [<Xmark>] bytecodes are needed because both the subject
50435121and the patterns can be multi-word (e.g., if they come from
@@ -5159,6 +5237,20 @@ p->redir = p->startredir = runq ? runq->redir : nil;
51595237
51605238\subsubsection{[[Redir]]}
51615239
5240+ %claude:
5241+ The [[Redir]] structure records a pending redirection: a [[type]]
5242+ (e.g., [[ROPEN]] for a regular file redirection), a [[from]] file
5243+ descriptor (the one that was opened), and a [[to]] file descriptor
5244+ (the one that should be replaced, such as 1 for stdout).
5245+ The thread's [[redir]] field points to a linked list of these records,
5246+ built up as the interpreter encounters redirection bytecodes.
5247+ %
5248+ The [[startredir]] field in [<start()>] deserves attention:
5249+ when a new thread is created, it inherits its parent's redirection
5250+ stack (so subshells see the parent's redirections), but
5251+ [[startredir]] remembers where the inherited portion ends.
5252+ This way, if an error occurs, the thread can pop only the redirections
5253+ it added, without disturbing the parent's state.
51625254<<struct [[Redir]]>>=
51635255struct Redir {
51645256 // enum<redirection_kind_bis>
@@ -5202,6 +5294,16 @@ pushredir(int type, int from, int to)
52025294
52035295\subsubsection{[[doredir()]]}
52045296
5297+ %claude:
5298+ Because [<pushredir()>] prepends each new record to the front of the
5299+ list, the list is in reverse order relative to the command line.
5300+ For [[cmd > foo >[2=1]]], the list has [[>[2=1]]] first, then [[> foo]].
5301+ But the user expects the redirections to be applied left-to-right:
5302+ first redirect stdout to [[foo]], then duplicate fd~2 from fd~1.
5303+ %
5304+ [<doredir()>] solves this by recursing to the end of the list
5305+ before performing any [[dup()]] calls, so the deepest (oldest)
5306+ redirection executes first---restoring the original left-to-right order.
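
The recurse-then-act pattern can be sketched on its own. The [[Redir]]
layout below is a hypothetical simplification (an [[id]] label instead of
[[type]], [[from]], and [[to]]), and ``applying'' a record only logs its
label where the real code would call [[dup()]].

```c
#include <stddef.h>

typedef struct Redir Redir;
struct Redir {
    int id;         /* hypothetical label: position on the command line */
    Redir *next;    /* next (older) record */
};

/* Visit records oldest first by recursing to the tail before acting.
 * Returns the number of entries written to log. */
int
apply_oldest_first(Redir *r, int *log, int n)
{
    if(r == NULL)
        return n;
    n = apply_oldest_first(r->next, log, n);
    log[n] = r->id;
    return n + 1;
}
```

With the newest record at the head of the list, the log still comes out in
command-line order.
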
52055307% execexec -> <>
52065308<<function [[doredir]]>>=
52075309void
@@ -5242,6 +5344,16 @@ case ROPEN:
52425344
52435345\subsubsection{Bytecode generation}
52445346
5347+ %claude:
5348+ The bytecode for a [[REDIR]] node first evaluates the filename
5349+ (with globbing), then emits the appropriate redirection bytecode
5350+ (e.g., [<Xwrite>] for [[>]]), followed by the file descriptor number.
5351+ After the redirected command ([[c1]]) finishes, [<Xpopredir>] cleans
5352+ up by closing the opened file descriptor and removing the [[Redir]]
5353+ record from the stack.
5354+ This scoping ensures that in [[cmd1 > foo; cmd2]],
5355+ the redirection applies only to [[cmd1]]---once [[Xpopredir]]
5356+ runs, [[cmd2]] sees the original file descriptors.
52455357<<[[outcode()]] cases>>=
52465358case REDIR:
52475359 emitf(Xmark);
@@ -5265,6 +5377,13 @@ case REDIR:
52655377
52665378\subsubsection{[[Xwrite()]]}
52675379
5380+ %claude:
5381+ [<Xwrite()>] handles the [[>]] redirection at runtime.
5382+ It pops the filename from the argv stack, opens (or creates) the file
5383+ with [<Creat()>], and pushes an [[ROPEN]] record mapping the newly
5384+ opened file descriptor to the target fd (typically~1 for stdout,
5385+ read from the bytecode stream).
5386+ The actual [[dup()]] will happen later in [<doredir()>].
52685387<<[[outcode()]] when REDIR case, switch redirection type cases>>=
52695388case WRITE:
52705389 emitf(Xwrite);
@@ -5300,6 +5419,14 @@ Xwrite(void)
53005419@
53015420
53025421
5422+ %claude:
5423+ [<Xpopredir()>] is the cleanup counterpart: it pops the top [[Redir]]
5424+ record and, if it was an [[ROPEN]], closes the file descriptor
5425+ that was opened for the redirection.
5426+ This is important because the opened fd (from [[Creat()]] or [[Open()]])
5427+ is a temporary---after [[doredir()]] has [[dup()]]'d it onto the
5428+ target fd in the child process, the parent still holds the original,
5429+ and it must be closed to avoid leaking file descriptors.
53035430<<function [[Xpopredir]]>>=
53045431void
53055432Xpopredir(void)
@@ -5437,19 +5564,59 @@ turfredir();
54375564%\subsubsection{Trace of a pipe}
54385565%claude:
54395566Pipes are arguably the most important operator in a shell.
5440- For [[cmd1 | cmd2]], three processes are involved:
5441- the parent (the [[rc]] interpreter), a child running [[cmd1]],
5442- and the parent running [[cmd2]] in a new thread.
5567+ For [[cmd1 | cmd2]], four processes end up being involved:
5568+ the parent [[rc]], a child [[rc]] that interprets [[cmd1]]'s
5569+ bytecodes, and then each side fork+exec's the actual command
5570+ (e.g., [[ls]] and [[wc]]).
54435571%
5444- The left side ([[cmd1]]) runs in a forked child process.
5445- The right side ([[cmd2]]) runs in the parent's new thread.
5572+ The left side ([[cmd1]]) runs in a forked child [[rc]] process.
5573+ The right side ([[cmd2]]) runs in the parent [[rc]]'s new thread.
54465574This asymmetry means that the right side can modify
54475575the shell's state (e.g., variable assignments), while
54485576the left side cannot.
54495577%
54505578The compiled bytecodes contain two jump offsets: one for
54515579the right side's code, and one for the parent's [<Xpipewait>].
54525580
5581+ %claude:
5582+ Here is the overall picture for [[ls | wc -l]]:
5583+ \begin{verbatim}
5584+ pipe(pfd)
5585+ |
5586+ fork()
5587+ / \
5588+ child rc parent rc
5589+ close(pfd[PRD]) close(pfd[PWR])
5590+ redir: PWR->1 redir: PRD->0
5591+ | |
5592+ start thread start thread
5593+ for "ls" for "wc -l"
5594+ | |
5595+ Xsimple: Xsimple:
5596+ fork+exec fork+exec
5597+ | |
5598+ +------+------+ +------+------+
5599+ | process "ls"| | process "wc"|
5600+ | stdout=pfd | | stdin=pfd |
5601+ +------+------+ +------+------+
5602+ | |
5603+ wait for ls wait for wc
5604+ | |
5605+ Xexit Xreturn
5606+ (child rc exits) (thread done)
5607+ |
5608+ Xpipewait:
5609+ wait for child rc
5610+ concat statuses
5611+ \end{verbatim}
5612+ %
5613+ The child runs [[cmd1]]'s bytecodes and ends with [[Xexit]]
5614+ (it must terminate, not return to the parent's thread stack).
5615+ The parent creates a new thread for [[cmd2]] that ends with
5616+ [[Xreturn]], then falls through to [<Xpipewait>] which waits
5617+ for the child process and concatenates both exit statuses
5618+ into a pipe status string like [["0|0"]].
5619+
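The concatenation step can be illustrated with a small helper. The name
[[join_status]] and its buffer-based interface are assumptions made for this
sketch; the actual code manages status strings in its own way.

```c
#include <stdio.h>

/* Hypothetical helper: write "left|right" into dst.
 * Returns 1 on success, 0 if the result did not fit. */
int
join_status(char *dst, size_t dstlen, const char *left, const char *right)
{
    int n = snprintf(dst, dstlen, "%s|%s", left, right);
    return n >= 0 && (size_t)n < dstlen;
}
```

Joining [["0"]] and [["0"]] yields [["0|0"]], and joining [["error"]] and
[["0"]] yields [["error|0"]], matching the examples above.
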
54535620% when do % ls | wc -l, what happens?
54545621% first bootstrapped thread t1,
54555622% then read, compile, and start new thread t2 for that (which when it will
@@ -5708,7 +5875,7 @@ The child's stdin is redirected to [[/dev/null]]
57085875and the child is put in a new note group ([[RFNOTEG]])
57095876so that interrupts sent to the shell do not kill it.
57105877%
5711- The parent records the child's PID in [[\ $apid]]
5878+ The parent records the child's PID in [[$apid]]
57125879(for ``asynchronous PID''), allowing the user to wait for
57135880it later or send signals to it.
57145881
@@ -6190,7 +6357,7 @@ bool fnchanged;
61906357% * Xfn(name){... Xreturn} define function
61916358% * Xdelfn(name) delete function definition
61926359%claude:
6193- When defining a function with [[ fn foo \ {body\}]] ,
6360+ When defining a function with \verb+fn foo {body}+,
61946361[<Xfn>] stores the current code vector (via [<codecopy>])
61956362in the variable named [[foo]].
61966363This is why bytecode vectors are reference-counted:
@@ -6295,11 +6462,11 @@ When [<Xsimple()>] detects that the command name matches
62956462a function, it calls [<execfunc()>].
62966463This function pops the command arguments from [[argv]],
62976464starts a new thread with the function's bytecodes,
6298- and binds the arguments to a local [[\ $*]] variable.
6465+ and binds the arguments to a local [[$*]] variable.
62996466%
63006467This is how function arguments work in [[rc]]:
6301- inside a function, [[\ $*]] contains the arguments passed
6302- to the function, shadowing the script's global [[\ $*]].
6468+ inside a function, [[$*]] contains the arguments passed
6469+ to the function, shadowing the script's global [[$*]].
63036470
63046471<<[[Xsimple()]] other locals>>=
63056472var *v;
@@ -6461,8 +6628,8 @@ It looks up the variable name and pushes its value
64616628(a list of words) onto the argument stack.
64626629%
64636630A subtle feature: if the variable name is a number
6464- (e.g., [[\ $1]], [[\ $2]]), [<Xdol()>] treats it as a positional
6465- parameter, indexing into [[\ $*]] instead of looking up
6631+ (e.g., [[$1]], [[$2]]), [<Xdol()>] treats it as a positional
6632+ parameter, indexing into [[$*]] instead of looking up
64666633a named variable.
64676634
64686635<<[[outcode()]] cases>>=
@@ -6559,14 +6726,14 @@ Xcount(void)
65596726
65606727
65616728%claude:
6562- Subscripting ([[\ $x(2)]] or [[\ $x(2-5)]]) extracts elements
6729+ Subscripting ([[$x(2)]] or [[$x(2-5)]]) extracts elements
65636730from a list variable by index or range.
65646731[<subwords()>] does the heavy lifting: it parses each subscript
65656732(which can be a single number or a [[n-m]] range, where a bare
65666733[[n-]] means ``from~n to the end''), walks [[n-1]] links into
65676734the variable's value list, and calls [<copynwords()>] to extract
65686735the slice. The recursion on [[sub->next]] handles multiple
6569- subscripts like [[\ $x(1 3 5)]], accumulating results in reverse
6736+ subscripts like [[$x(1 3 5)]], accumulating results in reverse
65706737so the final list comes out in the right order.
65716738
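The subscript forms described above ([[n]], [[n-m]], and [[n-]]) can be
parsed along the following lines. This is a hedged sketch, not the code of
[[subwords()]]: the function name, the convention that [[hi == 0]] means
``to the end'', and the rejection of reversed ranges are all assumptions of
this example.

```c
#include <stdlib.h>

/* Parse "n", "n-m" or "n-" into 1-based bounds.
 * On success returns 1; *hi == 0 means "to the end of the list". */
int
parse_subscript(const char *s, long *lo, long *hi)
{
    char *end;

    *lo = strtol(s, &end, 10);
    if(end == s || *lo < 1)
        return 0;
    if(*end == '\0'){       /* single index */
        *hi = *lo;
        return 1;
    }
    if(*end != '-')
        return 0;
    s = end + 1;
    if(*s == '\0'){         /* open range "n-" */
        *hi = 0;
        return 1;
    }
    *hi = strtol(s, &end, 10);
    if(end == s || *end != '\0' || *hi < *lo)
        return 0;           /* assumption: reversed ranges are rejected */
    return 1;
}
```

A caller would then skip [[lo-1]] elements of the value list and copy
elements up to [[hi]], or to the end when [[hi]] is zero.
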
65726739<<[[outcode()]] cases>>=