@@ -347,270 +347,6 @@ However, only the value of register ``r1`` is important to successfully finish
347
347
verification. The goal of the liveness tracking algorithm is to spot this fact
348
348
and figure out that both states are actually equivalent.
349
349
350
- Data structures
351
- ~~~~~~~~~~~~~~~
352
-
353
- Liveness is tracked using the following data structures::
354
-
355
- enum bpf_reg_liveness {
356
- REG_LIVE_NONE = 0,
357
- REG_LIVE_READ32 = 0x1,
358
- REG_LIVE_READ64 = 0x2,
359
- REG_LIVE_READ = REG_LIVE_READ32 | REG_LIVE_READ64,
360
- REG_LIVE_WRITTEN = 0x4,
361
- REG_LIVE_DONE = 0x8,
362
- };
363
-
364
- struct bpf_reg_state {
365
- ...
366
- struct bpf_reg_state *parent;
367
- ...
368
- enum bpf_reg_liveness live;
369
- ...
370
- };
371
-
372
- struct bpf_stack_state {
373
- struct bpf_reg_state spilled_ptr;
374
- ...
375
- };
376
-
377
- struct bpf_func_state {
378
- struct bpf_reg_state regs[MAX_BPF_REG];
379
- ...
380
- struct bpf_stack_state *stack;
381
- }
382
-
383
- struct bpf_verifier_state {
384
- struct bpf_func_state *frame[MAX_CALL_FRAMES];
385
- struct bpf_verifier_state *parent;
386
- ...
387
- }
388
-
389
- * ``REG_LIVE_NONE `` is an initial value assigned to ``->live `` fields upon new
390
- verifier state creation;
391
-
392
- * ``REG_LIVE_WRITTEN `` means that the value of the register (or stack slot) is
393
- defined by some instruction verified between this verifier state's parent and
394
- verifier state itself;
395
-
396
- * ``REG_LIVE_READ{32,64} `` means that the value of the register (or stack slot)
397
- is read by a some child state of this verifier state;
398
-
399
- * ``REG_LIVE_DONE `` is a marker used by ``clean_verifier_state() `` to avoid
400
- processing same verifier state multiple times and for some sanity checks;
401
-
402
- * ``->live `` field values are formed by combining ``enum bpf_reg_liveness ``
403
- values using bitwise or.
404
-
405
- Register parentage chains
406
- ~~~~~~~~~~~~~~~~~~~~~~~~~
407
-
408
- In order to propagate information between parent and child states, a *register
409
- parentage chain * is established. Each register or stack slot is linked to a
410
- corresponding register or stack slot in its parent state via a ``->parent ``
411
- pointer. This link is established upon state creation in ``is_state_visited() ``
412
- and might be modified by ``set_callee_state() `` called from
413
- ``__check_func_call() ``.
414
-
415
- The rules for correspondence between registers / stack slots are as follows:
416
-
417
- * For the current stack frame, registers and stack slots of the new state are
418
- linked to the registers and stack slots of the parent state with the same
419
- indices.
420
-
421
- * For the outer stack frames, only callee saved registers (r6-r9) and stack
422
- slots are linked to the registers and stack slots of the parent state with the
423
- same indices.
424
-
425
- * When function call is processed a new ``struct bpf_func_state `` instance is
426
- allocated, it encapsulates a new set of registers and stack slots. For this
427
- new frame, parent links for r6-r9 and stack slots are set to nil, parent links
428
- for r1-r5 are set to match caller r1-r5 parent links.
429
-
430
- This could be illustrated by the following diagram (arrows stand for
431
- ``->parent `` pointers)::
432
-
433
- ... ; Frame #0, some instructions
434
- --- checkpoint #0 ---
435
- 1 : r6 = 42 ; Frame #0
436
- --- checkpoint #1 ---
437
- 2 : call foo() ; Frame #0
438
- ... ; Frame #1, instructions from foo()
439
- --- checkpoint #2 ---
440
- ... ; Frame #1, instructions from foo()
441
- --- checkpoint #3 ---
442
- exit ; Frame #1, return from foo()
443
- 3 : r1 = r6 ; Frame #0 <- current state
444
-
445
- +-------------------------------+-------------------------------+
446
- | Frame #0 | Frame #1 |
447
- Checkpoint +-------------------------------+-------------------------------+
448
- #0 | r0 | r1-r5 | r6-r9 | fp-8 ... |
449
- +-------------------------------+
450
- ^ ^ ^ ^
451
- | | | |
452
- Checkpoint +-------------------------------+
453
- #1 | r0 | r1-r5 | r6-r9 | fp-8 ... |
454
- +-------------------------------+
455
- ^ ^ ^
456
- |_______|_______|_______________
457
- | | |
458
- nil nil | | | nil nil
459
- | | | | | | |
460
- Checkpoint +-------------------------------+-------------------------------+
461
- #2 | r0 | r1-r5 | r6-r9 | fp-8 ... | r0 | r1-r5 | r6-r9 | fp-8 ... |
462
- +-------------------------------+-------------------------------+
463
- ^ ^ ^ ^ ^
464
- nil nil | | | | |
465
- | | | | | | |
466
- Checkpoint +-------------------------------+-------------------------------+
467
- #3 | r0 | r1-r5 | r6-r9 | fp-8 ... | r0 | r1-r5 | r6-r9 | fp-8 ... |
468
- +-------------------------------+-------------------------------+
469
- ^ ^
470
- nil nil | |
471
- | | | |
472
- Current +-------------------------------+
473
- state | r0 | r1-r5 | r6-r9 | fp-8 ... |
474
- +-------------------------------+
475
- \
476
- r6 read mark is propagated via these links
477
- all the way up to checkpoint #1.
478
- The checkpoint #1 contains a write mark for r6
479
- because of instruction (1), thus read propagation
480
- does not reach checkpoint #0 (see section below).
481
-
482
- Liveness marks tracking
483
- ~~~~~~~~~~~~~~~~~~~~~~~
484
-
485
- For each processed instruction, the verifier tracks read and written registers
486
- and stack slots. The main idea of the algorithm is that read marks propagate
487
- back along the state parentage chain until they hit a write mark, which 'screens
488
- off' earlier states from the read. The information about reads is propagated by
489
- function ``mark_reg_read() `` which could be summarized as follows::
490
-
491
- mark_reg_read(struct bpf_reg_state *state, ...):
492
- parent = state->parent
493
- while parent:
494
- if state->live & REG_LIVE_WRITTEN:
495
- break
496
- if parent->live & REG_LIVE_READ64:
497
- break
498
- parent->live |= REG_LIVE_READ64
499
- state = parent
500
- parent = state->parent
501
-
502
- Notes:
503
-
504
- * The read marks are applied to the **parent ** state while write marks are
505
- applied to the **current ** state. The write mark on a register or stack slot
506
- means that it is updated by some instruction in the straight-line code leading
507
- from the parent state to the current state.
508
-
509
- * Details about REG_LIVE_READ32 are omitted.
510
-
511
- * Function ``propagate_liveness() `` (see section :ref: `read_marks_for_cache_hits `)
512
- might override the first parent link. Please refer to the comments in the
513
- ``propagate_liveness() `` and ``mark_reg_read() `` source code for further
514
- details.
515
-
516
- Because stack writes could have different sizes ``REG_LIVE_WRITTEN `` marks are
517
- applied conservatively: stack slots are marked as written only if write size
518
- corresponds to the size of the register, e.g. see function ``save_register_state() ``.
519
-
520
- Consider the following example::
521
-
522
- 0: (*u64)(r10 - 8) = 0 ; define 8 bytes of fp-8
523
- --- checkpoint #0 ---
524
- 1: (*u32)(r10 - 8) = 1 ; redefine lower 4 bytes
525
- 2: r1 = (*u32)(r10 - 8) ; read lower 4 bytes defined at (1)
526
- 3: r2 = (*u32)(r10 - 4) ; read upper 4 bytes defined at (0)
527
-
528
- As stated above, the write at (1) does not count as ``REG_LIVE_WRITTEN ``. Should
529
- it be otherwise, the algorithm above wouldn't be able to propagate the read mark
530
- from (3) to checkpoint #0.
531
-
532
- Once the ``BPF_EXIT `` instruction is reached ``update_branch_counts() `` is
533
- called to update the ``->branches `` counter for each verifier state in a chain
534
- of parent verifier states. When the ``->branches `` counter reaches zero the
535
- verifier state becomes a valid entry in a set of cached verifier states.
536
-
537
- Each entry of the verifier states cache is post-processed by a function
538
- ``clean_live_states() ``. This function marks all registers and stack slots
539
- without ``REG_LIVE_READ{32,64} `` marks as ``NOT_INIT `` or ``STACK_INVALID ``.
540
- Registers/stack slots marked in this way are ignored in function ``stacksafe() ``
541
- called from ``states_equal() `` when a state cache entry is considered for
542
- equivalence with a current state.
543
-
544
- Now it is possible to explain how the example from the beginning of the section
545
- works::
546
-
547
- 0: call bpf_get_prandom_u32()
548
- 1: r1 = 0
549
- 2: if r0 == 0 goto +1
550
- 3: r0 = 1
551
- --- checkpoint[0] ---
552
- 4: r0 = r1
553
- 5: exit
554
-
555
- * At instruction #2 branching point is reached and state ``{ r0 == 0, r1 == 0, pc == 4 } ``
556
- is pushed to states processing queue (pc stands for program counter).
557
-
558
- * At instruction #4:
559
-
560
- * ``checkpoint[0] `` states cache entry is created: ``{ r0 == 1, r1 == 0, pc == 4 } ``;
561
- * ``checkpoint[0].r0 `` is marked as written;
562
- * ``checkpoint[0].r1 `` is marked as read;
563
-
564
- * At instruction #5 exit is reached and ``checkpoint[0] `` can now be processed
565
- by ``clean_live_states() ``. After this processing ``checkpoint[0].r1 `` has a
566
- read mark and all other registers and stack slots are marked as ``NOT_INIT ``
567
- or ``STACK_INVALID ``
568
-
569
- * The state ``{ r0 == 0, r1 == 0, pc == 4 } `` is popped from the states queue
570
- and is compared against a cached state ``{ r1 == 0, pc == 4 } ``, the states
571
- are considered equivalent.
572
-
573
- .. _read_marks_for_cache_hits :
574
-
575
- Read marks propagation for cache hits
576
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
577
-
578
- Another point is the handling of read marks when a previously verified state is
579
- found in the states cache. Upon cache hit verifier must behave in the same way
580
- as if the current state was verified to the program exit. This means that all
581
- read marks, present on registers and stack slots of the cached state, must be
582
- propagated over the parentage chain of the current state. Example below shows
583
- why this is important. Function ``propagate_liveness() `` handles this case.
584
-
585
- Consider the following state parentage chain (S is a starting state, A-E are
586
- derived states, -> arrows show which state is derived from which)::
587
-
588
- r1 read
589
- <------------- A[r1] == 0
590
- C[r1] == 0
591
- S ---> A ---> B ---> exit E[r1] == 1
592
- |
593
- ` ---> C ---> D
594
- |
595
- ` ---> E ^
596
- |___ suppose all these
597
- ^ states are at insn #Y
598
- |
599
- suppose all these
600
- states are at insn #X
601
-
602
- * Chain of states ``S -> A -> B -> exit `` is verified first.
603
-
604
- * While ``B -> exit `` is verified, register ``r1 `` is read and this read mark is
605
- propagated up to state ``A ``.
606
-
607
- * When chain of states ``C -> D `` is verified the state ``D `` turns out to be
608
- equivalent to state ``B ``.
609
-
610
- * The read mark for ``r1 `` has to be propagated to state ``C ``, otherwise state
611
- ``C `` might get mistakenly marked as equivalent to state ``E `` even though
612
- values for register ``r1 `` differ between ``C `` and ``E ``.
613
-
614
350
Understanding eBPF verifier messages
615
351
====================================
616
352
0 commit comments