@@ -14,16 +14,16 @@ A family of instructions has the following fundamental properties:
   it executes the non-adaptive instruction.
 * It has at least one specialized form of the instruction that is tailored
   for a particular value or set of values at runtime.
-* All members of the family have access to the same number of cache entries.
-  Individual family members do not need to use all of the entries.
+* All members of the family must have the same number of inline cache entries,
+  to ensure correct execution.
+  Individual family members do not need to use all of the entries,
+  but must skip over any unused entries when executing.
 
 The current implementation also requires the following,
 although these are not fundamental and may change:
 
-* If a family uses one or more entries, then the first entry must be a
-  `_PyAdaptiveEntry` entry.
-* If a family uses no cache entries, then the `oparg` is used as the
-  counter for the adaptive instruction.
+* All families use one or more inline cache entries;
+  the first entry is always the counter.
 * All instruction names should start with the name of the non-adaptive
   instruction.
 * The adaptive instruction should end in `_ADAPTIVE`.
@@ -76,6 +76,10 @@ keeping `Ti` low which means minimizing branches and dependent memory
 accesses (pointer chasing). These two objectives may be in conflict,
 requiring judgement and experimentation to design the family of instructions.
 
+The size of the inline cache should be as small as possible,
+without impairing performance, to reduce the number of
+`EXTENDED_ARG` jumps, and to reduce pressure on the CPU's data cache.
+
 ### Gathering data
 
 Before choosing how to specialize an instruction, it is important to gather
@@ -106,7 +110,7 @@ This can be tested quickly:
 * `globals->keys->dk_version == expected_version`
 
 and the operation can be performed quickly:
-* `value = globals->keys->entries[index].value`.
+* `value = entries[cache->index].me_value;`.
 
 Because it is impossible to measure the performance of an instruction without
 also measuring unrelated factors, the assessment of the quality of a
@@ -119,8 +123,7 @@ base instruction.
 
 In general, specialized instructions should be implemented in two parts:
 1. A sequence of guards, each of the form
-   `DEOPT_IF(guard-condition-is-false, BASE_NAME)`,
-   followed by a `record_cache_hit()`.
+   `DEOPT_IF(guard-condition-is-false, BASE_NAME)`.
 2. The operation, which should ideally have no branches and
    a minimum number of dependent memory accesses.
 
@@ -129,3 +132,11 @@ can be re-used in the operation.
 
 If there are branches in the operation, then consider further specialization
 to eliminate the branches.
+
+### Maintaining stats
+
+Finally, take care that stats are gathered correctly.
+After the last `DEOPT_IF` has passed, a hit should be recorded with
+`STAT_INC(BASE_INSTRUCTION, hit)`.
+After an optimization has been deferred in the `ADAPTIVE` form,
+that should be recorded with `STAT_INC(BASE_INSTRUCTION, deferred)`.
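
The pattern this change describes (guards, then a hit recorded after the last guard, then a branch-free operation, with deferrals counted in the adaptive form) can be sketched as a small standalone C model. Everything here is a hypothetical stand-in and not the CPython source: the `toy_*` types, the `toy_stat_table` array, and the simplified `DEOPT_IF` and `STAT_INC` macros are assumptions made for illustration. In particular, this toy `DEOPT_IF` returns `-1` to signal "fall back to the base instruction" rather than jumping to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration only; these are NOT the CPython definitions. */
typedef struct { const char *me_key; int me_value; } toy_entry;
typedef struct {
    uint32_t dk_version;
    size_t n_entries;
    const toy_entry *entries;
} toy_keys;
typedef struct {
    uint16_t counter;   /* first cache entry: the adaptive counter */
    uint16_t index;     /* cached position of the value */
    uint32_t version;   /* keys version seen when specializing */
} toy_cache;
typedef struct { unsigned long hit; unsigned long deferred; } toy_stats;

enum { LOAD_GLOBAL = 0, TOY_OPCODE_COUNT = 1 };
static toy_stats toy_stat_table[TOY_OPCODE_COUNT];

/* Simplified macros: count a stat, or bail out to the (imaginary) base form. */
#define STAT_INC(opname, name) (toy_stat_table[opname].name++)
#define DEOPT_IF(cond, opname) \
    do { (void)(opname); if (cond) { return -1; } } while (0)

/* Specialized form: returns 0 on success, -1 if a guard failed. */
static int load_global_specialized(const toy_keys *keys,
                                   const toy_cache *cache, int *value)
{
    /* Part 1: guards. */
    DEOPT_IF(keys->dk_version != cache->version, LOAD_GLOBAL);
    /* The last guard has passed, so record the hit here. */
    STAT_INC(LOAD_GLOBAL, hit);
    /* Part 2: the operation, with a single dependent load. */
    assert(cache->index < keys->n_entries);   /* debug-only sanity check */
    *value = keys->entries[cache->index].me_value;
    return 0;
}

/* Adaptive form: returns 1 when specialization should be attempted now,
 * 0 when it was deferred (and counted as such). */
static int load_global_adaptive(toy_cache *cache)
{
    if (cache->counter == 0) {
        return 1;
    }
    STAT_INC(LOAD_GLOBAL, deferred);
    cache->counter--;
    return 0;
}
```

Recording the hit only after the final guard keeps the hit count equal to the number of times the fast path actually ran; a guard failure in this sketch leaves the counter untouched.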