|
4 | 4 | #ifndef _IFS_H_
|
5 | 5 | #define _IFS_H_
|
6 | 6 |
|
| 7 | +/** |
| 8 | + * DOC: In-Field Scan |
| 9 | + * |
| 10 | + * ============= |
| 11 | + * In-Field Scan |
| 12 | + * ============= |
| 13 | + * |
| 14 | + * Introduction |
| 15 | + * ------------ |
| 16 | + * |
| 17 | + * In-Field Scan (IFS) is a hardware feature to run circuit-level tests on |
| 18 | + * a CPU core to detect problems that are not caught by parity or ECC checks. |
| 19 | + * Future CPUs will support more than one type of test, each of which will |
| 20 | + * show up with a new platform-device instance-id. For now only .0 is exposed. |
| 21 | + * |
| 22 | + * |
| 23 | + * IFS Image |
| 24 | + * --------- |
| 25 | + * |
| 26 | + * Intel provides a firmware file containing the scan tests via |
| 27 | + * GitHub [#f1]_. Similar to microcode, there is a separate file for each |
| 28 | + * family-model-stepping. |
| 29 | + * |
| 30 | + * IFS Image Loading |
| 31 | + * ----------------- |
| 32 | + * |
| 33 | + * The driver loads the tests into memory reserved by BIOS local to each CPU |
| 34 | + * socket in a two-step process, using writes to MSRs to first load the |
| 35 | + * SHA hashes for the tests and then the tests themselves. Status MSRs provide |
| 36 | + * feedback on the success/failure of these steps. When a new test file |
| 37 | + * is installed it can be loaded by writing to the driver reload file:: |
| 38 | + * |
| 39 | + * # echo 1 > /sys/devices/virtual/misc/intel_ifs_0/reload |
| 40 | + * |
| 41 | + * Similar to microcode, the current version of the scan tests is stored |
| 42 | + * in a fixed location: /lib/firmware/intel/ifs.0/family-model-stepping.scan |
| 43 | + * |
| 44 | + * Running tests |
| 45 | + * ------------- |
| 46 | + * |
| 47 | + * Tests are run by the driver synchronizing execution of all threads on a |
| 48 | + * core and then writing to the ACTIVATE_SCAN MSR on all threads. Instruction |
| 49 | + * execution continues when: |
| 50 | + * |
| 51 | + * 1) All tests have completed. |
| 52 | + * 2) Execution was interrupted. |
| 53 | + * 3) A test detected a problem. |
| 54 | + * |
| 55 | + * Note that ALL THREADS ON THE CORE ARE EFFECTIVELY OFFLINE FOR THE |
| 56 | + * DURATION OF THE TEST. This can be up to 200 milliseconds. If the system |
| 57 | + * is running latency sensitive applications that cannot tolerate an |
| 58 | + * interruption of this magnitude, the system administrator must arrange |
| 59 | + * to migrate those applications to other cores before running a core test. |
| 60 | + * It may also be necessary to redirect interrupts to other CPUs. |
| 61 | + * |
| 62 | + * In all cases reading the SCAN_STATUS MSR provides details on what |
| 63 | + * happened. The driver makes the value of this MSR visible to applications |
| 64 | + * via the "details" file (see below). Interrupted tests may be restarted. |
| 65 | + * |
| 66 | + * The IFS driver provides sysfs interfaces via /sys/devices/virtual/misc/intel_ifs_0/ |
| 67 | + * to control execution: |
| 68 | + * |
| 69 | + * Test a specific core:: |
| 70 | + * |
| 71 | + * # echo <cpu#> > /sys/devices/virtual/misc/intel_ifs_0/run_test |
| 72 | + * |
| 73 | + * When HT is enabled any of the sibling cpu# can be specified to test |
| 74 | + * its corresponding physical core. Since the tests are per physical core, |
| 75 | + * the result of testing any thread is the same. All siblings must be online |
| 76 | + * to run a core test. It is only necessary to test one thread. |
| 77 | + * |
| 78 | + * For example, to test the core corresponding to cpu5:: |
| 79 | + * |
| 80 | + * # echo 5 > /sys/devices/virtual/misc/intel_ifs_0/run_test |
| 81 | + * |
| 82 | + * Results of the last test are provided in /sys:: |
| 83 | + * |
| 84 | + * $ cat /sys/devices/virtual/misc/intel_ifs_0/status |
| 85 | + * pass |
| 86 | + * |
| 87 | + * Status can be one of pass, fail, or untested. |
| 88 | + * |
| 89 | + * Additional details of the last test are provided by the details file:: |
| 90 | + * |
| 91 | + * $ cat /sys/devices/virtual/misc/intel_ifs_0/details |
| 92 | + * 0x8081 |
| 93 | + * |
| 94 | + * The details file reports the hex value of the SCAN_STATUS MSR. |
| 95 | + * Hardware-defined error codes are documented in volume 4 of the Intel |
| 96 | + * Software Developer's Manual, but the error_code field may contain one of |
| 97 | + * the following driver-defined software codes: |
| 98 | + * |
| 99 | + * +------+--------------------+ |
| 100 | + * | 0xFD | Software timeout | |
| 101 | + * +------+--------------------+ |
| 102 | + * | 0xFE | Partial completion | |
| 103 | + * +------+--------------------+ |
| 104 | + * |
| 105 | + * Driver design choices |
| 106 | + * --------------------- |
| 107 | + * |
| 108 | + * 1) The ACTIVATE_SCAN MSR allows for running any consecutive subrange of |
| 109 | + * available tests, but the driver always tries to run all tests and only |
| 110 | + * uses the subrange feature to restart an interrupted test. |
| 111 | + * |
| 112 | + * 2) Hardware allows for some number of cores to be tested in parallel. |
| 113 | + * The driver does not make use of this; it tests only one core at a time. |
| 114 | + * |
| 115 | + * .. [#f1] https://github.com/intel/TBD |
| 116 | + */ |
7 | 117 | #include <linux/device.h>
|
8 | 118 | #include <linux/miscdevice.h>
|
9 | 119 |
|
|
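The sysfs workflow described under "Running tests" in the comment above (write a cpu number to run_test, then read status and details) could be driven from userspace roughly as follows. This is only a sketch: the file names and the pass/fail/untested strings come from the documentation, while the helper names (`ifs_parse_status`, `ifs_parse_details`, `ifs_run_core_test`), the `ifs_result` enum, and the `dir` parameter are hypothetical and not part of the driver. It also assumes the write to run_test returns only once the test has finished, which the documentation does not state explicitly.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Directory named in the documentation above. */
#define IFS_SYSFS_DIR "/sys/devices/virtual/misc/intel_ifs_0"

/* Hypothetical enum for this sketch; not defined by the driver. */
enum ifs_result { IFS_UNTESTED, IFS_PASS, IFS_FAIL, IFS_UNKNOWN };

/* Map the text of the "status" file to an enum, ignoring a trailing newline. */
static enum ifs_result ifs_parse_status(const char *text)
{
	size_t len = strcspn(text, "\n");

	if (len == 4 && strncmp(text, "pass", 4) == 0)
		return IFS_PASS;
	if (len == 4 && strncmp(text, "fail", 4) == 0)
		return IFS_FAIL;
	if (len == 8 && strncmp(text, "untested", 8) == 0)
		return IFS_UNTESTED;
	return IFS_UNKNOWN;
}

/* Parse the hex value shown by the "details" file (e.g. "0x8081"). */
static int ifs_parse_details(const char *text, unsigned long long *value)
{
	char *end;

	errno = 0;
	*value = strtoull(text, &end, 16);	/* base 16 accepts a "0x" prefix */
	return (errno != 0 || end == text) ? -1 : 0;
}

/* Read the first line of dir/name into buf. */
static int read_sysfs_line(const char *dir, const char *name,
			   char *buf, size_t len)
{
	char path[256];
	FILE *f;
	int ok;

	snprintf(path, sizeof(path), "%s/%s", dir, name);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fgets(buf, (int)len, f) != NULL;
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Request a test of the core that owns "cpu", then collect the result.
 * "dir" is normally IFS_SYSFS_DIR; it is a parameter so the sketch can be
 * pointed at a mock directory. Returns 0 on success, -1 on any I/O error.
 */
static int ifs_run_core_test(const char *dir, int cpu,
			     enum ifs_result *result,
			     unsigned long long *details)
{
	char path[256], buf[64];
	FILE *f;
	int ok;

	assert(dir && result && details);

	snprintf(path, sizeof(path), "%s/run_test", dir);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", cpu) > 0;
	/* The buffered write is pushed to the file when the stream closes. */
	if (fclose(f) != 0 || !ok)
		return -1;

	if (read_sysfs_line(dir, "status", buf, sizeof(buf)) != 0)
		return -1;
	*result = ifs_parse_status(buf);

	if (read_sysfs_line(dir, "details", buf, sizeof(buf)) != 0)
		return -1;
	return ifs_parse_details(buf, details);
}
```

A caller would invoke something like `ifs_run_core_test(IFS_SYSFS_DIR, 5, &result, &details)` as root, with all sibling threads of that core online. The sketch deliberately does not decode individual fields of the details value, since the bit layout of SCAN_STATUS is not given in the comment above.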