docs(dedibox-hardware): add missing content (#4422)

nerda-codes · bene2k1 · commit d68c7b87bf6b · 2025-03-12T10:53:57.000+01:00
diff --git a/pages/dedibox-hardware/troubleshooting/diagnose-defective-disk.mdx b/pages/dedibox-hardware/troubleshooting/diagnose-defective-disk.mdx
@@ -10,7 +10,7 @@ dates:
   validation: 2025-02-06
   posted: 2021-11-02
 categories:
-  - dedibox-servers
+  - dedibox-hardware
 ---
 
 `Smartmontools` is a set of tools that controls and monitors a disk using the **SMART** standard (Self-Monitoring, Analysis, and Reporting Technology System).
@@ -56,7 +56,7 @@ On these servers, the physical disks are referred to as `sg*` devices.
       As the devices can be positioned a little further away, do not hesitate to test up to `sg5` if you do not have conclusive results.
     </Message>
 
-### Dell PERC H310 controller
+### Dell PERC controller (H310, H700, H710, H730-P, LSI9361)
 
 Two possibilities exist for this type of controller:
 
@@ -83,7 +83,7 @@ The first one displays the status of the RAID volume, whilst the second one disp
         smartctl -s on -a -d megaraid,${i} ${DEVICE} -T permissive
     done
     ```
-## How to check an HP multi-disk server
+## How to check an HP multi-disk server (P410, P420, P222)
 
 1. Log into your server using SSH.
 2. Run the following command to display the status of the RAID:
@@ -121,7 +121,7 @@ The first one displays the status of the RAID volume, whilst the second one disp
 
 ### How to configure SMARTD
 
-Below, you find an example of a single-disk server installed on a Debian-like machine.
+Below, you will find an example of a single-disk server installed on a Debian-like machine.
 
 <Message type="note">
   The following commands are to be executed as `root` or via `sudo`.
@@ -193,4 +193,260 @@ Local Time is: Fri Oct 29 11:20:27 2010 CEST
 
 <Message type="tip">
   For more information on Smartmontools, refer to the [official documentation](https://www.smartmontools.org/wiki/TocDoc).
-</Message>
+</Message>
+
+<Tabs id="Smart data examples">
+  <TabsTab label="HDD example">
+   The example below shows SMART data for the HDD storage type:
+
+    ```
+    === START OF INFORMATION SECTION ===
+    Model Family:     Seagate Constellation ES.3
+    Device Model:     ST1000NM0033-9ZM173
+    Serial Number:    Z1W2P3WL
+    LU WWN Device Id: 5 000c50 0790721c5
+    Add. Product Id:  DELL(tm)
+    Firmware Version: GA0A
+    User Capacity:    1 000 204 886 016 bytes [1,00 TB]
+    Sector Size:      512 bytes logical/physical
+    Rotation Rate:    7200 rpm
+    Form Factor:      3.5 inches
+    Device is:        In smartctl database [for details use: -P show]
+    ATA Version is:   ACS-2 (minor revision not indicated)
+    SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
+    Local Time is:    Wed Jan 22 11:26:49 2025 CET
+    SMART support is: Available - device has SMART capability.
+    SMART support is: Enabled
+
+    === START OF READ SMART DATA SECTION ===
+    SMART overall-health self-assessment test result: PASSED
+
+    General SMART Values:
+    Offline data collection status:  (0x82) Offline data collection activity
+              was completed without error.
+              Auto Offline Data Collection: Enabled.
+    Self-test execution status:      (   0) The previous self-test routine completed
+              without error or no self-test has ever
+              been run.
+    Total time to complete Offline
+    data collection:    (   90) seconds.
+    Offline data collection
+    capabilities:        (0x7b) SMART execute Offline immediate.
+              Auto Offline data collection on/off support.
+              Suspend Offline collection upon new
+              command.
+              Offline surface scan supported.
+              Self-test supported.
+              Conveyance Self-test supported.
+              Selective Self-test supported.
+    SMART capabilities:            (0x0003) Saves SMART data before entering
+              power-saving mode.
+              Supports SMART auto save timer.
+    Error logging capability:        (0x01) Error logging supported.
+              General Purpose Logging supported.
+    Short self-test routine
+    recommended polling time:    (   2) minutes.
+    Extended self-test routine
+    recommended polling time:    ( 115) minutes.
+    Conveyance self-test routine
+    recommended polling time:    (   3) minutes.
+    SCT capabilities:          (0x50bd) SCT Status supported.
+              SCT Error Recovery Control supported.
+              SCT Feature Control supported.
+              SCT Data Table supported.
+
+    SMART Attributes Data Structure revision number: 10
+    Vendor Specific SMART Attributes with Thresholds:
+    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
+      1 Raw_Read_Error_Rate     0x010f   079   063   044    Pre-fail  Always       -       90441339
+      3 Spin_Up_Time            0x0103   096   095   000    Pre-fail  Always       -       0
+      4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       26
+      5 Reallocated_Sector_Ct   0x0133   100   100   010    Pre-fail  Always       -       0
+      7 Seek_Error_Rate         0x000f   093   060   030    Pre-fail  Always       -       2198492836
+      9 Power_On_Hours          0x0032   094   011   000    Old_age   Always       -       5442
+     10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
+     12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       18
+    184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
+    187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
+    188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       1
+    189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
+    190 Airflow_Temperature_Cel 0x0022   071   061   045    Old_age   Always       -       29 (Min/Max 27/34)
+    191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
+    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       9
+    193 Load_Cycle_Count        0x0032   094   094   000    Old_age   Always       -       12859
+    194 Temperature_Celsius     0x0022   029   040   000    Old_age   Always       -       29 (0 22 0 0 0)
+    195 Hardware_ECC_Recovered  0x001a   046   015   000    Old_age   Always       -       90441339
+    196 Reallocated_Event_Count 0x0032   000   000   000    Old_age   Always       -       65535
+    197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
+    198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
+    199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
+    240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       62209 (42 197 0)
+    241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       75618145300
+    242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       528734761477
+
+    SMART Error Log Version: 1
+    No Errors Logged
+    ```
+    If `total_uncorrected_errors` or `errors_corrected_by_rereads_rewrites` is greater than 0, the disk is out of order.
+  </TabsTab>
+  <TabsTab label="SSD example">
+   The example below shows SMART data for the SSD storage type:
+
+    ```
+    === START OF INFORMATION SECTION ===
+    Model Family:     Crucial/Micron MX1/2/300, M5/600, 1100 Client SSDs
+    Device Model:     Micron_1100_MTFDDAK512TBN
+    Serial Number:    1709160C2354
+    LU WWN Device Id: 5 00a075 1160c2354
+    Firmware Version: M0MU031
+    User Capacity:    512 110 190 592 bytes [512 GB]
+    Sector Size:      512 bytes logical/physical
+    Rotation Rate:    Solid State Device
+    Form Factor:      2.5 inches
+    Device is:        In smartctl database [for details use: -P show]
+    ATA Version is:   ACS-3 T13/2161-D revision 5
+    SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
+    Local Time is:    Wed Jan 22 11:24:34 2025 CET
+    SMART support is: Available - device has SMART capability.
+    SMART support is: Enabled
+
+    === START OF READ SMART DATA SECTION ===
+    SMART overall-health self-assessment test result: PASSED
+
+    General SMART Values:
+    Offline data collection status:  (0x03) Offline data collection activity
+              is in progress.
+              Auto Offline Data Collection: Disabled.
+    Self-test execution status:      (   0) The previous self-test routine completed
+              without error or no self-test has ever
+              been run.
+    Total time to complete Offline
+    data collection:    (  913) seconds.
+    Offline data collection
+    capabilities:        (0x7b) SMART execute Offline immediate.
+              Auto Offline data collection on/off support.
+              Suspend Offline collection upon new
+              command.
+              Offline surface scan supported.
+              Self-test supported.
+              Conveyance Self-test supported.
+              Selective Self-test supported.
+    SMART capabilities:            (0x0003) Saves SMART data before entering
+              power-saving mode.
+              Supports SMART auto save timer.
+    Error logging capability:        (0x01) Error logging supported.
+              General Purpose Logging supported.
+    Short self-test routine
+    recommended polling time:    (   2) minutes.
+    Extended self-test routine
+    recommended polling time:    (   7) minutes.
+    Conveyance self-test routine
+    recommended polling time:    (   3) minutes.
+    SCT capabilities:          (0x0035) SCT Status supported.
+              SCT Feature Control supported.
+              SCT Data Table supported.
+
+    SMART Attributes Data Structure revision number: 16
+    Vendor Specific SMART Attributes with Thresholds:
+    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
+      1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       11
+      5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       10
+      9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       63309
+     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       12
+    171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       1
+    172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
+    173 Ave_Block-Erase_Count   0x0032   060   060   000    Old_age   Always       -       610
+    174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       6
+    183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
+    184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
+    187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
+    194 Temperature_Celsius     0x0022   068   047   000    Old_age   Always       -       32 (Min/Max 24/53)
+    196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       10
+    197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
+    198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
+    199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
+    202 Percent_Lifetime_Used   0x0030   060   060   001    Old_age   Offline      -       40
+    206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       1
+    246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       72065906327
+    247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       2254963742
+    248 Bckgnd_Program_Page_Cnt 0x0032   100   100   000    Old_age   Always       -       15919135484
+    180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2459
+    210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       44
+
+    SMART Error Log Version: 1
+    No Errors Logged
+    ```
+   If the `RAW_VALUE` column for `Reallocated_Sector_Ct`, `Runtime_Bad_Block`, or `Current_Pending_Sector` is greater than 5, the disk can already be considered unhealthy. If it is greater than 20, the disk is out of order.
+  </TabsTab>
+  <TabsTab label="NVMe example">
+   The example below shows SMART data for the NVMe storage type:
+
+    ```
+    === START OF INFORMATION SECTION ===
+    Model Number:                       SKHynix_HFS512GEJ9X164N
+    Serial Number:                      4YC8N008713108B48
+    Firmware Version:                   51770C30
+    PCI Vendor/Subsystem ID:            0x1c5c
+    IEEE OUI Identifier:                0xace42e
+    Controller ID:                      1
+    NVMe Version:                       1.4
+    Number of Namespaces:               1
+    Namespace 1 Size/Capacity:          512,110,190,592 [512 GB]
+    Namespace 1 Formatted LBA Size:     512
+    Namespace 1 IEEE EUI-64:            ace42e 003abd04e2
+    Local Time is:                      Wed Jan 22 11:21:05 2025 CET
+    Firmware Updates (0x16):            3 Slots, no Reset required
+    Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
+    Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
+    Log Page Attributes (0x1e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
+    Maximum Data Transfer Size:         64 Pages
+    Warning  Comp. Temp. Threshold:     86 Celsius
+    Critical Comp. Temp. Threshold:     87 Celsius
+
+    Supported Power States
+    St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
+    0 +   4.5000W       -        -    0  0  0  0      100     100
+    1 +   3.0000W       -        -    1  1  1  1      200     200
+    2 +   0.6000W       -        -    2  2  2  2      400     400
+    3 -   0.0150W       -        -    3  3  3  3     2000    2000
+    4 -   0.0030W       -        -    4  4  4  4     5000   10000
+
+    Supported LBA Sizes (NSID 0x1)
+    Id Fmt  Data  Metadt  Rel_Perf
+    0 +     512       0         0
+
+    === START OF SMART DATA SECTION ===
+    SMART overall-health self-assessment test result: PASSED
+
+    SMART/Health Information (NVMe Log 0x02)
+    Critical Warning:                   0x00
+    Temperature:                        42 Celsius
+    Available Spare:                    100%
+    Available Spare Threshold:          10%
+    Percentage Used:                    1%
+    Data Units Read:                    5,718,407 [2.92 TB]
+    Data Units Written:                 9,717,865 [4.97 TB]
+    Host Read Commands:                 43,061,485
+    Host Write Commands:                142,156,172
+    Controller Busy Time:               5,906
+    Power Cycles:                       1,315
+    Power On Hours:                     2,261
+    Unsafe Shutdowns:                   56
+    Media and Data Integrity Errors:    0
+    Error Information Log Entries:      0
+    Warning  Comp. Temperature Time:    0
+    Critical Comp. Temperature Time:    0
+    Temperature Sensor 1:               44 Celsius
+    Temperature Sensor 2:               42 Celsius
+
+    Error Information (NVMe Log 0x01, 16 of 256 entries)
+    No Errors Logged
+
+    Read Self-test Log failed: Invalid Field in Command (0x002)
+    ```
+  </TabsTab>
+</Tabs>
+
+<Message type="note">
+  If the output reports **Health status: Failed** or **Failing Now**, the disk is considered out of order. Make sure that you have backups, then open a [support ticket](/account/how-to/open-a-support-ticket/) and ask for the disk to be replaced, providing the disk's serial number and the output of the `smartctl` command.
+</Message>
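
The thresholds stated in the SSD tab (a `RAW_VALUE` above 5 means unhealthy, above 20 means out of order) could be checked automatically. The following is a minimal sketch, not part of this commit. It assumes the attribute table has the column layout shown in the examples above, and the names `parse_raw_values` and `classify` are hypothetical helpers introduced here for illustration.

```python
import re

# Attribute names and thresholds come from the SSD tab text in this diff.
# Everything else (function names, regex) is an illustrative assumption.
WATCHED = ("Reallocated_Sector_Ct", "Runtime_Bad_Block", "Current_Pending_Sector")
UNHEALTHY_ABOVE = 5
FAILED_ABOVE = 20

# One row of the "Vendor Specific SMART Attributes with Thresholds" table:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*\d+\s+(\S+)\s+0x[0-9a-fA-F]{4}"   # ID, name, flag
    r"\s+\d+\s+\d+\s+\d+"                    # VALUE, WORST, THRESH
    r"\s+\S+\s+\S+\s+\S+"                    # TYPE, UPDATED, WHEN_FAILED
    r"\s+(\d+)"                              # leading integer of RAW_VALUE
)


def parse_raw_values(smartctl_output):
    """Return a dict mapping attribute name to the integer at the start of RAW_VALUE."""
    values = {}
    for line in smartctl_output.splitlines():
        match = ROW.match(line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def classify(smartctl_output):
    """Apply the > 5 / > 20 thresholds to the watched attributes."""
    raw = parse_raw_values(smartctl_output)
    # Attributes missing from the output are treated as 0.
    worst = max(raw.get(name, 0) for name in WATCHED)
    if worst > FAILED_ABOVE:
        return "out of order"
    if worst > UNHEALTHY_ABOVE:
        return "unhealthy"
    return "ok"
```

To use it, pass the text printed by `smartctl -a` for a disk to `classify`. The sketch only covers the ATA attribute table. NVMe output uses a different layout and would need its own parser.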