Obtain Ubuntu Server Raid Information And Data Restoration


Introduction

Many servers today run Ubuntu Server as their operating system. In production environments, servers are usually set up with a RAID configuration to speed up disk I/O and to protect data. As developers, we are mostly not the ones who set up the servers, but the RAID configuration is still very useful for our work. For example, what type of disk does the machine use: SSD or a regular HDD? What is the RAID mode: RAID 0 for speed, or RAID 1 for fault tolerance? This information helps when setting up services such as a database, or when deciding which machines to use for a cluster. In this article, I will show how to view this RAID information on Ubuntu Server 20.04.

Preparation

The first thing we want to know is the model of the server's RAID card. Use the lspci command to identify the RAID controller in the system.

lspci | grep -i raid

Below is the example output:

zichen:~$ lspci | grep -i raid
05:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS 2108 [Liberator] (rev 05)

MegaRAID controllers, made by Broadcom (formerly LSI), are very common RAID cards in Dell servers. Ordinary disk management tools cannot see the individual disks behind the RAID controller. To view the RAID information, we need the megacli tool.
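If we want to script this check, a minimal sketch could look like the one below. The function name has_megaraid is made up for this example, and it only looks for the text that lspci printed on my machine, so other controllers may need a different pattern.

```shell
# Succeeds (exit status 0) if the lspci output on stdin mentions a
# MegaRAID RAID controller. has_megaraid is a hypothetical helper name.
has_megaraid() {
    grep -qi 'raid bus controller.*megaraid'
}

# Example use on a real machine:
#   lspci | has_megaraid && echo "MegaRAID found, megacli should work"
```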

So first, let's install megacli. It depends on libncurses5, so make sure that package is installed. If it is not, run:

sudo apt-get install libncurses5

Then, we can use wget to download the MegaCli Package.

wget http://hwraid.le-vert.net/debian/pool-stretch/megacli/megacli_8.07.14-2%2BDebian.stretch.9.9_amd64.deb

Install the MegaCli package:

sudo dpkg -i megacli_8.07.14-2+Debian.stretch.9.9_amd64.deb

Update the package lists after the installation:

sudo apt-get update

Use this MegaCLI command to check raid information:

sudo megacli -CfgDsply -a0

Below is the output:

==============================================================================
Adapter: 0
Product Name: PERC H700 Integrated
Memory: 512MB
BBU: Present
Serial No: 18F00HV
==============================================================================
Number of DISK GROUPS: 1

DISK GROUP: 0
Number of Spans: 1
SPAN: 0
Span Reference: 0x00
Number of PDs: 6
Number of VDs: 1
Number of dedicated Hotspares: 0
Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :Virtual Disk 00
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
Size                : 3.271 TB
Sector Size         : 512
Parity Size         : 0
State               : Optimal
Strip Size          : 64 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disabled
Encryption Type     : None
Bad Blocks Exist: Yes
Is VD Cached: Yes
Cache Cade Type : Read Only
Physical Disk Information:
Physical Disk: 0
Enclosure Device ID: 32
Slot Number: 0
Drive's position: DiskGroup: 0, Span: 0, Arm: 0
Enclosure position: N/A
Device Id: 0
WWN: 50000394F8108DA9
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

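The line to look at is:

RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0

Primary-0 tells us that the RAID mode is RAID 0.

A mirrored array reports Primary-1 instead. On a RAID 10 server, we will see the following output together with a Span Depth greater than 1. With a Span Depth of 1, the same line describes plain RAID 1, and some firmware versions report RAID 10 as Secondary-3.

RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0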
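To avoid reading the RAID level by eye, we can let awk do it. This is only a sketch: the function name raid_level_label is made up, and the mapping (Primary-1 with Secondary-3 or a Span Depth above 1 means RAID 10) is an assumption based on commonly reported MegaCli values, so check it against your own controller.

```shell
# Print one label per virtual drive found in `megacli -CfgDsply -a0`
# output read from stdin. raid_level_label is a hypothetical helper name.
# ASSUMPTION: Primary-1 plus (Secondary-3 or Span Depth > 1) means RAID 10.
raid_level_label() {
    awk -F: '
        /^[ \t]*RAID Level/ {
            # Keep only the digits that follow "Primary-" and "Secondary-".
            primary = $2;   sub(/.*Primary-/, "", primary);     sub(/[^0-9].*/, "", primary)
            secondary = $2; sub(/.*Secondary-/, "", secondary); sub(/[^0-9].*/, "", secondary)
        }
        /^[ \t]*Span Depth/ {
            # "Span Depth" follows "RAID Level" for each virtual drive.
            depth = $2; gsub(/[^0-9]/, "", depth)
            if (primary == "1" && (secondary == "3" || depth + 0 > 1))
                print "RAID 10"
            else
                print "RAID " primary
        }
    '
}

# Example use on a real machine:
#   sudo megacli -CfgDsply -a0 | raid_level_label
```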

Another tool we will need is smartctl, which is part of the smartmontools package. To install it, run:

sudo apt-get install smartmontools

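With smartctl, we can read the S.M.A.R.T. information of an individual disk behind the RAID controller. The command below queries the first disk in the array. The number after megaraid, selects the disk and typically corresponds to the Device Id shown in the megacli output: 0 is the first disk, so change it to 1 for the second one, and so on.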

sudo smartctl -a /dev/sda -d megaraid,0
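Changing the number by hand gets tedious on a server with many disks, so here is a rough sketch that loops over all of them. The function names are made up, and it assumes that the Device Id lines printed by megacli -PDList -aALL are the numbers smartctl expects, and that the array shows up as /dev/sda as in the command above.

```shell
# Print the "Device Id" of every physical disk found in megacli output
# read from stdin. megaraid_device_ids is a hypothetical helper name.
megaraid_device_ids() {
    awk -F': *' '/^[ \t]*Device Id:/ { gsub(/[^0-9]/, "", $2); print $2 }'
}

# Run `smartctl -a` once per Device Id read from stdin (one id per line).
# $1 is the block device that the RAID volume appears as.
smart_report_all() {
    dev=$1
    while read -r id; do
        echo "=== megaraid,$id ==="
        # </dev/null keeps smartctl from consuming the list of ids on stdin.
        smartctl -a "$dev" -d "megaraid,$id" </dev/null
    done
}

# Example use on a real machine (run as root):
#   megacli -PDList -aALL | megaraid_device_ids | smart_report_all /dev/sda
```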

This gives us output like this:

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-88-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     WD Blue and Green SSDs
Device Model:     WDC  WDS200T2B0A-00SM50
Serial Number:    211709800332
LU WWN Device Id: 5 001b44 8ba1447c5
Firmware Version: 415020WD
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Oct 28 22:42:53 2021 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: ATA return descriptor not supported by controller firmware
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 4
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   ---    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   ---    Old_age   Always       -       1150
 12 Power_Cycle_Count       0x0032   100   100   ---    Old_age   Always       -       22
165 Block_Erase_Count       0x0032   100   100   ---    Old_age   Always       -       21692639
166 Minimum_PE_Cycles_TLC   0x0032   100   100   ---    Old_age   Always       -       1
167 Max_Bad_Blocks_per_Die  0x0032   100   100   ---    Old_age   Always       -       229
168 Maximum_PE_Cycles_TLC   0x0032   100   100   ---    Old_age   Always       -       4
169 Total_Bad_Blocks        0x0032   100   100   ---    Old_age   Always       -       1456
170 Grown_Bad_Blocks        0x0032   100   100   ---    Old_age   Always       -       0
171 Program_Fail_Count      0x0032   100   100   ---    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   ---    Old_age   Always       -       0
173 Average_PE_Cycles_TLC   0x0032   100   100   ---    Old_age   Always       -       1
174 Unexpected_Power_Loss   0x0032   100   100   ---    Old_age   Always       -       14
184 End-to-End_Error        0x0032   100   100   ---    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   ---    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   ---    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   071   044   ---    Old_age   Always       -       29 (Min/Max 21/44)
199 UDMA_CRC_Error_Count    0x0032   100   100   ---    Old_age   Always       -       0
230 Media_Wearout_Indicator 0x0032   001   001   ---    Old_age   Always       -       0x002c000a002c
232 Available_Reservd_Space 0x0033   100   100   004    Pre-fail  Always       -       100
233 NAND_GB_Written_TLC     0x0032   100   100   ---    Old_age   Always       -       2564
234 NAND_GB_Written_SLC     0x0032   100   100   ---    Old_age   Always       -       4996
241 Host_Writes_GiB         0x0030   253   253   ---    Old_age   Offline      -       4415
242 Host_Reads_GiB          0x0030   253   253   ---    Old_age   Offline      -       1754
244 Temp_Throttle_Status    0x0032   000   100   ---    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported

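Attribute 241, Host_Writes_GiB, tells us the total amount of data the host has written to the disk (4415 GiB in the RAW_VALUE column here). This is very important if we are using SSDs, because flash memory wears out with writes.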

241 Host_Writes_GiB         0x0030   253   253   ---    Old_age   Offline      -       4415
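If we only care about one attribute, we can pull its raw value out of the table with awk. A small sketch, with a made-up function name, assuming the ten-column attribute table layout shown above:

```shell
# Print the RAW_VALUE (10th column) of the SMART attribute named in $1.
# Reads `smartctl -a` or `smartctl -A` output on stdin.
# smart_raw_value is a hypothetical helper name.
smart_raw_value() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Example use on a real machine:
#   sudo smartctl -A /dev/sda -d megaraid,0 | smart_raw_value Host_Writes_GiB
```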

Restore Data

We can restore data using megacli, but in most cases we will have physical access to the server anyway, since the disk has to be replaced. With the tools above, we can easily identify the broken disk.
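One way to narrow down the broken disk is to look at the error counters that megacli prints for each physical disk. The sketch below uses a made-up function name and only the Slot Number, Media Error Count and Predictive Failure Count lines shown in the output earlier. Non-zero counters are a hint, not proof, and a disk that has failed completely may not be listed at all.

```shell
# List physical disks whose error counters are non-zero.
# Reads `megacli -PDList -aALL` or `megacli -CfgDsply -a0` output on stdin.
# suspect_disks is a hypothetical helper name.
suspect_disks() {
    awk -F': *' '
        # Print the disk collected so far if any of its counters is non-zero.
        function report() {
            if (seen && (media + 0 > 0 || pred + 0 > 0))
                printf "slot %d: media_errors=%d predictive_failures=%d\n", slot, media, pred
        }
        /^[ \t]*Slot Number:/              { report(); seen = 1; slot = $2; media = 0; pred = 0 }
        /^[ \t]*Media Error Count:/        { media = $2 }
        /^[ \t]*Predictive Failure Count:/ { pred = $2 }
        END { report() }
    '
}

# Example use on a real machine:
#   sudo megacli -PDList -aALL | suspect_disks
```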

After swapping the broken disk for a new one, power on the machine and press CTRL + R when the boot screen prompts for it. This opens the RAID controller's configuration utility. There, press CTRL + N to move to the second tab, select the VD we want to fix, and press F2. After that, just follow the on-screen instructions to finish the restoration.

