How to Rebuild Data Partitions on a Segment Server

This solution applies to rebuilding data partitions on a DCA V1 Segment Server.

Issue:
Multiple disk failure
XFS corruption

Cause:
Dual RAID failure
XFS filesystem corruption

Solution:
On Rebuilding /data1
On the master server – as user “root”
1. Status check of the DCA Health monitor: # dca_healthmond_ctrl -s
2. Stop the DCA Health monitor: # dca_healthmond_ctrl -d

On the segment server – as user: “root”
Before we start, we need to confirm there are no bad blocks on the swap volume; there is some concern because of the way the first drive was replaced.

1. # dd bs=64k if=/dev/sdc of=/dev/null
(This reads the entire swap volume to check for read errors; it should take about a minute to run.)
Optional: Reboot the server
2. Kill the gpsmon processes or any other processes accessing /data1 (a sketch for finding these processes follows this list).
3. # umount /data1
4. # omreport storage vdisk > vdisk.info
 a. Save the information on Vdisk1
5. # omconfig storage vdisk action=deletevdisk controller=0 vdisk=1
6. # omconfig storage controller action=createvdisk controller=0 raid=r5 size=max
 pdisk=0:0:0,0:0:1,0:0:2,0:0:3,0:0:4,0:0:5 stripesize=128kb readpolicy=ara
7. # mkfs -t xfs -L /data1 -f /dev/sdb
8. # mount /data1
9. # mkdir /data1/primary
10. # mkdir /data1/mirror
11. # chown gpadmin /data1/*
12. # chgrp gpadmin /data1/*
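
If it is not obvious which processes are still holding /data1 open (step 2 above), a minimal sketch using standard Linux tools (assuming lsof and fuser are installed on the segment server) is:
# lsof /data1 (lists every open file on the /data1 filesystem and the owning process)
# fuser -vm /data1 (lists the PIDs using the mount, with user and command)
# fuser -km /data1 (sends SIGKILL to every process using /data1; use with care)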

On the master server – as user: “gpadmin”
As gpadmin, run the following; note that this will abort running queries.
$ gprecoverseg -F (will rebuild the directories and data)
Use the following command to monitor the recovery:
$ gpstate -m (checks whether the primary and mirror segments are in sync)

Note: You’re looking for all the mirrors to be in sync
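
A full recovery can take a while; if you do not want to re-run the command by hand, a simple option (assuming the standard watch utility is available) is:
$ watch -n 60 'gpstate -m' (re-runs gpstate -m every 60 seconds; press Ctrl-C to stop; the interval is only an example)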

Once “gpstate -m” reports all instances as “Synchronized”, you will still have mirrors acting as primaries; fixing that requires a restart of the database:
$ gpstop -aM fast
$ gpstart -a

On Rebuilding /data2
On the master server - as user “root”
1. Status check of the DCA Health monitor: # dca_healthmond_ctrl -s
2. Stop the DCA Health monitor: # dca_healthmond_ctrl -d

On the segment server – as user: “root”
Before we start, we need to confirm there are no bad blocks on the swap volume; there is some concern because of the way the first drive was replaced.

1. # dd bs=64k if=/dev/sdc of=/dev/null
(This reads the entire swap volume to check for read errors; it should take about a minute to run.)
Optional: Reboot the server
2. Kill the gpsmon processes or any other processes accessing /data2.
3. # umount /data2
4. # omreport storage vdisk > vdisk.info
 a. Save the information on Vdisk3
5. # omconfig storage vdisk action=deletevdisk controller=0 vdisk=3
6. # omconfig storage controller action=createvdisk controller=0 raid=r5 size=max
 pdisk=0:0:6,0:0:7,0:0:8,0:0:9,0:0:10,0:0:11 stripesize=128kb readpolicy=ara
7. # mkfs -t xfs -L /data2 -f /dev/sdd
8. # mount /data2 (a verification sketch follows this list)
9. # mkdir /data2/primary
10. # mkdir /data2/mirror
11. # chown gpadmin /data2/*
12. # chgrp gpadmin /data2/*
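
Before handing the segment back to Greenplum, it is worth confirming that the rebuilt vdisk and filesystem look sane; the same checks apply to a /data1 rebuild. This is only a suggested sanity check using the OpenManage and XFS tools already used above:
# omreport storage vdisk controller=0 (the new vdisk should report a Ready state and a RAID-5 layout)
# xfs_info /data2 (prints the geometry of the new XFS filesystem)
# df -h /data2 (confirms the mount point and the expected capacity)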

On the master server – as user: “gpadmin”
As gpadmin, run the following; note that this will abort running queries.
$ gprecoverseg -F (will rebuild the directories and data)
Use the following command to monitor the recovery:
$ gpstate -m (checks whether the primary and mirror segments are in sync)

Note: You’re looking for all the mirrors to be in sync

Once “gpstate -m” reports all instances as “Synchronized”, you will still have mirrors acting as primaries; fixing that requires a restart of the database:
$ gpstop -aM fast
$ gpstart -a
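
Optionally, once the database is back up, you can confirm that every segment has returned to its preferred role:
$ gpstate -e (reports segments with mirroring problems or with primary and mirror roles switched; ideally nothing is listed)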
