How to Rebuild Data Partitions on a Segment Server
This solution applies to rebuilding data partitions on a DCA V1 segment server.
Issue:
Multiple Disk Failure
xfs corruption
Cause:
Dual Raid Failure
xfs filesystem corruption
Solution:
On Rebuilding /data1
On the master server - as user "root"
1. Check the status of the DCA health monitor: # dca_healthmond_ctrl -s
2. Stop the DCA health monitor: # dca_healthmond_ctrl -d
On the segment server - as user "root"
Before we start, confirm there are no bad blocks on the swap volume; there is some concern because of the way the first drive was replaced.
1. # dd bs=64k if=/dev/sdc of=/dev/zero (should take about a minute to run). Optional: reboot the server.
2. Kill the gpsmon processes and any other processes accessing /data1.
3. # umount /data1
4. # omreport storage vdisk > vdisk.info
   a. Save the information on Vdisk1.
5. # omconfig storage vdisk action=deletevdisk controller=0 vdisk=1
6. # omconfig storage controller action=createvdisk controller=0 raid=r5 size=max pdisk=0:0:0,0:0:1,0:0:2,0:0:3,0:0:4,0:0:5 stripesize=128kb readpolicy=ara
7. # mkfs -t xfs -L /data1 -f /dev/sdb
8. # mount /data1
9. # mkdir /data1/primary
10. # mkdir /data1/mirror
11. # chown gpadmin /data1/*
12. # chgrp gpadmin /data1/*
Before returning to the master server, verify the rebuilt filesystem as shown in the sketch below.
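A quick sanity check here helps catch a mis-built vdisk before the recovery is kicked off. The following is a minimal verification sketch (not part of the original procedure; exact omreport output varies with the OMSA version installed):
# omreport storage vdisk controller=0        (confirm the recreated Vdisk1 shows as Ready)
# df -h /data1                               (confirm /data1 is mounted on the new vdisk)
# xfs_info /data1                            (confirm the XFS geometry of the new filesystem)
# ls -ld /data1/primary /data1/mirror        (confirm both directories are owned by gpadmin)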
On the master server - as user "gpadmin"
Run the following as gpadmin. Note: this will abort running queries.
$ gprecoverseg -F (will rebuild the directories and data)
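A hedged aside: gprecoverseg normally prompts for confirmation before it starts; if you are driving this from a script, the standard -a option suppresses the prompt (this option is not mentioned in the original write-up):
$ gprecoverseg -F -a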
Use the following command to monitor the recovery:
$ gpstate -m (Checks to see if the primaries and mirror partitions are in-sync)
Note: You’re looking for all the mirrors to be in sync
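If you would rather not re-run the check by hand, one option (a sketch, not part of the original procedure; the 60-second interval is an arbitrary choice) is to poll it with watch:
$ watch -n 60 gpstate -m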
Once "gpstate -m" reports all instances are "Synchronized", you will have mirrors acting as primaries; fixing that requires a restart of the database:
$ gpstop -aM fast
$ gpstart -a
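After the restart, it is worth confirming that the segments came back in their preferred roles. A minimal check (not part of the original procedure; both are standard gpstate options):
$ gpstate -m     (all instances should again report Synchronized)
$ gpstate -e     (reports any primary/mirror pairs that still have issues)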
On Rebuilding /data2
On the master server - as user “root”
1. Check the status of the DCA health monitor: # dca_healthmond_ctrl -s
2. Stop the DCA health monitor: # dca_healthmond_ctrl -d
On the segment server - as user "root"
Before we start, confirm there are no bad blocks on the swap volume; there is some concern because of the way the first drive was replaced.
1. # dd bs=64k if=/dev/sdc of=/dev/zero (should take about a minute to run). Optional: reboot the server.
2. Kill the gpsmon processes and any other processes accessing /data2.
3. # umount /data2
4. # omreport storage vdisk > vdisk.info
   a. Save the information on Vdisk3.
5. # omconfig storage vdisk action=deletevdisk controller=0 vdisk=3
6. # omconfig storage controller action=createvdisk controller=0 raid=r5 size=max pdisk=0:0:6,0:0:7,0:0:8,0:0:9,0:0:10,0:0:11 stripesize=128kb readpolicy=ara
7. # mkfs -t xfs -L /data2 -f /dev/sdd
8. # mount /data2
9. # mkdir /data2/primary
10. # mkdir /data2/mirror
11. # chown gpadmin /data2/*
12. # chgrp gpadmin /data2/*
Before returning to the master server, verify the rebuilt filesystem as shown in the sketch below.
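As with /data1, a short sanity check (again a sketch, not part of the original procedure) before handing the server back to gpadmin:
# df -h /data2                               (confirm /data2 is mounted on the new vdisk)
# ls -ld /data2/primary /data2/mirror        (confirm both directories are owned by gpadmin)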
On the master server - as user "gpadmin"
Run the following as gpadmin. Note: this will abort running queries.
$ gprecoverseg -F (will rebuild the directories and data)
Use the following command to monitor the recovery:
$ gpstate -m (Checks to see if the primaries and mirror partitions are in-sync)
Note: You’re looking for all the mirrors to be in sync
Once "gpstate -m" reports all instances are "Synchronized", you will have mirrors acting as primaries; fixing that requires a restart of the database:
$ gpstop -aM fast
$ gpstart -a
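Once both rebuilds are complete and the database has been restarted, a final status pass (a sketch using standard gpstate options, not part of the original write-up) can confirm the array is healthy before turning the DCA health monitor back on:
$ gpstate -m                      (all mirrors should report Synchronized)
$ gpstate -s                      (detailed per-segment status)
# dca_healthmond_ctrl -s          (as root on the master: re-check the health monitor status)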