Greenplum vs Redshift


Redshift lacks procedural languages, while Greenplum has MapReduce applications.
Greenplum also runs anywhere, while Redshift runs only in AWS.

Redshift only uses instance types with ephemeral storage so if you stop the cluster, you lose the data. 

For instance types with ephemeral storage, you can't replace a disk.  Maybe Amazon does this for Redshift instances, but for everyone else the recommendation is to replace the node if you have a disk failure.

The Redshift ds2.8xlarge instance type has 16TB of usable storage, but the d2.8xlarge instance type has 24 2TB disks (48TB raw).  I'm guessing they are using RAID5 but why?  If they lose a disk, they will replace the entire node unless they do something special for Redshift.

Greenplum running on d2.8xlarge uses RAID0 so you get denser storage and all 48TB.  If you lose a disk, the software HA kicks in and the mirror segments take over.  You can then shut down the instance, and Self-Healing kicks in and replaces the node.  Self-Healing also rebalances the data and puts everything back to the original state.
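The storage arithmetic above can be sketched as a quick calculation. This is only an illustration: it assumes 24 disks of 2TB each (which is what adds up to the 48TB total) and textbook single-parity RAID5. The function name `usable_tb` is made up for this example and does not come from any Greenplum or AWS tool.

```python
def usable_tb(disks: int, disk_tb: float, layout: str) -> float:
    """Return usable capacity in TB for a simple RAID layout.

    Assumption: textbook layouts only. RAID0 stripes across every disk,
    RAID5 gives up one disk's worth of capacity to parity.
    """
    if layout == "raid0":
        return disks * disk_tb
    if layout == "raid5":
        if disks < 3:
            raise ValueError("RAID5 needs at least 3 disks")
        return (disks - 1) * disk_tb
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    # Assumed disk layout for d2.8xlarge: 24 disks of 2TB each.
    for layout in ("raid0", "raid5"):
        print(layout, usable_tb(24, 2, layout), "TB usable")
```

RAID0 keeps every terabyte but has no redundancy at the disk level, which is why Greenplum relies on mirror segments and node replacement instead.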

Pivotal recommends using the R4 series instance type with EBS ST1 storage.  r4.8xlarge with 48TB of EBS ST1 storage performs about the same as d2.8xlarge with 48TB of ephemeral storage.  The big benefit here is that you get EBS storage.  You can stop the cluster without losing data (you just need to disable the ASG so it doesn't replace nodes).  You get the durability of EBS too.  Lastly, you get snapshots!  Pivotal has "gpsnap" in the AWS Marketplace so you can quickly take a database backup that leverages EBS snapshots.  Redshift can't do this because it doesn't use EBS storage.
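The stop-without-losing-data steps above could look roughly like the sketch below. It is not how gpsnap or the Pivotal Marketplace tooling works internally; it only shows the order of operations. The function names are hypothetical, the choice of which ASG processes to suspend is an assumption, and the database is assumed to be shut down cleanly before any of this runs. The two client arguments are meant to be boto3 clients (`boto3.client("autoscaling")` and `boto3.client("ec2")`), passed in so the sketch itself has no third-party imports.

```python
from typing import Any, Iterable, List


def stop_cluster_keep_data(autoscaling: Any, ec2: Any,
                           asg_name: str, instance_ids: Iterable[str]) -> None:
    """Stop EBS-backed nodes without the ASG replacing them (hypothetical helper)."""
    # Step 1: keep the Auto Scaling Group from treating stopped nodes as
    # failed. Suspending these two processes is an assumption of this sketch.
    autoscaling.suspend_processes(
        AutoScalingGroupName=asg_name,
        ScalingProcesses=["HealthCheck", "ReplaceUnhealthy"],
    )
    # Step 2: stop the instances. EBS volumes persist while they are stopped.
    ec2.stop_instances(InstanceIds=list(instance_ids))


def snapshot_volumes(ec2: Any, volume_ids: Iterable[str],
                     description: str) -> List[str]:
    """Request one EBS snapshot per volume and return the snapshot IDs."""
    return [
        ec2.create_snapshot(VolumeId=vol, Description=description)["SnapshotId"]
        for vol in volume_ids
    ]
```

Passing the clients in keeps the functions easy to try against a stub before pointing them at a real account.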

The largest instance type Redshift uses is ds2.8xlarge, which has 36 vCPUs and 244GB of RAM.  This has a 10Gbps network too.  The largest instance type for Greenplum in AWS is r4.16xlarge, which has 64 vCPUs and 488GB of RAM.  It also has a 25Gbps network.  This instance type is significantly faster than r4.8xlarge.

Greenplum in AWS vs Redshift:
- Greenplum is more durable
- Greenplum has denser storage
- Greenplum can take snapshot backups
- Greenplum can utilize faster instance types
- Greenplum can use a 2.5x faster network
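The instance numbers behind this list can be checked with a few lines of arithmetic. This is a minimal sketch using only the vCPU, RAM, and network figures quoted above; the `InstanceSpec` class and `ratios` helper are made up for this example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class InstanceSpec:
    name: str
    vcpus: int
    ram_gb: int
    network_gbps: int


# Figures as quoted in this post.
DS2_8XLARGE = InstanceSpec("ds2.8xlarge", vcpus=36, ram_gb=244, network_gbps=10)
R4_16XLARGE = InstanceSpec("r4.16xlarge", vcpus=64, ram_gb=488, network_gbps=25)


def ratios(big: InstanceSpec, small: InstanceSpec) -> Dict[str, float]:
    """Return how many times larger `big` is than `small` on each axis."""
    return {
        "vcpus": big.vcpus / small.vcpus,
        "ram": big.ram_gb / small.ram_gb,
        "network": big.network_gbps / small.network_gbps,
    }


if __name__ == "__main__":
    for axis, factor in ratios(R4_16XLARGE, DS2_8XLARGE).items():
        print(f"{axis}: {factor:.2f}x")
```

The network ratio is where the 2.5x figure comes from; the same comparison gives 2x the RAM and a bit under 1.8x the vCPUs.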

There are other differences in the software too.  Greenplum is based on PostgreSQL 8.3 and that will get rebased to later versions while Redshift is a fork of PostgreSQL 8.0.  Greenplum is open source while Redshift isn't.  

And don't forget that if you get tired of Amazon or get better pricing elsewhere, you can run Greenplum anywhere.  Move to Azure or GCP if you want. 

Greenplum is actively developed. Production is currently based on PostgreSQL 8.3, but development is on 9.0 and moving forward quickly.

Greenplum is also actively gaining production features, such as PL/Container.

Greenplum is developed by Pivotal, which has people contributing back to PostgreSQL.org (including at least one committer).

Greenplum can run anywhere (as mentioned above), but this can't be stressed enough. In the long run, you can save A LOT of money by hosting on-prem or just in a colo.
