Get Started With Greenplum!

Sponsored by Pivotal

FREE - Auto-Deploy Open Source Greenplum on AWS.

Now is the time to explore Greenplum OSS for FREE on Amazon Web Services. At a cost savings of more than 30% compared with both Amazon Redshift and Google BigQuery, you can build and test an Open Source Greenplum analytics cluster directly from zData, with no limitations. Try it for yourself and find out all of the benefits and features Greenplum OSS has to offer!

The world’s first open source massively parallel clustered data warehouse is now available on demand as a zero-cost service from zData Inc. The Open Source Greenplum Database® is geared toward big data analytics and data science, providing powerful and rapid analytics on petabyte-scale data volumes.


Advantages of Open Source Greenplum 
  • Cost Savings – 30%+ over Redshift and BigQuery
  • Cloud Native Application
  • Massively Parallel
  • Shared-Nothing database
  • Advanced Analytics Platform
  • Connect directly to S3 data as external tables
  • Multiple node and cluster capability
  • 3x – 10x data compression

Summary of Greenplum
  • Linear scalability: Shared-nothing architecture and parallel query optimization ensure that performance and capacity increase linearly to 100s of nodes and 1000s of processing cores.
  • MapReduce support: MapReduce has been proven as a technique for high-scale data analysis by Internet leaders such as Google and Yahoo. With Greenplum, this capability is available in-house to enterprises.
  • SQL standard: Comprehensive SQL-92 and SQL-99 support with SQL:2003 OLAP extensions. All queries are parallelized and executed across the entire system.
  • Unified analytical processing: All queries and analysis (MADlib, geospatial, SQL, MapReduce, R, etc.) are executed on the same parallel dataflow engine, allowing analysts, developers, and statisticians to analyze data using a common infrastructure.
  • Programmable parallel analytics: Offers a new level of parallel analysis capabilities for mathematicians and statisticians, with support for R, linear algebra, and machine learning primitives.
  • In-database compression: Utilizes industry-leading compression technology to increase performance and dramatically reduce the space required to store data. Customers can expect to see a 3-10x disk space reduction with a corresponding increase in effective I/O performance.
  • Petabyte-scale loading: A high-performance parallel data loader executing simultaneously across all cluster nodes facilitates load rates in excess of 6.5 TB/hr.
  • Anywhere data access: Allows queries to be executed from the database against external data sources, returning data in parallel, regardless of their location, format, or storage medium.
  • Dynamic expansion: Allows companies to easily add data warehouse capacity in small or large increments, and avoid costly appliance or SMP server upgrades.
  • Workload management: Allows administrators to create role-based resource queues to divide up resources and manage the load on the system.
  • Centralized administration: Provides cluster-wide management tools and utilities that allow administrators to manage the database as if it were a single system.
  • Support for indexes: Greenplum supports B-tree, hash, bitmap, GiST, and GIN indexes, giving data architects the tools necessary to implement the optimal design.
  • Industry standard interfaces: Supports standard database interfaces (SQL, ODBC, JDBC, DBI) and is interoperable with market-leading business intelligence and extract/transform/load (ETL) tools.
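
To put the loading and compression figures above in perspective, here is a rough back-of-the-envelope sketch in Python. It is only arithmetic on the numbers quoted in this summary (a 6.5 TB/hr load rate and a 3-10x compression range); the function names and the 1 PB example dataset are illustrative assumptions, not Greenplum tooling, and real results will depend on the data and the cluster.

```python
# Back-of-the-envelope sizing based on the figures quoted in the summary above.
# All names here are hypothetical helpers for illustration only.

def load_hours(data_tb: float, rate_tb_per_hour: float = 6.5) -> float:
    """Hours needed to load `data_tb` terabytes at a sustained load rate."""
    if rate_tb_per_hour <= 0:
        raise ValueError("rate_tb_per_hour must be positive")
    return data_tb / rate_tb_per_hour


def compressed_size_tb(raw_tb: float, ratio: float) -> float:
    """On-disk size in terabytes after compressing `raw_tb` by `ratio`x."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return raw_tb / ratio


if __name__ == "__main__":
    raw_tb = 1000.0  # assumed example: 1 PB expressed as 1000 decimal TB

    hours = load_hours(raw_tb)
    print(f"Loading {raw_tb:.0f} TB at 6.5 TB/hr takes about {hours:.1f} hours")

    # The summary quotes a 3-10x range, so show both ends of it.
    for ratio in (3.0, 10.0):
        size = compressed_size_tb(raw_tb, ratio)
        print(f"At {ratio:.0f}x compression, {raw_tb:.0f} TB occupies {size:.1f} TB on disk")
```

Because the quoted load rate is stated as "in excess of" 6.5 TB/hr, the computed load time should be read as an upper estimate under that assumption.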

Sponsored Online Training:

Pivotal Software will sponsor online training for the first 100 Redshift customers to try Open Source Greenplum on AWS, a value of over $1,200 per user.

Download the Open Source Greenplum Quickstart Guide 

Optional Professional Services

Inquire for more! 

For ongoing support and maintenance or for special requests please contact 


Greenplum Overview:

  • Figure 1 – Types of Database Architectures
  • Figure 2 – Anatomy of Greenplum
  • Figure 3 – Automatic hash-based data distribution
  • Figure 4 – Multi-Level tables partitioning
  • Figure 5 – Master server performs global planning and dispatch
  • Figure 6 – gNet interconnect manages the flow of data between nodes
  • Figure 7 – Parallel Dataflow engine operates in parallel across 10s or 100s of servers
  • Figure 8 – Both SQL and MapReduce are processed on the same parallel infrastructure
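
The ideas named in Figures 3 and 5 (hash-based data distribution, and a master that dispatches work and combines results) can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not Greenplum's actual implementation: the MD5-based hash, the segment count of 4, and all function names are hypothetical choices made for the example.

```python
import hashlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen only for this example


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a distribution key to a segment with a deterministic hash.

    MD5 is an assumption made so the example is reproducible; it is not
    claimed to be the hash function Greenplum uses.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


def distribute(rows, key_index: int = 0, num_segments: int = NUM_SEGMENTS):
    """Place each row on exactly one segment, based on its key column."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(str(row[key_index]), num_segments)].append(row)
    return dict(segments)


def local_sum(rows, value_index: int = 1):
    """Work done independently on one segment (shared-nothing)."""
    return sum(row[value_index] for row in rows)


def global_sum(segments, value_index: int = 1):
    """Master-style step: combine the partial results from every segment."""
    return sum(local_sum(rows, value_index) for rows in segments.values())


if __name__ == "__main__":
    sales = [("alice", 10), ("bob", 20), ("carol", 30), ("alice", 5)]
    placed = distribute(sales)
    for seg_id in sorted(placed):
        print(f"segment {seg_id}: {placed[seg_id]}")
    print("total:", global_sum(placed))
```

Rows with the same key always land on the same segment, so each segment can aggregate its own rows without coordinating with the others, and only the small partial results need to travel back to be combined.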