Secondary Indexes on Column Store accelerate SQL Data Warehouse look up queries

We are pleased to announce that Azure SQL Data Warehouse now supports the creation of secondary B-Tree indexes (also referred to as non-clustered indexes or NCI) on column store tables (also referred to as clustered column store indexes or CCI). Most analytic queries aggregate large amounts of data and are served well by scanning the column store segments directly. However, there is often a need to look for a ‘needle in a haystack’ which translates to a query that does a lookup of a single row or a small range of rows. Such look up queries can get orders of magnitude (even 1000x) improvement in response time and potentially run in sub-second if there is a B-Tree index on the filter column.

SQL Data Warehouse is your go-to SQL-based view across data, offering a fast, fully managed, petabyte-scale cloud solution. It is highly elastic, enabling you to provision and scale up to 60 times larger in seconds. You can scale compute and storage independently, allowing you to range from burst to archival scenarios, and pay based off what you're using instead of being locked into a confined bundle. Plus, SQL Data Warehouse offers the unique option to pause compute, giving you even more freedom to better manage your cloud costs.

Prior to the availability of secondary B-Tree indexes on column store tables, users could meet response time requirements for their point look up queries by duplicating column store data in a clustered B-Tree index. However, the duplication of data adds implementation complexity, storage cost as well as latency. Some of these users have now tried the new secondary indexes over column store and are delighted that they can get the same interactive response time without the data duplication.

How to Create a Secondary Index on a Column Store Table

This follows the same syntax as the generic Create Index Transact-SQL statements. A simple test on 1TB TPC-H data demonstrated that the query time for selecting orders for a given orderkey from lineitem went down from 41 seconds to under a second after a secondary index on orderkey was added.

Best Practices for Using Secondary Indexes

Here are some guidelines to bear in mind when using secondary indexes on column store tables.

Use them for high cardinality columns that are used as filters in queries returning a small number of rows.
Don’t be heavy handed with secondary indexes as there is an overhead to maintaining them during loads. Best to limit to 1 or 2 secondary indexes per table.

Secondary indexes can be created on partitioned column store tables as well. However, as they are local to each distribution and partition, they cannot be used to implement UNIQUE constraint.

Next Steps

In this blog post we talked about benefits of the new functionality offered by secondary B-Tree indexes on column store tables. This is now available in all SQL Data Warehouse Azure regions worldwide. We encourage you to try it out if you have a use case for point lookups.

Learn More

Check out the many resources for learning more about SQL Data Warehouse, including:

What is Azure SQL Data Warehouse?
SQL Data Warehouse best practices
Video library
MSDN forum
Stack Overflow forum
Quelle: Azure

Das könnte Sie auch interessieren:

Published by

da Agency