Discord's migration from Redshift to BigQuery: lessons learned

Editor’s note: We’re hearing today from Discord, maker of a popular voice, video, and text chat app for gaming. They have to bring a great experience to millions of customers concurrently and keep up with demand. Here’s how they moved from Redshift to Google Cloud’s BigQuery to support their growth.

At Discord, our chat app supports more than 50 million monthly users. We had been using Amazon Redshift as our data warehouse solution for several years, but due to both technical and business reasons, we migrated completely to BigQuery. Since migrating, we’ve been able to serve users faster, incorporate AI and ML capabilities, ensure compliance, and explore usage analytics.

The challenges that led us to migrate

Our team here at Discord began to consider alternative solutions once we realized we were running into technical and cost limitations on Redshift. We knew that if we wanted our data warehouse to scale with our business, we had to find a new solution. On the technical side, we realized we were going to hit the maximum cluster size (128 compute nodes) for DC2-type nodes within six months, given our growing usage patterns. Cost was also becoming a challenge: we had been paying hundreds of thousands of dollars a month, not including storage and the cost of network ingress/egress between Google Cloud and AWS. (We’d already been using Google Cloud for our chat application.)

We looked at some Google Cloud-native solutions and identified BigQuery as a natural fit for us, given its large scale (with known customers larger than Discord), its proximity to where our data resides, and the fact that Google Cloud already had pipelines in place for loading data. Another major reason for choosing BigQuery was that it is completely serverless, so it wouldn’t require any upfront hardware provisioning or management. We were also able to take advantage of a brand-new feature called BigQuery Reservations to gain significant savings with fixed slot usage.

Migration tradeoffs and challenges

We had some preparation to do ahead of and during the migration. One initial challenge was that while both Redshift and BigQuery are designed to handle analytical workloads, they are very different. For example, in Redshift we had a denormalized set of tables where each of our application events ended up in its own table, and most of our analytics queries needed to join several of those tables together. Running an analytics query on user retention involved analyzing data across different events and tables, so this kind of JOIN-heavy workload showed performance differences out of the box. We had previously relied on ORDER BY and ROW_NUMBER over large swaths of data, but BigQuery supports that approach only with limitations. Redshift and BigQuery also handle partitioning differently, so joining on something like user ID isn’t as fast by default because the data layout is different. We therefore used timestamp partitioning and clustering on JOIN fields, which increased performance in BigQuery.
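Discord didn’t publish its table definitions, so the following is only a minimal sketch of the pattern described above: a day-partitioned event table clustered on its JOIN keys, created with the google-cloud-bigquery Python client. The project, dataset, table, and column names are hypothetical.

```python
# Minimal sketch (not Discord's actual schema): create an event table that is
# partitioned by event timestamp and clustered on common JOIN keys, using the
# google-cloud-bigquery Python client.
from google.cloud import bigquery

client = bigquery.Client()  # assumes default project and credentials

# Hypothetical table and columns, for illustration only.
table_id = "my-project.analytics.message_events"
schema = [
    bigquery.SchemaField("event_time", "TIMESTAMP", mode="REQUIRED"),
    bigquery.SchemaField("user_id", "INT64", mode="REQUIRED"),
    bigquery.SchemaField("event_type", "STRING"),
]

table = bigquery.Table(table_id, schema=schema)

# Daily partitions on the event timestamp keep scans bounded by date range...
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_time",
)

# ...and clustering on the JOIN/filter keys co-locates rows for each key within
# a partition, which helps JOIN-heavy workloads such as retention queries.
table.clustering_fields = ["user_id", "event_type"]

client.create_table(table)
```

A retention query that filters on event_time and joins on user_id can then prune partitions by date and take advantage of the clustered layout instead of scanning the whole table.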
Other aspects of BigQuery brought significant advantages right away, making the migration worthwhile. Those include ease of management (one provider instead of multiple, no maintenance windows, no VACUUM/ANALYZE), scalability, and price for performance.

There were some other considerations we took into account when undertaking this migration. We had to convert more than a hundred thousand lines of SQL into BigQuery syntax, so we used the ZetaSQL library and a PostgreSQL parser to implement a conversion tool. To do this, we forked an open source parser and modified its grammar so it could parse all of our existing Redshift SQL. Building this was a non-trivial part of the migration. The tool walks an abstract syntax tree (also known as a parse tree) built from templated Redshift SQL and outputs the equivalent templated SQL for BigQuery (a small illustration of this kind of SQL transpilation appears at the end of this post). In addition, we re-architected the way we built our pre-aggregated views of data to support BigQuery. Moving to a fixed-slot model using BigQuery Reservations allowed for workload isolation, consistent performance, and predictable costs. The last step was getting used to the new paradigm post-migration and educating stakeholders on the new operating model.

"Migrating from Redshift to BigQuery has been game-changing for our organization. We’ve been able to overcome performance bottlenecks and capacity constraints as well as fearlessly unlock actionable insights for our business." - Spencer Aiello, Tech Lead and Manager, Machine Learning at Discord

Using BigQuery as our data foundation

Since completing our migration, BigQuery has helped us accomplish our goals around scale, user privacy, and GDPR compliance. BigQuery now supports all of our reporting, dashboarding, machine learning, and data exploration use cases at Discord. Thousands of queries run against our data stores every day. We wouldn’t have been able to scale our queries on Redshift the way we can with BigQuery.

With BigQuery, we are able to keep our operations running smoothly without disruptions to our business. This was a breath of fresh air for us, because toward the end of our Redshift usage we were seeing downtimes of more than 12 hours just to conduct nightly maintenance. These VACUUM operations could fail and cause us to slip more than 24 hours past our internal SLAs before we could ingest data. To address this challenge in the past, we had to start actively deleting and truncating tables in Redshift, which led to incomplete and less accurate insights.

We’ve also seen other benefits from the move to BigQuery: user data requests have become cheaper and faster to service; BigQuery streaming inserts let us observe machine learning experiments and model results from AI Platform in real time; and we can easily support new use cases for trust and safety, finance, and Discord usage analytics.

It’s safe to say that BigQuery is the bedrock for all analysis at Discord. It’s a huge benefit that we’re now able to offer consistent performance to users without worrying about resource constraints. We can now support thousands of queries over hundreds of terabytes of data every day without having to think too much about resources. We can share access to analytics insights across teams, and we’re well prepared for the next step of using BigQuery’s AI and ML capabilities.

Learn more about Discord and about BigQuery.
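As promised above, here is a small illustration of the SQL transpilation idea. Discord’s conversion tool was built on ZetaSQL and a forked PostgreSQL parser and isn’t shown here; as a rough stand-in for the general approach (parse SQL written for one dialect into a syntax tree, then render it for another dialect), the sketch below uses the open-source sqlglot library to rewrite an invented Redshift-flavored query into BigQuery syntax. The table and column names are hypothetical.

```python
# Illustration of dialect-to-dialect SQL transpilation in general, not Discord's
# ZetaSQL-based tool: parse a Redshift-flavored query into an AST and re-render
# it as BigQuery SQL using the open-source sqlglot library (pip install sqlglot).
import sqlglot

# Hypothetical Redshift query, for illustration only.
redshift_sql = """
SELECT
    user_id,
    event_time::date AS event_date,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_time DESC) AS rn
FROM analytics.message_events
WHERE event_time >= DATEADD(day, -30, GETDATE())
"""

# transpile() parses the input with the Redshift dialect and generates
# equivalent SQL in the BigQuery dialect, one statement per list element.
bigquery_sql = sqlglot.transpile(redshift_sql, read="redshift", write="bigquery")[0]
print(bigquery_sql)
```

A production tool like Discord’s also has to cope with templating and Redshift-specific grammar that an off-the-shelf transpiler may not handle, which is why they forked a parser and extended its grammar.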