Opinary generates recommendations faster on Cloud Run

Editor’s note: Berlin-based startup Opinary migrated their machine learning pipeline from Google Kubernetes Engine (GKE) to Cloud Run. After making a few architectural changes, their pipeline is now faster and more cost-efficient: they reduced the time to generate a recommendation from 20 seconds to one second, and realized a remarkable 50% cost reduction. In this post, Doreen Sacker and Héctor Otero Mediero share a detailed and transparent technical report of the migration.

Opinary asks the right questions to increase reader engagement

We’re Opinary, and our reader polls appear in news articles globally. The polls let users share their opinion with one click and see how they compare to other readers. We automatically add the most relevant reader polls using machine learning. We’ve found that the polls help publishers increase reader retention, boost subscriptions, and improve other article success metrics. Advertisers benefit from contextual access to their target groups on premium publishers’ sites, and from high-performing interaction with their audiences.

Let’s look at an example of one of our polls. Imagine reading an article on your favorite news site about whether or not to introduce a speed limit on the highway. As you might know, long stretches of the German Autobahn still don’t have a legal speed limit, and this is a topic of intense debate. Critics of speeding point out the environmental impact and casualty toll. Opinary adds a matching poll to the article.

Diving into the architecture of our recommendation system

Here’s how we originally architected our system on GKE. Our pipeline starts with an article URL and delivers a recommended poll to add to the article. Let’s take a more detailed look at the various components that make this happen.

First, we push a message with the article URL to a Pub/Sub topic (a message queue). The recommender service pulls the message from the queue to process it. Before this service can recommend a poll, it needs to complete a few steps, which we’ve separated out into individual services. The recommender service sends a request to these services one by one and stores the results in a Redis store. These are the steps:

1. The article scraper service scrapes (downloads and parses) the article text from the URL.
2. The encoder service encodes the text into text embeddings (we use the Universal Sentence Encoder).
3. The brand safety service detects whether the article text includes descriptions of tragic events, such as death, murder, or accidents, because we don’t want to add our polls to these articles.

With these three steps completed, the recommendation service can recommend a poll from our database of pre-existing polls and submit it to an internal database we call Rec Store. This is how we end up recommending a poll about introducing a speed limit on the German Autobahn.
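To make this orchestration pattern concrete, here is a minimal sketch of what such a pull-based worker can look like in Python. It is illustrative only: the project ID, subscription ID, internal service URLs, and the recommend_poll step are hypothetical placeholders, not Opinary's actual code.

```python
# Minimal sketch of a pull-based recommender worker (hypothetical names).
import requests
from google.cloud import pubsub_v1

PROJECT_ID = "my-project"         # hypothetical
SUBSCRIPTION_ID = "article-urls"  # hypothetical

def recommend_poll(embedding) -> None:
    """Hypothetical placeholder: match a poll from the poll database."""

def handle_message(message) -> None:
    article_url = message.data.decode("utf-8")

    # Orchestration: call each helper service in turn and wait for its answer.
    text = requests.post("https://scraper.internal/scrape",
                         json={"url": article_url}).json()["text"]
    embedding = requests.post("https://encoder.internal/encode",
                              json={"text": text}).json()["embedding"]
    verdict = requests.post("https://brand-safety.internal/check",
                            json={"text": text}).json()

    if verdict.get("safe"):
        recommend_poll(embedding)
    message.ack()

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

# subscribe() returns a streaming pull future; result() blocks so the
# worker keeps processing messages as they arrive.
subscriber.subscribe(subscription_path, callback=handle_message).result()
```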
Why we decided to move to Cloud Run

Cloud Run looked attractive to us for two reasons. First, because it automatically scales down all the way to zero container instances when there are no requests, we expected to save costs (and we did!). Second, we liked the idea of running our code on a fully managed platform without having to worry about the underlying infrastructure, especially since our team doesn’t have a dedicated data engineer (we’re both data scientists).

As a fully managed platform, Cloud Run has been designed to make developers more productive. It’s a serverless platform that lets you run your code in containers, directly on top of Google’s infrastructure. Deployments are fast and automated: fill in your container image URL and seconds later your code is serving requests. Cloud Run automatically adds container instances to handle all incoming requests or events, and removes them when they’re no longer needed. That’s cost-efficient, and on top of that, Cloud Run doesn’t charge you for the resources a container uses while it’s not serving requests. This pay-for-use cost model was the main motivation for us to migrate away from GKE: we only want to pay for the resources we use, not for a large idle cluster during the night.

Enabling the migration to Cloud Run with a few changes

To move our services from GKE to Cloud Run, we had to make two changes:

1. Change the Pub/Sub subscriptions from pull to push.
2. Migrate our self-managed Redis database in the cluster to a fully managed Cloud Memorystore instance.

Changing Pub/Sub subscriptions from pull to push

Since Cloud Run services scale with incoming web requests, your container must have an endpoint to handle requests. Our recommender service originally didn’t have an endpoint to serve requests, because we used the Pub/Sub client library to pull messages. Google recommends using push subscriptions instead of pull subscriptions to trigger Cloud Run from Pub/Sub. With a push subscription, Pub/Sub delivers messages as requests to an HTTPS endpoint. Note that this doesn’t need to be Cloud Run; it can be any HTTPS URL. Pub/Sub guarantees delivery of a message by retrying requests that return an error or are too slow to respond (using a configurable deadline).
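For illustration, here is a minimal sketch of such a push endpoint in Python with Flask. The envelope format (a JSON body whose message.data field is base64-encoded) is the documented Pub/Sub push format; process_article is a hypothetical placeholder.

```python
# Minimal sketch of a Cloud Run push endpoint for Pub/Sub.
import base64

from flask import Flask, request

app = Flask(__name__)

def process_article(url: str) -> None:
    """Hypothetical placeholder: scrape, encode, check, recommend."""

@app.route("/", methods=["POST"])
def pubsub_push():
    envelope = request.get_json()
    message = envelope.get("message", {})
    article_url = base64.b64decode(message.get("data", "")).decode("utf-8")

    process_article(article_url)

    # Any 2xx response acknowledges the message; an error status (or a
    # response slower than the acknowledgement deadline) triggers a retry.
    return "", 204

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```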
Introducing a Cloud Memorystore Redis instance

Cloud Run adds and removes container instances to handle all incoming requests. Redis doesn’t serve HTTP requests, and it likes to have one or a few stateful container instances attached to a persistent volume, rather than disposable containers that start on demand. We therefore created a Memorystore Redis instance to replace the in-cluster Redis instance. Memorystore instances have an internal IP address on the project’s VPC network, and containers on Cloud Run operate outside of the VPC. That means you have to add a connector to reach internal IP addresses on the VPC. Read the docs to learn more about Serverless VPC Access.

Making it faster using Cloud Trace

This first part of our migration went smoothly, but while we were hopeful that our system would perform better, we would still regularly spend almost 20 seconds generating a recommendation. We used Cloud Trace to figure out where requests were spending time. This is what we found:

- To handle a single request, our code made roughly 2,000 requests to Redis. Batching all these requests into one request was a big improvement (see the sketch after this list).
- The VPC connector has a default maximum limit on network throughput that was too low for our workload. Once we changed it to use larger instances, response times improved.

When we rolled out these changes, we saw a noticeable performance benefit.
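As a sketch of the batching fix with redis-py (the key names and Memorystore address below are hypothetical): replacing per-key GETs with a single MGET, or with a pipeline, collapses roughly 2,000 network round trips into one.

```python
# Minimal sketch of batching Redis reads (hypothetical keys and host).
import redis

r = redis.Redis(host="10.0.0.3", port=6379)  # Memorystore internal IP (hypothetical)

keys = [f"poll-embedding:{i}" for i in range(2000)]

# Before: one network round trip per key.
# values = [r.get(key) for key in keys]

# After: a single MGET round trip for all keys.
values = r.mget(keys)

# A pipeline achieves the same batching for mixed commands.
pipe = r.pipeline()
for key in keys:
    pipe.get(key)
values = pipe.execute()
```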
Waiting for responses is expensive

The changes described above led to scalable and fast recommendations: we reduced the average recommendation time from 10 seconds to under one second. However, the recommendation service was getting very expensive, because it spent a lot of time doing nothing but waiting for other services to return their responses. The recommender service would receive a request and wait for the other services to respond; as a result, many container instances in the recommender service were running but essentially doing nothing except waiting. Under Cloud Run’s pay-per-use cost model, this made the service expensive: our costs went up by a factor of four compared with the original setup on Kubernetes.

Rethinking the architecture

To reduce costs, we needed to rethink our architecture. The recommendation service was sending requests to all other services and waiting for their responses. This is called an orchestration pattern. To have the services work independently, we changed to a choreography pattern: the services still execute their tasks one after the other, but no single service waits for another service to complete. This is what we ended up doing:

We changed the initial entry point to be the article scraping service, rather than the recommender service. Instead of returning the article text, the scraping service now stores the text in a Cloud Storage bucket. The next step in our pipeline is to run the encoder service, and we invoke it using an Eventarc trigger. Eventarc lets you asynchronously deliver events from Google services, including those from Cloud Storage. We’ve set an Eventarc trigger to fire an event as soon as the article scraper service adds a file to the Cloud Storage bucket. The trigger sends the object information to the encoder service in an HTTP request. The encoder service does its processing and saves the results to a Cloud Storage bucket again. One service after the other can now process and save its intermediate results in Cloud Storage for the next service to use (see the sketch below).

Now that we invoke all services asynchronously using Eventarc triggers, no single service actively waits for another service to return results. Compared with the original setup on GKE, our costs are now 50% lower.
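Here is a minimal sketch of what one choreographed step can look like: a service on Cloud Run receives the Eventarc Cloud Storage event (an HTTP POST whose JSON body identifies the bucket and object), reads the intermediate result, and writes its own output for the next trigger to pick up. The bucket layout and the encode step are hypothetical placeholders.

```python
# Minimal sketch of an encoder service driven by an Eventarc trigger on
# Cloud Storage (hypothetical bucket layout and encode step).
from flask import Flask, request
from google.cloud import storage

app = Flask(__name__)
storage_client = storage.Client()

def encode(text: str) -> str:
    """Hypothetical placeholder: run the sentence encoder on the text."""
    return text

@app.route("/", methods=["POST"])
def handle_storage_event():
    # Eventarc delivers the Cloud Storage event as JSON describing the object.
    event = request.get_json()
    bucket = storage_client.bucket(event["bucket"])

    # Read the article text the scraper stored.
    text = bucket.blob(event["name"]).download_as_text()

    # Store the intermediate result where the next service's trigger watches.
    embedding = encode(text)
    out_name = event["name"].replace("articles/", "embeddings/")  # hypothetical layout
    bucket.blob(out_name).upload_from_string(embedding)
    return "", 204

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```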
Advice and conclusions

Our recommendations are now fast and scalable, and our costs are half of what they were on the original cluster setup. Our main takeaways:

- Migrating from GKE to Cloud Run is easy for container-based applications.
- Cloud Trace was useful for identifying where requests were spending time.
- Sending a request from one Cloud Run service to another and synchronously waiting for the result turned out to be expensive for us. Asynchronously invoking our services using Eventarc triggers was a better solution.
- Cloud Run is under active development and new features are added frequently, which makes for a nice developer experience overall.

Source: Google Cloud Platform

Azure Confidential Computing on 4th Gen Intel Xeon Scalable Processors with Intel TDX

Microsoft continues to be the cloud leader in confidential computing, and the Azure team is excited to extend that leadership by partnering with Intel to offer confidential computing on 4th Gen Intel Xeon Scalable processors with Intel Trust Domain Extensions (Intel TDX) later this year. This will enable organizations in highly regulated industries to lift and shift workloads that handle sensitive data and scale them in the cloud. Intel TDX meets the Confidential Computing Consortium (CCC) standard for hardware-enforced memory protection not controlled by the cloud provider, all while delivering minimal performance impact and requiring no code changes.

Azure and Intel enable innovative use cases

Across industries, Microsoft Azure customers use confidential computing with Intel processors to achieve higher levels of data privacy and mitigate risks associated with unauthorized access to sensitive data or intellectual property. They are leveraging innovative solutions such as data clean rooms to accelerate the development of new healthcare therapies, and privacy-preserving digital asset management solutions for the financial industry. These scenarios and more are in production today, leveraging 3rd Gen Intel Xeon Scalable processors with Intel Software Guard Extensions (Intel SGX), a foundational technology of the Azure confidential computing portfolio. In fact, Azure was the first major cloud provider to offer confidential computing in the cloud with virtual machines (VMs) enabled with Intel SGX application isolation. As founding members of the CCC, Microsoft and Intel work with numerous other member organizations to define and accelerate adoption of confidential computing. This effort includes contributions to several open source projects. The Azure team looks forward to extending this collaboration by bringing to market Intel TDX–based services in Azure.

Intel TDX extends Azure's existing confidential computing offerings

Today, Azure’s DCsv3 VMs offer application isolation using Intel SGX, delivering the smallest trust boundary of any confidential computing technology. The addition of Intel TDX expands our portfolio to offer isolation at the VM, container, or application level to meet the diversity of customer needs. Azure is the only major cloud provider committed to offering both VM-level and application-level confidential computing. Both are supported by Intel’s hardware root of trust and address attestation requirements that meet the confidential computing industry standard. Both Intel TDX and Intel SGX provide capabilities that help remove the cloud operator’s access to data, including removing the hypervisor from the trust boundary.

Removing trust in the hypervisor

While Azure has engineered our hypervisor to be very secure, a growing number of customers seek further protections to meet data sovereignty and regulatory compliance requirements. These customers require increased isolation and protection of their workloads to reduce the risk of unauthorized data access. For this reason, Microsoft relies on hardware-enforced controls, rather than trust in the hypervisor, to protect customer data: with Intel-based confidential computing solutions on Azure, even altering the hypervisor does not allow Azure operators to read or alter customer data in memory.

Establishing trust via attestation

Attestation is a critical concept in confidential computing. It allows customers to verify the third-party hardware root of trust and the software stack before allowing any code to access and process data. With Intel TDX, attestation is performed against the entire VM or container, each with a unique hardware key that keeps its memory protected. We will offer attestation support with Microsoft Azure Attestation as standard, and will also partner closely with Intel on their upcoming trust service, code-named "Project Amber," to meet the security requirements of customers.
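As an illustration of the relying-party side of this flow, here is a minimal sketch that inspects claims in a JWT attestation token, the format Microsoft Azure Attestation issues. The claim names and accepted values below are hypothetical placeholders, and a real client must also verify the token signature against the attestation provider's published signing keys before trusting any claim.

```python
# Minimal, illustrative sketch of checking an attestation token's claims.
# Hypothetical claim names; signature verification is deliberately skipped
# here for brevity and must not be skipped in a real deployment.
import jwt  # PyJWT

def release_data_if_attested(attestation_token: str) -> bool:
    claims = jwt.decode(attestation_token, options={"verify_signature": False})

    # Only release sensitive data to an environment whose attested
    # properties match what we expect (hypothetical claims and values).
    return (
        claims.get("attestation-type") in ("sgx", "tdx")
        and claims.get("compliance-status") == "azure-compliant"
    )
```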

Confidential computing takes off

Many Azure confidential computing customers can attest to the value they receive from our existing Intel confidential computing offerings.

Novartis Biome uses BeeKeeperAI’s EscrowAI confidential clean room solution on Azure confidential computing for the training and validation of algorithms to predict instances of a rare childhood condition using real patient data from health records, while maintaining privacy and compliance.

“Rare diseases are often challenging to diagnose and if left untreated, they can significantly diminish a patient’s quality of life. With BeeKeeperAI, our scientists were able to securely access a large gold standard dataset that enabled us to improve the predictive capabilities of our algorithm, bringing us much closer to identifying patients early in the disease course and to improving their outcomes.” —Robin Roberts, Co-founder and Chief Operating Officer, Novartis Biome

Fireblocks provides enterprise-grade secure infrastructure for moving, storing, and issuing digital assets. They use Intel confidential computing technology on Azure to hold one of the keys to their wallets.

"Some of the biggest cryptocurrency businesses, financial institutions, and enterprises in the world trust Fireblocks software and APIs to provide digital custody solutions, manage treasury operations, access DeFi, mint and burn tokens, and manage their digital asset operations. We leverage Azure to hold one of the keys to our wallets due to Azure Confidential Computing … " —Michael Shaulov, CEO and Co-founder, Fireblocks

Carbon Asset Solutions’ soil-based carbon credit collection and tracking system uses immutable ledger technology provided by Azure confidential ledger.

"Carbon Asset Solutions is a world-first precision measurement, recording, and verification platform focused on atmospheric carbon removal through soil carbon sequestration. With Azure, we deliver higher integrity Carbon Credits than any other method." —Sara Saeidi, Chief Operating Officer, Carbon Asset Solutions

Azure’s vision for the confidential cloud

We see a future where confidential computing is standard and pervasive both in the cloud and at the edge within all Azure service offerings. Customers will be able to more confidently use the cloud for their most sensitive data workloads while verifying the environment and staying in full control of data access. We look forward to the launch of 4th Gen Intel Xeon Scalable processors and offering Intel TDX–enabled instances with VM-level data protection and performance improvements later this year, continuing our partnership with Intel to help transition Azure to the confidential cloud.

Learn more

Sign up for early access to Intel TDX confidential VMs coming later this year.

Get started today deploying VMs and AKS nodes with Intel SGX application enclaves.

Current Azure confidential computing–based services featuring Intel technology:

Foundational infrastructure as a service (IaaS) elements utilizing Intel SGX, such as Virtual Machines with Application Enclaves and Intel SGX–based confidential computing nodes on Azure Kubernetes Service.
Azure first-party confidential computing software as a service (SaaS) such as Microsoft Azure Attestation, Azure confidential ledger, Azure Managed Confidential Consortium Framework (preview), and Azure Key Vault Managed HSM.
Various third-party confidential computing SaaS, many of which are captured in this webinar series.

Open source tools for developing Intel-based confidential computing apps on Azure:

The Open Enclave (OE) Software Development Kit (SDK)
The EGo SDK
The Intel SGX SDK
The Confidential Consortium Framework (CCF)
Gramine
Occlum
MarbleRun
SCONE

Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries.
Source: Azure