Announcing general availability of Confidential GKE Nodes

Today, we’re excited to announce the general availability of Confidential GKE Nodes. Many organizations have made Google Kubernetes Engine (GKE) the foundation of their modern application architectures. While the benefits of containers and Kubernetes can outweigh those of traditional architectures, moving apps to the cloud and running them there often requires careful planning to minimize risk and potential data exposure. Confidential GKE Nodes can help increase the security of your GKE clusters.

Part of the growing Confidential Computing product portfolio, Confidential GKE Nodes leverage hardware to make sure your data is encrypted in memory. The GKE workloads you run today can run confidentially without any code changes on your end.

Bringing confidential computing to your container workloads

With Confidential GKE Nodes, you can achieve encryption in use for data processed inside your GKE cluster, without significant performance degradation. Confidential GKE Nodes are built on the same technology foundation as Confidential VM and utilize AMD Secure Encrypted Virtualization (SEV). This feature allows you to keep data encrypted in memory with node-specific, dedicated keys that are generated and managed by the processor. The keys are generated in hardware during node creation and reside solely within the processor, making them unavailable to Google or to other nodes running on the host. Confidential GKE Nodes also leverage Shielded GKE Nodes to offer additional protection against rootkits and bootkits, helping to ensure the integrity of the operating system you run on your Confidential GKE Nodes.

Mixed node pools and stateful workloads

Two new features have been added for the general availability release of Confidential GKE Nodes: mixed node pool support and PersistentVolumes.

Mixing confidential node pools with non-confidential node pools

Confidential GKE Nodes can be enabled as a cluster-level or a node pool-level security setting. When enabled at the cluster level, Confidential GKE Nodes enforce the use of Confidential VMs on all worker nodes: worker nodes in the cluster can only use confidential nodes, and confidential computing cannot be disabled on individual node pools. All worker nodes, including the workloads running on them, are encrypted in use. When enabled at the node pool level, Confidential GKE Nodes enforce the use of Confidential VMs on specific node pools, so only worker nodes in the specified node pools run confidentially. This new capability allows a single GKE cluster to run both confidential and non-confidential workloads, and creating regular node pools and confidential node pools in a single cluster can help minimize cluster management overhead. To learn more, see our guide to enabling Confidential GKE Nodes on node pools.

Supporting PersistentVolumes for stateful container workloads

Confidential GKE Nodes are great for protecting data in stateless and stateful workloads, and they recently added support for PersistentVolume resources. In GKE, a PersistentVolume is a cluster resource that Pods can use for durable storage, typically backed by a persistent disk. The pairing of PersistentVolumes with Confidential GKE Nodes is ideal for containerized applications that require block storage.

Pricing

There is no additional cost to deploy Confidential GKE Nodes beyond the cost of Compute Engine Confidential VM.

Get started with this game-changing technology

Creating a GKE cluster that uses Confidential GKE Nodes on all nodes is easy.
Simply go to the Cloud Console, click Kubernetes Engine, and then click Clusters. Select “Create” and then “Configure” on GKE Standard. Under Cluster, there is a Security section where you check the box that says “Enable Confidential GKE Nodes.”

GKE clusters can be enabled to run as confidential under the Security setting for Kubernetes Engine.

Confidential computing transforms the way organizations process data in the cloud while preserving confidentiality and privacy. To learn more, read about our Confidential VMs and get started using your own Confidential GKE Nodes today.
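If you would rather script cluster creation than click through the console steps above, the sketch below shows one way to request a cluster with Confidential GKE Nodes using the google-cloud-container Python client. Treat it as a hedged, minimal example rather than a tested recipe: the project, zone, cluster name, and machine type are placeholders, and you should verify the field names against the current client library (Confidential VMs require an AMD-based machine series such as N2D).

from google.cloud import container_v1

# Illustrative sketch only; all identifiers below are placeholders.
client = container_v1.ClusterManagerClient()

cluster = container_v1.Cluster(
    name="confidential-demo",
    initial_node_count=3,
    node_config=container_v1.NodeConfig(machine_type="n2d-standard-4"),  # AMD SEV-capable machine type
    confidential_nodes=container_v1.ConfidentialNodes(enabled=True),     # cluster-level Confidential GKE Nodes
)

operation = client.create_cluster(
    parent="projects/PROJECT_ID/locations/us-central1-a",
    cluster=cluster,
)
print(operation.name)  # long-running operation; poll it to track cluster creation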
Source: Google Cloud Platform

Wayfair: Accelerating MLOps to power great experiences at scale

Machine Learning (ML) is part of everything we do at Wayfair to support each of the 30 million active customers on our website. It enables us to make context-aware, real-time, and intelligent decisions across every aspect of our business. We use ML models to forecast product demand across the globe, to ensure our customers can quickly access what they’re looking for. Natural language processing (NLP) models analyze chat messages on our website so customers can be redirected to the appropriate customer support team as quickly as possible, without having to wait for a human assistant to become available. ML is an integral part of our strategy for remaining competitive as a business and supports a wide range of eCommerce engineering processes at Wayfair.

As an online furniture and home goods retailer, the steps we take to make the experience of our customers as smooth, convenient, and pleasant as possible determine how successful we are. This vision inspires our approach to technology, and we’re proud of our heritage as a tech company, with more than 3,000 in-house engineers and data scientists working on the development and maintenance of our platform.

We’ve been building ML models for years, as well as other homegrown tools and technologies, to help solve the challenges we’ve faced along the way. We began on-prem but decided to migrate to Google Cloud in 2019, using a lift-and-shift strategy to minimize the number of changes we had to make to move multiple workloads into the cloud. Among other things, that meant deploying Apache Airflow clusters on the Google Cloud infrastructure and retrofitting our homegrown technologies to ensure compatibility. While some of the challenges we faced with our legacy infrastructure were resolved immediately, such as lack of scalability, others remained for our data scientists. For example, we lacked a central feature store and relied on a shared cluster with a shared environment for workflow orchestration, which caused noisy neighbor problems. As a Google Cloud customer, however, we can easily access new solutions as they become available. So in 2021, when Google Cloud launched Vertex AI, we didn’t hesitate to try it out as an end-to-end ML platform to support the work of our data scientists.

One AI platform with all the ML tools needed

As big fans of open source, platform-agnostic software, we were impressed by Vertex AI Pipelines and how they work on top of open-source frameworks like Kubeflow. This enables us to build software that runs on any infrastructure. We enjoyed how the tool looks, feels, and operates. Within six months, we moved from configuring our infrastructure manually, to conducting a POC, to a first production release.

Next on our priority list was to use Vertex AI Feature Store to serve and use ML features in real time, or in batch, with a single line of code. Vertex AI Feature Store fully manages and scales its underlying infrastructure, such as storage and compute resources. That means our data scientists can now focus on feature computation logic, instead of worrying about the challenges of storing features for offline and online usage.

While our data scientists are proficient in building and training models, they are less comfortable setting up the infrastructure and bringing the models to production.
So, when we embarked on an MLOps transformation, it was important for us to enable data scientists to leverage the platform as seamlessly as possible, without having to know all about its underlying infrastructure. To that end, our goal was to build an abstraction on top of Vertex AI. Our simple Python-based library interacts with Vertex AI Pipelines and Vertex AI Feature Store, and a typical data scientist can leverage this setup without having to know how Vertex AI works in the backend. That’s the vision we’re marching towards, and we’ve already started to notice its benefits.

Reducing hyperparameter tuning from two weeks to under one hour

While we enjoy using open source tools such as Apache Airflow, the way we were using it was creating issues for our data scientists, and we frequently ran into infrastructure challenges carried over from our legacy technologies, such as support issues and failed jobs. So we built a CI/CD pipeline using Vertex AI Pipelines, based on Kubeflow, to remove the complexity of model maintenance. Now everything is well arranged, documented, scalable, easy to test, and organized around best practices. This incentivizes people to adopt a new standardized way of working, which in turn brings its own benefits.

One example that illustrates this is hyperparameter tuning, an essential part of controlling the behavior of a machine learning model. In machine learning, hyperparameter tuning or optimization is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process; every machine learning model has its own hyperparameters, whose values are set before the learning process begins, and a good choice of hyperparameters can make an algorithm perform optimally. But while hyperparameter tuning is a very common process in data science, there are no standards for how it should be done. Doing it in Python on our legacy infrastructure would take a data scientist two weeks on average. We have over 100 data scientists at Wayfair, so standardizing this practice and making it more efficient was a priority for us. With a standardized way of working on Vertex AI, all our data scientists can now leverage our code to access CI/CD, monitoring, and analytics out of the box and conduct hyperparameter tuning in just one day.

Powering great customer experiences with more ML-based functionalities

Next, we’re working on a Docker container template that will enable data scientists to deploy a running ‘hello world’ Vertex AI pipeline. On average, it can take a data science team more than two months to get an ML model fully operational. With Vertex AI, we expect to cut that time down to two weeks. Like most of the things we do, this will have a direct impact on our customer experience.

It’s important to remember that some ML models are more complex than others. Those with an output that the customer immediately sees while navigating the website, such as when an item will be delivered to their door, are more complicated. This prediction is made by ML models and automated by Vertex AI. It must be accurate, and it must appear on-screen extremely quickly while customers browse the website. That means these models have the highest requirements and are the most difficult to publish to production.
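For readers curious what handing a pipeline to Vertex AI can look like in practice, here is a minimal, hypothetical sketch using the Vertex AI Python SDK. The project, region, bucket, and compiled pipeline spec are illustrative assumptions, not Wayfair’s actual setup:

from google.cloud import aiplatform

# Placeholder project, region, and bucket values for illustration only.
aiplatform.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-staging-bucket",
)

job = aiplatform.PipelineJob(
    display_name="hello-world-pipeline",
    template_path="hello_world_pipeline.json",  # pipeline spec compiled with the Kubeflow Pipelines SDK
    pipeline_root="gs://my-staging-bucket/pipeline-root",
    enable_caching=True,
)
job.run(sync=False)  # submit the run and return without blocking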
We’re actively working on building and implementing tools to streamline and enable continuous monitoring of our data and models in production, which we want to integrate with Vertex AI. We believe in the power of AutoML to build models faster, so our goal is to evaluate all these services in GCP and then find a way to leverage them internally. And it’s already clear that the new ways of working enabled by Vertex AI not only make the lives of our data scientists easier, but also have a ripple effect that directly impacts the experience of the millions of shoppers who visit our website daily. They’re all experiencing better technology and more functionality, faster.

For a more detailed dive into how our data scientists are using Vertex AI, look for part two of this blog, coming soon.
Source: Google Cloud Platform

Anthos on-prem and on bare metal now power Google Distributed Cloud Virtual

Last year, we announced Google Distributed Cloud (GDC), a portfolio of hardware, software, and services that will bring our infrastructure to the edge and into your data centers. In March, Google Distributed Cloud Edge became generally available to deliver an integrated hardware and software solution for new telco and enterprise edge workloads. And today, we are pleased to share our next update for Google Distributed Cloud Virtual, a software- and services-only solution that brings our existing Anthos on-prem offerings (for VMware vSphere and Anthos on bare metal) into the GDC portfolio under this unified new product family.

Customers of Anthos on-premises (now known as GDC Virtual) will continue to enjoy the consistent management and developer experience they have come to know and expect, with no changes to current capabilities, pricing structure, or look and feel across user interfaces, and they will continue to see consistent roadmap additions. For customers just getting to know GDC Virtual, its capabilities round out our GDC Edge and Hosted offerings, which are designed to accelerate your cloud transformation.

Taken together, the Google Distributed Cloud portfolio for Edge, Hosted, and Virtual provides a uniform set of experiences for development, security, and management across any IT environment you choose, backed by a common Anthos API. This includes the ability to select across system form factor types, to choose between software-only or integrated hardware and software solutions, and to decide whether you prefer to be self-managed or fully managed, by Google or another trusted partner. Based on your unique business and workload needs, you choose the scenario that works best for your organization.

Cloud-managed and deployed onto your infrastructure, GDC Virtual provides a software-only extension of Google Cloud, allowing you to:

Automate provisioning and management of GKE clusters on VMs and existing bare metal infrastructure with the requirements and form factors you choose, and use the Google Cloud Console to provision Anthos clusters on vSphere

Enable developers to build and deploy container-based workloads to Kubernetes directly or to an application runtime

Apply federated security, access control, and identity management across cloud and on-premises clusters

To further illustrate these capabilities, here are some examples of where customers might choose to deploy GDC Virtual:

A customer has a significant investment in their own VM environment. Selecting GDC Virtual enables them to leverage their existing infrastructure to run new and modernized applications.

A customer wants to bring advanced AI/ML workloads into each store. Given the need to deploy in their establishments, they have specific requirements for footprint and hardware. With GDC Virtual, the new workloads can be deployed on hardware and in a footprint that meets their specific needs.

A large auto manufacturer is on a journey to migrate their applications to the cloud. As part of this journey, they are leveraging their existing on-premises investments. By choosing GDC Virtual, they can modernize applications in place before migrating to the cloud.

GDC Virtual Adoption is Growing Rapidly

We built Anthos three years ago to deliver a consistent cloud operational model and developer experience, and adoption has continued to grow exponentially. In fact, between 2021 and 2022, the number of customers for Anthos products grew by over five times.
This includes Anthos bare metal customers (now GDC Virtual) growing by over four times over that same period. Now customers have more ways to consume Anthos: GDC Edge, Virtual, and Hosted are all powered by Anthos. Some notable customers include:

TELUS: For TELUS, a leading provider of communications and technology, Anthos on-premises capabilities help enable new Multi-Access Edge Computing (MEC) use cases. This new telco edge solution moves the processing and management of traffic from a centralized cloud to the edge of TELUS’ 5G network, making it possible to deploy applications and process content closer to its customers, thus yielding several benefits including better performance, security, and customization. This includes enabling a new Connected Worker Safety solution that can be applied across a range of business verticals to help improve safety, prevent injury, and save lives. More details on this solution can be found here.

Major League Baseball (MLB): MLB supports 30 teams spread across the US and Canada, running workloads in the cloud and at the edge with on-premises data centers at each of their ballparks. By using Anthos, they can containerize those workloads and run them in the location that makes the most sense for the application. In particular, this enables scenarios where local computing needs to occur in the park for latency reasons, such as delivering stats in the stadium to fans, to broadcast, or to the scoreboard. This has enabled data democratization and distribution to its 30 teams and can support improved time to insight and fan engagement.

Google Corporate Engineering: With Anthos, Google Corporate Engineering is working to make operations consistent and to reduce costs across Google Cloud and on-prem distributed cloud environments by using common tooling. In 2021, Google began running its first production workloads in edge environments at corporate offices with Anthos on bare metal, and in 2022 added the first production workloads in hosted data centers. By the end of the year, Google plans to have a sizable portion of our virtualized platform migrated to GDC Virtual; this final migration will include a number of enterprise workloads that are critical for the operation of our company, including security, financials, and IT.

Freedom of choice with Google Distributed Cloud

To us, supporting our customers’ transformation requirements means providing a Google Distributed Cloud portfolio that embraces openness and choice. GDC Virtual offers a new consumption model for customers, built upon a proven Anthos stack, as a software-only deployment option on your infrastructure. This enables modernization efforts to progress in place at a pace that makes sense for your business. With this update for GDC Virtual, the Google Distributed Cloud portfolio can now enable consistent operations from on-premises, to edge, to cloud.

To learn more about GDC Virtual, check out the Google Distributed Cloud website. Current Google Cloud customers can also learn about Google Distributed Cloud by adding Anthos Service Mesh and Config Management to their GKE clusters today!
Source: Google Cloud Platform

How one Googler uses talking tulips to connect with customers

Editor’s Note: Matt Feigal has spent years deep inside many of our customers’ toughest technical problems, and now helps our partners solve innumerable issues for even more customers. That success at engineering problem-solving didn’t come about as you’d expect, though. He’s got an inspiring range of skills in empathy, entertaining…and engineering.

What was your path to Google?

I studied History at the University of Minnesota. I liked seeing all the angles: not just what a king did, but what happened in agriculture, in the economy, with the climate, and all the various situations and consequences. I started doing tech work to pay for my unpaid internships in museum work and found I really enjoyed computers too.

Early on I found a mentor who gave me a lot of trust, and pointed me to where I needed to skill up (a lot). My first employer was a pacemaker company that sent me to Europe to improve research trials between the US and Europe. I really enjoyed understanding both sides of the ocean, and playing the ambassador on both cultural and technical details. Later I joined a Big Swedish Furniture Company as a lead developer, and continued this practice of bridging cultural and technical issues. Eventually I volunteered organizing tech communities and app development workshops as a way to keep myself learning and mentor others. I met a lot of Google developers this way, and that’s how I got a chance to work here.

History to Engineering is an interesting path. How does it affect the way you work?

I think it helps me have empathy on a couple of levels. Working both in Sales Engineering and Cloud Platforms Solutions roles, I’ve been tasked with asking customers about problems and discovering ways to solve them. I’ve learned it comes down to figuring out the one thing customers need and how that is going to fix the problem they need to solve first, so they build momentum and move faster on the next problems. It’s a combination of engineering solutions, customer experiences, and hidden internal constraints such as culture and economics; it looks a lot like a social science problem.

If there is one thing the customer needs, what is your one thing?

I have to help lots of different people get motivated in the same direction. We at Google are seeing customer patterns across many companies and can best help by showing them how to apply repeatable, scalable systems. Yet every customer has complex and unique problems, and before they trust us we must prove we understand. I find combining this empathy with our experience is the best way to get them motivated and moving.

What has been unique about Google?

At Google, we’re motivated by a vision, and I see that it’s what makes me and my peers successful: we constantly build on our strengths, and challenge each other to make the most of those strengths, rather than spending too much effort filling our personal gaps. It’s a big change from my past work, and it makes for very special teams.

What’s the most effective way you motivate people to use Cloud?

By finding a way to uncover their passion and put it into action. I run meetups here for our partners to talk about our technology because techies love to connect and learn from each other. But it has to then go the next step: challenging them to go back to their shops and apply the new learning.

Humor works too.
I’ve been part of a few successful April Fool projects centered around life in Holland, like the self-driving bicycle (we ride a lot of bikes); the “Google Tulip,” for communicating with our national plant; and Google Wind for harnessing Holland’s windmills to blow away clouds. They’re funny, but more importantly we use them to build out storytelling and technical demos which show off the data pipelines, NLP, Kubernetes, etc., that real techies would use to build such a project. Since I’m not talking about a real project with these stories, it’s easier for people to quickly imagine their own problems that match the pattern. If someone in retail looks at a financial industry solution, they quickly turn off. But if we show how to talk to flowers, they imagine how that interactive voice application might work in their business.
Source: Google Cloud Platform

When two become one: Integrating Google Cloud Organizations after a merger or acquisition

Congratulations! Your company just acquired or merged with another organization, beginning an important new chapter in its history. But like with many business deals, the devil is in the details, particularly when it comes to integrating the two companies’ cloud domains and organizations. In this blog post, we look at how to approach mergers and acquisitions (M&A) from the perspective of Google Cloud. These are the best practices that your Google Cloud Technical Account Managers follow, or that we recommend you follow if you plan to perform the integration yourself.

Although there are various M&A scenarios, here are the two most common ones we will focus on:

Both entities engaged in the M&A have some presence on Google Cloud and are looking for some level of integration

Only one of the entities has a presence on Google Cloud, and is looking at the best ways to work together

Depending on your situation, your approach to integrating the two companies will vary substantially.

When both companies have a Google Cloud presence

In the first scenario, let’s assume company A is acquiring company B. Prior to the M&A, both companies have their own Google Cloud Organizations (the top-level structures in the Google Cloud resource hierarchy) and have one or more billing accounts associated with them. There are also various Folders and Projects below each Google Cloud Organization. In this scenario, here are the key questions to ask:

How do you plan to integrate/consolidate two distinct Google Cloud Organizations?

How do you plan to organize the billing structure?

How do you handle Projects under the two Organizations?

What is the identity management strategy for the two Organizations?

For each of these key questions, go ahead and formulate a detailed plan of action. If you have access to the Technical Account Manager service through Google Cloud Premium Support, you can reach out to them to further develop this plan.

Understanding Google Cloud Organizations

From an organizational integration standpoint, when each entity in an M&A has its own Google Cloud Organization, you have various options: no integration, partial, or full.

No integration: When company B operates as an independent entity from company A, no migration is required. One caveat is if company A has negotiated better pricing terms/discounts and support packages with Google Cloud. In that case, you can sign an affiliate addendum by working with your Google Cloud account team to help unlock the same benefits for company B.

Partial integration: Some projects move over to company A from company B and others stay with company B. There can be some shared access between the two companies, and each of the organizations can continue to use their existing identity providers. This can be a self-serve or a paid services engagement with Google Cloud, depending on the complexity of the two companies and how many project migrations need to take place between them.

Full integration: Company B is fully incorporated into company A. This means you go through a full billing, Google Workspace identity, and project migration from company B into company A. This can be a complex process, and we highly recommend engaging your Google Cloud account teams to scope out a paid services engagement to go through this transition.

Planning your project migration

No matter what you want your end state to look like, project migration requires careful planning.
Again, if you have an assigned Technical Account Manager, please reach out to them to ensure that you have a conversation around best practices before starting this migration. If you’re taking a self-service approach, at a high level, we recommend leveraging the Resource Manager API to manage your project migrations. Do keep in mind that there are several prerequisites and required permissions, documented here, that need to be in place before going down this path. In addition, please be sure to read the billing and identity management considerations below to ensure that you are covering all of the bases associated with such a migration, as your choices can fundamentally alter your Google Cloud footprint.

Billing considerations

When deciding how to structure your Organizations and billing accounts, our recommendation is to always limit the number of Organization nodes and use the Folder structure to manage departments/teams within them. Creating additional Organization nodes is only advised in cases where you require a level of isolation for certain Projects from central administration for a specific business reason, for example, if the company being acquired already has its own Organization node and there is a business justification to let it operate as a standalone entity.

Warning: If you have multiple Organization nodes, be aware that you will not have central visibility across all your organizational resources, and that policy management across different Organization nodes can be cumbersome. You will also have to manage multiple Workspace accounts and manage identities across them, which can be difficult, especially when operating at scale.

From a billing account management perspective, our recommendation is to create one central billing account that lives within the Organization node, with tags and labels incorporated for additional granularity. However, there are a few business cases which warrant the creation of additional billing accounts, such as:

You need to split charges for legal or accounting purposes

Invoices are paid in multiple currencies

You need to segregate usage to draw down on a Google Cloud promotional credit

Subsidiaries need their own invoice

Keep in mind that committed-use and spend-based discounts and promotional credits cannot be shared across billing accounts and are provisioned on a per-billing-account basis. As such, more billing accounts can make it harder to leverage these discounts and credits.

Identity management

As you might expect, merging two entities has identity management implications. Cloud Identity is the solution leveraged by Google Cloud to help you manage your user and group identities.
Even if the acquired company only uses the productivity products that are part of Google Workspace, the identities would still be managed by Cloud Identity.

Google Workspace considerations

To move large amounts of content into a Google Workspace domain, we recommend one of three options, depending on your end goal and data complexity:

For general migrations: Leverage Google Workspace Migrate to move data into your Workspace domain from either another Workspace domain or a third-party productivity solution

For manual migrations: Use the Export tool to move your organization’s data to a Cloud Storage archive so you can selectively download exported data by user and service

For complex Google Workspace scenarios: Speak with your Google Cloud Technical Account Manager about the possibility of using a custom scoped engagement to merge two Google Workspace environments without business interruption

When only one company is on Google Cloud

Now, let’s consider the scenario where only company A has a presence on Google Cloud but company B does not. The approach you take to integrate the two organizations largely depends on your desired end state: full, partial, or no integration. If the plan is to eventually integrate company B into company A, your approach here will have a lot of similarities with the ‘full integration’ option mentioned above, just at a later point in time.

You may also run into a scenario where company B has a presence on an alternative cloud platform and you need to migrate resources into or out of Google Cloud. Again, similar to the partial integration option called out above, a paid engagement or a self-service exercise would be a good fit, depending on the complexity of the desired end state.

Here to help

A merger or acquisition is an exciting milestone for any company, but one that needs to be managed carefully. Once you carefully review these considerations, develop a plan of action for your organization. You can also engage Google’s Professional Services for a paid engagement or Google’s Technical Account Management Service for a self-managed process to achieve the desired results. If you are going through, or considering going through, M&A at your organization and have a different scenario than what we have discussed, please feel free to reach out to your account teams for guidance or contact us at https://cloud.google.com/contact.
Source: Google Cloud Platform

How to Train and Deploy a Linear Regression Model Using PyTorch – Part 1

Python is one of today’s most popular programming languages and is used in many different applications. The 2021 StackOverflow Developer Survey showed that Python remains the third most popular programming language among developers. In GitHub’s 2021 State of the Octoverse report, Python took the silver medal behind JavaScript.
Thanks to its longstanding popularity, developers have built many popular Python frameworks and libraries like Flask, Django, and FastAPI for web development.
However, Python isn’t just for web development. It powers libraries and frameworks like NumPy (Numerical Python), Matplotlib, scikit-learn, PyTorch, and others which are pivotal in engineering and machine learning. Python is arguably the top language for AI, machine learning, and data science development. For deep learning (DL), leading frameworks like TensorFlow, PyTorch, and Keras are Python-friendly.
We’ll introduce PyTorch and how to use it for a simple problem like linear regression. We’ll also provide a simple way to containerize your application. Also, keep an eye out for Part 2 — where we’ll dive deeply into a real-world problem and deployment via containers. Let’s get started.
What is PyTorch?
A Brief History and Evolution of PyTorch
Torch debuted in 2002 as a deep-learning library developed in the Lua language. Building on that work, Soumith Chintala and Adam Paszke (both from Meta) developed PyTorch in 2016 and based it on the Torch library. Since then, developers have flocked to it. PyTorch ranked as the third most popular framework in the 2021 StackOverflow Developer Survey, and it was the most loved DL library among developers. PyTorch is also the DL framework of choice for Tesla, Uber, Microsoft, and over 7,300 others.
PyTorch enables tensor computation with GPU acceleration, plus deep neural networks built on a tape-based autograd system. We’ll briefly break these terms down, in case you’ve just started learning about these technologies.

A tensor, in a machine learning context, refers to an n-dimensional array.
A tape-based autograd means that PyTorch uses reverse-mode automatic differentiation, a mathematical technique for computing derivatives (or gradients) efficiently on a computer.
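If these terms are new, the short snippet below may help make them concrete. It is an illustrative sketch (separate from the article’s main example): a tensor with gradient tracking enabled, and reverse-mode autograd computing a gradient.

import torch

# A tiny taste of tensors and tape-based autograd (illustrative only).
x = torch.tensor([2.0, 3.0], requires_grad=True)  # 1-D tensor that records operations
y = (x ** 2).sum()                                # y = x1^2 + x2^2
y.backward()                                      # reverse-mode automatic differentiation
print(x.grad)                                     # dy/dx = 2*x -> tensor([4., 6.])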

Since diving into these mathematics might take too much time, check out these links for more information:

What is a Pytorch Tensor?
What is a tape-based autograd system?
Automatic differentiation

PyTorch is a vast library and contains plenty of features for various deep learning applications. To get started, let’s evaluate a use case like linear regression.
What is Linear Regression?
Linear Regression is one of the most commonly used mathematical modeling techniques. It models a linear relationship between two variables. This technique helps determine the correlation between two variables, or predict the value of the dependent variable based on a particular value of the independent variable.
In machine learning, linear regression often applies to prediction and forecasting applications. You can solve it analytically, typically without needing any DL framework. However, this is a good way to understand the PyTorch framework and kick off some analytical problem-solving.
Numerous books and web resources address the theory of linear regression. We’ll cover just enough theory to help you implement the model. We’ll also explain some key terms. If you want to explore further, check out the useful resources at the end of this section.
Linear Regression Model
You can represent a basic linear regression model with the following equation:
Y = mX + bias
What does each portion represent?

Y is the dependent variable, also called a target or a label.
X is the independent variable, also called a feature(s) or co-variate(s).
bias is also called offset.
m refers to the weight or “slope.”

These terms are often interchangeable. The dependent and independent variables can be scalars or tensors.
The goal of linear regression is to choose weights and biases so that any prediction for a new data point, based on the existing dataset, yields the lowest possible error. In simpler terms, linear regression is about finding the best possible curve (a line, in this case) to match your data distribution.
Loss Function
A loss function is an error function that expresses the error (or loss) between real and predicted values. A very popular way to measure loss is the mean squared error, which we’ll also use.
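As a quick, hypothetical illustration (separate from the article’s main script), the mean squared error is simply the average of the squared differences between predictions and targets:

import torch
from torch import nn

# Toy values chosen only to illustrate the computation.
y_pred = torch.tensor([2.5, 0.0, 2.0])
y_true = torch.tensor([3.0, -0.5, 2.0])
print(nn.MSELoss()(y_pred, y_true))     # PyTorch's built-in mean squared error
print(((y_pred - y_true) ** 2).mean())  # the same value computed by hand (~0.1667)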
Gradient Descent Algorithms
Gradient descent is a class of optimization algorithms that tries to solve the problem (either analytically or using deep learning models) by starting from an initial guess of weights and bias. It then iteratively reduces errors by updating weights and bias values with successively better guesses.
A simplified approach uses the derivative of the loss function to minimize that loss. The derivative is the slope of the mathematical curve, and we’re attempting to reach the bottom of it, hence the name gradient descent. The stochastic gradient method samples smaller batches of data to compute each update, which is computationally cheaper than passing the entire dataset through at every iteration.
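To make the update rule concrete, here is a minimal, hand-rolled sketch of gradient descent for a one-feature linear model. It is illustrative only and is not the approach used later in the article, which relies on PyTorch’s built-in torch.optim.SGD:

import torch

# Fit y = m*x + c on toy data with a known slope of 2 and offset of 1.
m = torch.zeros(1, requires_grad=True)
c = torch.zeros(1, requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

lr = 0.1  # learning rate
for _ in range(500):
    loss = ((m * x + c - y) ** 2).mean()  # mean squared error
    loss.backward()                       # compute d(loss)/dm and d(loss)/dc
    with torch.no_grad():
        m -= lr * m.grad                  # parameter <- parameter - lr * gradient
        c -= lr * c.grad
        m.grad.zero_()
        c.grad.zero_()

print(m.item(), c.item())  # converges toward 2.0 and 1.0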
To learn more about this theory, the following resources are helpful:

MIT lecture on Linear regression
Linear regression Wikipedia article
Dive into deep learning online resources on linear regression

Linear Regression with Pytorch
Now, let’s talk about implementing a linear regression model using PyTorch. The script shown in the steps below is main.py — which resides in the GitHub repository and is forked from the “Dive Into Deep learning” example repository. You can find code samples within the pytorch directory.
For our regression example, you’ll need the following:

Python 3
PyTorch module (pip install torch) installed on your system
NumPy module (pip install numpy) installed
Optionally, an editor (VS Code is used in our example)

Problem Statement
As mentioned previously, linear regression is analytically solvable. We’re using a DL framework to solve this problem because it helps you get started quickly, and because the known ground truth makes it easy to check that training behaves as expected: we can compare the trained parameters against the values used to generate the data.
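As a point of reference before we switch to PyTorch, here is what the closed-form (least-squares) solution looks like with NumPy. This is an illustrative aside using the same true weights and bias as the example below; it is not part of the main script:

import numpy as np

# Closed-form least squares via the normal equations, for comparison only.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))                  # two features
y = X @ np.array([2.0, -3.4]) + 4.2             # known weights (2, -3.4) and bias 4.2
X1 = np.hstack([X, np.ones((1000, 1))])         # append a column of ones for the bias term
theta, *_ = np.linalg.lstsq(X1, y, rcond=None)  # minimizes ||X1 @ theta - y||^2
print(theta)                                    # approximately [2.0, -3.4, 4.2]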
We’ll attempt the following using Python and PyTorch:

Creating synthetic data where we’re aware of weights and bias
Using the PyTorch framework and built-in functions for tensor operations, dataset loading, model definition, and training

We don’t need a validation set for this example since we already have the ground truth. We’d assess our results by measuring the error against the weights and bias values used while creating our synthetic data.
Step 1: Import Libraries and Namespaces
For our simple linear regression, we’ll import the torch library in Python. We’ll also add some specific namespaces from our torch import. This helps create cleaner code:

# Step 1: import libraries and namespaces
import torch
from torch.utils import data
# `nn` is an abbreviation for neural networks
from torch import nn

Step 2: Create a Dataset
For simplicity’s sake, this example creates a synthetic dataset that aims to form a linear relationship between two variables with some bias.
i.e. y = mx + bias + noise

#Step 2: Create Dataset
#Define a function to generate noisy data
def synthetic_data(m, c, num_examples):
    """Generate y = mX + bias(c) + noise"""
    X = torch.normal(0, 1, (num_examples, len(m)))
    y = torch.matmul(X, m) + c
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))

true_m = torch.tensor([2, -3.4])
true_c = 4.2
features, labels = synthetic_data(true_m, true_c, 1000)

Here, we use the built-in PyTorch function torch.normal to return a tensor of normally distributed random numbers. We also use the torch.matmul function to multiply tensor X by tensor m, and then add a small amount of normally distributed noise to y.
The dataset looks like this when visualized using a simple scatter plot:

The code to create the visualization can be found in this GitHub repository.
Step 3: Read the Dataset and Define Small Batches of Data

#Step 3: Read dataset and create small batch
#Define a function to create a data iterator. Input is the features and labels from synthetic data
#Output is iterable batched data using torch.utils.data.DataLoader
def load_array(data_arrays, batch_size, is_train=True):
    """Construct a PyTorch data iterator."""
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)

batch_size = 10
data_iter = load_array((features, labels), batch_size)

next(iter(data_iter))

Here, we use the PyTorch functions to read and sample the dataset. TensorDataset stores the samples and their corresponding labels, while DataLoader wraps an iterable around the TensorDataset for easier access.
The iter function creates a Python iterator, while next obtains the first item from that iterator.
Step 4: Define the Model
PyTorch offers pre-built models for different use cases. For our case, a single-layer, feed-forward network with two inputs and one output is sufficient. The PyTorch documentation provides details about the nn.Linear implementation.
The model also requires the initialization of weights and biases. In the code, we initialize the weights using a Gaussian (normal) distribution with a mean of 0 and a standard deviation of 0.01. The bias is simply initialized to zero.

#Step 4: Define model & initialization
# Create a single-layer feed-forward network with 2 inputs and 1 output.
net = nn.Linear(2, 1)

#Initialize model params
net.weight.data.normal_(0, 0.01)
net.bias.data.fill_(0)

Step 5: Define the Loss Function
The loss function is defined as the mean squared error. The loss function tells you how far the data points are from the regression line:

#Step 5: Define loss function
# mean squared error loss function
loss = nn.MSELoss()

Step 6: Define an Optimization Algorithm
For optimization, we’ll implement a stochastic gradient descent method.
The lr stands for learning rate and determines the update step during training.

#Step 6: Define optimization algorithm
# implements a stochastic gradient descent optimization method
trainer = torch.optim.SGD(net.parameters(), lr=0.03)

Step 7: Training
For training, we’ll iterate over the complete training data for n epochs (five in our case), using minibatches of features and their corresponding labels. For each minibatch, we’ll do the following:

Compute predictions and calculate the loss
Calculate gradients by running the backpropagation
Update the model parameters
Compute the loss after each epoch

# Step 7: Training
# Use complete training data for n epochs, iteratively using minibatch features and corresponding labels
# For each minibatch:
#   Compute predictions by calling net(X) and calculate the loss l
#   Calculate gradients by running the backpropagation
#   Update the model parameters using optimizer
#   Compute the loss after each epoch and print it to monitor progress
num_epochs = 5
for epoch in range(num_epochs):
    for X, y in data_iter:
        l = loss(net(X), y)
        trainer.zero_grad()  # sets gradients to zero
        l.backward()  # back propagation
        trainer.step()  # parameter update
    l = loss(net(features), labels)
    print(f'epoch {epoch + 1}, loss {l:f}')

Results
Finally, compute errors by comparing the true value with the trained model parameters. A low error value is desirable. You can compute the results with the following code snippet:

#Results
m = net.weight.data
print('error in estimating m:', true_m - m.reshape(true_m.shape))
c = net.bias.data
print('error in estimating c:', true_c - c)

When you run your code, the terminal window outputs the following:
python3 main.py
features: tensor([1.4539, 1.1952])
label: tensor([3.0446])
epoch 1, loss 0.000298
epoch 2, loss 0.000102
epoch 3, loss 0.000101
epoch 4, loss 0.000101
epoch 5, loss 0.000101
error in estimating m: tensor([0.0004, 0.0005])
error in estimating c: tensor([0.0002])
As you can see, the loss shrinks with each epoch, and the errors in the estimated parameters are small.
Containerizing the Script
In the previous example, we had to install multiple Python packages just to run a simple script. Containers, meanwhile, let us easily package all dependencies into an image and run an application.
We’ll show you how to quickly and easily Dockerize your script. Part 2 of the blog will discuss containerized deployment in greater detail.
Containerize the Script
Containers help you bundle together your code, dependencies, and libraries needed to run applications in an isolated environment. Let’s tackle a simple workflow for our linear regression script.
We’ll achieve this using Docker Desktop, writing a Dockerfile that specifies our image’s overall contents.
Make sure to pull a Python base image (version 3.10) for our example:
FROM python:3.10
Next, we’ll install the numpy and torch dependencies needed to run our code:
RUN apt update && apt install -y python3-pip
RUN pip3 install numpy torch
Afterwards, we’ll need to place our main.py script into a directory:
COPY main.py app/
Finally, the CMD instruction defines important executables. In our case, we’ll run our main.py script:
CMD ["python3", "app/main.py"]
Our complete Dockerfile is shown below, and exists within this GitHub repo:
FROM python:3.10
RUN apt update && apt install -y python3-pip
RUN pip3 install numpy torch
COPY main.py app/
CMD ["python3", "app/main.py"]
Build the Docker Image
Now that we have every instruction that Docker Desktop needs to build our image, we’ll follow these steps to create it:

In the GitHub repository, our sample script and Dockerfile are located in a directory called pytorch. From the repo’s home folder, we can enter cd deeplearning-docker/pytorch to access the correct directory.
Our Docker image is named linear_regression. To build your image, run the docker build -t linear_regression . command (note the trailing dot, which sets the build context to the current directory).

Run the Docker Image
Now that we have our image, we can run it as a container with the following command:
docker run linear_regression
This command will create a container and execute the main.py script. Once we run the container, it’ll re-print the loss and estimates. The container will automatically exit after executing these commands. You can view your container’s status via Docker Desktop’s Container interface:

Docker Desktop shows us that linear_regression executed the commands and exited successfully.
We can view our error estimates via the terminal or directly within Docker Desktop. I used a Docker Extension called Logs Explorer to view my container’s output (shown below):
Alternatively, you may also experiment using the Docker image that we created in this blog.

As we can see, the results from running the script on my system and inside the container are comparable.
To learn more about using containers with Python, visit these helpful links:

Patrick Loeber’s talk, “How to Containerize Your Python Application with Docker”
Docker documentation on building containers using Python

Want to learn more about PyTorch theories and examples?
We took a very tiny peek into the world of Python, PyTorch, and deep learning. However, many resources are available if you’re interested in learning more. Here are some great starting points:

PyTorch tutorials
Dive into Deep learning GitHub
Machine Learning Mastery Tutorials

Additionally, endless free and paid courses exist on websites like YouTube, Udemy, Coursera, and others.
Stay tuned for more!
In this blog, we’ve introduced PyTorch and linear regression, and we’ve used the PyTorch framework to solve a very simple linear regression problem. We’ve also shown a very simple way to containerize your PyTorch application.
But, we have much, much more to discuss on deployment. Stay tuned for our follow-up blog — where we’ll tackle the ins and outs of deep-learning deployments! You won’t want to miss this one.
Source: https://blog.docker.com/feed/