New in Google Cloud VMware Engine: autoscaling, Mumbai expansion, and more

We’ve made several updates to Google Cloud VMware Engine in recent weeks—today’s post provides a recap of our latest milestones. Google Cloud VMware Engine delivers an enterprise-grade VMware stack running natively in Google Cloud. This cloud service is one of the fastest paths to the cloud for VMware workloads, without making changes to existing applications or operating models, across a variety of use cases. These include rapid data center exit, application lift and shift, disaster recovery, virtual desktop infrastructure, or modernization at your own pace. In fact, Mitel, a global provider of unified communications-as-a-service to 70 million business users across 100 countries, migrated 1,000 VMware instances to Google Cloud VMware Engine in less than 90 days and improved its monthly operational output fourfold.

In our last update, we focused on several innovative capabilities around networking, reach, and scale. Let us take a look at the highlights we have released since our last installment.

Fast provisioning of a dedicated, intrinsically secure VMware private cloud

With Google Cloud VMware Engine, you can spin up a VMware private cloud in about 30 minutes. You can also scale your VMware-based infrastructure on demand with dedicated hosts located in secure Google data centers. Let us look at what’s new:

Autoscale: The ability to elastically and programmatically manage infrastructure resources to align with business needs, or what is called “right-sizing,” is a core capability of an IaaS platform. With autoscale, Google Cloud VMware Engine users can leverage policy-driven automation to scale the nodes needed to meet the compute demands of the VMware infrastructure. Autoscale:

- Addresses seasonal spikes in demand, gradual increases in utilization, or new projects being onboarded or expanded due to disaster recovery events.
- Analyzes CPU, memory, and storage utilization to give you the controls to scale Google Cloud VMware Engine nodes up or down.
- Ensures that storage consumption does not exceed the recommended limits for maintaining the Google Cloud VMware Engine service-level agreement.
- Reduces overhead on IT teams by automating capacity monitoring and enabling sufficient availability of resources based on thresholds.

Note that safeguards for maintaining minimum capacity and maximum capacity can be configured to ensure there are boundaries to the automation. Learn how to set up Autoscale.

Mumbai region availability

Google Cloud VMware Engine is now available in the Mumbai region. This brings the availability of the service to 12 regions globally, enabling our multi-national and regional customers to leverage a VMware-compatible infrastructure-as-a-service platform on Google Cloud. For more details, please read the press release.

Enterprise-grade infrastructure

With 99.99% availability for a cluster in a single zone, fully dedicated 100 Gbps east-west networking with no oversubscription, and all nonvolatile memory express (NVMe) storage, Google Cloud VMware Engine provides the highest performance required for the most demanding workloads. Let us look at what’s new:

Preview – Google Cloud KMS integration: You already have the ability to bring your own keys to encrypt your vSAN datastores. With this new capability, organizations that want to eliminate the overhead of managing external key providers can leverage a Google-managed key provider, using Cloud KMS. This brings increased flexibility in securing workloads and data by enabling vSAN encryption by default for newly instantiated VMware Private Clouds.
This feature is currently in Preview.

HIPAA compliance: Since April, Google Cloud VMware Engine has been Health Insurance Portability and Accountability Act (HIPAA) compliant. This opens the service up to healthcare organizations, which can now migrate and run their HIPAA-compliant VMware workloads in a fully compatible, VMware Cloud Verified stack running natively in Google Cloud with Google Cloud VMware Engine, without changes or re-architecture to tools, processes, or applications. Read more in this blog.

NSX-T support for Active Directory: With NSX-T support for Active Directory, you can now leverage your on-premises Active Directory as one of the Lightweight Directory Access Protocol (LDAP) identity sources for user authentication into NSX-T Manager. This extends the theme of being able to leverage your on-premises tools with Google Cloud VMware Engine. For more information, read the documentation on how to set up identity sources.

vSAN TRIM/UNMAP support: For space efficiency, vSAN allows creating thin-provisioned disks that grow gradually as they are filled with data. However, files that are deleted within the guest operating system (OS) do not result in vSAN freeing up the space allocated to them. To increase space efficiency, guest OS file systems have the ability to reclaim capacity that is no longer used, using TRIM/UNMAP commands. vSAN is fully aware of these commands sent from the guest OS and reclaims previously allocated storage as free space. We have enabled TRIM/UNMAP for vSAN by default in Google Cloud VMware Engine.

Simplicity in experience and operations

With Google Cloud VMware Engine, you only need to worry about your workloads—not patching, upgrading, and updating the solution layer—for fewer interoperability issues and less infrastructure maintenance. In addition, we have pre-built service accounts to enable your third-party VMware-supported tools and solutions to work seamlessly in VMware Engine. Access to Google services privately over local connections is also natively supported, enabling enrichment of existing applications and modernization over time. Finally, this service brings the power of Google Cloud Virtual Private Cloud (VPC) design by natively providing multi-VPC, multi-region networking that’s unique. Let’s look at what’s new:

Dashboards for Day 2 operations: To speed up cloud transformation and enable efficiency, Google Cloud VMware Engine administrators can take advantage of Cloud Operations dashboards for the solution. In addition, administrators can create custom policies through Cloud Alerting and enable notifications via channels of their choice (SMS, email, Slack, and more). For more details, please refer to Setting up Cloud Monitoring.

For the latest updates, bookmark the Google Cloud VMware Engine release notes.

Thanks to Manish Lohani, Product Management, Google Cloud; Nargis Sakhibova, Product Management, Google Cloud; and Wade Holmes, Solutions Management, Google Cloud; for their contributions to this blog post.
Quelle: Google Cloud Platform

Google demonstrates leading performance in latest MLPerf Benchmarks

The latest round of MLPerf benchmark results has been released, and Google’s TPU v4 supercomputers demonstrated record-breaking performance at scale. This is a timely milestone, since large-scale machine learning training has enabled many of the recent breakthroughs in AI, with the latest models encompassing billions or even trillions of parameters (T5, Meena, GShard, Switch Transformer, and GPT-3). Google’s TPU v4 Pod was designed, in part, to meet these expansive training needs, and TPU v4 Pods set performance records in four of the six MLPerf benchmarks Google entered using TensorFlow and JAX. These scores are a significant improvement over our winning submission from last year and demonstrate that Google once again has the world’s fastest machine learning supercomputers. These TPU v4 Pods are already widely deployed throughout Google data centers for our internal machine learning workloads and will be available via Google Cloud later this year.

Figure 1: Speedup of Google’s best MLPerf Training v1.0 TPU v4 submission over the fastest non-Google submission in any availability category – in this case, all baseline submissions came from NVIDIA. Comparisons are normalized by overall training time regardless of system size. Taller bars are better.1

Let’s take a closer look at some of the innovations that delivered these ground-breaking results and what this means for large model training at Google and beyond.

Google’s continued performance leadership

Google’s submissions for the most recent MLPerf demonstrated leading top-line performance (fastest time to reach target quality), setting new performance records in four benchmarks. We achieved this by scaling up to 3,456 of our next-gen TPU v4 ASICs with hundreds of CPU hosts for the multiple benchmarks. We achieved an average of 1.7x improvement in our top-line submissions compared to last year’s results. This means we can now train some of the most common machine learning models in a matter of seconds.

Figure 2: Speedup of Google’s MLPerf Training v1.0 TPU v4 submission over Google’s MLPerf Training v0.7 TPU v3 submission (exception: DLRM results in MLPerf v0.7 were obtained using TPU v4). Comparisons are normalized by overall training time regardless of system size. Taller bars are better. Unet3D not shown since it is a new benchmark for MLPerf v1.0.2

We achieved these performance improvements through continued investment in both our hardware and software stacks. Part of the speedup comes from using Google’s fourth-generation TPU ASIC, which offers a significant boost in raw processing power over the previous generation, TPU v3. 4,096 of these TPU v4 chips are networked together to create a TPU v4 Pod, with each pod delivering 1.1 exaflop/s of peak performance.

Figure 3: A visual representation of 1 exaflop/s of computing power. If 10 million laptops were running simultaneously, all of that computing power would almost match 1 exaflop/s.

In parallel, we introduced a number of new features into the XLA compiler to improve the performance of any ML model running on TPU v4. One of these features provides the ability to operate two (or potentially more) TPU cores as a single logical device using a shared uniform memory access system. This memory space unification allows the cores to easily share input and output data, allowing for a more performant allocation of work across cores. A second feature improves performance through a fine-grained overlap of compute and communication.
Finally, we introduced a technique to automatically transform convolution operations such that space dimensions are converted into additional batch dimensions. This technique improves performance at the low batch sizes that are common at very large scales.

Enabling large model research using carbon-free energy

Though the margin of difference in top-line MLPerf benchmarks can be measured in mere seconds, this can translate to many days’ worth of training time on the state-of-the-art models that comprise billions or trillions of parameters. To give an example, today we can train a 4-trillion-parameter dense Transformer with GSPMD on 2048 TPU cores. For context, this is over 20 times larger than the GPT-3 model published by OpenAI last year. We are already using TPU v4 Pods extensively within Google to develop research breakthroughs such as MUM and LaMDA, and to improve our core products such as Search, Assistant, and Translate. The faster training times from TPUs result in efficiency savings and improved research and development velocity.

Many of these TPU v4 Pods will be operating at or near 90% carbon-free energy. Furthermore, cloud data centers can be ~1.4-2X more energy efficient than typical data centers, and the ML-oriented accelerators – like TPUs – running inside them can be ~2-5X more effective than off-the-shelf systems.

We are also excited to soon offer TPU v4 Pods on Google Cloud, making the world’s fastest machine learning training supercomputers available to customers around the world. Cloud TPUs support leading frameworks such as TensorFlow, PyTorch, and JAX, and we recently released an all-new Cloud TPU system architecture that provides direct access to TPU host machines, greatly improving the user experience.

Want to learn more? Please contact your Google Cloud sales representative to request early access to Cloud TPU v4 Pods. We are excited to see how you will expand the machine learning frontier with access to exaflops of TPU computing power!

1. All results retrieved from www.mlperf.org on June 30, 2021. MLPerf name and logo are trademarks. See www.mlperf.org for more information. Chart uses results 1.0-1067, 1.0-1070, 1.0-1071, 1.0-1072, 1.0-1073, 1.0-1074, 1.0-1075, 1.0-1076, 1.0-1077, 1.0-1088, 1.0-1089, 1.0-1090, 1.0-1091, 1.0-1092.

2. All results retrieved from www.mlperf.org on June 30, 2021. MLPerf name and logo are trademarks. See www.mlperf.org for more information. Chart uses results 0.7-65, 0.7-66, 0.7-67, 1.0-1088, 1.0-1090, 1.0-1091, 1.0-1092.
Quelle: Google Cloud Platform

Build a platform with KRM: Part 4 – Administering a multi-cluster environment

This is part 4 in a multi-part series about the Kubernetes Resource Model. See parts 1, 2, and 3 to learn more.

Kubernetes clusters can scale. Open-source Kubernetes supports up to 5,000 Nodes, and GKE supports up to 15,000 Nodes. But scaling out a single cluster can only get you so far: if your cluster’s control plane goes down, your entire platform goes down; if the Cloud region running your cluster has a service interruption, so does your app.

Many organizations choose, instead, to operate multiple Kubernetes clusters. Besides availability, there are lots of reasons to consider multi-cluster, such as allocating a cluster to each development team, splitting workloads between cloud and on-prem, or providing burst capability for traffic spikes. But operating a multi-cluster platform comes with its own challenges. How to consistently administer many clusters at once? How to keep the clusters secure? How to deploy and monitor applications running across multiple clusters? How to seamlessly fail over from one region to another?

This post introduces a few tools that can help platform teams more easily administer a multi-cluster Kubernetes environment.

The platform base layer, with Config Sync

In the last post, we explored how thoughtful platform abstractions can help reduce toil for app developers, including in a multi-cluster environment where automation such as CI/CD handles all interactions with the staging and production clusters. But equally important is the platform base layer: the Kubernetes resources and configuration that are shared across services. Your platform base layer might consist of Namespaces, role-based access control, and shared workloads like Prometheus. Platform abstractions depend on the existence of these base-layer resources. And so does the security and stability of your platform as a whole. It’s important that these resources not only get deployed, but also stay put.

CI/CD is great for deploying resources, but what about making sure resources stay deployed? What if a Kubernetes Namespace gets deleted? Or a Prometheus StatefulSet is modified? Kubernetes’ job is to ensure that the cluster’s actual state matches the desired state. But sometimes, the “desired” state isn’t desired at all – it’s a developer who mistakenly modified a resource, or a bad actor that’s gained access to the system. For this reason, a platform base layer needs more than a one-and-done CI/CD pipeline.

A tool called Config Sync can help with this. Config Sync is a Google Cloud product that can sync Kubernetes resources from a Git repository to one or more GKE or Anthos clusters. Unlike CI/CD tools like Cloud Build, Config Sync watches your clusters constantly, making sure that the intended resource state in the cluster always matches what’s in Git. Config Sync is designed primarily for base-layer resources like Namespaces and RBAC. In this way, Config Sync is complementary to, not a replacement for, CI/CD.

Source: Config Sync documentation

Config Sync runs in a Pod inside your Kubernetes cluster, watching your Git config repo for changes, and also watching the cluster itself for any divergence from your desired state in Git. If any configuration drift is detected from what’s stored in Git, Config Sync will update the API Server accordingly. You can point multiple Config Sync deployments at the same Git repo, allowing you to manage the base-layer platform resources for multiple clusters using the same source of truth.
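For illustration, pointing a cluster at a shared policy repository boils down to applying a small piece of KRM to each cluster. The sketch below uses a ConfigManagement-style configuration; the repository URL, branch, and directory are placeholders, and the exact fields may differ depending on the Config Sync version you install.

```yaml
# Hypothetical sketch: sync this cluster from a shared Git policy repo.
# The repo URL, branch, and policyDir are placeholders for your own values.
apiVersion: configmanagement.gke.io/v1
kind: ConfigManagement
metadata:
  name: config-management
spec:
  # Use the simpler "unstructured" layout for the policy repo
  sourceFormat: unstructured
  git:
    syncRepo: https://github.com/your-org/cymbalbank-policy   # placeholder
    syncBranch: main
    policyDir: .
    secretType: none   # assumes a public repo; use ssh or a token for private repos
```

Applying the same configuration, with the same repo, to the admin, dev, staging, and prod clusters is what keeps all four synced to a single source of truth.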
And by using Git as the landing zone for config, you can benefit from some of the GitOps principles we discussed in part 2, including the ability to audit and roll back configuration changes.

Let’s walk through an example of how to manage base-layer resources with Config Sync. The Cymbal Bank platform consists of four GKE clusters: admin, dev, staging, and prod. We can install Config Sync on all four clusters using the gcloud tool or the Google Cloud Console, pointing all four clusters at a single Git repository, called cymbalbank-policy. Note that this repo is separate from the application source and config repos, and is managed by the platform team. From the Console, we can see that all four clusters are synced to the same commit of the cymbalbank-policy repo.

Now, let’s say that the Cymbal Bank platform team wants to limit the amount of CPU and memory resources each application team can request for their service. Kubernetes ResourceQuotas help impose these limits, and prevent unexpected Pod evictions. The platform team can define a set of ResourceQuotas for each application namespace. They can also scope the resources to only be applied to a subset of clusters – for instance, to the production cluster only. (If no cluster name selector is specified, Config Sync will deploy the resource to all clusters by default.) A sketch of one such ResourceQuota is shown below.
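The following is a minimal, hypothetical example of what such a scoped resource could look like. The namespace, quota values, and cluster name are illustrative, and the cluster-selection annotation shown is an assumption; check the Config Sync documentation for the exact annotation supported by your version.

```yaml
# Hypothetical example: cap CPU and memory requests for one application namespace,
# and scope the quota to the production cluster only via a Config Sync annotation.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: frontend-quota              # illustrative name
  namespace: frontend               # one quota per application namespace
  annotations:
    configsync.gke.io/cluster-name-selector: cymbal-prod   # assumed annotation and cluster name
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 4Gi
    limits.cpu: "4"
    limits.memory: 8Gi
```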
From here, the platform team can commit the resources to the cymbalbank-policy repo, and Config Sync, always watching the policy repo, will deploy the resources to the production cluster. If a developer tries to delete one of the ResourceQuotas, Config Sync will block the request, helping to ensure that these base-layer resources stay put. In this way, Config Sync can help platform teams ensure the stability of the platform base layer, as well as ensure resource consistency across multiple clusters at once. This, in turn, can help organizations mitigate the complexity of adding new clusters to their environment.

Enforce policies on Kubernetes resources

Config Sync is a powerful tool on its own, and can work with any Kubernetes resource that your cluster recognizes. This includes Custom Resource Definitions (CRDs) installed with add-ons like Anthos Service Mesh. But Config Sync, by default, doesn’t have an idea of “good or bad” Kubernetes resources. It will deploy whatever resources land in Git, even resources that might pose a security risk to your organization.

Security is an essential feature of any developer platform, and when it comes to Kubernetes, it’s important to think about security from the initial software design stages, and set up your clusters with security best practices in mind. But it’s just as important to think about security at deploy time. Who and what can access your clusters? What kinds of Kubernetes resources – and fields within those resources – are allowed? These decisions will depend on lots of factors, including the kinds of data your application deals with, and any industry-specific regulations.

One common security use case for KRM is the need to monitor incoming Kubernetes resources, whether they’re coming in through kubectl, CI/CD, or Config Sync. But if you have multiple clusters, your Kubernetes environment has multiple API Servers, and therefore multiple entry points. A Google tool called Policy Controller can help automate resource monitoring across multiple clusters. Policy Controller is a Kubernetes admission controller that can accept or reject incoming resources based on custom policies you define.

Policy Controller is based on the Open Policy Agent Gatekeeper project, and it allows you to define policies, or “Constraints,” as KRM. This means you can deploy them using Config Sync, via Git. Once deployed, Policy Controller uses your Constraints as a set of rules to evaluate all incoming KRM, rejecting resources that fall out of compliance.

Let’s walk through an example. Say that the Cymbal Bank security team wants to ensure that no code in development is accessible to the public. Kubernetes Services of type LoadBalancer expose public IP addresses by default, so the platform team wants to define a Policy Controller Constraint that blocks Services of that type on the development GKE cluster. To do this, the platform team can define a Policy Controller Constraint as KRM. This Constraint uses a Constraint Template, provided through the pre-installed Constraint Template library. The ConstraintTemplate defines the logic of the policy itself, and the Constraint makes the template concrete, populating any variables needed to execute the policy logic. Here, we’re also adding a Config Sync cluster name annotation, to scope this resource to apply only to the development cluster.

The platform team can then commit the resource to the cymbalbank-policy repo, and Config Sync will deploy the resource to the development cluster. From here, if an app developer tries to create an externally-accessible Kubernetes Service, Policy Controller will block the resource from being created. The platform team can define as many of these Constraints as they want, each defining a separate policy.

Writing custom policies

The Policy Controller Constraint Template library provides a lot of functionality, from blocking privileged containers, to requiring certain resource labels, to preventing app teams from deploying into certain namespaces. But if you want to enforce custom logic on your organization’s KRM, you can do so by writing a custom Constraint Template. Constraint Templates are written in a query language called Rego. Rego was designed for policy rule evaluation, and it can introspect Kubernetes resource fields to make a conclusion as to whether the resource is allowed or not.

For instance, let’s say that the platform team wants to limit the number of containers allowed inside a single application Pod. Too many containers per Pod can increase outage risk: when one container crashes, the entire Pod crashes. To enforce this policy, the platform team can define a Constraint Template, using the Rego language, that looks inside a resource to ensure that the number of containers per Pod is within the allowed limit. One possible sketch is shown below.
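The following is a simplified sketch, not the demo’s exact code, of what such a custom ConstraintTemplate and its accompanying Constraint could look like; the template name, parameter name, matched kinds, and limit value are assumptions made for illustration.

```yaml
# Hypothetical ConstraintTemplate: reject workloads whose Pod template exceeds a container-count limit.
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8slimitcontainersperpod        # must be the lowercase form of the kind below
spec:
  crd:
    spec:
      names:
        kind: K8sLimitContainersPerPod
      validation:
        openAPIV3Schema:
          properties:
            maxContainers:
              type: integer
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8slimitcontainersperpod

        violation[{"msg": msg}] {
          # Applies to resources that embed a Pod template (Deployments, StatefulSets, ...)
          containers := input.review.object.spec.template.spec.containers
          count(containers) > input.parameters.maxContainers
          msg := sprintf("too many containers per Pod: %v (limit %v)",
            [count(containers), input.parameters.maxContainers])
        }
---
# Hypothetical Constraint that makes the template concrete with a limit of 2 containers per Pod.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sLimitContainersPerPod
metadata:
  name: limit-containers-per-pod
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment", "StatefulSet"]
  parameters:
    maxContainers: 2
```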
Finally, the platform team can push these resources to the cymbalbank-policy repo, and Config Sync will deploy the policy to all four clusters. If a developer tries to define a Kubernetes Deployment containing more containers per Pod than what’s allowed, the resource will be blocked at deploy time. Custom Constraint Templates can give platform teams lots of flexibility in the types of policies they define and enforce in a Kubernetes environment.

Integrating policy checks into CI/CD

As we explored earlier, Config Sync and CI/CD are complementary tools. Config Sync works great for base-layer platform resources and policies, whereas CI/CD works well for application tests and deployment. But one pitfall of having two separate KRM deployment mechanisms is that app developers may not know that their resources are out of policy until they try to deploy them into production. This is especially true if some policies are scoped only to production, as we saw with the ResourceQuota example. Ideally, the platform team has a way to empower developers and code reviewers to know ahead of time whether new or modified resources are still in compliance. We can enable this use case by integrating policy checks into the existing Cymbal Bank CI/CD.

Policy Controller operates, by default, as a Kubernetes admission controller running inside the cluster. But Policy Controller also provides a “standalone” mode, running inside a container, that can be used outside of a cluster, such as from inside a Cloud Build pipeline. In the Part 4 demo, Cloud Build executes Policy Controller checks by getting the cymbalbank-app-config manifests, cloning the cymbalbank-policy resources, and using the “policy-controller-validate” container image to evaluate the app manifests against the policies. From here, an app developer or operator can know if their resources violate org-wide policies by looking at the Cloud Build output for their Pull Request.

By integrating policy checks into CI/CD, app development teams can understand whether their resources are in compliance, and platform teams add an additional layer of policy checks to the platform. Overall, Config Sync and Policy Controller can provide a powerful toolchain for standardizing base-layer config across a multi-cluster environment.

Check out the Part 4 demo to try out each of these examples. And stay tuned for Part 5, where we’ll learn how to use KRM to manage cloud-hosted resources.
Quelle: Google Cloud Platform

Improved Volume Management, Docker Dev Environments and more in Desktop 3.5

Docker Desktop 3.5 is here and we can’t wait for you to try it!

We’ve introduced some exciting new features including improvements to the Volume Management interface, a tech preview of Docker Dev Environments, and enhancements to Compose V2.

Easily Manage Files in your Volumes

Volumes can quickly take up local disk storage, and without an easy way to see which ones are being used or what they contain, it can be hard to free up space. This is why, in the release of Docker Desktop 3.5, we’ve made it even easier for Pro and Team users to explore the directories and files inside of a volume. We’ve added the modified date, kind, and size of files so that you can quickly identify what is taking up all that space and decide if you can part with it.

Once you’ve identified a file or directory inside a volume you no longer need, you can remove them straight from the Dashboard to free up space. We’ve also introduced a way to download files locally using “Save As” so that you can easily back up files before removing them.

We’re continuing to add more to volume management like the ability to share your volumes with your colleagues. Have ideas on how we might make managing volumes easier? We’d love you to help us prioritize by adding your use cases on our public roadmap. 

Docker Dev Environments

In 3.5 we released a technical preview of Docker Dev Environments. Check out our blog to learn more about why we built this and how it works.

Docker Compose V2 Beta Rollout Continues

We’re continuing to roll out the beta of Docker Compose V2, which allows you to seamlessly run the compose command in the Docker CLI. We are working towards launching Compose v2 as a drop-in replacement for docker-compose, so that no changes are required in your code to use this new functionality. We have also introduced the following new features:

- Added support for container links and external links to facilitate communication between containers.
- Introduced the docker compose logs --since and --until options, enabling you to search logs by date.
- `docker compose config --profiles` now lists all defined profiles, so you can see which additional services are defined in a single docker-compose.yml file. Profiles allow you to adjust the Compose application model for various usages and environments by selectively enabling services (see the sketch below).
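As a quick illustration of profiles, here is a minimal, hypothetical docker-compose.yml in which a debugging service only starts when its profile is enabled; the service names and images are placeholders.

```yaml
# Hypothetical docker-compose.yml using profiles
services:
  web:
    image: my-web-app:latest      # always started
    ports:
      - "8080:8080"
  db-admin:
    image: adminer:latest         # only started when the "debug" profile is enabled
    profiles:
      - debug
```

With this file, `docker compose up` starts only the web service, `docker compose --profile debug up` also starts db-admin, and `docker compose config --profiles` lists the debug profile defined in the file.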

You can test this new functionality by running the docker compose command (dropping the hyphen in docker-compose). We are continuing to roll this out gradually; 31% of compose users are already using this beta version. You’ll be notified if you are using the new docker compose. You can opt in to run Compose V2 with docker-compose by running the docker-compose enable-v2 command or by updating your Docker Desktop’s Experimental Features settings.

If you run into any issues using Compose V2, simply run the docker-compose disable-v2 command, or turn it off using Docker Desktop’s Experimental Features settings. Let us know your feedback on the new ‘compose’ command by creating an issue in the Compose-CLI GitHub repository.

Warning for Images incompatible with Apple Silicon Machines

Docker Dashboard will now warn you if an image you are using does not match your architecture on Apple Silicon. If you are using Docker Desktop on Apple Silicon and run an amd64 image under qemu emulation, it may perform poorly or potentially crash. While we are promoting the usage of multi-architecture images, we want to make sure you are aware when an image you are using is running under emulation because it does not match your machine’s native architecture. If this is the case, a warning will appear on the Containers / Apps page.

Less Disruptive Requests for Feedback

And finally, we’ve heard your feedback on how we ask you for your feedback. We’ve changed the way that the feedback form works so that it won’t pop up while you’re in the middle of working. When it’s time, the feedback form will only show up if you click on the whale menu. We do appreciate the time you spend to rate Docker Desktop. Your input helps us make changes like this! 

See the full release notes for Docker Desktop for Mac and Docker Desktop for Windows for the complete set of changes in Docker Desktop 3.5. 

We can’t wait for you to try Volume Management and the preview of Dev Environments! To get started simply download or update to Docker Desktop 3.5. To start collaborating with your teammates on your dev environments and digging into the contents of your volumes, upgrade to a Pro or Team subscription today!
Quelle: https://blog.docker.com/feed/