Pi in the sky: Calculating a record-breaking 31.4 trillion digits of Archimedes’ constant on Google Cloud

Ever since the ancient Babylonians, people have been calculating the digits of π, the ratio of a circle's circumference to its diameter, which starts as 3.1415… and goes on forever. In honor of Pi Day, today March 14 (represented as 3/14 in many parts of the world), we're excited to announce that we successfully computed π to 31.4 trillion decimal places (31,415,926,535,897 to be exact, or π × 10^13). This broke a GUINNESS WORLD RECORDS™ title, and it was the first time the record was broken using the cloud, proving that Google Cloud's infrastructure works reliably for long, compute-heavy tasks.

We achieved this feat using y-cruncher, a π benchmark program developed by Alexander J. Yee, on a Google Compute Engine virtual machine cluster. 31.4 trillion digits is almost 9 trillion digits more than the previous world record, set in November 2016 by Peter Trueb. Yee independently verified the calculation using Bellard's formula and the BBP formula. Here are the last 97 digits of the result:

6394399712 5311093276 9814355656 1840037499 3573460992
1433955296 8972122477 1577728930 8427323262 4739940

You can read more details about this record from y-cruncher's perspective in Yee's report.

A constant race

Granted, most scientific applications don't need π beyond a few hundred digits, but that isn't stopping anyone; starting in 2009, engineers have used customized personal computers to calculate trillions of digits of π. In fact, the race to calculate more π digits has only accelerated as of late, with computer scientists using it as a way to test supercomputers, and mathematicians competing against one another. However, the complexity of the Chudnovsky formula, a common algorithm for computing π, is O(n (log n)³). In layman's terms, this means that the time and resources necessary to calculate digits increase more rapidly than the digits themselves.
Furthermore, it gets harder to survive a potential hardware outage or failure as the computation goes on.

For our π calculation, we decided to go to the cloud. Using Compute Engine, Google Cloud's high-performance infrastructure-as-a-service offering, has a number of benefits over using dedicated physical machines. First, Compute Engine's live migration feature lets your application continue running while Google takes care of the heavy lifting needed to keep the underlying infrastructure up to date. We ran 25 nodes for 111.8 days, or 2,795 machine-days (7.6 machine-years), during which time Google Cloud performed thousands of live migrations uninterrupted and with no impact on the calculation process.

Running in the cloud also let us publish the computed digits entirely as disk snapshots. In less than an hour, and for as little as $40/day, you can copy the snapshots, work on the results, and dispose of the computation resources. Before the cloud, the only feasible way to distribute such a large dataset was to ship physical hard drives.

Then there are the general benefits of running in the cloud: availability of a broad selection of hardware, including the latest Intel Skylake processors with AVX-512 support. You can scale your instances up and down on demand, kill them off when you are done, and pay only for what you used.

Here are additional details about the program.

An overview of our π cluster architecture

Cluster design

We selected an n1-megamem-96 instance for the main computing node. It was the biggest virtual machine type available on Compute Engine that provided Intel Skylake processors at the beginning of the project. The Skylake generation of Intel processors supports AVX-512: 512-bit SIMD extensions that can perform floating-point operations on 512-bit data, or eight double-precision floating-point numbers, at once.

Currently, each Compute Engine virtual machine can mount up to 64 TB of Persistent Disks.
We used the iSCSI protocol to remotely attach additional Persistent Disk capacity. The number of nodes was decided based on y-cruncher's disk benchmark performance. We selected n1-standard-16 for the iSCSI target machines to ensure sufficient bandwidth between the computing node and the storage, as network egress bandwidth and Persistent Disk throughput are determined by the number of vCPU cores.

How to get your hands on the digits

Our pi.delivery service provides a REST API to access the digits on the web. It also has a couple of fun experiments that let you visualize and listen to π.

To make it easier for you to use these digits in your own work, we have made the resulting π digits available as snapshots on Google Cloud Platform. Each snapshot contains a single text file with the decimal digits, and you can create a new Persistent Disk based on these images. We provide both XFS and NTFS disk formats to accommodate Linux and Windows operating systems, respectively. The snapshots are located in the us multi-region.

You need to join the pi-31415926535897 Google Group to gain access. It will cost approximately $40 per day to keep the cloned disk in one of the us-central1, us-west1, or us-east1 regions in your project. We will keep the snapshots until March 14, 2020.
The snapshots are available at the following locations.

XFS: https://www.googleapis.com/compute/v1/projects/pi-31415926535897/global/snapshots/decimal-digits-xfs
NTFS: https://www.googleapis.com/compute/v1/projects/pi-31415926535897/global/snapshots/decimal-digits-ntfs

To create a new disk named pi314-decimal-digits-xfs in your project based on the XFS snapshot, for example, type the following command:

gcloud compute disks create pi314-decimal-digits-xfs --source-snapshot https://www.googleapis.com/compute/v1/projects/pi-31415926535897/global/snapshots/decimal-digits-xfs

Remember to delete the disk once you're done with it to avoid unexpected charges:

gcloud compute disks delete pi314-decimal-digits-xfs

Please refer to the restoring a non-boot disk snapshot section of the documentation and the gcloud compute disks create command help for more instructions on how to use these images.

Coming full circle

The world of math and sciences is full of records just waiting to be broken. We had a great time calculating 31.4 trillion π digits, and look forward to sinking our teeth into other great challenges. Until then, let's celebrate the day with fun experiments. Our Pi Day Celebration cloud experiment on our Showcase experiments website lets you generate a custom art piece from digits of π that you pick. And if you're going to Google Cloud Next '19 in San Francisco, come to our deep-dive technical session with Alexander Yee to discuss details and insights from this experiment, interact with the Showcase experiment, and watch a live experiment with the π digits inside the DevZone.
Source: Google Cloud Platform

Monitoring on HDInsight Part 1: An Overview

Azure HDInsight offers several ways to monitor your Hadoop, Spark, or Kafka clusters. Monitoring on HDInsight can be broken down into three main categories:

Cluster health and availability
Resource utilization and performance
Job status and logs

Azure HDInsight offers two main monitoring tools: Apache Ambari, which is included with all HDInsight clusters, and optional integration with Azure Monitor logs, which can be enabled on all HDInsight clusters. While these tools contain some of the same information, each has advantages in certain scenarios. Read on for an overview of the best way to monitor various aspects of your HDInsight clusters using these tools.

Cluster health and availability

Azure HDInsight is a high-availability service that has redundant gateway nodes, head nodes, and ZooKeeper nodes to keep your HDInsight clusters running smoothly. While this ensures that a single failure will not affect the functionality of a cluster, you may still want to monitor cluster health so you are alerted when an issue does arise. Monitoring cluster health refers to monitoring whether all nodes in your cluster, and the components that run on them, are available and functioning correctly. Ambari is the recommended way to monitor the health of any given HDInsight cluster. You can learn more about monitoring cluster availability using Ambari in our documentation, “Availability and reliability of Apache Hadoop clusters in HDInsight.”

Ambari portal view showing the status of all components on a head node

Cluster resource utilization and performance

To maintain optimal performance on your cluster, it is essential to monitor resource utilization. This can be accomplished using Ambari and Azure Monitor logs.

With Ambari

Ambari is the recommended way to monitor utilization across the whole cluster. The Ambari dashboard shows at-a-glance widgets that display metrics such as CPU, network, YARN memory, and HDFS disk usage. The “Hosts” tab shows metrics for individual nodes so you can ensure the load on your cluster is evenly distributed. The “YARN Queue Manager” is also accessible through Ambari, allowing you to manage the capacity of each of your job queues, see how jobs are distributed between them, and check whether any jobs are resource constrained. Read more about using Ambari to monitor cluster performance in our documentation, “Monitor cluster performance.”

The Ambari Portal dashboard that shows the utilization of your entire cluster at a glance

With Azure Monitor logs

You can monitor resource utilization at the virtual machine (VM) level using Azure Monitor logs. All VMs in an HDInsight cluster push performance counters, including CPU, memory, and disk usage, into the Perf table in your Log Analytics workspace. As with any other Log Analytics table, you can query the Perf table, create visualizations with view designer, and configure alerts. One of the key benefits of Log Analytics is that you can push metrics and logs from multiple HDInsight clusters to the same Log Analytics workspace, allowing you to monitor multiple clusters in one place. You can read more about working with performance data in Azure Monitor logs by visiting our documentation, “View or analyze data collected with Log Analytics log search.”
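As one illustration, a query over the standard Perf table might look like the following. This is a sketch to run in the Log Analytics query editor; adjust the object and counter names to the ones your clusters actually emit:

```kusto
// Average CPU per HDInsight node in 5-minute buckets
Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AvgCpu = avg(CounterValue) by Computer, bin(TimeGenerated, 5m)
| order by TimeGenerated desc
```

The same pattern works for memory and disk counters, and the summarized result can back a view-designer chart or an alert rule.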

Job status and logs

Another key part of monitoring HDInsight clusters is monitoring the status of submitted jobs and viewing relevant logs to assist with debugging. You may want to know how many jobs are currently running or when a job fails.

With Azure Monitor logs

The recommended way to do this on Azure HDInsight is through Azure Monitor logs. HDInsight clusters emit workload-specific logs and metrics from their open-source components, with each line stored as a record. Examples include the number of pending, failed, and killed apps for Spark and Hadoop clusters, and incoming messages for Kafka clusters. You can query these tables and set up alerts when certain metrics meet your defined thresholds. For example, you could set up an alert that fires and sends you an email, or takes some other action, whenever a Spark job fails.
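A failed-job alert could be backed by a query along these lines. The table and column names below are illustrative: the actual names vary by HDInsight and Log Analytics integration version, so verify them against the schema in your own workspace before using this:

```kusto
// Sketch: Spark/Hadoop apps reported failed in the last hour.
// Table and column names are assumptions; check your workspace schema.
metrics_resourcemanager_clustermetrics_CL
| where TimeGenerated > ago(1h)
| summarize FailedApps = max(AppsFailed_d) by ClusterName_s
| where FailedApps > 0
```

Attaching an Azure Monitor alert rule to a query like this gives you the email-on-failure behavior described above.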

HDInsight monitoring solutions

Workload-specific HDInsight monitoring solutions that build on top of the Azure Monitor logs integration are also available. These solutions are premade dashboards that contain visualizations for the aforementioned workload metrics. For example, the Spark solution shows graphs of metrics like pending, failed, and killed apps over time. Because these solutions are backed by a Log Analytics workspace, the visualizations show data for all clusters that emit metrics to the workspace. As a result, you can see visualizations of these workload metrics from multiple clusters of the same type, all in one place.

The HDInsight Spark monitoring solution

With Ambari

You can also view workload information from Spark/Hadoop clusters in the YARN ResourceManager UI, which is accessible via the Ambari portal. The YARN UI shows detailed information about all job submissions and provides a link to the capacity scheduler, where you can view information about your job queues. You can also access raw ResourceManager log files through the Ambari portal if you need to further debug jobs.

Try HDInsight now

Between Apache Ambari and Azure Log Analytics integration, HDInsight offers comprehensive tools for monitoring all aspects of your HDInsight cluster. We hope you will take full advantage of monitoring on HDInsight, and we are excited to see what you will build with Azure HDInsight. Read this developer guide and follow the quick start guide to learn more about implementing these pipelines and architectures on Azure HDInsight. Stay up to date on the latest Azure HDInsight news and features by following #AzureHDInsight and @AzureHDInsight on Twitter. For questions and feedback, reach out to AskHDInsight@microsoft.com.

About HDInsight

Azure HDInsight is an easy, cost-effective, enterprise-grade service for open source analytics that enables customers to easily run popular open source frameworks including Apache Hadoop, Spark, Kafka, and others. The service is available in 36 public regions and Azure Government and National Clouds. Azure HDInsight powers mission-critical applications in a wide variety of sectors and enables a wide range of use cases including ETL, streaming, and interactive querying.
Source: Azure

Simplify disaster recovery with Managed Disks for VMware and physical servers

Azure Site Recovery (ASR) now supports disaster recovery of VMware virtual machines and physical servers by replicating directly to Managed Disks. Beginning in March 2019, all new protections enabled through the Azure portal have this capability. To enable replication for a machine, you no longer need to create storage accounts; replication data is written directly to a Managed Disk. Choose the Managed Disk type based on the data change rate of your source disks: the available options are Standard HDD, Standard SSD, and Premium SSD.

Please note that this change does not affect machines that are already protected; they will continue to replicate to storage accounts. However, you can still choose to use Managed Disks at the time of failover by updating the settings in the compute and network blade.

There are benefits in writing to Managed Disks:

Hassle-free capacity management on Microsoft Azure: You no longer need to track and manage multiple target storage accounts. ASR creates the replica disks when you enable replication: one Azure Managed Disk for every on-premises virtual machine (VM) disk, managed by Azure.
Seamless movement between different types of Managed Disks: If the data change rate or churn pattern on your source disk changes after you enable protection, you do not need to disable and re-enable replication. You can simply switch the Managed Disk type to handle the new data change rate. After changing the type, be sure to wait for fresh recovery points to be generated before performing a test failover or failover.

The refined ASR replication architecture first uploads replication logs to a cache storage account in Azure. ASR processes these logs and then pushes the data into the replica Managed Disk in Azure. Snapshots are taken on these Managed Disks at the frequency set by the replication policy applied when replication was enabled. You can find the names of the replica and target Managed Disks on the disks blade of the replicated item. At failover time, you choose one of the recovery points on the replica Managed Disk; that recovery point is used to create the target Managed Disk in Azure, which is attached to the VM when it is brought up.

We recommend the locally redundant storage (LRS) replication option for the cache storage account. Since the cache account uses standard storage and only holds temporary data, you do not need multiple cache storage accounts in a Recovery Services vault.

Get started with ASR today. Support for writing to Managed Disks is available in all Azure regions. It will be released on national clouds soon!

Related links and additional content

Tag Managed Disks in Azure for billing
Learn more about pricing of Managed Disks
Learn more about Azure Site Recovery churn limits
Set up disaster recovery for VMware or physical machines to Azure

Source: Azure