March 2019 changes to Azure Monitor Availability Testing

Azure Monitor Availability Testing allows you to monitor the availability and responsiveness of any HTTP or HTTPS endpoint that is accessible from the public internet. You don't have to add anything to the web site you're testing; it doesn't even have to be your site, so you could test a REST API service you depend on. The service sends web requests to your application at regular intervals from points around the world, and alerts you if your application doesn't respond or responds slowly.

At the end of this month, we are deploying some major changes to this service. These changes will improve performance and reliability, and will also allow us to make further improvements to the service in the future. This post highlights some of those improvements and describes the changes you should be aware of to ensure that your tests continue running without interruption.

Reliability improvements

We are deploying a new version of the availability testing service. This new version should improve the reliability of the service, resulting in fewer false alarms. It also increases our capacity for creating new availability tests, which is greatly needed as Application Insights usage continues to grow. Additionally, the new architecture enables us to add new regions much more easily. Expect to see additional regions from which you can test your app’s availability in the future!

New UI

Along with the new backend architecture, we are updating the availability testing UI with a brand new design. See the image below for a sneak peek of the UI that we will be rolling out to all customers in the next few weeks.

The new design is more consistent with other experiences in Application Insights. It reduces the number of clicks needed to see highly requested information and surfaces insights about your availability tests to the right of the availability scatter plot. The new chart supports time brushing: you can click and drag over a section of the chart to zoom into just that time period. Additionally, this design loads faster than the previous one!

IP address changes

If you have whitelisted certain IP addresses because you run web tests against your app but your web server is restricted to serving specific clients, you should be aware that we are deploying our service on new IP ranges. We are increasing the capacity of our service, and this requires adding additional test agents.

Effective March 20, 2019, we will begin running tests from our new test agents, and this will require you to update your whitelist. The full list of IPs to whitelist, including both our previous IP ranges and the new ranges, is published in our documentation, “IP addresses used by Application Insights and Log Analytics.”
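How you apply the update depends on where your whitelist lives. As one hypothetical illustration, if your application sits behind an Azure network security group, you could extend an existing allow rule with the Azure CLI (the resource and rule names here are placeholders, and the actual ranges should come from the documentation page above):

az network nsg rule update --resource-group myResourceGroup --nsg-name myNsg --name AllowAvailabilityTestAgents --source-address-prefixes <space-separated IP ranges from the documentation>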

France South changes

France South will no longer be offered as a region from which you can perform availability tests. All existing tests in France South will be moved to a duplicate service running in France Central, which will appear in the portal as “France Central (formerly France South).” If you already have a test running in France Central, this means that your test will run from France Central twice per time period. Your existing alert rules will not be affected.

New testing region

We will be adding an additional region within Europe from which to run availability tests. An announcement will be made when this region is available.

Next steps

Log in to your Azure account today to get started with the new Application Insights Availability UX. You can also learn more by visiting our “Azure Monitor Documentation.”
Source: Azure

Matching jobs and candidates across locations and languages with Cloud Talent Solution

Since launching Cloud Talent Solution last year, we’ve been working closely with employers and job boards to help improve the discoverability of jobs and to match those jobs with the right candidates. Companies around the globe have told us that reaching a larger talent pool is consistently top of mind, and we also hear from job seekers about their unique job search and employment needs.

We’re always working to add new features and functionality to connect employers and job seekers. Last year, we added job search by U.S. military occupational specialty code for Cloud Talent Solution customers in the United States. Veteran job board RecruitMilitary and employers such as Encompass Health are reporting strong engagement on their sites.

To help companies reach even more candidates, we’re adding new functionality to Cloud Talent Solution’s job search API, supporting commute search by walking and cycling and enhancing our job search capabilities in more than 100 languages. We’ll be showcasing these features and more in our Cloud Next ‘19 session on April 11 in San Francisco.

Search by commute now includes walking and cycling

Commute is a top consideration for all workers, so we built a commute search feature that lets our customers provide their users the ability to search for jobs via driving and public transit options. We’re now announcing the addition of walking and cycling to our commute search functionality. This feature enhancement was inspired by research studies with clients and users, who have taught us many important nuances. In the United States, for example, cycling is often the only commuting option for those living in low-income communities, and cities across the country have started to prioritize their cyclists with added investments in dedicated bike lanes and multi-use paths. Outside the U.S., 41% of commuters in Copenhagen cycle to work, and in Barcelona it is very common to walk to work.

Employers such as Cox Communications are working with SmashFly and its recruitment platform to offer this commute search functionality to all candidates who visit their career site.

“For more than 120 years, Cox has had a purposeful commitment to its employees and the communities where we live and work,” said Adam Glassman, Senior Manager of Employment Branding. “One of the ways the company lives those values is by building a diverse workforce and by creating an inclusive environment. This includes embracing the unique talents of people with a variety of backgrounds, perspectives and needs. We’re happy to be a Google Cloud Talent Solution partner and expect that these enhancements will open up our amazing company to a wider group of candidates, no matter how they choose to commute or what language they use to search.”

Cox’s use of SmashFly helps them find job candidates. “SmashFly was founded to achieve a very simple mission: To fundamentally change how companies connect with talent,” said Thom Kenney, CEO at SmashFly. “We believe that Google Cloud Talent Solution helps us continue that mission, bringing machine learning to job search to truly transform the candidate experience for our clients. Google’s been a fantastic partner and we’re thrilled to continuously add new features, like commute search and military occupational code translation. We look forward to sharing this advanced functionality with more employers and job seekers.”

Job boards are also sharing this functionality with their clients and users. “At College Recruiter, we’re very excited about the enhancement to the Cloud Talent Solutions commute search option,” said Steven Rothberg, President and Founder of College Recruiter. “Many of the job seekers who use our site are looking for part-time, seasonal, and internship opportunities while they’re in school, and many of them would strongly prefer to work within walking or cycling distance so they can avoid the cost and hassle of driving or using public transportation. Now, they can search for a part-time retail job within a 10-minute walk from their apartment instead of having to weed through dozens or even hundreds of part-time, retail jobs which are listed within their city.”

Search in over 100 languages also returns jobs in English

In addition to commute and lifestyle needs, language preference is another personal element of the job search experience. More than 100 languages are spoken at home across the U.S., especially in metropolitan cities such as Chicago, Dallas, and Philadelphia. To help companies reach candidates in whatever language they choose to speak, we’ve improved our support for job searches in more than 100 different languages by returning the relevant job postings, which are often written in English. This way, employers and job sites can ensure they aren’t deterring users who prefer to search in a language other than that of the original job posting. Job seekers can now see jobs in the language they searched in, as well as jobs in English. Here’s an example of a search result for “enfermera,” the Spanish word for “nurse,” on Encompass Health’s career site, powered by Jibe and Cloud Talent Solution.

All of these features are available to any of the more than 4,000 sites using Cloud Talent Solution to power their job search. And if you are an employer, or run a job board or staffing agency, and want to help more people find the right job opportunities on your site, join us at Cloud Next ‘19 in San Francisco to learn more. We’d love to see you at our session, Inclusive by Design: Engage and Recruit Diverse Talent with AI, on Thursday, April 11. You can also visit our website to get started with Cloud Talent Solution today.
Source: Google Cloud Platform

Introducing a new Coursera course on Site Reliability Engineering

Our Customer Reliability Engineering (CRE) team is on a mission to help every business become more reliable by making it easy to adopt Site Reliability Engineering (SRE). SRE is a discipline founded here at Google that utilizes prescriptive methods and principles for building and running reliable systems. With CRE, we work with customers and partners to reduce the operational burden of your systems, help you become more agile, and help you run reliable services for your users and customers.

We want to make sure that teams everywhere can adopt SRE and implement these principles. That’s why we’re pleased to introduce a new Coursera course dedicated to helping you get started with SRE. The new course, Site Reliability Engineering: Measuring and Managing Reliability, distills years of collective Google SRE experience with designing and managing complex systems that meet their reliability targets. We’re making it easy for developers to start learning the basics of SRE concepts, and for the larger SRE community to continue on their journey. You’ll learn at your own pace and find insight, whether you’re a new or experienced SRE.

Some of the terms and concepts you’ll learn include:

How to describe and measure the desired reliability of a service
What it means to operate reliably
What SLOs, SLIs, and SLAs are
What error budgets are and how to use them (see the worked example at the end of this post)
How to measure against your metrics and assess whether they’re realistic

Getting started with Coursera and SRE

In the SRE course, you’ll learn about the basics, including how SRE came to be part of Google engineering and what kind of tools SREs use to make decisions. You’ll start by learning about the goals of a reliable system and how they relate to user expectations. You’ll also learn about common monitoring practices, the pros and cons of different measurement strategies, and specific recommendations on how to choose your own metrics.

The course also dives into the details you’ll need to build your own set of service-level indicators (SLIs) and service-level objectives (SLOs), using a case study. You’ll see a method for performing risk analysis and learn how to incorporate those findings into your long-term reliability goals. Additionally, you’ll cover documenting SLOs and assigning responsibilities to ensure you’re setting up a sustainable SRE practice.

Get started today with SRE on Coursera as the next step in your SRE journey!
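As a quick taste of the error-budget material: an error budget is simply one minus the SLO target. For example, a 99.9 percent availability SLO over a 30-day window leaves a budget of 0.1 percent of that window, or 30 days × 24 hours × 60 minutes × 0.001 ≈ 43.2 minutes of allowable downtime per month. Once that budget is spent, a team following SRE practice would typically shift effort from launching features to improving reliability.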
Source: Google Cloud Platform

Securely monitoring your Azure Database for PostgreSQL Query Store

A few months ago, I shared best practices for alerting on metrics with Azure Database for PostgreSQL. While I covered how to monitor certain key metrics on Azure Database for PostgreSQL, I did not cover how to monitor and alert on the performance of the queries your application relies on most heavily. When running a PostgreSQL database, from time to time you will need to investigate whether any queries are running indefinitely; such long-running queries can interfere with overall database performance, and are often stuck waiting on some background process. This blog post covers how you can set up alerting on query performance related metrics using Azure Functions and Azure Key Vault.
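As a point of reference, the classic way to spot such queries is to look at pg_stat_activity directly. A minimal sketch (the five-minute threshold is an arbitrary example):

select pid, now() - query_start as running_since, state, left(query, 60) as query_text
from pg_stat_activity
where state <> 'idle'
and now() - query_start > interval '5 minutes'
order by running_since desc;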

What is Query Store?

Query Store is a feature of Azure Database for PostgreSQL, announced in early fall 2018, that seamlessly enables tracking query performance over time. It simplifies performance troubleshooting by helping you quickly find the longest running and most resource-intensive queries. Learn how you can use Query Store in a wide variety of scenarios by visiting our documentation, “Usage scenarios for Query Store.” When enabled, Query Store automatically captures a history of query runtime and wait statistics, and tracks this data over time so that you can see database usage patterns. Data for all users, databases, and queries is stored in a database named azure_sys in the Azure Database for PostgreSQL instance.
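Once data is flowing, you can query the view directly. The following sketch, using only qs_view columns that appear in the samples later in this post, lists the ten queries with the highest mean execution time captured over the last day:

select query_id, mean_time, start_time
from query_store.qs_view
where start_time >= now() - interval '24 hours'
order by mean_time desc
limit 10;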

Query Store is not enabled on a server by default. However, it is very straightforward to opt in on your server by following the simple steps detailed in our documentation, “Monitor performance with the Query Store.” After you have enabled Query Store to monitor your application performance, you can set alerts on the various metrics you want to monitor, such as long running queries, regressed queries, and more.
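If you prefer scripting the opt-in, the same change can be made from the Azure CLI. A minimal sketch, assuming the pg_qs.query_capture_mode server parameter described in that documentation and placeholder resource names:

az postgres server configuration set --resource-group myResourceGroup --server-name mypgserver --name pg_qs.query_capture_mode --value TOP

TOP is the commonly documented starting value; see the documentation above for the full list of supported capture modes.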

How to set up alerting on Query Store metrics

You can achieve near real-time alerting on Query Store metrics using Azure Functions and Azure Key Vault. This GitHub repo provides an Azure Function and a PowerShell script to deploy a simple monitoring solution, with some flexibility to change what to alert on and when.

Alternatively, you can clone the repo to use it as a starting point and make code changes to better fit your scenario. When built with your changes, the Visual Studio solution will automatically package the zip file you need to complete your deployment in the same fashion as described here.

In this repo, the DeployFunction script creates an Azure function that serves as a monitor for Azure Database for PostgreSQL Query Store. Understanding the data collected by Query Performance Insight will help you identify the metrics that you can alert on.

If you don't make any changes to the script or the function code itself and only provide the required parameters to the DeployFunction script, here is what you will get:

A function app.
A function called PingMyDatabase that is time-triggered to run every minute.
An alert condition that looks for any query with a mean execution time longer than five seconds since the last time Query Store data was flushed to disk (see the sample query after this list).
An email when the alert condition is met, with an attached list of all of the processes that were running on the instance, as well as the list of long running queries.
A key vault that contains two secrets, named pgConnectionString and senderSecret, that hold the connection string to your database and the password for your sender email account, respectively.
An identity for your function app, with a Get access policy on the secrets in this key vault.
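For reference, the default alert condition amounts to a Query Store query of roughly the following shape. This is a sketch: the deployed default keys off the last Query Store flush time, while this version expresses the look-back window as an explicit interval, matching the SENDMAILIF_QUERYRETURNSRESULTS samples shown later in this post (five seconds = 5000 milliseconds):

select * from query_store.qs_view where mean_time > 5000 and start_time >= now() - interval '15 minutes'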

You simply need to run DeployFunction from a Windows PowerShell command prompt. It is important to run this script from Windows PowerShell; using Windows PowerShell ISE will likely result in errors, as some of the macros may not resolve as expected.

The script then creates the resource group and Key Vault, deploys a monitoring function app, updates app configuration settings, and sets up the required Key Vault secrets. At any point during the deployment, you can view the logs available in the .logs folder.

After the deployment is complete, you can validate the secrets by going to the resource group in the Azure portal. Two secrets are created, pgConnString and senderSecret; you can select an individual secret if you want to update its value.

Depending on the condition set in the SENDMAILIF_QUERYRETURNSRESULTS app setting, you will receive an email alert when the condition is met.

How can I customize the alert condition or the supporting data in the email?

After the default deployment goes through, you can update the settings in the Azure portal by selecting Platform features and then Application settings.

You can change the run interval, the mail recipient, the alert condition, or the supporting data to be attached by changing the settings below and saving them on exit.

Alternatively, you can simply use the Azure CLI to update these settings, like the following.

$cronIntervalSetting="CronTimerInterval=0 */1 * * * *"

az functionapp config appsettings set --resource-group yourResourceGroupName --name yourFunctionAppName --settings $cronIntervalSetting

Or

az functionapp config appsettings set --resource-group $resourceGroupName --name $functionAppName --settings "SENDMAILIF_QUERYRETURNSRESULTS=select * from query_store.qs_view where mean_time > 5000 and start_time >= now() - interval '15 minutes'"

Below are common conditions that you can monitor and alert on, either by updating the function app settings after your deployment goes through or by updating the corresponding value in DeployFunction.ps1 prior to your deployment:

Case: Query 3589441560 takes more than x milliseconds on average in the last fifteen minutes
Function app setting name: SENDMAILIF_QUERYRETURNSRESULTS
Sample value: select * from query_store.qs_view where query_id = 3589441560 and mean_time > x and start_time >= now() - interval '15 minutes'

Case: Queries with a cache hit ratio of less than 90 percent
Function app setting name: SENDMAILIF_QUERYRETURNSRESULTS
Sample value: select *, shared_blks_hit::float / nullif(shared_blks_hit + shared_blks_read, 0) as cache_hit from query_store.qs_view where shared_blks_hit::float / nullif(shared_blks_hit + shared_blks_read, 0) < 0.90

Case: Queries with a mean execution time of more than x milliseconds
Function app setting name: SENDMAILIF_QUERYRETURNSRESULTS
Sample value: select * from query_store.qs_view where mean_time > x and start_time >= now() - interval '15 minutes'

Case: If an alert condition is met, check whether there is an ongoing autovacuum operation, list the running processes, and attach the results to the email
Function app setting name: LIST_OF_QUERIESWITHSUPPORTINGDATA
Sample value: {"count_of_active_autovacuum": "select count(*) from pg_stat_activity where position('autovacuum:' IN query) = 1", "list_of_processes_at_the_time_of_alert": "select now()-query_start as running_since, pid, client_hostname, client_addr, usename, state, left(query,60) as query_text from pg_stat_activity"}

How secure is this?

The script provides you with a mechanism to store your secrets in a Key Vault. Your secrets are secured in that they are encrypted in transit and at rest. However, the function app accesses the Key Vault over the public network. If you want to avoid this and access your secrets over your virtual network (VNet) through the backbone instead, you will need to configure a VNet for both your function app and your Key Vault. Note that VNet support for function apps is in preview and is currently available in selected Azure regions. When the proper deployment scenarios are supported, we may revisit this script to accommodate the changes. Until then, you will need to configure a VNet manually to accomplish this setup.

We are always looking to hear your feedback. If you have any feedback on Query Store for PostgreSQL, or on monitoring and alerting on query performance, please don’t hesitate to contact the Azure Database for PostgreSQL team.

Acknowledgments

Special thanks to Korhan Ileri, Senior Data Scientist, for developing the script and contributing to this post, and to Tosin Adewale, Software Engineer on the Azure CLI team, for closely partnering with us.
Source: Azure