Real-time diagnostics from nanopore DNA sequencers on Google Cloud

In a healthcare setting, being able to access data quickly is vital. For example, a sepsis patient’s survival rate decreases by 6% for every hour we fail to diagnose the species causing the infection and its antibiotic resistance profile. Typical genomic analyses are too slow: DNA samples are transported from the collection point to a centralized facility to be sequenced and analyzed in a batch process, which can take weeks or even months. Recently, nanopore DNA sequencers have become commercially available that stream raw signal-level data as they are collected, providing immediate access to them. However, processing the data in real time remains challenging, requiring substantial compute and storage resources, as well as a dedicated bioinformatician. Not only is the process still too slow, it’s also failure-prone, expensive, and doesn’t scale.

We recently built out a proof of concept for genomics researchers and bioinformatics developers that highlights the breadth and depth of Google Cloud’s data processing tools. In this article we describe a scalable, reliable, and cost-effective end-to-end pipeline for fast DNA sequence analysis built on Google Cloud and this new class of nanopore DNA sequencers.

We envision four groups of users for this application, specifically to detect biocontaminants:

Medical professionals
Veterinary clinics
Agronomists
Biosecurity professionals

In all cases, analytical results are made available in a dynamic dashboard for immediate insight, decision-making, and action. Here’s a video of the University of Queensland’s application performing real-time analysis of DNA nanopore sequencer data:

How the research team built it

The team’s primary concern while building this system was to shorten the time between when the data is uploaded from the sequencer and when results are available. To keep things fast on the client side, the team implemented a dynamic dashboard with D3.js, which periodically polls a database for new data and updates the chart accordingly. More specifically, they based their visualization on Sunburst. Server-side, they used Firebase, a document-storage system that can represent hierarchical data (necessary to represent biological taxonomies) and that is designed with web and mobile developers in mind. You can find all the code in the GitHub nanostream-dataflow project.

From a system architecture perspective, the team relied on a variety of Google Cloud Platform compute, storage, and data processing tools. Data are collected from a nanopore DNA sequencer, and as data become available (New DNA read created), they are uploaded to a Cloud Storage bucket (Upload Bucket). As files are uploaded, they are ingested into a workflow that converts the input files into actionable reports (Dynamic D3 Visualization).

The team made extensive use of the Apache Beam library to implement the data processing logic. Beam workflows can be run on Google Cloud Dataflow, which makes integration with other Google Cloud services easy. The team used Compute Engine to build the auto-scaling Alignment Cluster (here’s a codelab), and Firebase for visualization.
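To make the streaming pattern concrete, here is a minimal Apache Beam sketch in Python. It is illustrative only, not the team’s actual code (which lives in the nanostream-dataflow project): it listens for upload notifications on a Pub/Sub topic, fans each file out into per-read species counts, and aggregates them in one-minute windows. The topic name and the parse_alignment helper are hypothetical placeholders.

```python
# Minimal Apache Beam sketch of the streaming pattern (illustrative only;
# the team's actual pipeline lives in the nanostream-dataflow GitHub project).
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window

def parse_alignment(message_bytes):
    """Hypothetical helper: turn a Cloud Storage upload notification into
    (species, 1) pairs by aligning the reads in the uploaded file."""
    # ... download the file named in the notification, align its reads,
    # and yield one (species_name, 1) pair per read ...
    yield ("Capnocytophaga canimorsus", 1)  # placeholder output

options = PipelineOptions(streaming=True)  # on Dataflow: --runner=DataflowRunner
with beam.Pipeline(options=options) as p:
    (p
     | "ReadUploads" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/uploads")
     | "AlignReads" >> beam.FlatMap(parse_alignment)
     | "Window" >> beam.WindowInto(window.FixedWindows(60))  # 1-minute windows
     | "CountPerSpecies" >> beam.CombinePerKey(sum)
     | "Publish" >> beam.Map(print))  # placeholder: the real pipeline updates Firebase
```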
Use cases

The possibilities for real-time DNA sequencing are endless. Here are a few that the team tested, and others that we imagine.

1. Dog bite infection diagnosis from a blood sample: a clinical use case for taxonomical counting

A patient with a recent history of a dog bite was admitted to the intensive care unit with severe sepsis, including multi-organ failure, severe acute kidney injury, and haemolysis. After blood cultures failed to show growth for three days, a successful diagnosis was obtained after nineteen hours of nanopore sequencing. While the majority of nanopore reads mapped to the human genome, the team also observed reads from the bacterium Capnocytophaga canimorsus, a pathogenic species of the gingival flora of canine and feline species that is sometimes transmitted through bites and licks. Bacterial cultures were obtained only after four days of growth, allowing for positive identification of the C. canimorsus pathogen six days after the patient was admitted. The patient survived and is well. You can read a full clinical description of this case in Rapid Diagnosis of Capnocytophaga canimorsus Septic Shock in an Immunocompetent Individual Using Real-Time Nanopore Sequencing.

The team also prepared an interactive data visualization of the breakdown of detected species. Here’s a static preview of the data visualization. The data used to prepare the figure are in gs://nanostream-dataflow-demo-data/2_Klebsiella. To preserve the privacy of the patient, the research team replaced all reads mapping to the human genome with sequences from human reference genome NA12889.

2. Superbug monitoring: profiling and tracking the evolution of antibiotic resistance genes in a Klebsiella strain

The team sequenced DNA extracted from a clinical isolate of an extensively drug-resistant “superbug” ST258 Klebsiella pneumoniae strain to characterize its antibiotic resistance profile. A full description is available in Multifactorial chromosomal variants regulate polymyxin resistance in extensively drug-resistant Klebsiella pneumoniae. The team presents the results for sample 2_GR_12 here. An interactive data visualization of the breakdown of resistance genes detected in the K. pneumoniae sample is available. The data used to prepare the figure are unavailable due to the presence of reads mapping to the human genome.

Here’s a static preview of the resistance genes identified, with arcs proportional to the number of times a gene was observed. The visualization demonstrates the presence of genes conferring resistance to multiple drug classes, consistent with phenotypic drug resistance testing results, which found this isolate to be resistant to all drugs tested. In a clinical setting, rapid identification of extensive drug resistance such as this can help prioritize use of the latest generation of antibiotics, or potentially suggest antibiotic drug combinations.

3. Pathogen detection in sewage: a public health use case

The previous examples demonstrate the utility of sequencing patient DNA. In an unpublished collaboration with Dr. Guangming Jiang from the Australian Water Management Centre, the team sequenced DNA obtained from sewage to identify the bacterial species present. Environmental sensing and anomaly detection of air- and water-borne organisms is another promising use case for streaming DNA sequence analysis. This technique further generalizes to food safety and customs/border control applications.

Interactive data visualizations of taxonomical proportions and resistance genes in the sample are available, and the data used to prepare the figures are in gs://nanostream-dataflow-demo-data/4_Sewage. Here are static previews of the interactive data visualization, which demonstrate the complexity of the sewage microbiome. Analysis of the acquired resistance genes present in this sample reveals a higher prevalence of beta-lactamase, aminoglycoside, and macrolide resistance genes.
4. Agricultural use case: identification of viruses in cassava crops in Africa

Cassava is a major staple crop in Africa and is the third largest source of carbohydrates in the world. Cassava mosaic virus (CMV) causes cassava mosaic disease (CMD), which can cause crop yield losses of more than 80%. Cassava is vegetatively propagated, so it is vulnerable to viral infections, and CMD is spread primarily via the movement of cuttings from disease-affected cassava.

Dr. Laura Boykin has been using nanopore sequencing of plant material to identify viral plant infections in a variety of countries in sub-Saharan Africa. The analysis performed here is based on data described in Real time portable genome sequencing for global food security. The visualization indicates that the plant sample processed is contaminated with CMV.

An interactive data visualization of taxonomical proportions in the sample is available, and the data used to prepare the figure are in gs://nanostream-dataflow-demo-data/5_Cassava. Here’s a static preview of the interactive data visualization, which clearly shows that the sequenced sample is infected with CMV.

Conclusion

Nanopore DNA sequencers reduce the time it takes to generate DNA sequence data from weeks to minutes by providing a portable, miniaturized sequencer that can be taken to the sample—the patient, sewage plant, or crop field—as well as providing access to sequence data as soon as it is generated. Google Cloud provides highly scalable computing resources, plus frameworks for processing data in a continuous stream. The application the team built is responsive—simply synchronize the data with a Cloud Storage bucket to automatically initialize the pipeline, and it scales automatically to keep pace with data generation, while continuously updating the data analysis in a browser-based visualization.

Google Cloud strives to build tools essential to a variety of clinical, public health, agrarian, and security settings. You can learn more about genomics and data processing on Google Cloud.
Source: Google Cloud Platform

Re-thinking federated identity with the Continuous Access Evaluation Protocol

In today’s mobile and cloud-centric world, your typical enterprise user is logged in simultaneously to multiple cloud- and enterprise-hosted apps using federation protocols or certificates. These login sessions can last hours or even days—especially on mobile devices. Increasingly, however, whether or not to authorize a user session needs to be based on dynamic data such as the device’s location, IP location, device and app health, and user privileges. Imagine, for example, a user in the U.S. who is logged in on their phone to a cloud-based CRM service, and they get on a plane to China. When they land, the CRM provider needs to detect the new location and change the user’s access accordingly.

Here are some other scenarios that could benefit from dynamic authorization decisions:

A device connected to a corporate VPN needs to be disconnected after a malicious app is observed to be present on the device.
A file sharing app discovers the user’s IP address has changed, and needs to re-evaluate the user’s access privilege given its new IP location.
A user is added to a task group that requires access to a specific customer account. The CRM app must be notified of this change in order for the user to be able to access the required resources.

Unfortunately, providing this kind of dynamic access authorization can be difficult. Today’s technology determines access authorization only at the time of authentication, typically with the help of a federated identity provider—or, in the case of TLS client-auth, by the server-side app itself. Even with enterprise infrastructure such as WiFi routers or VPN servers, it’s hard for cloud-based identity providers to signal a change in session authorization.

Introducing the Continuous Access Evaluation Protocol

Continuous access evaluation is a new approach to access authorization that enables independent parties to control live user session properties. Sometimes referred to as “continuous authentication” by our industry peers, Google’s vision for a Continuous Access Evaluation Protocol (or CAEP) addresses the same concerns, but uses a standards-based approach.

Our vision for continuous access evaluation is based on a publish-and-subscribe (“pub-sub”) approach. Pub-sub is a good model to convey updated information about a session between apps, infrastructure, identity providers, device management services, and device security services—regardless of whether they’re in the cloud or on-premises. Specifically, a publish-and-subscribe model has the following advantages:

It’s complementary to federated or cert-based authentication.
It’s not as chatty as WAM (web access management).
It doesn’t impact latency for user access.

Using pub-sub, a server-side endpoint—either a cloud app or an identity provider—can convey updated information about a session to interested parties. If a user is logged into multiple apps or infrastructure endpoints, they’re all notified about the updated status. In contrast, federated identity, which is the most commonly used authentication system, is a “fire and forget” model—authorization decisions are only evaluated at login time. (Before the federated model was popular, enterprises used a chatty WAM model and evaluated every access to an app using a central access management server. That model, of course, isn’t viable with today’s traffic volumes and distributed environments.)

CAEP publishers and subscribers

In a typical cloud environment, a service can function either as a publisher or a subscriber for various events.
For example, an identity provider service is the publisher for authorization decisions or user attributes, but a VPN server or a SaaS app may also be a publisher for the client IP address within a session. On the flip side, a VPN server or SaaS app will typically subscribe to the identity provider’s authorization decisions or user attributes, and the identity provider may subscribe to information about a client IP from a VPN server or a SaaS app.

In other words, with CAEP, a typical cloud session may have multiple publishers, such as identity providers, device management services, and security services. It may also have multiple subscribers, e.g., multiple cloud apps, enterprise apps, and VPN and WiFi routers.

Interacting with CAEP

CAEP allows publishers and subscribers to communicate a wide range of information about their active user sessions. You can see CAEP’s operational flows in the interactions below.

In the diagram above, the interactions are:

1. Service request: The device or app requests service from a relying party. This can happen multiple times during the life of an authenticated user session (e.g., each HTTP request is a service request). The response can either be the successful completion of the request or a remediation response.
2. Context update: If anything about a session is different from the previous access (e.g., first-time access after authentication, or a changed IP), the relying party publishes the updated context. This update message can also contain an interest or disinterest in receiving updates about the session. Subscribers to these messages may include policy servers such as identity providers.
3. User, device or policy update: If a policy service learns about changes that impact a session (either from its own observations or after getting notified by a relying party), it processes and publishes updated information to all of that session’s subscribers.
4. Remediation response: An update may result in the user, device, or app needing to be remediated. In this case, the relying party provides a response to a service request that specifies what went wrong and what remediation actions the user must take in order to resume services.

Note that in the above flow diagram, only interactions #1 and #4 have request/response semantics. Interactions #2 and #3 are asynchronous updates that may be triggered at any time.
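To make the update messages concrete, here is an illustrative Python sketch of what a session update event might look like as a Security Event Token (SET, RFC 8417) payload, the existing standard CAEP intends to build on. Since CAEP was a proposal rather than a finished specification at the time of writing, the event type URI, claim names, and session identifier below are hypothetical.

```python
# Illustrative sketch of a CAEP-style session update expressed as a
# Security Event Token (SET, RFC 8417) payload. The event URI and claim
# names are hypothetical; CAEP was a proposal, not a finished spec.
import json
import time
import uuid

def make_ip_change_event(session_id, new_ip):
    """Build a SET-style payload announcing that a session's IP changed."""
    return {
        "iss": "https://idp.example.com",   # publisher of the event
        "iat": int(time.time()),            # issued-at timestamp
        "jti": str(uuid.uuid4()),           # unique event ID
        "aud": "https://app.example.com",   # subscriber (relying party)
        "events": {
            # Hypothetical event type URI for a client IP change:
            "https://schemas.example.com/caep/ip-changed": {
                "session": session_id,
                "new_ip": new_ip,
            }
        },
    }

# In practice the payload would be signed as a JWT and pushed over a
# peer-authenticated TLS channel to each subscriber of the session.
print(json.dumps(make_ip_change_event("sess-42", "203.0.113.7"), indent=2))
```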
Establishing trust with CAEP

Each party in this pub-sub model establishes point-to-point trust with the other parties. Each party announces what information it is capable of publishing about a user session, and the trusting party determines which of the information a publisher announces may be trusted. These communication channels use peer-authenticated TLS to ensure authenticity, privacy, and integrity.

CAEP use cases

Here are some ways in which CAEP can solve real-world issues:

Geolocation

A user of a file sharing service travels to a foreign country with weak IP protections.

CAEP solution:

The file sharing service provider publishes an event that the user’s IP location has changed.
The identity provider, which had expressed interest in the session by previously authorizing the service provider to allow access, is notified that the user’s IP location has changed.
It then re-evaluates the user’s access privileges and publishes new user attributes (including authorization decisions specific to the service provider) for all sessions that the user had logged into.
All service providers interested in that user’s sessions (including the file sharing service) obtain the new user attributes, which include decisions on whether the user should continue to be allowed access to certain resources.
The service provider disables the user’s access to those resources.

App vulnerability

A vulnerability is discovered in a popular mobile app.

CAEP solution: The policy server re-evaluates access decisions for all users based on updated information from its internal vulnerability assessment. It publishes a termination message for all sessions that it knows to be using the mobile app. Service providers subscribing to those sessions receive the new message and terminate the client session.

Suspicious user activity

A mobile phone belonging to an authenticated user has just downloaded suspicious apps and visited untrusted websites.

CAEP solution: An endpoint security service monitoring the device obtains information about the suspicious activity. It publishes a message that invalidates all sessions from that device. All service providers subscribed to those sessions then invalidate their internal sessions from that device, and the user needs to re-authenticate from that device in order to proceed.

Standardizing access authorization

With the rise of mobile devices and cloud-based apps, the time has come to reevaluate federated approaches to identity and authorization. Here at the Google Cloud Identity team, we intend to submit CAEP as an open standard that leverages existing standards such as SET. A related effort in Google aims to standardize consumer-account-related security events through the RISC working group in the OpenID Foundation. CAEP could be implemented as an extension of the same RISC proposal.

Can you think of more use cases where CAEP would be useful? Want to participate and keep up to date on CAEP? Provide your feedback here.
Source: Google Cloud Platform

Modernize alerting using Azure Resource Manager storage accounts

Classic alerts in Azure Monitor will be retired this coming June. We recommend that you migrate the classic alert rules defined on your storage accounts if you want to retain alerting functionality on the new alerting platform. If you have classic alert rules configured on classic storage accounts, you will need to upgrade those accounts to Azure Resource Manager (ARM) storage accounts before you migrate the alert rules.

For more information on the new Azure Monitor service and classic alert retirement read the article, “Classic alerts in Azure Monitor to retire in June 2019.”

Identify classic alert rules

Before you migrate, you should first find all of your classic alert rules. The following screenshot shows how you can identify classic alert rules in the Azure portal. Note that you can filter by subscription, so you can find all classic alert rules without checking each resource separately.

Migrate classic storage accounts to ARM

New alerts support only ARM storage accounts, not classic storage accounts. If you configured classic alert rules on a classic storage account, you will need to migrate it to an ARM storage account.

You can use "Migrate to ARM" to migrate using the storage menu on your classic storage account. The screenshot below shows an example of this. For more information on how to perform account migration see our documentation, “Platform-supported migration of laaS resources from classic to Azure Resource Manager.”

Re-create alert rules in new alerting platform

After you have migrated the storage account to ARM, you then need to re-create your alert rules. The new alerting platform supports alerting on ARM storage accounts using the new storage metrics. You can read more about the new storage metric definitions in our documentation, “Azure Storage metrics in Azure Monitor.” In the storage blade, the menu for the new alerting platform is named "Alert."

Before you re-create alert rules as new alerts for your storage accounts, you may want to understand the difference between the classic metrics and the new metrics, and how they map to each other. You can find a detailed mapping in our documentation, “Azure Storage metrics migration.”

The following screenshot shows how to create an alert based on “UsedCapacity.”

Some metrics include dimensions, which allow you to see and use different dimension values. For example, the Transactions metric has a dimension named “ResponseType,” whose values represent different types of errors as well as success. Using “ResponseType,” you can create an alert to monitor transactions with a particular error such as “ServerBusyError” or “ClientOtherError.”

The following screenshot shows how to create an alert based on Transactions with “ClientOtherError.”

In the list of dimension values, you won't see all supported values by default; you will only see values that have been triggered by actual requests. If you want to monitor conditions that have not yet occurred, you can add a custom dimension value during alert creation. For example, if your storage account has not received any anonymous requests yet, you can still set up an alert in advance to monitor such activity in upcoming requests.

The following screenshot shows how to add a custom dimension value to monitor upcoming anonymous transactions.
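If you prefer to script rule creation rather than click through the portal, here is a minimal sketch using the azure-mgmt-monitor Python SDK that mirrors the Transactions/“ResponseType” example above. The resource IDs, rule name, and threshold are placeholders, and exact model class names can vary across SDK versions.

```python
# Minimal sketch: create a metric alert on Transactions filtered by the
# ResponseType dimension using the azure-mgmt-monitor Python SDK.
# Resource IDs, names, and thresholds are placeholders; model class names
# may differ across SDK versions.
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import (
    MetricAlertResource,
    MetricAlertSingleResourceMultipleMetricCriteria,
    MetricCriteria,
    MetricDimension,
)

subscription_id = "<subscription-id>"
storage_account_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<rg>"
    "/providers/Microsoft.Storage/storageAccounts/<account>"
)

client = MonitorManagementClient(DefaultAzureCredential(), subscription_id)

# Alert when more than 10 ClientOtherError transactions occur in 15 minutes.
criteria = MetricAlertSingleResourceMultipleMetricCriteria(all_of=[
    MetricCriteria(
        name="client-other-errors",
        metric_name="Transactions",
        dimensions=[MetricDimension(
            name="ResponseType", operator="Include", values=["ClientOtherError"])],
        operator="GreaterThan",
        threshold=10,
        time_aggregation="Total",
    )
])

client.metric_alerts.create_or_update(
    resource_group_name="<rg>",
    rule_name="transactions-client-other-error",
    parameters=MetricAlertResource(
        location="global",            # metric alert rules are a global resource
        severity=2,
        enabled=True,
        scopes=[storage_account_id],
        evaluation_frequency="PT5M",  # evaluate every 5 minutes
        window_size="PT15M",          # over a 15-minute window
        criteria=criteria,
    ),
)
```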

We recommend creating the new alert rules first, verifying that they work as intended, and then removing the classic alerts.

Azure Monitor is a unified monitoring service that includes alerting and other monitoring capabilities. You can read more in the “Azure Monitor documentation.”
Source: Azure

Making AI-powered speech more accessible—now with more options, lower prices, and new languages and voices

The ability to recognize and synthesize speech is critical for making human-machine interaction natural, easy, and commonplace, but it’s still too rare. Today we’re making our Cloud Speech-to-Text and Text-to-Speech products more accessible to companies around the world, with more features, more voices (roughly doubled), more languages in more countries (up 50+%), and at lower prices (by up to 50% in some cases).

Making Cloud Speech-to-Text more accessible for enterprises

When creating intelligent voice applications, speech recognition accuracy is critical. Even at 90% accuracy, it’s hard to have a useful conversation. Unfortunately, many companies build speech applications that need to run on phone lines, which produce noisy results, and that data has historically been hard for AI-based speech technologies to interpret.

For these situations with less-than-pristine data, we announced premium models for video and enhanced phone in beta last year, developed with customers who opted in to share usage data with us via data logging to help us refine model accuracy. We are excited to share today that the resulting enhanced phone model now has 62% fewer transcription errors (improved from 54% last year), while the video model, which is based on technology similar to what YouTube uses for automatic captioning, has 64% fewer errors. In addition, the video model also works great in settings with multiple speakers, such as meetings or podcasts.

The enhanced phone model was initially available only to customers participating in the opt-in data logging program announced last year. However, many large enterprises have been asking us for the option to use the enhanced model without opting into data logging. Starting today, anyone can access the enhanced phone model, and customers who choose the data logging option pay a lower rate, bringing the benefits of improved accuracy to more users.

In addition to the general availability of both premium models, we’re also announcing the general availability of multi-channel recognition, which helps the Cloud Speech-to-Text API distinguish between multiple audio channels (e.g., different people in a conversation). This is very useful for call or meeting analytics and other use cases involving multiple participants. With general availability, all these features now qualify for an SLA and other enterprise-level guarantees.
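As a concrete illustration, here is a minimal sketch of requesting the enhanced phone model together with multi-channel recognition through the Cloud Speech-to-Text Python client. The Cloud Storage path, sample rate, and channel count are placeholders for your own audio.

```python
# Minimal sketch: transcribe a two-channel phone call with the enhanced
# phone model and multi-channel recognition (Cloud Speech-to-Text Python
# client). The GCS path, sample rate, and channel count are placeholders.
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=8000,       # typical telephony sample rate
    language_code="en-US",
    use_enhanced=True,            # request the premium (enhanced) model
    model="phone_call",           # enhanced phone model
    audio_channel_count=2,        # e.g., agent and caller on separate channels
    enable_separate_recognition_per_channel=True,
)
audio = speech.RecognitionAudio(uri="gs://my-bucket/call-recording.wav")

response = client.recognize(config=config, audio=audio)
for result in response.results:
    best = result.alternatives[0]
    print(f"channel {result.channel_tag}: {best.transcript}")
```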
Cloud Speech-to-Text at LogMeIn

LogMeIn is an example of a customer that requires both accuracy and enterprise scale: every day, millions of employees use its GoToMeeting product to attend online meetings. Cloud Speech-to-Text lets LogMeIn automatically create transcripts for its enterprise GoToMeeting customers, enabling users to collaborate more effectively.

“LogMeIn continues to be excited about our work with Google Cloud and its market-leading video and real-time speech-to-text technology. After an extensive market study for the best Speech-to-Text video partner, we found Google to be the highest quality, and it offered a useful array of related technologies. We continue to hear from our customers that the feature has been a way to add significant value by capturing in-meeting content and making it available and shareable post-meeting. Our work with Google Cloud affirms our commitment to making intelligent collaboration a fundamental part of our product offering to ultimately add more value for our global UCC customers.” – Mark Strassman, SVP and General Manager, Unified Communications and Collaboration (UCC) at LogMeIn

Making Cloud Speech-to-Text more accessible through lower pricing (up to 50% cheaper)

Lowering prices is another way we are making Cloud Speech-to-Text more accessible. Starting now:

For standard models and the premium video model, customers who opt in to our data logging program will now pay 33% less for all usage that goes through the program.
We’ve cut pricing for the premium video model by 25%, for a total savings of 50% for current video model customers who opt in to data logging (the 25% cut combined with the 33% data logging discount: 0.75 × 0.67 ≈ 0.50).

Making Cloud Text-to-Speech accessible across more countries

We’re also excited to help enterprises benefit from our research and experience in speech synthesis. Thanks to unique access to WaveNet technology powered by Google Cloud TPUs, we can build new voices and languages faster and more easily than is typical in the industry. Since our update last August, we’ve made dramatic progress on Cloud Text-to-Speech, roughly doubling the number of overall voices, WaveNet voices, and WaveNet languages, and increasing the number of supported languages overall by ~50%, including:

Support for seven new languages or variants: Danish, Portuguese/Portugal, Russian, Polish, Slovak, Ukrainian, and Norwegian Bokmål (all in beta). This update expands the list of supported languages to 21 and enables applications for millions of new end users.
31 new WaveNet voices (and 24 new standard voices) across those new languages. This gives more enterprises around the world access to our speech synthesis technology, which, based on mean opinion score, has already closed the quality gap with human speech by 70%. You can find the complete list of languages and voices here.
20 languages and variants with WaveNet voices, up from nine last August—and up from just one a year ago when Cloud Text-to-Speech was introduced—marking a broad international expansion for WaveNet.

In addition, the Cloud Text-to-Speech Device Profiles feature, which optimizes audio playback on different types of hardware, is now generally available. For example, some customers with call center applications optimize for interactive voice response (IVR), whereas others that focus on content and media (e.g., podcasts) optimize for headphones. In every case, the audio effects are customized for the hardware. (A short code sketch at the end of this post shows a WaveNet voice and a device profile used together.)

Get started today

It’s easy to give the Cloud Speech products a try—check out the simple demos on the Cloud Speech-to-Text and Cloud Text-to-Speech landing pages. If you like what you see, you can use the $300 GCP credit to start testing. And as always, the first 60 minutes of audio you process every month with Cloud Speech-to-Text is free.
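Here is the device profile sketch promised above, combining one of the new WaveNet voices with an audio effects profile via the Cloud Text-to-Speech Python client. The voice name and profile ID are illustrative; check the published voice list for current values.

```python
# Minimal sketch: synthesize speech with a WaveNet voice and a device
# profile (Cloud Text-to-Speech Python client). The voice name and
# effects profile are illustrative; consult the voice list for
# currently available values.
from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(text="Hej, verden!"),  # Danish "Hello, world!"
    voice=texttospeech.VoiceSelectionParams(
        language_code="da-DK",     # Danish, one of the newly added languages
        name="da-DK-Wavenet-A",    # a WaveNet voice (illustrative)
    ),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3,
        # Device profile tuned for phone-line playback (the IVR use case):
        effects_profile_id=["telephony-class-application"],
    ),
)

with open("output.mp3", "wb") as out:
    out.write(response.audio_content)  # MP3 bytes optimized for telephony
```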
Source: Google Cloud Platform