An annual roundup of Google Data Analytics innovations

October 23rd (this past Sunday) was my 5th Googleversary, and we just wrapped up an incredible Google Next 2022! It was great to see so many customers and colleagues in person this year in New York City. This blog shares the progress we have made since last year (see my 4th-anniversary post from 2021 Next).

Bringing BigQuery to the heart of your Data Cloud

Since last year we have made significant progress across the whole portfolio. I want to start with BigQuery, which is at the heart of our customers’ Data Cloud. We have enhanced BigQuery with key launches like multi-statement transactions, search and operational log analytics, native JSON support, the slot recommender, interactive SQL translation from dialects such as Teradata, Hive, and Spark, materialized view enhancements, and table snapshots. Additionally, we have launched various SQL language enhancements, accelerated customer cloud migrations with BigQuery migration services, and introduced scalable data transformation pipelines in BigQuery using SQL with the Dataform preview.

One of the most significant enhancements to BigQuery is support for unstructured data through object tables. Object tables enable you to apply common security and governance across all of your data, so you can now build data products that unify structured and unstructured data in BigQuery.

To support data openness, at Next ’22 we announced the general availability of BigLake, which helps you break down data silos by unifying lakes and warehouses. BigLake adds support for Apache Iceberg, which is becoming the standard open source table format for data lakes. And soon, we’ll add support for formats including Delta Lake and Hudi.

To help customers bring analytics to their data irrespective of where it resides, we launched BigQuery Omni. We are now adding capabilities such as cross-cloud transfer and larger cross-cloud query results, which make it easier to combine and analyze data across cloud environments. We also launched on-demand pricing support, which lets you get started with BigQuery Omni at a low cost.

To help customers break down data boundaries across organizations, we launched Analytics Hub, a data exchange platform that enables organizations to create private or public exchanges with their business partners. We have added Google data, which includes highly valuable datasets like Google Trends. With hundreds of partners sharing valuable commercial datasets, Analytics Hub helps customers reach data beyond their organizational walls. We also partnered with the Google Earth Engine team to use BigQuery to get access to, and value from, the troves of satellite imagery data available within Earth Engine.

We’ve also invested in bringing BigQuery together with operational databases to help customers build intelligent, data-driven applications. Innovations include federated queries for Spanner, Cloud SQL, and Bigtable, allowing customers to analyze data residing in operational databases in real time with BigQuery. At Next ’22, we announced Datastream for BigQuery, which provides easy replication of data from operational database sources such as AlloyDB, PostgreSQL, MySQL, and Oracle directly into BigQuery with a few simple clicks.

From Data to AI, with built-in intelligence for BigQuery and Vertex AI

We launched BigQuery Machine Learning (BigQuery ML) in 2018 to make machine learning accessible to data analysts and data scientists across the globe; a minimal sketch of the workflow appears below.
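
To make this concrete, here is a minimal sketch of the BigQuery ML workflow using the google-cloud-bigquery Python client. The dataset, table, and column names are hypothetical placeholders, so treat this as an illustration rather than a copy-paste recipe.

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

# Train a logistic regression model with BigQuery ML.
# `mydataset.visits` and its columns are hypothetical.
client.query("""
    CREATE OR REPLACE MODEL `mydataset.purchase_model`
    OPTIONS (model_type = 'logistic_reg',
             input_label_cols = ['purchased']) AS
    SELECT country, pageviews, purchased
    FROM `mydataset.visits`
""").result()  # .result() blocks until the training job completes

# Batch predictions with ML.PREDICT; the label column comes back
# as `predicted_purchased`.
rows = client.query("""
    SELECT country, predicted_purchased
    FROM ML.PREDICT(
        MODEL `mydataset.purchase_model`,
        (SELECT country, pageviews FROM `mydataset.visits`))
""").result()

for row in rows:
    print(row.country, row.predicted_purchased)
```
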
Today, customers create millions of models and run tens of millions of predictions every month using BigQuery ML. Vertex AI enables MLOps, from model development to production deployment and real-time prediction. Over the past year we have tightly integrated BigQuery and Vertex AI to simplify the ML experience. You can now create models in BigQuery using BigQuery ML that are instantly visible in the Vertex AI Model Registry. You can then deploy these models directly to Vertex AI endpoints for real-time serving, use Vertex AI Pipelines to monitor and train models, and view detailed explanations for your predictions through the BigQuery ML and Vertex AI integration.

Additionally, we announced an integration between Colab and BigQuery that allows users to explore query results quickly in a data science notebook. Colab was developed by Google Research to let users execute arbitrary Python code, and it has become a favorite tool for data scientists and machine learning researchers. The BigQuery integration enables seamless workflows for data scientists to run descriptive statistics, generate visualizations, create a predictive analysis, or share results with others. To learn more about innovations that bring data and AI closer together, check out my session at Next with June Yang, VP of Cloud AI and Industry Solutions.

Delivering the best of open source

We have always believed in making Google Cloud the best platform to run open source software. Cloud Dataproc enables you to run various OSS engines like Spark, Flink, and Hive, and we have made many enhancements to Dataproc over the past year. One of the most significant was our Serverless Spark offering, which lets you get away from managing clusters and focus on simply running Spark jobs. At Cloud Next 2022, we added built-in support for Apache Spark in BigQuery, which allows data practitioners to create BigQuery stored procedures that unify their Spark work with their SQL pipelines. This also provides integrated BigQuery billing with access to a curated library of highly valuable internal and external assets.

Powering streaming analytics

Streaming analytics is a key area of differentiation for Google Cloud, with products like Cloud Dataflow and Cloud Pub/Sub. This year, our goal was to push the boundaries of innovation in real-time processing through Dataflow Prime and to make it seamless for real-time data arriving in Pub/Sub to land in BigQuery for advanced analytics. At the beginning of the year, we made over 25 new Dataflow templates generally available. At July’s Data Engineer Spotlight, we made Dataflow Prime, Dataflow ML, and Dataflow Go generally available. We also introduced a number of new observability features for Dataflow to give you more visibility into, and control over, your Dataflow pipelines.

Earlier this year we introduced a new type of Pub/Sub subscription called a “BigQuery subscription” that writes directly from Cloud Pub/Sub to BigQuery. With this integration, customers no longer need to pay for data ingestion into BigQuery; you only pay for the Pub/Sub you use. A sketch of creating such a subscription follows below.
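
As a hedged illustration, here is how one might create a BigQuery subscription with the google-cloud-pubsub Python client (v2.13+). The project, topic, subscription, and table names are hypothetical placeholders.

```python
from google.cloud import pubsub_v1

project_id = "my-project"  # hypothetical

publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()

topic_path = publisher.topic_path(project_id, "events")
subscription_path = subscriber.subscription_path(project_id, "events-to-bq")

# Route every message on the topic straight into a BigQuery table.
bigquery_config = pubsub_v1.types.BigQueryConfig(
    table="my-project.analytics.events",  # hypothetical destination table
    write_metadata=True,                  # also store message metadata columns
)

with subscriber:
    subscription = subscriber.create_subscription(
        request={
            "name": subscription_path,
            "topic": topic_path,
            "bigquery_config": bigquery_config,
        }
    )
    print(f"Created BigQuery subscription: {subscription.name}")
```
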
Unified business intelligence

In February 2020 we closed the Looker acquisition, and since then we have been busy building Looker capabilities and integrating them into Google Cloud. Additionally, Data Studio has been our self-service BI offering for many years; it has the strongest tie-in with BigQuery, and many of our BigQuery customers use it. At Next ’22, we announced that we are bringing all BI assets under the single umbrella of Looker. Data Studio will become Looker Studio and include a paid version that provides enterprise support. With tight integration between Looker and Google Workspace productivity tools, customers gain easy access, via spreadsheets and other documents, to consistent, trusted answers from curated data sources across their organization. Looker integration with Google Sheets is in preview now, and increased accessibility of BigQuery through Connected Sheets allows more people to analyze large amounts of data. You can read more details here.

Intelligent data management and governance

Lastly, a challenge that is top of mind for all data teams is data management and governance across distributed data systems. Our data cloud provides customers with an end-to-end data management and governance layer, with built-in intelligence to help enable trust in data and accelerate time to insights. Earlier this year we launched Dataplex as our data management and governance service; it helps organizations centrally manage and govern distributed data. Furthermore, we unified Data Catalog with Dataplex to provide a streamlined experience for customers to centrally discover their data with business context and to govern and manage that data with built-in data intelligence. At Next we introduced data lineage capabilities in Dataplex, providing end-to-end lineage from data ingestion through analysis to ML models. Advancements in automatic data quality in Dataplex help ensure confidence in your data, which is critical for accurate predictions. Based on customer input, we’ve also added enhanced data discovery, with automatic cataloging of databases and Looker assets and a business glossary, as well as a Spark-powered data exploration workbench. And Dataplex is now fully integrated with BigLake, so you can manage fine-grained access control at scale.

An open data ecosystem

Over the past five years, the Data Analytics team’s goal has been to make Google Cloud the best place to run analytics. One of the key tenets of this has been ensuring we have the most vibrant partner ecosystem. We have a rich ecosystem of hundreds of technology partner integrations, and more than 40 partners have been certified through the Cloud Ready-BigQuery initiative. Additionally, more than 800 technology partners are building their applications on top of our Data Cloud. Data sharing continues to be one of the top capabilities these partners leverage to easily share information at any scale with their enterprise customers. We also announced new updates and integrations with Collibra, Elastic, MongoDB, Palantir, ServiceNow, Sisu Data, Reltio, Striim, and Qlik to help customers move data between the platforms of their choice and bring more of Google’s Data Cloud capabilities to partner platforms.

Finally, we established a Data Cloud Alliance together with 17 of our key partners, who provide the most widely adopted and fastest-growing enterprise data platforms today across analytics, storage, databases, and business intelligence. Our mission is to collaborate on solving modern data challenges and provide an accelerated path to value. The first key focus areas are data interoperability, data governance, and closing the skills gap through education.

Customer momentum across a variety of industries and use cases

We’re super excited that organizations shared their Data Cloud best practices at Next, including Walmart, Boeing, Twitter, Televisa Univision, L’Oreal, CNA Insurance, Wayfair, MLB, British Telecom, Telus, Mercado Libre, LiveRamp, and Home Depot.

Check out all the Data Analytics sessions and resources from Next and get started on your Data Cloud journey today. We look forward to hearing your story at a future Google Cloud event.
Quelle: Google Cloud Platform

Forrester Total Economic Impact study: Azure Arc delivers 206 percent ROI over 3 years

Businesses today are building and running cloud-based applications to drive their business forward. As they build these applications, they need to take full advantage of the agility, efficiency, and speed of cloud innovation. However, not all applications, and not all of the infrastructure they run on, can physically reside in the public cloud. That’s why 86 percent of enterprises plan to increase investment in hybrid or multicloud environments.

We’re building Azure to meet you where you are, so you can do more with your existing investments. We also want you to be able to stay agile and flexible when extending Azure to your on-premises, multicloud, and edge environments.

Azure Arc delivers on these needs. Azure Arc is a bridge that extends the Azure platform so you can build applications and services with the flexibility to run across datacenters, edge, and multicloud environments.

For the 2022 commissioned study, The Total Economic Impact™ of Microsoft Azure Arc for Security and Governance, Forrester Consulting interviewed four organizations with experience using Azure Arc. These organizations serve global markets in the industries of manufacturing, energy, and financial services. According to the aggregated data, Azure Arc demonstrated:

A 206 percent return on investment (ROI) over three years with payback in less than six months.
A 30 percent gain in productivity for IT Operations team members.
An 80 percent reduction in risk of data breach from unsecured infrastructure.
A 15 percent reduction in spending on third-party tools.

The Forrester study provides a framework for organizations that want to evaluate the potential financial impact of using Azure Arc for infrastructure security and governance. Forrester found that organizations with hybrid or multicloud strategies can realize productivity gains and reduce security risks by using Microsoft Azure Arc to secure and govern non-Azure infrastructure alongside Azure resources.

Productivity gains with Azure Arc’s single-pane view

The organizations in Forrester’s study reported that after implementing Azure Arc, their IT Operations personnel realized a 30 percent gain in productivity from savings in time spent on regular duties such as configuring and updating infrastructure, managing policies and permissions, troubleshooting and resolving issues, and other tasks that don’t directly drive business. With Azure Arc, IT teams can observe, secure, and govern diverse infrastructure and applications from a single pane of glass in Azure. Leveraging Azure services enables them to be more agile, respond more efficiently, and free up time to serve business interests with higher-value tasks. As the sketch below illustrates, a single Azure Resource Graph query can inventory Arc-enabled machines alongside native Azure resources.
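
As a hedged illustration of that single-pane view, the following Python sketch uses the azure-identity and azure-mgmt-resourcegraph packages to list Arc-enabled servers with one Resource Graph query. The subscription ID is a placeholder, and the projected properties field is an assumption about the machine resource shape.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.resourcegraph import ResourceGraphClient
from azure.mgmt.resourcegraph.models import QueryRequest

credential = DefaultAzureCredential()
client = ResourceGraphClient(credential)

# Arc-enabled servers surface as ARM resources of type
# microsoft.hybridcompute/machines, so the same query language that
# covers Azure VMs also covers on-premises and multicloud machines.
request = QueryRequest(
    subscriptions=["<subscription-id>"],  # placeholder
    query="""
        Resources
        | where type == 'microsoft.hybridcompute/machines'
        | project name, location, properties.status
    """,
)

response = client.resources(request)
for machine in response.data:  # rows come back as plain dicts
    print(machine)
```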

“We’re just making everyone’s lives so much easier so they can do other things. If there is an issue, for example, you don’t have to spend a week troubleshooting.”—Architect, Cloud products, Energy.

Cost savings and streamlined infrastructure through the Azure portal

Most organizations today run a mix of applications in on-premises datacenters, in the cloud, and at the edge. These disparate environments often drive investment in multiple management tools specific to each technology platform, leading to tool sprawl and excessive costs.

By moving to a single view of infrastructure and resources in the Azure portal enabled by Azure Arc, organizations could eliminate their legacy management tools, reducing licensing expenditures and eliminating costly on-premises management infrastructure. With Azure’s flexible consumption-based pricing, they are no longer locked into long-term contracts or capacity limits.

The composite organization in the Forrester study saved $900,000 in year three from reduced spending on third-party tools—a 15 percent decrease.

"When I do dive in, I actually have a faster understanding of [our infrastructure]. So the benefit to me is that I have greater visibility—I need to ask [the team] fewer questions. The [Azure Arc] dashboard is […] very easy."—VP of IT, Finance.

Microsoft Defender for Cloud and Microsoft Sentinel modernize security operations

Azure Arc helps organizations combat rapidly evolving security threats with increased efficiency by enabling the use of Microsoft security services such as Microsoft Defender for Cloud and Microsoft Sentinel across hybrid and multicloud environments.

Forrester found that the composite organization lowered the risk of a data breach from unsecured infrastructure by 80 percent after adopting Azure Arc and Microsoft security services. After onboarding Azure Arc, the organization uncovered noncompliant assets running on-premises or in edge environments and updated them to the latest security standards. This saved hundreds of thousands of dollars that would otherwise have been spent managing breaches.

"With Azure Arc, we gained real insights into our infrastructure, including infrastructure [another cloud provider]. That helped us identify architecture [gaps] as well as controls to improve security compliance. [With Azure Arc], we found that around 20 percent of our infrastructure had been noncompliant."—Deputy IT Director, Manufacturing.

Learn more

Azure Arc is a bridge that extends the Azure platform to help customers build applications and services with the flexibility to run across datacenters, at the edge, and in multicloud environments. Get started today and do more with your existing investments. We welcome you to try it for free. You can also learn more about how other customers are using Azure Arc to innovate anywhere.

Download the full report: The Total Economic Impact™ of Microsoft Azure Arc for Security and Governance.
To learn more about Azure Arc, visit our website.

Quelle: Azure

How to Implement Decentralized Storage Using Docker Extensions

This is a guest post written by Marton Elek, Principal Software Engineer at Storj.

In part one of this two-part series, we discussed the intersection of Web3 and Docker at a conceptual level. In this post, it’s time to get our hands dirty and review practical examples involving decentralized storage.

We’d like to see how we can integrate Web3 projects with Docker. At the beginning we have to choose from two options:

We can use Docker to containerize any Web3 application. We can also start an IPFS daemon or an Ethereum node inside a container; Docker resembles an infrastructure layer, since we can run almost anything within containers. (A minimal sketch of this follows after this list.)
What’s most interesting is integrating Docker itself with Web3 projects. That includes using Web3 to help us when we start containers or run something inside containers. In this post, we’ll focus on this portion.
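
For the first option, here is a minimal sketch that uses the docker Python SDK (docker-py) to start an IPFS daemon in a container. The container name and published ports are illustrative choices; the image is the official ipfs/kubo image.

```python
import docker

client = docker.from_env()

# Run the IPFS daemon (ipfs/kubo) in the background, publishing the
# HTTP API (5001) and gateway (8080) ports to localhost.
container = client.containers.run(
    "ipfs/kubo",
    name="ipfs-node",  # illustrative name
    detach=True,
    ports={"5001/tcp": 5001, "8080/tcp": 8080},
)

print(container.short_id)
print(container.logs(tail=5).decode())  # peek at daemon startup output
```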

The two most obvious integration points for a container engine are execution and storage. We choose storage here since more mature decentralized storage options are currently available. There are a few interesting approaches for decentralized versions of cloud container runtimes (like ankr), but they’re more likely replacements for container orchestrators like Kubernetes — not the container engine itself.

Let’s use Docker with decentralized storage. Our example uses Storj, but all of our examples apply to almost any decentralized cloud storage solution.

Storj is decentralized cloud storage where node providers are compensated to host the data, while the metadata servers (which manage the locations of the encrypted pieces) are federated: many interoperable central servers can work together with the storage providers.

It’s important to mention that decentralized storage almost always requires you to use a custom protocol. A traditional HTTP upload is a connection between one client and one server. Decentralization requires uploading data to multiple servers. 

Our goal is simple: we’d like to use docker push and docker pull commands with decentralized storage instead of a central Docker registry. In our latest DockerCon presentation, we identified multiple approaches:

We can change Docker and containerd to natively support different storage options.
We can provide tools that magically download images from decentralized storage and persist them in the container engine’s storage location (in the right format, of course).
We can run a service which translates familiar Docker registry HTTP requests to a protocol specific to the decentralized cloud. Users can manage this service themselves, or it can be offered as a managed service.

Leveraging native support

I believe the ideal solution would be to extend Docker (and/or the underlying containerd runtime) to support different storage options. But this is definitely a bigger challenge. Technically, it’s possible to modify every service, but massive adoption and a big user base mean that large changes require careful planning.

Currently, it’s not readily possible to extend the Docker daemon to use special push or pull targets. Check out our presentation on extending Docker if you’re interested in technical deep dives and integration challenges. The best solution might be a new container plugin type, which is being considered.

One benefit of this approach would be good usability: users could keep using the familiar push and pull commands, while, depending on the host configuration, the container layers would be sent to decentralized storage.

Using tool-based push and pull

Another option is to upload or download images with an external tool, which can directly use remote decentralized storage and save images to the container engine’s storage directory.

One example of this approach (but with centralized storage) is the AWS ECR container resolver project. It provides a CLI tool which can pull and push images using a custom source. It also saves them as container images of the containerd daemon.

Unfortunately, this approach also has some strong limitations:

It couldn’t work with a container orchestrator like Kubernetes, since orchestrators aren’t prepared to run custom CLI commands for pulling or pushing images.
It’s containerd specific; the Docker daemon, with its different storage, couldn’t use it directly.
Usability is reduced, since users need different CLI tools.

Using a user-managed gateway

If we can’t push or pull directly to decentralized storage, we can create a service which resembles a Docker registry and meshes with any client, but under the hood uploads the data using the decentralized storage’s native protocol.

This thankfully works well, and the standard Docker registry implementation is already compatible with different storage options (see the sketch below).
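
As a hedged sketch, the snippet below uses docker-py to launch the standard registry:2 image with its S3 storage driver pointed at an S3-compatible gateway (Storj’s hosted gateway endpoint is shown; the bucket and credentials are placeholders). The registry is configured entirely through its documented REGISTRY_STORAGE_* environment variables.

```python
import docker

client = docker.from_env()

# A stock Docker registry whose layers live behind an S3-compatible
# decentralized storage gateway instead of on local disk.
registry = client.containers.run(
    "registry:2",
    name="decentralized-registry",  # illustrative name
    detach=True,
    ports={"5000/tcp": 5000},
    environment={
        "REGISTRY_STORAGE": "s3",
        "REGISTRY_STORAGE_S3_REGIONENDPOINT": "https://gateway.storjshare.io",
        "REGISTRY_STORAGE_S3_REGION": "us-east-1",         # nominal value
        "REGISTRY_STORAGE_S3_BUCKET": "container-images",  # placeholder
        "REGISTRY_STORAGE_S3_ACCESSKEY": "<access-key>",   # placeholder
        "REGISTRY_STORAGE_S3_SECRETKEY": "<secret-key>",   # placeholder
    },
)

# A `docker push localhost:5000/myimage:latest` now stores the image
# layers in the decentralized backend via the gateway.
```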

At Storj, we already have an implementation that we use internally for test images. The nerdctl ipfs subcommand is another good example of this approach (it starts a local registry to access containers from IPFS).

We have problems here as well:

Users should run the gateway on each host, which can be painful alongside Kubernetes or other orchestrators.
The implementation can be more complex and challenging compared to a native upload or download.

Using a hosted gateway

To make this slightly easier, one can provide a hosted version of the gateway. For example, Storj is fully S3 compatible via a hosted (or self-hosted) S3-compatible HTTP gateway. With this approach, users have three options:

Use the native protocol of the decentralized storage, with full end-to-end encryption and every feature.
Use the convenient gateway services and trust the operator of the hosted gateways.
Run the gateway on their own.

While each option is acceptable, a perfect solution still doesn’t exist.

Using Docker Extensions

One of the biggest concerns with using local gateways was usability. Our local registry can help push images to decentralized storage, but it requires additional technical work (configuring and running containers, and so on).

This is where Docker Extensions can help us. Extensions are a new feature of Docker Desktop. You can install them via the Docker Dashboard, and they can provide additional functionality, including new screens, menu items, and options within Docker Desktop. These are discoverable within the Extensions Marketplace.

And this is exactly what we need! A good UI can make Web3 integration more accessible for all users.

Docker Extensions are easily discoverable within the Marketplace, and you can also add them manually (usually during development).

At Storj, we started experimenting with better user experiences by developing an extension for Docker Desktop. It’s still under development and not currently in the Marketplace, but feedback so far has convinced us that it can massively improve usability, which was our biggest concern with almost every available integration option.

Extensions themselves are Docker containers, which makes the development experience very smooth and easy. Extensions can be as simple as a metadata file in a container plus static HTML/JS files. There are special JavaScript APIs that manipulate the Docker daemon state without a backend.

You can also use a specialized backend. The JavaScript part of the extension can communicate with any containerized backend via a mounted socket; a minimal sketch of such a backend follows below.
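
As a hedged sketch of such a backend, here is a minimal Python HTTP service that listens on a Unix socket using only the standard library. The socket path is a hypothetical mount point; a real extension would declare its socket in the extension metadata, and the UI would call it through the extension APIs.

```python
import http.server
import json
import os
import socketserver

SOCKET_PATH = "/run/guest-services/backend.sock"  # hypothetical mount point

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Reply to any GET with a small JSON payload the extension UI can read.
        body = json.dumps({"status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

class UnixHTTPServer(socketserver.UnixStreamServer):
    def get_request(self):
        # BaseHTTPRequestHandler expects a (host, port) client address,
        # but Unix sockets report an empty string, so substitute one.
        request, _ = super().get_request()
        return request, ("local", 0)

if os.path.exists(SOCKET_PATH):
    os.remove(SOCKET_PATH)

with UnixHTTPServer(SOCKET_PATH, Handler) as server:
    server.serve_forever()
```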

The new docker extension command can help you quickly manage extensions (as an example: there’s a special docker extension dev debug subcommand that shows the Web Developer Toolbar for Docker Desktop itself.)

Thanks to the provided developer tools, the challenge is not creating the Docker Desktop extension, but getting the UI and UX right.

Summary

As we discussed in our previous post, Web3 should be defined by user requirements, not by technologies (like blockchain or NFT). Web3 projects should address user concerns around privacy, data control, security, and so on. They should also be approachable and easy to use.

Usability is a core principle of containers, and one reason why Docker became so popular. We need more integration and extension points to make it easier for Web3 projects to give users what they need. Docker Extensions also provide a very powerful way to pair good integration with excellent usability.

We welcome you to try our Storj Extension for Docker (still under development). Please leave any comments and feedback via GitHub.
Quelle: https://blog.docker.com/feed/

AWS Database Migration Service now supports C6i and R6i instance types

AWS Database Migration Service (AWS DMS) now supports Amazon EC2 C6i and R6i instance types. These instances are powered by 3rd Generation Intel Xeon Scalable processors with an all-core turbo frequency of 3.5 GHz, and they offer up to 15 percent better compute price performance than 5th generation instances for a wide variety of workloads, as well as always-on memory encryption with Intel Total Memory Encryption (TME).
Quelle: aws.amazon.com

Amazon Connect Wisdom now offers improved machine learning capabilities

Amazon Connect Wisdom now offers improved machine learning capabilities that continuously understand issues during a call and deliver the right knowledge article to contact center agents. Wisdom analyzes contact center calls in real time and proactively provides agents with the information they need to resolve customer issues, improving agent productivity and caller satisfaction.
Quelle: aws.amazon.com