Streamline data management and governance with the unification of Data Catalog and Dataplex

Today, we are excited to announce that Google Cloud Data Catalog will be unified with Dataplex into a single user interface. With this unification, customers have a single experience to search and discover their data, enrich it with relevant business context, organize it by logical data domains, and centrally govern and monitor their distributed data with built-in data intelligence and automation capabilities. Customers now have access to an integrated metadata platform that connects technical and operational metadata with business metadata, and then uses this augmented, active metadata to drive intelligent data management and governance.

The enterprise data landscape is becoming increasingly diverse and distributed, with data spread across multiple storage systems, each having its own way of handling metadata, security, and governance. This creates a tremendous amount of operational complexity and, in turn, strong market demand for a metadata platform that can power consistent operations across distributed data.

Dataplex provides a data fabric to automate data management, governance, discovery, and exploration across distributed data at scale. With Dataplex, enterprises can easily organize their data into data domains and delegate ownership, usage, and sharing of data to the data owners who have the right business context, while still maintaining a single pane of glass to consistently monitor and govern data across the various data domains in their organization.

Prior to this unification, data owners, stewards, and governors had to use two different interfaces: Dataplex to organize, manage, and govern their data, and Data Catalog to discover, understand, and enrich their data. With this unification, we are creating a single, coherent user experience in which customers can automatically discover and catalog all the data they own, understand data lineage, check for data quality, augment that metadata with relevant business context, organize data into business domains, and then use that combined metadata to power data management. Together, this provides an integrated experience that serves the full spectrum of data governance needs in an organization, enabling data management at scale.

"With Data Catalog now being part of Dataplex, we get a unified, simplified, and streamlined experience to effectively discover and govern our data, which enables team productivity and analytics agility for our organization. We can now use a single experience to search and discover data with relevant business context, organize and govern this data based on business domains, and enable access to trusted data for analytics and data science – all within the same platform," said Elton Martins, Senior Director of Data Engineering at Loblaw Companies Limited.

Getting started

Existing Data Catalog and Dataplex customers, as well as new customers, can now start using Dataplex for metadata discovery, management, and governance. Please note that while the user interface is unified with this release, all existing APIs and feature functionality of both products will continue to work as before. To learn more, please refer to the technical documentation or contact the Google Cloud sales team.
Source: Google Cloud Platform

Using Pacemaker for SAP high availability on Google Cloud – Part 1

Problem Statement

Maintaining business continuity for your mission-critical systems usually demands high availability (HA) solutions that fail over without human intervention. If you are running SAP HANA or SAP NetWeaver (SAP NW) on Google Cloud, the OS-native high availability (HA) cluster capability provided by Red Hat Enterprise Linux (RHEL) for SAP and SUSE Linux Enterprise Server (SLES) for SAP is often adopted as the foundational functionality that provides business continuity for your SAP system. This blog introduces basic terminology and concepts of the Red Hat and SUSE HA implementations of the Pacemaker cluster software for the SAP HANA and NetWeaver platforms.

Pacemaker Terminology

Resource

A resource in Pacemaker is the service made highly available by the cluster. For SAP HANA, there are two resources: HANA and HANA Topology. For SAP NetWeaver Central Services, there are also two resources: one for the Central Services instance that runs the Message Server and Enqueue Server (ASCS in NW ABAP or SCS in NW Java), and another for the Enqueue Replication Server (ERS). In the Pacemaker cluster, we also configure other resources that serve other functions, such as the Virtual IP (VIP) or the Internal Load Balancer (ILB) health check mechanism.

Resource agent

A resource agent manages each resource. It defines the logic for the resource operations called by the Pacemaker cluster to start, stop, or monitor the health of resources. Resource agents are usually Linux bash or Python scripts that implement the functions for the resource agent operations. Resource agents that manage SAP resources are co-developed by SAP and the OS vendors. They are open sourced on GitHub, and the OS vendors package the SAP resource agents downstream for their Linux distributions.

- For HANA scale-up, the resource agents are "SAPHana" and "SAPHanaTopology".
- For HANA scale-out, the resource agents are "SAPHanaController" and "SAPHanaTopology".
- For NetWeaver Central Services, the resource agent is "SAPInstance".

Why are there two resource agents to manage HANA? "SAPHanaTopology" is responsible for monitoring the HANA topology status on all cluster nodes and updating the HANA-relevant cluster properties. Those attributes are read by "SAPHana" as part of its HANA monitoring function. Resource agents are usually installed in the directory `/usr/lib/ocf/resource.d/`.

Resource operation

A resource can have what is called a resource operation. Resource operations are the major types of actions: monitor, start, stop, promote, and demote. They work as their names suggest; for example, a "promote" operation promotes a resource in the cluster. The actions are built into the respective resource agent scripts.

Properties of an operation (a configuration sketch follows this list):

- interval – if set to a nonzero value, defines how frequently the operation occurs after the first monitor action completes.
- timeout – defines the amount of time the operation has to complete before it is aborted and considered failed.
- on-fail – defines the action to be executed if the operation fails. The default action for the 'stop' operation is 'fence', and the default for all other operations is 'restart'.
- role – run the operation only on the node that the cluster thinks should be in the specified role. A role can be master or slave, started or stopped. The role provides context for Pacemaker to make resource location and operation decisions.
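As an illustration, the sketch below shows how these operation properties might appear on a SAP HANA primitive when using the crm shell on SLES. The resource name, SID TST, and instance number 00 are borrowed from the example cluster output later in this post; the interval and timeout values are placeholders, and the authoritative definitions come from your OS vendor's and Google Cloud's SAP deployment guides.

    # Illustrative only: a SAPHana primitive with explicit operations.
    # Each "op" line sets the interval, timeout, role, or on-fail
    # properties described above (on-fail="fence" is already the
    # default for the stop operation).
    crm configure primitive rsc_SAPHana_TST_HDB00 ocf:suse:SAPHana \
      params SID="TST" InstanceNumber="00" \
      op start interval="0" timeout="3600" \
      op stop interval="0" timeout="3600" on-fail="fence" \
      op promote interval="0" timeout="3600" \
      op monitor interval="60" role="Master" timeout="700" \
      op monitor interval="61" role="Slave" timeout="700"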
Resource group

Resources can be grouped into administrative units whose members depend on one another and need to be started sequentially and stopped in the reverse order. While technically each cluster resource fails over one at a time, logically (to simplify cluster configuration) failover is configured for resource groups. For SAP HANA, for example, there is typically one resource group containing both the VIP resource and the ILB health check resource.

Resource constraints

Constraints determine the behavior of a resource in a cluster. The categories of constraints are location, order, and colocation. The list below describes the constraints used in SLES and RHEL:

- Location constraint – determines on which nodes a resource can run; e.g., pins each fence device to the other host VM.
- Order constraint – determines the order in which resources run; e.g., first start the SAPHanaTopology resource, then start the SAPHana resource.
- Colocation constraint – determines that the location of one resource depends on the location of another resource; e.g., the IP address resource group should be on the same host as the primary HANA instance.

Fencing and fence agent

A fencing or fence agent is an abstraction that allows a Pacemaker cluster to isolate problematic cluster nodes, or cluster resources whose state cannot be determined. Fencing can be performed at either the cluster node level or the cluster resource/resource group level. It is most commonly performed at the cluster node level, by remotely power-cycling the problematic cluster node or by disabling its access to the network. Similar to resource agents, fence agents are also usually bash or Python scripts. The two fence agents commonly used on Google Cloud are "gcpstonith" and "fence_gce", with "fence_gce" being the more robust successor of "gcpstonith". Fence agents use the Compute Engine reset API to fence problematic nodes. The fencing resource "gcpstonith" is usually downloaded and saved in the directory `/usr/lib64/stonith/plugins/external`. The "fence_gce" agent ships with the RHEL and SLES images that include the HA extension.

Corosync

Corosync is an important piece of a Pacemaker cluster whose effect on the cluster is often undervalued. Corosync enables servers to interact as a cluster, while Pacemaker provides the ability to control how the cluster behaves. Corosync provides messaging and membership functionality, among other things:

- It maintains the quorum information.
- It is used by all cluster nodes to communicate and coordinate cluster tasks.
- Its configuration is stored by default in /etc/corosync/corosync.conf.
- If there is a communication failure or timeout within Corosync, a membership change or fencing action is performed.

Clones and Clone Sets

Clones represent resources that can become active on multiple hosts without requiring the creation of a unique resource definition for each host. When resources are grouped across hosts, we call this a clone set. There are different types of cloned resources. The main clone set of interest for SAP configurations is the stateful (promotable) clone, which represents a resource with a particular role. In the context of the SAP HANA database, the primary and secondary database instances are contained within the SAPHana clone set; a sketch of how the clones and constraints might be defined follows.
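To make the clone and constraint concepts concrete, here is an illustrative sketch using the crm shell on SLES, reusing the clone set and group names from the example output below. It is not a complete cluster configuration; the exact names, scores, and meta attributes should come from the official SLES and Google Cloud deployment guides.

    # Illustrative only: clone SAPHanaTopology so it runs on both nodes.
    crm configure clone cln_SAPHanaTopology_TST_HDB00 rsc_SAPHanaTopology_TST_HDB00 \
      meta clone-node-max="1" interleave="true"

    # Illustrative only: a promotable (master/slave) clone set for SAPHana;
    # newer Pacemaker releases express this as a clone with promotable="true".
    crm configure ms msl_SAPHana_TST_HDB00 rsc_SAPHana_TST_HDB00 \
      meta clone-max="2" master-max="1" interleave="true"

    # Order constraint: start SAPHanaTopology before SAPHana.
    crm configure order ord_SAPHana_TST_HDB00 Optional: \
      cln_SAPHanaTopology_TST_HDB00 msl_SAPHana_TST_HDB00

    # Colocation constraint: keep the g-primary group (VIP and health
    # check) on the node that runs the primary (Master) HANA instance.
    crm configure colocation col_saphana_ip_TST_HDB00 4000: \
      g-primary:Started msl_SAPHana_TST_HDB00:Master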
Conclusion

Now that you have read through the terminology, let's see what an SAP Pacemaker cluster looks like on each OS.

SLES:

There are two nodes in the cluster and both are online:

  * Online: [ node-x node-y ]

The STONITH resource is started on each node and uses the "gcpstonith" fence agent:

  * STONITH-node-x      (stonith:external/gcpstonith):   Started node-y
  * STONITH-node-y      (stonith:external/gcpstonith):   Started node-x

There is a resource group called g-primary that contains both the IPaddr2 resource agent, which adds the ILB forwarding rule IP address to the NIC of the active node, and the anything resource agent, which starts the program 'socat' to respond to ILB health check probes:

    * rsc_vip_int-primary       (ocf::heartbeat:IPaddr2):        Started node-y
    * rsc_vip_hc-primary        (ocf::heartbeat:anything):       Started node-y

There is a clone set for the SAPHanaTopology resource agent containing the two nodes:

  * Clone Set: cln_SAPHanaTopology_TST_HDB00 [rsc_SAPHanaTopology_TST_HDB00]

There is a clone set for the SAPHana resource agent containing a master and a slave node:

  * Clone Set: msl_SAPHana_TST_HDB00 [rsc_SAPHana_TST_HDB00] (promotable)

Note: one of the clone sets is marked as promotable. If a clone is promotable, its instances can perform a special role that Pacemaker manages via the promote and demote operations of the resource agent.

RHEL:

There are two nodes in the cluster and both are online:

  * Online: [ rhel182ilb01 rhel182ilb02 ]

The STONITH resource is started on the opposite node and uses the more robust "fence_gce" fence agent:

  * STONITH-rhel182ilb01 (stonith:fence_gce): Started rhel182ilb02
  * STONITH-rhel182ilb02 (stonith:fence_gce): Started rhel182ilb01

There is a resource group called g-primary that contains both the IPaddr2 resource agent, which adds the ILB forwarding rule IP address to the NIC of the active node, and the haproxy resource agent, which starts the program 'haproxy' to respond to ILB health check probes:

  * rsc_healthcheck_R82        (service:haproxy):       Started rhel182ilb02
  * rsc_vip_R82_00             (ocf::heartbeat:IPaddr2):        Started rhel182ilb02

There is a clone set for the SAPHanaTopology resource agent containing the two nodes:

  * Clone Set: SAPHanaTopology_R82_00-clone [SAPHanaTopology_R82_00]

There is a clone set for the SAPHana resource agent containing a master and a slave node:

  * Clone Set: SAPHana_R82_00-clone [SAPHana_R82_00] (promotable)

If you compare the SLES and RHEL clusters above, even though they are completely different clusters, you can see the similarities in the technologies used to perform cluster operations.

Congratulations! You should now have a firm grasp of the key areas and terms of an SAP cluster running on Google Cloud.

Where to go from here? Review our other blogs to become an expert in understanding your cluster and its behavior:

- What's happening in your SAP systems? Find out with Pacemaker Alerts
- Analyze Pacemaker events in Cloud Logging
Source: Google Cloud Platform

Amazon SageMaker Automatic Model Tuning now supports increased limits to improve the accuracy of your models

Amazon SageMaker Automatic Model Tuning lets you find the most accurate version of a machine learning (ML) model by determining the optimal set of hyperparameter configurations for your dataset. SageMaker Automatic Model Tuning now supports increased limits for two service quotas: up to 50% more total training jobs that can be run per tuning job, and a higher maximum number of hyperparameters that can be searched per tuning job.
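For orientation, these two quotas surface in the ResourceLimits and ParameterRanges sections of a tuning job definition. The following is a partial, hypothetical sketch using the AWS CLI; the job name, objective metric, hyperparameter names, and numeric values are placeholders, and the training job definition file is not shown. Check the current SageMaker service quotas for the exact limits in your account and Region.

    # Illustrative only: ResourceLimits caps the total and parallel training
    # jobs per tuning job; ParameterRanges lists the hyperparameters to search.
    aws sagemaker create-hyper-parameter-tuning-job \
      --hyper-parameter-tuning-job-name my-tuning-job \
      --hyper-parameter-tuning-job-config '{
        "Strategy": "Bayesian",
        "HyperParameterTuningJobObjective": {"Type": "Minimize", "MetricName": "validation:rmse"},
        "ResourceLimits": {"MaxNumberOfTrainingJobs": 500, "MaxParallelTrainingJobs": 10},
        "ParameterRanges": {
          "ContinuousParameterRanges": [{"Name": "eta", "MinValue": "0.01", "MaxValue": "0.3"}]
        }
      }' \
      --training-job-definition file://training-job-definition.json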
Source: aws.amazon.com

Amazon Aurora PostgreSQL-Compatible Edition now supports R6i instances

Amazon Aurora now supports R6i instances powered by 3rd-generation Intel Xeon Scalable processors. R6i instances are the 6th generation of memory-optimized Amazon EC2 instances, designed for memory-intensive workloads. These instances are built on the AWS Nitro System, a combination of dedicated hardware and a lightweight hypervisor that delivers practically all of the compute and memory resources of the host hardware to your instances. R6i instances are currently available with the Amazon Aurora PostgreSQL-Compatible Edition.
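As a quick illustration of how the new instance class would be selected, here is a hypothetical AWS CLI sketch that adds an R6i instance to an existing Aurora PostgreSQL cluster; the identifiers are placeholders, and you should verify db.r6i.* class and engine version availability in your Region.

    # Illustrative only: add an instance using an R6i memory-optimized
    # class to an existing Aurora PostgreSQL cluster.
    aws rds create-db-instance \
      --db-instance-identifier my-aurora-pg-r6i-1 \
      --db-cluster-identifier my-aurora-pg-cluster \
      --engine aurora-postgresql \
      --db-instance-class db.r6i.large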
Source: aws.amazon.com