Announcing Apigee’s native support for managing the lifecycle of GraphQL APIs

Application Programming Interfaces (APIs) are the de facto standard for building and connecting technology solutions. They facilitate software-to-software interactions that allow developers to leverage data and functionality at scale. APIs come in various styles such as REST, gRPC, GraphQL, and AsyncAPI, and each style has its own features. Picking the right API style depends entirely on your use case and on what you are solving for. While REST is widely used, formats like GraphQL have lately been gaining popularity among developers.

GraphQL is a query language for APIs and a runtime for fulfilling those queries with your existing data. Developers are increasingly adopting GraphQL for its flexibility and ease of use. It provides a single endpoint for all data exchange, prevents over- or under-fetching of data, and lets developers make one API call that seamlessly aggregates data from multiple apps and services.

Today, we are excited to announce that Google Cloud’s Apigee API management platform natively supports GraphQL APIs, allowing developers to productize them and manage their lifecycle in Apigee. Before we learn more about these capabilities, let’s take a closer look at GraphQL.

REST and GraphQL API styles differ across many dimensions

- Endpoints: REST provides multiple endpoints and a URL taxonomy that acts as a logical resource map. In GraphQL, there is one endpoint that captures all fields for a given operation.
- Interactions: REST commonly uses HTTP verbs and JSON/XML to exchange data. GraphQL mostly uses the HTTP POST verb; requests are written in the GraphQL query language against a schema defined in its Schema Definition Language (SDL), and standard JSON is returned in the response (see the minimal request sketch further below).
- Documentation: While REST uses OpenAPI specs and portals, GraphQL most frequently employs schema-generated documentation. Developers frequently use browser-based development environments such as GraphQL Playground to interact with schema-based endpoints.
- Discovery: To discover and interact with REST APIs, developers usually use the portal provided by the management vendor, whereas GraphQL APIs tend to come with a built-in portal that lets users explore new queries on the fly.

Therefore, instead of picking one style over another, consider using them together in your API program, to solve for the use cases each is best suited for.

Why GraphQL APIs need to be managed

Regardless of their design style, APIs provide access to your enterprise’s most valuable data and functionality. They are how developers leverage your data for new partnerships and digital services. This means enterprises must control and understand how, and by whom, their APIs are used. Moreover, beyond access management, enterprises need to design and deploy APIs that give developers a first-class experience and help them be productive. Consequently, APIs should not be viewed as snippets of code but rather as digital products that need full lifecycle management.

Packaging GraphQL APIs as products lets you overcome some limitations of this style, such as:

- Limited authorization capabilities, especially for schema browsing
- No standard for throttling or quotas
- No unified analytics for consumption
- Lack of version control

As you scale the adoption of GraphQL APIs for solving business-critical problems, it becomes extremely important to manage those APIs as products. This is where there is a huge opportunity to extend the proven best practices of REST API management to GraphQL.
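To make the interaction model above concrete, here is a minimal sketch of a GraphQL request over HTTP POST, using Python's requests library. The endpoint, schema fields, and order ID are hypothetical; the point is that a single POST to a single endpoint carries a query naming exactly the fields the client wants, and the server replies with standard JSON.

```python
import requests

# Hypothetical GraphQL endpoint; substitute your own service.
GRAPHQL_URL = "https://api.example.com/graphql"

# The query names exactly the fields the client needs, so nothing is
# over- or under-fetched, and related objects are aggregated in one call.
query = """
query OrderSummary($id: ID!) {
  order(id: $id) {
    status
    customer { name email }
    items { sku quantity }
  }
}
"""

response = requests.post(
    GRAPHQL_URL,
    json={"query": query, "variables": {"id": "1042"}},
    timeout=10,
)
response.raise_for_status()
print(response.json()["data"]["order"])
```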
Using Apigee for GraphQL APIs

Apigee is a market-leading, full-lifecycle API management platform trusted by thousands of enterprises across the globe. With the new native support, Apigee now allows you to productize and manage the full lifecycle of GraphQL APIs for consumption, just like REST.

Developers can use the GraphQL policy to:

- Impose restrictions on the payload by setting a maximum on the number of fragments allowed.
- Associate GraphQL APIs with API products.
- Leverage the OAuth2, VerifyAPIKey, and Quota policy features, just as in REST (a client-side sketch of calling a proxy secured this way appears at the end of this post).
- Validate and authenticate requests at the schema level.

Getting Started

Visit the documentation for step-by-step instructions on how to get started. If you are not familiar with Apigee, click here to sign up for a free trial.
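Once a GraphQL proxy is packaged as an API product, clients call it like any other Apigee-managed API: they present an app credential and are subject to the product's quota. The sketch below is illustrative only; the proxy URL is hypothetical, and it assumes the VerifyAPIKey policy is configured to read the key from an x-apikey header, which depends entirely on how your proxy is set up.

```python
import requests

# Hypothetical Apigee-managed GraphQL proxy and developer-app key.
PROXY_URL = "https://myorg-test.apigee.net/v1/orders/graphql"
API_KEY = "replace-with-your-developer-app-key"

query = '{ order(id: "1042") { status } }'

response = requests.post(
    PROXY_URL,
    headers={"x-apikey": API_KEY},  # assumes VerifyAPIKey reads the key from this header
    json={"query": query},
    timeout=10,
)

if response.status_code == 429:
    # The Quota policy attached to the API product rejected the call.
    print("Quota exceeded; retry later")
else:
    response.raise_for_status()
    print(response.json().get("data"))
```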
Source: Google Cloud Platform

Building the data engineering driven organization from first principles

In the “What type of data processing organisation” paper, we examined how you can build a data culture whether your organization consists mostly of data analysts, data engineers, or data scientists. However, the path and technologies for becoming a data-driven innovator differ, and success comes from implementing the right technology in a way that matches the company’s culture. In this blog we expand on data engineering driven organizations and describe how one can be built from first principles.

Not all organizations are alike. All companies have similar functions (sales, engineering, marketing), but not all functions have the same influence on the overall business decisions. Some companies are more engineering-driven, others are sales-driven, others are marketing-driven. In practice, all companies are a mixture of these functions. In the same way, a data strategy can be more focused on data analysts or more focused on data engineering. Culture is a combination of several factors: business requirements, organizational culture, and the skills within the organization.

Traditionally, organizations that focused on engineering mainly came from technology-driven, digital backgrounds. They built their own frameworks, or used programming frameworks, to build repeatable data pipelines. Some of this is due to the way the data is received, the shape it arrives in, and the speed at which it arrives.

If your data allows it, your organization can be more focused on data analysis and not so much on data engineering. If you can apply an Extract-Load-Transform (ELT) approach rather than the classic Extract-Transform-Load (ETL), then you can focus on data analysis and might not need extensive data engineering capability. For example, data that can be loaded directly into the data warehouse allows data analysts to also do data engineering work and apply transformations to the data (a minimal sketch of this pattern appears at the end of this introduction).

This does not happen so often, though. Sometimes your data is messy, inconsistent, bulky, and encoded in legacy file formats or locked inside legacy databases or systems, with little potential to be actionable by data analysts. Or maybe you need to process data in streaming, applying complex event processing to obtain competitive insights in near real time. The value of data decays exponentially with time. Most companies can process data by the next day in batch mode; far fewer can extract that knowledge the second after the data is produced.

In these situations, you need the talent to unveil the insights hidden in that amalgam of data, whether messy or fast changing (or both!). And almost as importantly, you need the right tools and systems to enable that talent.

What are those right tools? Cloud provides the scalability and flexibility that data workloads require in such complex situations. Long gone are the times when data teams had to beg for the resources they needed to have an impact on the business. Data processing systems are no longer scarce, so your data strategy should not create that scarcity artificially.

In this article, we explain how to leverage Google Cloud to enable data teams to do complex processing of data, in batch and streaming. By doing so, your data engineering and science teams can have an impact within seconds of the input data being generated.
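As a hedged illustration of the ELT pattern mentioned above: the raw files land in the warehouse first and are reshaped there with SQL, so the transformation step stays in the hands of analysts. The project, bucket, dataset, and table names below are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-analytics-project")  # hypothetical project

# Extract-Load: land the raw CSV files in BigQuery as-is.
load_job = client.load_table_from_uri(
    "gs://my-raw-bucket/orders/*.csv",          # hypothetical bucket
    "my-analytics-project.staging.orders_raw",  # hypothetical staging table
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
    ),
)
load_job.result()  # wait for the load to finish

# Transform: analysts reshape the data with SQL inside the warehouse.
client.query(
    """
    CREATE OR REPLACE TABLE reporting.daily_orders AS
    SELECT order_date, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM staging.orders_raw
    GROUP BY order_date
    """
).result()
```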
Data engineering driven organizations

When the complexity of your data transformation needs is high, data engineers have a central role in the data strategy of your company, leading to a data engineering driven organization. In this type of organization, data architectures are organized in three layers: business data owners, data engineers, and data consumers.

Data engineers sit at the crossroads between data owners and data consumers, with clear responsibilities:

- Transporting and enriching data while building integrations between analytical systems and operational systems (as in real-time use cases)
- Parsing and transforming messy data coming from business units into meaningful, clean data with documented metadata
- Applying DataOps, that is, functional knowledge of the business plus software engineering methodologies applied to the data lifecycle
- Deploying models and other artifacts that analyze or consume data

Business data owners are cross-functional, domain-oriented teams. These teams know the business in detail and are the source of the data that feeds the data architecture. Sometimes these business units also have data-specific roles, such as data analysts, data engineers, or data scientists, who act as interfaces with the rest of the layers. For instance, these teams may designate a business data owner, that is, the point of contact of a business unit for everything related to the data produced by the unit.

At the other end of the architecture we find the data consumers. Again cross-functional, but more focused on extracting insights from the different data available in the architecture. Here we typically find data science teams, data analysts, business intelligence teams, and so on. These groups sometimes combine data from different business units and produce artifacts (machine learning models, interactive dashboards, reports, and so on). For deployment, they require the help of the data engineering team so that data is consistent and trusted.

At the center of this crossroads we find the data engineering team. Data engineers are responsible for making sure that the data generated and needed by the different business units gets ingested into the architecture. This job requires two disparate skills: functional knowledge and data engineering/software development skills. This combination is often coined under the term DataOps (which evolved from the DevOps methodologies developed over the past decades, applied to data engineering practices).

Data engineers have another responsibility too: they must help with the deployment of the artifacts produced by the data consumers. Typically, the data consumers do not have the deep technical skills and knowledge to take sole responsibility for deploying their artifacts. This is also true for highly sophisticated data science teams. So data engineers must add other skills to their belt: machine learning and business intelligence platform knowledge. Let's clarify this point: we don't expect data engineers to become machine learning engineers. Data engineers need to understand ML to ensure that the data delivered to the first layer of a model (the input) is correct.
They also become key when delivering that first layer of data in the inference path, because that is where data engineering skills around scale, high availability, and so on really need to shine.

By taking responsibility for parsing and transforming messy data from the various business units, or for ingesting it in real time, data engineers allow the data consumers to focus on creating value. Data scientists and other data consumers are abstracted away from data encodings, large files, legacy systems, and complex message queue configurations for streaming. The benefits of concentrating that knowledge in a highly skilled data engineering team are clear, notwithstanding that other teams (business units and consumers) may also have their own data engineers to act as interfaces with other teams. More recently, we even see squads created with members of the business units (data product owners), data engineers, data scientists, and other roles, effectively creating complete teams with autonomy and full responsibility over a data stream, from the incoming data down to the data-driven decision with impact on the business.

Reference architecture: serverless

The number of skills required of the data engineering team is vast and diverse. We should not make their job harder by expecting the team to also maintain the infrastructure where they run data pipelines. They should be focusing on how to cleanse, transform, enrich, and prepare the data, rather than on how much memory or how many cores their solution may require.

The reference architectures presented here are based on the following principles:

- Serverless, no-ops technologies
- Streaming-enabled, for low time-to-insight

We present different alternatives, based on different products available in Google Cloud:

- Dataflow, the built-in streaming analytics platform in Google Cloud
- Dataproc, Google Cloud's managed platform for Hadoop and Spark
- Data Fusion, a codeless environment for creating and running data pipelines

Let's dig into these principles. By using serverless technology we remove the maintenance burden from the data engineering team, and we provide the flexibility and scalability needed to execute complex and/or large jobs. For example, scalability is essential when planning for traffic spikes during mega Friday for retailers. Using serverless solutions allows retailers to look into how they are performing during the day; they no longer need to worry about the resources needed to process the massive amount of data generated during the day.

The team needs full control, and needs to write its own code for the data pipelines, because of the type of pipelines it develops. This is true for both batch and streaming pipelines. In batch, the parsing requirements can be complex and no off-the-shelf solution works. In streaming, if the team wants to fully leverage the capabilities of the platform, they should implement all the complex business logic that is required, without artificially simplifying that complexity in exchange for somewhat better latency. They can develop a pipeline that achieves low latency with highly complex business logic. This again requires the team to write code from first principles.

However, the fact that the team needs to write code does not imply that they need to rewrite every existing piece of code. For many input/output systems, we can probably reuse code from patterns, snippets, and similar examples. Moreover, a logical pipeline developed by a data engineering team does not necessarily need to map to a physical pipeline.
Some parts of the logic can easily be reused through technologies like Dataflow templates, and those templates can be orchestrated together with other custom-developed pipelines. This brings the best of both worlds (reuse and rewrite), while saving precious time that can be dedicated to higher-impact code rather than common I/O tasks. The reference architecture presented here has another important feature: the possibility of transforming existing batch pipelines into streaming ones.

The ingestion layer consists of Pub/Sub for real time and Cloud Storage for batch, and does not require any preallocated infrastructure. Both Pub/Sub and Cloud Storage can be used for a wide range of cases, as they automatically scale with the input workload.

Once the data has been ingested, our proposed architecture follows the classical division into three stages: Extract, Transform, and Load (ETL). For some types of files, direct ingestion into BigQuery (following an ELT approach) is also possible.

In the transform layer, we primarily recommend Dataflow as the data processing component. Dataflow uses Apache Beam as its SDK. The main advantage of Apache Beam is its unified model for batch and streaming processing. As mentioned before, the same code can be adapted to run in batch or streaming by adapting the input and output, for instance by switching the input from files in Cloud Storage to messages published to a Pub/Sub topic (see the sketch after the summary below).

One of the alternatives to Dataflow in this architecture is Dataproc, Google Cloud's solution for managed Hadoop and Spark clusters. The main use case is for teams that are migrating to Google Cloud but have large amounts of inherited code in Spark or Hadoop. Dataproc enables a direct path to the cloud, without having to rewrite all those pipelines. Finally, we also present the alternative of Data Fusion, a codeless environment for creating data pipelines using a drag-and-drop interface. Data Fusion actually uses Dataproc as its execution engine, so everything we have mentioned earlier also applies to Data Fusion. If your team prefers to create data pipelines without having to write any code, Data Fusion is the right tool.

In summary, these are the three recommended components for the transform layer:

- Dataflow: powerful and versatile, with a unified model for batch and streaming processing and a straightforward path from batch processing to streaming.
- Dataproc: for teams that want to reuse existing code from Hadoop or Spark environments.
- Data Fusion: if your team does not want to write any code.
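Here is the promised sketch of that unified model, written against the Apache Beam Python SDK. The bucket, topic, and table names are hypothetical and the transformation is deliberately trivial; the point is that the same pipeline body reads from Cloud Storage in batch or from Pub/Sub in streaming, and the runner (DirectRunner locally, DataflowRunner on Google Cloud, or another Beam runner such as Flink or Spark) is just a pipeline option.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions


def run(streaming: bool = False):
    options = PipelineOptions()
    # The runner is a deployment choice, not a code change:
    # DirectRunner locally, DataflowRunner on Google Cloud, and so on.
    options.view_as(StandardOptions).streaming = streaming

    with beam.Pipeline(options=options) as p:
        if streaming:
            # Streaming input: messages published to a hypothetical Pub/Sub topic.
            raw = (
                p
                | "ReadPubSub" >> beam.io.ReadFromPubSub(
                    topic="projects/my-project/topics/orders"
                )
                | "Decode" >> beam.Map(lambda b: b.decode("utf-8"))
            )
        else:
            # Batch input: files in a hypothetical Cloud Storage bucket.
            raw = p | "ReadGCS" >> beam.io.ReadFromText("gs://my-raw-bucket/orders/*.json")

        # The transform logic is identical in both modes.
        (
            raw
            | "Parse" >> beam.Map(json.loads)
            | "KeepPaid" >> beam.Filter(lambda o: o.get("amount", 0) > 0)
            | "Shape" >> beam.Map(lambda o: {"order_id": o["id"], "amount": o["amount"]})
            | "WriteBQ" >> beam.io.WriteToBigQuery(
                "my-project:reporting.orders",
                schema="order_id:STRING,amount:FLOAT",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )


if __name__ == "__main__":
    run(streaming=False)
```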
Challenges and opportunities

Data platforms are complex. Adding the duty of maintaining infrastructure on top of that data responsibility is a wasteful use of valuable skills and talent. Too often, data teams end up managing infrastructure rather than focusing on analyzing the data. The architecture presented in this article liberates the data engineering team from having to allocate infrastructure and tweak clusters, letting them focus instead on providing value through data processing pipelines.

For data engineers to focus on what they do best, you need to fully leverage the cloud. A lift-and-shift approach from an on-premises installation is not going to provide that flexibility and liberation; you need to leverage serverless technologies. As an added advantage, serverless also lets you scale your data processing capabilities with your needs and respond to peaks of activity, however large they are.

Serverless technologies sometimes face doubts from practitioners: will I be locked in with my provider if I fully embrace serverless? This is a question you should indeed ask when deciding whether to build your architecture on top of a provider. The components presented here for data processing are based on open source technologies and are fully interoperable with equivalent open source components. Dataflow uses Apache Beam, which not only unifies batch and streaming but also offers widely compatible runners: you can take your code to any other runner, for instance Apache Flink or Apache Spark. Dataproc is a fully managed Hadoop and Spark service based on the vanilla open source components of that ecosystem. Data Fusion is the Google Cloud version of CDAP, an open source project.

The same holds for the serving layer: BigQuery is based on standard ANSI SQL, Bigtable is compatible at the API level with HBase, and Google Kubernetes Engine is built on Kubernetes, an open source component.

In summary, when your components are based on open source, like the ones included in this architecture, serverless does not lock you in. The skills required to encode business logic in the form of data processing pipelines are based on engineering principles that remain stable over time. The same principles apply whether you are using Hadoop, Spark, Dataflow, or UI-driven ETL tooling. In addition, there are now new capabilities, such as low-latency streaming, that were not available before. A team of data engineers that learns the fundamental principles of data engineering will be able to quickly leverage those additional capabilities.

Our recommended architecture separates the logical level, the code of your applications, from the infrastructure where it runs. This enables data engineers to focus on what they do best and on where they provide the highest added value. Let your Dataflow pipelines and your engineers impact your business by adopting the technologies that liberate them and allow them to focus on adding business value. To learn more about building a unified data analytics platform, take a look at our recently published Unified Data Analytics Platform paper and Converging Architectures paper.
Source: Google Cloud Platform

Cloud DNS explained!

How many times have you heard this:

"It's not DNS."
"No way it is DNS."
"It was the DNS!"

When you are building and managing cloud-native or hybrid cloud applications, you don't want to add more to your plate, especially not DNS. DNS is one of the services your application needs in order to function, but you can rely on a managed service to take care of your DNS requirements. Cloud DNS is a managed, low-latency DNS service running on the same infrastructure as Google that allows you to easily publish and manage millions of DNS zones and records.

How does DNS work?

When a client requests a service, the first thing that happens is DNS resolution, which means hostname-to-IP-address translation. Here is how the request flow works:

- Step 1: A client makes a DNS request.
- Step 2: The request is received by a recursive resolver, which checks whether it already knows the response to the request.
- Step 3a: If yes, the recursive resolver responds to the request from its cache.
- Step 3b: If no, the recursive resolver redirects the request to other servers.
- Step 4: The authoritative server then responds to the request.
- Step 5: The recursive resolver caches the result for future queries.
- Step 6: Finally, it sends the information back to the client.

What does Cloud DNS offer?

- Global DNS network: a managed, authoritative Domain Name System (DNS) service running on the same infrastructure as Google. You don't have to manage your DNS servers; Google does it for you.
- 100% availability and automatic scaling: Cloud DNS uses Google's global network of anycast name servers to serve your DNS zones from redundant locations around the world, providing high availability and lower latency for users. It allows customers to create, update, and serve millions of DNS records.
- Private DNS zones: used for providing a namespace that is only visible inside the VPC or hybrid network environment. Example: a business organization has a domain dev.gcp.example.com that is reachable only from within the company intranet.
- Public DNS zones: used for providing authoritative DNS resolution to clients on the public internet. Example: a business has an external website, example.com, accessible directly from the internet. Not to be confused with Google Public DNS (8.8.8.8), which is just a public recursive resolver. (A minimal zone-and-record sketch appears right after this list.)
- Split horizon DNS: used to serve different answers (different resource record sets) for the same name depending on who is asking: an internal or an external network resource.
- DNS peering: DNS peering provides a second method of sharing DNS data. All or a portion of the DNS namespace can be configured to be forwarded from one network to another and, once there, it will respect all DNS configuration defined in the peered network.
- Security: Domain Name System Security Extensions (DNSSEC) is a feature of the Domain Name System that authenticates responses to domain name lookups. It prevents attackers from manipulating or poisoning the responses to DNS requests.
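To make zones and records concrete, here is a hedged sketch using the google-cloud-dns Python client library (assuming its legacy dns.Client API; the project, zone name, and IP address are hypothetical). The same operations are available through gcloud and the Cloud DNS API.

```python
from google.cloud import dns

client = dns.Client(project="my-project")  # hypothetical project ID

# Create a public managed zone for example.com (note the trailing dot).
zone = client.zone("example-zone", "example.com.")
if not zone.exists():
    zone.create()

# Add an A record for www.example.com pointing at a hypothetical address.
record_set = zone.resource_record_set("www.example.com.", "A", 300, ["203.0.113.10"])
changes = zone.changes()
changes.add_record_set(record_set)
changes.create()  # submits the change set to Cloud DNS
```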
Hybrid deployments: DNS forwarding

Google Cloud offers inbound and outbound DNS forwarding for private zones. You can configure DNS forwarding by creating a forwarding zone or a Cloud DNS server policy. There are two methods, inbound and outbound, and you can configure both simultaneously for a VPC network.

Inbound: Create an inbound server policy to enable an on-premises DNS client or server to send DNS requests to Cloud DNS. The DNS client or server can then resolve records according to a VPC network's name resolution order. On-premises clients use Cloud VPN or Cloud Interconnect to connect to the VPC network.

Outbound: You can configure VMs in a VPC network to do the following:

- Send DNS requests to DNS name servers of your choice. The name servers can be located in the same VPC network, in an on-premises network, or on the internet.
- Resolve records hosted on name servers configured as forwarding targets of a forwarding zone authorized for use by your VPC network.
- Create an outbound server policy for the VPC network to send all DNS requests to an alternative name server.

For more #GCPSketchnote, follow the GitHub repo. For similar cloud content follow me on Twitter @pvergadia and keep an eye out on thecloudgirl.dev.
Source: Google Cloud Platform

Deploying the Cloud Spanner Emulator remotely

Welcome to the third part of our series on the Cloud Spanner Emulator. In the first part, we got an overview of Cloud Spanner and the emulator, as well as the various ways it can be provisioned. In the second part, we explored the options available for running the emulator locally, as well as how to build the emulator as one of the components in an application container.

The emulator can also be deployed on a remote GCE instance or on Cloud Run. Today, we will deploy the emulator services on a GCE host manually and via Terraform. Finally, we will also run the emulator on Cloud Run.

Cloud Spanner emulator on GCE

In the previous post, we deployed the application and the emulator in separate containers by attaching both containers to a Docker network. In environments with VPC- or project-level separation for dev, stage, and production, it might be useful to run the emulator on a dedicated remote host. Apart from the ability to point your applications to the public or private IP of the emulator instance, this also allows for collaborative troubleshooting of failed test cases.

Manual deployment

This section covers the steps to provision a GCE instance and start the emulator services. For the sake of completeness, it includes instructions starting from creating a VPC; however, you can skip this and make changes according to your environment.

If you have been working through this series so far, your gcloud config is likely set to the emulator. Before you proceed, switch to a different configuration (e.g., your default one).

Next, ensure the default gcloud configuration is set correctly: enable authentication, unset any API endpoint URL set previously, and set the GCP project you intend to use.

Create a VPC, subnet, and firewall rules (you might want to edit the firewall rule source range to be more restrictive).

Create an emulator VM. We run the emulator service as part of instance creation itself by passing a startup script via the --metadata startup-script flag. Replace the placeholder [Your-Project-ID] with your GCP project ID.

Once the instance comes up and the emulator services are running, you can follow the instructions from the earlier blog posts to deploy the sample application. The only difference is that we change localhost to the public IP address (or the private IP address if you are working from the same VPC or connected via VPN).

NOTE – If you are using a public IP address here, all of the data exchanged between your client and the remote emulator will be transmitted in plain text over the internet. Please ensure that you're not sending privacy-sensitive data.

An example of pointing a client at the remote emulator is sketched below.
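This is a hedged sketch rather than the exact configuration from the original post: the emulator VM's IP address and the instance and database names are hypothetical, and it assumes the instance and database were already created as shown earlier in this series. Setting SPANNER_EMULATOR_HOST is enough for the Cloud Spanner client libraries to talk to the remote emulator instead of the real service, with no real credentials required.

```python
import os

from google.cloud import spanner

# Point the client library at the remote emulator's gRPC port.
# Hypothetical address: use your VM's public or private IP instead of localhost.
os.environ["SPANNER_EMULATOR_HOST"] = "10.128.0.5:9010"

# Any project ID works against the emulator; no real credentials are needed.
client = spanner.Client(project="test-project")
instance = client.instance("test-instance")
database = instance.database("test-db")

# Assumes the instance and database were created earlier in the series.
with database.snapshot() as snapshot:
    for row in snapshot.execute_sql("SELECT 1"):
        print(row)
```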
Provisioning an emulator GCE instance via Terraform

In an environment that follows the GitOps style of deployments, having a Terraform template for provisioning the emulator instance can be useful as well. You can follow the instructions below to spin up an instance with the emulator running.

Clone the repo that contains the Terraform code modules we will be using. First, initialize Terraform (terraform init) so that it downloads and installs the provider plugin and configures the child module. Next, open the terraform.tfvars file and edit the name of the VM, the project ID, the region, and the zone based on your environment. Finally, apply the configuration (terraform apply).

You can now connect to the VM and verify that the emulator services are up and running. Once the VM is up and running with the emulator services started, you can use the VM's public or private IP address to configure SPANNER_EMULATOR_HOST and connect, just as described in the Manual deployment section above.

Cloud Spanner emulator on Cloud Run

Since the emulator is available as a pre-built Docker image (or you can build it manually from source), deploying the emulator services on Cloud Run is straightforward. Cloud Run supports gRPC (after enabling HTTP/2). However, it is important to remember that when using Cloud Run, you will be able to route requests to only one port at any given point in time.

NOTE – While it is possible to run multiple processes inside the same container on Cloud Run (in this case, the gRPC server and the REST server), requests can only be sent to one port. If your application uses only the client libraries or the RPC API (gRPC), configure Cloud Run to send requests to port 9010. If you use only the REST API, configure port 9020. Also, remember to set minimum instances to '1' to prevent the container from being removed, and thereby losing data, when there is no activity.

You can choose either of the following options:

- Option 1: Deploy the emulator gRPC server on Cloud Run
- Option 2: Deploy the REST server on Cloud Run

NOTE – Avoid using both options simultaneously, as that will create two different instances of the emulator, and two different copies of your database that would be out of sync; depending on which emulator instance serves your request, you may get completely different results.

Once done, configure your emulator profile to point at the Cloud Run URL: if you are using the REST server, override the Spanner API endpoint in your gcloud emulator configuration with the Cloud Run URL; if you are using gRPC, point SPANNER_EMULATOR_HOST at the Cloud Run URL.

NOTE – You do not need to specify a port number, since requests to the Cloud Run URL already route directly to the port in use (9010 or 9020). Just use the Cloud Run URL, without the port.

Conclusion

Through this three-part series, we introduced the Cloud Spanner Emulator and detailed the various options available to start and use the emulator both locally and on a remote host. We also demonstrated how the emulator can be used in a development workflow as a no-cost experience for Cloud Spanner, using a sample application. We hope you found this useful! To learn more about Cloud Spanner, visit the product page here, and to learn more about the Cloud Spanner emulator, please see the documentation here.
Source: Google Cloud Platform