Data warehouse migration challenges and how to meet them

Editor's note: This is the second in a series on modernizing your data warehouse. Find part 1 here.

In the last blog post, we discussed why legacy data warehouses are no longer cutting it and why organizations are moving their data warehouses to the cloud. We often hear from customers that migration feels like an uphill battle because the migration strategy was not deliberately considered. Migrating from a legacy environment to a modern data warehouse can require a massive up-front investment in time and resources. There's a lot to think about before and during the process, so your organization has to take a strategic approach to streamline it. At Google Cloud, we work with enterprises shifting data to our BigQuery data warehouse, and we've helped companies of all kinds successfully migrate to the cloud.

Here are some of the questions we frequently hear about migrating a data warehouse to the cloud:

How do we minimize any migration risks or security challenges?
How much will it cost?
How do we migrate our data to the target data warehouse?
How quickly will we see equal or better performance?

These are big, important questions to ask, and have answered, when you're starting your migration. Let's take them in order.

How do we minimize any migration risks or security challenges?

It's easy to consider an on-premises data warehouse secure because, well, it's on-site and you can manage its data protection yourself. But if scaling up an on-prem data warehouse is difficult, so is securing it as your business scales. We've built multiple security features into BigQuery. For enterprise users, Cloud Identity and Access Management (Cloud IAM) is key to setting appropriate role-based user access to data. You can also take advantage of SQL security views within BigQuery. All BigQuery data is encrypted at rest and in transit, and you can add customer-managed encryption keys to establish even stronger security measures. Using VPC Service Controls can secure your migration path, since it helps reduce data exfiltration risks. (A minimal sketch of role-based access control in BigQuery appears at the end of this post.)

How much will it cost?

The cost structure of a cloud data warehouse is different from what you're likely used to with a legacy data warehouse. With an on-prem system like Teradata, your IT team may pay every three years for the hardware, then pay for licenses for the users who need to access the system. Capacity increases come at an additional cost outside of that hardware budget.

With cloud, you have far more options for cost and scale. Instead of a fixed set of costs, you're now working on a price-utility gradient: if you want to get more out of your data warehouse, you can spend more to do so immediately, or vice versa. With a cloud data warehouse like BigQuery, the model changes entirely. Total cost of ownership (TCO) becomes an important metric for customers once they've migrated to BigQuery (check out ESG's report on that), and Google Cloud's flexibility makes it easy to optimize costs.

How do we migrate all of our data to the target data warehouse?

This question covers migrating your extract, transform, load (ETL) jobs and SAS/BI application workloads to the target data warehouse, as well as migrating all your queries, stored procedures, and other extract, load, transform (ELT) jobs.

Actually getting all of a company's data into the cloud can seem daunting at the outset of the migration journey. We know that most businesses have a lot of siloed data. That might be multiple data lakes set up over the years for various teams, or systems brought in through an acquisition that handle just one or two crucial applications. You may also be moving data from an on-prem or cloud data warehouse to BigQuery where type systems or data representations don't match up.

One big step you can take to prepare for a successful migration is workload and use case discovery. That might involve auditing which use cases exist today and whether they are part of a bigger workload, as well as identifying which datasets, tables, and schemas underpin each use case. Use cases vary by industry and by job role. For example, a retail pricing analyst may want to analyze past product price changes to calculate future pricing. That use case may require ingesting data from a transactional database, transforming the data into a single time series per product, storing the results in a data warehouse table, and more.

After the preparation and discovery phase, assess the current state of your legacy environment to plan your migration. This includes cataloging and prioritizing your use cases, auditing data to decide what will be moved and what won't, and evaluating data formats across your organization to decide what you'll need to convert or rewrite. Once that's decided, choose your ingest and pipeline methods. All of these tasks take both technology and people management, and they require organizational consensus on what success will look like once the migration is complete.

How quickly will we see equal or better performance?

Managing a legacy data warehouse isn't usually synonymous with speed. Performance often comes at the cost of capacity, so users can't do the analysis they need until other queries have finished running. Reporting and other analytics functions may take hours or days, especially for large reports with a lot of data, like an end-of-quarter sales calculation. As the amount of data and the number of users grow, performance degrades and organizations often face disruptive outages.

With a modern cloud data warehouse like BigQuery, however, compute and storage are decoupled, so you can scale immediately without facing capital infrastructure constraints. BigQuery also uses a familiar SQL interface, so users can run queries in seconds and share insights right away. Home Depot is an example of a customer that migrated its warehouse and reduced eight-hour workloads to five minutes.

Moving to the cloud may seem daunting, especially when you're migrating an entrenched legacy system. But it brings the benefits of adopting technology that lets the business grow, rather than simply adopting a tool. It's likely you've already seen that the business demand exists. Now it's time to stop standing in the way of that demand and instead make way for growth.
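To make the role-based access controls mentioned in the security section concrete, here is a minimal sketch. The project ID, dataset, table, column, and user names are hypothetical, and the exact roles you grant will depend on your own access model.

```sql
-- Hypothetical project, dataset, table, and column names.
-- Expose only non-sensitive columns through a view; analysts query the view
-- rather than the underlying table.
CREATE VIEW `my-project.reporting.orders_summary` AS
SELECT order_id, order_date, total_amount
FROM `my-project.warehouse.orders`;
```

```sh
# Grant an analyst read-only BigQuery access at the project level
# (hypothetical project and user).
gcloud projects add-iam-policy-binding my-project \
    --member='user:analyst@example.com' \
    --role='roles/bigquery.dataViewer'
```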
Source: Google Cloud Platform

PNB: Investing in Malaysia’s future with APIs

Editor's note: Today we hear from Muzzaffar bin Othman, CTO at Permodalan Nasional Berhad (PNB), on how the company uses Google Cloud's Apigee API Management Platform to create digital investment channels. Read on to learn how PNB is increasing financial inclusion by expanding investment opportunities for all Malaysians.

Permodalan Nasional Berhad (PNB) is one of Malaysia's largest investment institutions, with more than RM300 billion ($71 billion) in assets under management. Through our wholly owned company, Amanah Saham Nasional Berhad (ASNB), we manage 14 funds with a total value of RM235.74 billion ($56.34 billion) as of Dec. 31, 2018. To expand the range of people who can invest and participate in the economy, our unit trust funds enable the public to invest as little as RM10 ($2.50) in any of our funds. With each investment, unit holders (a unit holder is an investor who holds securities of a trust) are able to participate in the local and international investment activities managed by PNB and ASNB. They also earn dividends at the end of the financial year for the funds they invest in.

Accelerating access to services with APIs

As chief technology officer for PNB, I lead the retail and asset management technology aspects of the business. My team and I manage basic IT systems such as email and networks, but also the more exciting and complex IT infrastructure, including investment core systems and data analytics for the unit trust teams.

In January 2017 (prior to joining PNB), I observed unit holders waiting in a long line just to update their account balances following a dividend announcement. Unit holders had limited options then: they were required to visit an ASNB branch or one of our agent banks to complete the transaction. When PNB hired me three months later, I was determined to create a self-service balance checker that would reduce our unit holders' waiting time. My team first built an application on an Android tablet that communicated with our backend via APIs. Then we constructed a kiosk around it and launched our first self-service kiosk. We built 120 kiosks across Malaysia in six months.

While we made progress in creating new solutions for our unit holders, we were missing a framework to manage our APIs. After extensive market research, we decided on the Apigee API management platform as the most suitable platform for building our capabilities in developing and managing APIs. Apigee's technical capabilities, coupled with responsive support from Google Cloud, were important factors. Being new to APIs, we value the quick and ready technical support made available to us. In addition, the secure and flexible system that Apigee offers is critical to us because, as a financial institution, security is of paramount importance.

In July 2017, we migrated our retail core system from a mainframe to a modern, cloud-based infrastructure. In August of the same year, a newly developed web portal gave our unit holders access to their accounts through their mobile devices for the first time. The customer response was very encouraging and uptake has been very high since then. The portal uses APIs to enable our 14 million unit holders to check balances, reinvest, edit their personal information, and access account statements. For now, the portal is only available to unit holders who pre-register via an in-person onboarding process. We are currently awaiting regulatory direction on electronic Know Your Customer (eKYC) rules that will affect digital onboarding before we can open access to new unit holders via their mobile devices.

Creating new channels for financial inclusion

To date, our web portal and APIs have generated approximately RM2 billion ($500 million) in annual investments, which equates to about five percent of our total yearly investments being contributed via the digital platform. While this is encouraging progress, there is much more potential to tap into, including the collection and analysis of consumer behavior data. Moving forward, this valuable information will provide insights that help us improve the customer experience and fine-tune our offerings.

A typical bank integration takes six months at a high cost. Excluding the governance and compliance approval period, our key APIs can be consumed in under three months, at minimal cost. Banks that use APIs will find ours easy to work with, which simplifies our agent onboarding process. APIs enable us to innovate further by expanding our capabilities and reach. We are currently onboarding a few PNB agent banks, and we look forward to the possibility of connecting to fintech players in Malaysia, especially e-wallet solution providers. APIs simplify communication between multiple systems and open up a world of possibilities for our business.
Source: Google Cloud Platform

How to deploy a Windows container on Google Kubernetes Engine

Many people who run Windows containers want to use a container management platform like Kubernetes for resiliency and scalability. In a previous post, we showed you how to run an IIS site inside a Windows container deployed to Windows Server 2019 running on Compute Engine. That's a good start, but you can now also run Windows containers on Google Kubernetes Engine (GKE). Support for Windows containers in Kubernetes was announced earlier in the year with version 1.14, followed by a corresponding GKE announcement. You can sign up for early access and start testing Windows containers on GKE. In this blog post, let's look at how to deploy that same Windows container to GKE. (A consolidated sketch of the commands and manifests follows at the end of these steps.)

1. Push your container image to Container Registry

In the previous post, we created a container image locally. The first step is to push that image to Container Registry so that you can use it later in your Kubernetes deployment. To push images from a Windows VM to Container Registry, you need to:

Ensure that the Container Registry API is enabled in your project.
Configure Docker to point to Container Registry. This is explained in more detail here, but it is usually done via the gcloud auth configure-docker command.
Make sure that the VM has the storage read/write access scope (storage-rw), as explained here.

Once you have the right setup, it's just a regular docker push.

2. Create a Kubernetes cluster with Windows nodes

Creating a Kubernetes cluster in GKE with Windows nodes happens in two steps:

Create a GKE cluster with version 1.14 or higher, with IP aliasing enabled and one Linux node.
Add a Windows node pool to the GKE cluster.

Windows containers are resource intensive, so we chose n1-standard-2 as the machine type for the Windows node pool. We also disabled automatic node upgrades: Windows container versions need to be compatible with the node OS version, so to avoid unexpected workload disruption, it is recommended that you disable node auto-upgrade for Windows node pools. For Windows Server containers in GKE, you're already licensed for the underlying Windows host VMs, and the containers need no additional licensing. Once both steps are done, your GKE cluster is ready and contains one Linux node and three Windows nodes.

3. Run your Windows container as a pod on GKE

Now you're ready to run your Windows container as a pod on GKE. Create an iis-site-windows.yaml file to describe your Kubernetes deployment. In it, you create two pods (replicas) with the image you pushed earlier to Container Registry, and you make sure the pods are scheduled onto Windows nodes with a nodeSelector. Create the deployment with kubectl, and after a few minutes you should see that the deployment was created and its pods are running.

4. Create a Kubernetes service

To make the pods accessible to the outside world, you need to create a Kubernetes service of type LoadBalancer. In a few minutes, you should see a new service with an external IP, and if you browse to that external IP, you will see your app.

This is very similar to the previous deployment to Compute Engine, with the big difference that Kubernetes is now managing the pods. If something goes wrong with a pod or one of its nodes, Kubernetes recreates and reschedules the pod for you, which is great for resiliency. Similarly, scaling pods is a single command in Kubernetes.

If you want to try out these steps on your own, there's also a codelab on this topic. And there you have it: how to run Windows containers on GKE.
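The original post embedded its commands and manifests as screenshots, so here is a minimal sketch of the workflow described above. The project ID, image name, cluster name, zone, and node counts are assumptions for illustration, not values taken from the original post.

```sh
# Hypothetical project, image, cluster, and zone names.

# 1. Push the locally built IIS image to Container Registry.
docker tag iis-site gcr.io/my-project/iis-site:v1
docker push gcr.io/my-project/iis-site:v1

# 2a. Create a GKE cluster (1.14+) with IP aliasing and one Linux node.
gcloud container clusters create win-cluster \
    --zone us-central1-a \
    --enable-ip-alias \
    --num-nodes=1 \
    --cluster-version=1.14

# 2b. Add a Windows Server node pool; node auto-upgrade is disabled as recommended.
gcloud container node-pools create windows-pool \
    --cluster=win-cluster \
    --zone us-central1-a \
    --image-type=WINDOWS_SAC \
    --machine-type=n1-standard-2 \
    --num-nodes=3 \
    --no-enable-autoupgrade

# Confirm the cluster has one Linux node and three Windows nodes.
kubectl get nodes -o wide
```

A deployment and service along the lines of steps 3 and 4 might look like this (image path and object names again hypothetical):

```yaml
# iis-site-windows.yaml: two replicas, scheduled onto Windows nodes only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: iis-site-windows
spec:
  replicas: 2
  selector:
    matchLabels:
      app: iis-site-windows
  template:
    metadata:
      labels:
        app: iis-site-windows
    spec:
      nodeSelector:
        kubernetes.io/os: windows   # keep these pods off the Linux node
      containers:
      - name: iis-site
        image: gcr.io/my-project/iis-site:v1
        ports:
        - containerPort: 80
---
# Expose the pods externally through a LoadBalancer service.
apiVersion: v1
kind: Service
metadata:
  name: iis-site-windows
spec:
  type: LoadBalancer
  selector:
    app: iis-site-windows
  ports:
  - port: 80
    targetPort: 80
```

```sh
# Deploy, wait for an external IP, and scale when needed.
kubectl apply -f iis-site-windows.yaml
kubectl get service iis-site-windows        # EXTERNAL-IP appears after a few minutes
kubectl scale deployment iis-site-windows --replicas=4
```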
If you want to try out Windows Containers on GKE, sign up to get early access.
Source: Google Cloud Platform

Designing Your First Application in Kubernetes, Part 5: Provisioning Storage

In this blog series on Kubernetes, we’ve already covered:

The basic setup for building applications in Kubernetes
How to set up processes using pods and controllers
Configuring Kubernetes networking services to allow pods to communicate reliably
How to identify and manage the environment-specific configurations to make applications portable between environments

In this series’ final installment, I’ll explain how to provision storage to a Kubernetes application. 

Step 4: Provisioning Storage
The final component we want to think about when we build applications for Kubernetes is storage. Remember, a container’s filesystem is transient, and any data kept there is at risk of being deleted along with your container if that container ever exits or is rescheduled. If we want to guarantee that data lives beyond the short lifecycle of a container, we must write it out to external storage.
Any container that generates or collects valuable data should be pushing that data out to stable external storage. In our web app example, the database tier should be pushing its on-disk contents out to external storage so they can survive a catastrophic failure of our database pods.
Similarly, any container that requires the provisioning of a lot of data should be getting that data from an external storage location. We can even leverage external storage to push stateful information out of our containers, making them stateless and therefore easier to schedule and route to.
Decision #5: What data does your application gather or use that should live longer than the lifecycle of a pod?
The full Kubernetes storage model has a number of moving parts:
The Kubernetes storage model.

Container Storage Interface (CSI) plugins can be thought of as the driver for your external storage.
StorageClass objects take a CSI driver and add some metadata that typically configures how storage on that backend will be treated.
PersistentVolume (PV) objects represent an actual bucket of storage, as parameterized by a StorageClass.
PersistentVolumeClaim (PVC) objects allow a pod to ask for a PersistentVolume to be provisioned to it.
Finally, we met volumes earlier in this series. In the case of storage, we can populate a volume with the contents of the external storage captured by a PV and requested by a PVC, provision that volume to a pod, and finally mount its contents into a container in that pod.

Managing all these components can be cumbersome during development, but as in our discussion of configuration, Kubernetes volumes provide a convenient abstraction by defining how and where to mount external storage into your containers. They form the start of what I like to think of as the “storage frontend” in Kubernetes—these are the components most closely integrated with your pods and which won’t change from environment to environment.
All those other components, from the CSI driver all the way through the PVC, which I like to think of as the “storage backend”, can be torn out and replaced as you move between environments without affecting your code, containers, or the controller definitions that deploy them.
Note that on a single-node cluster (like the one created for you by Docker Desktop on your development machine), you can create hostPath-backed PersistentVolumes, which will provision persistent storage from your local disk without setting up any CSI plugins or special StorageClasses. This is an easy way to start developing your application without getting bogged down in the diagram above, effectively deferring the decision and setup of CSI plugins and StorageClasses until you're ready to move off of your dev machine and into a larger cluster. A minimal sketch of this pattern follows.
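Here is a minimal sketch of that hostPath pattern, suitable only for local development; the object names, path, capacity, and database image are hypothetical rather than taken from the example app in this series.

```yaml
# A hostPath-backed PersistentVolume: data lives on the node's local disk.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: dev-pv
spec:
  storageClassName: manual        # match PV and PVC statically, bypassing dynamic provisioning
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/dev-pv
---
# A claim that requests that storage on behalf of a pod.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dev-pvc
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
# A database pod that mounts the claimed storage, so its data outlives the container.
apiVersion: v1
kind: Pod
metadata:
  name: db
spec:
  containers:
    - name: postgres
      image: postgres:11
      env:
        - name: POSTGRES_PASSWORD
          value: example          # hypothetical; use a Secret in a real deployment
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: dev-pvc
```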
Advanced Topics
The simple hostPath PVs mentioned above are appropriate for early development and proof-of-principle work, but they will need to be replaced with more powerful storage solutions before you get to production. This will require you to look into the ‘backend’ components of Kubernetes’ storage solution, namely StorageClasses and CSI plugins (see the sketch after this list):

 StorageClasses
 Container Storage Interface plugins
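A StorageClass is a small object that names a provisioner and passes it parameters; claims then request storage from that class by name. Here is a hedged sketch; the provisioner string and its parameters are assumptions that depend entirely on which CSI driver your platform provides.

```yaml
# A StorageClass backed by a (hypothetical) CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: csi.example.com          # replace with your platform's CSI driver name
parameters:
  type: ssd                           # driver-specific parameter
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
# A claim against that class; a matching PV is provisioned on demand.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fast-claim
spec:
  storageClassName: fast-ssd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```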

The Future
In this series, I’ve walked you through the basic Kubernetes tooling you’ll need to containerize a wide variety of applications, and provided you with next-step pointers on where to look for more advanced information. Try working through the stages of containerizing workloads, networking them together, modularizing their config, and provisioning them with storage to get fluent with the ideas above.
Kubernetes provides powerful solutions for all four of these areas, and a well-built app will leverage all four of them. If you’d like more guidance and technical details on how to operationalize these ideas, you can explore the Docker Training team’s workshop offerings, and check back for new Training content landing regularly.
After mastering the basics of building a Kubernetes application, ask yourself, “How well does this application fit the values of portability, scalability and shareability we started with?” Containers themselves are engineered to easily move between clusters and users, but what about the entire application you just built? How can we move that around while still preserving its integrity and not invalidating any unit and integration testing you’ll perform on it?
Docker App sets out to solve that problem by packaging applications in an integrated bundle that can be moved around as easily as a single image. Stay tuned to this blog and Docker Training for more guidance on how to use this emerging format to share your Kubernetes applications seamlessly.
To learn more about Kubernetes storage and Kubernetes in general:

Read the Kubernetes documentation on PersistentVolumes and PersistentVolumeClaims.
Find out more about running Kubernetes on Docker Enterprise and Docker Desktop.
Check out Play with Kubernetes, powered by Docker.

We will also be offering training on Kubernetes starting in early 2020. In the training, we'll provide more specific examples and hands-on exercises. To get notified when the training is available, sign up here.


Source: https://blog.docker.com/feed/

Amazon ElastiCache announces online configuration changes for all planned operations with the latest Redis 5.0.5

Amazon ElastiCache for Redis improves the availability of auto-failover clusters during all planned operations. You can now scale your cluster, upgrade the Redis engine version, and apply patches and maintenance updates while the cluster remains online and continues to serve incoming requests. These availability improvements are included with the latest Redis version 5.0.5. For a complete list of all improvements, see Amazon ElastiCache for Redis 5.0.5.
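As an illustration of the kind of online change the announcement describes, here is a minimal AWS CLI sketch; the replication group ID and target shard count are hypothetical, and the resharding command applies only to cluster-mode-enabled Redis.

```sh
# Upgrade the Redis engine of an existing replication group in place
# (hypothetical replication group ID).
aws elasticache modify-replication-group \
    --replication-group-id my-redis \
    --engine-version 5.0.5 \
    --apply-immediately

# Scale out a cluster-mode-enabled group to four shards while it stays online.
aws elasticache modify-replication-group-shard-configuration \
    --replication-group-id my-redis \
    --node-group-count 4 \
    --apply-immediately
```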
Source: aws.amazon.com

Electric cars: Maserati goes electric

Maserati's sports cars are known for their rich sound. In the future, buyers will have to do without it: the Italian manufacturer plans to electrify its cars. The Ghibli will lead the way; starting next year, it will also be available with a hybrid drivetrain. (Electric cars, Technology)
Source: Golem