Spinning up cloud-scale analytics is even more compelling with Talend and Microsoft

Special thanks to Lee Schlesinger and the Talend team for their contribution to this blog post. Following the February 2019 announcement of Azure SQL Data Warehouse’s continued price-performance leadership, Talend announced Stitch Data Loader support for Azure SQL Data Warehouse. Stitch Data Loader is Talend’s recent addition to its portfolio for small and mid-market customers. With Stitch Data Loader, customers can load 5 million rows per month into Azure SQL Data Warehouse for free, or scale up to an unlimited number of rows with a subscription.

All across the industry, there is a rapid shift to the cloud. Utilizing a fast, flexible, and secure cloud data warehouse is an important first step in that journey. With Microsoft Azure SQL Data Warehouse and Stitch Data Loader, companies can get started faster than ever. The fact that Azure SQL Data Warehouse can be up to 14x faster and 94 percent less expensive than similar options in the marketplace should only further accelerate the adoption of cloud-scale analytics by customers of all sizes.

Building pipelines to the cloud with Stitch Data Loader

The Stitch team built the Azure SQL Data Warehouse integration with the help of Microsoft engineers. The solution leverages Azure Blob Storage and PolyBase to get data into the Azure cloud and ultimately loaded into SQL Data Warehouse. Stitch handles data type transformation between source and destination, schema changes, and bulk loading.

To start moving data, just specify your host address and database name and provide authentication credentials. Stitch will then start loading data from all of your sources in minutes.

Stitch Data Loader enables Azure SQL Data Warehouse users to analyze data from more than 90 data sources, including databases, SaaS tools, and ad networks. We also sponsor and integrate with the Singer open source ETL project, which makes it easy to get additional or custom data sources into Azure SQL Data Warehouse.

Stitch’s destination-switching feature also makes it easy for existing Stitch users to start loading their current integrations into Azure SQL Data Warehouse right away.

Going further with Talend Cloud and Azure SQL Data Warehouse

What if you’re ready to scale out your data warehousing efforts and layer on data transformation, profiling, and quality? Talend Cloud offers many more sources, as well as more advanced data processing and data quality features that work with Azure SQL Data Warehouse and the Azure platform. With over 900 connectors available, you’ll be able to move all your data, no matter the format or source. With data preparation and additional security features built in, you can get Azure-ready in no time.

Take Uniper, for instance. Using Azure and Talend Cloud, the company built a cloud-based data analytics platform that integrates over 100 data sources, including temperature and IoT sensors, from various external and internal systems. It constructed the full flow of business transactions — spanning market analytics, trading, asset management, and post-trading — while enabling data governance and self-service, cutting integration costs by 80 percent and achieving ROI in six months.

What’s next?

Start your free trial of Stitch today and load data into Azure SQL Data Warehouse in minutes.
Find out more about Azure SQL Data Warehouse’s unmatched price-performance and related announcements from Microsoft.

Source: Azure

Release with confidence: How testing and CI/CD can keep bugs out of production

With today’s dueling demands to iterate faster while keeping quality standards high, minimizing both the frequency and severity of bugs in code is no easy task. This is doubly true in serverless environments, where lightweight code bases and fully managed architectures enable developers to iterate more rapidly than ever before. Thorough testing is an effective method of finding potential bugs and protecting against errors in production that can have real business impact.

Testing can be somewhat of a double-edged sword, however: it’s a critical part of a successful launch, but it can easily take developers away from other tasks. Therefore, it’s important to know the types of testing available, and which ones to utilize for your specific needs.

While there is no shortage of ways to test your serverless applications, all of them come with trade-offs around speed, cost, accuracy, and scope. Which combination works best for you will depend on variables like how critical, long-lasting, and well-maintained your code is. Code that is critical and reused often requires in-depth testing with a wide range of scopes, while less important, non-reused code can often get by with fewer, higher-level tests.

In the next couple of posts, we’ll look at testing and other important strategies that help minimize the frequency of bugs in production serverless deployments and reduce the severity of those that inevitably sneak past your test suite. We’ll also take a look at some example code developed for Cloud Functions, Google Cloud’s function-based serverless compute platform. This first post discusses two important strategies for minimizing the frequency of bugs in production: testing and CI/CD (continuous integration and continuous deployment). It covers general testing techniques, followed by examples of how to apply them with Cloud Functions.

Keeping tests real

Testing your functions locally and on a CI/CD machine is a good defense against most bugs, but it won’t catch everything. For example, it won’t identify issues with environment configuration or external dependencies that could impact your production deployment.

To get over this hurdle, we need an environment to test in that has all the functionality of a production Cloud Functions environment, but none of the associated risk should the environment get corrupted. To do this, we can set up a test—or canary—environment that resides somewhere between a local machine and production and replicates the production environment. One common approach is to use a separate Google Cloud project as a canary environment.

Testing 101

Once our canary Cloud Functions environment is set up, we can start to talk about the three primary testing types we’ll use: unit tests, integration tests, and system tests. Let’s look at each type individually, stepping up from the easiest to the most involved.

Lightweight: unit tests

Perhaps the easiest, quickest tests you can run are unit tests. Unit tests focus on a single feature and confirm that things work as expected. They have a few great things going for them, but are generally limited in their scope and the types of issues they identify.

Unit tests use mocking frameworks to fake external dependencies. For example, let’s say you have a feature that calls an API, the API returns a certain response, and then the feature does something based on that response. Unit testing takes that API out of the equation. The mocking framework returns a pre-defined response—what you would expect the API to return if it were working properly, for example—and simply makes sure that the feature itself behaves how we think it should whenever it gets that response.

Unit tests at a glance:

They are fast and cheap to run, since they rarely require billed cloud resources.
They confirm that the details of your code work as expected. For example, they’re great for edge-case checking and similar tests.
They are useful for investigating known issues, but not great at identifying new ones.
They have no reliance on external dependencies (such as libraries and APIs). Of course, this also means they can’t be used to verify those dependencies.

Let’s take a quick look at an example. First, here is a sample HTTP function that creates a Cloud Storage bucket based on the name parameter in the request body, followed by a very basic unit test for it. The unit test creates a mock version of the @google-cloud/storage library using proxyquire and sinon, then checks that the mock library’s createBucket function is being called with the correct arguments.
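A minimal sketch of such a function, assuming the Node.js runtime and the @google-cloud/storage client (the parameter validation and response wording here are illustrative, not the original sample):

```javascript
// index.js -- HTTP Cloud Function: creates a bucket named after `name` in the request body.
const {Storage} = require('@google-cloud/storage');
const storage = new Storage();

exports.newBucket = async (req, res) => {
  const name = req.body && req.body.name;
  if (!name) {
    res.status(400).send('Missing "name" in request body');
    return;
  }
  await storage.createBucket(name);
  res.status(200).send(`Bucket ${name} created.`);
};
```

And a matching unit test sketch, with mocha assumed as the test runner; the fake Storage client stands in for the real library, so no cloud resources are touched:

```javascript
// test/unit.test.js -- run with mocha.
const assert = require('assert');
const proxyquire = require('proxyquire');
const sinon = require('sinon');

describe('newBucket (unit)', () => {
  it('calls createBucket with the name from the request body', async () => {
    // Fake Storage constructor whose createBucket call we can inspect.
    const createBucket = sinon.stub().resolves();
    const StorageMock = sinon.stub().returns({createBucket});

    // Load the function with the mocked dependency injected in place of the real client.
    const {newBucket} = proxyquire('../index', {
      '@google-cloud/storage': {Storage: StorageMock},
    });

    const req = {body: {name: 'my-test-bucket'}};
    const res = {status: sinon.stub().returnsThis(), send: sinon.stub()};

    await newBucket(req, res);

    // The mock's createBucket must have been called with the requested name.
    assert.ok(createBucket.calledWith('my-test-bucket'));
    assert.ok(res.status.calledWith(200));
  });
});
```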
Middleweight: integration tests

Stepping up a bit from unit tests are integration tests. As the name suggests, integration tests verify that parts of your code fit together as you expect.

Integration tests can use a mocking framework, as we described for unit testing, or can rely on real external dependencies. Using a mocking framework is quicker and cheaper, while bringing in external dependencies provides a more robust test. As a rule of thumb, we recommend mocking any dependencies that are slow (more than one second) and/or expensive. This enables these tests to be run quickly and cheaply.

Integration tests at a glance:

They balance problem detection and isolation. They are large enough in scope to detect some unanticipated bugs, but can still be run relatively quickly.
They may require small amounts of billed resources, depending on how you run your tests. For example, if a test run depends on actual build resources, those runs cost money.

Here is an integration test for our sample function. It sends an HTTP request to the function and checks that a Cloud Storage bucket with the correct name is actually created. For integration tests, the value of BASE_URL should point to a version of the function running locally on a developer’s machine (such as http://localhost:8080).
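A minimal sketch of such a test, with mocha assumed as the test runner and node-fetch for the HTTP call; the unique bucket name and the cleanup step at the end are additions for repeatability:

```javascript
// test/integration.test.js -- run with mocha.
// BASE_URL selects the target: a locally running copy of the function for integration
// tests (e.g. http://localhost:8080), or the deployed Cloud Function URL for system tests.
const assert = require('assert');
const fetch = require('node-fetch');
const {Storage} = require('@google-cloud/storage');

const BASE_URL = process.env.BASE_URL || 'http://localhost:8080';

describe('newBucket (integration)', () => {
  it('creates a bucket with the requested name', async function () {
    this.timeout(10000); // allow time for the HTTP round trip and bucket creation
    const name = `newbucket-test-${Date.now()}`;

    // Call the function over HTTP, just as a real client would.
    const res = await fetch(BASE_URL, {
      method: 'POST',
      headers: {'Content-Type': 'application/json'},
      body: JSON.stringify({name}),
    });
    assert.strictEqual(res.status, 200);

    // Verify against the real Cloud Storage API that the bucket now exists.
    const bucket = new Storage().bucket(name);
    const [exists] = await bucket.exists();
    assert.ok(exists);

    // Clean up so repeated runs don't collide.
    await bucket.delete();
  });
});
```

Because the verification goes through the real Cloud Storage API rather than a mock, this test exercises the function’s actual dependencies.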
Heavyweight: system tests

System tests broaden the scope to verify that your code works as a system. To that end, system tests rely heavily on external dependencies, making them both slower and more expensive.

One important thing to keep in mind with system tests is that state matters, and it may introduce consistency or shared-resource issues. For example, if you run multiple tests at the same time and your test tries to create a resource that already exists (or delete a resource that doesn’t exist), your test results may become flaky.

System tests at a glance:

Since you’re directing traffic at an actual cloud deployment, system tests can require moderate amounts of billed resources.
System tests provide good bug detection. They can even catch unanticipated bugs and bugs outside your codebase, such as in your dependencies or cloud deployment configuration.
Since the scope of system tests is so large, they aren’t as good at isolating problems and their root causes as the other types of tests we’ve discussed.

The system test for our sample function looks just like the integration test above: it sends an HTTP request to the newBucket Cloud Function and checks that the correct bucket was created. In fact, the only difference is that the BASE_URL variable is set so that the test points at a deployed Cloud Function instead of a locally hosted one. Though this trick is often specific to HTTP-triggered functions, reusing integration test code in system tests (and vice versa) can help reduce the maintenance burden created by your tests.

Other testing options

Let’s take a quick look at some other common types of testing, and how you can best utilize them with Cloud Functions.

Static tests

Static tests verify that your code follows language and style conventions and dependency best practices. While they are relatively simple to run, one major limitation to account for is their narrow focus.

Many static test options are free to install and easy to use. Linters (such as prettier for Node.js and pylint for Python) enforce style conventions, while dependency tools (such as Snyk for Node.js) check for dependency issues.

Load tests

Load tests involve creating vast amounts of traffic and directing it at your app to make sure it can handle real-world traffic spikes. They verify that the entire end-to-end system—including non-autoscaled components—is capable of handling a specified request load, which is usually a multiple of the peak number of simultaneous users you expect.

Load tests can be expensive, since they require lots of billed resources to run, and slow, due to the external dependencies they rely on. On the plus side, many of the actual testing tools are free, including Apache Bench (“ab” on most Mac and Linux systems), Apache JMeter, and Nordstrom’s serverless-artillery project.

Security tests

Security tests verify that code and dependencies can handle potentially malicious input, and can be part of your unit, integration, system, or static testing. Beware: security tests have the potential to damage their target app environment. For example, a testing tool may attempt to drop a database or otherwise compromise the resources in its environment. The lesson here is: make sure to use a test or canary environment unless you are 100% sure the tool in question won’t hurt your production environment.

There are many free security testing options out there, including Zed Attack Proxy, Snyk.io, the Big List of Naughty Strings, and oss-fuzz, just to name a few. However, no automated security testing tool is perfect. If you are serious about security, hire a security consultant.

CI/CD FTW

At the beginning of this post, we mentioned two ways to minimize the frequency of bugs: testing and CI/CD. Now that we’ve covered testing, let’s take a look at how continuous integration and continuous deployment can provide an additional layer of defense against bugs in production.

The motivation for CI/CD is fairly straightforward. If you’re a developer, version control—whether it’s git branches or another system—is your source of truth. At the same time, code for Cloud Functions has to be tested and then redeployed manually. This presents no shortage of potential issues.

CI/CD systems automate this process, letting you automatically mirror any changes in version control to GCF deployments. CI/CD systems detect code changes using hooks in version control systems that are triggered whenever new code versions are received.
These systems can also invoke language-specific command-line tools to run your tests, followed by a call to gcloud to automatically deploy any code changes to Cloud Functions.

There are many different CI/CD options available, including Google’s own Cloud Build, which natively integrates with GCP and source repositories. A basic CI/CD pipeline for Cloud Functions is fairly simple to set up and deploy with Cloud Build—see this page for more details.

In conclusion

Writing a thorough and comprehensive test suite, running it in a realistic “canary” environment, and automating your deployment process using CI/CD tools are techniques that can help you reduce your production bug rate. When used together, they can significantly increase the reliability and availability of your services while decreasing the frequency of buggy code and its resulting negative business impact.

However, as we cautioned at the beginning, testing simply can’t catch every bug before it hits production. In our next post, we’ll discuss how to minimize the business impact of bugs that do make their way into your Cloud Functions-based applications using monitoring and in-production debugging techniques.
Source: Google Cloud Platform

Help stop data leaks with the Forseti External Project Access Scanner

Editor’s note: This is the second post in a series about Forseti Security, an open-source security toolkit for Google Cloud Platform (GCP) environments. In our last post, ClearDATA told us about a serverless alternative to the usual way of deploying Forseti in a dedicated VM. In this post, we learn about Forseti’s new External Project Access Scanner.

With data breaches or leaks a common headline, cloud data security is a constant concern for organizations today. But securing cloud-based data is no easy feat. In particular, it can be hard to identify and secure the routes by which data can leave the organization—so-called data exfiltration.

Consider the following scenario: a Google Cloud Platform (GCP) user has permissions in projects across different organizations, the root node in a GCP resource hierarchy. As a member of Organization A, they have permissions in a project under Organization A’s GCP organization node. This user also has permissions in a project under Organization B’s GCP organization node.

However, nowhere in Organization A’s Cloud Identity and Access Management (IAM) console does it indicate that the user has permissions to a project in Organization B. There is also no evidence of this in Organization A’s G Suite admin console, so the user can move data between organizations virtually unnoticed.

This kind of exfiltration vector is difficult to detect. Fortunately, the Forseti Security toolkit includes an External Project Access Scanner that can help.

What does the Forseti scanner do?

In GCP, the best practice is to use service accounts to perform actions where a GCP user isn’t directly involved. For example, if an application in a VM needs to connect to Google Cloud Storage, the application uses a service account for that interaction. Following this best practice, Forseti also uses service accounts to make API calls when it scans for permissions.

Each project in GCP has an ancestry known as a resource hierarchy. This ancestry always starts at an organization node (e.g., Organization A). Under the organization there can be zero or more folders, and a GCP project may be a child of either a folder or the organization itself.

The challenge here is that a service account only has permissions in the organization where Forseti is deployed. In other words, if Forseti is deployed in Organization A, it can’t see what projects a user has access to in Organization B.

This is where the concept of “delegated credentials” becomes incredibly useful. Delegated credentials allow a service account to temporarily act as a user. After compiling a list of users in the organization, the service account impersonates each user with these delegated credentials. The scanner then obtains the list of projects to which each user has access, regardless of the organization node. Having the list of projects, and still using each user’s delegated credentials, the scanner obtains the project ancestry of each project.

This scanner is configured via whitelist rules (details discussed later). When you first deploy Forseti, the only rule that exists permits users to have access to projects in the organization where Forseti is deployed. In other words, if Forseti is deployed in a project in Organization A, then users in Organization A may have access to projects in Organization A, and only Organization A. A violation occurs when none of the ancestors of a project are whitelisted in the External Project Access Scanner rules.

To sum up the operation of the External Project Access Scanner, it:

Obtains a list of all the users in a GCP organization
For each user, obtains delegated credentials and the list of projects to which that user has access
Iterates over each project, obtaining the project’s ancestry
Determines which ancestries are in violation of the whitelist rules
Reports the violations
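Forseti implements this flow in Python, but the underlying API sequence can be sketched with any Google API client. The following Node.js snippet is only a rough illustration, not Forseti’s actual code; it assumes a service-account key with G Suite domain-wide delegation, and the scope, function name, and error handling are all assumptions:

```javascript
// Conceptual sketch of the scanner's API flow (not Forseti's actual implementation).
const {google} = require('googleapis');
const key = require('./service-account-key.json'); // assumes domain-wide delegation is enabled

async function projectAncestriesFor(userEmail) {
  // Delegated credentials: the service account temporarily acts as userEmail.
  const auth = new google.auth.JWT({
    email: key.client_email,
    key: key.private_key,
    scopes: ['https://www.googleapis.com/auth/cloud-platform.read-only'],
    subject: userEmail,
  });
  const crm = google.cloudresourcemanager({version: 'v1', auth});

  // Projects this user can see, regardless of which organization they live under.
  const {data} = await crm.projects.list();

  const ancestries = {};
  for (const project of data.projects || []) {
    // Ancestry runs from the project up through any folders to the organization node.
    const res = await crm.projects.getAncestry({projectId: project.projectId});
    ancestries[project.projectId] = res.data.ancestor.map(
      (a) => `${a.resourceId.type}/${a.resourceId.id}`
    );
  }
  return ancestries;
}
```

Each returned ancestry is then compared against the whitelist rules; a project whose ancestors include no whitelisted organization or folder is reported as a violation.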
How to configure and run the External Project Access Scanner

The first step is to install and configure Forseti; you can find some great instructions on forsetisecurity.org.

Then you need to configure your whitelist rules. As mentioned previously, the External Project Access Scanner is configured by whitelist rules in the external_project_access_rules.yaml file. The first time you open this file, there is only one entry, which whitelists the organization in which you’ve deployed Forseti. A resource in GCP is identified by its resource type and resource number ID, and each rule may list multiple type/ID pairs as long as they are organization or folder types.

Once the desired rules are in place, you can run the scanner. At this point, it is important to note that the scanner does not run in a cron job like the other Forseti scanners, but must be manually invoked. This is because, depending on the size of the organization, this scanner has the potential to execute for a long time. Remember that the scanner iterates over every user in an organization and calls the GCP API to obtain a list of all projects. Then, for each project, the scanner obtains the ancestry, again via the API. This can amount to a lot of API calls that take a long time to execute. After selecting the Forseti model, you can invoke the scanner manually via the Forseti CLI.

When the scanner completes, it stores violations in at least two locations:

Forseti’s Cloud SQL database, in the violations table
A GCS bucket in the project where Forseti is deployed

It can also send an e-mail notification if you configured the Forseti server to do so.

The violation data itself is worth discussion. Violation data takes the form of a JSON string. A violation entry is generated for each project (‘full_name’) and each user (‘member’) where the project’s ancestry is in violation. The ‘rule_ancestors’ field lists all the ancestors that were listed in an External Project Access Scanner rule.

Future work

With the External Project Access Scanner, you can now identify projects in organizations or folders that aren’t whitelisted by the scanner rules. As of Forseti v2.9.0, the whitelist rules apply to all users in an organization. This means that all users in an organization may have the ability to access projects in another organization if such a rule exists. Going forward, one improvement would be to enhance the rule definition to allow each rule to be applied to specific users or groups.

Additionally, the External Project Access Scanner returns a violation regardless of the permission level a user has on a project in another organization. Whether the user has viewer, editor, or project owner roles, the scanner reports a violation all the same.
The rule could be further improved by allowing the specification of an allowed permission for each whitelisted organization or folder.

Conclusion

Migrating your workloads to the cloud brings increased flexibility, but also an expanded threat domain. Thankfully, tools like Forseti can greatly mitigate that risk, with a powerful suite of security analysis, notification, and enforcement tools for GCP. When trying to secure data in the cloud, the External Project Access Scanner affords insight into an often overlooked data exfiltration path. To get started with Forseti and the External Project Access Scanner, visit forsetisecurity.org.
Source: Google Cloud Platform

Azure Databricks – VNet injection, DevOps Version Control and Delta availability

Azure Databricks provides a fast, easy, and collaborative Apache® Spark™-based analytics platform to accelerate and simplify the process of building big data and AI solutions that drive the business forward, all backed by industry-leading SLAs.

With Azure Databricks, you can set up your Spark environment in minutes and autoscale quickly and easily. You can also apply your existing skills and collaborate on shared projects in an interactive workspace with support for Python, Scala, R, and SQL, as well as data science frameworks and libraries like TensorFlow and PyTorch.

We’re continuously listening to customers and answering questions as we evolve this service. This blog outlines important service announcements that we are proud to deliver for our customers.

Azure Databricks Delta available in Standard and Premium SKUs

Azure Databricks Delta brings new levels of reliability and performance for production workloads based on a number of improvements including transaction support, schema validation, indexing, and data versioning.

Since the preview of Delta was announced, we have received overwhelmingly positive feedback on how it has helped customers build complex pipelines for both batch and streaming data and simplify ETL pipelines. We are excited to announce that Delta is now available in our Standard SKU offering in addition to the Premium SKU offering, so you can leverage its capabilities to the fullest and build pipelines more efficiently. Now everyone can get the benefits of Databricks Delta’s reliability and performance.

You can read more about Azure Databricks Delta in our guide, “Introduction to Databricks Delta,” and import our quickstart notebook.

Azure DevOps Services Version Control

Azure DevOps is a collection of services that provide an end-to-end solution for the five core practices of DevOps: planning and tracking, development, build and test, delivery, and monitoring and operations.

Initially, we started with GitHub integration for Azure Databricks notebooks. By popular demand, we have introduced the ability to set your Git provider to Azure DevOps Services.

Authentication with Azure DevOps Services is done automatically when you authenticate using Azure Active Directory (Azure AD). The Azure DevOps Services organization must be linked to the same Azure AD tenant as Databricks. You can easily set your Git provider to Azure DevOps Services as shown in the documentation, “Azure DevOps Services Version Control.”

Deploy Azure Databricks in your own Azure virtual network (VNet injection) preview

By default, we deploy and manage your clusters for you in managed VNets, with peering enabled. We create and manage these VNets, but they reside in your subscription. We also manage the accompanying network security group rules.

Some customers, however, require network customization. I am pleased to announce that, if you need it, you can now deploy Azure Databricks in your own existing virtual network (VNet injection). This lets you connect Azure Databricks securely to other Azure services, such as Azure Storage, using service endpoints, or to on-premises data sources using user-defined routes. You can also connect Azure Databricks to a network virtual appliance to inspect all outbound traffic and take actions according to allow and deny rules, configure Azure Databricks to use custom DNS, and configure network security group (NSG) rules to specify egress traffic restrictions.

Deploying Azure Databricks to your own virtual network also lets you take advantage of flexible CIDR ranges. See the documentation to quickly and easily configure Azure Databricks in your VNet using the Azure portal UI.

Get started today!

Try Azure Databricks and let us know your feedback!

Try Azure Databricks through a 14-day premium trial with free Databricks Units.
Sign up for the webinar on Machine Learning on Azure.
Watch the video on how to get started with Apache Spark on Azure Databricks.
Visit the repository of Azure Databricks resources to continue learning.

Source: Azure

Upload filters: Voss calls Youtube’s existence into question

A good two weeks before the final vote on upload filters, supporters and opponents remain irreconcilably opposed. Lead negotiator Voss apparently has no problem with platforms like Youtube no longer existing. Researchers, by contrast, see dangers in the reform. (Urheberrecht, Media Center)
Source: Golem