Take charge of your data: How tokenization makes data usable without sacrificing privacy

Privacy regulations place strict controls on how you can examine and share sensitive data. At the same time, you can't let your business come to a standstill. De-identification techniques can help you strike a balance between utility and privacy for your data. In previous "Take charge of your data" posts, we showed you how to gain visibility into your data using Cloud Data Loss Prevention (DLP) and how to protect sensitive data by incorporating data de-identification, obfuscation, and minimization techniques. In this post, we'll dive a bit deeper into one of these de-identification techniques: tokenization.

Tokenization substitutes sensitive data with surrogate values called tokens, which can then be used to represent the original (or raw) sensitive value. It is sometimes referred to as pseudonymization or surrogate replacement. Tokenization is widely used in industries like finance and healthcare to help reduce the risk of data in use, shrink compliance scope, and minimize the exposure of sensitive data to systems that do not need it. It's important to understand how tokenization can help protect sensitive data while allowing your business operations and analytical workflows to use the information they need. With Cloud DLP, customers can perform tokenization at scale with minimal setup and overhead.

Understanding the problem

First, let's look at the following scenario: Casey works as a data scientist at a large financial company that serves businesses and end users. Casey's primary job is to analyze data and improve the user experience for people using the company's vast portfolio of financial applications. In the normal course of doing business, the company collects sensitive and regulated data, including personally identifiable information (PII) like Social Security numbers.

To demonstrate the benefits of tokenization, let's consider a task that Casey might do as part of her job: the company wants to determine what products it can build to help users improve their credit scores depending on their age range. To answer this question, Casey needs to join user information from the company's banking app with customers' credit score data received from a third party. Casey requests access to both the users table and the third party's credit score table. The users table contains each customer's email address, age, and Social Security number; the third-party table maps Social Security numbers to credit scores.

At this point, it's worth enumerating some of the risks involved with Casey getting access to this data. Email addresses and exact ages are not required for the task at hand, but they are disclosed because they are part of the users table. Social Security numbers are required for joining the tables, but they are disclosed in their raw form. While using this raw data will allow Casey to complete the job at hand, it exposes sensitive data that could be propagated into new systems. Now let's try to tackle this privacy problem using de-identification and tokenization with Cloud DLP.

Fixing the problem with tokenization

As mentioned above, tokenization substitutes sensitive data with surrogate values called tokens. These tokens can represent the original values in multiple ways. For example, they can retain the format of the original data while revealing only a few characters. This can be useful when you need to retain a record identifier or join data but don't want to reveal the sensitive underlying elements. This is sometimes referred to as preserving referential integrity, and it can be used to strike a balance between utility and risk when using the data. Continuing with our example, Casey can use tokenized data to join the two data sources and perform her analysis.

Step 1: Joining tables on a token instead of the raw SSN

Cloud DLP supports multiple cryptographic token formats that keep the referential integrity needed to join tables:

Deterministic encryption: Replaces an input value with a cryptographic token. This method is reversible, which helps maintain referential integrity across your database, and it has no character-set limitations.

Format-preserving encryption (FPE): Creates a token of the same length and character set as the input. Like deterministic encryption, it is reversible. FPE works well for inputs with a well-defined alphabet, for example an alphabet containing only [0-9a-zA-Z].

Secure, key-based hashes: Creates a token based on a one-way hash generated using an encryption key. This method is inherently irreversible, so it may not be appropriate for use cases that need to reverse or de-tokenize data in another workflow.

For Casey's task, let's use deterministic encryption in Cloud DLP to tokenize the Social Security numbers in both tables. Note: to help protect your data, Cloud DLP supports encryption keys that are wrapped using Cloud Key Management Service.
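To make Step 1 concrete, here is a minimal sketch of how this could be expressed with the google-cloud-dlp Python client. The project ID, Cloud KMS key name, wrapped key, and table contents are placeholders, and the request shape is only illustrative; a production pipeline would read and write rows in batches rather than using inline literals.

```python
# Illustrative sketch: tokenize the "ssn" column with deterministic encryption
# using the google-cloud-dlp client (v2.x). All names and values are placeholders.
import google.cloud.dlp_v2

PROJECT_ID = "my-project"  # placeholder
KMS_KEY_NAME = "projects/my-project/locations/global/keyRings/dlp/cryptoKeys/tokenize"  # placeholder
WRAPPED_KEY = b"<AES key previously wrapped with the KMS key above>"  # placeholder bytes

dlp = google.cloud.dlp_v2.DlpServiceClient()

# Apply deterministic encryption to every value of the "ssn" field, so the same
# raw SSN always yields the same token in both tables and joins still work.
deidentify_config = {
    "record_transformations": {
        "field_transformations": [
            {
                "fields": [{"name": "ssn"}],
                "primitive_transformation": {
                    "crypto_deterministic_config": {
                        "crypto_key": {
                            "kms_wrapped": {
                                "wrapped_key": WRAPPED_KEY,
                                "crypto_key_name": KMS_KEY_NAME,
                            }
                        },
                        "surrogate_info_type": {"name": "SSN_TOKEN"},
                    }
                },
            }
        ]
    }
}

# A tiny stand-in for the users table; a real pipeline would stream rows through.
table_item = {
    "table": {
        "headers": [{"name": "user_id"}, {"name": "ssn"}],
        "rows": [{"values": [{"string_value": "u-001"}, {"string_value": "222-22-2222"}]}],
    }
}

response = dlp.deidentify_content(
    request={
        "parent": f"projects/{PROJECT_ID}",
        "deidentify_config": deidentify_config,
        "item": table_item,
    }
)
print(response.item.table)  # the "ssn" cell now holds an SSN_TOKEN(...) surrogate
```

Because deterministic encryption is reversible, an authorized workflow with access to the same wrapped key could later de-tokenize these values if it needs the raw data again.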
Step 2: Masking or redacting unneeded raw PII values

For the business use case described above, we don't need email addresses or exact ages. One option is to use Cloud DLP's value replacement and bucketing to transform values in a column. For example, the users_db table can be transformed to replace each email address with the string "[email-address]" and to replace exact age values with bucketed ranges.

Great, the datasets have been de-identified and tokenized! Casey can now join and analyze the data.

Tokenization for unstructured data

What we've described so far is tokenization of structured data. In a real-world scenario, however, it's likely that unstructured data containing PII is also present. For example, let's consider the users_db table again and add a customer_support_notes column. This column stores a user's call log from the last time they called the company's automated customer support line.

Cloud DLP can also detect and de-identify sensitive information in this unstructured data. For example, Cloud DLP can be configured to tokenize the SSN, to keep referential integrity, and to redact other sensitive data. Referential integrity here means that the Social Security number appearing in the unstructured notes produces the same token as the Social Security number in other parts of the data, such as the SSN column, because the token is always the same value for a given input value.

What we've just shown is how you can leverage the power of Cloud DLP's inspection engine, along with its ability to transform and tokenize, to help protect both structured and unstructured text.
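As a hedged illustration of that kind of configuration, the sketch below (placeholder names again, reusing the same wrapped key as in the earlier snippet) tokenizes Social Security numbers found in free text while replacing other detected identifiers with their info type names.

```python
# Illustrative sketch: de-identify unstructured support notes. SSNs are tokenized
# with the same deterministic key as the structured column, so the tokens match;
# other detected identifiers are replaced with their info type names.
import google.cloud.dlp_v2

PROJECT_ID = "my-project"  # placeholder
KMS_KEY_NAME = "projects/my-project/locations/global/keyRings/dlp/cryptoKeys/tokenize"  # placeholder
WRAPPED_KEY = b"<same wrapped AES key as before>"  # placeholder bytes

dlp = google.cloud.dlp_v2.DlpServiceClient()

inspect_config = {
    "info_types": [
        {"name": "US_SOCIAL_SECURITY_NUMBER"},
        {"name": "EMAIL_ADDRESS"},
        {"name": "PHONE_NUMBER"},
    ]
}

deidentify_config = {
    "info_type_transformations": {
        "transformations": [
            {
                # Tokenize SSNs so they still join against the tokenized ssn column.
                "info_types": [{"name": "US_SOCIAL_SECURITY_NUMBER"}],
                "primitive_transformation": {
                    "crypto_deterministic_config": {
                        "crypto_key": {
                            "kms_wrapped": {
                                "wrapped_key": WRAPPED_KEY,
                                "crypto_key_name": KMS_KEY_NAME,
                            }
                        },
                        "surrogate_info_type": {"name": "SSN_TOKEN"},
                    }
                },
            },
            {
                # Redact everything else by substituting the info type name.
                "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}],
                "primitive_transformation": {"replace_with_info_type_config": {}},
            },
        ]
    }
}

notes = "Caller 222-22-2222 asked to change the email on file to casey@example.com."
response = dlp.deidentify_content(
    request={
        "parent": f"projects/{PROJECT_ID}",
        "deidentify_config": deidentify_config,
        "inspect_config": inspect_config,
        "item": {"value": notes},
    }
)
print(response.item.value)
# e.g. "Caller SSN_TOKEN(52):... asked to change the email on file to [EMAIL_ADDRESS]."
```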
Tokenization in action

On Google Cloud Platform, you can tokenize data using Cloud DLP and a click-to-deploy Cloud Dataflow pipeline. This ready-to-use pipeline takes data from Cloud Storage, processes it, and ingests it into BigQuery. To use it:

1. Visit https://console.cloud.google.com/dataflow
2. Click "Create job from template"
3. Select "Data Masking/Tokenization using Cloud DLP from Cloud Storage to BigQuery"

In short, tokenization using Cloud DLP can help you support privacy-sensitive use cases and adhere to data security policies within your organization. It can also help your business satisfy policy and regulatory requirements. For more about tokenization and Cloud DLP, watch our recent Cloud OnAir webinar, "Protecting sensitive datasets in Google Cloud Platform," to see a demo of tokenization with Cloud DLP in action. Then, to learn more, visit Cloud Data Loss Prevention for resources on getting started.
Source: Google Cloud Platform

Announcing the general availability of Azure premium files

Highly performant, fully managed file service in the cloud!

Today, we are excited to announce the general availability of Azure premium files for customers who want to optimize their cloud-based file shares on Azure. The premium tier offers a higher level of performance, built on solid-state drives (SSDs), for fully managed file services in Azure.

The premium tier is optimized to deliver consistent performance for IO-intensive workloads that require high throughput and low latency. Premium file shares store data on the latest SSDs, making them suitable for a wide variety of workloads such as databases, persistent volumes for containers, home directories, content and collaboration repositories, media and analytics, highly variable and batch workloads, and performance-sensitive enterprise applications. Our existing standard tier continues to provide reliable performance at a low cost for workloads that are less sensitive to performance variability, and it is well suited for general-purpose file storage, development/test, backups, and applications that do not require low latency.

Throughout the initial introduction and preview, we've heard from hundreds of customers across different industries about their experiences. They've shared their lessons learned and success stories with us and have helped make premium file shares even better.

“Working with clients that have large amounts of data that is under FDA or HIPAA regulations, we always struggled in locating a good cloud storage solution that provided SMB access and high bandwidth… until Azure Files premium tier. When it comes to a secure cloud-based storage that offers high upload and download speeds for cloud and on-premises VM clients, Azure premium files definitely stands out.”

– Christian Manasseh, Chief Executive Officer, Mobius Logic

“The speeds are excellent. The I/O intensive actuarial CloudMaster software tasks ran more than 10 times faster in the Azure Batch solution using Azure Files premium tier. Our application has been run by our clients using 1000’s of cores and the Azure premium files has greatly decreased our run times.”

– Scott Bright, Manager Client Data Services, PolySystems

Below are the key benefits of the premium tier. If you’re looking for more technical details, read the previous blog post “Premium files redefine limits for Azure Files.”

Performant, dynamic, and flexible

With the premium tier, performance is what you define. Premium file shares' performance can instantly scale up and down to fit your workload's characteristics. Premium file shares can scale up to 100 TiB of capacity and 100K IOPS, with a target total throughput of 10 GiB/s. Premium shares not only let you dynamically tune performance, they also offer bursting to meet highly variable workload requirements with short peak periods of intense IOPS.
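As one hedged illustration of that dynamic tuning (not part of the original announcement), the provisioned size of a premium share can be adjusted programmatically; with the azure-storage-file-share Python package it might look like the following, where the connection string and share name are placeholders.

```python
# Illustrative sketch: scale a premium share's provisioned size (and, with it,
# its baseline IOPS and throughput) up before a heavy job and back down after.
# Assumes the share lives in a FileStorage (premium) account; names are placeholders.
from azure.storage.fileshare import ShareClient

CONNECTION_STRING = "<connection string of a premium FileStorage account>"

share = ShareClient.from_connection_string(CONNECTION_STRING, share_name="analytics-share")

share.set_share_quota(quota=10240)  # provision 10 TiB ahead of a batch run
# ... run the IO-intensive workload ...
share.set_share_quota(quota=1024)   # shrink back to 1 TiB to reduce cost
```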

"We recently migrated our retail POS microservices to Azure Kubernetes Service with premium files. Our experience has been simply amazing – premium files permitted us to securely deploy our 1.2K performant Firebird databases. No problem with size or performance, just adapt the size of the premium file share to instantly scale. It improved our business agility, much needed to serve our rapidly growing customer base across multiple retail chains in France."

– Arnaud Le Roy, Chief Technology Officer, Menlog

We partnered with our internal Azure SQL and Microsoft Power BI teams to build solutions on premium files. As a result, Azure Database for PostgreSQL and Azure Database for MySQL recently opened a preview of increased scale: 16 TiB databases with 20,000 IOPS, powered by premium files. Microsoft Power BI announced a preview of its enhanced dataflows compute engine, up to 20 times faster, built on the Azure Files premium tier.

Global availability with predictable cost

Azure Files premium tier is currently available in 19 Azure regions globally. We are continually expanding regional coverage. You can check the Azure region availability page for the latest information.

The premium tier provides the most cost-effective way to create highly performant and highly available file shares in Azure. Pricing is simple and cost is predictable: you pay a single price per provisioned GiB. Refer to the pricing page for additional details.

Seamless Azure experience

Customers receive all the features of Azure Files in this new offering, including snapshots/restore, Azure Kubernetes Service and Azure Backup integration, monitoring, hybrid support via Azure File Sync, and support for the Azure portal, PowerShell/CLI/Cloud Shell, AzCopy, and Azure Storage Explorer, among others. Developers can leverage their existing code and skills to migrate applications using the familiar Azure Storage client libraries or the Azure Files REST APIs. The opportunities for future integration are limitless. Reach out to us if you would like to see more.
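For example, code that already talks to a standard file share through the Azure Storage client libraries should work unchanged against a premium share; a minimal Python sketch (with placeholder names) could look like this.

```python
# Illustrative sketch: upload a file with the azure-storage-file-share library.
# The call is the same whether the target share is standard or premium; only
# the storage account behind the connection string differs. Names are placeholders.
from azure.storage.fileshare import ShareFileClient

CONNECTION_STRING = "<connection string of a premium FileStorage account>"

file_client = ShareFileClient.from_connection_string(
    CONNECTION_STRING, share_name="analytics-share", file_path="reports/q2.csv"
)

with open("q2.csv", "rb") as source:
    file_client.upload_file(source)  # identical to the standard-tier code path
```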

With the availability of premium tier, we’re also enhancing the standard tier. To learn more, visit the onboarding instructions for the standard files 100 TiB preview.

Get started and share your experiences

Getting started with premium file shares is simple and takes only two minutes. See the detailed steps for how to create a premium file share.
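As a rough sketch of those steps in Python (assuming a storage account of kind FileStorage with the Premium_LRS SKU already exists, and using placeholder names), creating a premium share might look like this.

```python
# Illustrative sketch: create a premium file share with 1 TiB provisioned.
# Assumes an existing FileStorage-kind storage account (Premium_LRS SKU).
from azure.storage.fileshare import ShareClient

CONNECTION_STRING = "<connection string of a premium FileStorage account>"

share = ShareClient.from_connection_string(CONNECTION_STRING, share_name="premium-share")
share.create_share(quota=1024)  # provisioned size in GiB drives both performance and billing
```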

Visit Azure Files premium tier documentation to learn more. As always, you can share your feedback and experiences on the Azure Storage forum or email us at azurefiles@microsoft.com. Post your ideas and suggestions about Azure Storage on our feedback forum.
Source: Azure