Security, simplified: Making Shielded VM the default for Compute Engine

Last April we announced the general availability of Shielded VM—virtual machine instances that are hardened with a set of easily configurable security features to ensure that when your VM boots, it’s running a verified bootloader and kernel. To make it accessible to everyone, we offered Shielded VM at no additional charge.

To continue improving the safety and security of our ecosystem, today we’re making Unified Extensible Firmware Interface (UEFI) and Shielded VM the default for everyone using Google Compute Engine—at no additional charge. This provides defense-in-depth hardening features to all supported VM instances, including protection from:

Malicious guest system firmware, UEFI extensions, and drivers
Persistent boot and kernel compromise in the guest OS
VM-based secret exfiltration and replay

“Using Shielded VM to run our secure services on Google Cloud Platform has improved our security posture, while being quick and simple to implement,” said Michael Capicotto, Cloud Security Architect at Two Sigma. “Making this the default for Compute Engine is a great next step toward improving security for all.”

What’s new

Since Shielded VM became generally available, we’ve continued to add support for common use cases based on your feedback and feature suggestions.

Adoption across Google Cloud: In addition to making Shielded VM the default across Google Compute Engine, several VM-based Google Cloud services, including Cloud SQL, Google Kubernetes Engine, Kaggle, and Managed Service for Microsoft Active Directory, are now using Shielded VM as their underlying infrastructure.
Migration support: Starting with version 4.5, Migrate for Compute Engine (formerly Velostrata) includes support for migration of UEFI-based VMs from on-prem to Shielded VM in Google Compute Engine.
Security Command Center integration: Security Health Analytics findings now allow you to identify VM instances with Shielded VM support that don’t have Secure Boot enabled, so you can enable it if possible.

The power to choose

In addition to the new features we’ve added, Shielded VM now offers more flexibility around the operating system images you can use and how you get them.

Support across multiple operating systems: For an extensive list of operating systems that support Shielded VM features, as well as the projects in which they can be found, please see Google Compute Engine images.
Marketplace for an open ecosystem: Shielded VM images are also available in the GCP Marketplace. These are brought to you in collaboration with Deep Learning VM, as well as our third-party partners at the Center for Internet Security (CIS) and Server General. “Our goal is to help our customers to secure their data and achieve regulatory compliance with ease,” said Raj Sharma, CEO at Server General. “Moving our MySQL and PostgreSQL images to Shielded VM has allowed us to provide verifiable security by extending the trust model from the platform to the application server layer, and ultimately to data that is stored in a database or a file server.”
Custom Shielded-ready images: You can also use your own keys to sign binaries and create custom images for your application or workload. These can be imported to Compute Engine at no additional charge.

Get started with a simplified UI

It’s now even easier to get started with Shielded VM via the Cloud Console, gcloud, or the API.
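If you’d rather use the command line, here’s a minimal gcloud sketch; the instance name, zone, and image family below are placeholder assumptions, so substitute values that fit your project:

# Create a new instance with all three Shielded VM options enabled
$ gcloud compute instances create my-shielded-vm \
    --zone=us-central1-a \
    --image-family=debian-10 \
    --image-project=debian-cloud \
    --shielded-secure-boot \
    --shielded-vtpm \
    --shielded-integrity-monitoring

# Enable Secure Boot on an existing supported instance
# (the instance must be stopped before its Shielded VM options can change)
$ gcloud compute instances stop my-shielded-vm --zone=us-central1-a
$ gcloud compute instances update my-shielded-vm --zone=us-central1-a --shielded-secure-boot
$ gcloud compute instances start my-shielded-vm --zone=us-central1-a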
Let’s look at how to create a Shielded VM from the console. First, visit the “VM instances” option in the left navigation bar of the Compute Engine console, then select “New VM instance” from the menu. Then, simply pick a boot disk that supports Shielded VM features. In this example, we’re creating a VM instance using the Debian operating system.

Once you’ve selected a boot disk, you can adjust Shielded VM’s configuration options—Secure Boot, vTPM, and integrity monitoring—under the “Security” tab. On a Shielded VM instance, the vTPM and integrity monitoring options are enabled by default, but Secure Boot is not. This is because some customers use unsigned drivers or other similar features that are incompatible with Secure Boot. If you don’t need these features, we strongly encourage you to enable Secure Boot.

At this time we want to be especially mindful of the many challenges organizations are facing. By making Shielded VM the default for Google Compute Engine, we hope to help simplify your workflows and provide the peace of mind that your VMs and VM-based services are protected from persistent rootkits and bootkits. To learn more, please check out the Shielded VM documentation.
Source: Google Cloud Platform

Explaining model predictions on image data

Editor’s note: This is the second blog post in a series covering how to use AI Explanations with different data types. The first post explained how to use Explainable AI with tabular data.

As machine learning technology continues to improve and models become increasingly accurate, we’re using ML to solve more and more complex problems. As ML technology is improving, it’s also getting more complex. This is one of the reasons that late last year we launched Explainable AI—a set of tools for understanding how your machine learning models make predictions. In this post, the second in our series on Explainable AI, we’ll dive into how explanations work with image classification models, and how you can use AI Explanations to better understand your image models deployed on Cloud AI Platform. We’ll also show you a new image attribution method we recently launched called XRAI.

XRAI is a new way of displaying attributions that highlights which salient features of an image most impacted the model, instead of just the individual pixels. You can see the effect below, showing which regions contributed to our model’s prediction of this image as a husky. As indicated in the scale, XRAI highlights the most influential regions in yellow, and the least influential in blue, based on the viridis color palette.

You can find more information on XRAI in this paper by Google’s PAIR team. For a broader background on Explainable AI, check out the last post in this series and our whitepaper.

Why use Explainable AI for image models?

When debugging a mistaken classification from a model or deciding whether or not to trust its prediction, it’s helpful to understand why the model made the prediction it did. Explainability can show you which parts of an image caused your model to make a specific classification.

Image explanations are useful for two groups of people: model builders and model stakeholders. For data scientists and ML engineers building models, explanations can help verify that our model is picking up on the right signals in an image. In an apparel classification model, for example, if the highlighted pixels show that the model is looking at unique characteristics of a piece of clothing, we can be more confident that it’s behaving correctly for a particular image. However, if the highlighted pixels are instead in the background of the image, the model might not be learning the right features from our training data. In this case, explanations can help us identify and correct for imbalances in our data.

Let’s walk through an example of using explanations to debug model behavior. Take a look at the attributions for this image, which our model correctly classified as “canoe/kayak”. While it classified the picture correctly, the attributions show us that the paddle signaled our model’s prediction most, rather than the boat itself. In fact, if we crop the image to include only the paddle, our model still classifies it as “canoe/kayak” even though it shouldn’t, since there’s no kayak in the picture.

With this knowledge, we can now go back and improve our training data to include more images of kayaks from different angles, both with and without paddles. We’d also want to improve our “paddle” label by adding more images to our training data that feature paddles in the foreground and background.

We also often need to explain our model’s predictions to external stakeholders.
For example, if a manufacturing company is using a model to identify defective products, they may not want to take its classification alone at face value before discarding a product labeled as defective by the model. In these cases, it’s especially useful to understand the regions in the image that caused the model to make a particular classification.

If you saw our last post, you might wonder how explanations for tabular models relate to those for image models. The methods are actually the same, but we present the results differently. For tabular data, each feature is assigned an attribution value indicating how much that feature impacted the model’s prediction. With image models, you can think of each pixel as an individual feature, and the explanation method assigns an attribution value to every one. To make image attributions more understandable, we also add a layer of post-processing on top to make the insights really pop.

Image explanations on Cloud AI Platform

AI Platform Explanations currently offers two methods for getting attributions on image models, based on papers published by Google Research: Integrated Gradients (IG) and XRAI. IG returns the individual pixels that signaled a model’s prediction, whereas XRAI provides a heatmap of region-based attributions. Here’s a comparison of both techniques on the husky image shown above, with IG on the left.

Each approach has specific strengths depending on the type of image data you’re working with. IG is optimal for images taken in non-natural environments like labs. XRAI currently performs best on natural images, like a picture of a house or an animal. IG provides more granularity, since it returns a different attribution value for each pixel in an image. XRAI, on the other hand, joins pixels into regions and shows the relative importance of different areas in an image. This is more effective for natural images, where it’s better to get a higher-level summary with insights like “the shape of the dog’s face” rather than “the pixels on the top left below the dog’s eye.”

When creating a model version in AI Platform, you can specify the attribution method you’d like to use with just one parameter, so it’s worth trying both IG and XRAI to see which one performs better on your image data. In the next section, we’ll show you how to deploy your image models with explanations.

Preparing your image model for deployment

Once you’ve trained a TensorFlow model for image classification, you need to create an explanation_metadata.json file to deploy it to AI Platform Explanations. This tells our explanations service which inputs in your model’s graph you want to explain, along with the baseline you want to use for your model. Just like tabular models provide a baseline value for each feature, for image models we’ll provide a baseline image. Typically image models use an uninformative baseline, or a baseline where no additional information is presented. Common baselines for image models include solid black or white images, or images with random pixel values. To use both solid black and white images as your baseline in AI Explanations, you can pass [0,1] as the value for the input_baselines key in your metadata. To use a random image, pass a list of randomly generated pixel values of the same size your model expects. For example, if your model accepts 192×192 pixel color images, this is how you’d use a random pixel baseline image in your explanation metadata:

"input_baselines": [np.random.rand(192,192,3).tolist()]

Here is an example of a complete explanation_metadata.json file for image models. Once your metadata file is ready, upload it to the same Cloud Storage bucket as your SavedModel.
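As a rough sketch of what that file might contain and how to upload it (the tensor names here are hypothetical and depend on your model’s graph, and the bucket path is a placeholder):

# Write a minimal metadata file; [0, 1] means both a solid black and a
# solid white baseline image will be used
$ cat > explanation_metadata.json <<'EOF'
{
  "inputs": {
    "image": {
      "input_tensor_name": "input_image:0",
      "modality": "image",
      "input_baselines": [0, 1]
    }
  },
  "outputs": {
    "probability": {
      "output_tensor_name": "softmax:0"
    }
  },
  "framework": "tensorflow"
}
EOF

# Copy it next to the SavedModel in Cloud Storage
$ gsutil cp explanation_metadata.json gs://my-bucket/model-dir/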
When you deploy TensorFlow image models to AI Platform Explanations, make sure your model serving function is set up to take a string as input (i.e., the client sends a base64-encoded image string), which you’ll then convert to an array of pixels on the server before sending to your model for prediction. This is the approach used in our sample notebook.

Deploying your image model to AI Platform Explanations

You can deploy your model to AI Platform Explanations using either the AI Platform API or gcloud, the Google Cloud CLI. Here we’ll show you an example using gcloud; changing the explanation method is simply a matter of changing the --explanation-method flag. In this example we’ll deploy a model with XRAI. The --origin flag should include the Cloud Storage path of your saved model assets and metadata file, and the --num-integral-steps flag determines how many steps are used along the gradients path to approximate the integral calculation in your model. You can learn more about this in the XRAI paper. Once you run the deploy command (sketched below), your model should deploy within 5-10 minutes. To get explanations, you can then use either gcloud or the AI Platform Prediction API; the explanation response contains the attributions for each input, along with metadata such as the approx_error field discussed below, and you can visualize the returned attributions with the plotting code in our sample notebook.
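Putting that together, here’s a hedged sketch of the deploy command and a follow-up explain request; the model and version names, bucket path, and runtime/framework versions are placeholder assumptions, while the explanation flags are the ones described above:

# Create a model version with XRAI explanations enabled
$ gcloud beta ai-platform versions create v_xrai \
    --model my_image_model \
    --origin gs://my-bucket/model-dir \
    --runtime-version 1.15 \
    --framework tensorflow \
    --python-version 3.7 \
    --explanation-method xrai \
    --num-integral-steps 25

# Request explanations for instances stored in a local JSON file
$ gcloud beta ai-platform explain \
    --model my_image_model \
    --version v_xrai \
    --json-instances instances.json

To compare attribution methods, you could create a second version with --explanation-method integrated-gradients and send the same instances to both.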
Customizing your explanation visualizations

In addition to adding XRAI as a new explanation method, we’ve recently added some additional configuration options to customize how your image explanations are visualized. Visualizations help highlight the predictive pixels or regions in the image, and your preferences may change depending on the type of images you’re working with. Where attributions previously returned images with the top 60% of the most important pixels highlighted, you can now specify the percentage of pixels returned, whether to show positive or negative pixels, the type of overlay, and more.

To demonstrate changing visualization settings, we’ll look at predictions from a model we trained on a visual inspection dataset from Kaggle. This is a binary classification model that identifies defective metal casts used in manufacturing. The image below is an example of a defective cast, indicated by the circular dent on the right.

To customize how your pixel attributions are visualized, you can set visualization parameters in your explanation_metadata.json, including the clipping percentiles (clip_below_percentile and clip_above_percentile), the polarity, the color map, and the type of overlay. In addition to the pink_green option for color mapping, which is more colorblind friendly, we also offer red_green. More details on visualization config options can be found in the documentation.

To show what’s possible with these customization options, next we’ll experiment with modifying the clip_below_percentile and visualization type parameters. clip_below_percentile dictates how many attributed pixels will be returned on the images you send for prediction. If you set this to 0, leaving clip_above_percentile at the default of 100, your entire image will be highlighted, whereas if you set clip_below_percentile to 98, only the pixels with the top 2% of attribution values will be highlighted. Below, from left to right, are the IG explanations for the top 2%, 10%, and 30% of positively attributed pixels for this model’s prediction of “defective” on this image.

The polarity parameter in the visualization config refers to the sign or directionality of the attribution value. For the images above, we used polarity: positive, which shows the pixels with the highest positive attribution values. Put another way, these were the pixels that were most influential in our model’s prediction of “defective” on this image. If we had instead set polarity to negative, the pixels highlighted would show areas that led our model to not associate the image with the label “defective.” Negative polarity attributions can help you debug images that your model predicted incorrectly by identifying false negative regions in the image.

Low-polarity pixels (those with an absolute attribution value close to 0), on the other hand, indicate pixels that were least important to our model for a given prediction. If our model is performing correctly, the least important pixels would be in the background of the image or on a smooth part of the cast.

Sanity checking your image explanations

Image attributions can help you debug your model and ensure it’s picking up on the right signals, but it’s still important to do some sanity checks to ensure you can trust the explanations your model returns. To help you determine how accurate each explanation is, we recently added an approx_error field to the JSON response from explanations. In general, the lower the approx_error value, the more confidence you can have in your model’s explanations. When approx_error is higher than 5%, try increasing the number of steps for your explanation method or making sure you’ve chosen a non-informative baseline. For example, if you’ve chosen a solid white image as your baseline but many of your training images have white backgrounds, you may want to choose something different.

You’ll also want to make sure you’re using the right baseline. Besides making sure it reflects the comparison you’re trying to make, you should make sure that it’s generally “non-informative,” meaning that your model doesn’t really “see” anything in the baseline image. One simple check for this is to ensure that the score for each predicted class on the baseline is near 1/k, where k is the number of classes.

While looking at approx_error and experimenting with different baselines can help you understand how much to trust your explanations, they should not be used as your only basis for evaluating the accuracy of your explanations. Many other factors affect explanation quality, including your training data and model architecture.

Finally, it’s worthwhile to keep in mind the general caveats of any explanation method. Explanations reflect the patterns the model found in the data, but they don’t reveal any fundamental relationships in your data sample, population, or application.

Next steps

We’ve only scratched the surface of what’s possible with image explanations. Here are some additional resources if you’d like to learn more:

For a full code sample of building and deploying an image model with explanations, check out this notebook
IG paper
IG visualization paper
XRAI paper
AI Platform Explanations documentation

We’d love to hear your thoughts and questions about this post, so please don’t hesitate to reach out. You can find me on Twitter at @SRobTweets.
And stay tuned for the next post in this series, which will cover how to summarize and present model explanations to external stakeholders.
Source: Google Cloud Platform

DCsv2-series VM now generally available from Azure confidential computing

Security and privacy are critically important when storing and processing sensitive information in the cloud: from payment transactions to financial records, personal health data, and more. With the general availability of DCsv2-series VMs, we are ushering in a new level of data protection in Azure.

With more workloads moving to the cloud and more customers putting their trust in Microsoft, the Azure confidential computing team continues to innovate to provide offerings that keep and build upon that trust. Starting with our world-class security researchers, and working closely with industry partners, we are developing new ways to protect data while it’s in use with Azure confidential computing. DCsv2-series VMs can protect the confidentiality and integrity of your data even while it’s processed.

What is confidential computing?

There are ways to encrypt your data at rest and while in transit, but confidential computing protects the confidentiality and integrity of your data while it is in use. Azure is the first public cloud to offer virtualization infrastructure for confidential computing that uses hardware-based trusted execution environments (TEEs). Even cloud administrators and datacenter operators with physical access to the servers cannot access TEE-protected data.

By combining the scalability of the cloud with the ability to encrypt data while in use, new scenarios are now possible in Azure, like confidential multi-party computation, where different organizations combine their datasets for compute-intensive analysis without being able to access each other’s data. Examples include banks combining transaction data to detect fraud and money laundering, and hospitals combining patient records for analysis to improve disease diagnosis and prescription allocation.

Data protection powered by Intel hardware

Our DCsv2 confidential computing virtual machines run on servers that implement Intel Software Guard Extensions (Intel SGX). Because Intel SGX hardware protects your data and keeps it encrypted while the CPU is processing it, even the operating system and hypervisor cannot access it, nor can anyone with physical access to the server.

Microsoft and Intel are committed to providing best-in-class cloud data protection through our deep ongoing partnership:

“Customers are demanding the capability to reduce the attack surface and help protect sensitive data in the cloud by encrypting data in use. Our collaboration with Microsoft brings enterprise-ready confidential computing solutions to market and enables customers to take greater advantage of the benefits of cloud and multi-party compute paradigms using Intel® SGX technology.” —Anil Rao, VP Data Center Security and Systems Architecture, Intel

Partners in the Azure Marketplace

Microsoft works directly with platform partners to provide seamless solutions, development, and deployment experiences running on top of our Azure confidential computing infrastructure. Software offerings can be discovered through the Azure Marketplace, including:

Fortanix—Offers a cloud-native data security solution including key management, HSM, tokenization, and secrets management built on Azure confidential computing.
Anjuna—Delivers secure Azure instances using end-to-end CPU hardware-level encryption without changing your application or operations.
Anqlave—A valued partner in Singapore offering enterprise-ready confidential computing solutions.

“Anqlave’s proprietary, institutional-grade modern key management and data encryption solution addresses the most critical security issues we face today. With Anqlave Data Vault (ADV), secret management allows users to securely create, store, transport, and use their secrets. Leveraging Azure confidential computing allows us to make this technology more accessible to our enterprise customers and easily support their scale. Providing a secure enclave that is portable in the cloud is one of the key reasons why our enterprises will prefer to host their ADV on Azure confidential computing regardless of their other cloud infrastructure.” —Assaf Cohen, CEO, Anqlave

How customers are succeeding with Azure confidential computing

Customers are already using Azure confidential computing for production workloads. One customer is Signal:

“Signal develops open source technology for end-to-end encrypted communications, like messaging and calling. To meet the security and privacy expectations of millions of people every day, we utilize Azure confidential computing to provide scalable, secure environments for our services. Signal puts users first, and Azure helps us stay at the forefront of data protection with confidential computing.” —Jim O’Leary, VP of Engineering, Signal

While many applications and services can take advantage of data protection with confidential computing, we have seen particular benefits in regulated industries, such as financial services, government, and healthcare. Companies can now take advantage of the cloud for processing sensitive customer data with reduced risk and higher confidence that their data is protected, including while it is being processed.

For example, MobileCoin, a new international cryptocurrency, trusts Azure confidential computing to support digital currency transfers. Their network code is now available as open source, and a TestNet is available to try out:

“MobileCoin partners with Azure because Microsoft has decided to invest in trustworthy systems. Confidential computing rides the edge between what we can imagine and what we can protect. The praxis we’ve experienced with Azure allows us to commit to systems that are integral, high trust, and performant.” —Joshua Goldbard, CEO, MobileCoin

Confidential computing has proven useful for enterprise-grade blockchain, enabling fast and secure transaction verification across a decentralized network. Fireblocks is yet another customer taking advantage of Azure confidential computing infrastructure:

“At Fireblocks, our mission is to secure blockchain-based assets and transactions for the financial industry. Once we realized the traditional tech stack was not suitable for this challenge, we turned to Azure confidential computing and Intel SGX to implement our patent-pending technology. Our customers trust Fireblocks to securely store and move their digital assets—over $6.5 billion of them each month—and Azure provides a backbone for us to deliver on that promise.” —Michael Shaulov, CEO and co-founder, Fireblocks

Industry leadership bringing confidential computing to the forefront

Microsoft is not alone in bringing confidential computing to the forefront of the cloud computing industry. In September 2019, we were a founding member of the Confidential Computing Consortium (CCC), which now consists of dozens of companies working to develop and open source technologies and best practices for protecting data while it’s in use. These companies include hardware, cloud, platform, and software providers.

Microsoft is also committed to the developer experience to ensure platform partners and application developers can build solutions that take advantage of confidential computing. We donated our Open Enclave SDK to the consortium, an open source SDK for developing platforms and applications on top of confidential computing infrastructure.

Get started today

Get started deploying your own DCsv2 virtual machine from the Azure Marketplace and install the necessary tools. Then, run the Hello World sample using the Open Enclave SDK to begin building confidential workloads in the cloud.
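As a hedged sketch of that first step with the Azure CLI (the resource group, VM name, region, and image below are placeholder assumptions; check the Marketplace listing for the images and regions currently supported for confidential computing):

# Create a resource group in a region that offers DCsv2-series VMs
$ az group create --name acc-demo-rg --location eastus

# Create a DCsv2 VM; the Standard_DC2s_v2 size runs on Intel SGX-capable hardware
$ az vm create \
    --resource-group acc-demo-rg \
    --name acc-demo-vm \
    --size Standard_DC2s_v2 \
    --image UbuntuLTS \
    --admin-username azureuser \
    --generate-ssh-keys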
Source: Azure

Speed Up Your Development Flow With These Dockerfile Best Practices

The Dockerfile is the starting point for creating a Docker image. The file format provides a well-defined set of directives that allow you to copy files or folders, run commands, set environment variables, and do other tasks required to create a container image. It’s really important to craft your Dockerfile well to keep the resulting image secure, small, quick to build, and quick to update.

In this post, we’ll see how to write good Dockerfiles that speed up your development flow, ensure build reproducibility, and produce images that can be confidently deployed to production.

Note: for this blog post we’ll base our Dockerfile examples on the react-java-mysql sample from the awesome-compose repository.

Development flow

As developers, we want to match our development environment to the target production context as closely as possible to ensure that what we build will work when deployed.

We also want to be able to develop quickly, which means we want builds to be fast and developer tools like debuggers to be usable. Containers are a great way to codify our development environment, but we need to define our Dockerfile correctly to be able to interact quickly with our containers.

Incremental builds

A Dockerfile is a list of instructions for building your container image. While the Docker builder caches the result of each step as an image layer, the cache can be invalidated; when that happens, the step that invalidated the cache and all subsequent steps need to be rerun and the corresponding layers regenerated.

The cache is invalidated when files in the build context that are referenced by COPY or ADD change. The ordering of the steps can therefore have drastic effects on performance.

Let’s take a look at an example where we build a NodeJS project in the Dockerfile. In this project, there are dependencies specified in the package.json file, which are fetched when the npm ci command is run.

The simplest Dockerfile would be:

FROM node:lts
ENV CI=true
ENV PORT=3000
WORKDIR /code
COPY . /code
RUN npm ci
CMD [ "npm", "start" ]

Structuring the Dockerfile as above will cause the cache to be invalidated at the COPY line any time a file in the build context changes. This means that the dependencies will be fetched and the node_modules directory filled whenever any file changes, not just package.json, which can take a long time.

To avoid this and only fetch the dependencies when they change (i.e.: when package.json or package-lock.json changes), we should consider separating the dependency installation from the build and run of our application.

A more optimized Dockerfile would be this:

FROM node:lts
ENV CI=true
ENV PORT=3000
WORKDIR /code
COPY package.json package-lock.json /code/
RUN npm ci
COPY src /code/src
CMD [ "npm", "start" ]

Using this separation, if there are no changes in package.json or package-lock.json, then the cache will be used for the layer generated by the RUN npm ci instruction. This means that when you edit your application source and rebuild, the dependencies won’t be redownloaded, which saves time.

We also limit the second COPY to the src directory as explained in a previous post.
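To see the effect of this ordering for yourself, here’s a quick sketch (the image tag and the touched file are arbitrary examples); on the second build, the dependency layer should come from the cache:

$ docker build -t repository/image_name .
# Simulate an application code change without touching package.json
$ touch src/index.js
$ docker build -t repository/image_name .
# The second build should report the "RUN npm ci" step as cached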

Keep live reload active between the host and the container

This tip is not directly related to the Dockerfile, but we often hear this kind of question: How do I keep live reload active while running the app in a container and modifying the source code from my IDE on the host machine?

With our example, we need to mount our project directory in the container and pass an environment variable to enable Chokidar, which wraps NodeJS file change events from the host.

$ docker run -e CHOKIDAR_USEPOLLING=true -v ${PWD}/src/:/code/src/ -p 3000:3000 repository/image_name

Consistent builds

One of the most important things with a Dockerfile is being able to build the exact same image from the same build context (sources, dependencies, and so on).

We’ll continue to improve the Dockerfile defined in the previous section.

Build consistently from sources

As we saw in the previous section, we’re able to build an application by adding the source files and dependencies in the Dockerfile description and then running commands on them.

But in our previous example, we aren’t able to confirm that the generated image will be the same each time we run a docker build. Why? Because each time a new version of NodeJS is released, we can expect the lts tag to point to the latest LTS version of the NodeJS image, which will change over time and could introduce breaking changes. We can easily fix this by using a more specific tag for the base image (we’ll let you choose between LTS and the latest stable version).

FROM node:13.12.0
ENV CI=true
ENV PORT=3000
WORKDIR /code
COPY package.json package-lock.json /code/
RUN npm ci
COPY src /code/src
CMD [ "npm", "start" ]

We’ll see in the No more latest section that there are other advantages to using more specific base image tags and avoiding the latest tag.

Multi-stage and targets to match the right environment

We made the development build consistent, but how can we do this for the production artifact?

Since Docker 17.05, we can use multi-stage builds to define steps to produce our final image. Using this mechanism in our Dockerfile, we’ll be able to split the image we use for our development flow from that used to build the application and that used in production.

FROM node:13.12.0 AS development
ENV CI=true
ENV PORT=3000
WORKDIR /code
COPY package.json package-lock.json /code/
RUN npm ci
COPY src /code/src
CMD [ "npm", "start" ]

FROM development AS builder
RUN npm run build

FROM nginx:1.17.9 AS production
COPY --from=builder /code/build /usr/share/nginx/html

Each time you see FROM … AS …, it’s a build stage. So we now have a development, a build, and a production stage. We can continue to use a container for our development flow by building the specific development stage image using the --target flag.

$ docker build --target development -t repository/image_name:development .

And use it as usual

$ docker run -e CHOKIDAR_USEPOLLING=true -v ${PWD}/src/:/code/src/ repository/image_name:development

A docker build without the --target flag will build the final stage, which in this case is the production image. Our production image is simply an nginx image with the binaries built in the previous stages placed where nginx will serve them.
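For example, building and tagging the production image (the tag is an arbitrary placeholder) is just:

$ docker build -t repository/image_name:1.0.0 .

No --target flag is needed here, because production is the last stage in the Dockerfile.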

Production ready

It’s really important to keep your production image as lean and as secure as possible. Here are a few things to check before running a container in production.

No more latest image version

As we previously saw in the Build consistently from sources section, using a specific tag for build steps helps to make the image build reproducible. There are at least two other very good reasons to use more specific tags for your images:

You can easily find all the containers running with an image version in your favorite orchestrator (Swarm, Kubernetes…)

# Search in Docker engine containers using our repository/image_name:development image
$ docker inspect $(docker ps -q) | jq -c '.[] | select(.Config.Image == "repository/image_name:development") | "\(.Id) \(.State) \(.Config)"'
"89bf376620b0da039715988fba42e78d42c239446d8cfd79e4fbc9fbcc4fd897 {"Status":"running","Running":true,"Paused":false,"Restarting":false,"OOMKilled":false,"Dead":false,"Pid":25463,"ExitCode":0,"Error":"","StartedAt":"2020-04-20T09:38:31.600777983Z","FinishedAt":"0001-01-01T00:00:00Z"} {"Hostname":"89bf376620b0","Domainname":"","User":"","AttachStdin":false,"AttachStdout":true,"AttachStderr":true,"ExposedPorts":{"3000/tcp":{}},"Tty":false,"OpenStdin":false,"StdinOnce":false,"Env":["CHOKIDAR_USEPOLLING=true","PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin","NODE_VERSION=12.16.2","YARN_VERSION=1.22.4","CI=true","PORT=3000"],"Cmd":["npm","start"],"Image":"repository/image_name:development","Volumes":null,"WorkingDir":"/code","Entrypoint":["docker-entrypoint.sh"],"OnBuild":null,"Labels":{}}"

# Search in k8s pods running a container with our repository/image_name:development image (using the jq CLI)
$ kubectl get pods --all-namespaces -o json | jq -c '.items[] | select(.spec.containers[].image == "repository/image_name:development") | .metadata'
{"creationTimestamp":"2020-04-10T09:41:55Z","generateName":"image_name-78f95d4f8c-","labels":{"com.docker.default-service-type":"","com.docker.deploy-namespace":"docker","com.docker.fry":"image_name","com.docker.image-tag":"development","pod-template-hash":"78f95d4f8c"},"name":"image_name-78f95d4f8c-gmlrz","namespace":"docker","ownerReferences":[{"apiVersion":"apps/v1","blockOwnerDeletion":true,"controller":true,"kind":"ReplicaSet","name":"image_name-78f95d4f8c","uid":"5ad21a59-e691-4873-a6f0-8dc51563de8d"}],"resourceVersion":"532","selfLink":"/api/v1/namespaces/docker/pods/image_name-78f95d4f8c-gmlrz","uid":"5c70f340-05f1-418f-9a05-84d0abe7009d"}

In case of a CVE (Common Vulnerabilities and Exposures), you can quickly determine whether or not your containers and image descriptions need to be patched.

From our example we could specify that our development and production images are alpine versions.

FROM node:13.12.0-alpine AS development
ENV CI=true
ENV PORT=3000
WORKDIR /code
COPY package.json package-lock.json /code/
RUN npm ci
COPY src /code/src
CMD [ "npm", "start" ]

FROM development AS builder
RUN npm run build

FROM nginx:1.17.9-alpine
COPY --from=builder /code/build /usr/share/nginx/html

Use official images

You can use Docker Hub to search for base images to use in your Dockerfile; some of these are officially supported images. We strongly recommend using these images, as:

their content has been verified
they’re updated quickly when a CVE is fixed

You can add an image_filter request query param to only get the official images.

https://hub.docker.com/search?q=nginx&type=image&image_filter=official

All the previous examples in this post were using official images of NodeJS and NGINX.

Just enough permissions!

All applications, running in a container or not, should adhere to the principle of least privilege, which means an application should only access the resources it needs.

In case of malicious behavior or because of bugs, a process running with too many privileges may have unexpected consequences on the whole system at runtime.

Because the NodeJS official image is already well set up, we’ll switch to the backend Dockerfile.

Configuring an image to run as an unprivileged user is very easy:

FROM maven:3.6.3-jdk-11 AS builder
WORKDIR /workdir/server
COPY pom.xml /workdir/server/pom.xml
RUN mvn dependency:go-offline
# Copy the sources so the package phase has something to build
COPY src /workdir/server/src
RUN mvn package

FROM openjdk:11-jre-slim
# Create a system group and user; this base image is Debian-based,
# so we use the long-form addgroup/adduser options
RUN addgroup --system java && adduser --system --ingroup java javauser
USER javauser
EXPOSE 8080
COPY --from=builder /workdir/server/target/project-0.0.1-SNAPSHOT.jar /project-0.0.1-SNAPSHOT.jar
CMD ["java", "-Djava.security.egd=file:/dev/./urandom", "-jar", "/project-0.0.1-SNAPSHOT.jar"]

Simply by creating a new group, adding a user to it, and using the USER directive, we can run our container as a non-root user.
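As a quick sanity check (the image tag is a placeholder, and this assumes the whoami binary is present in the base image, as it is in Debian-based images), you can override the command to confirm the container isn’t running as root:

$ docker build -t repository/backend_image .
$ docker run --rm repository/backend_image whoami
javauser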

Conclusion

In this blog post we just showed some of the many ways to optimize and secure your Docker images by carefully crafting your Dockerfile. If you’d like to go further you can take a look at: 

Our official documentation about Dockerfile best practices
A previous post on the subject by Tibor Vass
A session during DockerCon 2019 by Tibor Vass and Sebastiaan van Stijn
Another session during Devoxx 2019 by Jérémie Drouet and myself
Source: https://blog.docker.com/feed/