Understanding the Docker USER Instruction

In the world of containerization, security and proper user management are crucial aspects that can significantly affect the stability and security of your applications. The USER instruction in a Dockerfile is a fundamental tool that determines which user will execute commands both during the image build process and when running the container. By default, if no USER is specified, Docker will run commands as the root user, which can pose significant security risks. 

In this blog post, we will delve into the best practices and common pitfalls associated with the USER instruction. Additionally, we will provide a hands-on demo to illustrate the importance of these practices. Understanding and correctly implementing the USER instruction is vital for maintaining secure and efficient Docker environments. Let’s explore how to manage user permissions effectively, ensuring that your Docker containers run securely and as intended.

Docker Desktop 

The commands and examples provided are intended for use with Docker Desktop, which includes Docker Engine as an integrated component. Running these commands on Docker Community Edition (standalone Docker Engine) is possible, but your output may not match that shown in this post. The blog post How to Check Your Docker Installation: Docker Desktop vs. Docker Engine explains the differences and how to determine what you are using.

UID/GID: A refresher

Before we discuss best practices, let’s review UID/GID concepts and why they are important when using Docker. This relationship factors heavily into the security aspects of these best practices.

Linux and other Unix-like operating systems use a numeric identifier called a UID (user ID) to identify each discrete user. Groups are identified by a GID (group ID), another numeric identifier. These numeric identifiers map to the text strings used for usernames and group names, but the system uses the numeric identifiers internally.

The operating system uses these identifiers to manage permissions and access to system resources, files, and directories. A file or directory has ownership settings including a UID and a GID, which determine which user and group have access rights to it. Users can be members of multiple groups, which can complicate permissions management but offers flexible access control.

In Docker, these concepts of UID and GID are preserved within containers. When a Docker container is run, it can be configured to run as a specific user with a designated UID and GID. Additionally, when mounting volumes, Docker respects the UID and GID of the files and directories on the host machine, which can affect how files are accessed or modified from within the container. This adherence to Unix-like UID/GID management helps maintain consistent security and access controls across both the host and containerized environments. 

Groups

Unlike USER, there is no GROUP instruction in the Dockerfile. To set a group, you append it (by name or GID) after the user (by name or UID), separated by a colon. For example, to run commands as the automation user in the ci group, you would write USER automation:ci in your Dockerfile.

If you do not specify a group, the container process runs with all the groups the user account is configured as part of. If you do specify one, only that single group is applied, and any supplementary group memberships are ignored.
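To see this behavior in action, here is a minimal sketch, assuming a Debian/Ubuntu base image (the user, group, and ID values are illustrative):

# Create a user whose supplementary groups include ci
FROM ubuntu:20.04
RUN groupadd -g 2000 ci && useradd -m -u 1001 -G ci automation

# No group specified: the user's configured group memberships apply
USER automation
CMD ["id"]

Running this container prints uid=1001(automation) gid=1001(automation) groups=1001(automation),2000(ci). If the last instruction were USER automation:ci instead, id would report gid=2000(ci) as the only group membership.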

Current user

Because Docker Desktop uses a virtual machine (VM), the UID/GID of your user account on the host (Linux, macOS, or Windows with Hyper-V/WSL 2) will almost certainly not have a match inside the Docker VM.

You can always check your UID/GID by using the id command. For example, on my desktop, I am UID 503 with a primary GID of 20:

$ id
uid=503(jschmidt) gid=20(staff) groups=20(staff),<--SNIP-->

Best practices

Use a non-root user to limit root access

As noted above, by default Docker containers run as UID 0, or root. This means that if the container is compromised, the attacker gains root-level access to all the resources allocated to it. Running as a non-root user instead means that even if an attacker manages to break out of the application running in the container, they are left with that user's limited permissions.

Remember, if you don’t set a USER in your Dockerfile, the user will default to root. Always explicitly set a user, even if it’s just to make it clear who the container will run as.

Specify user by UID and GID

Usernames and groupnames can easily be changed, and different Linux distributions can assign different default values to system users and groups. By using a UID/GID you can ensure that the user is consistently identified, even if the container’s /etc/passwd file changes or is different across distributions. For example:

USER 1001:1001

Create a specific user for the application

If your application requires specific permissions, consider creating a dedicated user for your application in the Dockerfile. This can be done using the RUN command to add the user. 

Note that when we are creating a user and then switching to that user within our Dockerfile, we do not need to use the UID/GID because they are being set within the context of the image via the useradd command. Similarly, you can add a user to a group (and create a group if necessary) via the RUN command.
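As a quick sketch, assuming a Debian/Ubuntu base image where groupadd and usermod are available (the group and user names are illustrative):

# Create a group and add myuser to it as a supplementary group
RUN groupadd -r monitoring && usermod -aG monitoring myuser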

Ensure that the user you set has the necessary privileges to run the commands in the container. For instance, a non-root user might not have the necessary permissions to bind to ports below 1024. For example:

RUN useradd -ms /bin/bash myuser
USER myuser

Switch back to root for privileged operations

If you need to perform privileged operations in the Dockerfile after setting a non-root user, you can switch to the root user and then switch back to the non-root user once those operations are complete. This approach adheres to the principle of least privilege; only tasks that require administrator privileges are run as an administrator. Note that it is not recommended to use sudo for privilege elevation in a Dockerfile. For example:

USER root
RUN apt-get update && apt-get install -y some-package
USER myuser

Combine USER with WORKDIR

As noted above, the UID/GID used within a container applies both inside the container and on the host system. This leads to two common problems:

Switching to a non-root user and not having permission to read or write to the directories you wish to use (for example, trying to create a directory under / or trying to write in /root).

Mounting a directory from the host system and switching to a user who does not have permission to read/write to the directory or files in the mount.

For example:

USER root
RUN mkdir /app && chown ubuntu:ubuntu /app
USER ubuntu
WORKDIR /app
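The second pitfall is easy to reproduce on a Linux host, where bind mounts expose the host's file ownership directly (Docker Desktop's file sharing may map permissions differently). An illustrative example, assuming the host directory is owned by your own UID and the container runs as a different one:

$ mkdir data
$ docker run --rm -v "$(pwd)/data:/data" --user 1234:1234 ubuntu:20.04 touch /data/test.txt
touch: cannot touch '/data/test.txt': Permission denied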

Example

The following example shows you how the UID and GID behave in different scenarios depending on how you write your Dockerfile. Both examples provide output that shows the UID/GID of the running Docker container. If you are following along, you need to have a running Docker Desktop installation and a basic familiarity with the docker command.

Standard Dockerfile

Most people take this approach when they first begin using Docker; they go with the defaults and do not specify a USER.

# Use the official Ubuntu image as the base
FROM ubuntu:20.04

# Print the UID and GID
CMD sh -c "echo 'Inside Container:' && echo 'User: $(whoami) UID: $(id -u) GID: $(id -g)'"

Dockerfile with USER

This example shows how to create a user with a RUN command inside a Dockerfile and then switch to that USER.

# Use the official Ubuntu image as the base
FROM ubuntu:20.04

# Create a custom user with UID 1234 and GID 1234
RUN groupadd -g 1234 customgroup && \
    useradd -m -u 1234 -g customgroup customuser

# Switch to the custom user
USER customuser

# Set the workdir
WORKDIR /home/customuser

# Print the UID and GID
CMD sh -c "echo 'Inside Container:' && echo 'User: $(whoami) UID: $(id -u) GID: $(id -g)'"

Build the two images with:

$ docker build -t default-user-image -f Dockerfile1 .
$ docker build -t custom-user-image -f Dockerfile2 .

Default Docker image

Let’s run our first image, the one that does not specify a USER instruction. As you can see, the UID and GID are 0/0, so the container is running as root, the superuser. There are two things at work here. First, we did not define a UID/GID in the Dockerfile, so Docker defaults to the superuser. But how does the container get superuser privileges if my account is not a superuser account? This is because the Docker Engine runs with root permissions, so containers built to run as root inherit their permissions from the Docker Engine.

$ docker run --rm default-user-image
Inside Container:
User: root UID: 0 GID: 0

Custom Docker image

Let’s try to fix this — we really don’t want Docker containers running as root. So, in this version, we explicitly set the UID and GID for the user and group. Running this container, we see that our user is set appropriately.

$ docker run --rm custom-user-image
Inside Container:
User: customuser UID: 1234 GID: 1234

Enforcing best practices

Enforcing best practices in any environment can be challenging, and the best practices outlined in this post are no exception. Docker understands that organizations are continually balancing security and compliance against innovation and agility, and we are always working on ways to help with that effort. Our Enhanced Container Isolation (ECI) offering, part of our Hardened Docker Desktop, was designed to address the problematic aspects of having containers running as root.

Enhanced Container Isolation mechanisms, such as user namespaces, help segregate and manage privileges more effectively. User namespaces isolate security-related identifiers and attributes, such as user IDs and group IDs, so that a root user inside a container does not map to the root user outside the container. This feature significantly reduces the risk of privilege escalation by ensuring that even if an attacker compromises the container, the potential damage and access scope remain confined to the containerized environment, dramatically enhancing overall security.

Additionally, Docker Scout can be leveraged on the user desktop to enforce policies not only around CVEs, but around best practices — for example, by ensuring that images run as a non-root user and contain mandatory LABELs.

Staying secure

Through this demonstration, we’ve seen the practical implications and benefits of configuring Docker containers to run as a non-root user, which is crucial for enhancing security by minimizing potential attack surfaces. As demonstrated, Docker inherently runs containers with root privileges unless specified otherwise. This default behavior can lead to significant security risks, particularly if a container becomes compromised, granting attackers potentially wide-ranging access to the host or Docker Engine.

Use custom user and group IDs

The use of custom user and group IDs showcases a more secure practice. By explicitly setting UID and GID, we limit the permissions and capabilities of the process running inside the Docker container, reducing the risks associated with privileged user access. The UID/GID defined inside the Docker container does not need to correspond to any actual user on the host system, which provides additional isolation and security.

User namespaces

Although this post extensively covers the USER instruction in Docker, another approach to secure Docker environments involves the use of namespaces, particularly user namespaces. User namespaces isolate security-related identifiers and attributes, such as user IDs and group IDs, between the host and the containers. 

With user namespaces enabled, Docker can map the user and group IDs inside a container to non-privileged IDs on the host system. This mapping ensures that even if a container’s processes break out and gain root privileges within the Docker container, they do not have root privileges on the host machine. This additional layer of security helps to prevent the escalation of privileges and mitigate potential damage, making it an essential consideration for those looking to bolster their Docker security framework further. Docker’s ECI offering leverages user namespaces as part of its security framework.
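Docker Desktop's ECI manages this isolation for you, but on a standalone Linux Docker Engine you can enable user namespace remapping yourself in /etc/docker/daemon.json. A minimal sketch:

{
  "userns-remap": "default"
}

With "default", the daemon creates a dockremap user and allocates subordinate UID/GID ranges for it, so UID 0 inside a container maps to an unprivileged UID on the host. Restart the Docker daemon for the change to take effect.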

Conclusion

When deploying containers, especially in development environments or on Docker Desktop, consider the aspects of container configuration and isolation outlined in this post. Implementing the enhanced security features available in Docker Business, such as Hardened Docker Desktop with Enhanced Container Isolation, can further mitigate risks and ensure a secure, robust operational environment for your applications.

Learn more

Read the Dockerfile reference guide.

Get the latest release of Docker Desktop.

Explore Docker Guides.

New to Docker? Get started.

Subscribe to the Docker Newsletter.

Source: https://blog.docker.com/feed/

How to Measure DevSecOps Success: Key Metrics Explained

DevSecOps involves the integration of security throughout the entire software development and delivery lifecycle, representing a cultural shift where security is a collective responsibility for everyone building software. By embedding security at every stage, organizations can identify and resolve security issues earlier in the development process rather than during or after deployment.

Organizations adopting DevSecOps often ask, “Are we making progress?” To answer this, it’s crucial to implement metrics that provide clear insights into how an organization’s security posture evolves over time. Such metrics allow teams to track progress, pinpoint areas for improvement, and make informed decisions to drive continuous improvement in their security practices. By measuring the changing patterns in key indicators, organizations can better understand the impact of DevSecOps and make data-driven adjustments to strengthen their security efforts. 

Organizations commonly have many DevSecOps metrics that they can draw from. In this blog post, we explore two foundational metrics for assessing DevSecOps success. 

Key DevSecOps metrics

1. Number of security vulnerabilities over time

Vulnerability analysis is a foundational practice for any organization embarking on a software security journey. This metric tracks the volume of security vulnerabilities identified in a system or software project over time. It helps organizations spot trends in vulnerability detection and remediation, signaling how promptly security gaps are being remediated or mitigated. It can also be an indicator of the effectiveness of an org’s vulnerability management initiatives and their adoption, both of which are crucial to reducing the risk of cyberattacks and data breaches.

2. Compliance with security policies

Many industries are subject to cybersecurity frameworks and regulations that require organizations to maintain specific security standards. Policies provide a way for organizations to codify the rules for producing and using software artifacts. By tracking policy compliance over time, organizations can verify consistent adherence to established security requirements and best practices, promoting a unified approach to software development.

The above metrics are a good starting point for most organizations looking to measure the impact of their DevSecOps activities. The next step — once these metrics are implemented — is to invest in an observability system that enables relevant stakeholders, such as security engineering, to easily consume the data.

DevSecOps insights with Docker Scout

Organizations interested in evaluating their container images against these metrics can get started in a few simple steps with Docker Scout. The Docker Scout web interface provides a comprehensive dashboard for CISOs, security teams, and software developers, offering an overview of vulnerability trends and policy compliance status (Figure 1). The web interface is a one-stop shop where users can drill down into specific images for deeper investigations and customize out-of-the-box policies to meet their specific needs.

Figure 1: Docker Scout dashboard.

Furthermore, the Docker Scout metrics exporter is a powerful addition to the Docker Scout ecosystem to bring vulnerability and policy compliance metrics into existing monitoring systems. This HTTP endpoint enables users to configure Prometheus-compatible tools to scrape Docker Scout data, allowing organizations to integrate with popular observability tools like Grafana and Datadog to achieve centralized security observability. 
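As an illustration, a Prometheus scrape job for the exporter might look like the following sketch. The metrics path, organization placeholder, and token handling here are assumptions; consult the Docker Scout documentation for the exact endpoint and authentication details:

scrape_configs:
  - job_name: "docker-scout"
    scheme: https
    metrics_path: /v1/exporter/org/<your-org>/metrics
    bearer_token_file: /etc/prometheus/scout-token
    static_configs:
      - targets: ["api.scout.docker.com"]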

Figures 2 and 3 show two sample Grafana dashboards illustrating the vulnerability trends and policy compliance insights that Docker Scout can provide.

Figure 2: Grafana Dashboard — Policy compliance.

Figure 2 displays a dashboard that illustrates the compliance posture for each policy configured within a Docker Scout organization. This visualization shows the proportion of images in a stream that comply with the defined policies. At the top of the dashboard, you can see the current compliance rate for each policy, while the bottom section shows compliance trends over the past 30 days.

Figure 3 shows a second Grafana dashboard illustrating the number of vulnerabilities by severity over time within a given stream. In this example, you can see notable spikes across all vulnerabilities, indicating the need for deeper investigation and prioritizing remediation.

Figure 3: Grafana Dashboard — Vulnerabilities by severity trends.

Conclusion

Docker Scout metrics exporter is designed to help security engineers improve containerized application security posture in an operationally efficient way. To get started, follow the instructions in the documentation. The instructions will get you up and running with the current public release of metrics exporter. 

Our product team is always open to feedback on social channels such as X and Slack and is looking for ways to evolve the product to align with our customers’ use cases.

Learn more

Visit the Docker Scout product page.

Looking to get up and running? Use our Docker Scout quickstart guide.

Have questions? The Docker community is here to help.

New to Docker? Get started.

Subscribe to the Docker Newsletter.

Source: https://blog.docker.com/feed/

New Beta Feature: Deep Dive into GitHub Actions Docker Builds with Docker Desktop

We’re excited to announce the beta release of a new feature for inspecting GitHub Actions builds directly in Docker Desktop 4.31. 

Centralized CI/CD environments, such as GitHub Actions, are popular and useful, giving teams a single location to build, test, and verify their deployments. However, remote builds, such as those in GitHub Actions, often give developers little visibility into what is happening inside their Docker builds. As a result, developers often need additional builds and steps to locate the root cause of issues, making diagnosing and resolving build problems challenging.

To help, we’re introducing enhancements in GitHub Actions Summary and Docker Desktop to provide a deeper understanding of your Docker builds in GitHub Actions.

Get a high-level view with Docker Build Summary in GitHub Actions 

We now provide Docker Build Summary, a GitHub Actions Summary that displays reports and aggregates build information. The Docker Build Summary offers additional details about your builds in GitHub Actions, including a high-level summary of performance metrics, such as build duration, and cache utilization (Figure 1). Users of docker/build-push-action and docker/bake-action will automatically receive Docker Build Summaries. 

Key benefits

Identify build failures: Immediate access to error details eliminates the need to sift through logs.

Performance metrics: See exactly how long the Docker Build stage took and assess if it met expectations.

Cache utilization: View the percentage of the build that used the cache to identify performance impacts.

Configuration details: Access information on build inputs to understand what ran during build time.

Figure 1: Animated view of Docker Build Summary in GitHub Actions, showing Build details, including Build status, error message, metrics, Build inputs, and more.

If further investigation is needed, we package your build results in a .dockerbuild archive file. This file can be imported to the Build View in Docker Desktop, providing comprehensive build details, including timings, dependencies, logs, and traces.

Import and inspect GitHub Actions Builds in Docker Desktop

Initially announced last year, the Build View in Docker Desktop now supports importing the .dockerbuild archive from GitHub Actions, providing greater insight into your Docker builds.

In Docker Desktop, navigate to the Builds View tab and use the new Import Builds button. Select the .dockerbuild file you downloaded to access all the details about your remote build as if you ran it locally (Figure 2). 

Figure 2: Animated view of Docker Desktop, showing steps to navigate to the Builds View tab and use the new Import Builds button.

You can view in-depth information about your build execution, including error lines in your Dockerfile, build timings, cache utilization, and OpenTelemetry traces. This comprehensive view helps diagnose complex builds efficiently.

For example, you can see the stack trace right next to the Dockerfile command that is causing the issues, which is useful for understanding the exact step and attributes that caused the error (Figure 3).

Figure 3: Inspecting a build error in Builds View.

You can even see the commit and source information for the build and easily locate who made the change for more help in resolving the issue, along with other useful info you need for diagnosing even the most complicated builds (Figure 4). 

Figure 4: Animated view of Docker Desktop showing info for inspecting an imported build, such as source details, build timing, dependencies, configuration, and more.

Enhance team collaboration

We aim to enhance team collaboration, allowing you to share and work together on Docker builds and optimizing the build experience for your team. These .dockerbuild archives are self-contained and don’t expire, making them perfect for team collaboration. Share the .dockerbuild file via Slack or email or attach it to GitHub issues or Jira tickets to preserve context for when your team investigates.

Get started

To start using Docker Build Summary and the .dockerbuild archive in Docker Desktop, update your Docker Build GitHub Actions configuration to:

uses: docker/build-push-action@v6

uses: docker/bake-action@v5
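In context, a minimal workflow using build-push-action might look like this sketch (the workflow and job names are placeholders):

name: ci
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Build image
        uses: docker/build-push-action@v6
        with:
          context: .
          push: false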

Then, update to Docker Desktop 4.31 to inspect build archives from GitHub Actions. Learn more in the documentation.

We are incredibly excited about these new features, which will help you and your team diagnose and resolve build issues quickly. Please try them out and let us know what you think!

Learn more

Subscribe to the Docker Newsletter.

Get the latest release of Docker Desktop.

Vote on what’s next! Check out our public roadmap.

Have questions? The Docker community is here to help.

New to Docker? Get started.

Source: https://blog.docker.com/feed/

LXC vs. Docker: Which One Should You Use?

In today’s evolving software development landscape, containerization technology has emerged as a key tool for developers aiming to enhance efficiency and ensure consistency across environments. Among the various container technologies available today, Linux Containers (LXC) and Docker are two of the most popular choices. Understanding the differences between these technologies is crucial for developers to select the right tool that aligns with their specific project needs. 

This blog post delves into the LXC vs. Docker virtual environments, exploring their functionalities and helping you make an informed decision.

What is LXC?

Linux Containers, or LXC, is an advanced virtualization technology that utilizes key features of the Linux kernel to create lightweight and efficient isolated environments for running multiple applications on a single host system. This technology uses Linux kernel features, such as cgroups (control groups) and namespaces, to manage system resources and provide process isolation.
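For readers who know only Docker, here is a brief sketch of the classic LXC command-line workflow on a typical Linux host (the download template's options vary by distribution):

$ lxc-create --name demo --template download -- --dist ubuntu --release focal --arch amd64
$ lxc-start --name demo
$ lxc-attach --name demo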

LXC began as an open source project to provide a virtualization method that operates at the operating system level, using the Linux kernel’s inherent capabilities. The project emerged in the late 2000s — with significant contributions from IBM, among others — and quickly became part of the mainstream Linux kernel. This integration allowed LXC to benefit from native support and optimizations, facilitating its adoption and ongoing development.

LXC has played a pivotal role in the evolution of container technologies. It laid the groundwork for future innovations in containerization, including the development of Docker, which initially relied on LXC as its default execution environment before transitioning to its own container runtime, libcontainer (now part of runc).

Key features of LXC

Resource management with cgroups: LXC manages resource allocation using cgroups, ensuring that each container has access to the resources it needs without impacting others, promoting efficient and stable performance.

Isolation with namespaces: Namespaces ensure that containers are kept isolated from each other, preventing processes in one container from interfering with those in another. This feature enhances security and system reliability.

Benefits of LXC

Lightweight nature: Unlike traditional virtual machines that require separate operating system (OS) instances, LXC containers share the host system’s kernel, making them more resource-efficient and faster to start.

Proximity to the operating system: Thanks to its integration with the Linux kernel, LXC provides functionality similar to that of virtual machines but with a fraction of the resource demand.

Efficient use of system resources: LXC maximizes resource utilization and scalability by enabling multiple containers to run on a single host without the overhead of multiple OS instances.

LXC is especially beneficial for users who need granular control over their environments and applications that require near-native performance. As an open source project, LXC continues to evolve, shaped by a community of developers committed to enhancing its capabilities and integration with the Linux kernel. LXC remains a powerful tool for developers looking for efficient, scalable, and secure containerization solutions.

What are Docker containers?

Docker offers a comprehensive platform and suite of tools that has revolutionized how applications are developed, shipped, and run. It is built upon the concept of containerization, simplifying it to such an extent that it has become synonymous with containers.

Docker, which launched in 2013, initially utilized LXC to provide an easier way to create, deploy, and run applications using containers. Docker’s introduction marked a significant shift in virtualization technology, offering a lighter, faster, and more agile way of handling applications than traditional virtual machines. Docker quickly evolved from using LXC as its default execution environment by developing its own container runtime, libcontainer, which now powers Docker containers.

This move enabled Docker to provide a standardized unit of software deployment, encapsulating applications and their dependencies in containers that could run anywhere, from a developer’s local laptop to a production server in the cloud.

Docker’s ecosystem

Docker Desktop: Known for its user-friendly interface, Docker Desktop simplifies tasks in building, running, and managing containers.

Docker Engine: The core runtime component of Docker, shipped in Docker Desktop, provides a lightweight and secure environment for running containerized applications.

Docker Scout: Docker Scout delivers real-time actionable insights, making it simple to secure and manage the software supply chain end to end.

Docker Hub: The world's largest and most widely used image repository, Docker Hub serves as the go-to container registry for developers to share and manage containerized applications securely.

Docker Build Cloud: Docker Build Cloud is a premium service that enhances the image-building process in enterprise environments.

These tools collectively form a solution stack that addresses the entire lifecycle of container management, from development to deployment. 

How Docker enhances LXC

Although Docker started with LXC, it added significant value by layering tools and services that enhance user experience and management capabilities. Docker Desktop abstracts much of the complexity of managing containers through user-friendly interfaces and commands, making containerization accessible to a broader range of developers.

Docker containers are lightweight, portable, and self-sufficient units that contain everything needed to run an application. They ensure consistency across multiple development and deployment environments.

Key benefits of Docker containers

Portability: Containers can be moved effortlessly between environments, from development to testing to production, without needing changes, thanks to Docker’s ability to ensure consistency across platforms.

Ease of use: Docker simplifies container management with intuitive commands like docker run, significantly lowering the learning curve for new users.

Vast ecosystem: Docker’s extensive library of container images available on Docker Hub and a wide array of management tools support rapid application development and deployment.

The evolution of Docker from a product that simplified the use of LXC to a comprehensive ecosystem that defines modern containerization practices showcases its transformative impact on the technology landscape. Docker made containers mainstream and established a global community of developers and organizations that continue to innovate on its platform.

Understanding use cases for LXC and Docker

Understanding their strengths and typical use cases is crucial when deciding between LXC and Docker. Both technologies serve the purpose of containerization but cater to different operational needs and user profiles.

LXC use cases

Efficient access to hardware resources: LXC’s close interaction with the host OS allows it to achieve near-native performance, which is beneficial for applications that require intensive computational power or direct hardware access. This can include data-heavy applications in fields like data analysis or video processing where performance is critical.

Virtual Desktop Infrastructure (VDI): LXC is well-suited for VDI setups because it can run full operating systems with a smaller footprint than traditional VMs. This makes LXC ideal for businesses deploying and managing virtual desktops efficiently.

LXC is not typically used for application development but for scenarios requiring full OS functionality or direct hardware integration. Its ability to provide isolated and secure environments with minimal overhead makes it suitable for infrastructure virtualization where traditional VMs might be too resource-intensive.

Docker use cases

Docker excels in environments where deployment speed and configuration simplicity are paramount, making it an ideal choice for modern software development. Key use cases where Docker demonstrates its strengths include:

Streamlined deployment: Docker packages applications into containers along with all their dependencies, ensuring consistent operation across any environment, from development through to production. This eliminates common deployment issues and enhances reliability.

Microservices architecture: Docker supports the development, deployment, and scaling of microservices independently, enhancing application agility and system resilience. Its integration with Kubernetes further streamlines the orchestration of complex containerized applications, managing their deployment and scaling efficiently.

CI/CD pipelines: Docker containers facilitate continuous integration and deployment, allowing developers to automate the testing and deployment processes. This approach reduces manual intervention and accelerates release cycles.

Extensive image repository and configuration management: Docker Hub offers a vast repository of pre-configured Docker images, simplifying application setup. Docker’s configuration management capabilities ensure consistent container setups, easing maintenance and updates.

Docker’s utility in supporting rapid development cycles and complex architectures makes it a valuable tool for developers aiming to improve efficiency and operational consistency in their projects.

Docker vs. LXC: Detailed comparison chart

Feature/Aspect | Docker | LXC
Core functionality | Application containerization; runs apps in isolated containers. | OS-level virtualization; runs multiple Linux systems on a host from a single OS.
User interface | High-level commands and graphical interface options for simpler management. | Lower-level, command-line focused with finer granular control over settings.
Ease of use | User-friendly for developers with minimal Linux/container knowledge. | Requires more in-depth knowledge of Linux systems and configurations.
Setup complexity | Simplified setup with pre-built packages and extensive documentation. | More complex setup requiring detailed OS configuration knowledge.
Performance | Lightweight, with minimal overhead; suitable for microservices. | Close to native performance, suitable for intensive computational tasks.
Security | Strong isolation with Docker Engine, support for namespaces, and cgroups. | Uses Linux kernel security features, including AppArmor and SELinux profiles.
Scalability | Highly scalable, ideal for applications needing quick scaling. | Less scalable compared to Docker; best used for more static, controlled environments.
Application use cases | Ideal for CI/CD pipelines, microservices, and any container-based applications. | Best for running full Linux distributions, VDI, or applications needing direct hardware access.
Resource efficiency | Highly efficient in resource usage due to shared OS components. | More resource-intensive than Docker but less so than traditional VMs.
Community and ecosystem | Large community with a vast ecosystem of tools and resources. | Smaller community focused mainly on system administrators and advanced users.
Typical deployment | Common in development environments, cloud platforms, and serverless computing. | Used in environments requiring stable, long-term deployments without frequent changes.

Although Docker and LXC are both powerful options for building containers, they serve different purposes and are suitable for different skill levels. Docker is designed for developers who want to quickly and efficiently build and deploy applications in various environments with minimal setup. On the other hand, LXC is more suitable for users who need a lightweight alternative to virtual machines and want more control over the operating system and hardware.

Conclusion

Choosing between Linux Containers vs. Docker depends on your project’s specific needs and operational environment. 

LXC is ideal for scenarios requiring full operating system functionality or extensive hardware interaction, making it suitable for projects needing deep system control or stable, long-term deployments. 

Docker is optimized for developers seeking to enhance application development and deployment efficiency, particularly in dynamic environments that demand rapid scaling and frequent updates. 

Each platform offers unique benefits tailored to different technical requirements and use cases, ensuring the selection aligns with your project goals and infrastructure demands.

Try Docker containers using Docker Desktop today.

Learn more

Get started by exploring Docker containers.

Download the latest version of Docker Desktop. 

Learn more about Linux containers. 

Visit Docker Resources to explore more materials.

Subscribe to the Docker newsletter. 

Source: https://blog.docker.com/feed/

Docker Launches 2024 State of Application Development Report

Docker launched its 2024 State of Application Development Report, providing a deep-focus snapshot of today’s rapidly evolving world of software development. Based on a wide-ranging survey of more than 1,300 respondents, the report shares a broad array of findings about respondents’ work, including what tools they use, their processes and frustrations, opinions about industry trends, participation in developer communities, Docker usage, and more.

What emerges is an illuminating picture of the current state of application development, along with insights into key trends such as the expanding roles of cloud and artificial intelligence/machine learning (AI/ML) in software development, the continued rise of microservices, and attitudes toward the shift-left approach to security.

Reflecting the changing state of the industry, the 2024 report drills down into three main areas: 

The state of application development today 

AI’s expanding role in application development

Security in application development

The online survey is a key vector through which Docker product managers, engineers, and designers gather insights from users to continuously develop and improve the company’s suite of tools.

“The findings in this report demonstrate how Docker continuously seeks to address market needs so that we can better empower development teams not just to compete, but to thrive and innovate with the right processes and tools for their workflows,” said Nahid Samsami, Vice President of Developer Experience at Docker.

Read on for key findings from this year’s report and download the full report for more details.

The rise of cloud for software development

A key insight was the growing popularity of developing software in the cloud. When asked about their main development environment, almost 64% of respondents cited their laptop or desktop. But the real story is that more than 36% cited non-local environments, such as ephemeral environments, personal remote dev environments or clusters, and remote development tools such as GitHub Codespaces, Gitpod, and Coder.

These findings appear to underscore the growing popularity of developing software in the cloud — a trend fanned by benefits such as increased efficiency, shorter build times, reduced time to market, and faster innovation. 

Why this apparent increase in reliance on cloud during development? It seems likely that the growing size of applications is a factor, along with the increasing number of dependencies and overall growth in complexity — all of which would render an all-local environment difficult, if not impossible, to maintain in parity with production.

AI/ML goes mainstream in app development

Another key finding was the growing penetration of AI/ML into the software development field. Most respondents (64%) reported already using AI for work — for tasks such as code writing, documentation, and research.

This trend is notably driven by junior/mid-level developers and DevOps/platform engineers, who expressed a higher dependency on AI compared with senior developers surveyed. In terms of AI tools, respondents most often used ChatGPT (46%), GitHub Copilot (30%), and Gemini (formerly Bard) (19%).

This year’s survey showed a growing interest in ML engineering and data science within the Docker community. When asked if they were working on ML in any capacity, almost half of respondents (46%) replied in the affirmative. Within that group, 54% said that they trained and deployed ML models in one or more projects, 43% worked on ML infrastructure, and 39% leveraged pre-trained ML models.

Where developers get stuck

A primary goal of the survey was to gain insights into how Docker can improve the app development experience. When we asked where their team gets stuck in the development process, respondents cited multiple stages, including planning (31%), estimation (24%), and designing (22%). Planning was also one of the most-selected areas in which respondents desired better tools (28% of respondents). 

These findings demonstrate that respondents hit sticking points in project-level tasks before development. However, there are areas identified for improvement within the development process itself, as 20% of respondents reported getting stuck during debugging/troubleshooting or testing phases. Testing was also one of the top areas in which respondents wanted better tools (28%).

Microservices, security, and open source

Other notable trends include the continued rise of microservices, frustration with the shift-left approach to security, and interest in open source. Underscoring the growing popularity of microservices, nearly three times more respondents (29%) said they were transitioning from monolithic to microservices than were moving in the other direction, from microservices to monolithic (11%).

The shift-left approach to security appears to be a source of frustration for developers and an area where more effective tools could make a difference. Security-related tasks topped the list of those deemed difficult/very difficult, with 34% of respondents selecting one of these options. Regarding the need for better tools in the development process, 25% of respondents selected security/vulnerability remediation tools (behind testing, planning, and monitoring/logging/maintenance).

Open source software is important to developer ecosystems and communities, with 59% of respondents saying they contributed to open source in the past year, compared with 41% saying they did not. Of the 41% who did not contribute, a large majority (72%) expressed interest in contributing to open source, while less than 25% did not. 

Get the full report

The 2024 Docker State of Application Development Report is based on an online, 20-minute survey conducted by Docker’s User Research Team in the fall of 2023. Survey respondents ranged from home hobbyists to professionals at companies with more than 5,000 employees. The findings are based on 885 completed responses from the roughly 1,300 respondents surveyed. 

The survey was developed to inform Docker’s product strategy. Given the fascinating information we discovered, we wanted to share the findings with the community. This was the second annual Docker State of Application Development survey; the third will take place in the fall of 2024. 

Download the full report now. 

Learn more

Read the AI Trends Report 2024: AI’s Growing Role in Software Development.

Subscribe to the Docker Newsletter.

Get the latest release of Docker Desktop.

Vote on what’s next! Check out our public roadmap.

Have questions? The Docker community is here to help.

New to Docker? Get started.

Docker’s User Research Team — Olga Diachkova, Julia Wilson, and Rebecca Floyd — conducted this survey, analyzed the results, and provided insights.

For a complete methodology, contact uxresearch@docker.com.
Source: https://blog.docker.com/feed/

Docker Desktop 4.31: Air-Gapped Containers, Accelerated Builds, and Beta Releases of Docker Desktop on Windows on Arm, Compose File Viewer, and GitHub Actions

In this post:

Air-gapped containers: Ensuring security and compliance

Accelerating Builds in Docker Desktop with Docker Build Cloud

Docker Desktop on Windows on Arm (Beta)

Compose File Viewer (Beta)

Enhanced CI visibility with GitHub Actions in Docker Desktop (Beta)

Docker Desktop’s latest release continues to empower development teams of every size, providing a secure hybrid development launchpad that supports productively building, sharing, and running innovative applications anywhere. 

Highlights from the Docker Desktop 4.31 release include: 

Air-gapped containers help secure developer environments and apps to ensure peace of mind. 

Accelerating Builds in Docker Desktop with Docker Build Cloud helps developers build rapidly to increase productivity and ROI.

Docker Desktop on Windows on Arm (WoA) Beta continues our commitment to supporting the Microsoft Developer ecosystem by leveraging the newest and most advanced development environments.

Compose File Viewer (Beta) lets you see your Compose configuration with contextual docs.

Enhanced CI visibility with GitHub Actions in Docker Desktop (Beta) streamlines access to detailed GitHub Actions build summaries, including performance metrics and error reports, directly within the Docker Desktop UI.

Air-gapped containers: Ensuring security and compliance

For our business users, we introduce support for air-gapped containers. This feature allows admins to configure Docker Desktop to restrict containers from accessing the external network (internet) while enabling access to the internal network (private network). Docker Desktop can apply a custom set of proxy rules to network traffic from containers. The proxy can be configured to allow network connections, reject network connections, and tunnel through an HTTP or SOCKS proxy (Figure 1). This enhances security by allowing admins to choose which outgoing TCP ports the policy applies to and whether to forward a single HTTP or SOCKS proxy, or to implement policy per destination via a PAC file.

Figure 1: Assuming enforced sign-in and Settings Management are enabled, add the new proxy configuration to the admin-settings.json file.

This functionality enables you to scale securely and is especially crucial for organizations with strict security requirements. Learn more about air-gapped containers on our Docker Docs.  
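As a rough sketch, such a policy is distributed through the admin-settings.json file mentioned in Figure 1. The key names below follow the published format at the time of writing but may change between releases, so treat this as illustrative and verify against the Docker Docs:

{
  "configurationFileVersion": 2,
  "containersProxy": {
    "locked": true,
    "mode": "manual",
    "http": "",
    "https": "",
    "exclude": [],
    "pac": "http://internal.example.com/proxy.pac",
    "transparentPorts": "*"
  }
}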

Accelerating Builds in Docker Desktop with Docker Build Cloud 

Did you know that in your Core Docker Subscription (Personal, Pro, Teams, Business) you have an included allocation of Docker Build Cloud minutes? Yes! This allocation of cloud compute time and shared cache lets you speed up your build times when you’re working with multi-container apps or large repos. 

For organizations, your build minutes are shared across your team, so anyone allocated Docker Build Cloud minutes with their Docker Desktop Teams or Business subscription can leverage available minutes and purchase additional minutes if necessary. Docker Build Cloud works for both developers building locally and in CI/CD. With Docker Desktop, you can use these minutes to accelerate your time to push and gain access to the Docker Build Cloud dashboard (build.docker.com), where you can view build history, manage users, and view your usage stats.

And now, from build.docker.com, you can quickly and easily create your team’s cloud builder using a one-click setup that connects your cloud builder to Docker Desktop. At the same time, you can choose to configure the Build Cloud builder as the default builder in Docker Desktop in about 30 seconds — check the Set the default builder radio button during the Connect via Docker Desktop setup (Figure 2).

Figure 2: Setting the default builder in Docker Desktop.

Docker Desktop on Windows on Arm

At Microsoft Build, we were thrilled to announce that Docker Desktop is available on Windows on Arm (WoA) as a beta release. This version will be available behind authentication and is aimed at users with Arm-based Windows devices. This feature ensures that developers using these devices can take full advantage of Docker’s capabilities. 

To learn more about leveraging WoA to accelerate your development practices, watch the Microsoft Build Session Introducing the Next Generation of Windows on Arm with Ivette Carreras and Jamshed Damkewala. You can also learn about the other better-together opportunities between Microsoft and Docker by visiting our Microsoft Build Docker Page and reading our event highlights blog post. 

Compose File Viewer (Beta)

With Compose File Viewer (Beta), developers can now see their Docker Compose configuration file in Docker Desktop, with relevant docs linked. This makes it easier to understand your Compose YAML at a glance, with proper syntax highlighting. 

Check out this new File Viewer through the View Configuration option in the Compose command line or by viewing a Compose stack in the Containers tab, then clicking the View Configuration button.

Introducing enhanced CI visibility with GitHub Actions in Docker Desktop

We’re happy to announce the beta release of our new feature for inspecting GitHub Actions builds directly in Docker Desktop. This enhancement provides in-depth summaries of Docker builds, including performance metrics, cache utilization, and detailed error reports. You can download build results as a .dockerbuild archive and inspect them locally using Docker Desktop 4.31. Now you can access all the details about your CI build as if you had reproduced them locally. 

Figure 3: Docker Desktop 4.31 Builds tab supporting one-click importing of builds from GitHub Actions.

Not familiar with the Builds View in Docker Desktop? It’s a feature we introduced last year to give you greater insight into your local Docker builds. Now, with the import functionality, you can explore the details of your remote builds from GitHub Actions just as thoroughly in a fraction of the time. This new capability aims to improve CI/CD efficiency and collaboration by offering greater visibility into your builds. Update to Docker Desktop 4.31 and configure your GitHub Actions with docker/build-push-action@v6 or docker/bake-action@v5 to get started.

Conclusion 

With this latest release, we’re doubling down on our mission to support Docker Desktop users with the ability to accelerate innovation, enable security at scale, and enhance productivity. 

Stay tuned for additional details and upcoming releases. Thank you for being part of our community as we continuously strive to empower development teams. 

Learn more

Authenticate and update to receive your subscription level’s newest Docker Desktop features.

New to Docker? Create an account. 

Visit our Microsoft Build Docker Page to learn about our partnership in supporting Microsoft developers.

Learn how Docker Build Cloud in Docker Desktop can accelerate builds.

Secure Your Supply Chain with Docker Scout in Docker Desktop.

Learn more about air-gapped containers.

Subscribe to the Docker Newsletter.

Source: https://blog.docker.com/feed/

10 Years Since Kubernetes Launched at DockerCon

It is not often you can reflect back and pinpoint a moment where an entire industry changed, less often to pinpoint that moment and know you were there to see it first hand.

On June 10th, 2014, on day 2 of the first-ever DockerCon, 16 minutes and 4 seconds into his keynote speech, Google VP of Infrastructure Eric Brewer announced that Google was releasing the open source solution it had built for orchestrating containers: Kubernetes. This was one of those moments. The announcement of Kubernetes began a tectonic shift in how the internet runs at scale; many of the most important applications in the world today would not be possible without Docker and Kubernetes.

You can watch the announcement on YouTube.

We didn’t know how much Kubernetes would change things at that time. In fact, in those two days, Apache Mesos, Red Hat’s GearD, Docker Libswarm, and Facebook’s Tupperware were all also launched. This triggered what later became known by some as “the Container Orchestration War.” Fast forward three years, and the community had consolidated on Kubernetes for the orchestration layer and Docker (powered by containerd) for the container format, distribution protocol, and runtime. In 2017, Docker integrated Kubernetes into its desktop and server products, which helped cement Kubernetes’ leadership.

Why was it so impactful? Kubernetes landed at just the right time and solved just the right problems. The number of containers and server nodes in production was increasing exponentially every day, and the rise of DevOps placed a heavy burden on engineers, who needed solutions that could help manage applications at unprecedented scale. Containers and their orchestration engines were, and continue to be, the lifeblood of modern application deployments because they are the only practical way to meet this need.

We, the Docker team and community, consider ourselves incredibly fortunate to have played a role in this history. To look back and say we had a part in what has been built from that one moment is humbling.

… and the potential of what is yet to come is beyond exciting! Especially knowing that our impact continues today as a keystone to modern application development. Docker enables app development teams to rapidly deliver applications, secure their software supply chains, and do so without compromising the visibility and controls required by the business.

Happy 10th birthday Kubernetes! Congratulations to all who were and continue to be involved in creating this tremendous gift to the software industry.

Learn more

Join the conversation on LinkedIn.

Build Kubernetes-ready applications on your desktop with Docker.

Get the latest Docker news.

Source: https://blog.docker.com/feed/

Develop Kubernetes Operators in Java without Breaking a Sweat

Developing Kubernetes operators in Java is not yet the norm. So far, Go has been the language of choice here, not least because of its excellent support for writing corresponding tests. 

One challenge in developing Java-based projects has been the lack of easy automated integration testing that interacts with a Kubernetes API server. However, thanks to the open source library Kindcontainer, based on the widely used Testcontainers integration test library, this gap can be bridged, enabling easier development of Java-based Kubernetes projects. 

In this article, we’ll show how to use Testcontainers to test custom Kubernetes controllers and operators implemented in Java.

Kubernetes in Docker

Testcontainers allows starting arbitrary infrastructure components and processes running in Docker containers from tests running within a Java virtual machine (JVM). The framework takes care of binding the lifecycle and cleanup of Docker containers to the test execution. Even if the JVM is terminated abruptly during debugging, for example, it ensures that the started Docker containers are also stopped and removed. In addition to a generic class for any Docker image, Testcontainers offers specialized implementations in the form of subclasses — for components with sophisticated configuration options, for example. 

These specialized implementations can also be provided by third-party libraries. The open source project Kindcontainer is one such third-party library that provides specialized implementations for various Kubernetes containers based on Testcontainers:

ApiServerContainer

K3sContainer

KindContainer

Although ApiServerContainer focuses on providing only a small part of the Kubernetes control plane, namely the Kubernetes API server, K3sContainer and KindContainer launch complete single-node Kubernetes clusters in Docker containers. 

This allows for a trade-off depending on the requirements of the respective tests: If only interaction with the API server is necessary for testing, then the significantly faster-starting ApiServerContainer is usually sufficient. However, if testing complex interactions with other components of the Kubernetes control plane or even other operators is in the scope, then the two “larger” implementations provide the necessary tools for that — albeit at the expense of startup time. For perspective, depending on the hardware configuration, startup times can reach a minute or more.

A first example

To illustrate how straightforward testing against a Kubernetes container can be, let’s look at an example using JUnit 5:

@Testcontainers
public class SomeApiServerTest {
    @Container
    public ApiServerContainer<?> K8S = new ApiServerContainer<>();

    @Test
    public void verify_no_node_is_present() {
        Config kubeconfig = Config.fromKubeconfig(K8S.getKubeconfig());
        try (KubernetesClient client = new KubernetesClientBuilder().withConfig(kubeconfig).build()) {
            // Verify that ApiServerContainer has no nodes
            assertTrue(client.nodes().list().getItems().isEmpty());
        }
    }
}

Thanks to the @Testcontainers JUnit 5 extension, lifecycle management of the ApiServerContainer is easily handled by marking the container that should be managed with the @Container annotation. Once the container is started, a YAML document containing the necessary details to establish a connection with the API server can be retrieved via the getKubeconfig() method. 

This YAML document represents the standard way of presenting connection information in the Kubernetes world. The fabric8 Kubernetes client used in the example can be configured using Config.fromKubeconfig(). Any other Kubernetes client library will offer similar interfaces. Kindcontainer does not impose any specific requirements in this regard.

All three container implementations rely on a common API. Therefore, if it becomes clear at a later stage of development that one of the heavier implementations is necessary for a test, you can simply switch to it without any further code changes — the already implemented test code can remain unchanged.

Customizing your Testcontainers

In many situations, after the Kubernetes container has started, a lot of preparatory work needs to be done before the actual test case can begin. For an operator, for example, the API server must first be made aware of a Custom Resource Definition (CRD), or another controller must be installed via a Helm chart. What may sound complicated at first is made simple by Kindcontainer along with intuitively usable Fluent APIs for the command-line tools kubectl and helm.

The following listing shows how a CRD is first applied from the test’s classpath using kubectl, followed by the installation of a Helm chart:

@Testcontainers
public class FluentApiTest {
    @Container
    public static final K3sContainer<?> K3S = new K3sContainer<>()
            .withKubectl(kubectl -> {
                kubectl.apply.fileFromClasspath("manifests/mycrd.yaml").run();
            })
            .withHelm3(helm -> {
                helm.repo.add.run("repo", "https://repo.example.com");
                helm.repo.update.run();
                helm.install.run("release", "repo/chart");
            });
    // Tests go here
}

Kindcontainer ensures that all commands are executed before the first test starts. If there are dependencies between the commands, they can be easily resolved; Kindcontainer guarantees that they are executed in the order they are specified.

The Fluent API is translated into calls to the respective command-line tools. These are executed in separate containers, which are automatically started with the necessary connection details and connected to the Kubernetes container via the Docker internal network. This approach avoids dependencies on the Kubernetes image and version conflicts regarding the available tooling within it.

Selecting your Kubernetes version

If nothing else is specified by the developer, Kindcontainer starts the latest supported Kubernetes version by default. However, relying on this default is generally discouraged; best practice is to explicitly specify one of the supported versions when creating the container, as shown below:

@Testcontainers
public class SpecificVersionTest {
    @Container
    KindContainer<?> container = new KindContainer<>(KindContainerVersion.VERSION_1_24_1);
    // Tests go here
}

Each of the three container implementations has its own Enum, through which one of the supported Kubernetes versions can be selected. The test suite of the Kindcontainer project itself ensures, with the help of a matrix-based integration test setup, that the full feature set can be used with each of these versions. This thorough testing is necessary because the Kubernetes ecosystem evolves rapidly, and different initialization steps must be performed depending on the Kubernetes version.

Generally, the project places great emphasis on supporting all currently maintained Kubernetes minor versions, which are released every four months. Older Kubernetes versions are marked as @Deprecated and eventually removed when supporting them in Kindcontainer becomes too burdensome. However, this should only happen once using the respective Kubernetes version is no longer recommended.

Bring your own Docker registry

Accessing Docker images from public sources is often not straightforward, especially in corporate environments that rely on an internal Docker registry with manual or automated auditing. Kindcontainer allows developers to specify their own coordinates for the Docker images used for this purpose. However, because Kindcontainer still needs to know which Kubernetes version is being used due to potentially different initialization steps, these custom coordinates are appended to the respective Enum value:

@Testcontainers
public class CustomKubernetesImageTest {
    @Container
    KindContainer<?> container = new KindContainer<>(
            KindContainerVersion.VERSION_1_24_1.withImage("my-registry/kind:1.24.1"));
    // Tests go here
}

In addition to the Kubernetes images themselves, Kindcontainer also uses several other Docker images. As already explained, command-line tools such as kubectl and helm are executed in their own containers. Accordingly, the Docker images required for these tools are configurable as well. Fortunately, no version-dependent code paths are needed for their execution.

The configuration shown below is therefore simpler than for the Kubernetes image:

@Testcontainers
public class CustomFluentApiImageTest {
    @Container
    KindContainer<?> container = new KindContainer<>()
            .withKubectlImage(DockerImageName.parse("my-registry/kubectl:1.21.9-debian-10-r10"))
            .withHelm3Image(DockerImageName.parse("my-registry/helm:3.7.2"));
    // Tests go here
}

The image coordinates for all other containers that are started can also be set manually. However, it is always the developer's responsibility to ensure the use of the same or at least compatible images. For this purpose, a complete list of the Docker images used, including their versions, can be found in the Kindcontainer documentation on GitHub.

Admission controller webhooks

For the test scenarios shown so far, the communication direction is clear: A Kubernetes client running in the JVM accesses the locally or remotely running Kubernetes container over the network to communicate with the API server running inside it. Docker makes this standard case incredibly straightforward: A port is opened on the Docker container for the API server, making it accessible. 

Kindcontainer automatically performs the necessary configuration steps for this process and provides suitable connection information as Kubeconfig for the respective network configuration.

However, admission controller webhooks present a technically more challenging testing scenario. For these, the API server must be able to communicate with external webhooks via HTTPS when processing manifests. In our case, these webhooks typically run in the JVM where the test logic is executed. However, they may not be easily accessible from the Docker container.

To facilitate testing of these webhooks independently of the network setup, yet still make it simple, Kindcontainer employs a trick. In addition to the Kubernetes container itself, two more containers are started. An SSH server provides the ability to establish a tunnel from the test JVM into the Kubernetes container and set up reverse port forwarding, allowing the API server to communicate back to the JVM. 

Because Kubernetes requires TLS-secured communication with webhooks, an Nginx container is also started to handle TLS termination for the webhooks. Kindcontainer manages the administration of the required certificate material for this. 

The entire setup of processes, containers, and their network communication is illustrated in Figure 1.

Figure 1: Network setup for testing webhooks.

Fortunately, Kindcontainer hides this complexity behind an easy-to-use API:

@Testcontainers
public class WebhookTest {
    @Container
    ApiServerContainer<?> container = new ApiServerContainer<>().withAdmissionController(admission -> {
        admission.mutating()
                .withNewWebhook("mutating.example.com")
                .atPort(webhookPort) // Local port of webhook
                .withNewRule()
                .withApiGroups("")
                .withApiVersions("v1")
                .withOperations("CREATE", "UPDATE")
                .withResources("configmaps")
                .withScope("Namespaced")
                .endRule()
                .endWebhook()
                .build();
    });

    // Tests go here
}

The developer only needs to provide the port of the locally running webhook along with some necessary information for setting up in Kubernetes. Kindcontainer then automatically handles the configuration of SSH tunneling, TLS termination, and Kubernetes.

Consider Java

Starting from the simple example of a minimal JUnit test, we have shown how to test custom Kubernetes controllers and operators implemented in Java. We have explained how to use familiar command-line tools from the ecosystem with the help of Fluent APIs and how to easily execute integration tests even in restricted network environments. Finally, we have shown how even the technically challenging use case of testing admission controller webhooks can be implemented simply and conveniently with Kindcontainer. 

Thanks to these new testing possibilities, we hope more developers will consider Java as the language of choice for their Kubernetes-related projects in the future.

Learn more

Visit the Testcontainers website.

Get started with Testcontainers Cloud by creating a free account.

Get the latest release of Docker Desktop.

Vote on what’s next! Check out our public roadmap.

Have questions? The Docker community is here to help.

New to Docker? Get started.

Subscribe to the Docker Newsletter.

Source: https://blog.docker.com/feed/

Build Your Own AI-Driven Code Analysis Chatbot for Developers with the GenAI Stack

The topic of GenAI is everywhere now, but even with so much interest, many developers are still trying to understand what the real-world use cases are. Last year, Docker hosted an AI/ML Hackathon, and genuinely interesting projects were submitted. 

In this AI/ML Hackathon post, we will dive into a winning submission, Code Explorer, in the hope that it sparks project ideas for you. 

For developers, understanding and navigating codebases can be a constant challenge. Even popular AI assistant tools like ChatGPT can fail to understand the context of your projects because they lack access to your code, and they struggle with complex logic or unique project requirements. Although large language models (LLMs) can be valuable companions during development, they may not always grasp the specific nuances of your codebase. This is where the need for a deeper understanding and additional resources comes in.

Imagine you’re working on a project that queries datasets for both cats and dogs. You already have functional code in DogQuery.py that retrieves dog data using pagination (a technique for fetching data in parts). Now, you want to update CatQuery.py to achieve the same functionality for cat data. Wouldn’t it be amazing if you could ask your AI assistant to reference the existing code in DogQuery.py and guide you through the modification process? 

This is where Code Explorer, an AI-powered chatbot, comes in. 

What makes Code Explorer unique?

The following demo, which was submitted to the AI/ML Hackathon, provides an overview of Code Explorer (Figure 1).

Figure 1: Demo of the Code Explorer extension as submitted to the AI/ML Hackathon.

Code Explorer helps you find answers about your code by searching relevant information based on the programming language and folder location. Unlike generic chatbots, Code Explorer goes beyond general coding knowledge. It leverages a powerful AI technique called retrieval-augmented generation (RAG) to understand your code's specific context, allowing it to provide more relevant and accurate answers based on your actual project.

Code Explorer supports a variety of programming languages, such as *.swift, *.py, *.java, *.cs, etc. This tool can be useful for learning or debugging your code projects, such as Xcode projects, Android projects, AI applications, web dev, and more.

Benefits of Code Explorer include:

Effortless learning: Explore and understand your codebase more easily.

Efficient debugging: Troubleshoot issues faster by getting insights from your code itself.

Improved productivity: Spend less time deciphering code and more time building amazing things.

Supports various languages: Works with popular languages like Python, Java, Swift, C#, and more.

Use cases include:

Understanding complex logic: “Explain how the calculate_price function interacts with the get_discount function in billing.py.”

Debugging errors: “Why is my getUserData function in user.py returning an empty list?”

Learning from existing code: “How can I modify search.py to implement pagination similar to search_results.py?”

How does it work?

Code Explorer leverages the power of a RAG-based AI framework, providing context about your code to an existing LLM model. Figure 2 shows the magic behind the scenes.

Figure 2: Diagram of Code Explorer steps.

Step 1. Process documents

The user selects a codebase folder through the Streamlit app. The process_documents function in the file db.py is called. This function performs the following actions:

Parsing code: It reads and parses the code files within the selected folder. This involves using language-specific parsers (e.g., ast module for Python) to understand the code structure and syntax.

Extracting information: It extracts relevant information from the code, such as:

Variable names and their types

Function names, parameters, and return types

Class definitions and properties

Code comments and docstrings

Loading and chunking documents: It creates a RecursiveCharacterTextSplitter object based on the language. This object splits each document into smaller chunks of a specified size (5,000 characters) with some overlap (500 characters) for better context (see the sketch after this list).

Creating Neo4j vector store: It creates a Neo4j vector store, a type of database that stores and connects code elements using vectors. These vectors represent the relationships and similarities between different parts of the code.

Each code element (e.g., function, variable) is represented as a node in the Neo4j graph database.

Relationships between elements (e.g., function call, variable assignment) are represented as edges connecting the nodes.
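
To make Step 1 concrete, here is a minimal sketch of the chunking and vector-store steps using LangChain, the orchestration library the project builds on. The folder path, embedding model, and Neo4j credentials are illustrative assumptions, not the project's exact code in db.py:

# Minimal sketch of Step 1 (load, chunk, embed, store); illustrative only.
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Neo4jVector

# Load all Python files from the selected codebase folder (path is hypothetical).
loader = DirectoryLoader("/path/to/codebase", glob="**/*.py", loader_cls=TextLoader)
documents = loader.load()

# Language-aware splitter: 5,000-character chunks with 500 characters of
# overlap, matching the sizes described above.
splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=5000, chunk_overlap=500
)
chunks = splitter.split_documents(documents)

# Embed each chunk and store the vectors in Neo4j (credentials are placeholders).
vector_store = Neo4jVector.from_documents(
    chunks,
    OllamaEmbeddings(base_url="http://localhost:11434"),
    url="neo4j://localhost:7687",
    username="neo4j",
    password="password",
)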

Step 2. Create LLM chains

This step is triggered only after the codebase has been processed (Step 1).

Two LLM chains are created:

Create Documents QnA chain: This chain allows users to talk to the chatbot in a question-and-answer style. It consults the vector database when answering coding questions and refers back to the source code files.

Create Agent chain: A separate Agent chain is created, which uses the QnA chain as a tool. You can think of it as an additional layer on top of the QnA chain that lets you converse with the chatbot more casually. Under the hood, the agent may ask the QnA chain for help with a coding question; in effect, one AI discusses the user's question with another AI before returning the final answer. In testing, the agent tends to summarize rather than give the technical response that the QnA chain alone produces.

Langchain is used to orchestrate the chatbot pipeline/flow.
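
As a rough illustration of Step 2, a question-and-answer chain over the vector store can be created with LangChain along the following lines. The model name is taken from the .env example later in this post; the chain wiring is an assumption rather than the literal contents of chains.py:

# Minimal sketch of the QnA chain from Step 2; illustrative only.
from langchain.chains import RetrievalQA
from langchain_community.chat_models import ChatOllama

# The LLM served by the Ollama container (model name from the .env example).
llm = ChatOllama(model="codellama:7b-instruct", base_url="http://localhost:11434")

# vector_store is the Neo4j vector store built in Step 1; its retriever fetches
# the code chunks most similar to the user's question.
qna_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vector_store.as_retriever(),
    return_source_documents=True,  # lets answers cite the source code files
)

response = qna_chain.invoke({"query": "How does DogQuery.py implement pagination?"})
print(response["result"])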

Step 3. User asks questions and AI chatbot responds

The Streamlit app provides a chat interface for users to ask questions about their code. User inputs are stored and used to query the LLM or the QA/Agent chains. Based on the following factors, the app chooses how to answer the user (a simplified routing sketch follows this list):

Codebase processed:

Yes: The QA RAG chain is used if the user has selected Detailed mode in the sidebar. This mode leverages the processed codebase for in-depth answers.

Yes: A custom agent logic (using the get_agent function) is used if the user has selected Agent mode. This mode might provide more concise answers compared to the QA RAG model.

Codebase not processed:

The LLM chain is used directly if the user has not processed the codebase yet.
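
Put together, the routing described above amounts to something like the following simplified sketch. The chain objects stand for those built in Step 2, and the function itself is an assumption about the app's structure rather than its literal code:

# Simplified sketch of the answer routing; not the app's literal code.
# qna_chain, agent_chain, and llm_chain stand for the chains built in Step 2.
def answer_question(question, *, codebase_processed, detailed_mode,
                    qna_chain, agent_chain, llm_chain):
    if not codebase_processed:
        # No vector store yet: fall back to the plain LLM chain.
        return llm_chain.invoke({"question": question})
    if detailed_mode:
        # Detailed mode: the QA RAG chain grounded in the processed codebase.
        return qna_chain.invoke({"query": question})
    # Agent mode: the agent may consult the QnA chain as a tool and tends to
    # summarize its answer.
    return agent_chain.invoke({"input": question})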

Getting started

To get started with Code Explorer, check the following:

Ensure that you have installed the latest version of Docker Desktop.

Ensure that you have Ollama running locally.

Then, complete the four steps explained below.

1. Clone the repository

Open a terminal window and run the following command to clone the sample application.

git clone https://github.com/dockersamples/CodeExplorer

You should now have the following files in your CodeExplorer directory:

tree
.
├── LICENSE
├── README.md
├── agent.py
├── bot.Dockerfile
├── bot.py
├── chains.py
├── db.py
├── docker-compose.yml
├── images
│   ├── app.png
│   └── diagram.png
├── pull_model.Dockerfile
├── requirements.txt
└── utils.py

2 directories, 13 files

2. Create environment variables
Before running the GenAI stack services, open the .env file and modify the following variables according to your needs. This file stores environment variables that influence your application's behavior.

OPENAI_API_KEY=sk-XXXXX
LLM=codellama:7b-instruct
OLLAMA_BASE_URL=http://host.docker.internal:11434
NEO4J_URI=neo4j://database:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=XXXX
EMBEDDING_MODEL=ollama
LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
LANGCHAIN_TRACING_V2=true # false
LANGCHAIN_PROJECT=default
LANGCHAIN_API_KEY=ls__cbaXXXXXXXX06dd

Note:

If using EMBEDDING_MODEL=sentence_transformer, uncomment code in requirements.txt and chains.py. It was commented out to reduce code size.

Make sure to set the OLLAMA_BASE_URL=http://llm:11434 in the .env file when using the Ollama Docker container. If you’re running on Mac, set OLLAMA_BASE_URL=http://host.docker.internal:11434 instead.

3. Build and run Docker GenAI services
Run the following command to build and bring up Docker Compose services:

docker compose --profile linux up --build

This produces output similar to the following:

[+] Running 5/5
✔ Network codeexplorer_net Created 0.0s
✔ Container codeexplorer-database-1 Created 0.1s
✔ Container codeexplorer-llm-1 Created 0.1s
✔ Container codeexplorer-pull-model-1 Created 0.1s
✔ Container codeexplorer-bot-1 Created 0.1s
Attaching to bot-1, database-1, llm-1, pull-model-1
llm-1 | Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
llm-1 | Your new public key is:
llm-1 |
llm-1 | ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGEM2BIxSSje6NFssxK7J1+X+46n+cWTQufEQjMUzLGC
llm-1 |
llm-1 | 2024/05/23 15:05:47 routes.go:1008: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
llm-1 | time=2024-05-23T15:05:47.265Z level=INFO source=images.go:704 msg="total blobs: 0"
llm-1 | time=2024-05-23T15:05:47.265Z level=INFO source=images.go:711 msg="total unused blobs removed: 0"
llm-1 | time=2024-05-23T15:05:47.265Z level=INFO source=routes.go:1054 msg="Listening on [::]:11434 (version 0.1.38)"
llm-1 | time=2024-05-23T15:05:47.266Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2106292006/runners
pull-model-1 | pulling ollama model codellama:7b-instruct using http://host.docker.internal:11434
database-1 | Installing Plugin 'apoc' from /var/lib/neo4j/labs/apoc-*-core.jar to /var/lib/neo4j/plugins/apoc.jar
database-1 | Applying default values for plugin apoc to neo4j.conf
pulling manifest
pull-model-1 | pulling 3a43f93b78ec… 100% ▕████████████████▏ 3.8 GB
pulling manifest
pulling manifest
pull-model-1 | pulling 3a43f93b78ec… 100% ▕████████████████▏ 3.8 GB
pull-model-1 | pulling 8c17c2ebb0ea… 100% ▕████████████████▏ 7.0 KB
pull-model-1 | pulling 590d74a5569b… 100% ▕████████████████▏ 4.8 KB
pull-model-1 | pulling 2e0493f67d0c… 100% ▕████████████████▏ 59 B
pull-model-1 | pulling 7f6a57943a88… 100% ▕████████████████▏ 120 B
pull-model-1 | pulling 316526ac7323… 100% ▕████████████████▏ 529 B
pull-model-1 | verifying sha256 digest
pull-model-1 | writing manifest
pull-model-1 | removing any unused layers
pull-model-1 | success
llm-1 | time=2024-05-23T15:05:52.802Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cuda_v11]"
llm-1 | time=2024-05-23T15:05:52.806Z level=INFO source=types.go:71 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="7.7 GiB" available="2.5 GiB"
pull-model-1 exited with code 0
database-1 | 2024-05-23 15:05:53.411+0000 INFO Starting…
database-1 | 2024-05-23 15:05:53.933+0000 INFO This instance is ServerId{ddce4389} (ddce4389-d9fd-4d98-9116-affa229ad5c5)
database-1 | 2024-05-23 15:05:54.431+0000 INFO ======== Neo4j 5.11.0 ========
database-1 | 2024-05-23 15:05:58.048+0000 INFO Bolt enabled on 0.0.0.0:7687.
database-1 | [main] INFO org.eclipse.jetty.server.Server – jetty-10.0.15; built: 2023-04-11T17:25:14.480Z; git: 68017dbd00236bb7e187330d7585a059610f661d; jvm 17.0.8.1+1
database-1 | [main] INFO org.eclipse.jetty.server.handler.ContextHandler – Started o.e.j.s.h.MovedContextHandler@7c007713{/,null,AVAILABLE}
database-1 | [main] INFO org.eclipse.jetty.server.session.DefaultSessionIdManager – Session workerName=node0
database-1 | [main] INFO org.eclipse.jetty.server.handler.ContextHandler – Started o.e.j.s.ServletContextHandler@5bd5ace9{/db,null,AVAILABLE}
database-1 | [main] INFO org.eclipse.jetty.webapp.StandardDescriptorProcessor – NO JSP Support for /browser, did not find org.eclipse.jetty.jsp.JettyJspServlet
database-1 | [main] INFO org.eclipse.jetty.server.handler.ContextHandler – Started o.e.j.w.WebAppContext@38f183e9{/browser,jar:file:/var/lib/neo4j/lib/neo4j-browser-5.11.0.jar!/browser,AVAILABLE}
database-1 | [main] INFO org.eclipse.jetty.server.handler.ContextHandler – Started o.e.j.s.ServletContextHandler@769580de{/,null,AVAILABLE}
database-1 | [main] INFO org.eclipse.jetty.server.AbstractConnector – Started http@6bd87866{HTTP/1.1, (http/1.1)}{0.0.0.0:7474}
database-1 | [main] INFO org.eclipse.jetty.server.Server – Started Server@60171a27{STARTING}[10.0.15,sto=0] @5997ms
database-1 | 2024-05-23 15:05:58.619+0000 INFO Remote interface available at http://localhost:7474/
database-1 | 2024-05-23 15:05:58.621+0000 INFO id: F2936F8E5116E0229C97F43AD52142685F388BE889D34E000D35E074D612BE37
database-1 | 2024-05-23 15:05:58.621+0000 INFO name: system
database-1 | 2024-05-23 15:05:58.621+0000 INFO creationDate: 2024-05-23T12:47:52.888Z
database-1 | 2024-05-23 15:05:58.622+0000 INFO Started.

The logs indicate that the application has successfully started all its components, including the LLM, Neo4j database, and the main application container. You should now be able to interact with the application through the user interface.

You can view the services via the Docker Desktop dashboard (Figure 3).

Figure 3: The Docker Desktop dashboard showing the running Code Explorer powered with GenAI stack.

The Code Explorer stack consists of the following services:

Bot

The bot service is the core application. 

Built with Streamlit, it provides the user interface through a web browser. The build section uses a Dockerfile named bot.Dockerfile to build a custom image containing your Streamlit application code. 

This service exposes port 8501, which makes the bot UI accessible through a web browser.

Pull model

This service downloads the codellama:7b-instruct model. 

The model is based on the Llama 2 model, which achieves performance similar to OpenAI's LLMs. 

However, codellama:7b-instruct is additionally trained on code-related contexts and fine-tuned to understand and respond in human language. 

This specialization makes it particularly adept at handling questions about code.

Note: You may notice that the pull-model-1 service exits with code 0, which indicates successful execution. This service is designed only to download the LLM model (codellama:7b-instruct). Once the download is complete, there is no further need for the service to keep running, so exiting with code 0 signifies that it finished its task successfully.

Database

This service manages a Neo4j graph database.

It efficiently stores and retrieves vector embeddings, which represent the code files in a mathematical format suitable for analysis by the LLM model.

The Neo4j vector database can be explored at http://localhost:7474 (Figure 4).

Figure 4: Neo4j database information.

LLM

This service acts as the LLM host, utilizing the Ollama framework. 

It manages the downloaded LLM model (not the embedding), making it accessible for use by the bot application.

4. Access the application
You can now view your Streamlit app in your browser by accessing http://localhost:8501 (Figure 5).

Figure 5: View the app.

In the sidebar, enter the path to your code folder and select Process files (Figure 6). Then, you can start asking questions about your code in the main chat.

Figure 6: The app is running.

You will find a toggle switch in the sidebar. By default, Detailed mode is enabled. In this mode, the QA RAG chain is used (detailedMode=true). This mode leverages the processed codebase for in-depth answers. 

When you toggle the switch to the other mode (detailedMode=false), the Agent chain is selected. This works like one AI discussing the user's question with another AI to produce the final answer. In testing, the agent tends to summarize rather than give the technical response that Detailed mode provides.

Here’s a result when detailedMode=true (Figure 7):

Figure 7: Result when detailedMode=true.

Figure 8 shows a result when detailedMode=false:

Figure 8: Result when detailedMode=false.

Start exploring

Code Explorer, powered by the GenAI Stack, offers a compelling solution for developers seeking AI assistance with coding. This chatbot leverages RAG to delve into your codebase, providing insightful answers to your specific questions. Docker containers ensure smooth operation, while Langchain orchestrates the workflow. Neo4j stores code representations for efficient analysis. 

Explore Code Explorer and the GenAI Stack to unlock the potential of AI in your development journey!

Learn more

Subscribe to the Docker Newsletter.

Get the latest release of Docker Desktop.

Vote on what’s next! Check out our public roadmap.

Have questions? The Docker community is here to help.

New to Docker? Get started.

Source: https://blog.docker.com/feed/

Docker Announces SOC 2 Type 2 Attestation & ISO 27001 Certification

Docker is pleased to announce that we have received our SOC 2 Type 2 attestation and ISO 27001 certification with no exceptions or major non-conformities. 

Security is a fundamental pillar of Docker's operations and is embedded into our overall mission and company strategy. Docker's products are core to our user community, and our SOC 2 Type 2 attestation and ISO 27001 certification demonstrate Docker's ongoing commitment to security for our user base.

What is a SOC 2 Type 2?

Defined by the American Institute of Certified Public Accountants (AICPA), System and Organization Controls (SOC) is a suite of reports produced during an audit. A SOC 2 Type 2 is an audit report, or attestation, that evaluates the design and operating effectiveness of an organization's internal controls over information systems against five principles, known as the Trust Services Principles: Security (also referred to as the common criteria), Availability, Confidentiality, Processing Integrity, and Privacy.

What is ISO 27001?

The International Organization for Standardization (ISO) is an independent, non-governmental international organization of national standards bodies. ISO was established in 1947 and has a long history of producing standards, requirements, and certifications to demonstrate different control environments.

ISO 27001 is a globally recognized standard for information security management systems (ISMS). An ISMS is a framework of policies, procedures, and controls for systematically managing an organization's sensitive data. 

Continued compliance

Going forward, Docker will provide an annual SOC 2 Type 2 attestation and ISO 27001 certification following the timing of our fiscal year.

Docker is committed to providing our customers with secure products. Our compliance posture reflects our commitment to leading the industry in providing developers with tools they can trust. 

To learn more about Docker’s security posture, visit our Docker Trust Center website. If you would like access to our compliance platform to receive the documents, fill out the Security Documentation form, and the Docker Sales team will follow up with you. 

Learn more

Subscribe to the Docker Newsletter.

Get the latest release of Docker Desktop.

Vote on what’s next! Check out our public roadmap.

Have questions? The Docker community is here to help.

New to Docker? Get started.

Source: https://blog.docker.com/feed/