How Drasi used GitHub Copilot to find documentation bugs

For early-stage open-source projects, the “Getting started” guide is often the first real interaction a developer has with the project. If a command fails, an output doesn’t match, or a step is unclear, most users won’t file a bug report—they will just move on.

Drasi, a CNCF sandbox project that detects changes in your data and triggers immediate reactions, is supported by our small team of four engineers in Microsoft Azure’s Office of the Chief Technology Officer, and we move fast. We have comprehensive tutorials, but we are shipping code faster than we can manually test them.

The team didn’t realize how big this gap was until late 2025, when GitHub updated its Dev Container infrastructure, bumping the minimum Docker version. The update broke the Docker daemon connection—and every single tutorial stopped working. Because we relied on manual testing, we didn’t immediately know the extent of the damage. Any developer trying Drasi during that window would have hit a wall.

This incident forced a realization: with advanced AI coding assistants, documentation testing can be converted to a monitoring problem.

The problem: Why does documentation break?

Documentation usually breaks for two reasons:

The curse of knowledge

Experienced developers write documentation with implicit context. When we write “wait for the query to bootstrap,” we know to run drasi list query and watch for the Running status, or even better—run the drasi wait command. A new user has no such context. Neither does an AI agent. They read the instructions literally and don’t know what to do. They get stuck on the “how,” while we only document the “what.”

Silent drift

Documentation doesn’t fail loudly like code does. When you rename a configuration file in your codebase, the build fails immediately. But when your documentation still references the old filename, nothing happens. The drift accumulates silently until a user reports confusion.

This is compounded for tutorials like ours, which spin up sandbox environments with Docker, k3d, and sample databases. When any upstream dependency changes—a deprecated flag, a bumped version, or a new default—our tutorials can break silently.

The solution: Agents as synthetic users

To solve this, we treated tutorial testing as a simulation problem. We built an AI agent that acts as a “synthetic new user.”

This agent has three critical characteristics:

It is naïve: It has no prior knowledge of Drasi—it knows only what is explicitly written in the tutorial.

It is literal: It executes every command exactly as written. If a step is missing, it fails.

It is unforgiving: It verifies every expected output. If the doc says, “You should see ‘Success’”, and the command line interface (CLI) just returns silently—the agent flags it and fails fast.

The stack: GitHub Copilot CLI and Dev Containers

We built a solution using GitHub Actions, Dev Containers, Playwright, and the GitHub Copilot CLI.

Our tutorials require heavy infrastructure:

A full Kubernetes cluster (k3d)

Docker-in-Docker

Real databases (such as PostgreSQL and MySQL)

We needed an environment that exactly matches what our human users experience. If users run in a specific Dev Container on GitHub Codespaces, our test must run in that same Dev Container.

The architecture

Inside the container, we invoke the Copilot CLI with a specialized system prompt (view the full prompt here):

```bash
copilot -p "$(cat prompt.md)" \
  --allow-all-tools --allow-all-paths \
  --deny-tool 'fetch' --deny-tool 'websearch' \
  --deny-tool 'githubRepo' --deny-tool 'shell(curl *)' \
  --allow-url localhost --allow-url 127.0.0.1
# … additional deny-tool flags
```

This prompt, using the prompt mode (-p) of the CLI agent, gives us an agent that can execute terminal commands, write files, and run browser scripts—just like a human developer sitting at their terminal. For the agent to simulate a real user, it needs these capabilities.

To enable the agent to open webpages and interact with them as any human following the tutorial steps would, we also install Playwright in the Dev Container. The agent also takes screenshots, which it then compares against those provided in the documentation.

Security model

Our security model is built around one principle: the container is the boundary.

Rather than trying to restrict individual commands (a losing game when the agent needs to run arbitrary node scripts for Playwright), we treat the entire Dev Container as an isolated sandbox and control what crosses its boundaries: no outbound network access beyond localhost, a Personal Access Token (PAT) with only “Copilot Requests” permission, ephemeral containers destroyed after each run, and a maintainer-approval gate for triggering workflows.

Dealing with non-determinism

One of the biggest challenges with AI-based testing is non-determinism. Large language models (LLMs) are probabilistic—sometimes the agent retries a command; other times it gives up.

We handled this with a three-stage retry with model escalation (start with Gemini Pro; on failure, retry with Claude Opus), semantic comparison of screenshots instead of pixel-matching, and verification of core data fields rather than volatile values.
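The retry-with-escalation logic can be sketched as a small shell loop. This is a hypothetical illustration, not our actual script: the run_eval function stands in for the real Copilot CLI invocation, and the model names and per-attempt model selection are assumptions for the sketch.

```bash
# Hypothetical sketch of the three-stage retry with model escalation.
# run_eval stands in for the real Copilot CLI invocation; here the first
# model is made to fail so the escalation path is visible.
run_eval() {
  echo "evaluating tutorial with $1"
  # The real version would invoke: copilot -p "$(cat prompt.md)" ...
  [ "$1" != "gemini-pro" ]
}

status="FAILED"
for model in gemini-pro claude-opus claude-opus; do
  if run_eval "$model"; then
    status="PASSED with $model"
    break
  fi
  echo "attempt with $model failed, escalating"
done
echo "$status"
```

Because each attempt gets a fresh (and eventually stronger) model, a transient wobble in one run does not immediately fail the whole evaluation.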

We also keep a list of tight constraints in our prompts that prevent the agent from going on a debugging journey, directives that control the structure of the final report, and skip directives that tell the agent to bypass optional tutorial sections, such as setting up external services.

Artifacts for debugging

When a run fails, we need to know why. Since the agent is running in a transient container, we can’t just Secure Shell (SSH) in and look around.

So, our agent preserves evidence of every run—screenshots of web UIs, terminal output of critical commands, and a final markdown report detailing its reasoning, as shown here:

Drasi Getting Started Tutorial Evaluation

Environment

Timestamp: 2026-02-20T13:32:07.998Z

Directory: /workspaces/learning/tutorial/getting-started

Step 1: Setup Drasi Environment

Skipped as per instructions (already in DevContainer).

Verified environment setup by checking resources folder existence.

Step 2: Create PostgreSQL Source

Command: drasi apply -f ./resources/hello-world-source.yaml

… more steps …

Scenario 1: hello-world-from

Initial check: “Brian Kernighan” present. (Screenshot: 09_hello-world-from.png)

Action: Insert ‘Allen’, ‘Hello World’.

Verification: “Allen” appeared in UI. (Screenshot: 10_hello-world-from-updated.png)

Result: PASSED

… more validation by Playwright taking screenshots …

Conclusion

The tutorial instructions were clear and the commands executed successfully. The expected behavior matches the actual behavior observed via the Debug Reaction UI.

STATUS: SUCCESS

These artifacts are uploaded to the GitHub Action run summary, allowing us to “time travel” back to the exact moment of failure and see what the agent saw.

Parsing the agent’s report

With LLMs, getting a definitive “Pass/Fail” signal that a machine can understand can be challenging. An agent might write a long, nuanced conclusion instead of a clean, machine-readable verdict.

To make this actionable in a CI/CD pipeline, we had to do some prompt engineering: we explicitly instructed the agent to end every report with a single, unambiguous status line (the STATUS: SUCCESS line in the report above).

In our GitHub Action, we then simply grep for this specific string to set the exit code of the workflow.
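The gate itself is a one-line grep. A minimal self-contained sketch (the report filename is an assumption, and the sample report is fabricated here so the snippet runs anywhere):

```bash
# Minimal sketch of the CI pass/fail gate. In CI, the report is written by
# the agent; here we fabricate one so the example is self-contained.
report="evaluation-report.md"
printf 'Conclusion\n\nThe tutorial worked as documented.\n\nSTATUS: SUCCESS\n' > "$report"

# Grep for the unambiguous status line; the grep exit code becomes the
# workflow's exit code.
if grep -q "^STATUS: SUCCESS$" "$report"; then
  echo "tutorial evaluation passed"
else
  echo "tutorial evaluation failed"
fi
```

The anchored pattern (`^…$`) matters: it only matches the dedicated status line, not a sentence that happens to mention success.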

Simple techniques like this bridge the gap between AI’s fuzzy, probabilistic outputs and CI’s binary pass/fail expectations.

Automation

We now have an automated version of the workflow which runs weekly. This version evaluates all our tutorials every week in parallel—each tutorial gets its own sandbox container and a fresh perspective from the agent acting as a synthetic user. If any tutorial evaluation fails, the workflow files an issue on our GitHub repo.
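A trimmed sketch of what such a scheduled workflow can look like (the names, paths, and tutorial list are illustrative assumptions, not our actual workflow file):

```yaml
# Hypothetical sketch of a weekly evaluation workflow; job names, the
# script path, and the tutorial list are placeholders.
name: weekly-tutorial-evaluation
on:
  schedule:
    - cron: "0 6 * * 1"   # every Monday, 06:00 UTC
jobs:
  evaluate:
    strategy:
      fail-fast: false            # one broken tutorial shouldn't hide others
      matrix:
        tutorial: [getting-started, tutorial-two, tutorial-three]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run synthetic user in the tutorial's Dev Container
        run: ./scripts/run-eval.sh ${{ matrix.tutorial }}
      - name: File an issue on failure
        if: failure()
        run: >
          gh issue create
          --title "Tutorial ${{ matrix.tutorial }} evaluation failed"
          --body "See the workflow run logs and artifacts."
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

The matrix gives each tutorial its own isolated job (and container), and `fail-fast: false` lets every tutorial report its own result.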

This workflow can optionally also be run on pull requests, but to prevent attacks we have added a maintainer-approval requirement and use the pull_request_target trigger—which means that even on pull requests from external contributors, the workflow that executes is the one in our main branch.
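One common way to wire this up (a hedged sketch; the environment name is an assumption, and whether Drasi uses a deployment environment or another approval mechanism is not stated in this post) is to combine pull_request_target with a protected environment that requires reviewer approval:

```yaml
# Hypothetical sketch: pull_request_target runs the workflow definition
# from the main branch, and a protected environment with required
# reviewers provides the maintainer-approval gate before secrets are used.
on:
  pull_request_target:
    branches: [main]
jobs:
  evaluate:
    runs-on: ubuntu-latest
    environment: tutorial-evaluation   # pauses until a maintainer approves
    steps:
      - uses: actions/checkout@v4
        with:
          # Check out the contributor's code to test it, while the workflow
          # logic itself still comes from the trusted main branch.
          ref: ${{ github.event.pull_request.head.sha }}
```

The key property is the split of trust: untrusted contributor code is only data to be tested, never the workflow definition that holds the secrets.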

Running the Copilot CLI requires a PAT, which is stored in the environment secrets for our repo. To make sure it does not leak, each run requires maintainer approval—except the automated weekly run, which only runs on the main branch of our repo.

What we found: Bugs that matter

Since implementing this system, we have run over 200 “synthetic user” sessions. The agent identified 18 distinct issues, ranging from serious environment problems to documentation issues like the following. Fixing them improved the docs for everyone, not just the bot.

Implicit dependencies: In one tutorial, we instructed users to create a tunnel to a service. The agent ran the command, and then—following the next instruction—killed the process to run the next command.

The fix: We realized we hadn’t told the user to keep that terminal open. We added a warning: “This command blocks. Open a new terminal for subsequent steps.”

Missing verification steps: We wrote: “Verify the query is running.” The agent got stuck: “How, exactly?”

The fix: We replaced the vague instruction with an explicit command: drasi wait -f query.yaml.

Format drift: Our CLI output had evolved. New columns were added; older fields were deprecated. The documentation screenshots still showed the 2024 version of the interface. A human tester might gloss over this (“it looks mostly right”). The agent flagged every mismatch, forcing us to keep our examples up to date.

AI as a force multiplier

We often hear about AI replacing humans, but in this case, the AI is providing us with a workforce we never had.

To replicate what our system does—running six tutorials across fresh environments every week—we would need a dedicated QA resource or a significant budget for manual testing. For a four-person team, that is impossible. By deploying these synthetic users, we have effectively hired a tireless QA engineer who works nights, weekends, and holidays.

Our tutorials are now validated weekly by synthetic users—try the Getting Started guide yourself and see the results firsthand. And if you’re facing the same documentation drift in your own project, consider GitHub Copilot CLI not just as a coding assistant, but as an agent—give it a prompt, a container, and a goal—and let it do the work a human doesn’t have time for.
The post How Drasi used GitHub Copilot to find documentation bugs appeared first on Microsoft Azure Blog.
Source: Azure

Microsoft named a Leader in The Forrester Wave™ for Sovereign Cloud Platforms

Digital sovereignty is no longer a niche requirement. For organizations operating across borders, regulated industries, and complex supply chains, sovereignty is now table stakes for cloud strategy.

That’s why we’re pleased that Microsoft has been named a Leader in The Forrester Wave™: Sovereign Cloud Platforms, Q2 2026 – an evaluation that assessed the most significant sovereign cloud providers based on current offerings, strategy, and customer feedback.

We believe this recognition reflects Microsoft’s long-term commitment to helping organizations adopt cloud and AI without compromising on control, compliance, operational independence, or innovation.

Read the full report

Why this recognition matters

Forrester’s research highlights a key reality of sovereign clouds: there is no single deployment model that fits every sovereignty requirement. Instead, organizations combine public cloud, private cloud, and disconnected environments to achieve the level of sovereignty they need – balancing risk, regulations, functionality, and cost.

In this context, leadership isn’t about offering a single “sovereign cloud.” The goal is not isolation; it is providing consistent sovereign controls across multiple environments while maintaining access to modern cloud capabilities.

Forrester places Microsoft in the Leaders category based on its scores in the current offering and strategy categories. The report also notes Microsoft’s vision to offer sovereign controls across cloud, AI, and productivity services – specifically, its ability to extend sovereignty across AI, productivity, security, and the cloud platform.

A platform approach to sovereignty

The Forrester report notes that Microsoft’s sovereign capabilities are available consistently for both private and public cloud. In practice, digital sovereignty is achieved through a combination of technical controls, operational practices, and contractual commitments applied consistently across deployment models.

Microsoft Sovereign Cloud brings together:

Public cloud with data residency and access controls, including region-specific residency controls such as EU Data Boundary.

Private cloud with hybrid deployments, enabled through Azure Local and consistent policy and management via Azure Arc.

Partner-operated national clouds, with Bleu and Delos Cloud, where infrastructure is independently owned and operated to meet national requirements.

This approach allows organizations to grow their sovereign IT posture over time, adapting to evolving regulatory, operational, or geopolitical conditions without having to abandon the Microsoft cloud ecosystem.

Read the Microsoft Sovereign Cloud in Europe white paper

Consistency across sovereign environments

One of the differentiators cited in the evaluation is Microsoft’s ability to make key capabilities available across sovereign public and sovereign private cloud. Forrester specifically calls out Microsoft’s container and Kubernetes capabilities, including the use of Azure Arc and Azure Local to run Kubernetes clusters in connected or disconnected environments, supported by infrastructure-as-code and GitOps tooling.

This consistency matters because sovereign cloud isn’t just about where data resides, but about whether organizations can:

Operate and secure workloads the same way across environments.

Maintain development and operation standards.

Avoid fragmenting teams, tools, and processes.

By extending common management, governance, and deployment models across environments, Microsoft Sovereign Cloud helps reduce complexity while giving organizations control.

Read the full report

Looking ahead

Sovereign cloud platforms are evolving quickly, especially as customers look to apply AI, analytics, and modern application services across different environments. Forrester notes that customers don’t “buy” sovereignty as a standalone product; they architect for it over time.

Microsoft’s recognition as a Leader in this evaluation underscores our commitment to keep investing in sovereign cloud innovation such as:

Advanced AI development and runtime capabilities.

Increasing consistency and parity across deployment models.

Supporting customers as sovereignty requirements continue to mature and evolve.

We’re grateful to our customers and partners who continue to shape our approach, and we remain focused on helping organizations adopt cloud and AI with confidence, flexibility, and transparency wherever their workloads need to run.

Forrester does not endorse any company, product, brand, or service included in its research publications and does not advise any person to select the products or services of any company or brand based on the ratings included in such publications. Information is based on the best available resources. Opinions reflect judgment at the time and are subject to change. For more information, read about Forrester’s objectivity here.
The post Microsoft named a Leader in The Forrester Wave™ for Sovereign Cloud Platforms appeared first on Microsoft Azure Blog.
Source: Azure

Second-generation Amazon FSx for NetApp ONTAP is now available in four additional AWS commercial and AWS GovCloud (US) Regions

Amazon FSx for NetApp ONTAP second-generation file systems are now available in 4 additional AWS Regions: Europe (London), Asia Pacific (Hyderabad), South America (Sao Paulo), and AWS GovCloud (US-West). Amazon FSx makes it easier and more cost effective to launch, run, and scale feature-rich, high-performance file systems in the cloud.

Second-generation FSx for ONTAP file systems give you more performance scalability and flexibility over first-generation file systems by allowing you to create or expand file systems with up to 12 highly available (HA) pairs of file servers, providing your workloads with up to 72 GBps of throughput and 1 PiB of provisioned SSD storage.

With this regional expansion, second-generation FSx for ONTAP file systems are available in the following AWS Regions: US East (N. Virginia, Ohio), US West (N. California, Oregon), Canada (Central), Europe (Frankfurt, Ireland, London, Spain, Stockholm, Zurich), South America (Sao Paulo), Asia Pacific (Hyderabad, Mumbai, Seoul, Singapore, Sydney, Tokyo), and AWS GovCloud (US-West). You can create second-generation Multi-AZ file systems with a single HA pair, and Single-AZ file systems with up to 12 HA pairs. To learn more, visit the FSx for ONTAP user guide.
 
Source: aws.amazon.com

Amazon CloudWatch pipelines introduces new compliance and governance capabilities

Amazon CloudWatch pipelines now includes new compliance and governance capabilities to help you maintain data integrity and control access when processing logs. CloudWatch pipelines is a fully managed service that ingests, transforms, and routes log data to CloudWatch without requiring you to manage infrastructure. Because pipeline processors modify log events during transformation, organizations with audit or regulatory requirements need ways to preserve original data and track what has been changed. These new tools address those needs directly.
You can now enable a “keep original” toggle to automatically store a copy of your raw logs before any transformation takes place, ensuring the unmodified data is always available when needed. Pipelines also adds new metadata to processed log entries indicating that the log has been transformed, making it easy to distinguish between original and processed data during audits or investigations. Additionally, new IAM condition keys let administrators restrict who can create pipelines based on log source name and type, giving operators fine-grained control over pipeline creation across their organization.
These compliance and governance features are available at no additional cost. Standard CloudWatch Logs storage rates apply to both the original and transformed copies of your log data when the keep original log option is enabled. You can use these features in all AWS Regions where CloudWatch pipelines is generally available.
To get started, visit the CloudWatch Ingestion page in the Amazon CloudWatch console. To learn more, see the CloudWatch pipelines documentation.
Source: aws.amazon.com

AWS Deadline Cloud supports monitor creation in multiple regions

Today, AWS Deadline Cloud announces support for creating monitors in multiple AWS Regions without additional configuration of your IAM Identity Center instance. AWS Deadline Cloud is a fully managed service that helps creative teams manage and scale their rendering workloads in the cloud. You can now deploy render farms with monitors across multiple Regions without needing to adjust your existing IAM Identity Center configuration. You can operate more efficiently by placing rendering resources in regions closest to your artists and studios worldwide, and can run and compare workloads across regions to help optimize your rendering strategy or diversify your instance types. Deadline Cloud automatically routes authentication requests to your IAM Identity Center instance in its primary Region, so your identity data remains in place without replication and requires no changes to your identity management setup. To learn more, see Getting Started with Deadline Cloud in the AWS Deadline Cloud User Guide. 
Source: aws.amazon.com

Amazon CloudWatch pipelines now supports drop and conditional processing

Amazon CloudWatch pipelines now supports conditional processing and a new drop events processor, giving you more control over how your log data is transformed. CloudWatch pipelines is a fully managed service that ingests, transforms, and routes log data to CloudWatch without requiring you to manage infrastructure. Until now, processors applied to all log entries uniformly. With conditional processing, you can define rules that determine when a processor runs and which individual log entries it acts on, so you only transform the data that matters.
Conditional processing is available across 21 processors including Add Entries, Delete Entries, Copy Values, Grok, Rename Key, and more. For each processor, you can set a “run when” condition to skip the entire processor if the condition is not met, or an entry-level condition to control whether each individual action within the processor is applied. The new Drop Events processor lets you filter out unwanted log entries from third-party pipeline connectors based on conditions you define, helping reduce noise and lower costs.
Conditional processing and the Drop Events processor are available at no additional cost in all AWS Regions where CloudWatch pipelines is generally available. Standard CloudWatch Logs ingestion and storage rates still apply.
To get started, visit the CloudWatch pipelines page in the Amazon CloudWatch console. To learn more, see the CloudWatch pipelines documentation.
Source: aws.amazon.com

Amazon EC2 X8i instances are now available in Europe (Paris)

Amazon Web Services (AWS) is announcing the general availability of Amazon EC2 X8i instances, next-generation memory optimized instances powered by custom Intel Xeon 6 processors available only on AWS. X8i instances are SAP-certified and deliver the highest performance and fastest memory bandwidth among comparable Intel processors in the cloud. They deliver up to 43% higher performance, 1.5x more memory capacity (up to 6TB), and 3.3x more memory bandwidth compared to previous generation X2i instances.

X8i instances are designed for memory-intensive workloads like SAP HANA, large databases, data analytics, and Electronic Design Automation (EDA). Compared to X2i instances, X8i instances offer up to 50% higher SAPS performance, up to 47% faster PostgreSQL performance, 88% faster Memcached performance, and 46% faster AI inference performance.

X8i instances come in 14 sizes, from large to 96xlarge, including two bare metal options. X8i instances are available in the following AWS Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Frankfurt), Europe (Stockholm), and Europe (Paris). To get started, visit the AWS Management Console. X8i instances can be purchased via Savings Plans, On-Demand instances, and Spot instances. For more information, visit the X8i instances page.
Source: aws.amazon.com