Azure reliability, resiliency, and recoverability: Build continuity by design

Modern cloud systems are expected to deliver more than uptime. Customers expect consistent performance, the ability to withstand disruption, and confidence that recovery is predictable and intentional.

In Azure, these expectations map to three distinct concepts: reliability, resiliency, and recoverability.

Explore technical methodologies with Azure Essentials

Reliability describes the degree to which a service or workload consistently performs at its intended service level within business-defined constraints and tradeoffs. Reliability is the outcome customers ultimately care about.

To achieve reliable outcomes, workloads are designed along two complementary dimensions. Resiliency is the ability to withstand faults and disruptive conditions such as infrastructure failures, zonal or regional outages, cyberattacks, or sudden changes in load—and continue operating without customer-visible disruption. Recoverability is the ability to restore normal operations after disruption, returning the workload to a reliable state once resiliency limits are exceeded.

This blog anchors definitions and guidance to the Microsoft Cloud Adoption Framework, the Azure Well‑Architected Framework, and the reliability guides for Azure services. Use the reliability guides to confirm how each service behaves during faults, what protections are built in, and what you must configure and operate, so shared responsibility boundaries stay clear as workloads scale and during recovery scenarios.

Why this matters

When reliability, resiliency, and recoverability are used interchangeably, teams make the wrong design tradeoffs—over-investing in recovery when architectural resiliency is required, or assuming redundancy guarantees reliable outcomes. This post clarifies how these concepts differ, when each applies, and how they guide real design, migration, and incident-readiness decisions in Azure.

Industry perspective: Clarifying common confusion

Azure guidance treats reliability as the goal, achieved through deliberate resiliency and recoverability strategies. Resiliency describes workload behavior during disruption; recoverability describes restoring service after disruption.

Anchor principle: Reliability is the goal. Resiliency keeps you operational during disruption. Recoverability restores service when disruption exceeds design limits.

Part I — Reliability by design: Operating model and workload architecture

Reliable outcomes require alignment between organizational intent and workload architecture. The Microsoft Cloud Adoption Framework helps organizations define governance, accountability, and continuity expectations that shape reliability priorities. The Azure Well‑Architected Framework translates those priorities into architectural principles, design patterns, and tradeoff guidance.

Part II — Reliability in practice: What you measure and operationalize

Reliability only matters if it is measured and sustained. Teams operationalize reliability by defining acceptable service levels, instrumenting steady-state behavior and customer experience, and validating assumptions with evidence.

Azure Monitor and Application Insights provide observability, while controlled fault testing (for example, with Azure Chaos Studio) helps confirm designs behave as expected under stress.

Practical signals of “enough reliability” include meeting service levels for critical user flows, introducing changes safely, maintaining steady-state performance under expected load, and keeping deployment risk low through disciplined change practices.
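To make "acceptable service levels" concrete, the sketch below converts an availability target into the downtime budget it implies over a period. This is an illustrative calculation only, not an Azure API; the function name and defaults are invented.

```python
# Hypothetical helper: translate an availability target (SLO) into the
# "error budget" of downtime it permits per period. Illustrative only.

def allowed_downtime_minutes(slo_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted by an availability SLO over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - slo_percent / 100)

# A 99.9% monthly SLO permits roughly 43.2 minutes of downtime;
# adding a nine shrinks the budget tenfold.
print(round(allowed_downtime_minutes(99.9), 1))   # 43.2
print(round(allowed_downtime_minutes(99.99), 2))  # 4.32
```

Framing service levels as a downtime budget helps teams decide how much of that budget to spend on planned change versus hold in reserve for incidents.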

Governance mechanisms such as Azure Policy, Azure landing zones, and Azure Verified Modules help apply these practices consistently as environments evolve.

The Reliability Maturity Model can help teams assess how consistently reliability practices are applied as workloads evolve, while remaining scoped to reliability practices rather than resiliency or recoverability architecture.

Part III — Resiliency in practice: From principle to staying operational

Resiliency by design is no longer a late-stage high-availability checklist. For mission-critical workloads, resiliency must be intentional, measurable, and continuously validated—built into how applications are designed, deployed, and operated.

Resiliency by design aims to keep systems operating through disruption wherever possible, not only recover after failures.

Resiliency is a lifecycle, not a feature

Effective practice shifts from isolated configurations to a repeatable lifecycle applied across workloads:

Start resilient—embed resiliency at design time using prescriptive architectures, secure-by-default configurations, and platform-native protections.  

Get resilient—assess existing applications, identify resiliency gaps, and remediate risks, prioritizing production mission-critical workloads. 

Stay resilient—continuously validate, monitor, and improve posture, ensuring configurations don’t drift and assumptions hold as scale, usage patterns, and threat models change.  

Withstanding disruption through architectural design

Resiliency focuses on how workloads behave during disruptive conditions such as failures, sudden changes in load, or unexpected operating stress—so they can continue operating and limit customer-visible impact. Some disruptive conditions are not “faults” in the traditional sense; elastic scale-out is a resiliency strategy for handling demand spikes even when infrastructure is healthy.

In Azure, resiliency is achieved through architectural and operational choices that tolerate faults, isolate failures, and limit their impact. Many decisions begin with failure-domain architecture: availability zones provide physical isolation within a region, zone-resilient configurations enable continued operation through zonal loss, and multi-region designs can extend operational continuity depending on routing, replication, and failover behavior.
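At the code level, the same fault-tolerance intent shows up in patterns such as retrying transient failures with exponential backoff. The following is a minimal sketch under stated assumptions: `TransientError` and the wrapped operation are hypothetical stand-ins, not Azure SDK types.

```python
# Minimal sketch of transient-fault retry with exponential backoff,
# one common building block for withstanding brief disruptions.
# TransientError and the wrapped call are hypothetical.
import time

class TransientError(Exception):
    pass

def with_retries(operation, max_attempts=4, base_delay=0.01):
    """Retry a callable on TransientError, doubling the delay each attempt."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # resiliency limits exceeded; recovery takes over
            time.sleep(base_delay * (2 ** attempt))

# Example: an operation that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError
    return "ok"

print(with_retries(flaky))  # ok
```

When retries are exhausted, the fault propagates and recoverability mechanisms take over, mirroring the boundary between resiliency and recoverability described above.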

The Reliable Web App reference architecture in the Azure Architecture Center illustrates how these principles come together through zone-resilient deployment, traffic routing, and elastic scaling paired with validation practices aligned to WAF. This reinforces a core tenet of resiliency by design: resiliency is achieved through intentional design and continuous verification, not assumed redundancy.  

Traffic management and fault isolation

Traffic management is central to resiliency behavior. Services such as Azure Load Balancer and Azure Front Door can route traffic away from unhealthy instances or regions, reducing user impact during disruption. Design guidance such as load-balancing decision trees can help teams select patterns that match their resiliency goals.
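Conceptually, probe-driven routing steers requests only toward backends whose recent health checks pass. The simplified sketch below illustrates that idea; it is not the actual algorithm of Azure Front Door or Azure Load Balancer, and all names and thresholds are invented.

```python
# Illustrative sketch of health-probe-based backend selection.
# Names, probe counts, and thresholds are hypothetical.

def healthy_backends(backends):
    """Return names of backends whose recent probes met the healthy threshold."""
    return [b["name"] for b in backends
            if sum(b["recent_probes"]) >= b.get("required_successes", 2)]

pool = [
    {"name": "eastus-1", "recent_probes": [1, 1, 1]},
    {"name": "eastus-2", "recent_probes": [1, 0, 0]},  # failing probes
    {"name": "westus-1", "recent_probes": [1, 1, 0]},
]
print(healthy_backends(pool))  # ['eastus-1', 'westus-1']
```

The design point is that routing decisions follow observed health, not configuration alone, which is why probe settings deserve the same scrutiny as the backends themselves.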

It is also important to distinguish resiliency from disaster recovery. Multi-region deployments may support high availability, fault isolation, or load distribution without necessarily meeting formal recovery objectives, depending on how failover, replication, and operational processes are implemented.

From resource checks to application-centric posture

Customers experience disruption as application outages, not as individual disk or VM failures. Resiliency must therefore be assessed and managed at the application level.

Azure’s zone resiliency experience supports this shift by grouping resources into logical application service groups, assessing risk, tracking posture over time, detecting drift, and guiding remediation with cost visibility. This turns resiliency from an assumption into an explicit, measurable posture.

Validation matters: configuration is not enough

Resiliency should be validated rather than assumed. Teams can simulate disruption through controlled drills, observe application behavior under stress, and measure continuity characteristics during expected scenarios. Strong observability is essential here: it shows how the application performs during and after drills.

Increasingly, assistive capabilities such as the Resiliency Agent (preview) in Azure Copilot help teams assess posture and guide remediation without blurring the distinction between resiliency (remaining operational through disruption) and recoverability (restoring service after disruption).  

What “enough resiliency” looks like: workloads remain functional during expected scenarios; failures are isolated, and systems degrade gracefully rather than causing customer-visible outages.

Part IV – Recoverability in practice: Restoring normal operations after disruption

Recoverability becomes relevant when disruption exceeds what resiliency mechanisms can withstand. It focuses on restoring normal operations after outages, data corruption events, or broader incidents, returning the system to a reliable state.

Recoverability strategies typically involve backup, restore, and recovery orchestration. In Azure, services such as Azure Backup and Azure Site Recovery support these scenarios, with behavior varying by service and configuration.

Recovery requirements such as Recovery Time Objective (RTO) and Recovery Point Objective (RPO) belong here. These metrics define restoration expectations after disruption, not how workloads remain operational during disruption.
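As a simple illustration (not an Azure Backup API), worst-case data loss is bounded by the interval between backups, so a backup cadence can be checked against an RPO target. The function name and the simplifying assumption that replication lag is zero are mine.

```python
# Hedged example: check whether a backup cadence can satisfy an RPO target.
# Worst-case data loss equals the interval between backups (replication
# lag ignored here for simplicity).

def meets_rpo(backup_interval_hours: float, rpo_hours: float) -> bool:
    """True if worst-case data loss stays within the RPO."""
    return backup_interval_hours <= rpo_hours

print(meets_rpo(backup_interval_hours=24, rpo_hours=4))  # False
print(meets_rpo(backup_interval_hours=1, rpo_hours=4))   # True
```

A daily backup cannot meet a four-hour RPO, which is exactly the kind of mismatch that only surfaces when recovery requirements are stated explicitly rather than assumed.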

Recoverability also depends on operational readiness: teams document runbooks, practice restores, verify backup integrity, and test recovery regularly, so recovery plans work under real pressure.

By separating recoverability from resiliency, teams can ensure recovery planning complements, rather than substitutes for, sound resiliency architecture.

A 30-day action plan: Turning intent into reliable outcomes

Within 30 days, translate concepts into deliberate decisions.

First, identify and classify critical workloads, confirm ownership, and define acceptable service levels and tradeoffs.

Next, assess resiliency posture against expected disruption scenarios (including zonal loss, regional failure, load spikes, and cyber disruption), validate failure-domain choices, and verify traffic management behavior. Use guardrails such as Azure Backup, Microsoft Defender for Cloud, and Microsoft Sentinel to strengthen continuity against cyberattacks.

Then, confirm recoverability paths for scenarios that exceed resiliency limits, including restoration paths and RTO/RPO targets.

Finally, align operational practices—change management, observability, governance, and continuous improvement—and validate assumptions using the Reliability guides for each Azure service.

Designing confident, reliable cloud systems

Modern cloud continuity is defined by how confidently systems perform, withstand disruption, and restore service when needed. Reliability is the outcome to design for; resiliency and recoverability are complementary strategies that make reliable operation possible.

Next step: Explore Azure Essentials for guidance and tools to build secure, resilient, cost-efficient Azure projects. To see how shared responsibility and Azure Essentials come together in practice, read Resiliency in the cloud—empowered by shared responsibility and Azure Essentials on the Microsoft Azure Blog.

For expert-led, outcome-based engagements to strengthen resiliency and operational readiness, Microsoft Unified provides end-to-end support across the Microsoft cloud. To move from guidance to execution, start your project with experts and investments through Azure Accelerate.

Azure capabilities referenced

Foundational guidance:

Get started with Microsoft Cloud Adoption Framework

Explore the Azure Well-Architected Framework

See all reliability guides in Azure services

Resiliency examples:

Read overview on Azure Resiliency

What are availability zones?

What is Azure Load Balancer?

What is Azure Front Door?

See how to use multi‑region support

Learn more about Resiliency Agent (preview) in Azure Copilot

Recoverability examples:

Protect your data with Azure Backup

Reduce risk with Azure Site Recovery

Understand redundancy, data replication, backup, and restore capabilities

Governance and validation examples:

Access Azure Monitor documentation

Read about Application Insights Experiences

Access Azure Chaos Studio documentation

What is Azure Policy?

What is Azure landing zone?

What are Azure Verified Modules?

The post Azure reliability, resiliency, and recoverability: Build continuity by design appeared first on Microsoft Azure Blog.
Source: Azure

Claude Sonnet 4.6 in Microsoft Foundry: Frontier Performance for Scale

Last week, we took a major step forward with the availability of Claude Opus 4.6 in Microsoft Foundry, bringing frontier AI capable of deep reasoning, agentic workflows, and complex decision-making to enterprise developers and builders. If Opus represents the highest tier of AI performance, Sonnet 4.6 builds on that momentum by delivering nearly Opus-level intelligence at a lower price, while often being more token efficient than Claude Sonnet 4.5.

Claude Sonnet 4.6 is available today in Microsoft Foundry, and it is designed for teams who want frontier performance across coding, agents, and professional work at scale. With Sonnet 4.6, customers get access to powerful reasoning and productivity capabilities that make everyday AI a practical reality for development teams, enterprise knowledge workers, and automation scenarios.

Large Context, Adaptive Thinking, and Effort Controls

Claude Sonnet 4.6 delivers frontier intelligence at scale, built for coding, agents, and enterprise workflows.

A major highlight is its 1 million token context window (beta), matching the extended context capabilities of Claude Opus 4.6, alongside 128K maximum output. This enables teams to work across massive codebases, long financial models, multi-document analysis, and extended multi-turn workflows without fragmentation or repeated context resets.

Sonnet 4.6 also uses adaptive thinking and effort parameters, which give Claude the freedom to think if and when it determines reasoning is required. This is an evolution from traditional extended thinking, optimizing both performance and speed. Teams can use effort parameters to better control quality-latency-cost tradeoffs.

A Developer’s Everyday Model

Claude Sonnet 4.6 is a full upgrade for software development. It is smart enough to work independently through complex codebases and handles iterative workflows without losing quality.

Enterprise software teams can expect Claude Sonnet 4.6 to deliver:

Stronger reasoning across code contexts

Better understanding of complex codebases

Reliable performance across iterative development cycles

Whether you’re building features, refactoring existing modules, or debugging tricky issues, Sonnet 4.6 can follow your workflow, maintain architectural context, and adapt as you iterate.

Sonnet 4.6 is designed for back-and-forth development:

You define intent

It produces high-quality outputs

You guide refinement

Deliverables stay consistent through iterations

For teams building in Microsoft Foundry, this translates to fewer context resets, faster cycle times, and smoother development velocity.

Ref: Benchmark table published by Anthropic

Empowering High-Quality Knowledge Work

Sonnet 4.6 makes high-quality knowledge work accessible at scale, enabling teams to produce polished outputs with fewer editing cycles.

Improvements in search, analysis, and content generation make Sonnet 4.6 ideal for everyday enterprise workflows, such as:

Drafting and refining reports

Summarizing large document sets

Generating structured business documentation

Producing polished presentations and narratives

Consistent quality across both single-turn tasks and extended multi-turn collaboration ensures teams spend less time refining and more time delivering.

Powerful Computer Use

Claude Sonnet 4.6 is Anthropic’s most capable computer use model yet, scoring 72.5% on OSWorld Verified. With improved precision, the model has better clicking accuracy on difficult UI elements. Claude Sonnet 4.6 enables browser automation at scale without API key dependency. It can navigate, interact, and complete tasks across any browser-based surface, including tools with no API, legacy systems, and sites you’re already logged into.

Claude Sonnet 4.6 can work across apps without explicit instruction. It can read context from one surface and act on another, checking a calendar, responding to a message, and creating an event, without the user having to orchestrate each step.

For organizations running business workflows on systems that predate modern APIs, Sonnet 4.6’s browser-based computer use is transformative. For developers, Sonnet 4.6 is a strong fit for software development workflows as a QA and testing layer. Spinning up a browser when needed, developers can delegate visual inspection and form-based validation.

Versatile Horizontal and Vertical Use Cases

Claude Sonnet 4.6 is a direct upgrade to Sonnet 4.5. Most workflows will require only minimal prompting changes.

Search & Conversational Experiences

Sonnet 4.6 is an excellent choice for high-volume conversational products, delivering consistent quality across multi-turn exchanges while remaining cost-efficient for scale.

Agentic & Multi-Model Pipelines

Sonnet 4.6 can function as both lead agent and sub-agent in multi-model setups. Adaptive thinking, context compaction, and effort controls give developers precise orchestration tools for complex workflows.

Finance & Analytics

With stronger financial modeling intelligence and improved spreadsheet capabilities, Sonnet 4.6 is a strong fit for analysis, compliance review, and data summarization workflows where precision and iteration speed matter.

Enterprise Document & Workflow Production

Users need fewer rounds of editing to reach production-ready documents, spreadsheets, and presentations, making Claude Sonnet 4.6 a strong fit for finance, legal, and other precision-critical verticals where polish and domain accuracy matter.

Built for Scale in Microsoft Foundry

With Claude Sonnet 4.6 available in Microsoft Foundry, customers can deploy near-Opus-level intelligence within an enterprise-grade environment that supports governance, compliance, and operational tooling.

For teams building modern AI workflows, from developer assistants to enterprise automation agents, Claude Sonnet 4.6 provides a powerful, scalable foundation in Microsoft Foundry.

Try it today

And to go deeper, join us on February 23 for Model Mondays, where leaders from Anthropic will walk through both Claude Opus 4.6 and Claude Sonnet 4.6 including real-world use cases, architectural guidance, and what’s next for frontier models in enterprise deployment.
The post Claude Sonnet 4.6 in Microsoft Foundry-Frontier Performance for Scale appeared first on Microsoft Azure Blog.

Introducing Budget Bytes: Build powerful AI apps for under $25

When developers hear “cloud” and “AI,” their first thought is often about cost. “How much will this cost me to learn? Can I build something meaningful without racking up a surprise bill?”

Budget Bytes is a new series designed to inspire developers to build affordable, production-quality AI applications on Azure with a budget of $25 or less. Yes, you read that right: twenty-five dollars!

What is Budget Bytes?

Budget Bytes is an episodic video series featuring developers building end-to-end scenarios from scratch. But here’s what makes it different:

Real costs, tallied live – At the end of each episode, we show you exactly what it cost to build and run.

Authentic development – Speakers show their actual process, including mistakes and debugging (because that’s real life).

Practical patterns – Learn new tools, APIs, design patterns, and processes you can apply immediately.

Replicable solutions – Every demo has a GitHub repository so you can deploy it yourself!

This season centers on the Azure SQL Database Free Offer, demonstrating how you can leverage enterprise-grade database capabilities without the enterprise price tag.

What You’ll Learn

Each episode is packed with practical takeaways:

New tools and technologies – From Microsoft Foundry to Copilot Studio to Model Context Protocol

Real-world design patterns – See how experienced developers architect cost-effective solutions

Hands-on deployment – Every solution can be deployed to your own Azure subscription

Continuous learning – Each episode links to Microsoft Learn modules for deeper dives

New to Azure SQL? – Get started by learning through real use cases in each episode, and get inspired to apply them to your own ideas!

The Season Lineup

Episode 1 – January 29th, 2026 – Microsoft Foundry – Jasmine Greenaway – AI Inventory Manager for free

Episode 2 – February 12th, 2026 – AI-driven insurance scenarios – Arvind Shyamsundar & Amar Patil – Insurance AI Application

Episode 3 – February 26th, 2026 – Agentic RAG for everyone – Davide Mauri – Model Context Protocol with .NET

Episode 4 – March 12th, 2026 – Copilot Studio Integration – Bob Ward – AI Agents with your data using Copilot Studio for $10/month

Episode 5 – March 29th, 2026 – Fireside Chat Wrap-Up – Priya Sathy & Guests – Series recap and key insights

Tune in

Watch the trailer here: Build Powerful AI Apps for under $25!

New episodes release regularly, each with:

A full video walkthrough

Companion blog post with additional context

Complete source code on GitHub

Explore the samples: Check out the Budget Bytes Samples Repository on GitHub (repositories go public with each episode release)

Try the free tier: Azure SQL Database Free Offer Documentation

Ready to build on a budget? Subscribe to the Microsoft Developer YouTube channel and follow the samples repo to get notified when new episodes drop.

Budget Bytes: Real developers. Real apps. Real affordable.
The post Introducing Budget Bytes: Build powerful AI apps for under $25 appeared first on Microsoft Azure Blog.