Bring In the “New” Infrastructure Stack

Today, Mirantis announced Mirantis Cloud Platform 1.0, which heralds an operations-centric approach to open cloud. But what does that mean in terms of cloud services today and into the future? I think it may change your perspective when considering or deploying cloud infrastructure.
When our co-founder Boris Renski declared that Infrastructure Software Is Dead, he was not talking about the validity or usefulness of infrastructure software; he was talking about its delivery and operations model. Historically, infrastructure software has been complicated and notoriously challenging to lifecycle-manage. The typical delivery model was slow, built around very large, integrated releases that arrived on the order of years for major versions (1.x, 2.x, 3.x…) and many quarters for minor versions (3.2, 3.3, 3.4…). Moving from one to the other was an extremely taxing process for IT organizations, and combined with a typical hardware refresh cycle, this usually resulted in the mega-project mentality in our industry:

Architect and deploy service(s) on a top-to-bottom stack
Once it is working, don't touch it (keep it running)
Defer consumption of new features and innovation until next update
Define a mega-project plan (typically along a 3 year HW refresh)
Execute plan by starting at 1 again

While virtualization and cloud technologies provided a separation of hardware from applications, they didn't necessarily solve this problem. Even OpenStack by itself did not solve it. As infrastructure software, it was still released and consumed in slow, integrated cycles.
Meanwhile, many interesting developments occurred in the application space.  Microservices, agile development methodologies, CI/CD, containers, DevOps — all focused on the ability to rapidly innovate and rapidly consume software in very small increments comprising a larger whole as opposed to one large integrated release.  This approach has been successful at the application level and has allowed an arms race to develop in the software economy: who can develop new services to drive revenue for their business faster than their competition?
Ironically, this movement has been happening with applications running on the older infrastructure methodology.  Why not leverage these innovations at the infrastructure level as well?
Enter Mirantis Cloud Platform (MCP)…
MCP was designed with the operations-centric approach in mind, to be able to consume and manage cloud infrastructure in the same way modern microservices are delivered at the application level.  The vision for MCP is that of a Continuously Delivered Cloud:

With a single platform for virtual machines, containers and bare metal
Delivered by a CI/CD pipeline
With continuous monitoring

Our rich OpenStack platform has been extended with a full Kubernetes distribution, and together they enable the deployment and orchestration of VMs, containers and bare metal, all on the same cloud. As containers become increasingly important as a means of microservices development and deployment, they can be managed within the same open cloud infrastructure.
Mirantis will update MCP on a continuous basis with a lifecycle determined in weeks, not years.  This allows for the rapid release and consumption of updates to the infrastructure in small increments as opposed to the large integrated releases necessitating the mega-project.  Your consumption is based on DriveTrain, the lifecycle management tool connecting your cloud to the innovation coming from Mirantis.  With DriveTrain you consume the technology at your desired pace, pushed through a CI/CD pipeline and tested in staging, then promoted into production deployment.  In the future, this will include new features and full upgrades performed non-disruptively in an automated fashion.  You will be able to take advantage of the latest innovations quickly, as opposed to waiting for the next infrastructure mega-project.
Operations Support Systems have always been paramount to successful IT delivery, and even more so in a distributed system based on a continuous lifecycle paradigm. StackLight is the OSS that is purpose-built for MCP and provides continuous monitoring to enable automated alerts with a goal of SLA compliance.  This is the same OSS used when your cloud is managed by Mirantis with our Mirantis Managed OpenStack (MMO) offering where we can deliver up to 99.99% SLA guarantees, or if you are managing MCP in-house with your own IT operations.  As part of our Build-Operate-Transfer model, we focus on operational training with StackLight such that post-transfer you are able to use the same in-place StackLight and same in-place standard operating procedures.
Finally!  Infrastructure software that can be consumed and managed in a modern approach just like microservices are consumed and managed at the application level.  Long live the new infrastructure!
To learn more about MCP, please sign up for our webinar on April 26. See you there!
Source: Mirantis

The Galaxy S8 Is A Gorgeous Phone. Too Bad It’s Made By Samsung

Nicole Nguyen / BuzzFeed News

No company was closer to being a trash fire in the past year than Samsung. There were the exploding Note7 batteries, then the exploding Note7 battery replacements, then the exploding washing machines, and then, finally, the exploding Samsung battery factory.

Needless to say, the Korean conglomerate, which recently lost its #1 smartphone maker ranking to Apple for the first time in eight quarters, is looking for a win.

Enter the Galaxy S8, the headliner of Samsung’s Redemption Tour.

During my five days of testing, the Galaxy S8 did not catch fire. In fact, the S8 turned out to be exactly what I had expected after my first hands-on: a gorgeous device with great technology inside. Samsung crammed as much screen into this phone as possible. The Galaxy S8 hardware is 83% glass slab and 17% everything else — and it has all the promise of an iPhone/Pixel killer.

The only problem? Like all Samsung phones, it's pre-loaded with redundant apps and features you don't need. And, though the Galaxy S8 ships with the latest version of Android (7.0 Nougat), eventually the phone will be about five months behind Google's future operating system updates.

All that aside, the S8 is a *really* good phone, and Samsung devotees with contract renewals coming up are going to want to upgrade ASAP. But those looking to switch will have a lot more to consider.

There’s nothing else on the Android market quite like it.

If you’re looking to get a new high-end Android phone right now, here are the three phones I think you should be considering: the Google Pixel, the LG G6, and the Galaxy S8. (For the purposes of this review, I’m not looking at Motorola, Sony, HTC, or Huawei. Don’t @ me.)

Aesthetically, it’s clear which one is the standout: the Galaxy S8. In my initial review, I loved everything about the Pixel, except its uninspired hardware design. LG’s G6 and its small, display-maximizing borders are, in many ways, similar to the Galaxy S8, but it’s a heavy phone that feels bulky.

Nicole Nguyen / BuzzFeed News

The S8, on the other hand, is wrapped in a slick, polished case. This is especially true of "Midnight Black." It is Posh Spice wearing an all-leather catsuit, and Samsung's other color offerings ("Arctic Silver" and the purplish "Orchid Gray") pale in comparison. The S8 looks modern and clean, and you'd be hard-pressed to find another Android phone with its looks.

The mind-bogglingly good edge-to-edge wraparound display is crisp and saturated, which we've come to expect from Samsung. The blacks are extra dark and text appears sharp, pixel-less. The display bleeds into the surrounding hardware, and it's hard to tell where the screen ends and the phone begins.

The only “bezel” is a centimeter-ish border at the top and bottom. There are no physical buttons on the front of the phone, just a pressure-sensitive, virtual home button area. Every other leading Android phone maker has already removed the home button, and Samsung finally followed suit. To maximize the immersive screen experience, the home button is sometimes invisible (like when you’re watching a video full-screen or playing a game) and you can simply press down on the bottom of the screen to return to the main page.

These screens are huge. There are two models: the S8 with a 5.8-inch display and an S8+ with a 6.2-inch display; both are at 2,960×1,440 resolution. The viewing area has been increased by 36% from the previous versions, the S7 and S7 Edge.

But it doesn’t feel like you’re toting around a mini tablet. The nearly half a million extra pixels were added to the S8’s height, and its edges are curved on all four sides, so the phone is surprisingly grabbable.

The curved edges do, however, make texting with two hands in portrait feel a little cramped. When turned on its side, the phone is too wide for my hands to reach the keys in the middle. Perhaps big-handed users will have better luck.

It's a very tall phone (nearly 6 inches for the S8 and slightly over 6 inches for the S8+), so enabling the phone's "one-handed mode" has proven very useful for me. You can swipe your thumb diagonally from either bottom corner to use a mini, more manageable version of the software. Although, my frequent use of this feature reveals that perhaps I don't need a big screen at all?!

Nicole Nguyen / BuzzFeed News

Apparently the S8 is “mobile HDR premium certified,” which means that when you watch shows or movies, you see the same colors and contrasts “that filmmakers intended,” according to Samsung. So I did what any other reviewer would do “for journalism”: I bought the Planet Earth II “Mountains” episode and poured myself a glass (or three) of wine (spoiler alert: ibex goats are badass AF). The display is very bright and vibrant — good for getting into Planet Earth, but ultimately worrisome because I fear it will eventually burn my eyeballs to a crisp.

The S8 is 83% screen, so it’s only fitting that this review is also almost 83% about the screen. Here comes the other 17%.

I tried my hardest to trick the S8’s face recognition unlock, but to no avail.

Reports that Samsung’s face recognition technology had been defeated with a photo surfaced last month. I tried to replicate this with a printed-out photo, with a photo onscreen, and with a Photobooth video of me staring at the camera and blinking. The phone was unfazed. I will never be a hacker.

Trickery aside, face recognition is more a matter of convenience than security. It makes up for the awkwardly placed fingerprint sensor and I found myself relying on it quite a bit.

The fingerprint sensor has moved to the back, much to my chagrin.

Nicole Nguyen / BuzzFeed News

The fingerprint unlock feature has traditionally been programmed into the device’s home button. Seeing as the S8 ditched the button, it’s now on the back of the phone. The S8’s fingerprint sensor and the camera feel basically the same, which means I kept smudging the camera lens and unlocking the phone at the same time. It’s really too bad because, minus the finger smears, the camera is quite good.

Speaking of the camera, it’s the same as the Note7’s and the Galaxy S7 before it.

The phone’s rear camera hasn’t changed. It’s a 12MP lens with f/1.7 aperture, and it notably does not have the “dual lens” setup (a camera with two lenses) that Apple, LG, and Huawei introduced with their most recent flagship devices. But I didn’t really miss it in the S8.

Samsung likes to tout its primary camera's low-light capabilities and fast auto-focus, even with motion. At full zoom, it handled capturing this surfer fairly well (in the rain!):

And this darting newt:

Nicole Nguyen / BuzzFeed News

And this amazing lemon poppyseed bundt cake my friend Lauren made:

Nicole Nguyen / BuzzFeed News

The real news is the S8’s upgraded front-facing camera, which is now 8MP (up from 7MP in the Note7) with the same f/1.7 aperture. Here’s an unedited Samsung selfie:

Nicole Nguyen / BuzzFeed News

And an iPhone’s (the iPhone’s camera is just 7 megapixels):

Nicole Nguyen / BuzzFeed News

The main difference is that, because it's a higher-resolution image, you can zoom in more on the Samsung selfie. I've shown these photos to multiple people — and votes are split right down the middle. The look of a photo is ultimately a matter of preference and I will let you, Internet, be the final judge.

There are also new Snapchat-style stickers built-in, which…sigh.

Nicole Nguyen / BuzzFeed News

Bixby, the S8’s artificially intelligent assistant, is kind of…dumb right now.

Samsung created its own version of Alexa, Siri, and Google Assistant. It’s called Bixby, and it’s really an umbrella term for three different “intelligent” features: computer vision/image recognition software, a voice-enabled assistant, and a namesake app called Bixby that shows you different personalized “cards” that offer information like weather and upcoming flights (essentially this Google app feature).

Bixby Voice
What makes Bixby different from other assistants is that anything you can do on your phone with touch, it can allegedly do it with your voice instead. You can say things like, “Set display brightness to maximum” and more contextual requests like, “Rotate this photo to the left.” Unfortunately, Bixby Voice doesn’t launch until later this spring and I didn’t get to test it out myself.

Bixby Vision
I was, however, able to try Bixby’s vision recognition software, which uses the phone’s camera to “see.” For example, you can hold up a QR code and Bixby can take you directly to the link, or you can scan a business card and Bixby will isolate the text, then automatically add a contact from the camera app. It does those two things perfectly fine, but it’s not exactly groundbreaking tech. There are plenty of apps that can do the same thing.

One of the seemingly cooler features is being able to point your camera at a piece of furniture or clothing so Bixby can use Pinterest-powered computer vision to find out where to buy it. I was excited to try this and hoped it would eliminate "where did you get that" small talk with more stylish ladyfriends.

Nicole Nguyen / BuzzFeed News

But when I tried it out (on my boyfriend’s white Adidas shoes and a pair of amazing culottes), Bixby showed me Amazon results that matched the general shape/generic version of what I was trying to search for — and nothing else. In fact, for the culottes, Google reverse image search fared much better and found a Pinterest pin with the specific brand in the description (they are Oak+Fort, btw). I then tried taking a pic of the pin with the hopes that the Pinterest-powered software would pick it up. Nada.

Bixby Vision results are like asking your mom for a custom American Girl doll that’s designed to look just like you, and getting a Secret Hero Mulan from a KB Toys closeout sale instead.

Bixby App
I didn’t find the Bixby app too helpful. It showed me details for an upcoming flight and the week’s weather, plus trending topics on Facebook, which was cool. There was a random puppy napping GIF from Giphy as well, though I’m not sure if that was personalized content.

Right now, it’s hard to assess whether Bixby is a success, because so much of the technology is still in development. As it stands, Bixby is a gimmick that’s fun for showing off to friends but not smart enough to actually be useful. Plus, Google Assistant, which ALSO comes with the S8, can do just about everything Bixby can do and then some.

The battery didn’t explode.

The 3,000 mAh battery in the S8, the version I tested rigorously, performed well. The phone, as I've previously mentioned, is all screen, so it isn't surprising that the display was my #1 battery suck for three days in a row.

The phone’s battery takes about an hour and 40 minutes to fully charge via USB-C cable, and has lasted me about a day and a half on average. This is with reading articles in an hour-long round-trip commute, watching 30-minute videos, followed by 30 minutes of gameplay, and with the usual slew of Facebook and email notifications enabled. Batteries, of course, decay over time, so I’m not sure how long that’ll last. I’ll update this review if that changes.

It feels fast enough.

The Galaxy S8 is the first device to ship with the newest Qualcomm processor: the Snapdragon 835, which is faster than its predecessor (the Snapdragon 820) but uses less power than other chips. The phone felt zippy during this first week of testing, but, like batteries, its processor will decay over time.

I played Super Mario Run, a casual sidescroller, and CSR Racing 2, a 3D graphics-intensive racer, a LOT during the testing period. They played smoothly and didn’t significantly drain the battery.

The processor is apparently robust enough to power a computer, using the new Samsung Dex portable dock accessory (price TBD) that can be hooked up to a monitor, keyboard, and mouse. The dock essentially turns the phone into an instant, lightweight Chromebook — in the demo I saw, the phone ran two apps simultaneously. I didn’t get to test the Dex out either, but once I do, I’ll update this review.

And now, a rant.

As gorgeous as the hardware is, the S8 is a Samsung phone, and I can’t review this device without noting this disclaimer: Samsung phones are (still) filled with so much crap. Samsung’s OS (called “TouchWiz”) looks cleaner than ever before, and it’s getting better. But it remains full of bloatware.

For example, I tested a T-Mobile version of the device. Right off the bat, there are four T-Mobile apps on the homescreen that I'll likely never use, including "T-Mobile TV." Then there are Samsung apps, like the mobile browser aptly named "Internet," plus the Google versions of those exact same apps, like Chrome, already installed. There's Android Pay, and Samsung Pay. There's Gallery, and Google Photos.

Then there are Galaxy apps (which are apps made by Samsung or special "themes" to customize how your phone looks), in addition to apps you choose to download from the Google Play Store. There's a dedicated side button for Bixby Voice, and OK Google can be activated by long-pressing the home button. It's a hot mess.

All of this is pre-loaded on the phone — and I know it can be removed from the home screen or uninstalled, but…ugh!

Samsung deeply alters the Android experience, down to the way windows scroll in the app switcher. You’ll see on the Pixel that there’s a smooth, continuous scroll and on the S8, a clunkier unit scroll.

Source: BuzzFeed

Configuring Private DNS Zones and Upstream Nameservers in Kubernetes

Editor's note: this post is part of a series of in-depth articles on what's new in Kubernetes 1.6.

Many users have existing domain name zones that they would like to integrate into their Kubernetes DNS namespace. For example, hybrid-cloud users may want to resolve their internal ".corp" domain addresses within the cluster. Other users may have a zone populated by a non-Kubernetes service discovery system (like Consul). We're pleased to announce that, in Kubernetes 1.6, kube-dns adds support for configurable private DNS zones (often called "stub domains") and external upstream DNS nameservers. In this blog post, we describe how to configure and use this feature.

Default lookup flow

Kubernetes currently supports two DNS policies specified on a per-pod basis using the dnsPolicy flag: "Default" and "ClusterFirst". If dnsPolicy is not explicitly specified, then "ClusterFirst" is used:

If dnsPolicy is set to "Default", then the name resolution configuration is inherited from the node the pods run on. Note: this feature cannot be used in conjunction with dnsPolicy: "Default".

If dnsPolicy is set to "ClusterFirst", then DNS queries will be sent to the kube-dns service. Queries for domains rooted in the configured cluster domain suffix (any address ending in ".cluster.local" in the example above) will be answered by the kube-dns service. All other queries (for example, www.kubernetes.io) will be forwarded to the upstream nameserver inherited from the node.

Before this feature, it was common to introduce stub domains by replacing the upstream DNS with a custom resolver. However, this caused the custom resolver itself to become a critical path for DNS resolution, where issues with scalability and availability may cause the cluster to lose DNS functionality. This feature allows the user to introduce custom resolution without taking over the entire resolution path.

Customizing the DNS flow

Beginning in Kubernetes 1.6, cluster administrators can specify custom stub domains and upstream nameservers by providing a ConfigMap for kube-dns. For example, the configuration below inserts a single stub domain and two upstream nameservers. As specified, DNS requests with the ".acme.local" suffix will be forwarded to a DNS listening at 1.2.3.4. Additionally, Google Public DNS will serve upstream queries. See ConfigMap Configuration Notes at the end of this section for a few notes about the data format.

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"acme.local": ["1.2.3.4"]}
  upstreamNameservers: |
    ["8.8.8.8", "8.8.4.4"]

The flow of DNS queries specified in the configuration above works as follows. With the dnsPolicy set to "ClusterFirst", a DNS query is first sent to the DNS caching layer in kube-dns. From here, the suffix of the request is examined and then forwarded to the appropriate DNS. In this case, names with the cluster suffix (e.g., ".cluster.local") are sent to kube-dns. Names with the stub domain suffix (e.g., ".acme.local") will be sent to the configured custom resolver. Finally, requests that do not match any of those suffixes will be forwarded to the upstream DNS.

Below is a table of example domain names and the destination of the queries for those domain names:

Domain name                            | Server answering the query
kubernetes.default.svc.cluster.local   | kube-dns
foo.acme.local                         | custom DNS (1.2.3.4)
widget.com                             | upstream DNS (one of 8.8.8.8, 8.8.4.4)

ConfigMap Configuration Notes

stubDomains (optional)
Format: a JSON map using a DNS suffix key (e.g., "acme.local") and a value consisting of a JSON array of DNS IPs.
Note: The target nameserver may itself be a Kubernetes service. For instance, you can run your own copy of dnsmasq to export custom DNS names into the ClusterDNS namespace.

upstreamNameservers (optional)
Format: a JSON array of DNS IPs.
Note: If specified, then the values specified replace the nameservers taken by default from the node's /etc/resolv.conf.
Limits: a maximum of three upstream nameservers can be specified.

Example 1: Adding a Consul DNS Stub Domain

In this example, the user has a Consul DNS service discovery system they wish to integrate with kube-dns. The consul domain server is located at 10.150.0.1, and all consul names have the suffix ".consul.local". To configure Kubernetes, the cluster administrator simply creates a ConfigMap object as shown below.

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"consul.local": ["10.150.0.1"]}

Note: in this example, the cluster administrator did not wish to override the node's upstream nameservers, so they didn't need to specify the optional upstreamNameservers field.

Example 2: Replacing the Upstream Nameservers

In this example the cluster administrator wants to explicitly force all non-cluster DNS lookups to go through their own nameserver at 172.16.0.1. Again, this is easy to accomplish; they just need to create a ConfigMap with the upstreamNameservers field specifying the desired nameserver.

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  upstreamNameservers: |
    ["172.16.0.1"]

Get involved

If you'd like to contribute or simply help provide feedback and drive the roadmap, join our community. Specifically for network related conversations, participate through one of these channels:

Chat with us on the Kubernetes Slack network channel
Join our Special Interest Group, SIG-Network, which meets on Tuesdays at 14:00 PT

Thanks for your support and contributions. Read more in-depth posts on what's new in Kubernetes 1.6 here.

–Bowei Du, Software Engineer, and Matthew DeLio, Product Manager, Google

Post questions (or answer questions) on Stack Overflow
Join the community portal for advocates on K8sPort
Get involved with the Kubernetes project on GitHub
Follow us on Twitter @Kubernetesio for latest updates
Connect with the community on Slack
Download Kubernetes
Source: kubernetes

We installed an OpenStack cluster with close to 1000 nodes on Kubernetes. Here’s what we found out.

Late last year, we did a number of tests that looked at deploying close to 1000 OpenStack nodes on a pre-installed Kubernetes cluster as a way of finding out what problems you might run into, and fixing them, if at all possible. In all, we found several problems, and though we were generally able to fix them, we thought it would still be good to go over the types of things you need to look for.
Overall we deployed an OpenStack cluster that contained more than 900 nodes using Fuel-CCP on a Kubernetes cluster that had been deployed using Kargo. The Kargo tool is part of the Kubernetes Incubator project and uses the Large Kubernetes Cluster reference architecture as a baseline.
As we worked, we documented issues we found, and contributed fixes to both the deployment tool and reference design document where appropriate. Here's what we found.
The setup
We started with just over 175 bare metal machines, allocating 3 of them for Kubernetes control plane services (API servers, etcd, the Kubernetes scheduler, and so on). Each of the others hosted 5 virtual machines, and every VM was used as a Kubernetes minion node.
Each bare metal node had the following specifications:

HP ProLiant DL380 Gen9
CPU – 2x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
RAM – 264G
Storage – 3.0T on RAID on HP Smart Array P840 Controller, HDD – 12 x HP EH0600JDYTL
Network – 2x Intel Corporation Ethernet 10G 2P X710

The running OpenStack cluster (as far as Kubernetes is concerned) consists of:

OpenStack control plane services running on close to 150 pods over 6 nodes
Close to 4500 pods spread across all of the remaining nodes, at 5 pods per minion node

One major Prometheus problem
During the experiments we used the Prometheus monitoring tool to verify resource consumption and the load put on the core system, Kubernetes, and OpenStack services. One note of caution when using Prometheus: deleting old data from Prometheus storage will indeed improve the Prometheus API speed, but it will also delete any previous cluster information, making it unavailable for post-run investigation. So make sure to document any observed issue and its debugging thoroughly!
Thankfully, we had in fact done that documentation, but one thing we've decided to do going forward to prevent this problem is to configure Prometheus to back up data to one of the persistent time series databases it supports, such as InfluxDB, Cassandra, or OpenTSDB. By default, Prometheus is optimized to be used as a real-time monitoring / alerting system, and there is an official recommendation from the Prometheus developers team to keep monitoring data retention to only about 15 days to keep the tool working in a quick and responsive manner. By setting up the backup, we can store old data for an extended amount of time for post-processing needs.
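For reference, a minimal sketch of how that might look with Prometheus 1.x-style flags (the retention window and the InfluxDB endpoint are illustrative assumptions; Prometheus 2.x replaces these flags with the remote_write configuration block):

prometheus \
  -storage.local.retention=360h \
  -storage.remote.influxdb-url=http://influxdb.monitoring.local:8086/ \
  -storage.remote.influxdb.database=prometheus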
Problems we experienced in our testing
Huge load on kube-apiserver
Symptoms
Initially, we had a setup with all nodes (including the Kubernetes control plane nodes) running on a virtualized environment, but the load was such that the API servers couldn't function at all, so they were moved to bare metal. Still, both API servers running in the Kubernetes cluster were utilising up to 2000% of the available CPU (up to 45% of total node compute performance capacity), even after we migrated them to hardware nodes.
Root cause
All services that are not on Kubernetes masters (kubelet, kube-proxy on all minions) access kube-apiserver via a local NGINX proxy. Most of those requests are watch requests that lie mostly idle after they are initiated (most timeouts on them are defined to be about 5-10 minutes). NGINX was configured to cut idle connections after 3 seconds, which causes all clients to reconnect and (even worse) restart aborted SSL sessions. On the server side, this makes kube-apiserver consume up to 2000% of the CPU resources, making other requests very slow.
Solution
Set the proxy_timeout parameter to 10 minutes in the nginx.conf configuration file, which should be more than long enough to prevent cutting SSL connections before the requests time out by themselves. After this fix was applied, one api-server consumed only 100% of CPU (about 2% of total node compute performance capacity), while the second one consumed about 200% (about 4% of total node compute performance capacity) of CPU (with average response time 200-400 ms).
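As an illustration, a minimal sketch of the local proxy configuration with the longer timeout in place (the listen address, upstream addresses and port are assumptions rather than the exact values Kargo templates):

stream {
    upstream kube_apiserver {
        server 10.0.0.11:6443;   # master addresses are illustrative
        server 10.0.0.12:6443;
    }
    server {
        listen 127.0.0.1:6443;
        proxy_pass kube_apiserver;
        proxy_connect_timeout 2s;
        proxy_timeout 10m;       # long enough for idle watch connections
    }
}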
Upstream issue status: fixed
Make the Kargo deployment tool set proxy_timeout to 10 minutes: issue fixed with pull request by Fuel CCP team.
KubeDNS cannot handle large cluster load with default settings
Symptoms
When deploying an OpenStack cluster at this scale, kubedns becomes unresponsive because of the huge load. This ends up with a slew of errors appearing in the logs of the dnsmasq container in the kubedns pod:
Maximum number of concurrent DNS queries reached.
Also, dnsmasq containers sometimes get restarted due to hitting the high memory limit.
Root cause
First of all, kubedns seems to fail often in this architecture, even without load. During the experiment we observed continuous kubedns container restarts even on an empty (but large enough) Kubernetes cluster. Restarts are caused by the liveness check failing, although nothing notable is observed in any logs.
Second, dnsmasq should have taken the load off kubedns, but it needs some tuning to behave as expected (or, frankly, at all) for large loads.
Solution
Fixing this problem requires several steps:

Set higher limits for dnsmasq containers: they take on most of the load.
Add more replicas to the kubedns replication controller (we decided to stop at 6 replicas, as that solved the observed issue – for bigger clusters this number might need to be increased even further).
Increase the number of parallel connections dnsmasq should handle (we used --dns-forward-max=1000, which is the recommended setting in the dnsmasq manuals; see the sketch after this list).
Increase the cache size in dnsmasq: it has a hard limit of 10,000 cache entries, which seems to be a reasonable amount.
Fix kubedns to handle this behaviour in a proper way.
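As a sketch, the dnsmasq container in the kubedns pod ends up looking roughly like this (the image tag and resource limits are illustrative, not the exact values we used):

- name: dnsmasq
  image: gcr.io/google_containers/kube-dnsmasq-amd64:1.4   # tag illustrative
  args:
  - --cache-size=10000          # dnsmasq's hard cache limit
  - --dns-forward-max=1000      # allow more concurrent forwarded queries
  - --no-resolv
  - --server=127.0.0.1#10053    # forward cluster names to kubedns
  resources:
    limits:
      cpu: 200m
      memory: 256Mi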

Upstream issue status: partially fixed
Items 1 and 2 are fixed by making them configurable in Kargo by the Kubernetes team: issue, pull request.
Others – work has not yet started.
Kubernetes scheduler needs to be deployed on a separate node
Symptoms
During the huge OpenStack cluster deployment against Kubernetes, the scheduler, controller-manager and kube-apiserver start fighting for CPU cycles, as all of them are under a large load. The scheduler is the most resource-hungry, so we need a way to deploy it separately.
Solution
We manually moved the Kubernetes scheduler to a separate node; all other schedulers were manually killed to prevent them from moving to other nodes.
Upstream issue status: reported
Issue in Kargo.
Kubernetes scheduler is ineffective with pod antiaffinity
Symptoms
It takes a significant amount of time for the scheduler to process pods with pod antiaffinity rules specified on them. It is spending about 2-3 seconds on each pod, which makes the time needed to deploy an OpenStack cluster of 900 nodes unexpectedly long (about 3h for just scheduling). OpenStack deployment requires the use of antiaffinity rules to prevent several OpenStack compute nodes from being launched on a single Kubernetes minion node.
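For context, the kind of rule we rely on looks roughly like this in a pod spec (the app label is illustrative; on Kubernetes 1.5 the same structure goes into the scheduler.alpha.kubernetes.io/affinity annotation rather than the spec.affinity field):

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: nova-compute               # label is illustrative
      topologyKey: kubernetes.io/hostname  # at most one such pod per node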
Root cause
According to profiling results, most of the time is spent on creating new Selectors to match existing pods against, which triggers the validation step. Basically we have O(N^2) unnecessary validation steps (where N = the number of pods), even if we have just 5 deployment entities scheduled to most of the nodes.
Solution
In this case, we needed a specific optimization that brings scheduling time down to about 300 ms/pod. It's still slow in terms of common sense (about 30 minutes spent just on pod scheduling for a 900-node OpenStack cluster), but it is at least close to reasonable. This solution lowers the number of very expensive operations to O(N), which is better, but still depends on the number of pods instead of deployments, so there is space for future improvement.
Upstream issue status: fixed
The optimization was merged into master (pull request) and backported to the 1.5 branch, and is part of the 1.5.2 release (pull request).
kube-apiserver has low default rate limit
Symptoms
Different services start receiving “429 Rate Limit Exceeded” HTTP errors, even though kube-apiservers can take more load. This problem was discovered through a scheduler bug (see below).
Solution
Raise the rate limit for the kube-apiserver process via the --max-requests-inflight option. It defaults to 400, but in our case it became workable at 2000. This number should be configurable in the Kargo deployment tool, as bigger deployments might require an even bigger increase.
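In practice this is just an extra flag on the kube-apiserver command line; a sketch (other flags omitted, and where exactly Kargo templates this depends on your deployment):

kube-apiserver \
  --max-requests-inflight=2000 \
  ...   # remaining flags unchanged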
Upstream issue status: reported
Issue in Kargo.
Kubernetes scheduler can schedule incorrectly
Symptoms
When creating a huge number of pods (~4500 in our case) and facing HTTP 429 errors from kube-apiserver (see above), the scheduler can schedule several pods of the same deployment on one node, in violation of the pod antiaffinity rule on them.
Root cause
See pull request below.
Upstream issue status: pull request
Fix from Mirantis team: pull request (merged, part of Kubernetes 1.6 release).
Docker sometimes becomes unresponsive
Symptoms
The Docker process sometimes hangs on several nodes, which results in timeouts in the kubelet logs. When this happens, pods cannot be spawned or terminated successfully on the affected minion node. Although many similar issues have been fixed in Docker since 1.11, we are still observing these symptoms.
Workaround
The Docker daemon logs do not contain any notable information, so we had to restart the docker service on the affected node. (During the experiments we used Docker 1.12.3, but we have observed similar symptoms in 1.13 release candidates as well.)
OpenStack services don’t handle PXC pseudo-deadlocks
Symptoms
When run in parallel, create operations for lots of resources were failing with a DBError saying that Percona XtraDB Cluster identified a deadlock and the transaction should be restarted.
Root cause
oslo.db is responsible for wrapping errors received from the DB into proper classes so that services can restart transactions if similar errors occur, but it didn’t expect the error in the format that is being sent by Percona. After we fixed this, however, we still experienced similar errors, because not all transactions that could be restarted were properly decorated in Nova code.
Upstream issue status: fixed
The bug has been fixed by Roman Podolyaka’s CR and backported to Newton. It fixes Percona deadlock error detection, but there’s at least one place in Nova that still needs to be fixed.
Live migration failed with live_migration_uri configuration
Symptoms
With the live_migration_uri configuration, live migration fails because one compute host can't connect to libvirt on another host.
Root cause
We can’t specify which IP address to use in the live_migration_uri template, so it was trying to use the address from the first interface that happened to be in the PXE network, while libvirt listens on the private network. We couldn’t use the live_migration_inbound_addr, which would solve this problem, because of a problem in upstream Nova.
Upstream issue status: fixed
A bug in Nova has been fixed and backported to Newton. We switched to using live_migration_inbound_addr after that.
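For reference, the resulting nova.conf fragment on each compute node looks something like this (the address is illustrative; use the host's IP on the private migration network):

[libvirt]
live_migration_inbound_addr = 10.10.10.15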
Source: Mirantis

Scam Calls Are The Devil, So Phone Carriers Are Doing More To Block Them

T-Mobile said today that it will start labeling scam calls in caller ID.

In a statement, T-Mobile told BuzzFeed News the filtering technology works by comparing an incoming call to a “database of tens of thousands of known scammer numbers” and analyzing how people typically respond to the number. If identified as a possible scam, the caller will be labeled “Scam Likely” on the phone's screen.

T-Mobile

T-Mobile users will also be able to opt into complete scam call blocking by dialing # (), which won't allow any calls labeled as possible scams to go through. The technology is launching on April 5. Scam calls affect 75% of Americans and collectively cost consumers half a billion dollars, according to T-Mobile estimates.

T-Mobile said these calls come in myriad forms, “from IRS scam to Medicare cons to 'free' travel to credit card scams,” according to its press release. The company is specifically targeting automated calls that ping thousands of customers per minute.

AT&T introduced similar technology, AT&T Call Protect, in December 2016, for iOS and Android phones.

T-Mobile said the feature was part of its collaboration with the Federal Communication Commission to battle robocalling. The FCC voted unanimously on March 23 to give telecommunications companies broader power in filtering out spam calls.

FCC chairman Ajit Pai said robocalls are the #1 consumer complaint his bureau receives.

And the commission is establishing a “Robocall Strike Force” in hopes of eliminating these loathed calls.

Giphy

Which isn't surprising.

Source: BuzzFeed

Announcing Azure SQL Database Threat Detection general availability coming in April 2017

Today we are happy to announce that Azure SQL Database Threat Detection will be generally available in April 2017. Through the course of the preview we optimized our offering, and it has received 90% positive feedback from customers regarding the usefulness of SQL threat alerts. At general availability, SQL Database Threat Detection will cost $15 / server / month. We invite you to try it out for 60 days for free.

What is Azure SQL Database Threat Detection?

Azure SQL Database Threat Detection provides an additional layer of security intelligence built into the Azure SQL Database service. It helps customers using Azure SQL Database to secure their databases within minutes without needing to be an expert in database security. It works around the clock to learn, profile and detect anomalous database activities indicating unusual and potentially harmful attempts to access or exploit databases.

How to use SQL Database Threat Detection

Just turn it ON – SQL Database Threat Detection is incredibly easy to enable. You simply switch on Threat Detection from the Auditing & Threat Detection configuration blade in the Azure portal, select the Azure storage account (where the SQL audit log will be saved) and configure at least one email address for receiving alerts.

Real-time actionable alerts – SQL Database Threat Detection runs multiple sets of algorithms which detect potential vulnerabilities and SQL injection attacks, as well as anomalous database access patterns (such as access from an unusual location or by an unfamiliar principal). Security officers or other designated administrators get email notification once a threat is detected on the database. Each notification provides details of the suspicious activity and recommends how to further investigate and mitigate the threat.

Live SQL security tile – SQL Database Threat Detection integrates its alerts with Azure Security Center. A live SQL security tile within the database blade in Azure portal tracks the status of active threats. Clicking on the SQL security tile launches the Azure Security Center alerts blade and provides an overview of active SQL threats detected on the database. Clicking on a specific alert provides additional details and actions for investigating and preventing similar threats in the future.

Investigate SQL threat – Each SQL Database Threat Detection email notification and Azure Security Center alert includes a direct link to the SQL audit log. Clicking on this link launches the Azure portal and opens the SQL audit records around the time of the event, making it easy to find the SQL statements that were executed (who accessed it, what they did, and when) and determine if the event was legitimate or malicious (e.g. application vulnerability to SQL injection was exploited, someone breached sensitive data, etc.).

Recent customer experiences using SQL Database Threat Detection

During our preview, many customers benefited from the enhanced security SQL Database Threat detection provides.

Case 1: Anomalous access from a new network to production database

Justin Windhorst, Head of IT North America at Archroma

“Archroma runs a custom built ERP/e-Commerce solution, consisting of more than 20 Web servers and 20 Databases using a multi-tier architecture, with Azure SQL Database at its core.  I love the built-in features that bring added value such as the enterprise level features: SQL Database Threat Detection (for security) and Geo Replication (for availability).  Case in point: With just a few clicks, we successfully enabled SQL Auditing and Threat Detection to ensure continuous monitoring occurred for all activities within our databases.  A few weeks later, we received an email alert that “Someone has logged on to our SQL server from an unusual location”. The alert was triggered as a result of unusual access from a new network to our production database for testing purposes.  Knowing that we have the power of Microsoft behind us that automatically brings to light anomalous activities such as these gives Archroma incredible peace of mind, and thus allows us to focus on delivering a better service.”

Case 2: Preventing SQL Injection attacks

Fernando Sola, Cloud Technology Consultant at HSI

“Thanks to Azure SQL Database Threat Detection, we were able to detect and fix vulnerabilities to SQL injection attacks and prevent potential threats to our database. I was very impressed with how simple it was to enable threat detection using the Azure portal. A while after enabling Azure SQL Database Threat Detection, we received an email notification about ‘An application generated a faulty SQL statement on our database, which may indicate a vulnerability of the application to SQL injection.’  The notification provided details of the suspicious activity and recommended actions how to observe and fix the faulty SQL statement in our application code using SQL Audit Log. The alert also pointed me to the Microsoft documentation that explained us how to fix an application code that is vulnerable to SQL injection attacks. SQL Database Threat Detection and Auditing help my team to secure our data in Azure SQL Database within minutes and with no need to be an expert in databases or security.”

Summary

We would like to thank all of you that provided feedback and shared experiences during the public preview. Your active participation validated that SQL Database Threat Detection provides an important layer of security built into the Azure SQL Database service to help secure databases without the need to be an expert in database security.

Click the following links for more information:

Learn more about Azure SQL Database Threat Detection

Learn more about Azure SQL Database Auditing
Learn more about Azure SQL Database
Learn more about Azure Security Center

Source: Azure

The first and final words on OpenStack availability zones

Availability zones are one of the most frequently misunderstood and misused constructs in OpenStack. Each cloud operator has a different idea about what they are and how to use them. What's more, each OpenStack service implements availability zones differently – if it even implements them at all.
Often, there isn’t even agreement over the basic meaning or purpose of availability zones.
On the one hand, you have the literal interpretation of the words “availability zone”, which would lead us to think of some logical subdivision of resources into failure domains, allowing cloud applications to intelligently deploy in ways to maximize their availability. (We’ll be running with this definition for the purposes of this article.)
On the other hand, the different ways that projects implement availability zones lend themselves to certain ways of using the feature as a result. In other words, because this feature has been implemented in a flexible manner that does not tie us down to one specific concept of an availability zone, there's a lot of confusion over how to use them.
In this article, we'll look at the traditional definition of availability zones, insights into and best practices for planning and using them, and even a little bit about non-traditional uses. Finally, we hope to address the question: Are availability zones right for you?
OpenStack availability zone implementations
One of the things that complicates use of availability zones is that each OpenStack project implements them in its own way (if at all). If you do plan to use availability zones, you should evaluate whether the OpenStack projects you're going to use support them, and how that affects your design and deployment of those services.
For the purposes of this article, we will look at three core services with respect to availability zones: Nova, Cinder, and Neutron. We won't go into the steps to set up availability zones, but instead we'll focus on a few of the key decision points, limitations, and trouble areas with them.
Nova availability zones
Since host aggregates were first introduced in OpenStack Grizzly, I have seen a lot of confusion about availability zones in Nova. Nova tied its availability zone implementation to host aggregates, and because the latter is a feature unique to the Nova project, its implementation of availability zones is also unique.
I have had many people tell me they use availability zones in Nova, convinced they are not using host aggregates. Well, I have news for these people: all* availability zones in Nova are host aggregates (though not all host aggregates are availability zones):
* Exceptions being the default_availability_zone that compute nodes are placed into when not in another user-defined availability zone, and the internal_service_availability_zone where other nova services live
Some of this confusion may come from the nova CLI. People may do a quick search online, see they can create an availability zone with one command, and may not realize that they’re actually creating a host aggregate. Ex:
$ nova aggregate-create <aggregate name> <AZ name>
$ nova aggregate-create HA1 AZ1
+----+---------+-------------------+-------+------------------------+
| Id | Name    | Availability Zone | Hosts | Metadata               |
+----+---------+-------------------+-------+------------------------+
| 4  |   HA1   | AZ1               |       | 'availability_zone=AZ1'|
+----+---------+-------------------+-------+------------------------+
I have seen people get confused with the second argument (the AZ name). This is just a shortcut for setting the availability_zone metadata for a new host aggregate you want to create.
This command is equivalent to creating a host aggregate, and then setting the metadata:
$ nova aggregate-create HA1
+----+---------+-------------------+-------+----------+
| Id | Name    | Availability Zone | Hosts | Metadata |
+----+---------+-------------------+-------+----------+
| 7  |   HA1   | -                 |       |          |
+----+---------+-------------------+-------+----------+
$ nova aggregate-set-metadata HA1 availability_zone=AZ1
Metadata has been successfully updated for aggregate 7.
+----+---------+-------------------+-------+------------------------+
| Id | Name    | Availability Zone | Hosts | Metadata               |
+----+---------+-------------------+-------+------------------------+
| 7  |   HA1   | AZ1               |       | 'availability_zone=AZ1'|
+----+---------+-------------------+-------+------------------------+
Doing it this way, it's more apparent that the workflow is the same as for any other host aggregate; the only difference is the "magic" metadata key availability_zone, which we set to AZ1 (notice we also see AZ1 show up under the Availability Zone column). And now when we add compute nodes to this aggregate, they will be automatically transferred out of the default_availability_zone and into the one we have defined. For example:
Before:
$ nova availability-zone-list
| nova              | available                              |
| |- node-27        |                                        |
| | |- nova-compute | enabled :-) 2016-11-06T05:13:48.000000 |
+-------------------+----------------------------------------+
After:
$ nova availability-zone-list
| AZ1               | available                              |
| |- node-27        |                                        |
| | |- nova-compute | enabled :-) 2016-11-06T05:13:48.000000 |
+-------------------+----------------------------------------+
Note that there is one behavior that sets the availability zone host aggregates apart from others. Namely, nova does not allow you to assign the same compute host to multiple aggregates with conflicting availability zone assignments. For example, we can first add a compute node to the previously created host aggregate with availability zone AZ1:
$ nova aggregate-add-host HA1 node-27
Host node-27 has been successfully added for aggregate 7
+----+------+-------------------+----------+------------------------+
| Id | Name | Availability Zone | Hosts    | Metadata               |
+----+------+-------------------+----------+------------------------+
| 7  | HA1  | AZ1               | 'node-27'| 'availability_zone=AZ1'|
+----+------+-------------------+----------+------------------------+
Next, we create a new host aggregate for availability zone AZ2:
$ nova aggregate-create HA2

+----+---------+-------------------+-------+----------+
| Id | Name    | Availability Zone | Hosts | Metadata |
+----+---------+-------------------+-------+----------+
| 13 |   HA2   | -                 |       |          |
+----+---------+-------------------+-------+----------+

$ nova aggregate-set-metadata HA2 availability_zone=AZ2
Metadata has been successfully updated for aggregate 13.
+----+---------+-------------------+-------+------------------------+
| Id | Name    | Availability Zone | Hosts | Metadata               |
+----+---------+-------------------+-------+------------------------+
| 13 |   HA2   | AZ2               |       | 'availability_zone=AZ2'|
+----+---------+-------------------+-------+------------------------+
Now if we try to add the original compute node to this aggregate, we get an error because this aggregate has a conflicting availability zone:
$ nova aggregate-add-host HA2 node-27
ERROR (Conflict): Cannot add host node-27 in aggregate 13: host exists (HTTP 409)
+----+------+-------------------+-------+------------------------+
| Id | Name | Availability Zone | Hosts | Metadata               |
+----+------+-------------------+-------+------------------------+
| 13 | HA2  | AZ2               |       | 'availability_zone=AZ2'|
+----+------+-------------------+-------+------------------------+
(Incidentally, it is possible to have multiple host aggregates with the same availability_zone metadata, and add the same compute host to both. However, there are few, if any, good reasons for doing this.)
In contrast, Nova allows you to assign this compute node to another host aggregate with other metadata fields, as long as the availability_zone doesn't conflict:
You can see this if you first create a third host aggregate:
$ nova aggregate-create HA3
+----+---------+-------------------+-------+----------+
| Id | Name    | Availability Zone | Hosts | Metadata |
+----+---------+-------------------+-------+----------+
| 16 |   HA3   | -                 |       |          |
+----+---------+-------------------+-------+----------+
Next, tag the host aggregate for some purpose not related to availability zones (for example, an aggregate to track compute nodes with SSDs):
$ nova aggregate-set-metadata HA3 ssd=True
Metadata has been successfully updated for aggregate 16.
+----+---------+-------------------+-------+-----------+
| Id | Name    | Availability Zone | Hosts |  Metadata |
+----+---------+-------------------+-------+-----------+
| 16 |   HA3   | -                 |       | 'ssd=True'|
+----+---------+-------------------+-------+-----------+
Adding the original node to another aggregate without conflicting availability zone metadata works:
$ nova aggregate-add-host HA3 node-27
Host node-27 has been successfully added for aggregate 16
+----+-------+-------------------+-----------+------------+
| Id | Name  | Availability Zone | Hosts     |  Metadata  |
+----+-------+-------------------+-----------+------------+
| 16 | HA3   | -                 | 'node-27' | 'ssd=True' |
+----+-------+-------------------+-----------+------------+
(Incidentally, Nova will also happily let you assign the same compute node to another aggregate with ssd=False for metadata, even though that clearly doesn't make sense. Conflicts are only checked/enforced in the case of the availability_zone metadata.)
Nova configuration also holds parameters relevant to availability zone behavior. In the nova.conf read by your nova-api service, you can set a default availability zone for scheduling, which is used if users do not specify an availability zone in the API call:
[DEFAULT]
default_schedule_zone=AZ1
However, most operators leave this at its default setting (None), because it allows users who don’t care about availability zones to omit it from their API call, and the workload will be scheduled to any availability zone where there is available capacity.
If a user requests an invalid or undefined availability zone, the Nova API will report back with an HTTP 400 error. There is no availability zone fallback option.
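On the client side, users target an availability zone when booting an instance; a quick sketch (the flavor and image names are placeholders):

$ nova boot --flavor m1.small --image cirros --availability-zone AZ1 my-instance

Admins can also force a specific host with the zone:host form of the same argument, for example --availability-zone AZ1:node-27.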
Cinder
Creating availability zones in Cinder is accomplished by setting the following configuration parameter in cinder.conf, on the nodes where your cinder-volume service runs:
[DEFAULT]
storage_availability_zone=AZ1
Note that you can only set the availability zone to one value. This is consistent with availability zones in other OpenStack projects that do not allow for the notion of overlapping failure domains or multiple failure domain levels or tiers.
The change takes effect when you restart your cinder-volume services. You can confirm your availability zone assignments as follows:
cinder service-list
+---------------+-------------------+------+---------+-------+
|     Binary    |        Host       | Zone | Status  | State |
+---------------+-------------------+------+---------+-------+
| cinder-volume | hostname1@LVM     |  AZ1 | enabled |   up  |
| cinder-volume | hostname2@LVM     |  AZ2 | enabled |   up  |
If you would like to establish a default availability zone, you can set this parameter in cinder.conf on the cinder-api nodes:
[DEFAULT]
default_availability_zone=AZ1
This instructs Cinder which availability zone to use if the API call did not specify one. If you don't, it will use a hardcoded default, nova. In the case of our example, where we've set the default availability zone in Nova to AZ1, this would result in a failure. This also means that unlike Nova, users do not have the flexibility of omitting availability zone information and expecting that Cinder will select any available backend with spare capacity in any availability zone.
Therefore, you have a choice with this parameter. You can set it to one of your availability zones so API calls without availability zone information don't fail, though this risks uneven storage allocation across your availability zones. Or, you can leave this parameter unset and accept that user API calls that forget or omit availability zone info will fail.
Another option is to set the default to a non-existent availability zone such as you-must-specify-an-AZ or something similar, so when the call fails due to the non-existent availability zone, this information will be included in the error message sent back to the client.
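For completeness, this is how a user requests a specific availability zone when creating a volume (the volume name and size are placeholders):

$ cinder create --availability-zone AZ1 --name my-volume 10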
Your storage backends, storage drivers, and storage architecture may also affect how you set up your availability zones. If we are using the reference Cinder LVM iSCSI Driver deployed on commodity hardware, and that hardware fits the same availability zone criteria as our computes, then we could set up availability zones to match what we have defined in Nova. We could also do the same if we had a third party storage appliance in each availability zone, e.g.:
|     Binary    |           Host          | Zone | Status  | State |
| cinder-volume | hostname1@StorageArray1 |  AZ1 | enabled |   up  |
| cinder-volume | hostname2@StorageArray2 |  AZ2 | enabled |   up  |
(Note that the hostnames (hostname1 and hostname2) are still different in this example. The cinder multi-backend feature allows us to configure multiple storage backends in the same cinder.conf (for the same cinder-volume service), but Cinder availability zones can only be defined per cinder-volume service, and not per-backend per-cinder-volume service. In other words, if you define multiple backends in one cinder.conf, they will all inherit the same availability zone.)
However, in many cases if you’re using a third party storage appliance, then these systems usually have their own built-in redundancy that exist outside of OpenStack notions of availability zones. Similarly if you use a distributed storage solution like Ceph, then availability zones have little or no meaning in this context. In this case, you can forgo Cinder availability zones.
The one issue in doing this, however, is that any availability zones you defined for Nova won’t match. This can cause problems when Nova makes API calls to Cinder &; for example, when performing a Boot from Volume API call through Nova. If Nova decided to provision your VM in AZ1, it will tell Cinder to provision a boot volume in AZ1, but Cinder doesn’t know anything about AZ1, so this API call will fail. To prevent this from happening, we need to set the following parameter in cinder.conf on your nodes running cinder-api:
[DEFAULT]
allow_availability_zone_fallback=True
This parameter prevents the API call from failing, because if the requested availability zone does not exist, Cinder will fallback to another availability zone (whichever you defined in default_availability_zone parameter, or in storage_availability_zone if the default is not set). The hardcoded default storage_availability_zone is nova, so the fallback availability zone should match the default availability zone for your cinder-volume services, and everything should work.
The easiest way to solve the problem, however is to remove the AvailabilityZoneFilter from your filter list in cinder.conf on nodes running cinder-scheduler. This makes the scheduler ignore any availability zone information passed to it altogether, which may also be helpful in case of any availability zone configuration mismatch.
Neutron
Availability zone support was added to Neutron in the Mitaka release. Availability zones can be set for DHCP and L3 agents in their respective configuration files:
[AGENT]
Availability_zone = AZ1
Restart the agents, and confirm availability zone settings as follows:
neutron agent-show <agent-id>
+———————+————+
| Field               | Value      |
+———————+————+
| availability_zone   | AZ1        |

If you would like to establish a default availability zone, you can set the this parameter in neutron.conf on neutron-server nodes:
[DEFAULT]
default_availability_zones=AZ1,AZ2
This parameters tells Neutron which availability zones to use if the API call did not specify any. Unlike Cinder, you can specify multiple availability zones, and leaving it undefined places no constraints in scheduling, as there are no hard coded defaults. If you have users making API calls that do not care about the availability zone, then you can enumerate all your availability zones for this parameter, or simply leave it undefined &8211; both would yield the same result.
Additionally, when users do specify an availability zone, such requests are fulfilled as a “best effort” in Neutron. In other words, there is no need for an availability zone fallback parameter, because your API call still execute even if your availability zone hint can’t be satisfied.
Another important distinction that sets Neutron aside from Nova and Cinder is that it implements availability zones as scheduler hints, meaning that on the client side you can repeat this option to chain together multiple availability zone specifications in the event that more than one availability zone would satisfy your availability criteria. For example:
$ neutron net-create –availability-zone-hint AZ1 –availability-zone-hint AZ2 new_network
As with Cinder, the Neutron plugins and backends you’re using deserve attention, as the support or need for availability zones may be different depending on their implementation. For example, if you’re using a reference Neutron deployment with the ML2 plugin and with DHCP and L3 agents deployed to commodity hardware, you can likely place these agents consistently according to the same availability zone criteria used for your computes.
Whereas in contrast, other alternatives such as the Contrail plugin for Neutron do not support availability zones. Or if you are using Neutron DVR for example, then availability zones have limited significance for Layer 3 Neutron.
OpenStack Project availability zone Comparison Summary
Before we move on, it&8217;s helpful to review how each project handles availability zones.

Nova
Cinder
Neutron

Default availability zone scheduling
Can set to one availability zone or None
Can set one availability zone; cannot set None
Can set to any list of availability zones or none

Availability zone fallback
None supported
Supported through configuration
N/A; scheduling to availability zones done on a best effort basis

Availability zone definition restrictions
No more than availability zone per nova-compute
No more than 1 availability zone per cinder-volume
No more than 1 availability zone per neutron agent

Availability zone client restrictions
Can specify one availability zone or none
Can specify one availability zone or none
Can specify an arbitrary number of availability zones

Availability zones typically used when you have &;
Commodity HW for computes, libvirt driver
Commodity HW for storage, LVM iSCSI driver
Commodity HW for neutron agents, ML2 plugin

Availability zones not typically used when you have&8230;
Third party hypervisor drivers that manage their own HA for VMs (DRS for VCenter)
Third party drivers, backends, etc. that manage their own HA
Third party plugins, backends, etc. that manage their own HA

Best Practices for availability zones
Now let&8217;s talk about how to best make use of availability zones.
What should my availability zones represent?
The first thing you should do as a cloud operator is to nail down your own accepted definition of an availability zone and how you will use them, and remain consistent. You don’t want to end up in a situation where availability zones are taking on more than one meaning in the same cloud. For example:
Fred’s AZ            | Example of AZ used to perform tenant workload isolation
VMWare cluster 1 AZ | Example of AZ used to select a specific hypervisor type
Power source 1 AZ   | Example of AZ used to select a specific failure domain
Rack 1 AZ           | Example of AZ used to select a specific failure domain
Such a set of definitions would be a source of inconsistency and confusion in your cloud. It’s usually better to keep things simple with one availability zone definition, and use OpenStack features such as Nova Flavors or Nova/Cinder boot hints to achieve other requirements for multi-tenancy isolation, ability to select between different hypervisor options and other features, and so on.
Note that OpenStack currently does not support the concept of multiple failure domain levels/tiers. Even though we may have multiple ways to define failure domains (e.g., by power circuit, rack, room, etc), we must pick a single convention.
For the purposes of this article, we will discuss availability zones in the context of failure domains. However, we will cover one other use for availability zones in the third section.
How many availability zones do I need?
One question that customers frequently get hung up on is how many availability zones they should create. This can be tricky because the setup and management of availability zones involves stakeholders at every layer of the solution stack, from tenant applications to cloud operators, down to data center design and planning.

A good place to start is your cloud application requirements: How many failure domains are they designed to work with (i.e. redundancy factor)? The likely answer is two (primary + backup), three (for example, for a database or other quorum-based system), or one (for example, a legacy app with single points of failure). Therefore, the vast majority of clouds will have either 2, 3, or 1 availability zone.
Also keep in mind that as a general design principle, you want to minimize the number of availability zones in your environment, because the side effect of availability zone proliferation is that you are dividing your capacity into more resource islands. The resource utilization in each island may not be equal, and now you have an operational burden to track and maintain capacity in each island/availability zone. Also, if you have a lot of availability zones (more than the redundancy factor of tenant applications), tenants are left to guess which availability zones to use and which have available capacity.
How do I organize my availability zones?
The value proposition of availability zones is that tenants are able to achieve a higher level of availability in their applications. In order to make good on that proposition, we need to design our availability zones in ways that mitigate single points of failure.             
For example, if our resources are split between two power sources in the data center, then we may decide to define two resource pools (availability zones) according to their connected power source:
Or, if we only have one TOR switch in our racks, then we may decide to define availability zones by rack. However, here we can run into problems if we make each rack its own availability zone, as this will not scale from a capacity management perspective for more than 2-3 racks/availability zones (because of the &;resource island&; problem). In this case, you might consider dividing/arranging your total rack count into into 2 or 3 logical groupings that correlate to your defined availability zones:
We may also find situations where we have redundant TOR switch pairs in our racks, power source diversity to each rack, and lack a single point of failure. You could still place racks into availability zones as in the previous example, but the value of availability zones is marginalized, since you need to have a double failure (e.g., both TORs in the rack failing) to take down a rack.
Ultimately with any of the above scenarios, the value added by your availability zones will depend on the likelihood and expression of failure modes &8211; both the ones you have designed for, and the ones you have not. But getting accurate and complete information on such failure modes may not be easy, so the value of availability zones from this kind of unplanned failure can be difficult to pin down.
There is however another area where availability zones have the potential to provide value &8212; planned maintenance. Have you ever needed to move, recable, upgrade, or rebuild a rack? Ever needed to shut off power to part of the data center to do electrical work? Ever needed to apply disruptive updates to your hypervisors, like kernel or QEMU security updates? How about upgrades of OpenStack or the hypervisor operating system?
Chances are good that these kinds of planned maintenance are a higher source of downtime than unplanned hardware failures that happen out of the blue. Therefore, this type of planned maintenance can also impact how you define your availability zones. In general the grouping of racks into availability zones (described previously) still works well, and is the most common availability zone paradigm we see for our customers.
However, it could affect the way in which you group your racks into availability zones. For example, you may choose to create availability zones based on physical parameters like which floor, room or building the equipment is located in, or other practical considerations that would help in the event you need to vacate or rebuild certain space in your DC(s). Ex:
One of the limitations of the OpenStack implementation of availability zones made apparent in this example is that you are forced to choose one definition. Applications can request a specific availability zone, but are not offered the flexibility of requesting building level isolation, vs floor, room, or rack level isolation. This will be a fixed, inherited property of the availability zones you create. If you need more, then you start to enter the realm of other OpenStack abstractions like Regions and Cells.
Other uses for availability zones?
Another way in which people have found to use availability zones is in multi-hypervisor environments. In the ideal world of the “implementation-agnostic” cloud, we would abstract such underlying details from users of our platform. However there are some key differences between hypervisors that make this aspiration difficult. Take the example of KVM & VMWare.
When an iSCSI target is provisioned through with the LVM iSCSI Cinder driver, it cannot be attached to or consumed by ESXi nodes. The provision request must go through VMWare’s VMDK Cinder driver, which proxies the creation and attachment requests to vCenter. This incompatibility can cause errors and thus a negative user experience issues if the user tries to attach a volume to their hypervisor provisioned from the wrong backend.
To solve this problem, some operators use availability zones as a way for users to select hypervisor types (for example, AZ_KVM1, AZ_VMWARE1), and set the following configuration in their nova.conf:
[cinder]
cross_az_attach = False
This presents users with an error if they attempt to attach a volume from one availability zone (e.g., AZ_VMWARE1) to another availability zone (e.g., AZ_KVM1). The call would have certainly failed regardless, but with a different error from farther downstream from one of the nova-compute agents, instead of from nova-api. This way, it&8217;s easier for the user to see where they went wrong and correct the problem.
In this case, the availability zone also acts as a tag to remind users which hypervisor their VM resides on.
In my opinion, this is stretching the definition of availability zones as failure domains. VMWare may be considered its own failure domain separate from KVM, but that’s not the primary purpose of creating availability zones this way. The primary purpose is to differentiate hypervisor types. Different hypervisor types have a different set of features and capabilities. If we think about the problem in these terms, there are a number of other solutions that come to mind:

Nova Flavors: Define a “VMWare” set of flavors that map to your VCenter cluster(s). If your tenants that use VMWare are different than your tenants who use KVM, you can make them private flavors, ensuring that tenants only ever see or interact with the hypervisor type they expect.
Cinder: Similarly for Cinder, you can make the VMWare backend private to specific tenants, and/or set quotas differently for that backend, to ensure that tenants will only allocate from the correct storage pools for their hypervisor type.
Image metadata: You can tag your images according to the hypervisor they run on. Set image property hypervisor_type to vmware for VMDK images, and to qemu for other images. The ComputeCapabilitiesFilter in Nova will honor the hypervisor placement request.

Soo… Are availability zones right for me?
So wrapping up, think of availability zones in terms of:

Unplanned failures: If you have a good history of failure data, well understood failure modes, or some known single point of failure that availability zones can help mitigate, then availability zones may be a good fit for your environment.
Planned maintenance: If you have well understood maintenance processes that have awareness of your availability zone definitions and can take advantage of them, then availability zones may be a good fit for your environment. This is where availability zones can provide some of the creates value, but is also the most difficult to achieve, as it requires intelligent availability zone-aware rolling updates and upgrades, and affects how data center personnel perform maintenance activities.
Tenant application design/support: If your tenants are running legacy apps, apps with single points of failure, or do not support use of availability zones in their provisioning process, then availability zones will be of no use for these workloads.
Other alternatives for achieving app availability: Workloads built for geo-redundancy can achieve the same level of HA (or better) in a multi-region cloud. If this were the case for your cloud and your cloud workloads, availability zones would be unnecessary.
OpenStack projects: You should factor into your decision the limitations and differences in availability zone implementations between different OpenStack projects, and perform this analysis for all OpenStack projects in your scope of deployment.

The post The first and final words on OpenStack availability zones appeared first on Mirantis | Pure Play Open Cloud.
Quelle: Mirantis

The first and final words on OpenStack availability zones

The post The first and final words on OpenStack availability zones appeared first on Mirantis | Pure Play Open Cloud.
Availability zones are one of the most frequently misunderstood and misused constructs in OpenStack. Each cloud operator has a different idea about what they are and how to use them. What's more, each OpenStack service implements availability zones differently, if it even implements them at all.
Often, there isn’t even agreement over the basic meaning or purpose of availability zones.
On the one hand, you have the literal interpretation of the words “availability zone”, which would lead us to think of some logical subdivision of resources into failure domains, allowing cloud applications to intelligently deploy in ways to maximize their availability. (We’ll be running with this definition for the purposes of this article.)
On the other hand, the different ways that projects implement availability zones lend themselves to certain ways of using the feature. In other words, because this feature has been implemented in a flexible manner that does not tie us down to one specific concept of an availability zone, there's a lot of confusion over how to use them.
In this article, we'll look at the traditional definition of availability zones, insights into and best practices for planning and using them, and even a little bit about non-traditional uses. Finally, we hope to address the question: Are availability zones right for you?
OpenStack availability zone Implementations
One of the things that complicates use of availability zones is that each OpenStack project implements them in its own way (if at all). If you do plan to use availability zones, you should evaluate which of the OpenStack projects you plan to use support them, and how that affects your design and deployment of those services.
For the purposes of this article, we will look at three core services with respect to availability zones: Nova, Cinder, and Neutron. We won't go into the steps to set up availability zones; instead we'll focus on a few of the key decision points, limitations, and trouble areas with them.
Nova availability zones
Since host aggregates were first introduced in OpenStack Grizzly, I have seen a lot of confusion about availability zones in Nova. Nova tied their availability zone implementation to host aggregates, and because the latter is a feature unique to the Nova project, its implementation of availability zones is also unique.
I have had many people tell me they use availability zones in Nova, convinced they are not using host aggregates. Well, I have news for these people: all* availability zones in Nova are host aggregates (though not all host aggregates are availability zones).
* The exceptions are the default_availability_zone, where compute nodes land when they are not assigned to another user-defined availability zone, and the internal_service_availability_zone, where other nova services live.
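For reference, both of these special zones are plain nova.conf options; here is a minimal sketch showing them with their upstream default values (change them only if you have a reason to):
[DEFAULT]
# compute hosts not assigned to a user-defined zone land here
default_availability_zone=nova
# non-compute services (conductor, scheduler, and so on) are grouped here
internal_service_availability_zone=internal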
Some of this confusion may come from the nova CLI. People may do a quick search online, see they can create an availability zone with one command, and not realize that they're actually creating a host aggregate. For example:
$ nova aggregate-create <aggregate name> <AZ name>
$ nova aggregate-create HA1 AZ1
+----+------+-------------------+-------+-------------------------+
| Id | Name | Availability Zone | Hosts | Metadata                |
+----+------+-------------------+-------+-------------------------+
| 4  | HA1  | AZ1               |       | 'availability_zone=AZ1' |
+----+------+-------------------+-------+-------------------------+
I have seen people get confused with the second argument (the AZ name). This is just a shortcut for setting the availability_zone metadata for a new host aggregate you want to create.
This command is equivalent to creating a host aggregate, and then setting the metadata:
$ nova aggregate-create HA1
+----+------+-------------------+-------+----------+
| Id | Name | Availability Zone | Hosts | Metadata |
+----+------+-------------------+-------+----------+
| 7  | HA1  | -                 |       |          |
+----+------+-------------------+-------+----------+
$ nova aggregate-set-metadata HA1 availability_zone=AZ1
Metadata has been successfully updated for aggregate 7.
+----+------+-------------------+-------+-------------------------+
| Id | Name | Availability Zone | Hosts | Metadata                |
+----+------+-------------------+-------+-------------------------+
| 7  | HA1  | AZ1               |       | 'availability_zone=AZ1' |
+----+------+-------------------+-------+-------------------------+
Doing it this way, it's more apparent that the workflow is the same as for any other host aggregate; the only difference is the "magic" metadata key availability_zone, which we set to AZ1 (notice that AZ1 now also shows up under the Availability Zone column). Now when we add compute nodes to this aggregate, they will automatically be transferred out of the default_availability_zone and into the one we have defined. For example:
Before:
$ nova availability-zone-list
| nova              | available                              |
| |- node-27        |                                        |
| | |- nova-compute | enabled :-) 2016-11-06T05:13:48.000000 |
+-------------------+----------------------------------------+
After:
$ nova availability-zone-list
| AZ1               | available                              |
| |- node-27        |                                        |
| | |- nova-compute | enabled :-) 2016-11-06T05:13:48.000000 |
+-------------------+----------------------------------------+
Note that there is one behavior that sets availability zone host aggregates apart from other host aggregates: Nova does not allow you to assign the same compute host to multiple aggregates with conflicting availability zone assignments. For example, we can first add a compute node to the previously created host aggregate with availability zone AZ1:
$ nova aggregate-add-host HA1 node-27
Host node-27 has been successfully added for aggregate 7
+----+------+-------------------+-----------+-------------------------+
| Id | Name | Availability Zone | Hosts     | Metadata                |
+----+------+-------------------+-----------+-------------------------+
| 7  | HA1  | AZ1               | 'node-27' | 'availability_zone=AZ1' |
+----+------+-------------------+-----------+-------------------------+
Next, we create a new host aggregate for availability zone AZ2:
$ nova aggregate-create HA2

+----+------+-------------------+-------+----------+
| Id | Name | Availability Zone | Hosts | Metadata |
+----+------+-------------------+-------+----------+
| 13 | HA2  | -                 |       |          |
+----+------+-------------------+-------+----------+

$ nova aggregate-set-metadata HA2 availability_zone=AZ2
Metadata has been successfully updated for aggregate 13.
+----+------+-------------------+-------+-------------------------+
| Id | Name | Availability Zone | Hosts | Metadata                |
+----+------+-------------------+-------+-------------------------+
| 13 | HA2  | AZ2               |       | 'availability_zone=AZ2' |
+----+------+-------------------+-------+-------------------------+
Now if we try to add the original compute node to this aggregate, we get an error because this aggregate has a conflicting availability zone:
$ nova aggregate-add-host HA2 node-27
ERROR (Conflict): Cannot add host node-27 in aggregate 13: host exists (HTTP 409)
+----+------+-------------------+-------+-------------------------+
| Id | Name | Availability Zone | Hosts | Metadata                |
+----+------+-------------------+-------+-------------------------+
| 13 | HA2  | AZ2               |       | 'availability_zone=AZ2' |
+----+------+-------------------+-------+-------------------------+
(Incidentally, it is possible to have multiple host aggregates with the same availability_zone metadata, and add the same compute host to both. However, there are few, if any, good reasons for doing this.)
In contrast, Nova allows you to assign this compute node to another host aggregate with other metadata fields, as long as the availability_zone doesn't conflict.
You can see this if you first create a third host aggregate:
$ nova aggregate-create HA3
+----+------+-------------------+-------+----------+
| Id | Name | Availability Zone | Hosts | Metadata |
+----+------+-------------------+-------+----------+
| 16 | HA3  | -                 |       |          |
+----+------+-------------------+-------+----------+
Next, tag the host aggregate for some purpose not related to availability zones (for example, an aggregate to track compute nodes with SSDs):
$ nova aggregate-set-metadata HA3 ssd=True
Metadata has been successfully updated for aggregate 16.
+----+------+-------------------+-------+------------+
| Id | Name | Availability Zone | Hosts | Metadata   |
+----+------+-------------------+-------+------------+
| 16 | HA3  | -                 |       | 'ssd=True' |
+----+------+-------------------+-------+------------+
Adding the original node to this aggregate works, because there is no conflicting availability zone metadata:
$ nova aggregate-add-host HA3 node-27
Host node-27 has been successfully added for aggregate 16
+----+------+-------------------+-----------+------------+
| Id | Name | Availability Zone | Hosts     | Metadata   |
+----+------+-------------------+-----------+------------+
| 16 | HA3  | -                 | 'node-27' | 'ssd=True' |
+----+------+-------------------+-----------+------------+
(Incidentally, Nova will also happily let you assign the same compute node to another aggregate with ssd=False for metadata, even though that clearly doesn't make sense. Conflicts are only checked/enforced in the case of the availability_zone metadata.)
Nova configuration also holds parameters relevant to availability zone behavior. In the nova.conf read by your nova-api service, you can set a default availability zone for scheduling, which is used if users do not specify an availability zone in the API call:
[DEFAULT]
default_schedule_zone=AZ1
However, most operators leave this at its default setting (None), because it allows users who don’t care about availability zones to omit it from their API call, and the workload will be scheduled to any availability zone where there is available capacity.
If a user requests an invalid or undefined availability zone, the Nova API will report back with an HTTP 400 error. There is no availability zone fallback option.
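For completeness, here is how a user explicitly pins an instance to a zone at boot time; a hedged sketch using the novaclient CLI of that era (the flavor, image, and server names are placeholders):
$ nova boot --flavor m1.small --image cirros --availability-zone AZ1 my-instance
If AZ1 is not a defined availability zone, this request comes back with the HTTP 400 described above.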
Cinder
Creating availability zones in Cinder is accomplished by setting the following configuration parameter in cinder.conf, on the nodes where your cinder-volume service runs:
[DEFAULT]
storage_availability_zone=AZ1
Note that you can only set the availability zone to one value. This is consistent with availability zones in other OpenStack projects that do not allow for the notion of overlapping failure domains or multiple failure domain levels or tiers.
The change takes effect when you restart your cinder-volume services. You can confirm your availability zone assignments as follows:
cinder service-list
+---------------+---------------+------+---------+-------+
|     Binary    |      Host     | Zone | Status  | State |
+---------------+---------------+------+---------+-------+
| cinder-volume | hostname1@LVM | AZ1  | enabled |   up  |
| cinder-volume | hostname2@LVM | AZ2  | enabled |   up  |
+---------------+---------------+------+---------+-------+
If you would like to establish a default availability zone, you can set this parameter in cinder.conf on the cinder-api nodes:
[DEFAULT]
default_availability_zone=AZ1
This tells Cinder which availability zone to use if the API call did not specify one. If you don't set it, Cinder uses a hardcoded default, nova. In our example, where we've set the default availability zone in Nova to AZ1, that mismatch would result in a failure. This also means that, unlike with Nova, users do not have the flexibility of omitting availability zone information and expecting Cinder to select any backend with spare capacity in any availability zone.
Therefore, you have a choice with this parameter. You can set it to one of your availability zones so that API calls without availability zone information don't fail, at the risk of uneven storage allocation across your availability zones. Or you can leave it unset and accept that API calls that omit availability zone info will fail.
Another option is to set the default to a non-existent availability zone, such as you-must-specify-an-AZ, so that when a call fails because of the non-existent zone, the error message sent back to the client tells the user exactly what went wrong.
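A minimal cinder.conf sketch of that approach (the zone name itself is arbitrary; anything that is not a real availability zone will do):
[DEFAULT]
# intentionally not a real zone: calls without an explicit AZ fail with
# an error message that tells the caller to specify one
default_availability_zone=you-must-specify-an-AZ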
Your storage backends, storage drivers, and storage architecture may also affect how you set up your availability zones. If we are using the reference Cinder LVM iSCSI driver deployed on commodity hardware, and that hardware fits the same availability zone criteria as our computes, then we could set up availability zones to match what we have defined in Nova. We could also do the same if we had a third-party storage appliance in each availability zone, e.g.:
|     Binary    |           Host          | Zone | Status  | State |
| cinder-volume | hostname1@StorageArray1 |  AZ1 | enabled |   up  |
| cinder-volume | hostname2@StorageArray2 |  AZ2 | enabled |   up  |
(Note that the hostnames (hostname1 and hostname2) are still different in this example. The Cinder multi-backend feature allows us to configure multiple storage backends in the same cinder.conf (for the same cinder-volume service), but Cinder availability zones can only be defined per cinder-volume service, not per backend within a cinder-volume service. In other words, if you define multiple backends in one cinder.conf, they will all inherit the same availability zone.)
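To illustrate that inheritance, here is a hedged multi-backend sketch (driver paths and option names can vary slightly by release; the backend names and volume groups are made up for the example):
[DEFAULT]
storage_availability_zone=AZ1
# both backends below inherit AZ1; there is no per-backend zone option
enabled_backends=lvm-1,lvm-2

[lvm-1]
volume_driver=cinder.volume.drivers.lvm.LVMVolumeDriver
volume_group=cinder-volumes-1
volume_backend_name=LVM_1

[lvm-2]
volume_driver=cinder.volume.drivers.lvm.LVMVolumeDriver
volume_group=cinder-volumes-2
volume_backend_name=LVM_2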
However, if you're using a third-party storage appliance, these systems usually have their own built-in redundancy that exists outside of OpenStack's notion of availability zones. Similarly, if you use a distributed storage solution like Ceph, availability zones have little or no meaning. In these cases, you can forgo Cinder availability zones.
The one issue in doing this, however, is that any availability zones you defined for Nova won't match. This can cause problems when Nova makes API calls to Cinder, for example when performing a Boot from Volume call through Nova. If Nova decides to provision your VM in AZ1, it will tell Cinder to provision a boot volume in AZ1, but Cinder doesn't know anything about AZ1, so the call will fail. To prevent this from happening, we need to set the following parameter in cinder.conf on your nodes running cinder-api:
[DEFAULT]
allow_availability_zone_fallback=True
This parameter prevents the API call from failing: if the requested availability zone does not exist, Cinder will fall back to another availability zone (whichever you defined in the default_availability_zone parameter, or in storage_availability_zone if the default is not set). The hardcoded default for storage_availability_zone is nova, so as long as the fallback availability zone matches the default availability zone used by your cinder-volume services, everything should work.
The easiest way to solve the problem, however, is to remove the AvailabilityZoneFilter from your filter list in cinder.conf on nodes running cinder-scheduler. This makes the scheduler ignore any availability zone information passed to it altogether, which may also be helpful in case of any availability zone configuration mismatch.
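A hedged sketch of what that looks like in cinder.conf, assuming the stock Cinder filter list (AvailabilityZoneFilter, CapacityFilter, CapabilitiesFilter):
[DEFAULT]
# AvailabilityZoneFilter removed; the scheduler now ignores AZ hints
scheduler_default_filters=CapacityFilter,CapabilitiesFilter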
Neutron
Availability zone support was added to Neutron in the Mitaka release. Availability zones can be set for DHCP and L3 agents in their respective configuration files:
[AGENT]
availability_zone = AZ1
Restart the agents, and confirm availability zone settings as follows:
neutron agent-show <agent-id>
+-------------------+-------+
| Field             | Value |
+-------------------+-------+
| availability_zone | AZ1   |
+-------------------+-------+

If you would like to establish a default availability zone, you can set this parameter in neutron.conf on neutron-server nodes:
[DEFAULT]
default_availability_zones=AZ1,AZ2
This parameter tells Neutron which availability zones to use if the API call did not specify any. Unlike Cinder, you can specify multiple availability zones, and leaving it undefined places no constraints on scheduling, as there are no hardcoded defaults. If you have users making API calls that do not care about the availability zone, you can enumerate all your availability zones for this parameter or simply leave it undefined; both yield the same result.
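To see which availability zones Neutron has learned from its agents, you can list them; a hedged example using the Mitaka-era neutron CLI (the output shown is illustrative and will differ in your deployment):
$ neutron availability-zone-list
+------+----------+-----------+
| name | resource | state     |
+------+----------+-----------+
| AZ1  | network  | available |
| AZ1  | router   | available |
| AZ2  | network  | available |
+------+----------+-----------+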
Additionally, when users do specify an availability zone, such requests are fulfilled on a "best effort" basis in Neutron. In other words, there is no need for an availability zone fallback parameter, because your API call will still execute even if your availability zone hint can't be satisfied.
Another important distinction that sets Neutron apart from Nova and Cinder is that it implements availability zones as scheduler hints, meaning that on the client side you can repeat this option to chain together multiple availability zone specifications in the event that more than one availability zone would satisfy your availability criteria. For example:
$ neutron net-create --availability-zone-hint AZ1 --availability-zone-hint AZ2 new_network
As with Cinder, the Neutron plugins and backends you’re using deserve attention, as the support or need for availability zones may be different depending on their implementation. For example, if you’re using a reference Neutron deployment with the ML2 plugin and with DHCP and L3 agents deployed to commodity hardware, you can likely place these agents consistently according to the same availability zone criteria used for your computes.
In contrast, other alternatives, such as the Contrail plugin for Neutron, do not support availability zones. And if you are using Neutron DVR, availability zones have limited significance for Layer 3 Neutron.
OpenStack Project availability zone Comparison Summary
Before we move on, it's helpful to review how each project handles availability zones.

Default availability zone scheduling
  Nova: Can set to one availability zone or None
  Cinder: Can set one availability zone; cannot set None
  Neutron: Can set to any list of availability zones, or none

Availability zone fallback
  Nova: Not supported
  Cinder: Supported through configuration
  Neutron: N/A; scheduling to availability zones is done on a best-effort basis

Availability zone definition restrictions
  Nova: No more than 1 availability zone per nova-compute
  Cinder: No more than 1 availability zone per cinder-volume
  Neutron: No more than 1 availability zone per neutron agent

Availability zone client restrictions
  Nova: Can specify one availability zone or none
  Cinder: Can specify one availability zone or none
  Neutron: Can specify an arbitrary number of availability zones

Availability zones typically used when you have...
  Nova: Commodity HW for computes, libvirt driver
  Cinder: Commodity HW for storage, LVM iSCSI driver
  Neutron: Commodity HW for neutron agents, ML2 plugin

Availability zones not typically used when you have...
  Nova: Third-party hypervisor drivers that manage their own HA for VMs (e.g., DRS for vCenter)
  Cinder: Third-party drivers, backends, etc. that manage their own HA
  Neutron: Third-party plugins, backends, etc. that manage their own HA

Best Practices for availability zones
Now let's talk about how to best make use of availability zones.
What should my availability zones represent?
The first thing you should do as a cloud operator is to nail down your own accepted definition of an availability zone and how you will use them, and remain consistent. You don’t want to end up in a situation where availability zones are taking on more than one meaning in the same cloud. For example:
Fred’s AZ            | Example of AZ used to perform tenant workload isolation
VMWare cluster 1 AZ | Example of AZ used to select a specific hypervisor type
Power source 1 AZ   | Example of AZ used to select a specific failure domain
Rack 1 AZ           | Example of AZ used to select a specific failure domain
Such a set of definitions would be a source of inconsistency and confusion in your cloud. It's usually better to keep things simple, with one availability zone definition, and to use OpenStack features such as Nova flavors or Nova/Cinder boot hints to meet other requirements, such as multi-tenancy isolation or the ability to select between different hypervisor options.
Note that OpenStack currently does not support the concept of multiple failure domain levels/tiers. Even though we may have multiple ways to define failure domains (e.g., by power circuit, rack, room, etc), we must pick a single convention.
For the purposes of this article, we will discuss availability zones in the context of failure domains. However, we will cover one other use for availability zones in the third section.
How many availability zones do I need?
One question that customers frequently get hung up on is how many availability zones they should create. This can be tricky because the setup and management of availability zones involves stakeholders at every layer of the solution stack, from tenant applications to cloud operators, down to data center design and planning.

A good place to start is your cloud application requirements: How many failure domains are they designed to work with (i.e. redundancy factor)? The likely answer is two (primary + backup), three (for example, for a database or other quorum-based system), or one (for example, a legacy app with single points of failure). Therefore, the vast majority of clouds will have either 2, 3, or 1 availability zone.
Also keep in mind that as a general design principle, you want to minimize the number of availability zones in your environment, because the side effect of availability zone proliferation is that you are dividing your capacity into more resource islands. The resource utilization in each island may not be equal, and now you have an operational burden to track and maintain capacity in each island/availability zone. Also, if you have a lot of availability zones (more than the redundancy factor of tenant applications), tenants are left to guess which availability zones to use and which have available capacity.
How do I organize my availability zones?
The value proposition of availability zones is that tenants are able to achieve a higher level of availability in their applications. In order to make good on that proposition, we need to design our availability zones in ways that mitigate single points of failure.             
For example, if our resources are split between two power sources in the data center, then we may decide to define two resource pools (availability zones) according to their connected power source.
Or, if we only have one TOR switch in our racks, then we may decide to define availability zones by rack. However, here we can run into problems if we make each rack its own availability zone, as this will not scale from a capacity management perspective beyond 2-3 racks/availability zones (because of the "resource island" problem). In this case, you might consider dividing your total rack count into 2 or 3 logical groupings that correlate to your defined availability zones.
We may also find situations where we have redundant TOR switch pairs in our racks, power source diversity to each rack, and no single point of failure. You could still place racks into availability zones as in the previous example, but the value of availability zones is marginalized, since you need a double failure (e.g., both TORs in the rack failing) to take down a rack.
Ultimately, with any of the above scenarios, the value added by your availability zones will depend on the likelihood and expression of failure modes, both the ones you have designed for and the ones you have not. But getting accurate and complete information on such failure modes may not be easy, so the value of availability zones for this kind of unplanned failure can be difficult to pin down.
There is, however, another area where availability zones have the potential to provide value: planned maintenance. Have you ever needed to move, recable, upgrade, or rebuild a rack? Ever needed to shut off power to part of the data center to do electrical work? Ever needed to apply disruptive updates to your hypervisors, like kernel or QEMU security updates? How about upgrades of OpenStack or the hypervisor operating system?
Chances are good that these kinds of planned maintenance are a bigger source of downtime than unplanned hardware failures that happen out of the blue. Therefore, this type of planned maintenance can also impact how you define your availability zones. In general, the grouping of racks into availability zones (described previously) still works well, and is the most common availability zone paradigm we see with our customers.
However, planned maintenance could affect the way in which you group your racks into availability zones. For example, you may choose to create availability zones based on physical parameters, like which floor, room, or building the equipment is located in, or other practical considerations that would help in the event you need to vacate or rebuild certain space in your DC(s).
One of the limitations of the OpenStack implementation of availability zones made apparent in this example is that you are forced to choose one definition. Applications can request a specific availability zone, but are not offered the flexibility of requesting building level isolation, vs floor, room, or rack level isolation. This will be a fixed, inherited property of the availability zones you create. If you need more, then you start to enter the realm of other OpenStack abstractions like Regions and Cells.
Other uses for availability zones?
Another way people have found to use availability zones is in multi-hypervisor environments. In the ideal world of the “implementation-agnostic” cloud, we would abstract such underlying details from users of our platform. However, there are some key differences between hypervisors that make this aspiration difficult. Take the example of KVM and VMWare.
When an iSCSI target is provisioned with the LVM iSCSI Cinder driver, it cannot be attached to or consumed by ESXi nodes. The provisioning request must go through VMWare’s VMDK Cinder driver, which proxies the creation and attachment requests to vCenter. This incompatibility causes errors, and thus a negative user experience, if the user tries to attach a volume that was provisioned from the wrong backend for their hypervisor.
To solve this problem, some operators use availability zones as a way for users to select hypervisor types (for example, AZ_KVM1, AZ_VMWARE1), and set the following configuration in their nova.conf:
[cinder]
cross_az_attach = False
This presents users with an error if they attempt to attach a volume from one availability zone (e.g., AZ_VMWARE1) to an instance in another availability zone (e.g., AZ_KVM1). The call would have failed regardless, but with a different error coming from farther downstream, from one of the nova-compute agents, instead of from nova-api. This way, it's easier for the user to see where they went wrong and correct the problem.
In this case, the availability zone also acts as a tag to remind users which hypervisor their VM resides on.
In my opinion, this is stretching the definition of availability zones as failure domains. VMWare may be considered its own failure domain separate from KVM, but that’s not the primary purpose of creating availability zones this way. The primary purpose is to differentiate hypervisor types. Different hypervisor types have a different set of features and capabilities. If we think about the problem in these terms, there are a number of other solutions that come to mind:

Nova Flavors: Define a “VMWare” set of flavors that map to your VCenter cluster(s). If your tenants that use VMWare are different than your tenants who use KVM, you can make them private flavors, ensuring that tenants only ever see or interact with the hypervisor type they expect.
Cinder: Similarly for Cinder, you can make the VMWare backend private to specific tenants, and/or set quotas differently for that backend, to ensure that tenants will only allocate from the correct storage pools for their hypervisor type.
Image metadata: You can tag your images according to the hypervisor they run on. Set the image property hypervisor_type to vmware for VMDK images, and to qemu for other images. The ImagePropertiesFilter in Nova honors this placement request (see the example after this list).

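As a sketch of the image-metadata approach (the image IDs are placeholders; the unified openstack client's image set command can do the same thing):
$ glance image-update --property hypervisor_type=vmware <vmdk-image-id>
$ glance image-update --property hypervisor_type=qemu <qcow2-image-id>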
So… are availability zones right for me?
So wrapping up, think of availability zones in terms of:

Unplanned failures: If you have a good history of failure data, well understood failure modes, or some known single point of failure that availability zones can help mitigate, then availability zones may be a good fit for your environment.
Planned maintenance: If you have well-understood maintenance processes that have awareness of your availability zone definitions and can take advantage of them, then availability zones may be a good fit for your environment. This is where availability zones can provide some of the greatest value, but it is also the most difficult to achieve, as it requires intelligent availability zone-aware rolling updates and upgrades, and affects how data center personnel perform maintenance activities.
Tenant application design/support: If your tenants are running legacy apps, apps with single points of failure, or do not support use of availability zones in their provisioning process, then availability zones will be of no use for these workloads.
Other alternatives for achieving app availability: Workloads built for geo-redundancy can achieve the same level of HA (or better) in a multi-region cloud. If this were the case for your cloud and your cloud workloads, availability zones would be unnecessary.
OpenStack projects: You should factor into your decision the limitations and differences in availability zone implementations between different OpenStack projects, and perform this analysis for all OpenStack projects in your scope of deployment.

The post The first and final words on OpenStack availability zones appeared first on Mirantis | Pure Play Open Cloud.
Quelle: Mirantis

How To Build Planet Scale Mobile App in Minutes with Xamarin and DocumentDB

Most mobile apps need to store data in the cloud, and DocumentDB is an awesome cloud database for mobile apps. It has everything a mobile developer needs: a fully managed NoSQL database as a service that scales on demand and can bring your data wherever your users go around the globe, completely transparently to your application. Today we are excited to announce the Azure DocumentDB SDK for the Xamarin mobile platform, enabling mobile apps to interact directly with DocumentDB, without a middle tier.

Here is what mobile developers get out of the box with DocumentDB:

Rich queries over schemaless data. DocumentDB stores data as schemaless JSON documents in heterogeneous collections, and offers rich and fast queries without the need to worry about schema or indexes.
Fast. Guaranteed. It takes only a few milliseconds to read and write documents with DocumentDB. Developers can specify the throughput they need, and DocumentDB will honor it with a 99.99% SLA.
Limitless Scale. Your DocumentDB collections will grow as your app grows. You can start with a small data size and hundreds of requests per second, and grow to arbitrarily large scale: tens and hundreds of millions of requests per second of throughput, and petabytes of data.
Globally Distributed. Your mobile app users are on the go, often across the world. DocumentDB is a globally distributed database, and with just one click on a map it will bring the data wherever your users are.
Built-in rich authorization. With DocumentDB you can easily implement popular patterns like per-user data or multi-user shared data, without complex custom authorization code.
Geo-spatial queries. Many mobile apps offer geo-contextual experiences today. With first-class support for geo-spatial types, DocumentDB makes these experiences very easy to accomplish.
Binary attachments. Your app data often includes binary blobs. Native support for attachments makes it easier to use DocumentDB as a one-stop shop for your app data.

Let's build an app together!

Step 1. Get Started

It&039;s easy to get started with DocumentDB, just go to Azure portal, create a new DocumentDB account,  go to the Quickstart tab, and download a Xamarin Forms todo list sample, already connected to your DocumentDB account. 

Or, if you have an existing Xamarin app, you can just add this DocumentDB NuGet package. Today we support Xamarin.iOS and Xamarin.Android, as well as Xamarin Forms shared libraries.

Step 2. Work with data

Your data records are stored in DocumentDB as schemaless JSON documents in heterogeneous collections. You can store documents with different structures in the same collection.

In your Xamarin projects, you can use language-integrated queries (LINQ) over schemaless data.

Step 3. Add Users

Like many getting-started samples, the DocumentDB sample you downloaded above authenticates to the service using a master key hardcoded in the app's code. This is of course not a good idea for an app you intend to run anywhere except your local emulator. If an attacker gets hold of the master key, all the data across your DocumentDB account is compromised.

Instead, we want our app to only have access to the records for the logged-in user. DocumentDB allows developers to grant an application read or read/write access to all documents in a collection, a set of documents, or a specific document, depending on the needs.

Here is, for example, how to modify our todo list app into a multi-user todo list app; a complete version of the sample is available here:

Add Login to your app, using Facebook, Active Directory or any other provider.
Create a DocumentDB UserItems collection with /userId as the partition key. Specifying a partition key for your collection allows DocumentDB to scale infinitely as the number of app users grows, while offering fast queries.
Add a DocumentDB Resource Token Broker, a simple Web API that authenticates users and issues short-lived tokens to logged-in users with access only to the documents within the user's partition. In this example, we host the Resource Token Broker in App Service.
Modify the app to authenticate to the Resource Token Broker with Facebook and request resource tokens for the logged-in Facebook user, then access the user's data in the UserItems collection.

This diagram illustrates the solution. We are investigating eliminating the need for the Resource Token Broker by supporting OAuth in DocumentDB as a first-class feature; please upvote this UserVoice item if you think it's a good idea!

Now if we want two users to get access to the same todo list, we just add additional permissions to the access token in the Resource Token Broker. You can find the complete sample here.

Step 4. Scale on demand.

DocumentDB is a managed database as a service. As your user base grows, you don't need to worry about provisioning VMs or increasing cores. All you need to tell DocumentDB is how many operations per second (throughput) your app needs. You can specify the throughput via the portal's Scale tab using a measure of throughput called Request Units per second (RUs). For example, a read operation on a 1KB document requires 1 RU. You can also add alerts on the "Throughput" metric to monitor traffic growth and programmatically change the throughput as alerts fire.

  

Step 5. Go Planet Scale!

As your app gains popularity, you may acquire users across the globe. Or maybe you just don't want to be caught off guard if a meteorite strikes the Azure data centers where you created your DocumentDB collection. Go to the Azure portal, open your DocumentDB account, and with a click on a map, make your data continuously replicate to any number of regions across the world. This ensures your data is available wherever your users are, and you can add failover policies to be prepared for a rainy day.

We hope you find this blog and the samples useful for taking advantage of DocumentDB in your Xamarin application. A similar pattern can be used in Cordova apps using the DocumentDB JavaScript SDK, as well as in native iOS/Android apps using the DocumentDB REST APIs.

As always, let us know how we are doing and what improvements you'd like to see going forward for DocumentDB through UserVoice, StackOverflow azure-documentdb, or Twitter @DocumentDB.
Quelle: Azure

The Alt-Right’s Meltdown Is Just Like Any Other Message Board Drama

Things have gotten bumpy for the alt-right online movement since the election. It’s facing an identity crisis (what does it mean to be the “alt” if you’re getting what you want?) and grappling with certain fundamental questions like “Are we OK with Nazis?” (Even if its very name was coined by, well, Nazis.) The handful of leaders who emerged over the last year or two are at odds with each other over those and other questions, forcing helpless anime-avatared Twitter trolls caught in the middle to choose sides.

The kerfuffle surrounds the DeploraBall, a black-tie-optional party in DC on Inauguration Night. There has been nasty and public fighting among the organizers. Stick with me here: Mike Cernovich, a lawyer who became an alt-right leader after taking up the GamerGate mantle, feuded with a fellow leader who goes by “Baked Alaska” and announced that Baked Alaska had been removed from headlining the event because he had said anti-Semitic things on Twitter. Another leader, Bill Mitchell, announced he was no longer part of the alt-right after they started using the racist hashtag . And just recently, Baked Alaska accused (and sources confirmed to BuzzFeed News) one of the DeploraBall organizers of planting a “rape Melania” sign at an anti-Trump protest in an attempt to make protesters look depraved. In the latest surreal twist, a popular alt-right podcaster and founder of the website The Right Stuff was revealed to have a Jewish wife, which sent his fans into a tailspin.

At first, this disarray might seem surprising. After all, the alt-right claims to be an unprecedented political phenomenon that memed a president into office. But if you want to understand what’s happening there, it’s helpful to think about it as an internet-first creature. While it’s possible — and necessary — to view it through the lens of political or social thought that it echoes, the other way of making sense of it is to look at it as a digital community, regardless of its politics. And if you view it as an online community rather than a political movement, its trajectory starts to look very, very familiar.

What we have here is a classic case of “mod drama.”

As someone who has spent a lot of time taxonomizing online communities, from places like Fark to Something Awful, 4chan to Facebook groups for moms, I can assure you that one need only look at how other internet groups rise and fall to see what's happening in the alt-right.

STAGES OF A “MOD DRAMA” MELTDOWN:

1. IRL gone wrong:


The first stage of an online community hive death is the disastrous IRL meetup — for the alt-right this seems to be the DeploraBall. It’s also worth noting that the event does not even need to take place — the disaster can arise simply in the organizing of it. People who spend vast amounts of time on the internet are perhaps not best suited to real-world planning and action. There’s a rich graveyard of notable away-from-keyboard flameouts. Here are just a few examples:

  • DashCon: A meetup for Tumblr users that went so poorly it became a punchline of the worst stereotypes of Tumblr users. The organizers ran into money problems, claiming they needed more money from convention attendees (who had already paid for passes in advance) to keep the hotel space. After a speaker canceled, rumors flew that attendees would only receive compensation in the form of a free hour in the world’s saddest ball pit. (Eventually, organizers sent an email offering refunds.)
  • Goon Island: In 2009, a group of posters from the message board Something Awful attempted to move to Hawaii and live off the land. To the amusement of users on a different board from the same site, the group of message board posters was not exactly suited to life in the wild jungle. One moment in particular — a photo of one of the “goons” (as Something Awful posters call themselves) trying to shoot a wild pig with a BB gun — encapsulated how underprepared they were for the jungle.
  • Celeb Heights: If you’ve ever googled a famous person’s height (which, weirdly, you probably have), chances are you’ve ended up on celebheights.com, a forum for a small subculture of people obsessed with celebrity sizes. When the owner of the site finally met up with one of the most prolific volunteers, he was shocked to discover the volunteer was shorter than he claimed, thereby throwing off everything he had posted. A massive blog post about the drama was made, and the volunteer was permabanned.


2. Metaboard mocking:

Stage two takes place when members begin to question the community and its leadership from within, using its existing norms and forums to make their points. (On a traditional forum this would take place on the metaboard, a board for talking about the board.) This forces community leaders to react, and sometimes to overreact. On Reddit, the role is played by r/circlejerk, a subreddit that exists just to make fun of what happens on the other subreddits, and sometimes of the admins and leaders themselves.

For the alt-right, Twitter acts as its own form of metaboard, and unlike traditional metaboards, the discussion that happens there takes place in public. Bickering among leaders like Mike Cernovich, Milo Yiannopoulos, and Baked Alaska over the DeploraBall has created rifts among the loyal, who in turn have begun attacking the leadership, often using the same language and tactics the leadership itself pioneered. One example: calling Cernovich “Cuckovich.”

3. Splinter board formation:

Stage three, often a sure indicator of imminent online hive death, is the schism of the most devout into two groups, one of which decamps to another forum. This is also known as “the splinter board.” You can think of the splinter board as the inevitable consequence of metaboard infighting when things really go south. This trajectory typically happens after moderators run afoul of devout users, usually by instituting hardline rules or issuing bans.

One of the best examples of a splinter board is 4chan/8chan. In 2014, 4chan’s admin cracked down on GamerGate talk, and that faction fled to another site, 8chan, which promised an uncensored refuge for those deemed literally too nasty for 4chan.

But splinter boards aren’t just for raucous places like 4chan — you see them in all sorts of tamer internet worlds. For Facebook groups, the telltale signs are in the group names, where a group may proudly proclaim its splinter status. Take “Suffolk county thrift without the dumb rules,” a group for buy/sell/trade on Long Island. Clearly, some bad shit went down in the regular Suffolk county thrift group, and a new, more lawless group was formed. A popular group for “freebirth” (no midwives or even checkups while pregnant) would ban people who mentioned any sort of medical talk, leading to the splinter group “Freebirth/Unassisted birth — NO JUDGEMENT!”

For the alt-right, this splinter board schism is well underway.

Just before the New Year, Bill Mitchell, a prolific tweeter and radio host, announced that he was no longer affiliated with the alt-right after he was shocked — shocked! — to discover that the alt-right *may* be anti-Semitic and racist. He announced he would be rallying around a new hashtag instead.

In the world of the alt-right — which has a slew of discussion forums, but its most public one is on Twitter — a hashtag can be its own universe. People follow these tags as much as they do individuals. They use the tags to organize themselves and keep up with the latest discussions. So, when a prominent figure rallies around a new one, that person is basically creating a splinter board. Which leads to….

4. An identity crisis of priorities, complete with censorship and fear of outsiders:

The Neopets forums, a place for people to discuss an online role-playing game for children, experienced a crisis of censorship when mods banned any discussion of the Twilight series, going so far as to block the keywords “Edward,” “Bella,” and “Jacob.”

A community for fans of the parenting podcast The Longest Shortest Time had a meltdown and eventually shut down their Facebook group when social justice topics kept coming up and the discussion became too heated.

A Facebook group for women writers to network, called Binders Full of Women, asked members not to talk about the existence of the group — a request that became increasingly impractical as it ballooned to tens of thousands of members. When someone wrote about the group for Vogue, the author was instantly banned and mods treated it as the ultimate betrayal. Yet the group splintered and persisted, spawning real-world conventions for female writers called BinderCon. At least until this year, when there was a dustup over breastfeeding mothers not being allowed to bring their infants (see: stage 1, on IRL meetup disasters).

The central issue the alt-right seems to be struggling with is to what degree it is willing to support or tolerate actual white supremacists and white nationalists, whether because members genuinely disagree with the dogma or simply because they’re afraid it looks bad to outsiders.

And those optics are starting to matter more now that the alt-right’s candidate of choice is in power. With the goal of getting Trump elected realized, some of its leaders are taking their own swings at mainstream success beyond just “popular poster on the internet.” Bill Mitchell, who gained attention by accurately predicting the election and tweeting A LOT, now appears on Fox News and has ambitions to join the mainstream news media. Milo Yiannopoulos, who was banned from Twitter permanently for writing bad things, is now reportedly being paid $250,000 by Simon & Schuster to write things in book form. For the leaders, real money and careers are at stake in deciding what counts as acceptable speech within the alt-right.

Having “mod drama” has nothing to do with the political leanings of the alt-right or the fact that it’s mostly male. On the other side of the spectrum, the Facebook pages for people supporting the Women’s March on Washington have become similar epicenters of infighting and mod drama. According to the New York Times, when admins changed the name of one local march page, “many applauded the name change, which was meant to signal the start of a new social justice movement in Nashville, [but] some complained that the event had turned from a march for all women into a march for black women.”

Just as some online dissent about the dogma of a feminist march doesn’t mean the march won’t happen or its goals won’t be achieved, the mod drama of the alt-right doesn’t necessarily diminish its influence. The breakup of the centralized leadership may end up making it more powerful — if the “actual Nazis” cleave to one side, then the “I don’t approve of Nazis” crew like Bill Mitchell will be able to become more mainstream.

It’s impossible to say how the alt-right’s mod drama will ultimately play out. It’s a long way from memers to Neopets posters — one is filled with horrible people bent on moving the Overton window of acceptable social norms and the other is lousy with white supremacists. (I kid! I kid!) But while we may not be able to tell the future, the past is often a pretty good precedent. So I would humbly suggest, for the movement’s sake, that they invest in a really good ball pit for the inauguration.

Source: BuzzFeed, “The Alt-Right’s Meltdown Is Just Like Any Other Message Board Drama”