The first and final words on OpenStack availability zones

Availability zones are one of the most frequently misunderstood and misused constructs in OpenStack. Each cloud operator has a different idea about what they are and how to use them. What's more, each OpenStack service implements availability zones differently, if it even implements them at all.
Often, there isn’t even agreement over the basic meaning or purpose of availability zones.
On the one hand, you have the literal interpretation of the words “availability zone”, which would lead us to think of some logical subdivision of resources into failure domains, allowing cloud applications to intelligently deploy in ways to maximize their availability. (We’ll be running with this definition for the purposes of this article.)
On the other hand, the different ways that projects implement availability zones lend themselves to certain ways of using the feature. In other words, because this feature has been implemented in a flexible manner that does not tie us down to one specific concept of an availability zone, there's a lot of confusion over how to use them.
In this article, we'll look at the traditional definition of availability zones, insights into and best practices for planning and using them, and even a little bit about non-traditional uses. Finally, we hope to address the question: are availability zones right for you?
OpenStack availability zone Implementations
One of the things that complicates the use of availability zones is that each OpenStack project implements them in its own way (if at all). If you do plan to use availability zones, you should evaluate whether the OpenStack projects you're going to use support them, and how that affects your design and deployment of those services.
For the purposes of this article, we will look at three core services with respect to availability zones: Nova, Cinder, and Neutron. We won't go into the steps to set up availability zones; instead, we'll focus on a few of the key decision points, limitations, and trouble areas.
Nova availability zones
Since host aggregates were first introduced in OpenStack Grizzly, I have seen a lot of confusion about availability zones in Nova. Nova tied its availability zone implementation to host aggregates, and because host aggregates are a feature unique to Nova, its implementation of availability zones is also unique.
I have had many people tell me they use availability zones in Nova, convinced they are not using host aggregates. Well, I have news for these people: all* availability zones in Nova are host aggregates (though not all host aggregates are availability zones).
* Exceptions being the default_availability_zone that compute nodes are placed into when not in another user-defined availability zone, and the internal_service_availability_zone where other nova services live
Some of this confusion may come from the nova CLI. People may do a quick search online, see they can create an availability zone with one command, and may not realize that they’re actually creating a host aggregate. Ex:
$ nova aggregate-create <aggregate name> <AZ name>

$ nova aggregate-create HA1 AZ1
+----+------+-------------------+-------+--------------------------+
| Id | Name | Availability Zone | Hosts | Metadata                 |
+----+------+-------------------+-------+--------------------------+
| 4  | HA1  | AZ1               |       | 'availability_zone=AZ1'  |
+----+------+-------------------+-------+--------------------------+
I have seen people get confused with the second argument (the AZ name). This is just a shortcut for setting the availability_zone metadata for a new host aggregate you want to create.
This command is equivalent to creating a host aggregate, and then setting the metadata:
$ nova aggregate-create HA1
+----+------+-------------------+-------+----------+
| Id | Name | Availability Zone | Hosts | Metadata |
+----+------+-------------------+-------+----------+
| 7  | HA1  | -                 |       |          |
+----+------+-------------------+-------+----------+

$ nova aggregate-set-metadata HA1 availability_zone=AZ1
Metadata has been successfully updated for aggregate 7.
+----+------+-------------------+-------+--------------------------+
| Id | Name | Availability Zone | Hosts | Metadata                 |
+----+------+-------------------+-------+--------------------------+
| 7  | HA1  | AZ1               |       | 'availability_zone=AZ1'  |
+----+------+-------------------+-------+--------------------------+
Doing it this way, it's more apparent that the workflow is the same as for any other host aggregate; the only difference is the “magic” metadata key availability_zone, which we set to AZ1 (notice that AZ1 also shows up under the Availability Zone column). Now when we add compute nodes to this aggregate, they will be automatically moved out of the default_availability_zone and into the one we have defined. For example:
Before:
$ nova availability-zone-list
| nova              | available                              |
| |- node-27        |                                        |
| | |- nova-compute | enabled :-) 2016-11-06T05:13:48.000000 |
+-------------------+----------------------------------------+
After:
$ nova availability-zone-list
| AZ1               | available                              |
| |- node-27        |                                        |
| | |- nova-compute | enabled :-) 2016-11-06T05:13:48.000000 |
+-------------------+----------------------------------------+
Note that there is one behavior that sets availability zone host aggregates apart from others: Nova does not allow you to assign the same compute host to multiple aggregates with conflicting availability zone assignments. For example, we can first add a compute node to the previously created host aggregate with availability zone AZ1:
$ nova aggregate-add-host HA1 node-27
Host node-27 has been successfully added for aggregate 7
+----+------+-------------------+-----------+--------------------------+
| Id | Name | Availability Zone | Hosts     | Metadata                 |
+----+------+-------------------+-----------+--------------------------+
| 7  | HA1  | AZ1               | 'node-27' | 'availability_zone=AZ1'  |
+----+------+-------------------+-----------+--------------------------+
Next, we create a new host aggregate for availability zone AZ2:
$ nova aggregate-create HA2

+----+------+-------------------+-------+----------+
| Id | Name | Availability Zone | Hosts | Metadata |
+----+------+-------------------+-------+----------+
| 13 | HA2  | -                 |       |          |
+----+------+-------------------+-------+----------+

$ nova aggregate-set-metadata HA2 availability_zone=AZ2
Metadata has been successfully updated for aggregate 13.
+----+------+-------------------+-------+--------------------------+
| Id | Name | Availability Zone | Hosts | Metadata                 |
+----+------+-------------------+-------+--------------------------+
| 13 | HA2  | AZ2               |       | 'availability_zone=AZ2'  |
+----+------+-------------------+-------+--------------------------+
Now if we try to add the original compute node to this aggregate, we get an error because this aggregate has a conflicting availability zone:
$ nova aggregate-add-host HA2 node-27
ERROR (Conflict): Cannot add host node-27 in aggregate 13: host exists (HTTP 409)
+----+------+-------------------+-------+--------------------------+
| Id | Name | Availability Zone | Hosts | Metadata                 |
+----+------+-------------------+-------+--------------------------+
| 13 | HA2  | AZ2               |       | 'availability_zone=AZ2'  |
+----+------+-------------------+-------+--------------------------+
(Incidentally, it is possible to have multiple host aggregates with the same availability_zone metadata, and add the same compute host to both. However, there are few, if any, good reasons for doing this.)
In contrast, Nova allows you to assign this compute node to another host aggregate with other metadata fields, as long as the availability_zone doesn't conflict.
You can see this if you first create a third host aggregate:
$ nova aggregate-create HA3
+----+------+-------------------+-------+----------+
| Id | Name | Availability Zone | Hosts | Metadata |
+----+------+-------------------+-------+----------+
| 16 | HA3  | -                 |       |          |
+----+------+-------------------+-------+----------+
Next, tag the host aggregate for some purpose not related to availability zones (for example, an aggregate to track compute nodes with SSDs):
$ nova aggregate-set-metadata HA3 ssd=True
Metadata has been successfully updated for aggregate 16.
+----+------+-------------------+-------+------------+
| Id | Name | Availability Zone | Hosts | Metadata   |
+----+------+-------------------+-------+------------+
| 16 | HA3  | -                 |       | 'ssd=True' |
+----+------+-------------------+-------+------------+
Adding the original node to another aggregate without conflicting availability zone metadata works:
$ nova aggregate-add-host HA3 node-27
Host node-27 has been successfully added for aggregate 16
+----+------+-------------------+-----------+------------+
| Id | Name | Availability Zone | Hosts     | Metadata   |
+----+------+-------------------+-----------+------------+
| 16 | HA3  | -                 | 'node-27' | 'ssd=True' |
+----+------+-------------------+-----------+------------+
(Incidentally, Nova will also happily let you assign the same compute node to another aggregate with ssd=False for metadata, even though that clearly doesn't make sense. Conflicts are only checked/enforced in the case of the availability_zone metadata.)
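As an aside, aggregate metadata like ssd=True is normally consumed by the Nova scheduler through flavor extra specs rather than by end users directly. A minimal sketch, assuming the AggregateInstanceExtraSpecsFilter is enabled in your scheduler filter list and that a flavor named m1.small.ssd exists (both are assumptions for illustration):

$ nova flavor-key m1.small.ssd set aggregate_instance_extra_specs:ssd=True

Instances booted with that flavor would then be scheduled only onto hosts belonging to aggregates carrying ssd=True.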
Nova configuration also holds parameters relevant to availability zone behavior. In the nova.conf read by your nova-api service, you can set a default availability zone for scheduling, which is used if users do not specify an availability zone in the API call:
[DEFAULT]
default_schedule_zone=AZ1
However, most operators leave this at its default setting (None), because it allows users who don’t care about availability zones to omit it from their API call, and the workload will be scheduled to any availability zone where there is available capacity.
If a user requests an invalid or undefined availability zone, the Nova API will report back with an HTTP 400 error. There is no availability zone fallback option.
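For reference, a user who does care about placement can request a specific availability zone directly in the boot request; the flavor and image names below are placeholders:

$ nova boot --availability-zone AZ1 --flavor m1.small --image cirros my-instance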
Cinder
Creating availability zones in Cinder is accomplished by setting the following configuration parameter in cinder.conf, on the nodes where your cinder-volume service runs:
[DEFAULT]
storage_availability_zone=AZ1
Note that you can only set the availability zone to one value. This is consistent with availability zones in other OpenStack projects that do not allow for the notion of overlapping failure domains or multiple failure domain levels or tiers.
The change takes effect when you restart your cinder-volume services. You can confirm your availability zone assignments as follows:
cinder service-list
+---------------+-------------------+------+---------+-------+
|     Binary    |        Host       | Zone | Status  | State |
+---------------+-------------------+------+---------+-------+
| cinder-volume | hostname1@LVM     |  AZ1 | enabled |   up  |
| cinder-volume | hostname2@LVM     |  AZ2 | enabled |   up  |
+---------------+-------------------+------+---------+-------+
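On the user side, a specific availability zone can be requested when creating a volume. A hedged example (10 is the volume size in GB; older python-cinderclient versions use --display-name instead of --name):

$ cinder create --availability-zone AZ1 --name my-volume 10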
If you would like to establish a default availability zone, you can set this parameter in cinder.conf on the cinder-api nodes:
[DEFAULT]
default_availability_zone=AZ1
This tells Cinder which availability zone to use if the API call did not specify one. If you don't set it, Cinder uses a hardcoded default, nova. In our example, where the availability zones defined in Cinder are AZ1 and AZ2 rather than nova, this would result in a failure. This also means that, unlike Nova, users do not have the flexibility of omitting availability zone information and expecting that Cinder will select any available backend with spare capacity in any availability zone.
Therefore, you have a choice with this parameter. You can set it to one of your availability zones so that API calls without availability zone information don't fail, but this can lead to uneven storage allocation across your availability zones. Or, you can leave this parameter unset, and accept that user API calls that forget or omit availability zone info will fail.
Another option is to set the default to a non-existent availability zone, such as you-must-specify-an-AZ, so that when the call fails due to the non-existent availability zone, this information is included in the error message sent back to the client.
Your storage backends, storage drivers, and storage architecture may also affect how you set up your availability zones. If we are using the reference Cinder LVM iSCSI driver deployed on commodity hardware, and that hardware fits the same availability zone criteria as our computes, then we could set up availability zones to match what we have defined in Nova. We could also do the same if we had a third-party storage appliance in each availability zone, e.g.:
|     Binary    |           Host          | Zone | Status  | State |
| cinder-volume | hostname1@StorageArray1 |  AZ1 | enabled |   up  |
| cinder-volume | hostname2@StorageArray2 |  AZ2 | enabled |   up  |
(Notice that the hostnames (hostname1 and hostname2) are still different in this example. The Cinder multi-backend feature allows us to configure multiple storage backends in the same cinder.conf (for the same cinder-volume service), but Cinder availability zones can only be defined per cinder-volume service, not per backend. In other words, if you define multiple backends in one cinder.conf, they will all inherit the same availability zone.)
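As a rough sketch of that behavior, a cinder.conf along these lines (the backend names and driver settings are illustrative only) would place both backends into the same AZ1 zone:

[DEFAULT]
# both backends below inherit this availability zone
storage_availability_zone=AZ1
enabled_backends=lvm-1,lvm-2

[lvm-1]
volume_driver=cinder.volume.drivers.lvm.LVMVolumeDriver
volume_backend_name=LVM-1

[lvm-2]
volume_driver=cinder.volume.drivers.lvm.LVMVolumeDriver
volume_backend_name=LVM-2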
However, if you're using a third-party storage appliance, these systems usually have their own built-in redundancy that exists outside of OpenStack's notion of availability zones. Similarly, if you use a distributed storage solution like Ceph, availability zones have little or no meaning in this context. In these cases, you can forgo Cinder availability zones.
The one issue with doing this, however, is that any availability zones you defined for Nova won't match. This can cause problems when Nova makes API calls to Cinder, for example when performing a Boot from Volume request through Nova. If Nova decides to provision your VM in AZ1, it will tell Cinder to provision a boot volume in AZ1, but Cinder doesn't know anything about AZ1, so the call will fail. To prevent this from happening, we need to set the following parameter in cinder.conf on your nodes running cinder-api:
[DEFAULT]
allow_availability_zone_fallback=True
This parameter prevents the API call from failing: if the requested availability zone does not exist, Cinder will fall back to another availability zone (whichever you defined in the default_availability_zone parameter, or in storage_availability_zone if the default is not set). The hardcoded default storage_availability_zone is nova, so the fallback availability zone should match the default availability zone for your cinder-volume services, and everything should work.
The easiest way to solve the problem, however, is to remove the AvailabilityZoneFilter from your filter list in cinder.conf on nodes running cinder-scheduler. This makes the scheduler ignore any availability zone information passed to it altogether, which may also be helpful in case of any availability zone configuration mismatch.
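For example, the scheduler filter list in cinder.conf might then look something like the following; treat this as a sketch, since the exact set of remaining filters depends on your release and deployment:

[DEFAULT]
scheduler_default_filters=CapacityFilter,CapabilitiesFilter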
Neutron
Availability zone support was added to Neutron in the Mitaka release. Availability zones can be set for DHCP and L3 agents in their respective configuration files (dhcp_agent.ini and l3_agent.ini):
[AGENT]
availability_zone = AZ1
Restart the agents, and confirm availability zone settings as follows:
neutron agent-show <agent-id>
+---------------------+------------+
| Field               | Value      |
+---------------------+------------+
| availability_zone   | AZ1        |
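You can also ask Neutron for the availability zones it currently knows about, which makes for a quick sanity check after configuring your agents:

$ neutron availability-zone-list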

If you would like to establish a default availability zone, you can set this parameter in neutron.conf on the neutron-server nodes:
[DEFAULT]
default_availability_zones=AZ1,AZ2
This parameter tells Neutron which availability zones to use if the API call did not specify any. Unlike Cinder, you can specify multiple availability zones, and leaving it undefined places no constraints on scheduling, as there is no hardcoded default. If you have users making API calls that do not care about the availability zone, you can either enumerate all your availability zones in this parameter or simply leave it undefined; both yield the same result.
Additionally, when users do specify an availability zone, such requests are fulfilled on a “best effort” basis in Neutron. In other words, there is no need for an availability zone fallback parameter, because your API call will still execute even if your availability zone hint can't be satisfied.
Another important distinction that sets Neutron apart from Nova and Cinder is that it implements availability zones as scheduler hints, meaning that on the client side you can repeat this option to chain together multiple availability zone specifications in the event that more than one availability zone would satisfy your availability criteria. For example:
$ neutron net-create --availability-zone-hint AZ1 --availability-zone-hint AZ2 new_network
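The same hints apply to routers; for example:

$ neutron router-create --availability-zone-hint AZ1 --availability-zone-hint AZ2 new_router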
As with Cinder, the Neutron plugins and backends you’re using deserve attention, as the support or need for availability zones may be different depending on their implementation. For example, if you’re using a reference Neutron deployment with the ML2 plugin and with DHCP and L3 agents deployed to commodity hardware, you can likely place these agents consistently according to the same availability zone criteria used for your computes.
In contrast, other alternatives such as the Contrail plugin for Neutron do not support availability zones. And if you are using Neutron DVR, for example, availability zones have limited significance for Layer 3 Neutron.
OpenStack Project availability zone Comparison Summary
Before we move on, it's helpful to review how each project handles availability zones.

Default availability zone scheduling:
- Nova: can be set to one availability zone or None
- Cinder: can be set to one availability zone; cannot be set to None
- Neutron: can be set to any list of availability zones, or left unset

Availability zone fallback:
- Nova: not supported
- Cinder: supported through configuration
- Neutron: N/A; scheduling to availability zones is done on a best-effort basis

Availability zone definition restrictions:
- Nova: no more than 1 availability zone per nova-compute
- Cinder: no more than 1 availability zone per cinder-volume
- Neutron: no more than 1 availability zone per neutron agent

Availability zone client restrictions:
- Nova: can specify one availability zone or none
- Cinder: can specify one availability zone or none
- Neutron: can specify an arbitrary number of availability zones

Availability zones typically used when you have:
- Nova: commodity HW for computes, libvirt driver
- Cinder: commodity HW for storage, LVM iSCSI driver
- Neutron: commodity HW for neutron agents, ML2 plugin

Availability zones not typically used when you have:
- Nova: third-party hypervisor drivers that manage their own HA for VMs (e.g., DRS for vCenter)
- Cinder: third-party drivers, backends, etc. that manage their own HA
- Neutron: third-party plugins, backends, etc. that manage their own HA

Best Practices for availability zones
Now let's talk about how to best make use of availability zones.
What should my availability zones represent?
The first thing you should do as a cloud operator is to nail down your own accepted definition of an availability zone and how you will use them, and remain consistent. You don’t want to end up in a situation where availability zones are taking on more than one meaning in the same cloud. For example:
Fred’s AZ            | Example of AZ used to perform tenant workload isolation
VMWare cluster 1 AZ | Example of AZ used to select a specific hypervisor type
Power source 1 AZ   | Example of AZ used to select a specific failure domain
Rack 1 AZ           | Example of AZ used to select a specific failure domain
Such a set of definitions would be a source of inconsistency and confusion in your cloud. It’s usually better to keep things simple with one availability zone definition, and use OpenStack features such as Nova Flavors or Nova/Cinder boot hints to achieve other requirements for multi-tenancy isolation, ability to select between different hypervisor options and other features, and so on.
Note that OpenStack currently does not support the concept of multiple failure domain levels/tiers. Even though we may have multiple ways to define failure domains (e.g., by power circuit, rack, room, etc), we must pick a single convention.
For the purposes of this article, we will discuss availability zones in the context of failure domains. However, we will cover one other use for availability zones in the third section.
How many availability zones do I need?
One question that customers frequently get hung up on is how many availability zones they should create. This can be tricky because the setup and management of availability zones involves stakeholders at every layer of the solution stack, from tenant applications to cloud operators, down to data center design and planning.

A good place to start is your cloud application requirements: How many failure domains are they designed to work with (i.e. redundancy factor)? The likely answer is two (primary + backup), three (for example, for a database or other quorum-based system), or one (for example, a legacy app with single points of failure). Therefore, the vast majority of clouds will have either 2, 3, or 1 availability zone.
Also keep in mind that as a general design principle, you want to minimize the number of availability zones in your environment, because the side effect of availability zone proliferation is that you are dividing your capacity into more resource islands. The resource utilization in each island may not be equal, and now you have an operational burden to track and maintain capacity in each island/availability zone. Also, if you have a lot of availability zones (more than the redundancy factor of tenant applications), tenants are left to guess which availability zones to use and which have available capacity.
How do I organize my availability zones?
The value proposition of availability zones is that tenants are able to achieve a higher level of availability in their applications. In order to make good on that proposition, we need to design our availability zones in ways that mitigate single points of failure.             
For example, if our resources are split between two power sources in the data center, then we may decide to define two resource pools (availability zones) according to their connected power source:
Or, if we only have one TOR switch in our racks, then we may decide to define availability zones by rack. However, here we can run into problems if we make each rack its own availability zone, as this will not scale from a capacity management perspective beyond 2-3 racks/availability zones (because of the “resource island” problem). In this case, you might consider dividing your total rack count into 2 or 3 logical groupings that correlate to your defined availability zones:
We may also find situations where we have redundant TOR switch pairs in our racks, power source diversity to each rack, and thus no single point of failure. You could still place racks into availability zones as in the previous example, but the value of availability zones is marginalized, since you would need a double failure (e.g., both TORs in the rack failing) to take down a rack.
Ultimately, with any of the above scenarios, the value added by your availability zones will depend on the likelihood and expression of failure modes, both the ones you have designed for and the ones you have not. But getting accurate and complete information on such failure modes may not be easy, so the value of availability zones for this kind of unplanned failure can be difficult to pin down.
There is, however, another area where availability zones have the potential to provide value: planned maintenance. Have you ever needed to move, recable, upgrade, or rebuild a rack? Ever needed to shut off power to part of the data center to do electrical work? Ever needed to apply disruptive updates to your hypervisors, like kernel or QEMU security updates? How about upgrades of OpenStack or the hypervisor operating system?
Chances are good that these kinds of planned maintenance are a higher source of downtime than unplanned hardware failures that happen out of the blue. Therefore, this type of planned maintenance can also impact how you define your availability zones. In general the grouping of racks into availability zones (described previously) still works well, and is the most common availability zone paradigm we see for our customers.
However, it could affect the way in which you group your racks into availability zones. For example, you may choose to create availability zones based on physical parameters like which floor, room or building the equipment is located in, or other practical considerations that would help in the event you need to vacate or rebuild certain space in your DC(s). Ex:
One of the limitations of the OpenStack implementation of availability zones made apparent in this example is that you are forced to choose one definition. Applications can request a specific availability zone, but are not offered the flexibility of requesting building level isolation, vs floor, room, or rack level isolation. This will be a fixed, inherited property of the availability zones you create. If you need more, then you start to enter the realm of other OpenStack abstractions like Regions and Cells.
Other uses for availability zones?
Another way in which people have found to use availability zones is in multi-hypervisor environments. In the ideal world of the “implementation-agnostic” cloud, we would abstract such underlying details from users of our platform. However there are some key differences between hypervisors that make this aspiration difficult. Take the example of KVM & VMWare.
When an iSCSI target is provisioned with the LVM iSCSI Cinder driver, it cannot be attached to or consumed by ESXi nodes. The provisioning request must instead go through VMWare's VMDK Cinder driver, which proxies the creation and attachment requests to vCenter. This incompatibility can cause errors and a negative user experience if a user tries to attach a volume provisioned from the wrong backend to their hypervisor.
To solve this problem, some operators use availability zones as a way for users to select hypervisor types (for example, AZ_KVM1, AZ_VMWARE1), and set the following configuration in their nova.conf:
[cinder]
cross_az_attach = False
This presents users with an error if they attempt to attach a volume from one availability zone (e.g., AZ_VMWARE1) to an instance in another availability zone (e.g., AZ_KVM1). The call would have failed regardless, but with a different error coming from farther downstream, from one of the nova-compute agents, instead of from nova-api. This way, it's easier for the user to see where they went wrong and correct the problem.
In this case, the availability zone also acts as a tag to remind users which hypervisor their VM resides on.
In my opinion, this is stretching the definition of availability zones as failure domains. VMWare may be considered its own failure domain separate from KVM, but that’s not the primary purpose of creating availability zones this way. The primary purpose is to differentiate hypervisor types. Different hypervisor types have a different set of features and capabilities. If we think about the problem in these terms, there are a number of other solutions that come to mind:

Nova Flavors: Define a “VMWare” set of flavors that map to your VCenter cluster(s). If your tenants that use VMWare are different than your tenants who use KVM, you can make them private flavors, ensuring that tenants only ever see or interact with the hypervisor type they expect.
Cinder: Similarly for Cinder, you can make the VMWare backend private to specific tenants, and/or set quotas differently for that backend, to ensure that tenants will only allocate from the correct storage pools for their hypervisor type.
Image metadata: You can tag your images according to the hypervisor they run on. Set the image property hypervisor_type to vmware for VMDK images and to qemu for other images. The ImagePropertiesFilter in Nova will honor the hypervisor placement request (see the example below).

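A quick sketch of the image metadata approach using the glance CLI; the image ID is a placeholder:

$ glance image-update --property hypervisor_type=vmware <image-id>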
So… Are availability zones right for me?
So wrapping up, think of availability zones in terms of:

Unplanned failures: If you have a good history of failure data, well understood failure modes, or some known single point of failure that availability zones can help mitigate, then availability zones may be a good fit for your environment.
Planned maintenance: If you have well understood maintenance processes that have awareness of your availability zone definitions and can take advantage of them, then availability zones may be a good fit for your environment. This is where availability zones can provide some of the greatest value, but it is also the most difficult to achieve, as it requires intelligent, availability zone-aware rolling updates and upgrades, and it affects how data center personnel perform maintenance activities.
Tenant application design/support: If your tenants are running legacy apps, apps with single points of failure, or do not support use of availability zones in their provisioning process, then availability zones will be of no use for these workloads.
Other alternatives for achieving app availability: Workloads built for geo-redundancy can achieve the same level of HA (or better) in a multi-region cloud. If this were the case for your cloud and your cloud workloads, availability zones would be unnecessary.
OpenStack projects: You should factor into your decision the limitations and differences in availability zone implementations between different OpenStack projects, and perform this analysis for all OpenStack projects in your scope of deployment.

Source: Mirantis

The first and final words on OpenStack availability zones

The post The first and final words on OpenStack availability zones appeared first on Mirantis | Pure Play Open Cloud.
Availability zones are one of the most frequently misunderstood and misused constructs in OpenStack. Each cloud operator has a different idea about what they are and how to use them. What&;s more, each OpenStack service implements availability zones differently &; if it even implements them at all.
Often, there isn’t even agreement over the basic meaning or purpose of availability zones.
On the one hand, you have the literal interpretation of the words “availability zone”, which would lead us to think of some logical subdivision of resources into failure domains, allowing cloud applications to intelligently deploy in ways to maximize their availability. (We’ll be running with this definition for the purposes of this article.)
On the other hand, the different ways that projects implement availability zones lend themselves to certain ways of using the feature as a result. In other words, because this feature has been implemented in a flexible manner that does not tie us down to one specific concept of an availability zone, there&8217;s a lot of confusion over how to use them.
In this article, we&8217;ll look at the traditional definition of availability zones, insights into and best practices for planning and using them, and even a little bit about non-traditional uses. Finally, we hope to address the question: Are availability zones right for you?
OpenStack availability zone Implementations
One of the things that complicates use of availability zones is that each OpenStack project implements them in their own way (if at all). If you do plan to use availability zones, you should evaluate which OpenStack projects you&8217;re going to use support them, and how that affects your design and deployment of those services.
For the purposes of this article, we will look at three core services with respect to availability zones: , Cinder, and Neutron. We won&8217;t go into the steps to set up availability zones, but but instead we&8217;ll focus on a few of the key decision points, limitations, and trouble areas with them.
Nova availability zones
Since host aggregates were first introduced in OpenStack Grizzly, I have seen a lot of confusion about availability zones in Nova. Nova tied their availability zone implementation to host aggregates, and because the latter is a feature unique to the Nova project, its implementation of availability zones is also unique.
I have had many people tell me they use availability zones in Nova, convinced they are not using host aggregates. Well, I have news for these people &8212; all* availability zones in Nova are host aggregates (though not all host aggregates are availability zones):
* Exceptions being the default_availability_zone that compute nodes are placed into when not in another user-defined availability zone, and the internal_service_availability_zone where other nova services live
Some of this confusion may come from the nova CLI. People may do a quick search online, see they can create an availability zone with one command, and may not realize that they’re actually creating a host aggregate. Ex:
$ nova aggregate-create <aggregate name> <AZ name>
$ nova aggregate-create HA1 AZ1
+—-+———+——————-+——-+————————+
| Id | Name    | Availability Zone | Hosts | Metadata               |
+—-+———+——————-+——-+————————+
| 4  |   HA1   | AZ1               |       | ‘availability_zone=AZ1’|
+—-+———+——————-+——-+————————+
I have seen people get confused with the second argument (the AZ name). This is just a shortcut for setting the availability_zone metadata for a new host aggregate you want to create.
This command is equivalent to creating a host aggregate, and then setting the metadata:
$ nova aggregate-create HA1
+—-+———+——————-+——-+———-+
| Id | Name    | Availability Zone | Hosts | Metadata |
+—-+———+——————-+——-+———-+
| 7  |   HA1   | –                 |       |          |
+—-+———+——————-+——-+———-+
$ nova aggregate-set-metadata HA1 availability_zone=AZ1
Metadata has been successfully updated for aggregate 7.
+—-+———+——————-+——-+————————+
| Id | Name    | Availability Zone | Hosts | Metadata               |
+—-+———+——————-+——-+————————+
| 7  |   HA1   | AZ1               |       | ‘availability_zone=AZ1’|
+—-+———+——————-+——-+————————+
Doing it this way, it’s more apparent that the workflow is the same as any other host aggregate, the only difference is the “magic” metadata key availability_zone which we set to AZ1 (notice we also see AZ1 show up under the Availability Zone column). And now when we add compute nodes to this aggregate, they will be automatically transferred out of the default_availability_zone and into the one we have defined. For example:
Before:
$ nova availability-zone-list
| nova              | available                              |
| |- node-27        |                                        |
| | |- nova-compute | enabled :-) 2016-11-06T05:13:48.000000 |
+——————-+—————————————-+
After:
$ nova availability-zone-list
| AZ1               | available                              |
| |- node-27        |                                        |
| | |- nova-compute | enabled :-) 2016-11-06T05:13:48.000000 |
+——————-+—————————————-+
Note that there is one behavior that sets apart the availability zone host aggregates apart from others. Namely, nova does not allow you to assign the same compute host to multiple aggregates with conflicting availability zone assignments. For example, we can first add compute a node to the previously created host aggregate with availability zone AZ1:
$ nova aggregate-add-host HA1 node-27
Host node-27 has been successfully added for aggregate 7
+—-+——+——————-+———-+————————+
| Id | Name | Availability Zone | Hosts    | Metadata               |
+—-+——+——————-+———-+————————+
| 7  | HA1  | AZ1               | ‘node-27’| ‘availability_zone=AZ1’|
+—-+——+——————-+———-+————————+
Next, we create a new host aggregate for availability zone AZ2:
$ nova aggregate-create HA2

+—-+———+——————-+——-+———-+
| Id | Name    | Availability Zone | Hosts | Metadata |
+—-+———+——————-+——-+———-+
| 13 |   HA2   | –                 |       |          |
+—-+———+——————-+——-+———-+

$ nova aggregate-set-metadata HA2 availability_zone=AZ2
Metadata has been successfully updated for aggregate 13.
+—-+———+——————-+——-+————————+
| Id | Name    | Availability Zone | Hosts | Metadata               |
+—-+———+——————-+——-+————————+
| 13 |   HA2   | AZ2               |       | ‘availability_zone=AZ2’|
+—-+———+——————-+——-+————————+
Now if we try to add the original compute node to this aggregate, we get an error because this aggregate has a conflicting availability zone:
$ nova aggregate-add-host HA2 node-27
ERROR (Conflict): Cannot add host node-27 in aggregate 13: host exists (HTTP 409)
+—-+——+——————-+———-+————————+
| Id | Name | Availability Zone | Hosts    | Metadata               |
+—-+——+——————-+———-+————————+
| 13 | HA2  | AZ2               |          | ‘availability_zone=AZ2’|
+—-+——+——————-+———-+————————+
(Incidentally, it is possible to have multiple host aggregates with the same availability_zone metadata, and add the same compute host to both. However, there are few, if any, good reasons for doing this.)
In contrast, Nova allows you to assign this compute node to another host aggregate with other metadata fields, as long as the availability_zone doesn&8217;t conflict:
You can see this if you first create a third host aggregate:
$ nova aggregate-create HA3
+—-+———+——————-+——-+———-+
| Id | Name    | Availability Zone | Hosts | Metadata |
+—-+———+——————-+——-+———-+
| 16 |   HA3   | –                 |       |          |
+—-+———+——————-+——-+———-+
Next, tag  the host aggregate for some purpose not related to availability zones (for example, an aggregate to track compute nodes with SSDs):
$ nova aggregate-set-metadata HA3 ssd=True
Metadata has been successfully updated for aggregate 16.
+—-+———+——————-+——-+———–+
| Id | Name    | Availability Zone | Hosts |  Metadata |
+—-+———+——————-+——-+———–+
| 16 |   HA3   | –                 |       | ‘ssd=True’|
+—-+———+——————-+——-+———–+
Adding original node to another aggregate without conflicting availability zone metadata works:
$ nova aggregate-add-host HA3 node-27
Host node-27 has been successfully added for aggregate 16
+—-+——-+——————-+———–+————+
| Id | Name  | Availability Zone | Hosts     |  Metadata  |
+—-+——-+——————-+———–+————+
| 16 | HA3   | –                 | ‘node-27′ | ‘ssd=True’ |
+—-+——-+——————-+———–+————+
(Incidentally, Nova will also happily let you assign the same compute node to another aggregate with ssd=False for metadata, even though that clearly doesn&8217;t make sense. Conflicts are only checked/enforced in the case of the availability_zone metadata.)
Nova configuration also holds parameters relevant to availability zone behavior. In the nova.conf read by your nova-api service, you can set a default availability zone for scheduling, which is used if users do not specify an availability zone in the API call:
[DEFAULT]
default_schedule_zone=AZ1
However, most operators leave this at its default setting (None), because it allows users who don’t care about availability zones to omit it from their API call, and the workload will be scheduled to any availability zone where there is available capacity.
If a user requests an invalid or undefined availability zone, the Nova API will report back with an HTTP 400 error. There is no availability zone fallback option.
Cinder
Creating availability zones in Cinder is accomplished by setting the following configuration parameter in cinder.conf, on the nodes where your cinder-volume service runs:
[DEFAULT]
storage_availability_zone=AZ1
Note that you can only set the availability zone to one value. This is consistent with availability zones in other OpenStack projects that do not allow for the notion of overlapping failure domains or multiple failure domain levels or tiers.
The change takes effect when you restart your cinder-volume services. You can confirm your availability zone assignments as follows:
cinder service-list
+—————+——————-+——+———+——-+
|     Binary    |        Host       | Zone | Status  | State |
+—————+——————-+——+———+——-+
| cinder-volume | hostname1@LVM     |  AZ1 | enabled |   up  |
| cinder-volume | hostname2@LVM     |  AZ2 | enabled |   up  |
If you would like to establish a default availability zone, you can set the this parameter in cinder.conf on the cinder-api nodes:
[DEFAULT]
default_availability_zone=AZ1
This instructs Cinder which availability zone to use if the API call did not specify one. If you don’t, it will use a hardcoded default, nova. In the case of our example, where we&8217;ve set the default availability zone in Nova to AZ1, this would result in a failure. This also means that unlike Nova, users do not have the flexibility of omitting availability zone information and expecting that Cinder will select any available backend with spare capacity in any availability zone.
Therefore, you have a choice with this parameter. You can set it to one of your availability zones so API calls without availability zone information don’t fail, but causing a potential situation of uneven storage allocation across your availability zones. Or, you can not set this parameter, and accept that user API calls that forget or omit availability zone info will fail.
Another option is to set the default to a non-existent availability zone you-must-specify-an-AZ or something similar, so when the call fails due to the non-existant availability zone, this information will be included in the error message sent back to the client.
Your storage backends, storage drivers, and storage architecture may also affect how you set up your availability zones. If we are using the reference Cinder LVM ISCSI Driver deployed on commodity hardware, and that hardware fits the same availability zone criteria of our computes, then we could setup availability zones to match what we have defined in Nova. We could also do the same if we had a third party storage appliance in each availability zone, e.g.:
|     Binary    |           Host          | Zone | Status  | State |
| cinder-volume | hostname1@StorageArray1 |  AZ1 | enabled |   up  |
| cinder-volume | hostname2@StorageArray2 |  AZ2 | enabled |   up  |
(Note: Notice that the hostnames (hostname1 and hostname2) are still different in this example. The cinder multi-backend feature allows us to configure multiple storage backends in the same cinder.conf (for the same cinder-volume service), but Cinder availability zones can only be defined per cinder-volume service, and not per-backend per-cinder-volume service. In other words, if you define multiple backends in one cinder.conf, they will all inherit the same availability zone.)
However, in many cases if you’re using a third party storage appliance, then these systems usually have their own built-in redundancy that exist outside of OpenStack notions of availability zones. Similarly if you use a distributed storage solution like Ceph, then availability zones have little or no meaning in this context. In this case, you can forgo Cinder availability zones.
The one issue in doing this, however, is that any availability zones you defined for Nova won’t match. This can cause problems when Nova makes API calls to Cinder &; for example, when performing a Boot from Volume API call through Nova. If Nova decided to provision your VM in AZ1, it will tell Cinder to provision a boot volume in AZ1, but Cinder doesn’t know anything about AZ1, so this API call will fail. To prevent this from happening, we need to set the following parameter in cinder.conf on your nodes running cinder-api:
[DEFAULT]
allow_availability_zone_fallback=True
This parameter prevents the API call from failing, because if the requested availability zone does not exist, Cinder will fallback to another availability zone (whichever you defined in default_availability_zone parameter, or in storage_availability_zone if the default is not set). The hardcoded default storage_availability_zone is nova, so the fallback availability zone should match the default availability zone for your cinder-volume services, and everything should work.
The easiest way to solve the problem, however is to remove the AvailabilityZoneFilter from your filter list in cinder.conf on nodes running cinder-scheduler. This makes the scheduler ignore any availability zone information passed to it altogether, which may also be helpful in case of any availability zone configuration mismatch.
Neutron
Availability zone support was added to Neutron in the Mitaka release. Availability zones can be set for DHCP and L3 agents in their respective configuration files:
[AGENT]
Availability_zone = AZ1
Restart the agents, and confirm availability zone settings as follows:
neutron agent-show <agent-id>
+———————+————+
| Field               | Value      |
+———————+————+
| availability_zone   | AZ1        |

If you would like to establish a default availability zone, you can set the this parameter in neutron.conf on neutron-server nodes:
[DEFAULT]
default_availability_zones=AZ1,AZ2
This parameters tells Neutron which availability zones to use if the API call did not specify any. Unlike Cinder, you can specify multiple availability zones, and leaving it undefined places no constraints in scheduling, as there are no hard coded defaults. If you have users making API calls that do not care about the availability zone, then you can enumerate all your availability zones for this parameter, or simply leave it undefined &8211; both would yield the same result.
Additionally, when users do specify an availability zone, such requests are fulfilled as a “best effort” in Neutron. In other words, there is no need for an availability zone fallback parameter, because your API call still execute even if your availability zone hint can’t be satisfied.
Another important distinction that sets Neutron aside from Nova and Cinder is that it implements availability zones as scheduler hints, meaning that on the client side you can repeat this option to chain together multiple availability zone specifications in the event that more than one availability zone would satisfy your availability criteria. For example:
$ neutron net-create –availability-zone-hint AZ1 –availability-zone-hint AZ2 new_network
As with Cinder, the Neutron plugins and backends you’re using deserve attention, as the support or need for availability zones may be different depending on their implementation. For example, if you’re using a reference Neutron deployment with the ML2 plugin and with DHCP and L3 agents deployed to commodity hardware, you can likely place these agents consistently according to the same availability zone criteria used for your computes.
Whereas in contrast, other alternatives such as the Contrail plugin for Neutron do not support availability zones. Or if you are using Neutron DVR for example, then availability zones have limited significance for Layer 3 Neutron.
OpenStack Project availability zone Comparison Summary
Before we move on, it&8217;s helpful to review how each project handles availability zones.

Nova
Cinder
Neutron

Default availability zone scheduling
Can set to one availability zone or None
Can set one availability zone; cannot set None
Can set to any list of availability zones or none

Availability zone fallback
None supported
Supported through configuration
N/A; scheduling to availability zones done on a best effort basis

Availability zone definition restrictions
No more than availability zone per nova-compute
No more than 1 availability zone per cinder-volume
No more than 1 availability zone per neutron agent

Availability zone client restrictions
Can specify one availability zone or none
Can specify one availability zone or none
Can specify an arbitrary number of availability zones

Availability zones typically used when you have &;
Commodity HW for computes, libvirt driver
Commodity HW for storage, LVM iSCSI driver
Commodity HW for neutron agents, ML2 plugin

Availability zones not typically used when you have&8230;
Third party hypervisor drivers that manage their own HA for VMs (DRS for VCenter)
Third party drivers, backends, etc. that manage their own HA
Third party plugins, backends, etc. that manage their own HA

Best Practices for availability zones
Now let&8217;s talk about how to best make use of availability zones.
What should my availability zones represent?
The first thing you should do as a cloud operator is to nail down your own accepted definition of an availability zone and how you will use them, and remain consistent. You don’t want to end up in a situation where availability zones are taking on more than one meaning in the same cloud. For example:
Fred’s AZ            | Example of AZ used to perform tenant workload isolation
VMWare cluster 1 AZ | Example of AZ used to select a specific hypervisor type
Power source 1 AZ   | Example of AZ used to select a specific failure domain
Rack 1 AZ           | Example of AZ used to select a specific failure domain
Such a set of definitions would be a source of inconsistency and confusion in your cloud. It’s usually better to keep things simple with one availability zone definition, and use OpenStack features such as Nova Flavors or Nova/Cinder boot hints to achieve other requirements for multi-tenancy isolation, ability to select between different hypervisor options and other features, and so on.
Note that OpenStack currently does not support the concept of multiple failure domain levels/tiers. Even though we may have multiple ways to define failure domains (e.g., by power circuit, rack, room, etc), we must pick a single convention.
For the purposes of this article, we will discuss availability zones in the context of failure domains. However, we will cover one other use for availability zones in the third section.
How many availability zones do I need?
One question that customers frequently get hung up on is how many availability zones they should create. This can be tricky because the setup and management of availability zones involves stakeholders at every layer of the solution stack, from tenant applications to cloud operators, down to data center design and planning.

A good place to start is your cloud application requirements: How many failure domains are they designed to work with (i.e. redundancy factor)? The likely answer is two (primary + backup), three (for example, for a database or other quorum-based system), or one (for example, a legacy app with single points of failure). Therefore, the vast majority of clouds will have either 2, 3, or 1 availability zone.
Also keep in mind that as a general design principle, you want to minimize the number of availability zones in your environment, because the side effect of availability zone proliferation is that you are dividing your capacity into more resource islands. The resource utilization in each island may not be equal, and now you have an operational burden to track and maintain capacity in each island/availability zone. Also, if you have a lot of availability zones (more than the redundancy factor of tenant applications), tenants are left to guess which availability zones to use and which have available capacity.
How do I organize my availability zones?
The value proposition of availability zones is that tenants are able to achieve a higher level of availability in their applications. In order to make good on that proposition, we need to design our availability zones in ways that mitigate single points of failure.             
For example, if our resources are split between two power sources in the data center, then we may decide to define two resource pools (availability zones) according to their connected power source:
Or, if we only have one TOR switch in our racks, then we may decide to define availability zones by rack. However, here we can run into problems if we make each rack its own availability zone, as this will not scale from a capacity management perspective for more than 2-3 racks/availability zones (because of the &;resource island&; problem). In this case, you might consider dividing/arranging your total rack count into into 2 or 3 logical groupings that correlate to your defined availability zones:
We may also find situations where we have redundant TOR switch pairs in our racks, power source diversity to each rack, and lack a single point of failure. You could still place racks into availability zones as in the previous example, but the value of availability zones is marginalized, since you need to have a double failure (e.g., both TORs in the rack failing) to take down a rack.
Ultimately with any of the above scenarios, the value added by your availability zones will depend on the likelihood and expression of failure modes &8211; both the ones you have designed for, and the ones you have not. But getting accurate and complete information on such failure modes may not be easy, so the value of availability zones from this kind of unplanned failure can be difficult to pin down.
There is however another area where availability zones have the potential to provide value &8212; planned maintenance. Have you ever needed to move, recable, upgrade, or rebuild a rack? Ever needed to shut off power to part of the data center to do electrical work? Ever needed to apply disruptive updates to your hypervisors, like kernel or QEMU security updates? How about upgrades of OpenStack or the hypervisor operating system?
Chances are good that these kinds of planned maintenance are a higher source of downtime than unplanned hardware failures that happen out of the blue. Therefore, this type of planned maintenance can also impact how you define your availability zones. In general the grouping of racks into availability zones (described previously) still works well, and is the most common availability zone paradigm we see for our customers.
However, it could affect the way in which you group your racks into availability zones. For example, you may choose to create availability zones based on physical parameters like which floor, room or building the equipment is located in, or other practical considerations that would help in the event you need to vacate or rebuild certain space in your DC(s). Ex:
One of the limitations of the OpenStack implementation of availability zones made apparent in this example is that you are forced to choose one definition. Applications can request a specific availability zone, but are not offered the flexibility of requesting building level isolation, vs floor, room, or rack level isolation. This will be a fixed, inherited property of the availability zones you create. If you need more, then you start to enter the realm of other OpenStack abstractions like Regions and Cells.
Other uses for availability zones?
Another way in which people have found to use availability zones is in multi-hypervisor environments. In the ideal world of the “implementation-agnostic” cloud, we would abstract such underlying details from users of our platform. However there are some key differences between hypervisors that make this aspiration difficult. Take the example of KVM & VMWare.
When an iSCSI target is provisioned through with the LVM iSCSI Cinder driver, it cannot be attached to or consumed by ESXi nodes. The provision request must go through VMWare’s VMDK Cinder driver, which proxies the creation and attachment requests to vCenter. This incompatibility can cause errors and thus a negative user experience issues if the user tries to attach a volume to their hypervisor provisioned from the wrong backend.
To solve this problem, some operators use availability zones as a way for users to select hypervisor types (for example, AZ_KVM1, AZ_VMWARE1), and set the following configuration in their nova.conf:
[cinder]
cross_az_attach = False
This presents users with an error if they attempt to attach a volume from one availability zone (e.g., AZ_VMWARE1) to another availability zone (e.g., AZ_KVM1). The call would have certainly failed regardless, but with a different error from farther downstream from one of the nova-compute agents, instead of from nova-api. This way, it&8217;s easier for the user to see where they went wrong and correct the problem.
In this case, the availability zone also acts as a tag to remind users which hypervisor their VM resides on.
In my opinion, this is stretching the definition of availability zones as failure domains. VMWare may be considered its own failure domain separate from KVM, but that’s not the primary purpose of creating availability zones this way. The primary purpose is to differentiate hypervisor types. Different hypervisor types have a different set of features and capabilities. If we think about the problem in these terms, there are a number of other solutions that come to mind:

Nova Flavors: Define a “VMWare” set of flavors that map to your VCenter cluster(s). If your tenants that use VMWare are different than your tenants who use KVM, you can make them private flavors, ensuring that tenants only ever see or interact with the hypervisor type they expect.
Cinder: Similarly for Cinder, you can make the VMWare backend private to specific tenants, and/or set quotas differently for that backend, to ensure that tenants will only allocate from the correct storage pools for their hypervisor type.
Image metadata: You can tag your images according to the hypervisor they run on. Set the image property hypervisor_type to vmware for VMDK images, and to qemu for other images. The ImagePropertiesFilter in Nova will then honor the hypervisor placement request. (A hedged CLI sketch of all three approaches follows this list.)
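As a hedged CLI sketch of these three alternatives (the project, backend, and image names are hypothetical), the setup could look roughly like this:

# Private flavor, visible only to the VMWare tenant
openstack flavor create --private --vcpus 4 --ram 8192 --disk 40 vmware.medium
openstack flavor set --project vmware_tenant vmware.medium

# Volume type bound to the VMDK backend, restricted to the same tenant
cinder type-create --is-public false vmware-vmdk
cinder type-key vmware-vmdk set volume_backend_name=vmdk_backend
cinder type-access-add --volume-type vmware-vmdk --project-id <vmware_tenant_id>

# Tag images so Nova schedules them onto the right hypervisor type
openstack image set --property hypervisor_type=vmware rhel7-vmdk
openstack image set --property hypervisor_type=qemu rhel7-qcow2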

Soo… Are availability zones right for me?
So wrapping up, think of availability zones in terms of:

Unplanned failures: If you have a good history of failure data, well understood failure modes, or some known single point of failure that availability zones can help mitigate, then availability zones may be a good fit for your environment.
Planned maintenance: If you have well-understood maintenance processes that are aware of your availability zone definitions and can take advantage of them, then availability zones may be a good fit for your environment. This is where availability zones can provide some of the greatest value, but it is also the most difficult to achieve, as it requires intelligent, availability zone-aware rolling updates and upgrades, and affects how data center personnel perform maintenance activities.
Tenant application design/support: If your tenants are running legacy apps, apps with single points of failure, or do not support use of availability zones in their provisioning process, then availability zones will be of no use for these workloads.
Other alternatives for achieving app availability: Workloads built for geo-redundancy can achieve the same level of HA (or better) in a multi-region cloud. If this were the case for your cloud and your cloud workloads, availability zones would be unnecessary.
OpenStack projects: You should factor into your decision the limitations and differences in availability zone implementations between different OpenStack projects, and perform this analysis for all OpenStack projects in your scope of deployment.

The post The first and final words on OpenStack availability zones appeared first on Mirantis | Pure Play Open Cloud.
Quelle: Mirantis

OpenStack Developer Mailing List Digest January 21-27

SuccessBot Says

dims [1] : Nova now has a python35 based CI job in check queue running Tempest tests (everything running on py35)
markvoelker [2]: Newly published Foundation annual report starts off with interoperability right in the chairman's note [3]
Tell us yours via IRC channels with message “#success <message>”
All: [4]

Get Active in Upstream Training

There is a continuous effort in helping newcomers join our community by organizing upstream contribution trainings [5][6] before every summit.

1.5 to 2 days of hands-on steps toward becoming an active OpenStack contributor.

Like everything else, this is a community effort.

In preparation for the Boston summit and the upcoming PTG in Atlanta, we are looking for coaches and mentors to help us make the training better.
If you’re interested in helping, contact:

Ildiko Vancsa IRC freenode at ildikov or email [7]
Kendall Nelson IRC freenode at diablo_rojo or email [8]

Full thread: [9]

Project Team Gathering Coordination Tool

No central scheduling, beyond rooms being assigned to teams and days.

Each team arranges their time in their room.
List of etherpads [10]

We still need centralized communication beyond each room:

An event IRC channel: openstack-ptg on freenode IRC

Do public service announcements
Pings from room to room.

An EtherCalc spreadsheet-powered dynamic schedule, with extra rooms available:

One fishbowl
A few dark rooms with projectors and screens (not all will have a/v equipment due to budget).
Infra is working on setting up EtherCalc

Full thread: [11]

POST /api-wg/news

API Guidelines proposed for freeze:

Add guidelines on usage of state vs. status [12]
Clarify the status values in versions [13]
Add guideline for invalid query parameters [14]

Guidelines currently under review:

Add guidelines for boolean names [15]
Define pagination guidelines [16]
Add API capabilities discovery guideline [17]

Full thread: [18]

Lots of Teams Without PTL Candidates

We are getting close to the end of the PTL nominations (Jan 29, 2017 23:45 UTC), but several projects are still leaderless:
Community App Catalog
Ec2 API
Fuel
Karbor
Magnum
Monasca
OpenStackClient
OpenStackUX
Packaging Rpm
Rally
RefStack
Requirements
Senlin
Stable Branch Maintenance
Vitrage
Zun
Full thread [19]

 
Quelle: openstack.org

OpenStack Use Cases – New Analyst Papers and Webinar Now Available

 
As the OpenStack market continues to mature, some organizations have made the move and put OpenStack projects into production. They have done this in a variety of ways for a variety of reasons. However, other organizations have waited to see what these first-movers are doing with it and whether or not they are successful before exploring for themselves.
As such, we're pleased to announce the availability of 4 new analyst white papers from 451 Research on how organizations are using OpenStack in production. The information in these papers is based on 451 Research's own insights as well as interviews with customers who have put OpenStack into production.

Here are the four papers:
OpenStack delivers for private cloud users
OpenStack is the leading open source option for private clouds, according to 451 Research. Learn how two organizations gained efficiency with OpenStack.
 
Service providers embracing OpenStack NFV
Communications services providers are increasingly using OpenStack network function virtualization (NFV) as a more flexible infrastructure. Find out why.
 
OpenStack in support of public cloud
This paper discusses 2 organizations that have adopted OpenStack® public cloud and the wisdom of having a trusted partner. Read the analyst paper.
 
Containers rise to the challenge of hybrid IT
Get enhanced security, greater operational efficiency, and more rapid cloud-aware app development. Read about 2 case studies in this analyst paper.
 
In addition to these new papers, 451 Research will be presenting the findings and additional insights in a free public webinar on February 22, 2017. You can sign up for the webinar here:
How organizations are using OpenStack – 4 use cases
In this webinar, 451 Research's Al Sadowski will teach you:

What kinds of organizations are adopting OpenStack.
What drives them to adopt OpenStack in the first place.
What they hope to achieve with OpenStack, and their progress thus far.
What the future holds for organizations adopting OpenStack.

So if you're one of the organizations that wants to know how others are using OpenStack, I encourage you to download the papers above and join us on February 22 for the webinar.
 
Quelle: RedHat Stack

OpenStack Developer Mailing List Digest January 14-20

SuccessBot Says

stevemar 1 : number of open keystone bugs < 100!
morgan 2 : Good policy meeting, provided history and background that cleared up a lot of confusion
Tell us yours via OpenStack IRC channels with message “#success <message>”
All

FIPS Compliance

Previous threads 3 have been discussing enabling Federal Information Processing Standards (FIPS).
Various OpenStack projects make md5 calls. Not for security purposes, just hash generation, but even that blocks enabling FIPS.
A patch has been proposed for newer versions of Python that lets users indicate whether these calls are used for security purposes or not 4.

Won’t land until the next versions of Python, but it is already in place for current RHEL and CentOS versions.
We will create a wrapper around md5 that checks the signature of hashlib.md5 and passes a useforsecurity=False parameter when it is supported.

Steps forward:

Create the wrapper
Replace all md5 calls in OpenStack projects with the wrapper.

Unfortunately the patch 4 has seen no progress since 2013. We should get that merged first.

Even if this did land, it would be a while before it was adopted, since it would land in Python 3.7.

Full thread

Refreshing and Revalidating API Compatibility Guidelines

In the last TC meeting 5, a tag was in review for supporting API compatibility 6.
The tag evaluates projects by using the API guideline which is out of date 7.

A review has been posted to refresh these guidelines 8 .
API compatibility over time is a fundamental aspect of OpenStack interoperability. Not only do we need to get it right, we need to make sure we understand it.

Full thread

Base Services

In OpenStack, all components can assume that a number of external services will be present and available (e.g. a message queue, a database).
The Architecture working group has started this effort 9 .
This proposal 10 is a prerequisite in order for us to have more strategic discussions about adding base services.
Review the proposal and/or join the Architecture working group meeting 11
Once solidified, the Technical Committee will have a final discussion and approval.
Full thread

Improving Vendor Discoverability

In previous Technical Committee meetings, it was agreed that vendor discoverability needs to be improved.
This is done today with the OpenStack Foundation marketplace 12 .

This is powered by the community-driven project called DriverLog, which is a big JSON file 13.

Various people in the community did not know how the marketplace worked and were unhappy that the projects themselves weren't owning it.
The goal of this discussion is to have this process be more community driven than it is today.
Suggestion: Split DriverLog into smaller JSON files that live inside each project to maintain.

Projects will set how they validate vendors into this list.
There’s a trend today toward third-party CIs being the validation mechanism of choice 14.

Full thread

Nominations for OpenStack PTLs Are Now Open!

Will remain open until January 29, 2017 23:45 UTC.
Candidates must submit a text file to the openstack/election repository 15

Filename convention is $cyclename/$projectname/$ircname.txt.
To be eligible, you need to have contributed an accepted patch to one of the corresponding program’s projects 16 during the Newton-Ocata timeframe (April 11, 2016 00:00 UTC to January 23, 2017 23:59 UTC).

Additional information about the nomination process 17
Approved candidates will be listed 18.
Electorate should confirm their email address in Gerrit 19 in Settings → Contact Information → Preferred Email prior to Jan 25, 2017 23:59 UTC.
Full thread

The Process of Creating stable/ocata branches

As previously mentioned 20, it’s possible for teams to set up stable branches when ready.
The release team will not be automatically setting up branches this cycle.

The release liaison within each team will need to inform the release team when they are ready.
The PTL or release liaison may request a new branch by submitting a patch to the openstack/releases repository specifying the tagged version to be used as the base of the branch.

Guidelines for when projects should branch:

Projects using the cycle-with-milestone release model should include the request for their stable branch along with the RC1 tag request (target week is R-3 week, so use Feb 2 as the deadline)
Library projects should be branched with, or shortly after, their final release this week (use Jan 19 as the deadline)
I will branch the requirements repository shortly after all of the cycle-with-milestone projects have branched. After the requirements repository is branched and the master requirements list is opened, projects that have not branched will be tested with requirements as the requirements master branch advances while stable/ocata stays stable. Waiting too long to create the stable/ocata branch may result in broken CI jobs in either stable/ocata or master. Don't delay any further than necessary.
Projects using the cycle-trailing release model should branch by R-0 (23 Feb). The remaining two weeks before the trailing deadline should be used for last-minute fixes, which will need to be backported into the branch to create the final release.
Other projects, including cycle-with-intermediary and independent  projects that create branches, should request their stable branch when they are ready to declare a final version and start working on Pike-related changes. This must be completed before the final release week, use 16 Feb as the deadline.

See the README.rst file in openstack/releases for more details about how to format branch specifications.
Full thread

Why Are Projects Trying To Avoid Barbican, Still?

Some projects want to implement their own secret storage to avoid Barbican, or to avoid adding a dependency on it.

Some developers are doing this to make operators’ lives simpler.

Barbican Positives:

Barbican has been around for a few years and deployed by several companies that have probably been audited for security purposes.
Most of the technology involved in Barbican is proven to be secure. This has been analyzed by OpenStack’s own security group.
Doesn’t have a requirement on hardware TPM, so no hardware cost.
Several services provide the option of using Barbican, but it is not a hard requirement.

Feedback of problems with Barbican:

Relying on something that cannot be guaranteed to be present in a deployment.

The base service 9 proposal could help with this.

OpenStack specific solution. Some companies are using solutions that integrate with other things:

Keywhiz 21 to work with Kubernetes and their existing systems.

Devstack plugin just sets up Barbican. It’s not actually configuring any existing services to use it.
No fixed key manager for testing. The Barbican team pushed back on maintaining this because it’s not secure.
API stability: version 2 → 3 changes were made without a deprecation path or guarantees.
Tokens are open ended for users. Keystone and Barbican need to be much closer.

Castellan provides an abstraction for key management, but today it only supports Barbican.
Rackspace recently made Barbican available. Maybe it’s easier now to perform an HA deployment.
Full thread

POST /api-wg/news

New guidelines:

Accurate status code vs backwards compatibility 22
Fix no sample file in browser 23

Guidelines proposed for freeze:

Add guidelines on usage of state vs. status 24
Clarify the status values in versions 25
Add guideline for invalid query parameters 26

Under review:

Add guidelines for boolean names 27
Define pagination guidelines 28
Add API capabilities discovery guideline 29

Full thread

Release Countdown for Week R-4 Jan 23-27

Focus:

This week begins feature freeze for all milestone-based projects.
No feature patches should be landed after this point.
PTLs may grant exceptions
Soft string freeze begins.

Review teams should reject any modifications to user-facing strings.

Requirement freeze begins.

Only critical requirements and constraints changes will be allowed.

Release Tasks:

Prepare final release and branch requests for all client libraries.
Review stable branches for unreleased changes and prepare those releases.
Milestone-based projects should ensure that membership of the $project-release gerrit groups is up to date with the team who will finalize the project release.

General Notes:

RC1 target week in R-3 is only one week after freeze.

Important Dates:

Ocata 3 Milestone, with Feature and Requirements Freezes: 26 Jan
Ocata RC1 target: 2 Feb
Ocata Final Release candidate deadline: 16 Feb
Ocata release schedule 30

Full thread

 
[1] - http://eavesdrop.openstack.org/irclogs/%23openstack-keystone/%23openstack-keystone.2017-01-18.log.html
[2] - http://eavesdrop.openstack.org/irclogs/%23openstack-keystone/%23openstack-keystone.2017-01-18.log.html
[3] - http://lists.openstack.org/pipermail/openstack-dev/2016-November/107035.html
[4] - http://bugs.python.org/issue9216
[5] - http://eavesdrop.openstack.org/meetings/tc/2017/tc.2017-01-17-20.00.log.html
[6] - https://review.openstack.org/#/c/418010/
[7] - http://specs.openstack.org/openstack/api-wg/guidelines/evaluating_api_changes.html
[8] - https://review.openstack.org/#/c/421846/
[9] - https://review.openstack.org/421956
[10] - https://review.openstack.org/421957
[11] - http://eavesdrop.openstack.org/
[12] - https://www.openstack.org/marketplace/drivers/
[13] - http://git.openstack.org/cgit/openstack/driverlog/tree/etc/default_data.json
[14] - https://etherpad.openstack.org/p/driverlog-validation
[15] - http://governance.openstack.org/election/how-to-submit-your-candidacy
[16] - http://git.openstack.org/cgit/openstack/governance/tree/reference/projects.yaml
[17] - https://governance.openstack.org/election/
[18] - https://governance.openstack.org/election/pike-ptl-candidates
[19] - https://review.openstack.org
[20] - http://lists.openstack.org/pipermail/openstack-dev/2016-December/108923.html
[21] - https://github.com/square/keywhiz
[22] - https://review.openstack.org/#/c/422264/
[23] - https://review.openstack.org/#/c/421084/
[24] - https://review.openstack.org/#/c/411528/
[25] - https://review.openstack.org/#/c/411849/
[26] - https://review.openstack.org/417441
[27] - https://review.openstack.org/#/c/411529/
[28] - https://review.openstack.org/#/c/390973/
[29] - https://review.openstack.org/#/c/386555/
[30] - http://releases.openstack.org/ocata/schedule.html
Quelle: openstack.org

A dash of Salt(Stack): Using Salt for better OpenStack, Kubernetes, and Cloud — Q&A

The post A dash of Salt(Stack): Using Salt for better OpenStack, Kubernetes, and Cloud — Q&A appeared first on Mirantis | The Pure Play OpenStack Company.
On January 16, Ales Komarek presented an introduction to Salt. We covered the following topics:

The model-driven architectures behind how Salt stores topologies and workflows

How Salt provides solution adaptability for any custom workloads

Infrastructure as Code: How Salt provides not only configuration management, but entire life-cycle management

How Continuous Delivery/ Integration/ Management fits into the puzzle

How Salt manages and scales parallel cloud deployments that include OpenStack, Kubernetes and others

What we didn’t do, however, is get to all of the questions from the audience, so here’s a written version of the Q&A, including those we didn’t have time for.
Q: Why Salt?
A: It’s Python, it has a huge and growing base of imperative modules and declarative states, and it has a good message bus.
Q: What tools are used to initially provision Salt across an infrastructure? Cobbler, Puppet, MAAS?
A: To create a new deployment, we rely on a single node, where we bootstrap the Salt master and Metal-as-a-Service (formerly based on Foreman, now Ironic). Then we control the MaaS service to deploy the physical bare-metal nodes.
Q: How broad a range of services do you already have recipes for, and how easy is it to write and drop in new ones if you need one that isn&8217;t already available?
A: The ecosystem is pretty vast. You can look at either https://github.com/tcpcloud or the formula ecosystem overview at http://openstack-salt.tcpcloud.eu/develop/extending-ecosystem.html. There are also guidelines for creating new formulas, which is a very straightforward process. A new service can be created in a matter of hours, or even minutes.
Q: Can you convert your existing Puppet/Ansible scripts to Salt, and what would I search to find information about that?
A: Yes, we have reverse-engineered automation for some of these services in the past. For example, we were deeply inspired by the Ansible module for Gerrit resource management. You can find some information on creating Salt Formulas at https://docs.saltstack.com/en/latest/topics/development/conventions/formulas.html, and we will be adding tutorial material here on this blog in the near future.
Q: Is there a NodeJS binding available?
A: If you meant the NodeJS formula to set up a NodeJS environment, yes, there is such a formula. If you mean bindings to the system, you can use the Salt API to integrate NodeJS with Salt.
Q: Have you ever faced performance issues when storing a lot of data in pillars?
A: We have not faced performance issues with pillars that are delivered by the reclass ENC. It has been tested up to a few thousand nodes.
Q: What front end GUI is typically used with Salt monitoring (e.g., Kibana, Grafana, …)?
A: Salt monitoring uses Sensu or StackLight for the actual functional monitoring checks. It uses Kibana to display events stored in Elasticsearch and Grafana to visualize metrics coming from time-series databases such as Graphite or Influx.
Q: What is the name of the salt PKI manager? (Or what would I search for to learn more about using salt for infrastructure-wide PKI management?)
A: The PKI feature is well documented in the Salt docs, and is available at https://docs.saltstack.com/en/latest/ref/states/all/salt.states.x509.html.
Q: Can I practice installing and deploying SaltStack on my laptop? Can you recommend a link?
A: I’d recommend you have a look at http://openstack-salt.tcpcloud.eu/develop/quickstart-vagrant.html where you can find a nice tutorial on how to set up a simple infrastructure.
Q: Thanks for the presentation! Within Heat, I’ve only ever seen salt used in terms of software deployments. What we’ve seen today, however, goes clear through to service, resource, and even infrastructure deployment! In this way, does Salt become a viable alternative to Heat? (I’m trying to understand where the demarcation is between the two now.)
A: Think of Heat as the part of the solution responsible for spinning up the hardware resources such as networks, routers and servers, in a way that is similar to MaaS, Ironic or Foreman. Salt’s part begins where Heat’s part ends: after the resources are started, Salt takes over and finishes the installation/configuration process.
Q: When you mention Orchestration, how does salt differentiate from Heat, or is Salt making Heat calls?
A: Heat is more for hardware resource orchestration. It has some capability to do software configuration, but it is rather limited. We have created Heat resources that help to classify resources on the fly. We also have Salt Heat modules capable of running a Heat stack.
Q: Will you be showing any parts of SaltStack Enterprise, or only FREE Salt Open Source? Do you use Salt in Multi-Master deployment?
A: We are using the open source version of SaltStack; the enterprise version offers little gain given the pricing model. In some deployments, we use Salt master HA setups.
Q: What HA engine is typically used for the Salt master?
A: We use 2 separate masters with shared storage provided by GlusterFS, on which the master’s and minions’ keys are stored.
Q: Is there a GUI ?
A: The creation of a GUI is currently under discussion.
Q: How do you enforce Role Based Administration in the Salt Master? Can you segregate users to specific job roles and limit which jobs they can execute in Salt?
A: We use the ACLs of the Salt master to limit the user’s options. This also applies to the Jenkins-powered pipelines, which we also manage with Salt, both on the job and the user side.
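For illustration only (the user name and minion target pattern are hypothetical), a publisher_acl entry in the Salt master configuration restricting a user to a couple of modules on a subset of minions could look like this (older Salt releases call this option client_acl):

# /etc/salt/master
publisher_acl:
  webops:
    - 'web*':
      - test.ping
      - state.apply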
Q: Can you show the salt files (.sls, pillar, …)?
A: You can look at the GitHub repositories for existing formulas at https://github.com/tcpcloud, and a good example of pillars can be found at https://github.com/Mirantis/mk-lab-salt-model/.
Q: Is there a link for deploying Salt for Kubernetes? Any best practices guide?
A: The best place to look is the https://github.com/openstack/salt-formula-kubernetes README.
Q: Is SaltStack the same as what&8217;s on saltstack.com, or is it a different project?
A: These are the same project. Saltstack.com is company that is behind the Salt technology and provides support and enterprise versions.
Q: So far this looks like what Chef can do. Can you make a comparison or focus on the “value add” from Salt that Chef or Puppet don’t give you?
A: The replaceability/reusability of the individual components is very easy, as all formulas are “aware” of the rest and share a common form and a single dependency tree. This is a problem with community-based formulas in either of the other tools, as they are not very compatible with each other.
Q: In terms of purpose, is there any difference between SaltStack vs Openstack?
A: Apart from the fact that SaltStack can install OpenStack, it can also provide virtualization capabilities. However, Salt has very limited options, while OpenStack supports complex production level scenarios.
Q: Great webinar guys. Ansible seems to have a lot of traction as means of deploying OpenStack. Could you compare/contrast with SaltStack in this context?
A: With Salt, the OpenStack services are just part of wider ecosystem; the main advantage comes from the consistency across all services/formulas, the provision of support metadata to provide documentation or monitoring features.
Q: How is Salt better than Ansible/Puppet/Chef ?
A: The biggest difference is the message bus, which lets you control, and get data from, the infrastructure with great speed and concurrency.
Q: Can you elaborate on Mirantis Fuel vs. SaltStack?
A: Fuel is an open source project that was (and is) designed to deploy OpenStack from a single ISO-based artifact, and to provide various lifecycle management functions once the cluster has been deployed. SaltStack is designed to be more granular, working with individual components or services.
Q: Are there plans to integrate SaltStack in to MOS?
A: The Mirantis Cloud Platform (MCP) will be powered by Salt/Reclass.
Q: Is Fuel obsolete or it will use Salt in the background instead of Puppet?
A: Fuel in its current form will continue to be used for deploying Mirantis OpenStack in the traditional manner (as a single ISO file). We are extending our portfolio of life cycle management tools to include appropriate technologies for deploying and managing open source software in MCP. For example, Fuel CCP will be used to deploy containerized OpenStack on Kubernetes. Similarly, Decapod will be used to deploy Ceph. All of these lifecycle management technologies are, in a sense, Fuel. Whether a particular tool uses Salt or Puppet will depend on what it’s doing.
Q: MOS 10 release date?
A: We’re still making plans on this.
Thanks for joining us, or if you missed it, please go ahead and view the webinar.
The post A dash of Salt(Stack): Using Salt for better OpenStack, Kubernetes, and Cloud — Q&A appeared first on Mirantis | The Pure Play OpenStack Company.
Quelle: Mirantis

9 tips to properly configure your OpenStack Instance

In OpenStack jargon, an Instance is a Virtual Machine, the guest workload. It boots from an operating system image, and it is configured with a certain amount of CPU, RAM and disk space, amongst other parameters such as networking or security settings.
In this blog post kindly contributed by Marko Myllynen we’ll explore nine configuration and optimization options that will help you achieve the required performance, reliability and security that you need for your workloads.
Some of the optimizations can be done inside a guest regardless of what the OpenStack Cloud Administrator has enabled in your cloud. However, more advanced options require prior enablement and, possibly, special host capabilities. This means many of the options described here will depend on how the Administrator configured the cloud, or may not be available to some tenants as they are reserved for certain groups. More information about this subject can be found on the Red Hat Documentation Portal and its comprehensive guide on the OpenStack Image Service. Similarly, the upstream OpenStack documentation has some extra guidelines available.
The following configurations should be evaluated for any VM running on any OpenStack environment. These changes have no side effects and are typically safe to enable even if unused.

1) Image Format: QCOW or RAW?
OpenStack storage configuration is an implementation choice by the Cloud Administrator, often not fully visible to the tenant. Storage configuration may also change over the time without explicit notification by the Administrator, as he/she adds capacity with different specs.
When creating a new instance on OpenStack, it is based on a Glance image. The two most prevalent and recommended image formats are QCOW2 and RAW. QCOW2 images (from QEMU Copy On Write) are typically smaller in size. For instance, for a server with a 100 GB disk, an image that would be 100 GB in RAW format might be only 10 GB when formatted into QCOW2. Regardless of the format, it is a good idea to process images before uploading them to Glance with virt-sysprep(1) and virt-sparsify(1).
The performance of QCOW2 depends on both the hypervisor kernel and the format version, the latest being QCOW2v3 (sometimes referred to as QCOW3) which has better performance than the earlier QCOW2, almost as good as RAW format. In general we assume RAW has better overall performance despite the operational drawbacks (like the lack of snapshots) or the increase in time it takes to upload or boot (due to its bigger size). Our latest versions of Red Hat OpenStack Platform automatically use the newer QCOW2v3 format (thanks to the recent RHEL versions) and it is possible to check and also convert between RAW and older/newer QCOW2 images with qemu-img(1).
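As a hedged example of that workflow (the file and image names are hypothetical), preparing, inspecting, and converting an image before uploading it to Glance could look like this:

virt-sysprep -a rhel7-guest.qcow2                         # remove host-specific data such as SSH host keys and logs
virt-sparsify rhel7-guest.qcow2 rhel7-guest-sparse.qcow2  # reclaim unused space
qemu-img info rhel7-guest-sparse.qcow2                    # QCOW2v3 images report "compat: 1.1"
qemu-img convert -f qcow2 -O raw rhel7-guest-sparse.qcow2 rhel7-guest.raw
openstack image create --disk-format raw --container-format bare --file rhel7-guest.raw rhel7-raw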
OpenStack instances can either boot from a local image or from a remote volume. That means

Image-backed instances benefit significantly by the performance difference between older QCOW2 vs QCOW2v3 vs RAW.
Volume-backed instances can be created from either QCOW2 or RAW Glance images. However, as Cinder backends are vendor-specific (Ceph, 3PAR, EMC, etc.), they may use neither QCOW2 nor RAW. They may have their own mechanisms, like dedup, thin provisioning or copy-on-write.

As a general rule of thumb, rarely used images should be stored in Glance as QCOW2, but for an image which is used constantly to create new instances (locally stored), or for any volume-backed instances, using RAW should provide better performance despite the sometimes longer initial boot time (except in Ceph-backed systems, thanks to their copy-on-write approach). In the end, any actual recommendation will depend on the OpenStack storage configuration chosen by the Cloud Administrator.
2) Performance Tweaks via Image Extra Properties
Since the Mitaka version, OpenStack allows Nova to automatically optimize certain libvirt and KVM properties on the Compute host to better execute a particular OS in the guest. To provide the guest OS information to Nova, just define the following Glance image properties:

os_type=linux # Generic name, like linux or windows
os_distro=rhel7.1 # Use osinfo-query os to list supported variants

Additionally, at least for the time being (see BZ#), in order to make sure the newer and more scalable virtio-scsi para-virtualized SCSI controller is used instead of the older virtio-blk, the following properties need to be set explicitly:

hw_scsi_model=virtio-scsi
hw_disk_bus=scsi
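For reference, all of these properties can be set on an existing Glance image with the unified CLI; the image name below is just an example:

openstack image set \
  --property os_type=linux \
  --property os_distro=rhel7.1 \
  --property hw_scsi_model=virtio-scsi \
  --property hw_disk_bus=scsi \
  rhel7-guest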

All the supported image properties are listed at the Red Hat Documentation portal as well as other CLI options. 
3) Prepare for Cloud-init
“Cloud-init” is a package used for early initialization of cloud instances, to configure basics like partition / filesystem size and SSH keys.
Ensure that you have installed the cloud-init and cloud-utils-growpart packages in your Glance image, and that the related services will be executed on boot, to allow the execution of “cloud-init” configurations to the OpenStack VM.
In many cases the default configuration is acceptable but there are lots of customization options available, for details please refer to the cloud-init documentation.
4) Enable the QEMU Guest Agent
On Linux hosts, it is recommended to install and enable the QEMU guest agent which allows graceful guest shutdown and (in the future) automatic freezing of guest filesystems when snapshots are requested, which is a necessary operation for consistent backups (see BZ#):

yum install qemu-guest-agent
systemctl enable qemu-guest-agent

In order to provide the needed virtual devices and use the filesystem freezing functionality when needed, the following properties need to be defined for Glance images (see also BZ#):

hw_qemu_guest_agent=yes # Create the needed device to allow the guest agent to run
os_require_quiesce=yes # Accept requests to freeze/thaw filesystems

5) Just in case: how to recover from guest failure

Comprehensive instance fault recovery, high availability, and service monitoring require a layered approach which, as a whole, is out of scope for this document. In the paragraphs below we show the options that are applicable purely inside a guest (which can be thought of as the innermost layer). The most frequently used fault recovery mechanisms for an instance are:

recovery from kernel crashes
recovery from guest hangs (which do not necessarily involve kernel crash/panic)

In the rare case the guest kernel crashes, kexec/kdump will capture a kernel vmcore for further analysis and reboot the guest. If the vmcore is not wanted, the kernel can be instructed to simply reboot after a kernel crash by setting the panic kernel parameter, for example “panic=1”.
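As a rough sketch of how that could be set on a RHEL-style guest (the exact mechanism depends on your distribution and bootloader):

grubby --update-kernel=ALL --args="panic=1"      # add the boot parameter to all installed kernels
# or, equivalently, via sysctl:
echo "kernel.panic = 1" >> /etc/sysctl.d/99-panic.conf
sysctl -p /etc/sysctl.d/99-panic.conf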
In order to reboot an instance after other unexpected behavior, for example high load over a certain threshold or a complete system lockup without a kernel panic, the watchdog service can be utilized. Actions other than “reboot” can be found here. The following property needs to be defined for Glance images or Nova flavors.

hw_watchdog_action=reset

Then, install the watchdog package inside the guest, then configure the watchdog device, and finally, enable the service:

yum install watchdog
vi /etc/watchdog.conf
systemctl enable watchdog

By default watchdog detects kernel crashes and complete system lockups. See the watchdog.conf(5) man page for more information, e.g., how to add guest health-monitoring scripts as part of watchdog functionality checks.
6) Tune the Kernel
The simplest way to tune a Linux node is to use the “tuned” facility. It’s a service which configures dozens of system parameters according to the selected profile, which in the OpenStack case is “virtual-guest”. For NFV workloads, Red Hat provides a set of NFV tuned profiles to simplify the tuning of network-intensive VMs.
In your Glance image, it is recommended to install the required package, enable the service on boot, and activate the preferred profile. You can do it by editing the image before uploading to Glance, or as part of your cloud-init recipe:

yum install tuned
systemctl enable tuned
tuned-adm profile virtual-guest
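If you go the cloud-init route instead, a minimal cloud-config user-data sketch (assuming the tuned package is available from your configured repositories) could be:

#cloud-config
packages:
  - tuned
runcmd:
  - [ systemctl, enable, tuned ]
  - [ systemctl, start, tuned ]
  - [ tuned-adm, profile, virtual-guest ]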

7) Improve networking via VirtIO Multiqueuing
Guest kernel virtio drivers are part of the standard RHEL/Linux kernel package and enabled automatically without any further configuration as needed. Windows guests should also use the official virtio drivers for their particular Windows version, greatly improving network and disk IO performance.
However, recent advances in network packet processing, both in the Linux kernel and in user-space components, have created a myriad of extra options to tune or bypass the virtio drivers. (See the illustration of the virtio device model in the RHEL Virtualization guide.)
Network multiqueuing, or virtio-net multi-queue, is an approach that enables parallel packet processing to scale linearly with the number of available vCPUs of a guest, often providing notable improvement to transfer speeds especially with vhost-user.
Provided that the OpenStack Administrator has provisioned the virtualization hosts with the supporting components installed (at least OVS 2.5 / DPDK 2.2), this functionality can be enabled by the OpenStack tenant with the following property in those Glance images where we want network multiqueuing:

hw_vif_multiqueue_enabled=true

Inside a guest instantiated from such an image, the NIC channel setup can be checked and changed as needed with the commands below:

ethtool -l eth0 # to see the current number of queues
ethtool -L eth0 combined <nr-of-queues> # to set the number of queues; should match the number of vCPUs

There is an open RFE to implement multi-queue activation by default in the kernel, see BZ#.

8) Other Miscellaneous Tuning for Guests

It should go without saying that right-sized instances should contain only the minimum amount of installed packages and run only the services needed. Of particular note, it is probably a good idea to install and enable the irqbalance service as, although not absolutely necessary in all scenarios, its overhead is minimal and it should be used, for example, in SR-IOV setups (this way the same image can be used regardless of such lower-level details).
Even though implicitly set on KVM, it is a good idea to explicitly add the kernel parameter no_timer_check to prevent issues with timing devices. Enabling persistent DHCP client and disabling zeroconf route in network configuration with PERSISTENT_DHCLIENT=yes and NOZEROCONF=yes, respectively, helps to avoid networking corner case issues.
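A hedged sketch of where those settings live on a RHEL-style guest (the interface name is an assumption):

# Kernel parameter, added to all installed kernels
grubby --update-kernel=ALL --args="no_timer_check"

# /etc/sysconfig/network-scripts/ifcfg-eth0
PERSISTENT_DHCLIENT=yes

# /etc/sysconfig/network
NOZEROCONF=yes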
Guest MTU settings are usually adjusted correctly by default, but having a proper MTU in use on all levels of the stack is crucial to achieve maximum network performance. In environments with 10G (and faster) NICs this typically means the use of Jumbo Frames with MTU up to 9000, taking possible VXLAN encapsulation into account. For further MTU discussion, see the upstream guidelines for MTU or the Red Hat OpenStack Networking Guide.
9) Improving the way you access your instances
Although some purists may consider running SSH inside truly cloud-native instances to be incompatible with the model, especially in auto-scaling production workloads, most of us will still rely on good old SSH to perform configuration tasks (via Ansible, for instance) as well as maintenance and troubleshooting (e.g., to fetch logs after a software failure).
The SSH daemon should avoid DNS lookups to speed up establishing SSH connections. For this, consider using UseDNS no in /etc/ssh/sshd_config and adding OPTIONS=-u0 to /etc/sysconfig/sshd (see sshd_config(5) for details on these). Setting GSSAPIAuthentication no could be considered if Kerberos is not in use. In case instances frequently connect to each other, the ControlPersist / ControlMaster options might be considered as well.
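A minimal sketch of those settings:

# /etc/ssh/sshd_config
UseDNS no
GSSAPIAuthentication no

# /etc/sysconfig/sshd
OPTIONS=-u0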
Typically, remote SSH access and console access via Horizon are enough for most use cases. During the development phase, direct console access from the Nova compute host may also be helpful. For this to work, enable the serial-getty@ttyS1.service, allow root access via ttyS1 if needed by adding ttyS1 to /etc/securetty, and then access the guest console from the Nova compute host with virsh console <instance-id> --devname serial1.

We hope that with this blog post you’ve discovered new ways to improve the performance of your OpenStack instances. If you need more information, remember we have tons of documents in our OpenStack Documentation Portal and that we offer the best OpenStack courses in the industry, starting with the free of charge CL010 Introduction to OpenStack Course.
Quelle: RedHat Stack

Docker for Windows Server and Image2Docker

In December we had a live webinar focused on Windows Server Docker containers. We covered a lot of ground and we had some great feedback – thanks to all the folks who joined us. This is a brief recap of the session, which also gives answers to the questions we didn’t get round to.
Webinar Recording
You can view the webinar on YouTube:

The recording clocks in at just under an hour. Here’s what we covered:

00:00 Introduction
02:00 Docker on Windows Server 2016
05:30 Windows Server 2016 technical details
10:30 Hyper-V and Windows Server Containers
13:00 Docker for Windows Demo – ASP.NET Core app with SQL Server
25:30 Additional Partnerships between Docker, Inc. and Microsoft
27:30 Introduction to Image2Docker
30:00 Demo – Extracting ASP.NET Apps from a VM using Image2Docker
52:00 Next steps and resources for learning Docker on Windows

Q&A
Can these [Windows] containers be hosted on a Linux host?
No. Docker containers use the underlying operating system kernel to run processes, so you can’t mix and match kernels. You can only run Windows Docker images on Windows, and Linux Docker images on Linux.
However, with an upcoming release to the Windows network stack, you will be able to run a hybrid Docker Swarm – a single cluster containing a mixture of Linux and Windows hosts. Then you can run distributed apps with Linux containers and Windows containers communicating in the same Docker Swarm, using Docker’s networking layer.
Is this only for ASP.NET Core apps?
No. You can package pretty much any Windows application into a Docker image, provided it can be installed and run without a UI.
The first demo in the Webinar showed an ASP.NET Core app running in Docker. The advantage with .NET Core is that it’s cross-platform so the same app can run in Linux or Windows containers, and on Windows you can use the lightweight Nano Server option.
In the second demo we showed ASP.NET WebForms and ASP.NET MVC apps running in Docker. Full .NET Framework apps need to use the Windows Server Core base image, but that gives you access to the whole feature set of Windows Server 2016.
If you have existing ASP.NET applications running in VMs, you can use the Image2Docker tool to port them across to Docker images. Image2Docker works on any Windows Server VM, from Server 2003 to Server 2016.

How does licensing work?
For production, licensing is at the host level, i.e. each machine or VM which is running Docker. Your Windows licence on the host allows you to run any number of Windows Docker containers on that host. With Windows Server 2016 you get the commercially supported version of Docker included in the licence costs, with support from Microsoft and Docker, Inc.
For development, Docker for Windows runs on Windows 10 and is free, open-source software. Docker for Windows can also run a Linux VM on your machine, so you can use both Linux and Windows containers in development. Like the server version, your Windows 10 licence allows you to run any number of Windows Docker containers.
Windows admins will want a unified platform for managing images and containers. That’s Docker Datacenter which is separately licensed, and will be available for Windows soon.
What about Windows updates for the containers?
Docker containers have a different life cycle from full VMs or bare-metal servers. You wouldn’t deploy an app update or a Windows update inside a running container – instead you update the image that packages your app, then just kill the container and start a new container from the updated image.
Microsoft are supporting that workflow with the two Windows base images on Docker Hub – for Windows Server Core and Nano Server. They are following a monthly release cycle, and each release adds an incremental update with new patches and security updates.
For your own applications, you would aim to have the same deployment schedule – after a new release of the Windows base image, you would rebuild your application images and deploy new containers. All this can be automated, so it’s much faster and more reliable than manual patching. Docker Captain Stefan Scherer has a great blog post on keeping your Windows containers up to date.
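As a rough sketch of that workflow (the application image name and tag are hypothetical; microsoft/windowsservercore was the Docker Hub name of the base image at the time), picking up a new base image is just a pull, a rebuild, and a container replacement:

docker pull microsoft/windowsservercore
docker build -t my-asp-net-app:2017-01 .
docker stop my-app
docker rm my-app
docker run -d --name my-app my-asp-net-app:2017-01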
Additional Resources

Get everything Docker and Microsoft here
Windows and Docker case-study: Tyco’s life-safety applications 
Self-paced labs for Windows Containers from Docker
Packaging ASP.NET 4.5 Applications in Docker
Subscribe to Docker’s weekly newsletter


The post Docker for Windows Server and Image2Docker appeared first on Docker Blog.
Quelle: https://blog.docker.com/feed/

OpenStack Developer Mailing List Digest January 7-13

SuccessBot Says

dims 1: Rally running against Glance (Both Rally and Glance using py3.5).
AJaegar 2: docs.openstack.org is served from the new Infra file server that is AFS based.
jd 3: Gnocchi 3.1 will be shipped with an empty /etc and will work without any config file by default.
cdent 4: edleafe narrowed down an important bug in gabbi.
Tell us yours via OpenStack IRC channels with message “#success <message>”
All

Return of the Architecture Working Group

Meeting times alternate: even weeks Thursday at 20:00 UTC, odd weeks Thursday at 01:00 UTC
Currently two proposals:

“Base Services” proposal 5 recognizes components leveraging features from external services that OpenStack components can assume will be present. Two kinds:

Local (like a hypervisor on a compute node)
Global (like a database)

“Nova Compute API” proposal 6 breaking nova-compute out of Nova itself.

Full thread

Restarting Service-types-authority / service catalog work

In anticipation of having a productive time in Atlanta for the PTG, various patches have been refreshed 7.
Two base IaaS services aren’t in the list yet because of issues:

Neutron / network – discrepancy between the common use of “network” and “networking” in the API reference URL. For the other services in the list, the service-type and the URL name for the API reference are the same.
Cinder / volume – Moving forward from using volumev2 and volumev3 in devstack.

Full thread

Feedback From Driver Maintainers About Future of Driver Projects

Major observations

Yes drivers are an important part of OpenStack.
Discoverability of drivers needs to be fixed immediately.
It’s important to have visibility in a central place of the status of each driver.
Both driver developer and a high level person at a company should feel they’re part of something.
Give drivers access to publish to docs.openstack.org.
What constitutes a project was never defined with drivers in mind. Drivers are part of the project. Driver developers contribute to OpenStack by creating drivers.

Discoverability:

Consensus: it is currently all over the place 8 9 10.
There should be CI results available.
Discoverability can be fixed independently of governance changes.

Driver projects official or not?

Vendors with out-of-tree drivers have a desire to become “official” OpenStack projects.
Opinion: let driver projects become official without CI requirements.
Opinion: Do not allow drivers projects to become official, that doesn’t mean they shouldn’t easily be discoverable.
Opinion: We don’t need to open the flood gates of allowing vendors to be teams in the OpenStack governance to make the vendors’ developers happy.
Fact: This implies being placed under the TC oversight. It is a significant move that could have unintended side-effects, it is hard to reverse (kicking out teams we accepted is worse than not including them in the first place), and our community is divided on the way forward. So we need to give that question our full attention and not rush the answer.
Opinion: Consider DriverLog 11 an official OpenStack project to be listed under governance with a PTL, weekly meetings, and all that is required to allow the team to be effective in their mission of keeping the marketplace a trustworthy resource for learning about the OpenStack driver ecosystem.

Driver Developers:

Opinion: A driver developer that ONLY contributes to vendor specific driver code should not have the same influence as other OpenStack developers, voting for PTL, TC, and ATC status.
Opinion: PTLs should leverage the extra-atcs option in the governance repo.

In-tree VS out-of-tree

Cinder has in-tree drivers, but also has out-of-tree drivers when their CI is not maintained or when minimum feature requirements are not met. They are marked as ‘not supported’ and have a single release to get things working before being moved out-of-tree.
Ironic has a single out-of-tree repo 12 – but also in-tree drivers 13.
Neutron has all drivers out-of-tree, with project names like: ‘networking-cisco’.
Many opinions on the “stick-based” approach the cinder team took.
Opinion: The in-tree vs out-of-tree argument is developer focused. Out-of-tree drivers have obvious benefits (develop quickly, maintain their own team, no need for a core to review the patch). But a vendor that is looking to make sure a driver is supported will not be searching git repos (goes back to discoverability).
Opinion: May be worth handling the projects that keep supported drivers in-tree differently than we handle projects that have everything out-of-tree.

Full thread

POST /api-wg/news

Guidelines currently under review:

Add guidelines on usage of state vs. status 14
Add guidelines for boolean names 15
Clarify the status values in versions 16
Define pagination guidelines 17
Add API capabilities discovery guideline 18
Add guideline for invalid query parameters 19

Full thread

New Deadline for PTG Travel Support Program

Help contributors that are not otherwise funded to join their project team gathering 20
Originally the application acceptance was set to close January 15, but it’s now extended to the end-of-day Tuesday January 17th.
Apply now if you need it! 21
Submissions will be evaluated next week and grantees will be notified by Friday, January 20th.
Register for the event 22 if you haven’t yet. Prices will increase on January 24 and February 14.
If you haven’t booked your hotel yet, do so ASAP at the event hotel itself using the PTG room block. This helps us keep costs under control and helps us share the most time with the event participants.

Closes January 27
Book now 23

Full thread

Release Countdown For Week R-5

Focus:

Feature work and major refactoring should be wrapping up as we approach the third milestone.

Release Tasks:

stable/ocata branches will be created and configured with a small subset of the core review team. Release liaisons should ensure that these groups exist and the membership is correct.

General Notes:

We will start the soft string freeze during R-4 (Jan 23-27) 24
Subscribe to the release calendar with your favorite calendaring software 25

Important Dates:

Final release for non-client libraries: January 19
Ocata 3 milestone with feature and requirements freeze: January 26
Ocata release schedule 26

Full thread

 
[1] - http://eavesdrop.openstack.org/irclogs/%23openstack-glance/%23openstack-glance.2017-01-09.log.html
[2] - http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2017-01-10.log.html
[3] - http://eavesdrop.openstack.org/irclogs/%23openstack-telemetry/%23openstack-telemetry.2017-01-11.log.html
[4] - http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2017-01-12.log.html
[5] - http://git.openstack.org/cgit/openstack/arch-wg/tree/proposals/base-services.rst
[6] - https://review.openstack.org/#/c/411527/1
[7] - https://review.openstack.org/#/c/286089/
[8] - http://docs.openstack.org/developer/cinder/drivers.html
[9] - http://docs.openstack.org/developer/nova/support-matrix.html
[10] - http://stackalytics.openstack.org/report/driverlog
[11] - http://git.openstack.org/cgit/openstack/driverlog
[12] - https://git.openstack.org/cgit/openstack/ironic-staging-drivers
[13] - http://git.openstack.org/cgit/openstack/ironic/tree/ironic/drivers
[14] - https://review.openstack.org/#/c/411528/
[15] - https://review.openstack.org/#/c/411529/
[16] - https://review.openstack.org/#/c/411849/
[17] - https://review.openstack.org/#/c/390973/
[18] - https://review.openstack.org/#/c/386555/
[19] - https://review.openstack.org/417441
[20] - http://www.openstack.org/ptg#
[21] - https://openstackfoundation.formstack.com/forms/travelsupportptg_atlanta
[22] - https://pikeptg.eventbrite.com/
[23] - https://www.starwoodmeeting.com/events/start.action?id=1609140999&key=381BF4AA
[24] - https://releases.openstack.org/ocata/schedule.html#-soft-sf
[25] - https://releases.openstack.org/schedule.ics
[26] - http://releases.openstack.org/ocata/schedule.html
Quelle: openstack.org

DockerCon 2017 first speakers announced

To the rest of the world, 2017 may seem a ways away, but here at Docker we are heads down reading your Call for Papers submissions and curating content to make this the biggest and best DockerCon to date. With that, we are thrilled to share with you the DockerCon 2017 Website with helpful information, including ten of the first confirmed speakers and sessions.
If you want to join this amazing lineup and haven’t submitted your cool hack, use case or deep dive session, don’t hesitate! The Call for Papers closes this Saturday, January 14th.
 
Submit a talk
 
First DockerCon speakers
 
Laura Frank
Sr. Software Engineer, Codeship
Everything You Thought You Already Knew About Orchestration
 
 
 

Julius Volz
Co-founder, Prometheus
Monitoring, the Prometheus Way
 
 

 
Liz Rice
Co-founder & CEO, Microscaling Systems
What have namespaces done for you lately?

 
 

 
Thomas Graf
Principal Engineer at Noiro, Cisco
Cilium – BPF & XDP for containers
 
 

 
Brendan Gregg 
Sr. Performance Architect, Netflix
Container Tracing Deep Dive
 
 

 
Thomas Shaw
Build Engineer, Activision
Activision's Skypilot: Delivering amazing game experiences through containerized pipelines
 
 

 
Fabiane Nardon
Chief Scientist at TailTarget
Docker for Java Developers
 
 

 
Arun Gupta
Vice President of Developer Advocacy, Couchbase
Docker for Java Developers
 
 

 
Justin Cappos
Assistant Professor in the Computer Science and Engineering department at New York University
Securing the Software Supply Chain
 
 

 
John Zaccone
Software Engineer
A Developer’s Guide to Getting Started with Docker

Convince your boss to send you to DockerCon
Do you really want to go to DockerCon, but are having a hard time convincing your boss on pulling the trigger to send you? Have you already explained that sessions, training and hands-on exercises are definitely worth the financial investment and time away from your desk?
We want you to join the community and us at DockerCon 2017, so we’ve put together the following packet of event information, including a helpful letter you can use to send to your boss to justify your trip. We are confident there’s something at DockerCon for everyone, so feel free to share within your company and networks.

Download Now
More information about DockerCon 2017:

Register for the conference
Submit a talk
Choose what workshop to attend
Book your Hotel room
Become a sponsor


The post DockerCon 2017 first speakers announced appeared first on Docker Blog.
Quelle: https://blog.docker.com/feed/