4 characteristics that set blockchain apart

I speak to lots of customers who are using or thinking of using blockchain.
Depending on who you speak to, blockchain is either a new power poised to change the way we do business or the latest IT hype.
I believe blockchain has characteristics which mark it as something transformative, perhaps even more transformative than the web.
At its core, blockchain is just a database, one that is particularly good at dealing with transactions about assets, whether they’re financial assets, physical assets such as cars, or something more abstract like customer data.
But blockchain has four key characteristics which make it different:

It is designed to be distributed and synchronized across networks, which makes it ideal for multi-organizational business networks such as supply chains or financial consortia. It also encourages organizations to come out from behind their firewalls and share data.

You can’t just do whatever you want to the data. The types of transactions one can carry out are agreed between participants in advance and stored in the blockchain as “smart contracts,” which helps give confidence that everyone is playing by the rules.

Before one can execute a transaction, there must be agreement between all relevant parties that the transaction is valid. For example, if you’re registering the sale of a cow, that cow must belong to you or you won’t get agreement. This process is known as “consensus” and it helps keep inaccurate or potentially fraudulent transactions out of the database.

Immutability of the data. Once you have agreed on a transaction and recorded it, it can never be changed. You can subsequently record another transaction about that asset to change its state, but you can never hide the original transaction. This gives you provenance of assets, which means that for any asset you can tell where it is, where it’s been and what has happened throughout its life.

Taken together, these four characteristics give organizations a high degree of trust in the data and the business network. That level of trust makes blockchain important for the next generation of business applications.
To understand why, one must understand the nature of trust. To do that, I’m going to take a short detour through 25 centuries of human economic history in my next post.
Learn more about the IBM cloud-based blockchain platform.
The post 4 characteristics that set blockchain apart appeared first on news.
Quelle: Thoughts on Cloud

Watson identifies the best shots at the Masters

Golf fans know great shots when they see them. And now, Watson does, too.
For this year’s Masters Tournament, IBM, which has a long history with the Masters as a technology partner, is making use of Watson’s Cognitive Highlights capability to find those memorable moments and spotlight them at Augusta National Golf Club and in cloud-based streaming video apps. It’s a first for sporting events.
“This year, they really wanted to take the Masters’ digital projects to a new level, so we began thinking about how we can have an immersive video experience and what would make that space even more impressive,” said John Kent, program manager for the IBM worldwide sports and entertainment partnership group. “That’s how Watson became involved.”
The Watson Cognitive Highlights technology uses factors including player gestures and crowd noise to pinpoint shots worthy of replay.
For more, check out ZDNet’s full article.
The post Watson identifies the best shots at the Masters appeared first on news.
Quelle: Thoughts on Cloud

Mirantis Cloud Platform: Stop wandering in the desert

There’s no denying that the last year has seen a great deal of turmoil in the OpenStack world, and here at Mirantis we’re not immune to it.
In fact, some would say that we’re part of that turmoil. Well, we are in the middle of a sea change in how we handle cloud deployments, moving from a model in which we focused on deploying OpenStack to one in which we focus on achieving outcomes for our customers.
And then there’s the fact that we are changing the architecture of our technology.
It’s true. Over the past few months, we have been moving from Mirantis OpenStack to Mirantis Cloud Platform (MCP), but there’s no need to panic. While it may seem a little scary, we’re not moving away from OpenStack – rather, we are growing up and tackling the bigger picture, not just a part of it. In early installations with marquee customers, we’ve seen MCP provide a tremendous advantage in deployment and scale-out time. In just a few days, we will publicly launch MCP, and you will have our first visible signpost leading you out of the desert. We still have lots of work to do, but we’re convinced this is the right path for our industry to take, and we’re making great progress in that direction.
Where we started
To understand what&8217;s going on here, it helps to have a firm grasp of where we started.
When I started here at Mirantis four years ago, we had one product, Mirantis Fuel, and it had one purpose: deploy OpenStack. Back then that was no easy feat. Even with a tool like Fuel, it could be a herculean task taking many days and lots of calls to people who knew more than I did.
Over the intervening years, we came to realize that we needed to take a bigger hand in OpenStack itself, and we produced Mirantis OpenStack, a set of hardened OpenStack packages.  We also came to realize that deployment was only the beginning of the process; customers needed Lifecycle Management.
The Big Tent
And so Fuel grew. And grew. And grew. Finally, Fuel became so big that we felt we needed to involve the community even more than we already had, and we submitted Fuel to the Big Tent.
Here Fuel has thrived; it does an awesome job of deploying OpenStack and a decent job at lifecycle management.
But it’s not enough.
Basically, when you come right down to it, OpenStack is nothing more than a big, complicated, distributed application. Sure, it’s a big, complicated, distributed application that deploys a cloud platform, but it’s still a big, complicated, distributed application.
And let’s face it: deploying and managing big, complicated, distributed applications is a solved problem.
The Mirantis Cloud Platform architecture
So let’s look at what this means in practice. The most important thing to understand is that where Mirantis OpenStack was focused on deployment, MCP is focused on the operations tasks you need to worry about after that deployment. MCP means:

A single cloud that runs VMs, containers, and bare metal with rich Software Defined Networking (SDN) and Software Defined Storage (SDS) functionality
Flexible deployment and simplified operations and lifecycle management through a new DevOps tool called DriveTrain
Operations Support Services in the form of enhanced StackLight software, which also provides continuous monitoring to ensure compliance to strict availability SLAs

OK, so that’s a little less confusing than the diagram, but there’s still a lot of “sales” speak in there.
Let’s get down to the nitty-gritty of what MCP means.
What Mirantis Cloud Platform really means
Let’s look at each of those things individually and see why it matters.
A multi-platform cloud
There was a time when you would have separate environments for each type of computing you wanted to do. High performance workloads ran on bare metal, virtual machines ran on OpenStack, containers (if you were using them at all) ran on their own dedicated clusters.
In the last few years, bare metal was brought into OpenStack, so that you could manage your physical machines the same way you managed your virtual ones.
Now Mirantis Cloud Platform brings in the last remaining piece. Your Kubernetes cluster is part of your cloud, enabling you to easily manage your container-based applications in the same environment and with the same tools as your traditional cloud resources.
All of this is made possible by the inclusion of powerful SDN and SDS components. Software Defined Networking for OpenStack is handled by OpenContrail, providing the benefits of commercial-grade networking without the lock-in, with Calico stepping in for the container environment. Storage takes the form of powerful open source Ceph clusters, which are used by both OpenStack and container applications.
These components enable MCP to provide an environment where all of these pieces work together seamlessly, so your cloud can be so much more than just OpenStack.
Knowing what’s happening under the covers
With all of these pieces, you need to know what’s happening, and what might happen next. To that end, Mirantis Cloud Platform includes an updated version of StackLight, which gives you a comprehensive view of how each component of your cloud is performing; if an application on a particular VM acts up, you can isolate the problem before it brings down the entire node.
What’s more, the StackLight Operations Support System analyzes the voluminous information it gets from your OpenStack cloud and can often let you know there’s trouble before it causes problems.
All of this enables you to ensure uptime for your users, and compliance with SLAs.
Finally solving the operations dilemma
Perhaps the biggest change, however, is in the form of DriveTrain. DriveTrain is a combination of various open source projects, such as Gerrit and Jenkins for CI/CD and Salt for configuration management, enabling a powerful, flexible way for you to both deploy and manage your cloud.
Because let’s face it: the job of running a private cloud doesn’t end when you’ve spun up the cloud; it’s just begun.
Upgrading OpenStack has always been a nightmare, but DriveTrain is designed so that your cloud infrastructure software can always be up to date. Here’s how it works:
Mirantis continually monitors changes to OpenStack and other relevant projects, providing extensive testing and making sure that no errors get introduced, in a process called “hardening”. Once we decide these changes are ready for general use, we release them into the DriveTrain CI/CD infrastructure.
Once changes hit the CI/CD infrastructure, you pull them down into a staging environment and decide when you’re ready to push them to production.
In other words, no more holding your breath every six months, or worse, running cloud software that’s a year old.
Where do you want to go?
OpenStack started with great promise, but in the last few years it’s become clear that the private cloud world is more than just one solution; it’s time for everyone, and that includes us here at Mirantis, to step up and embrace a future that includes virtual machines, bare metal and containers, but in a way that makes both technological and business sense.
Because at the end of the day, it’s all about outcomes; if your cloud doesn’t do what you want, or if you can’t manage it, or if you can’t keep it up to date, you need something better. We’ve been working hard at making MCP the solution that gets you where you want to be. Let us know how we can help get you there.
The post Mirantis Cloud Platform: Stop wandering in the desert appeared first on Mirantis | Pure Play Open Cloud.
Quelle: Mirantis

Let’s Meet At OpenStack Summit In Boston!


 
The citizens of Cloud City are suffering — Mirantis is here to help!
 
We’re planning to have a super time at summit, and hope that you can join us in the fight against vendor lock-in. Come to booth C1 to power up on the latest technology and our revolutionary Mirantis Cloud Platform.

If you’d like to talk with our team at the summit, simply contact us and we’ll schedule a meeting.

REQUEST A MEETING

 
Free Mirantis Training @ Summit
Take advantage of our special training offers to power up your skills while you’re at the Summit! Mirantis Training will be offering an Accelerated Bootcamp session before the big event. Our courses will be conveniently held within walking distance of the Hynes Convention Center.

Additionally, we’re offering a discounted Professional-level Certification exam and a free Kubernetes training, both held during the Summit.

 
Mirantis Presentations
Here’s where you can find us during the summit:
 
MONDAY MAY 8

Monday, 12:05pm-12:15pm
Level: Intermediate
Turbo Charged VNFs at 40 gbit/s. Approaches to deliver fast, low latency networking using OpenStack.
(Gregory Elkinbard, Mirantis; Nuage)

Monday, 3:40pm-4:20pm
Level: Intermediate
Project Update – Documentation
(Olga Gusarenko, Mirantis)

Monday, 4:40pm-5:20pm
Level: Intermediate
Cinder Stands Alone
(Ivan Kolodyazhny, Mirantis)

Monday, 5:30pm-6:10pm
Level: Intermediate
m1.Boaty.McBoatface: The joys of flavor planning by popular vote
(Craig Anderson, Mirantis)

 

TUESDAY MAY 9

Tuesday, 2:00pm-2:40pm
Level: Intermediate
Proactive support and Customer care
(Anton Tarasov, Mirantis)

Tuesday, 2:30pm-2:40pm
Level: Advanced
OpenStack, Kubernetes and SaltStack for complete deployment automation
(Aleš Komárek and Thomas Lichtenstein, Mirantis)

Tuesday, 2:50pm-3:30pm
Level: Intermediate
OpenStack Journey: from containers to functions
(Ihor Dvoretskyi, Mirantis; Iron.io, BlueBox)

Tuesday, 4:40pm-5:20pm
Level: Advanced
Point and Click ->CI/CD: Real world look at better OpenStack deployment, sustainability, upgrades!
(Bruce Mathews and Ryan Day, Mirantis; AT&T)

Tuesday, 5:05pm-5:45pm
Level: Intermediate
Workload Onboarding and Lifecycle Management with Heat
(Florin Stingaciu and Lance Haig, Mirantis)

 

WEDNESDAY MAY 10

Wednesday, 9:50am-10:30am
Level: Intermediate
Project Update – Neutron
(Kevin Benton, Mirantis)

Wednesday, 11:00am-11:40am
Level: Intermediate
Project Update – Nova
(Jay Pipes, Mirantis)

Wednesday, 1:50pm-2:30pm
Level: Intermediate
Kuryr-Kubernetes: The seamless path to adding Pods to your datacenter networking
(Ilya Chukhnakov, Mirantis)

Wednesday, 1:50pm-2:30pm
Level: Intermediate
OpenStack: pushing to 5000 nodes and beyond
(Dina Belova and Georgy Okrokvertskhov, Mirantis)

Wednesday, 4:30pm-5:10pm
Level: Intermediate
Project Update – Rally
(Andrey Kurilin, Mirantis)

 

THURSDAY MAY 11

Thursday, 9:50am-10:30am
Level: Intermediate
OSprofiler: evaluating OpenStack
(Dina Belova, Mirantis; VMware)

Thursday, 11:00am-11:40am
Level: Intermediate
Scheduler Wars: A New Hope
(Jay Pipes, Mirantis)

Thursday, 11:30am-11:40am
Level: Beginner
Saving one cloud at a time with tenant care
(Bryan Langston, Mirantis; Comcast)

Thursday, 3:10pm-3:50pm
Level: Advanced
Behind the Scenes with Placement and Resource Tracking in Nova
(Jay Pipes, Mirantis)

Thursday, 5:00pm-5:40pm
Level: Intermediate
Terraforming OpenStack Landscape
(Mykyta Gubenko, Mirantis)

 

Notable Presentations By The Community
 
TUESDAY MAY 9

Tuesday, 11:15am-11:55am
Level: Intermediate
AT&T Container Strategy and OpenStack’s role in it
(AT&T)

Tuesday, 11:45am-11:55am
Level: Intermediate
AT&T Cloud Evolution: Virtual to Container based (CI/CD)^2
(AT&T)

WEDNESDAY MAY 10

Wednesday, 1:50pm-2:30pm
Level: Intermediate
Event Correlation & Life Cycle Management – How will they coexist in the NFV world?
(Cox Communications)

Wednesday, 5:20pm-6:00pm
Level: Intermediate
Nova Scheduler: Optimizing, Configuring and Deploying NFV VNF’s on OpenStack
(Wind River)

THURSDAY MAY 11

Thursday, 9:00am-9:40am
Level: Intermediate
ChatOpsing Your Production Openstack Cloud
(Adobe)

Thursday, 11:00am-11:10am
Level: Intermediate
OpenDaylight Network Virtualization solution (NetVirt) with FD.io VPP data plane
(Ericsson)

Thursday, 1:30pm-2:10pm
Level: Beginner
Participating in translation makes you an internationalized OpenStacker & developer
(Deutsche Telekom AG)

Thursday, 5:00pm-5:40pm
Level: Beginner
Future of Cloud Networking and Policy Automation
(Cox Communications)

The post Let’s Meet At OpenStack Summit In Boston! appeared first on Mirantis | Pure Play Open Cloud.
Quelle: Mirantis

Ten Ways a Cloud Management Platform Makes your Virtualization Life Easier

I spent the last decade working with virtualization platforms and the certifications and accreditations that go along with them. During this time, I thought I understood what it meant to run an efficient data center. After six months of working with Red Hat CloudForms, a Cloud Management Platform (CMP), I now wonder what I was thinking. I encountered every one of the problems below, and each is preventable with the right solution. Remember, we live in the 21st century; shouldn’t the software that we use act like it?

We filled up a data store and all of the machines on it stopped working. 
It does not matter if it is a development environment or the mission critical database cluster, when storage fills up everything stops!  More often than not it is due to an excessive number of snapshots. The good news is CloudForms can quickly be set up with a policy to recognize and prevent this from happening.For example we can check the storage utilization and if it is over 90% full take action, or better yet, when it is within two weeks of being full based on usage trends. That way if manual action is required, there is enough forewarning to do so.  Another good practice is to setup a policy to disable more than a few snapshots. We all love to take snapshots, but there is a real cost to them, and there is no need to let them get out of hand.
I just got thousands of emails telling me that my host is down.
The only thing worse than no email alert is receiving thousands of them. In CloudForms it is not only easy to set up alerts, but also to define how often they should be acted upon. For example, check every hour, but only notify once per day.
Your virtual machines (VMs) cannot be migrated because the VM tools updater CD-ROM image was not un-mounted correctly. 
This is a serious issue for a number of reasons. First, it breaks Disaster Recovery (DR) operations and can cause virtual machines to be out of balance. It also disables the ability to put a node into maintenance mode, potentially causing additional outages and delays. Most solutions involve writing a shell script that runs as root and attempts to periodically unmount the virtual CD-ROM drives. These scripts usually work, but they are both scary from a security standpoint and indiscriminately dangerous; imagine physically ejecting the CD-ROM while the database administrator is in the middle of a database upgrade! With CloudForms we can set up a simple policy that unmounts drives once a day, but only after sanity checking that it is the correct CD-ROM image and that the system is in a state where it can be safely unmounted.
I have to manually ensure that all of my systems pass an incredibly detailed and painful compliance check (STIGs, PCI, FIPS, etc.) by next week! 
I have lost weeks of my life to this, and if you have not had the pleasure, count yourself lucky. When the “friendly” auditors show up with a stack of three-ring binders and a mandate to check everything, you might as well clear your calendar for the next few weeks. In addition, since these checks are usually a requirement for continuing operations, expect many of these meetings to involve layers of upper management you did not know existed, and this is definitely not the best time to become acquainted. The good news is CloudForms allows you to run automatic checks on VMs and hosts. If you are not already familiar with its OpenSCAP scanning capability, you owe yourself a look. Not only that, but if someone attempts to bring a VM online that is not compliant, CloudForms can shut it right back down. That is the type of peace of mind that allows for sleep-filled nights.
Someone logged into a production server as root using the virtual console and broke it. Now you have to physically hunt down and interrogate all the potential culprits, as well as fix the problem. 
Before you pull out your foam bat and roam the halls to apply some “sense” to the person who did this, it is good to know exactly who it was and what they did. With CloudForms you can see a timeline of each machine and who logged into what console, as well as perform a drift analysis to potentially see what changed. With this knowledge you can now not only fix the problem, but also “educate” the responsible party.
The developers insist that all VMs must have 8 vCPUs and 64GB of RAM. 
The best way to fight flagrant waste of resources is with data. CloudForms provides the concept of “Right-Sizing,” where it watches VMs operate and determines the ideal resource allocation. With this information in hand, CloudForms can either automatically adjust the allocations or produce a report showing what the excessive resources are costing.
Someone keeps creating 32-bit VMs with more than 4GB of RAM! 
As we know, there is no good way a 32-bit VM can possibly use that much memory, so it is essentially just waste. A simple CloudForms policy that checks for “OS Type = 32bit” and “RAM > 4GB” can make for a very interesting report. Or better yet, put a policy in place to automatically adjust the memory to 4GB and notify the system owner.
I have to buy hardware for next year, but my capacity-planning formula involves a spreadsheet and a dart board. 
Long-term planning in IT is hard, especially with dynamic workloads in a multi-cloud environment. Once CloudForms is running, it automatically collects performance data and executes trend-line analysis to assist with operational management. For example, in 23 days you will be out of storage on your production SAN. If that does not get the system administrator’s attention, nothing will. It can also perform simulations to see what your environment would look like if you added resources, so you can see your trend lines and capacity if you added another 100 VMs of a particular type and size.
For some reason two hosts were swapping VMs back and forth, and I only found out when people complained about performance. 
As an administrator, there is no worse way to find out that something is wrong than being told by a user. Large-scale issues such as this can be hard to see in the logs, since they consist of otherwise typical output. With CloudForms, a timeline overview of the entire environment highlights issues like this so the root cause can be tracked down.
I spend most of my day pushing buttons, spinning up VMs, manually grouping them into virtual folders and tracking them with spreadsheets. 
Before starting a new administrator role, it is always good to ask for the “point of truth” system that keeps track of what systems are running, where they are, and who is responsible for them. More often than not the answer is, “Gary, who keeps track of the list on his laptop.” This may be how it was always done, but now with tools such as CloudForms, you can automatically tag machines based on location, projects, users, or any other combination of characteristics, and as a bonus, provide usage and costing information back to the user. Gary could only dream of providing that much helpful information.

Conclusion
There is never enough time in the day, and the pace of new technologies is accelerating. The only way to keep up is to automate processes. The tools that got you where you are today are not necessarily the same ones that will get you through the next generation of technologies. It will be critical to have tools that work across multiple infrastructure components and provide the visibility and automation required. This is why you need a cloud management platform and where the real power of CloudForms comes into play.
Quelle: CloudForms

We installed an OpenStack cluster with close to 1000 nodes on Kubernetes. Here’s what we found out.

Late last year, we did a number of tests that looked at deploying close to 1000 OpenStack nodes on a pre-installed Kubernetes cluster as a way of finding out what problems you might run into, and fixing them, if at all possible. In all we found several, and though in general we were able to fix them, we thought it would still be good to go over the types of things you need to look for.
Overall we deployed an OpenStack cluster that contained more than 900 nodes using Fuel-CCP on a Kubernetes cluster that had been deployed using Kargo. The Kargo tool is part of the Kubernetes Incubator project and uses the Large Kubernetes Cluster reference architecture as a baseline.
As we worked, we documented issues we found, and contributed fixes to both the deployment tool and the reference design document where appropriate. Here’s what we found.
The setup
We started with just over 175 bare metal machines, allocating 3 of them to the Kubernetes control plane services (API servers, etcd, the Kubernetes scheduler, and so on); each of the other nodes hosted 5 virtual machines, and every VM was used as a Kubernetes minion node.
Each bare metal node had the following specifications:

HP ProLiant DL380 Gen9
CPU – 2x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
RAM – 264G
Storage – 3.0T on RAID on HP Smart Array P840 Controller, HDD – 12 x HP EH0600JDYTL
Network – 2x Intel Corporation Ethernet 10G 2P X710

The running OpenStack cluster (as far as Kubernetes is concerned) consists of:

OpenStack control plane services running on close to 150 pods over 6 nodes
Close to 4500 pods spread across all of the remaining nodes, at 5 pods per minion node

One major Prometheus problem
During the experiments we used the Prometheus monitoring tool to verify resource consumption and the load put on the core system, Kubernetes, and OpenStack services. One note of caution when using Prometheus: deleting old data from Prometheus storage will indeed improve the Prometheus API speed, but it will also delete any previous cluster information, making it unavailable for post-run investigation. So make sure to document any observed issue and its debugging thoroughly!
Thankfully, we had in fact done that documentation, but one thing we’ve decided to do going forward to prevent this problem is to configure Prometheus to back up data to one of the persistent time series databases it supports, such as InfluxDB, Cassandra, or OpenTSDB. By default, Prometheus is optimized to be used as a real-time monitoring / alerting system, and there is an official recommendation from the Prometheus developers to keep monitoring data retention at only about 15 days to keep the tool quick and responsive. By setting up the backup, we can store old data for an extended amount of time for post-processing needs.
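As a rough illustration (not the configuration we actually ran), recent Prometheus releases can copy samples to such a long-term store through the remote_write section of prometheus.yml; the endpoint below assumes an InfluxDB instance with its Prometheus-compatible write API enabled, and the host name and database name are placeholders:

    # Hypothetical prometheus.yml fragment: keep local retention short, but
    # forward every sample to a long-term store for post-run analysis.
    remote_write:
      - url: "http://influxdb.example.local:8086/api/v1/prom/write?db=prometheus"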
Problems we experienced in our testing
Huge load on kube-apiserver
Symptoms
Initially, we had a setup with all nodes (including the Kubernetes control plane nodes) running on a virtualized environment, but the load was such that the API servers couldn’t function at all, so they were moved to bare metal. Still, both API servers running in the Kubernetes cluster were utilising up to 2000% of the available CPU (up to 45% of total node compute performance capacity), even after we migrated them to hardware nodes.
Root cause
All services that are not on the Kubernetes masters (kubelet and kube-proxy on all minions) access kube-apiserver via a local NGINX proxy. Most of those requests are watch requests that sit mostly idle after they are initiated (most timeouts on them are defined to be about 5-10 minutes). NGINX was configured to cut idle connections after 3 seconds, which causes all clients to reconnect and (even worse) restart aborted SSL sessions. On the server side, this makes kube-apiserver consume up to 2000% of the CPU resources, making other requests very slow.
Solution
Set the proxy_timeout parameter to 10 minutes in the nginx.conf configuration file, which should be more than long enough to prevent connections from being cut before the requests time out by themselves. After this fix was applied, one API server consumed only 100% of CPU (about 2% of total node compute performance capacity), while the second one consumed about 200% (about 4% of total node compute performance capacity), with an average response time of 200-400 ms.
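For reference, here is a minimal sketch of the kind of stream-proxy configuration involved; the upstream name and addresses are placeholders rather than the exact Kargo template:

    stream {
        upstream kube_apiserver {
            server 10.0.0.11:443;    # placeholder master addresses
            server 10.0.0.12:443;
        }
        server {
            listen 127.0.0.1:443;
            proxy_pass kube_apiserver;
            # Keep idle watch connections open instead of cutting them after
            # 3 seconds, which forced constant reconnects and SSL restarts.
            proxy_timeout 10m;
            proxy_connect_timeout 1s;
        }
    }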
Upstream issue status: fixed
Make the Kargo deployment tool set proxy_timeout to 10 minutes: issue fixed with pull request by Fuel CCP team.
KubeDNS cannot handle large cluster load with default settings
Symptoms
When deploying an OpenStack cluster at this scale, kubedns becomes unresponsive because of the huge load. This ends up with a slew of errors appearing in the logs of the dnsmasq container in the kubedns pod:
Maximum number of concurrent DNS queries reached.
Also, dnsmasq containers sometimes get restarted due to hitting the high memory limit.
Root cause
First of all, kubedns seems to fail often in this architecture, even without load. During the experiment we observed continuous kubedns container restarts even on an empty (but large enough) Kubernetes cluster. Restarts are caused by the liveness check failing, although nothing notable is observed in any logs.
Second, dnsmasq should have taken the load off kubedns, but it needs some tuning to behave as expected (or, frankly, at all) under large loads.
Solution
Fixing this problem requires several levels of steps:

Set higher limits for dnsmasq containers: they take on most of the load.
Add more replicas to the kubedns replication controller (we decided to stop at 6 replicas, as that solved the observed issue; bigger clusters might need to increase this number even more).
Increase the number of parallel connections dnsmasq should handle (we used --dns-forward-max=1000, which is the recommended setting in the dnsmasq manuals).
Increase the cache size in dnsmasq: it has a hard limit of 10,000 cache entries, which seems to be a reasonable amount.
Fix kubedns to handle this behaviour in a proper way.
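To give a feel for items 1-4, here is an illustrative fragment of a kube-dns spec with those knobs applied; the field names follow the standard kube-dns addon of that era, but the resource numbers are placeholders rather than the values we settled on:

    spec:
      replicas: 6                      # item 2: more kubedns replicas
      template:
        spec:
          containers:
          - name: dnsmasq
            args:
            - --cache-size=10000       # item 4: raise the dnsmasq cache
            - --dns-forward-max=1000   # item 3: more parallel forwarded queries
            resources:
              limits:                  # item 1: higher limits for dnsmasq
                cpu: 200m
                memory: 256Mi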

Upstream issue status: partially fixed
Items 1 and 2 are fixed by making them configurable in Kargo by the Kubernetes team: issue, pull request.
Others: work has not yet started.
Kubernetes scheduler needs to be deployed on a separate node
Symptoms
During the huge OpenStack cluster deployment against Kubernetes, the scheduler, controller-manager and kube-apiserver start fighting for CPU cycles, as all of them are under a large load. The scheduler is the most resource-hungry, so we need a way to deploy it separately.
Solution
We manually moved the Kubernetes scheduler to a separate node; all other scheduler instances were manually killed to prevent them from moving to other nodes.
Upstream issue status: reported
Issue in Kargo.
Kubernetes scheduler is ineffective with pod antiaffinity
Symptoms
It takes a significant amount of time for the scheduler to process pods with pod antiaffinity rules specified on them. It spends about 2-3 seconds on each pod, which makes the time needed to deploy an OpenStack cluster of 900 nodes unexpectedly long (about 3 hours just for scheduling). OpenStack deployment requires the use of antiaffinity rules to prevent several OpenStack compute nodes from being launched on a single Kubernetes minion node.
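For context, this is the kind of rule being processed; a hypothetical pod template fragment in the current field syntax (at the time of the experiment the same rule was expressed through an alpha scheduler annotation), with the label name as a placeholder:

    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: nova-compute          # placeholder label
            topologyKey: kubernetes.io/hostname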
Root cause
According to profiling results, most of the time is spent on creating new Selectors to match existing pods against, which triggers the validation step. Basically we have O(N^2) unnecessary validation steps (where N = the number of pods), even if we have just 5 deployment entities scheduled to most of the nodes.
Solution
In this case, we needed a specific optimization that brings scheduling time down to about 300 ms per pod. It’s still slow in terms of common sense (about 30 minutes spent just on pod scheduling for a 900-node OpenStack cluster), but it is at least close to reasonable. This solution lowers the number of very expensive operations to O(N), which is better, but still depends on the number of pods instead of deployments, so there is room for future improvement.
Upstream issue status: fixed
The optimization was merged into master (pull request) and backported to the 1.5 branch, and is part of the 1.5.2 release (pull request).
kube-apiserver has low default rate limit
Symptoms
Different services start receiving “429 Rate Limit Exceeded” HTTP errors, even though kube-apiservers can take more load. This problem was discovered through a scheduler bug (see below).
Solution
Raise the rate limit for the kube-apiserver process via the --max-requests-inflight option. It defaults to 400, but in our case it became workable at 2000. This number should be configurable in the Kargo deployment tool, as bigger deployments might require an even bigger increase.
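As a sketch, the change amounts to adding one flag to the kube-apiserver invocation (however the deployment starts it, for example a systemd unit or a static pod manifest); the other flags are omitted here:

    kube-apiserver \
      ... \
      --max-requests-inflight=2000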
Upstream issue status: reported
Issue in Kargo.
Kubernetes scheduler can schedule incorrectly
Symptoms
When creating a huge number of pods (~4500 in our case) and faced with HTTP 429 errors from kube-apiserver (see above), the scheduler can place several pods of the same deployment on one node, in violation of the pod antiaffinity rule on them.
Root cause
See pull request below.
Upstream issue status: pull request
Fix from Mirantis team: pull request (merged, part of Kubernetes 1.6 release).
Docker sometimes becomes unresponsive
Symptoms
The Docker process sometimes hangs on several nodes, which results in timeouts in the kubelet logs. When this happens, pods cannot be spawned or terminated successfully on the affected minion node. Although many similar issues have been fixed in Docker since 1.11, we are still observing these symptoms.
Workaround
The Docker daemon logs do not contain any notable information, so we had to restart the docker service on the affected node. (During the experiments we used Docker 1.12.3, but we have observed similar symptoms in 1.13 release candidates as well.)
OpenStack services don’t handle PXC pseudo-deadlocks
Symptoms
When run in parallel, create operations for lots of resources were failing with a DBError saying that Percona XtraDB Cluster had identified a deadlock and the transaction should be restarted.
Root cause
oslo.db is responsible for wrapping errors received from the DB into proper classes so that services can restart transactions if similar errors occur, but it didn’t expect the error in the format that is being sent by Percona. After we fixed this, however, we still experienced similar errors, because not all transactions that could be restarted were properly decorated in Nova code.
Upstream issue status: fixed
The bug has been fixed by Roman Podolyaka’s CR and backported to Newton. It fixes Percona deadlock error detection, but there’s at least one place in Nova that still needs to be fixed.
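To illustrate the kind of decoration that is missing in those places (this is a sketch, not the actual Nova patch), oslo.db can retry a DB API function when a deadlock is detected, provided the function is wrapped accordingly:

    # Minimal sketch: retry a DB API call on detected deadlocks via oslo.db.
    from oslo_db import api as oslo_db_api

    @oslo_db_api.wrap_db_retry(max_retries=5, retry_on_deadlock=True)
    def instance_update(context, instance_uuid, values):
        # Run the transaction here; a DBDeadlock raised inside triggers a
        # retry of the whole function with an increasing retry interval.
        ...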
Live migration failed with live_migration_uri configuration
Symptoms
With the live_migration_uri configuration, live migration fails because one compute host can’t connect to libvirt on another host.
Root cause
We can’t specify which IP address to use in the live_migration_uri template, so it was trying to use the address from the first interface, which happened to be in the PXE network, while libvirt listens on the private network. We couldn’t use live_migration_inbound_addr, which would solve this problem, because of a problem in upstream Nova.
Upstream issue status: fixed
A bug in Nova has been fixed and backported to Newton. We switched to using live_migration_inbound_addr after that.
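For reference, a hypothetical nova.conf fragment for a compute node; the address is a placeholder for that node’s IP on the private network:

    [libvirt]
    # Pin live-migration traffic to the private network; with this set there
    # is no need to template live_migration_uri at all.
    live_migration_inbound_addr = 10.20.30.41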
The post We installed an OpenStack cluster with close to 1000 nodes on Kubernetes. Here&8217;s what we found out. appeared first on Mirantis | Pure Play Open Cloud.
Quelle: Mirantis

Red Hat Summit 2017 – Planning your OpenStack labs

This year in Boston, MA you can attend the Red Hat Summit 2017, the event to get your updates on open source technologies and meet with all the experts you follow throughout the year.
It’s taking place from May 2-4 and is full of interesting sessions, keynotes, and labs.
This year I was part of the process of selecting the labs you are going to experience at Red Hat Summit, and I wanted to share some of them here to help you plan your OpenStack labs experience. These labs are for you to spend time with the experts, who will teach you hands-on how to get the most out of your Red Hat OpenStack product.
Each lab is a 2-hour session, so planning is essential to getting the most out of your days at Red Hat Summit.
As you might be struggling to plan your sessions together with some lab time, here is an overview of the labs; check the session catalog for exact rooms and times. Each entry includes the lab number, title, abstract, and instructors, and is linked to the session catalog entry:

L103175 – Deploy Ceph Rados Gateway as a replacement for OpenStack Swift
Come learn about these new features in Red Hat OpenStack Platform 10: There is now full support for Ceph Rados Gateway, and “composable roles” let administrators deploy services in a much more flexible way. Ceph capabilities are no longer limited to block only. With a REST object API, you are now able to store and consume your data through a RESTful interface, just like Amazon S3 and OpenStack Swift. Ceph Rados Gateway has 99.9% API compliance with Amazon S3, and it can communicate with the Swift API. In this lab, you’ll tackle the REST object API use case, and to get the most out of your Ceph cluster, you’ll learn how to use Red Hat OpenStack Platform director to deploy Red Hat OpenStack Platform with dedicated Rados Gateway nodes.
Instructors: Sebastien Han, Gregory Charot, Cyril Lopez
 
L104387 – Hands on for the first time with Red Hat OpenStack Platform
In this lab, an instructor will lead you in configuring and running core OpenStack services in a Red Hat OpenStack Platform environment. We’ll also cover authentication, compute, networking, and storage. If you’re new to Red Hat OpenStack Platform, this session is for you.
Instructors: Rhys Oxenham, Jacob Liberman, Guil Barros
 
L102852 – Hands on with Red Hat OpenStack Platform director
Red Hat OpenStack Platform director is a tool set for installing and managing Infrastructure-as-a-Service (IaaS) clouds. In this two-hour instructor-led lab, you will deploy and configure a Red Hat OpenStack Platform cloud using OpenStack Platform director. This will be a self-paced, hands-on lab, and it’ll include both the command line and graphical user interfaces. You’ll also learn, in an interactive session, about the architecture and approach of Red Hat OpenStack Platform director.
Instructors: Rhys Oxenham, Jacob Liberman
 
L104665 – The Ceph power show: hands on with Ceph
Join our Ceph architects and experts for this guided, hands-on lab with Red Hat Ceph Storage. You’ll get an expert introduction to Ceph concepts and features, followed by a series of live interactive modules to gain some experience. This lab is perfect for users of all skill levels, from beginners to experienced users who want to explore advanced features of OpenStack storage. You’ll get some credits to the Red Hat Ceph Storage Test Drive portal that can be used later to learn and evaluate Red Hat Ceph Storage and Red Hat Gluster Storage. You’ll leave this session having a better understanding of Ceph architecture and concepts, with experience on Red Hat Ceph Storage, and the confidence to install, set up, and provision Ceph in your own environment.
Instructors: Karan Singh, Kyle Bader, Daniel Messer
As you can see, there is plenty of OpenStack in these hands-on labs to get you through the week, and we hope to welcome you to one or more of them!
Quelle: RedHat Stack

Intelligent NFV performance with OpenContrail

The private cloud market has changed in the past year, and our customers are no longer interested in just getting an amazing tool for installing OpenStack; instead, they are looking more at use cases. Because we see a lot of interest in NFV cloud use cases, Mirantis includes OpenContrail as the default SDN for its new Mirantis Cloud Platform. In fact, NFV has become a mantra for most service providers, and because Mirantis is a key player in this market, we work on a lot of testing and performance validation.
The most common metric for performance comparisons between solutions is bandwidth, which shows how much capacity a network connection has for data transfer, measured in bits per second. In this domain, the OpenContrail vRouter can reach near line speed (about 90%, in fact). However, performance also depends on other factors, such as latency and packets per second (pps), which are just as important as bandwidth. The packets-per-second rate is a key factor for VNF instances (firewalls, routers, etc.) running on top of NFV clouds. In this article, we’ll compare the PPS rate for different OpenContrail setups so you can decide what will work best for your specific use case.
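As a back-of-the-envelope reference for the numbers that follow: at the minimum Ethernet frame size of 64 bytes, each frame occupies 84 bytes on the wire (including the 8-byte preamble and the 12-byte inter-frame gap), so a 10GbE link tops out at roughly 14.88 million packets per second.

    # Quick line-rate check for 64-byte frames on a 10GbE link.
    line_rate_bps = 10e9
    bits_per_frame = (64 + 8 + 12) * 8       # frame + preamble + inter-frame gap
    print(line_rate_bps / bits_per_frame)    # ~14.88e6 packets per second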
The simplest way to test PPS rate is to run a VM to VM test. We will provide a short overview of OpenContrail low-level techniques for NFV infrastructure, and perform a comparative analysis of different approaches using simple PPS benchmarking. To make testing fair, we will use only a 10GbE physical interface, and will limit resource consumption for data plane acceleration technologies, making the environment identical for all approaches.
OpenContrail vRouter modes
For different use cases, Mirantis supports several ways of running the OpenContrail vRouter as part of Mirantis Cloud Platform 1.0 (MCP). Let’s look at each of them before we go ahead and take measurements.
Kernel vRouter
OpenContrail has a module called vRouter that performs data forwarding in the kernel. The vRouter module is an alternative to Linux bridge or Open vSwitch (OVS) in the kernel, and one of its functionalities is encapsulating packets sent to the overlay network and decapsulating packets received from the overlay network. A simplified schematic of VM to VM connectivity for 2 compute nodes can be found in Figure 1:

Figure 1: A simplified schematic of VM to VM connectivity for 2 compute nodes
The problem with a kernel module is that packets-per-second is limited by various factors, such as memory copies, the number of VM exits, and the overhead of processing interrupts. Therefore vRouter can be integrated with the Intel DPDK to optimize PPS performance.
DPDK vRouter
Intel DPDK is an open source set of libraries and drivers that perform fast packet processing by enabling drivers to obtain direct control of the NIC address space and map packets directly into an application. The polling model of NIC drivers helps to avoid the overhead of interrupts from the NIC. To integrate with DPDK, the vRouter can now run in a user process instead of a kernel module. This process links with the DPDK libraries and communicates with the vrouter host agent, which runs as a separate process. The schematic for a simplified overview of vRouter-DPDK based nodes is shown in Figure 2:

Figure 2: The schematic for a simplified overview of vRouter-DPDK based nodes
vRouter-DPDK uses user-space packet processing and CPU affinity, dedicating particular CPUs to serving the poll mode drivers. This approach enables packets to be processed in user space during their complete lifetime, from the physical NIC to the vhost-user port.
Netronome Agilio Solution
Software and hardware components distributed by Netronome provide an OpenContrail-based platform to perform high-speed packet processing. It’s a scalable, easy to operate solution that includes all server-side networking features, such as overlay networking based on MPLS over UDP/GRE and VXLAN. The Agilio SmartNIC solution supports DPDK, SR-IOV and Express Virtio (XVIO) for data plane acceleration while running the OpenContrail control plane. Wide integration with OpenStack enables you to run VMs with Virtio devices or SR-IOV Passthrough vNICs, as in Figure 3:

Figure 3:  OpenContrail network schematic based on Netronome Agilio SmartNICs and software
A key feature of the Netronome Agilio solution is deep integration with OpenContrail and offloading of lookups and actions for vRouter tables.
Compute nodes based on Agilio SmartNICs and software can work in an OpenStack cluster based on OpenContrail without changes to orchestration. That means it’s scale-independent and can be plugged into existing OpenContrail environments with zero downtime.
Mirantis Cloud Platform can be used as an easy and fast delivery tool to set up Netronome Agilio-based compute nodes and provide orchestration and analysis of the cluster environment. Using Agilio and MCP, it is easy to set up a high-performance cluster with a ready-to-use NFV infrastructure.
Testing scenario
To make the test fair and clear, we will use an OpenStack cluster with two compute nodes. Each node will have a 10GbE NIC for the tenant network.
As we mentioned before, the simplest way to test the PPS rate is to run a VM to VM test. Each VM will have 2 Virtio interfaces to receive and transmit packets, 4 vCPU cores, 4096 MB of RAM and will run Pktgen-DPDK inside to generate and receive a high rate of traffic. For each VM a single Virtio interface will be used for generation, and another interface will be used for receiving incoming traffic from the other VM.
To make an analytic comparison of all technologies, we will not use more than 2 cores for the data plane acceleration engines. The RX PPS rate across all VMs will be taken as the result of the VM to VM test.
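As a sketch of how such test VMs could be provisioned with the standard OpenStack CLI (the flavor, image, and network identifiers below are placeholders, not the exact ones we used):

    # Create a flavor matching the test VM shape described above.
    openstack flavor create pktgen.test --vcpus 4 --ram 4096 --disk 10

    # Boot one of the traffic-generator VMs with two Virtio ports: one for
    # transmit, one for receive.
    openstack server create vm-a \
      --flavor pktgen.test \
      --image pktgen-dpdk-image \
      --nic net-id=<tx-network-id> \
      --nic net-id=<rx-network-id>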
First of all, we will try to measure kernel vRouter VM to VM performance. Nodes will be connected with Intel 82599 NICs. The following results were achieved for a UDP traffic performance test:
As you can see, the kernel vRouter is not suitable for providing a high packet per second rate, mostly because the interrupt-based model can’t handle a high rate of packets per second. With 64 byte packets we can only achieve 3% of line rate.
For the DPDK-based vRouter, we achieved the following results:

Based on these results, the DPDK based solution is better at handling high-rated traffic based on small UDP packets.
Lastly, we tested the Netronome Agilio SmartNIC-based compute nodes:

With only 2 forwarder cores, we are able to achieve line-rate speed on Netronome Agilio CX 10GbE SmartNICs at all packet sizes.
You can also see a demonstration of the Netronome Agilio Solution here.
Since we had achieved line-rate speed on the 10GbE interface using Netronome Agilio SmartNICs, we wanted to find the maximum possible PPS rate based on 2 CPUs. To determine the maximum performance result for this deployment, we upgraded the existing nodes with Netronome Agilio CX 40GbE SmartNICs and repeated the maximum PPS scenario one more time, using a direct wire connection between the 40GbE ports and 64-byte UDP traffic. Even with hard resource limitations, we achieved:

                                       Rate         Packet size, bytes
Netronome Agilio CX 40GbE SmartNIC     19.9 Mpps    64
What we learned
Taking all of the results together, we can see a pattern:

Based on 64 byte UDP traffic, we can also see where each solution stands compared to 10GbE line rate:

                    Rate         % of line rate
Netronome Agilio    14.9 Mpps    100
vRouter DPDK        4.0 Mpps     26
Kernel vRouter      0.56 Mpps    3

OpenContrail remains the best production-ready SDN solution for OpenStack clusters, but to provide NFV-related infrastructure, OpenContrail can be used in different ways:

The kernel vRouter, based on interrupt-driven packet processing, works, but does not satisfy the high PPS rate requirement.
The DPDK-based vRouter significantly improves the PPS rate, but due to high resource consumption and the limits we defined, it can’t achieve the required performance. We can also assume that using a more modern DPDK library would improve performance and optimise resource consumption.
The Netronome Agilio SmartNIC solution significantly improves OpenContrail SDN performance, focusing on saving host resources and providing a stable high-performance infrastructure.

With Mirantis Cloud Platform tooling, it is possible to provision, orchestrate and destroy high performance clusters with various networking features, making networking intelligent and agile.
The post Intelligent NFV performance with OpenContrail appeared first on Mirantis | Pure Play Open Cloud.
Quelle: Mirantis

You Can Play Ms. Pac-Man In Google Maps

April Fool’s came a day early.

Google Maps just released a Ms. Pac-Man game for April Fool’s Day.


To play, make sure your app is updated, then open it and hit the Ms. Pac-Man button on the side.


Then, just run away from the ghost thingees while chomping up little balls. You know, play Ms. Pac-Man, but in Google Maps. Enjoy.





Quelle: BuzzFeed

Momentum mounts for Kubernetes, cloud native

For any new technology, there are few attributes more valuable than momentum. In the open tech space, few projects have as much momentum as Kubernetes and cloud native application development.
The Cloud Native Computing Foundation (CNCF) kicked off the European leg of its biannual CloudNativeCon/KubeCon event in Berlin by welcoming five new member organizations and two new projects.
CNCF has pulled in rkt and containerd as its eighth and ninth open projects, joining Kubernetes, Fluentd, Linkerd, Prometheus, OpenTracing, gRPC and CoreDNS.
IBM senior technical staff member Phil Estes is one of the open source maintainers for containerd. He explained a bit about the project and the role of IBM in the video below:

This week, containerd joined the @CloudNativeFdn. @estesp explains what it means for the community. Details: https://t.co/AQigsrXzqY pic.twitter.com/oC9XAOjO9D
— IBM Cloud (@IBMcloud) March 30, 2017

Meanwhile, CNCF announced that SUSE, HarmonyCloud, QAware, Solinea and TenxCloud have joined as contributing member organizations.
“The cloud native movement is increasingly spreading to all parts of the world,” CNCF executive director Dan Kohn told a sellout crowd of 1,500. That number tripled from CloudNativeCon in London a year prior.
We reported last fall that Kubernetes adoption was on the cusp of catching a giant wave. That wave has evolved into a groundswell among developers. There are now 4,000 projects based on Kubernetes, more than 50 products supporting it and more than 200 meetups around the world.
Even more significant has been the IBM announcement in March that Kubernetes is available on IBM Bluemix Container Service.
Linux Foundation Vice President Chris Aniszczyk and IBM Fellow, VP and Cloud Platform CTO Jason McGee discussed the move by IBM to Kube (and much more) on a podcast recorded at the venue. You can listen to it here:

A few more highlights from Berlin:
• 17-year-old Lucas Käldström, the youngest core Kubernetes maintainer, wowed the crowd with his talk on autoscaling a multi-platform Kubernetes cluster built with kubeadm.

Listening to Lucas talk about multi-architecture cluster support for containers/k8s. Oh, he’s in high school too! pic.twitter.com/V8G3qAylzz
— Phil Estes (@estesp) March 30, 2017

• Docker’s Justin Cormack delivered one of the conference’s most popular sessions with his talk on containerd:

Now @justincormack from @Docker talking containerd in SRO room @CloudNativeFdn Kubecon Berlin. Hey @chanezon open a window, it’s hot! pic.twitter.com/SlVHCyTwH6
— Jeffrey Borek (@jeffborek) March 30, 2017

• An update on the Open Container Initiative from Jeff Borek (IBM), Chris Aniszczyk (Linux Foundation), Vincent Batts (Red Hat) and Brandon Philips (CoreOS)

An update on @OCI_ORG and container standards from @Cra, @JeffBorek, @vbatts, @sauryadas_ & @BrandonPhilips. … https://t.co/MqqBKxwjBU
— Kevin J. Allen (@KevJosephAllen) March 29, 2017

More information about Bluemix.
The post Momentum mounts for Kubernetes, cloud native appeared first on news.
Quelle: Thoughts on Cloud