Devops Mastery

Brian Wagner, Jason Didonato

Technology

This podcast is all about doing the DevOps thing. We are here to help you get from a DevOps newbie to being a DevOps Master.

01/10/2014

Episode 22 - 5 tips to taking the fear out of automation projects

A lot of people new to automation are very nervous about letting a script or a program like Chef change a production system. That fear can really slow the advance of a DevOps movement, but ignoring those people's fears is not the answer. So here are some tips and tricks to get everyone, including yourself, more confident about writing world-changing scripts.
Whenever possible, create virtual environments. Virtual machines or cloud servers give you considerably more options and abilities than bare-metal hardware. The ability to snapshot a server and revert to it in minutes is, on its own, a game-changing concept. It lets you run any automation from beginning to end virtually without fear. I say "virtually" only because you occasionally run scripts that affect systems that cannot easily be virtualized, and that is now more the exception than the rule. If you take a snapshot of the server, run the automation, and find a mistake, error, or bug, you simply revert the server to the starting point, fix the problem, and try again. The only thing you have lost is the time it takes to run the script. That might be a lot of time, but it is still less than rebuilding the environment between test runs. If you have the luxury of multiple environments, you can and should follow the same procedure for future deployments that you used while developing the script: take the script that worked in Dev, snapshot Test, and run it there. Then rinse and repeat all the way to production.
Test it over and over and over again. It sounds crazy, but I now run every completely new script I am ready to bless for migration to the next environment multiple times in each environment. This gives me the confidence that it really is working; I want to know I didn't do something weird that only made the script work when I fixed the bug. Once the script runs cleanly multiple times, I am ready to go and I move it to the next tier.
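The snapshot, run, revert, and retry loop described above can be sketched in Python. This is a minimal illustration under stated assumptions: `Environment`, `rehearse`, and `configure_web_server` are hypothetical names, and the in-memory "snapshot" stands in for whatever snapshot and revert operations your hypervisor or cloud provider actually offers. The point is the shape of the procedure: start every run from the same snapshot, and only call a script ready for the next tier after several clean runs in a row.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict

State = Dict[str, str]


@dataclass
class Environment:
    """Hypothetical stand-in for a snapshot-capable VM or cloud server.

    A real version would call your hypervisor or cloud provider's snapshot
    and revert operations; this sketch just deep-copies an in-memory dict.
    """
    name: str
    state: State = field(default_factory=dict)

    def snapshot(self) -> State:
        return copy.deepcopy(self.state)

    def revert(self, snap: State) -> None:
        self.state = copy.deepcopy(snap)


def rehearse(env: Environment, script: Callable[[State], None], runs: int = 3) -> bool:
    """Run `script` `runs` times, each time starting from the same snapshot.

    Returns True only if every run completes without raising, meaning the
    script looks ready to move to the next tier. The environment is always
    left at its starting point.
    """
    baseline = env.snapshot()
    for attempt in range(1, runs + 1):
        try:
            script(env.state)
        except Exception as err:  # a mistake, error, or bug
            print(f"{env.name}: run {attempt} failed: {err}")
            return False
        finally:
            env.revert(baseline)  # back to the starting point for the next run
    return True


if __name__ == "__main__":
    def configure_web_server(state: State) -> None:
        # Hypothetical automation step used only for this demo.
        state["nginx.conf"] = "worker_processes 4;"

    for tier in (Environment("dev"), Environment("test")):
        if not rehearse(tier, configure_web_server):
            print(f"{tier.name}: fix the script and try again")
            break
        print(f"{tier.name}: clean after repeated runs, ready to promote")
```

Because `rehearse` always reverts, it answers only "is this script repeatable?"; the real run in each tier happens after it passes.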
Ask someone to do a peer review of the script. This is just good practice in general. When possible, I suggest walking the person through the script and letting them ask questions the first time through. This alone has often turned up bugs and other issues. Having to explain something uses a different part of your brain and opens your mind to seeing problems or possible problems.
Ask someone else to run the scripts for you. I normally do this in a Dev/Test environment, for two reasons: 1) to validate my documentation, and 2) to validate the procedure I am following. My teammates have often asked questions that point out things I forgot to account for or code in. They also tend to be less stressed than if they were deploying to a production environment. Doing this earlier is better, and repeating it whenever you have to make a change makes you a better programmer/scripter. Your teammate finding that one thing that would have driven you nuts during a deploy is, in a word, priceless.
Communicate what it is you are trying to automate. Be as specific as possible. Making everyone involved aware of what you are doing goes a long way towards building their trust. It is the most basic way to be inclusive in your process. Do your best not to make it sound like you are asking for permission to do the automation. You are doing this to make the environment better, and no one should ever question that. This doesn't mean you shouldn't be open to feedback. You should actually be ready for it, because it will happen. Not everyone in any given IT organization is an expert communicator, so you need to take all feedback without feeling persecuted. Remember, this is just as scary for them as it is for you at the beginning, so they may need time to adjust and gain confidence too. The worst thing you can do is start or continue a flame war about your automation process.
Respond respectfully, with clarifications about things like how you are planning to test, why you chose to do it the way you have, and the fact that it is an always-evolving process. What you start out with in the first round of automating anything will almost never be what it is by its Nth revision. Above all, don't let the communications process stop or stall your automation completely. I normally handle this by focusing on the fact that this is just one step in an ever-progressing path of enhancements to your current processes. So, to summarize: start using virtualization, test and retest, do peer reviews, have someone validate your scripts and documentation, and communicate early and often. With each piece of automation that follows, you will refine how these things need to happen in your organization to get the highest level of trust. Finally, remember that what works for one application or service will almost always need to be tweaked for the next one's specific needs.

18 min
14/09/2014

Episode 21 - You did know there is a script for that?

Bob from Minneapolis sent us some great feedback and a couple of great questions. In this post we are going to tackle communicating what scripts are available and what they do. I spend a lot of my time as a consultant writing documentation, and at most sites I am pretty sure the work is lost before I get out the door on my last day. So how did I try to handle this before, as a Tech Lead? To be completely honest, it was always hit or miss. I have never found a single solution that works for all purposes and with all types of people. Let's face it: we humans, the WetWare, are the most difficult to communicate with. Hardware wants power and bits to process. Software wants data and other inputs. But humans, a.k.a. WetWare, all want something different. All kinds of factors, from gender to skill level to happiness with their current job, can affect and require different types of communication. Since we can't solve all of these problems, and at least one is unsolvable, we need to focus on what we can.
Let's start with the one thing I know doesn't work: network file shares. In my experience, network file shares full of Word or text documents are where good documentation goes to die. There is no automatic version tracking, you can never find a structure everyone will agree on, and unless you ingest the files into a database somehow, they are basically unsearchable. Since no one can ever find the document they are looking for or be sure it's the most current, documentation quickly becomes a thing we do because management says we have to, not because we find it at all useful.
Wikis can work well and solve at least a few of these issues, but they still aren't a cure-all. They tend to be searchable and to track versions. They still have the structure issue, though. If Google has taught us anything, it is that we don't have to care how things are stored in the great database that is the Internet; as long as it can be searched, we can find anything.
With that being said, structure is probably less of an issue with a wiki because wikis are, generally speaking, searchable. So it's always going to stay on my list. There are so many options, from your own version of Wikipedia to all-JavaScript single-page solutions. It's still not perfect, but it works better than a file share and generally looks better too.
Version control systems like Git, believe it or not, can also work as a suitable solution when combined with something like GitLab, Gitblit, or GitHub (public or private). They keep the documentation with the code and are searchable. Because the code and documentation are kept together, this works a lot better for scripts and programs than it does for something like the procedure to shut down your whole data center. That doesn't mean you can't or shouldn't use it for that, just that it may not work well for all use cases. So again we are close, but it's still not a home run.
Knowledge bases, content management systems, and anything like them all work to varying degrees. This concept is really more implementation and software dependent. For the most part, the difference between these and a wiki is the flexibility to document changes to the documents and who made them. These systems tend to be very rigid and generally require defining a taxonomy to make them work efficiently. There is nothing wrong with defining a taxonomy, but it normally provides very little reward for the time and frustration it takes to get everyone on the same page. These systems generally just become difficult to manage over time and then grow less useful as people stop putting the effort in.
What seems to work in a lot of environments is a blend of the wiki and version control (Git) concepts. Yes, your documentation ends up somewhat scattered, but if everyone knows where to look for which kind of document, it works pretty well. I generally suggest putting procedure and policy items in a wiki so they are accessible to everyone.
Documentation focused on a specific application, whether purchased or internally developed, should then be kept in a version control system with the code or binaries. This keeps the application specifics, which tend to change more often, with the related application. The wiki can then be used for things that should be more accessible and change more slowly, like the procedures for deploying that application. At the end of the day, the best system is the one that wins the popularity contest in your organization or team. If the whole team hates wikis, then don't use them. At the same time, if it's just one user who doesn't want to use a wiki, then maybe it's time to talk to them about leaving your company. So go now, document something, and put it where you think everyone should find it. Then ask everyone to send you an e-mail with the first place they looked. If the majority looked in the right place, then you have found your location.

19 min
08/09/2014

Episode 20 - Devopsmastery.com - Don't be a tool about tools...

Everyone loves a new toy. New tools help us do our jobs better, faster, and more accurately, which is great when you understand exactly what you want the tool to do for you. How often does that really happen, though? As an Enterprise Architect I am asked on a regular basis to help companies deploy tools. The problem I run into most is that they don't completely understand what the tools can do for them. They know a tool will fix one problem but may not realize that it could be fixing others. Then there is the situation where a competing tool could have solved even more problems, and more easily. So why don't my large enterprise clients go through a process that prevents this? Every company is unique in what its specific issue is. Sometimes it's a time factor; other times it's not understanding the problem they need to solve. Usually the core issue is that the selection process didn't have enough parameters or ignored parameters altogether. In a lot of evaluations it's a problem of not communicating with other members of the evaluation team or with other teams. When people love a tool and have their heart set on using it, it's often hard to get them to hear criticism of it. To avoid those situations they exclude people or groups that may have a different opinion or a different tool they are passionate about. People may also be trying to avoid paralysis by analysis. No matter what the reason, the result will be the same: you end up with a tool whose features you never fully use, or one that doesn't have enough features. The second case will require you to get another tool when an alternative that covers both problems exists.
The best way I have found to avoid this is to do selections in three phases. The first is to solicit a wish list of features from both the tool's target users and the operations teams supporting it. Also ask them for suggestions of tools they have liked or hated using in the past.
The second is to research the tools and gather their feature lists. Evaluate each tool against the wish list, not against each other, and try to narrow the set down to two or three at most. The final phase is a proof of concept with your top choices. In the proof-of-concept phase, install each tool, set it up in a development environment, and actually make it do something. Ask the users and support teams if each tool is doing what you think you need it to. Once you have things working, demo each environment to a small subset of the tool's users and operations staff. This will show how well, if at all, each tool answers the items on the wish list. If the demos go well, your choice should be clear, and you will have done what you can to make people happy. Just don't expect everyone to love the choice; you can never make everyone happy.
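One simple way to keep the evaluation honest is to score each candidate against the wish list rather than against the other tools. The sketch below assumes each wish-list item carries a weight (for example, how many users and operations staff asked for it); the tool names, features, and weights are made up for illustration, and a real evaluation will involve more judgment than a sum.

```python
from typing import Dict, List, Set


def score_against_wishlist(wishlist: Dict[str, int], tools: Dict[str, Set[str]]) -> Dict[str, int]:
    """Score each tool only against the wish list, never against the other tools.

    `wishlist` maps a requested feature to a weight (an assumption: for
    example, how many people asked for it).
    """
    return {
        tool: sum(weight for feature, weight in wishlist.items() if feature in features)
        for tool, features in tools.items()
    }


def shortlist(scores: Dict[str, int], keep: int = 3) -> List[str]:
    """Narrow the field to the top two or three tools for a proof of concept."""
    return sorted(scores, key=lambda tool: scores[tool], reverse=True)[:keep]


if __name__ == "__main__":
    # Hypothetical wish list and candidate tools.
    wishlist = {"git hooks": 3, "role-based access": 2, "web reports": 1}
    tools = {
        "ToolA": {"git hooks", "web reports"},
        "ToolB": {"role-based access"},
        "ToolC": {"git hooks", "role-based access", "web reports"},
    }
    scores = score_against_wishlist(wishlist, tools)
    print(scores)                     # {'ToolA': 4, 'ToolB': 2, 'ToolC': 6}
    print(shortlist(scores, keep=2))  # ['ToolC', 'ToolA']
```

The shortlist only decides which tools go into the proof of concept; the demos to users and operations staff still make the final call.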

17 min
04/08/2014

Episode 19 - Devopsmastery.com - Six tips for more effectively DevOps communications

If you are really going to master DevOps, you need to master communications, or at least the ability to clearly motivate people and pull them into the discussion and the goal you are trying to achieve. Spending an extra fifteen or twenty minutes writing an e-mail can save you days and sometimes weeks of discussions, arguments, and apologies. I have a lot of tips about how to communicate ideas more effectively, but today I am going to give you my top six.
What is the goal of your communication? A goal is essential to forming any effective communication. If you don't really know what you want to achieve with the message, how will anyone else? Some common goals for me when I was an operations lead were getting help with a memory leak, letting everyone know about upcoming patches to the middleware, or getting someone to help figure out why a build or deploy was failing. Once you know what the goal is, you can easily expand on it, form a complete thought, and translate it into a communication.
What is your motivation for writing it? Sounds weird, I know. Communicating why you are trying to solve the problem often goes a long way towards getting people's sympathy, or at least an understanding of why you care about the problem. If you try to hide or omit it, you run the risk of pushing people away; stating it, on the other hand, draws people in. In those cases where you are trying to help someone else, it will also draw them into the communication and compel them to read on. So state it clearly, but don't go on and on about it. I normally try to keep my motivation to one sentence. Too much discussion can come off as bragging, whining, or self-righteousness, depending on what is being said.
What are you trying to motivate others to do? This one sounds super obvious, but it is really more about how you say it than what is being asked or said. This is the sales pitch part of any communication, and I know that for a lot of people it is the most difficult part.
We all normally hate being sold things, so the idea of selling things to others can be difficult. What really good salespeople know is that the best way to sell is not to be a pushy salesperson. Instead, focus on the benefits to the person you are selling to (or rather, asking for help from). Everyone wants to look good doing what they do best, and making people feel important by asking for their help in areas where you aren't the best is also helpful. A little humility will go a long way towards winning people over to your way of working and thinking.
Stay away from certain exclusive words and use more inclusive ones. I only speak English, so I am not sure if this holds true in all languages. In English, though, words like I, me, my, us, department or group names, them, and you tend to imply an exclusion of the reader or of you as the writer. This exclusion, while a minor thing, really can put people off and make them resistant to your ideas for various reasons. Some people will think you are blaming them, others may think you are dumping work on them, and still others may not know you, so they will use these words as an excuse to ignore you. Instead, use words or phrases that imply inclusion, like we, our, as a team, etc. Depending on the context, these words will have different levels of effect. They will always be positive; it's just that some situations are naturally inclusive, so the effect is less visible.
Leave the door open to dialog. Often the communications I receive are worded as statements of fact rather than statements open for discussion. They seem to be written by people who have already decided what the best approach is and want me to rubber-stamp it. It's my job as a Solutions Architect, and previously as a Team Lead, to resist this. If you are talking about my area of expertise and haven't asked me for input, you might even piss me off.
So while stating your plan may be acceptable, make sure you always ask for feedback in the communication. This could be a requested meeting, a response to the message, or flat out asking what issues the readers see in what you are presenting or proposing.
Like a good beer, let it ferment. That's right: the best thing to do in most cases is to let the communication sit for a few hours, overnight, or longer if possible. This is especially good for those times when you are really upset by the actions, or lack of actions, around a problem. If the communication is an e-mail, save it and go take a walk. I will often write an e-mail, save it as a draft, and then go to lunch. When I get back and read it, I am fresher and calmer. Normally I rewrite about half of it to incorporate the other items on this list. Even if I wasn't upset when I wrote it, I will normally find places where I can make it better.
Let's look at an example. Here is a less than optimal communication: "Hey Guys, I really need you guys to look at this memory leak. It's affecting the site's performance and the uptime stats. When can your team fix it? Thanks, DevOps Master" Something like this would probably work better: "Hey Guys, We are all really starting to see the effects of slow performance and a growing number of outages that, from the look of the stats, appear to be caused by a memory leak. Let me know when we can get together today to discuss a strategy to identify the real problem and fix it. Thanks, DevOps Master" The second one pulls the reader in, states the problem I need help with, and asks for a meeting to plan the approach. Remember to always ask for what you want, like a meeting today. Expect that your requests are not always realistic, and be accepting of responses offering a different solution or meeting time. Never forget that everyone is busy.
The less time everyone spends being irritated with how things are written, the more time we all have to fix problems. So make an effort today to write something better. Ask others to proofread what you write, looking for the things in this article. Look for me to post more tips for improving your communications in future articles.

22 min
28/07/2014

Episode 18 - Devops Mastery Choosing what to automate first, second, and so on.

What follows are the criteria I use to determine what to script or automate first, second, and so on. It's another of the "it depends" questions that never have an easy answer. I will try to help you make some rational decisions, but ultimately it's your world to live in and you need to decide. What I see everyone want to automate first is the task that takes the longest to do, the one that takes the better part of a week to complete because you are constantly getting pulled in a hundred different directions. That is not the first thing you need to or should automate. The duration of a task matters, but the number of times you have to do it matters more. I may only spend 2-3 minutes compressing some logs on a development web server, but if I am doing it on 10-20 different servers every week, it's not only annoying, it's sucking up a lot more time because of the context switching it causes. There is also the cost to your company of the time the system's users spend trying to troubleshoot the full disk. So how do I figure out what's sucking up all my time? Do I track everything I do all the time? Heck no. I use a simple spreadsheet to track my work for 3-4 weeks at a time and repeat this every 3 months or so. Then I look at what I spent my time on. I track smaller items with just tick marks and an average time to complete the task, and I don't worry about what time of day it is. Noting the time longer tasks take will give you better data, but not necessarily better results for this purpose. Remember, the time you spend on individual tasks isn't as important as the number of times you have to repeat them. Now let's flash forward three weeks, after you have collected some data, and answer the following questions. What are the top 10 issues you resolve on a regular basis? Are any tasks that didn't make that list affecting mission-critical parts of your organization?
Do you think you should add them to the list? You will likely want to, and you should pay close attention to the next question for those tasks. How many steps are there to automate in the top 20 tasks you do to resolve these issues? (Exact numbers are not needed, just a really good guess.) Your top 10 list plus the items affecting mission-critical parts of the organization should be your starting point. To narrow it down further, start with the items that have the fewest steps. When doing this, though, make sure none of the steps are "Install the OS" or "Install Oracle". Those are not really single steps but multi-step tasks of their own. Now that you have your list, ask someone else for input, and be sure to provide them with the complete list. I have often described myself and my co-workers as being on a mission when we start automating. That is great; it shows focus and determination. In some cases, though, it can also cause mission blindness: you want so badly to automate that really hard thing that you miss the quick wins lying at your feet. Asking someone else to look at your list will hopefully keep you from being blind. Use your own time to complete the list if you need to. As much as the company will benefit from your automation efforts, your work-life balance will improve the most, so spending a little of your time up front will give you more of your time back later. If you can reduce a series of daily tasks that take 30 minutes into a single script that runs in 30 seconds, then over the course of a month you will have more than 9 extra hours to write new scripts (30 minutes a day over roughly 20 working days is about 10 hours, while 30 seconds a day adds up to only about 10 minutes). You will also gain from the typos the script won't make on Monday mornings. If you can take that same script and run it as a scheduled task (think cron or at commands), you get back the whole 10 hours to put into scripting the next thing. Remember, the goal of automation is to free you from mundane tasks so that you can work on the more complex and proactive ones.
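The tally-sheet approach can be turned into a small ranking helper. This is a hypothetical sketch of one reading of the criteria in this episode: take the top N tasks by how often they occur, add any mission-critical tasks that didn't make the cut, then order the result by the fewest steps. The `Task` fields and sample data are assumptions, not the author's actual spreadsheet layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    ticks: int            # tick marks collected over the 3-4 week tracking period
    avg_minutes: float    # rough average time per occurrence (recorded, not used for ranking)
    steps: int            # a good guess at how many steps automating it involves
    mission_critical: bool = False


def automation_order(tasks: List[Task], top_n: int = 10) -> List[Task]:
    """Top-N most frequent tasks plus any mission-critical ones, fewest steps first."""
    by_frequency = sorted(tasks, key=lambda t: t.ticks, reverse=True)
    candidates = by_frequency[:top_n]
    chosen = {t.name for t in candidates}
    candidates += [t for t in tasks if t.mission_critical and t.name not in chosen]
    # Fewest steps first; among equals, the more frequent task comes first.
    return sorted(candidates, key=lambda t: (t.steps, -t.ticks))


if __name__ == "__main__":
    # Hypothetical tally-sheet data.
    sheet = [
        Task("restart app server", ticks=12, avg_minutes=5, steps=3),
        Task("compress dev web logs", ticks=45, avg_minutes=3, steps=2),
        Task("build new server", ticks=2, avg_minutes=240, steps=30),
        Task("renew certificates", ticks=1, avg_minutes=60, steps=8, mission_critical=True),
    ]
    for task in automation_order(sheet, top_n=2):
        print(task.name)
```

Ranking by frequency rather than duration follows the episode's advice; the week-long "build new server" task drops off this short list even though it is the biggest single job.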
You will never stop feeling like you are spinning your wheels if you can't get out of the daily repetitive mud. Automation is the way to push yourself back out onto the proactive route you have been looking to reach.
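To make the log-compression example from this episode concrete, here is a minimal sketch of the kind of small script that can then be scheduled with cron. The directory, script path, and one-day threshold are hypothetical, and on many Linux systems logrotate already handles compression, so treat this as an illustration of turning a 2-3 minute manual chore into a scheduled task rather than a recommended replacement.

```python
#!/usr/bin/env python3
"""Compress old *.log files in a directory; meant to be run from cron.

Example crontab entry (paths are hypothetical):
    0 2 * * * /usr/bin/python3 /opt/scripts/compress_logs.py /var/log/myapp
"""
import gzip
import shutil
import sys
import time
from pathlib import Path
from typing import List


def compress_old_logs(log_dir: str, older_than_days: float = 1.0) -> List[Path]:
    """Gzip every *.log in log_dir not modified within older_than_days.

    Each archive name includes the log's modification time, so a log that
    is recreated under the same name does not overwrite an earlier archive.
    """
    cutoff = time.time() - older_than_days * 86400
    archives = []
    for log in sorted(Path(log_dir).glob("*.log")):
        mtime = log.stat().st_mtime
        if mtime >= cutoff:
            continue  # probably still being written to; leave it alone
        stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(mtime))
        target = log.with_name(f"{log.name}.{stamp}.gz")
        with open(log, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        log.unlink()  # remove the original only after the archive is written
        archives.append(target)
    return archives


if __name__ == "__main__" and len(sys.argv) > 1:
    for archive in compress_old_logs(sys.argv[1]):
        print(f"compressed -> {archive}")
```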

19 min
18/07/2014

Episode 17 - Automated Build and Deploy DevOps Tools - Devops Mastery

Automated deployment and automated build tools are all complicated to evaluate properly. Before you even try to evaluate the tools, working through the complete process manually is normally the best place to start. You need to understand how the deployment process works before you can figure out which tool will meet your needs. Things to take note of are the tests you will need to run, the operating systems involved, and how you plan to do the deployments to production. You are trying to achieve consistency across all your environments, so a tool designed for web applications may not have the features you need to deploy smartphone apps. This isn't to say that most of the tools can't do both, just that you need to probe deeply to make sure all your requirements are met. Below is my list of the top things to look at when choosing a tool of this class. As with all the other articles on tools I have written, this list probably isn't complete. You will need to make sure you are meeting all of the "it depends" requirements for your environment. Taking a disciplined approach to choosing the tool can save you months of time trying to configure it for your company's needs.
* Do you need to host the tool yourself, or can you use one of the many PaaS options available? Tools like TravisCI are wonderful because they let you scale the cost of the tool and infrastructure as your company grows. However, if you are an established company with a lot of existing applications, it may be cost prohibitive. So be sure to look at the cost of the solution before committing to the cloud.
* How are you going to test your applications or infrastructure as part of this process? Does the tool support all of your testing tools, or do they support it? Some testing tools are better than others at interoperating with this class of tools. In some cases, most notably with PaaS options, a tool may not be an option at all because it can't connect to one or more pieces of infrastructure in your test environment.
* Does the tool have hooks into your version control system that will adequately support your needs? This is less and less of a problem with most systems, but it should be validated before a final decision is made. If you use SVN and the tool only supports Git, that knocks it out completely. Also, where does the code have to be stored? If the tool only supports GitHub and Bitbucket but you store your code on a local GitLab server, it is not going to work for you.
* How much work flow do you need, and of what kinds? Some of these tools are only designed to do the most basic work flows, like pushing the code to the next environment. Others can do what would be considered full orchestration, like building complete systems from scratch. From a DevOps perspective, not being able to do full orchestration is an issue. We want the system to never need human interaction, so doing things only halfway isn't helping us reach the goal. Luckily, most systems that don't directly support full orchestration can call other tools like Chef from build/deploy scripts to accomplish the task. At the same time, if you are only going to be developing desktop and smartphone apps, a manual process may be your only option. Not all distribution mechanisms have APIs and ways to deploy to them automatically. While these are becoming rarer every day, it's something to keep in mind.
* What kinds of reporting do you need it to do? This is normally a rather broad set of requirements. Generally these tools will create a web report of the progress of the current builds, the last successful build, and the last failed builds. What is contained in those reports can vary far more than you might expect, so pay attention to what the sample reports show you. Discuss this with everyone to make sure all of the teams' needs are being met.
* How much does the tool do out of the box, and how much needs to be developed? Most of the tools rely on at least some development to meet all of your needs. This is generally in the form of a wrapper shell script of some type. Some tools, however, may require you to write code in languages like Java, Python, or Ruby. So make sure you understand what is included and supported versus what is possible but needs development.
* Do you need role-based access control? In some companies only certain people are allowed to push the buttons that move a build to a new environment. This could be for any number of reasons. One example is that production deployments may only be allowed after all the sign-offs have been given and the change manager blesses the build. Not all of the tools support this feature.
Once you have narrowed your choices, do a pilot with your top one or two. Piloting your top choices with an app or two will help you find their rough spots. What sounds great on a website could be more marketing hype than reality. I don't think any of the vendors or projects are trying to deceive us; they just may not understand that having to learn Java, Python, or Ruby may be a deal breaker for you. The setup of these tools can be time consuming, so allow for that time in your estimates. Your pilot will help you understand most of the complexity of setup and configuration.
To get you started, here are three examples I have used in the past:
* TravisCI - A cloud-based CI suite which gives its services away for free to FLOSS projects.
* Jenkins - An open source solution with tons of plugins and possibilities if you're not afraid of a little scripting.
* UrbanCode's Anthill Pro - A closed-source, commercial tool recently acquired by IBM. It has a lot of out-of-the-box integrations and a fully featured work flow engine that includes role-based access control.
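Role-based access control on promotions usually comes down to a rule like the one sketched below: only certain roles may push to an environment, and production additionally requires every sign-off plus the change manager's approval. The roles, environment names, and required sign-offs here are hypothetical; a real build/deploy tool would enforce this in its own permission model, not in a standalone function.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Build:
    version: str
    signoffs: Set[str] = field(default_factory=set)  # teams that have signed off
    change_manager_approved: bool = False


# Hypothetical policy: which roles may push a build into each environment.
DEPLOYERS: Dict[str, Set[str]] = {
    "test": {"developer", "operations"},
    "prod": {"release-manager"},
}
# Hypothetical list of teams that must sign off before production.
REQUIRED_SIGNOFFS: Set[str] = {"qa", "security", "operations"}


def can_deploy(build: Build, env: str, role: str) -> bool:
    """Return True if `role` may push `build` into `env` under the policy above."""
    if role not in DEPLOYERS.get(env, set()):
        return False  # unknown environment, or a role that doesn't get the button
    if env == "prod":
        # Production needs every sign-off plus the change manager's blessing.
        return REQUIRED_SIGNOFFS <= build.signoffs and build.change_manager_approved
    return True
```

When piloting a tool, it is worth checking whether it can express a rule of this shape directly, or whether you would end up scripting it yourself.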

20 min
10/07/2014

Devops Mastery - Episode 16 - Code Management DevOps Tools

Before we begin this discussion, remember this one thing while you read: what you use to manage code is less important than making sure you manage it. Management in this context means version tracking, the ability to roll back changes in code, and the ability to compare two versions of the code in question. The time people spend trying to track which backed-up file is the latest version far exceeds the time it takes to learn any repository management tool. Arguing the merits of the tools is just that: a waste of time. While I am not a fan of certain tools in this class, they all can handle the basics I have outlined above. I believe so strongly in this that I am not going to share my personal choice or make a recommendation in this article. This tool set has been the domain of developers in most organizations since the dawn of computing. With the focus on software-defined infrastructure, operations teams need to use these tools too. They need to become as adept at using repository management tools as they are at configuring an interface or building out a database. The older systems all worked and did the basics with varying degrees of ease and success. More modern tools like the FLOSS world's favorite, Git, try to advance the state of the art. That doesn't mean Git doesn't have its issues or drawbacks. These tools generally fall into two sets. The first is a centrally managed system, where a server is required for basic commands like checking in code. The second is a distributed system, which lets everyone manage code locally without a server and then push the local copy up to a central server for distribution to others. The modern approach and trend is toward distributed systems. They give the people writing the code the most control and flexibility over what they are working on by not requiring a network connection to the central server.
I can be completely disconnected from any network and still get all the benefits of the code management system. If I write a new piece of code on a flight from Cleveland to Dallas, I can test it locally and then check it in. Then, when I get a connection back to the central server, I can pass the code off to other people and complete integration testing and other tasks. So how do you decide which one to use? In most cases people in a company will have experience with a tool set. It may sound like a cop-out, but using the one the majority of people are familiar with is the best place to start. Other things to consider are:
* Cost of the tool - free, cost per user, or cost per X floating/concurrent users
* Operating system requirements of the tool - is enough of the functionality available on all the OS platforms in your company?
* Consistent functionality across operating systems
* Integration with other tools in the environment
* Compatibility with other tools - support or plug-in availability for development tools, Continuous Integration/Deployment systems, and testing suites
* How large a learning curve there is for your entire team
How many different tools in this class should you choose? Normally a company standard is chosen, but this does not always work, most notably when your company is split between Windows and Linux development. Tools like Microsoft's Team Foundation Server (TFS) work great for .NET developers, while Linux developers are left with less than optimal tool sets for interacting with the server. Microsoft and other tool providers have recognized this issue and implemented ways to use different tools behind a single management interface. In Microsoft's case, for instance, this gives people the option to use either native TFS version control or Git. This may not work for everyone, so having more than one option may make sense. Going beyond two options normally costs a great deal more from an on-boarding and employee training perspective. So one is optimal, but two is tolerable.
Who should own this class of tools? Everyone in IT. Ownership is different from management: owners have a vested interest in the health of the system they own. In this case developers, testers, and operations teams can all be affected by the choices and processes used for things like upgrades and maintenance, so everyone should be represented when decisions about the tool(s) are made.
Should you keep only code in the repositories? Repositories don't have to be limited to code, which makes it easy to build workflow procedures around them. It often makes a lot of sense to keep even the requirements documentation in a repository so that everyone can see the history of what changed and when. Most of these tools have a method for kicking off other processes when a change has been made to some or all of the repository. These are called hooks, and they can do everything from starting Continuous Integration/Deployment workflows to sending a message to the next group in the workflow for review.
Remember, the tool you choose is less important than the fact that you use one. Discussions about these tools often come with a lot of passion, so take the time to talk through the options and listen to everyone before making a final decision. I would also recommend that larger companies choose a small team of people who are passionate about this class of tools to represent the larger groups. It will help to manage the process and hopefully build consensus about the final decision. The people on the team should also agree to be advocates of the chosen tool(s) to the larger group.
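As one illustration of the hooks mentioned above, Git runs a `post-receive` hook on the receiving repository after a push, passing one line per updated ref on standard input in the form `<old-value> <new-value> <ref-name>`. The sketch below, in Python (Git accepts any executable as a hook), picks out which pushes should start a CI build. The watched branch names and the `trigger_build` function are hypothetical placeholders for whatever your CI system exposes.

```python
#!/usr/bin/env python3
"""Sketch of a Git post-receive hook: save as hooks/post-receive and make it executable."""
import sys
from typing import List, Tuple

WATCHED = ("refs/heads/master", "refs/heads/main")  # assumption: branches that feed CI


def is_zero(sha: str) -> bool:
    """Git sends an all-zero object name when a ref is created or deleted."""
    return set(sha) == {"0"}


def refs_to_build(stdin_text: str) -> List[Tuple[str, str]]:
    """Return (ref, new_sha) for each watched branch that was updated, skipping deletions."""
    builds = []
    for line in stdin_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        _old_sha, new_sha, ref = line.split()
        if ref in WATCHED and not is_zero(new_sha):
            builds.append((ref, new_sha))
    return builds


def trigger_build(ref: str, sha: str) -> None:
    # Hypothetical placeholder: call your CI server or notify the next group here.
    print(f"would trigger CI for {ref} at {sha}")


def main() -> None:
    if sys.stdin is None or sys.stdin.isatty():
        # Run by hand rather than by Git, so there are no ref updates to read.
        print("post-receive expects '<old> <new> <ref>' lines on stdin")
        return
    for ref, sha in refs_to_build(sys.stdin.read()):
        trigger_build(ref, sha)


if __name__ == "__main__":
    main()
```

Keeping the parsing in its own function makes the hook easy to test without pushing anything.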

18 min
12/06/2014

Devops Mastery - Episode 15 Monitoring DevOps Tools

If you think there are a lot of tools to configure your systems, you haven't looked at the tools available to monitor them. The set is so large that it is easy to get overwhelmed. So again in this article I am going to give you the list of questions I use to narrow the field, and then a list of my favorites.
Is it agent, agentless, or hybrid? As with configuration management tools, this question cuts both ways. The best agent-based tools in this class have well-documented deployment paths that use various configuration management tools. For instance, they will have Chef or Puppet packages that cut down your deployment time tremendously. The deciding factors are how much time you have to deploy the tool and how fast a response you need from it. Agent-based tools are faster in most cases. Agentless tools rely on some form of remote execution, like SSH or remote PowerShell, and on SNMP (Simple Network Management Protocol). Because the server in an agentless system has to do all the polling work, agentless tools tend to be more complex to scale. They can also carry more risk because you have to allow more ports through your firewall. Hybrids allow you to deploy the tool in different ways depending on the security requirements, so for medium and large companies they tend to be a better choice.
How does reporting work? Reporting is what you need the tool to do, so paying attention to it is critical. The tools vary widely in the number and type of standard reports they have, and in how easy it is to create custom reports. More and more often, monitoring tools are paired with a reporting system to handle this issue. Writing custom reports can be as simple as using a GUI or as difficult as using a DSL (Domain Specific Language) or a traditional programming language.
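To see why agentless polling is harder to scale, a back-of-the-envelope calculation helps. In an agentless model the central server starts every remote check itself, so its load grows with hosts times checks per host, divided by the polling interval. For contrast, the sketch assumes an agent batches all of its local results into one report per interval; real tools differ, and the host and check counts are made-up example numbers, not measurements.

```python
def agentless_checks_per_second(hosts: int, checks_per_host: int, interval_s: float) -> float:
    """Remote checks per second the central poller must initiate itself."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return hosts * checks_per_host / interval_s


def agent_reports_per_second(hosts: int, interval_s: float) -> float:
    """Inbound reports per second, assuming one batched report per agent per interval."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return hosts / interval_s


if __name__ == "__main__":
    # Assumed example: 500 servers, 20 checks each, every 60 seconds.
    print(f"agentless: {agentless_checks_per_second(500, 20, 60):.1f} checks/s")  # 166.7
    print(f"agent-based: {agent_reports_per_second(500, 60):.1f} reports/s")     # 8.3
```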
If your business needs reports from your systems, be sure to confirm that you can create the reports it needs. For instance, can you easily get a report that tells you how many people failed to sign up for your mailing list? Can you tell where people are stopping or failing to complete an action?
Does it do alerting, and if so, how can you be alerted? Alerting sounds like a no-brainer, but not all tools do it. Some tools are built just to display a set of stats for people to analyze, which normally makes them easier to deploy and configure. At the other end of the scale are tools that try to predict failures and alert you before the problem happens. This sounds great, and is cool, but you need to know a lot about your environment so that you can set the boundaries between good and bad events. That means it will take a lot of time to tune properly and remove false positives. Also, can an alert trigger an action? If it can, you can automate simple fixes and free a human to sleep or do something more productive. If the tool can tell you that the disk is filling up, can it carry out the steps your team would take to free up space? This will help a lot with work-life balance.
What dashboards are available out of the box, and can you customize them simply? Most of the tools come with a set of what are called canned dashboards. When you are starting off with monitoring tools, it's best to choose a tool with more canned dashboards rather than fewer, as long as they tell you something relevant. If the tool has a great dashboard for monitoring Java applications but your company writes its apps in Ruby, what good is it to you? All of them will let you customize these dashboards. You can roll up a statistic so you can show it across an entire environment (Development, Test, Production) in one chart. Be careful, though: as techs we love our data, and chart customization can get out of hand.
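The alerting questions above (tuning out false positives and letting an alert trigger an action) can be sketched together for the full-disk case. This is a minimal sketch under assumptions: the 90% threshold, the rule of alerting only after three consecutive bad samples (one simple way to ignore one-off spikes), and the `clean_temp_files` remediation are hypothetical placeholders. Only `shutil.disk_usage` is a real standard-library call.

```python
import shutil
from typing import Callable


class DiskAlert:
    """Fire once when `required` consecutive samples reach `threshold` percent used."""

    def __init__(self, threshold: float, required: int,
                 remediate: Callable[[], None]) -> None:
        self.threshold = threshold
        self.required = required
        self.remediate = remediate  # automated fix to try before paging a human
        self.breaches = 0

    def observe(self, percent_used: float) -> bool:
        """Record one sample and return True only when the alert fires."""
        if percent_used >= self.threshold:
            self.breaches += 1
        else:
            self.breaches = 0  # one healthy sample resets the count (suppresses spikes)
        if self.breaches == self.required:
            self.remediate()
            return True
        return False  # below threshold, not yet sustained, or already fired


def disk_percent_used(path: str = "/") -> float:
    """Current usage of the filesystem holding `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def clean_temp_files() -> None:
    # Hypothetical remediation: replace with the steps your team would run by hand.
    print("rotating logs and clearing temp files")


if __name__ == "__main__":
    alert = DiskAlert(threshold=90.0, required=3, remediate=clean_temp_files)
    for sample in [91.0, 85.0, 92.0, 93.0, 95.0, 96.0]:
        print(sample, alert.observe(sample))  # fires only on 95.0
    print(f"root filesystem: {disk_percent_used('/'):.1f}% used")
```

Raising `required` trades slower alerts for fewer false positives, which is the tuning work described above.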
Over time you will want to add custom dashboards to make it easier to troubleshoot your devices.
How resource-intensive is it on the server, the network, and the clients being monitored? This is another "it depends" discussion. If you are only monitoring a small number of things (computers, network equipment, etc.), this is less of a concern. You still always need to keep it in mind, because your first instinct is going to be to monitor everything, from both a device and a data point perspective. I have seen, and caused, situations where we monitored ourselves to death. For instance, at one company with limited bandwidth to our remote sites, we overwhelmed the network with monitoring traffic alone. There was no bandwidth left for little things like file transfers and pulling up the company intranet. The problem is that until you do a proof of concept with the tool, you often cannot answer this question. As a general rule, agent-based tools help a lot because they only need to send changes and not everything. In all cases, though, you need to be sure that you are going to get something from the data you collect; collecting data you don't use also makes it harder to filter when the time comes to create dashboards and reports.
The best way to handle most of these issues is to define three lists: things you know you need to monitor, things you think you want to monitor, and things you know you don't need to monitor. Then apply those lists to the questions above. It's a simple flow: can I get the data, can I report on it, can I make a dashboard for it, and finally, will we have enough resources for all of it? The problem I have with this set of tools is that they all have a high level of complexity during the implementation phase. Even the simplest of them can take several person-weeks to set up correctly and start returning on the investment. Once you do have it set up, you will be amazed at how much more efficient it helps you become.
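The point that agent-based tools only need to send changes can be illustrated with a simple report-on-change filter. This is a hypothetical sketch, not the protocol of any specific tool: the agent remembers the last value it sent for each metric and transmits only metrics that moved by more than a tolerance, plus a periodic full refresh so the server can tell a quiet host from a dead one. The tolerance and refresh interval are assumptions you would tune.

```python
from typing import Dict


class ChangeReporter:
    """Hypothetical agent-side filter: only report metrics that changed."""

    def __init__(self, tolerance: float, full_every: int) -> None:
        if full_every < 1:
            raise ValueError("full_every must be at least 1")
        self.tolerance = tolerance    # ignore changes no larger than this
        self.full_every = full_every  # send everything every N rounds as a heartbeat
        self.last_sent: Dict[str, float] = {}
        self.rounds = 0

    def to_send(self, sample: Dict[str, float]) -> Dict[str, float]:
        """Return the subset of `sample` that should go over the network."""
        self.rounds += 1
        if self.rounds % self.full_every == 0:
            changed = dict(sample)  # periodic full refresh
        else:
            changed = {
                name: value
                for name, value in sample.items()
                if name not in self.last_sent
                or abs(value - self.last_sent[name]) > self.tolerance
            }
        self.last_sent.update(changed)
        return changed


if __name__ == "__main__":
    reporter = ChangeReporter(tolerance=1.0, full_every=10)
    print(reporter.to_send({"cpu": 20.0, "disk": 70.0}))  # first round: everything is new
    print(reporter.to_send({"cpu": 20.5, "disk": 75.0}))  # only disk moved more than 1.0
```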
You should also be open to deploying multiple tools in this class. Monitoring a multi-tiered application completely may seem easy at first, but it is difficult to do accurately. Keep in mind that monitoring anything is a complex process. It is not uncommon for companies to deploy two or three tools to meet all of their monitoring needs at a complexity level that makes sense for them. You may have one tool for monitoring basic information like disk space and CPU utilization, another to monitor application health, and a third to monitor user behavior.
OK, so what are my recommendations? Here are a few of my favorites.
* Nagios (http://www.nagios.org/) - This is where a lot of the following tools started. It is a great tool, but as the number of your systems increases, its configuration can be difficult to manage effectively. That is the problem the next two tools set out to solve.
* Zenoss (http://www.zenoss.com/documents/datasheet_core_commercial_compare.pdf) - This is a freemium application with a community version that does the basics well. The commercial version adds more analysis and optimization information. Check the linked PDF for a more detailed explanation.
* Groundwork (http://www.gwos.com/pricing/enterprise/) - This tool takes a lot of the hard edges off of Nagios. They continue to add features, and their enterprise tool lets you monitor your first 50 hosts for free.
In the coming months we will be adding a page to the site listing as many of these tools as we can find, along with our opinion of them. These are a great starting point, but before you make a decision, look at others and make sure they don't better serve your monitoring needs. Our sister site has reviews of Zenoss and Groundwork that are worth a look, even if they are somewhat old at this point.
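One common way to keep Nagios configuration manageable as the number of systems grows is to generate the host definitions from an inventory instead of editing them by hand. The sketch below assumes a simple inventory of (name, address) pairs and a host template named `linux-server` (the sample configuration shipped with Nagios defines one, but check your own templates). It only produces text; you would still include the output from `nagios.cfg` and validate it with `nagios -v` before reloading. Sorting by name keeps the generated file stable, so it diffs cleanly in version control.

```python
from typing import Iterable, Tuple

# One Nagios host object; doubled braces are literal braces for str.format.
HOST_DEFINITION = """define host {{
    use        {template}
    host_name  {name}
    alias      {name}
    address    {address}
}}
"""


def render_hosts(inventory: Iterable[Tuple[str, str]], template: str = "linux-server") -> str:
    """Render one host definition per (name, address) pair, sorted by name."""
    blocks = []
    seen = set()
    for name, address in sorted(inventory):
        if name in seen:
            raise ValueError(f"duplicate host_name in inventory: {name}")
        seen.add(name)
        blocks.append(HOST_DEFINITION.format(template=template, name=name, address=address))
    return "\n".join(blocks)


if __name__ == "__main__":
    # Assumed inventory; in practice it might come from your configuration management tool.
    inventory = [("web01", "10.0.0.11"), ("db01", "10.0.0.21")]
    print(render_hosts(inventory))
```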

23 min
07/06/2014

Devops Mastery - Episode 14 Configuration Management - DevOps Tools

Configuration Management DevOps tools are plentiful. So we will start this primer with what I look for in a tool and why. Then we will talk about my current top paid and open source choices. When I am evaluating new tools in this class, I look at the following things:
* Is it OS restricted? If I need to manage Windows machines and it only supports Linux, then the solution obviously won't work.
* Does it support the application platform we need to support? This is generally more of an enterprise problem, where you are deploying application servers.
* Can I write custom scripts? If we have a team that can script, or is willing to learn how, we can fill in any minor missing pieces.
* How much OS needs to be in place before I can start using it? Will the solution allow me to go from bare metal (i.e. no Windows or Linux) to fully installed, or does it require some basic level of an operating system?
* Is it agent-based, agentless, or a hybrid? I tend to lean towards agentless or hybrid solutions because they remove the need to monitor agents and restart them when they fail.
* Does the tool have a DSL (Domain Specific Language), or does it use a standard format for describing the work to be done? This will tell you how steep an adoption curve you are going to have. DSL products generally require more training than ones based on YAML or XML.
This list narrows the field for me. The "how much OS needs to be in place" question is the one most people miss in their lists. But if you need to roll out machines for something like a Disaster Recovery drill, it can be critical to your timing and person-power requirements. A tool that goes from bare metal to fully functional would be better in that situation. No solution will be a 100% fit for your environment, so you need to make trade-offs. If I have a team that can script, then customization may not be a problem.
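Whether a tool uses a DSL or a standard format such as YAML, the core idea is similar: you describe the desired state and the tool works out what to change. A minimal sketch, assuming the description has already been parsed into a Python dictionary, shows why running it a second time is safe. The package names, versions, and the `plan`/`converge` functions are illustrative only, not any real tool's API, and the "machine" is just a dictionary.

```python
from typing import Dict, List, Optional

# Desired state as it might look after parsing a YAML or XML description (illustrative).
# A version string means "installed at this version"; None means "must be absent".
DESIRED: Dict[str, Optional[str]] = {"nginx": "1.4.6", "ntp": "4.2.6", "telnet": None}


def plan(desired: Dict[str, Optional[str]], installed: Dict[str, str]) -> List[str]:
    """List the actions needed to move `installed` to `desired`; empty when converged."""
    actions = []
    for pkg, version in sorted(desired.items()):
        current = installed.get(pkg)
        if version is None and current is not None:
            actions.append(f"remove {pkg}")
        elif version is not None and current != version:
            actions.append(f"install {pkg}={version}")
    return actions


def converge(desired: Dict[str, Optional[str]], installed: Dict[str, str]) -> List[str]:
    """Apply the plan to a simulated machine (a dict) and return the actions taken."""
    actions = plan(desired, installed)
    for pkg, version in desired.items():
        if version is None:
            installed.pop(pkg, None)
        else:
            installed[pkg] = version
    return actions


if __name__ == "__main__":
    machine = {"nginx": "1.4.1", "telnet": "0.17"}
    print(converge(DESIRED, machine))  # first run changes three things
    print(converge(DESIRED, machine))  # second run: [] (nothing left to do)
```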
If I don't, or if they are all new to it, I may want to choose a tool that limits what we can do but does more out of the box. How the tool works operationally is also important. How do I scale this tool as we grow? Can I set up more than one master server for failover/load balancing? There is nothing worse than a fully deployed management tool with no management server. What happens when the configuration management client and server cannot talk for an extended period of time? Can the tool hold the configuration in its last known good state? Does the server alert or log when it cannot connect to the remote machine?
Once I have figured out my critical items, I build a spreadsheet with the names of the DevOps tools down one side and my critical items listed across the top. I filled one out for WagsWorld.com (one of my other sites). If an item isn't a simple yes/no answer, I often use a numeric grading system, which lets me further refine my rankings of the tools.
If you are a small company working with Linux-based systems, any of the open source tools should be more than enough. Small companies with Windows systems will have a slightly harder time, because Windows reduces your open source options and may require a small investment in a paid solution. The larger the organization, the more people who have to see how it all works, and the more likely you are to pay for a tool. The good news is that most of them are pretty cheap at this point.
The bottom line:
* Take your time and do your research
* Make sure it will do at least your list of must-haves
* Make sure you understand what it will take to extend its operations
* Test it out in a lab before agreeing to spend any money on a tool
* Invest in training, if it's available, for two or more people to get familiar with the tool
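The spreadsheet approach can also be written as a few lines of code, which makes the weighting explicit. This is a hypothetical example: the tool names, criteria, weights, and grades are invented placeholders, not the WagsWorld.com ratings. Yes/no items act as must-have filters, and the numeric grades (0 to 5 here) are combined into a weighted total.

```python
from typing import Dict, List, Tuple, Union

Answer = Union[bool, int]

# Hypothetical criteria: yes/no must-haves filter; graded items (0-5) are weighted.
MUST_HAVES = ["supports_windows", "supports_app_platform"]
WEIGHTS = {"bare_metal_to_installed": 3, "custom_scripts": 2, "ease_of_learning": 1}


def rank(tools: Dict[str, Dict[str, Answer]]) -> List[Tuple[str, int]]:
    """Drop tools missing any must-have, then sort by weighted score, highest first."""
    ranked = []
    for name, answers in tools.items():
        if not all(answers.get(item) is True for item in MUST_HAVES):
            continue  # fails a must-have: out of the running regardless of score
        score = sum(weight * int(answers.get(criterion, 0))
                    for criterion, weight in WEIGHTS.items())
        ranked.append((name, score))
    # Highest score first; ties broken alphabetically so the order is stable.
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    tools: Dict[str, Dict[str, Answer]] = {  # invented placeholder grades
        "Tool A": {"supports_windows": True, "supports_app_platform": True,
                   "bare_metal_to_installed": 4, "custom_scripts": 3, "ease_of_learning": 2},
        "Tool B": {"supports_windows": False, "supports_app_platform": True,
                   "bare_metal_to_installed": 5, "custom_scripts": 5, "ease_of_learning": 5},
        "Tool C": {"supports_windows": True, "supports_app_platform": True,
                   "bare_metal_to_installed": 2, "custom_scripts": 5, "ease_of_learning": 3},
    }
    print(rank(tools))  # [('Tool A', 20), ('Tool C', 19)]
```

Tool B has the best grades but drops out because it misses a must-have, which is exactly why the must-have list comes before any scoring.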

18 min

9 Episodes


Creator: Brian Wagner, Jason Didonato
Years Active: 2k
Episodes: 9
Rating: Clean
Copyright: © All rights reserved