Microsoft's System Center Operations Manager is excellent for infrastructure monitoring. Here are some key terms and concepts you should know.
Monitoring plays an important role in an IT infrastructure. Drives run out of space, databases crash, and servers begin to crawl for no apparent reason. It’s important to be notified of these issues -- ideally before they happen, or at least as soon as possible afterward. Infrastructure monitoring is a huge space with many participating vendors. One of those vendors is a little-known company called Microsoft. Its flagship monitoring solution is called System Center Operations Manager, or SCOM. I recently had the opportunity to design and architect a SCOM 2012 R2 solution, and I ran into many concepts and terms that were foreign to me. I decided that an article written from the perspective of a newcomer would be valuable to others just getting introduced to SCOM.
SCOM Applications/Services
We’ve got SCOM set up on a server with a default install. Where does a newbie go from there? The first concept you should learn isn’t how to set up monitors -- instead, you should take a step back and understand the concept of a model. If you don't, the terminology that SCOM uses will throw you for a loop. First, SCOM groups everything by application. You can’t (or shouldn’t) simply create a ping monitor and point it at one server, or create a disk space monitor and point it at another. These monitors must be defined for a particular application or service. The application or service could be a Microsoft service -- like Dynamic Host Configuration Protocol (DHCP), Domain Name System (DNS), Active Directory, and so on -- or it could be a line-of-business application. The idea is that your entire IT infrastructure is essentially a collection of applications or services you’re providing to customers.
SCOM Management Packs
Each application is represented by a management pack. By default, SCOM is bare bones, a skeleton. A fresh SCOM install is designed to be little more than a monitoring framework that holds management packs. Everything SCOM knows about an application is embedded in its management pack. This includes all the logic to discover the objects that make up the application, all monitors, all rules, and so on.
You can find a ready-made management pack for just about any major application. Obviously, Microsoft has management packs for all its applications, but many other vendors have released management packs for their products as well. One of the major benefits of using SCOM is that you no longer have to figure out for yourself what’s “best practice” when setting up a monitor, or what’s important and what’s not. The vendors have already defined all of this, which saves considerable time. It also means the admin doesn’t have to guess what an appropriate disk queue length is, or at what point low disk space should raise an alert, for example.
SCOM Models
SCOM uses the concept of a model. Each management pack is built around two conceptual models: a service model and a health model. You can think of a model as a computer representation of knowledge about an application. A model is a set of logical rules that define which components make up an application and what it takes for that application to be considered “healthy.”
A service model may represent:
- How a Windows service relates to the hardware on a server;
- How a CPU relates to the datacenter as a whole;
- How multiple servers in a web farm come together to form a particular web service.
Service models don’t define what a “healthy” application looks like, but rather what makes up the application.
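To make the idea concrete, here is a minimal Python sketch of a service model as a tree of components. It is only an illustration of the concept, not how SCOM stores or expresses its models, and every name in it (Component, "Intranet Portal", WEB01, WEB02) is hypothetical. Note that it says nothing about health; it only describes what the application is made of.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """One piece of an application, e.g. a server or a Windows service."""
    name: str
    kind: str
    children: List["Component"] = field(default_factory=list)

    def add(self, child: "Component") -> "Component":
        """Attach a child component and return it for further nesting."""
        self.children.append(child)
        return child

    def walk(self, depth: int = 0):
        """Yield (depth, component) for this node and everything under it."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# Hypothetical web service made up of two servers in a web farm.
web_service = Component("Intranet Portal", "web service")
for server_name in ("WEB01", "WEB02"):
    server = web_service.add(Component(server_name, "server"))
    server.add(Component("W3SVC", "Windows service"))

# Render the model as an indented outline, one line per component.
outline = [f"{'  ' * depth}{c.kind}: {c.name}" for depth, c in web_service.walk()]
print("\n".join(outline))
```

The outline lists the web service first, then each server with its Windows service nested beneath it, which is the kind of "what makes up the application" relationship a service model captures.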
Health models, on the other hand, are the next step in the model hierarchy. A health model represents what "healthy" looks like for the application as a whole as well as its individual components. “Healthy” simply means that the application is providing an expected level of service. A health model includes:
- The monitors that check a condition, such as whether a server responds to a ping;
- The rules that collect ongoing information about a particular application;
- The dependencies and relationships between monitors.
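A health model can be sketched the same way. The Python below is a simplified illustration, not SCOM's implementation: the thresholds are made-up numbers, the function names are hypothetical, and the "worst state wins" rollup is an assumption chosen to show how the health of individual monitors can combine into the health of a parent component.

```python
from enum import IntEnum
from typing import Iterable


class Health(IntEnum):
    """Health states ordered so that a larger value means a worse state."""
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


def disk_space_monitor(free_percent: float,
                       warn_below: float = 20.0,
                       crit_below: float = 10.0) -> Health:
    """Map a free-space measurement to a health state (thresholds are made up)."""
    if free_percent < crit_below:
        return Health.CRITICAL
    if free_percent < warn_below:
        return Health.WARNING
    return Health.HEALTHY


def ping_monitor(responded: bool) -> Health:
    """A server that does not answer a ping is treated as critical."""
    return Health.HEALTHY if responded else Health.CRITICAL


def roll_up(states: Iterable[Health]) -> Health:
    """Assumed rollup policy: the parent is only as healthy as its worst child."""
    return max(states, default=Health.HEALTHY)


# The server answers pings but has only 15% free disk space.
server_health = roll_up([ping_monitor(True), disk_space_monitor(15.0)])
print(server_health.name)
```

In this sketch the server is reachable but low on disk space, so the rollup reports the worse of the two states for the server as a whole.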
SCOM Classes and Objects
Everything in SCOM is an object, and an object can be anything in your environment. It could be a physical, spinning disk, a logical ‘C’ partition on a server, a virtual machine, or just that virtual machine’s operating system. Each management pack describes the kinds of objects it cares about and also contains discoveries. Discoveries are the mechanism by which the SCOM agent installed on each server finds those objects. For example, the ‘C’ logical disk on a server is an object. It is up to the discovery, running on each server, to determine things like how big the ‘C’ logical disk is, how much space is left, which physical disk it’s located on, and more.
Objects are instances of classes. If an object is the ‘C’ logical disk, then the class that object was instantiated from is simply logical disk. There can be multiple instances of the logical disk class on a single server. A server can have a ‘C,’ ‘D,’ and ‘E’ drive -- each representing a different partition. The logical disk class is what the ‘C’ disk was “born” from. Moving from a class to an object is a way of getting more specific with each entity.
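If you have done any programming, the relationship is the same one found in object-oriented languages: a class is the general type, and each object is one concrete instance of it. The Python sketch below illustrates this with a hypothetical LogicalDisk class and a stand-in discovery function. The server name, drive sizes, and function names are all invented for the example, and a real SCOM discovery is defined inside a management pack rather than written like this.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LogicalDisk:
    """The class: the general idea of a logical disk on a server."""
    server: str
    letter: str
    size_gb: float
    free_gb: float


def discover_logical_disks(server: str,
                           inventory: Dict[str, Tuple[float, float]]) -> List[LogicalDisk]:
    """Stand-in for a discovery: turn raw inventory into one object per disk."""
    return [
        LogicalDisk(server, letter, size_gb, free_gb)
        for letter, (size_gb, free_gb) in sorted(inventory.items())
    ]


# Made-up inventory for one server: drive letter -> (size in GB, free GB).
inventory = {"C": (120.0, 35.5), "D": (500.0, 410.0), "E": (250.0, 12.0)}
disks = discover_logical_disks("SRV01", inventory)

for disk in disks:
    print(f"{disk.server} {disk.letter}: {disk.free_gb} GB free of {disk.size_gb} GB")
```

Here there is one class, LogicalDisk, and three objects, one each for the ‘C,’ ‘D,’ and ‘E’ drives on the same server.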
These are just a few of the hundreds of concepts that come with a large enterprise product like SCOM. I highly recommend reading the Operations Manager Key Concepts article on TechNet or purchasing a book, such as System Center 2012 Operations Manager Unleashed. These resources, among many others, are a great help in getting accustomed to the terminology and concepts of SCOM.