NMS Primer 6: Ongoing Feeding

This article is Part 6 in a 7-Part Series.

We’ve covered what an NMS does, selecting an NMS, and some tips on implementing an NMS. But how do we keep an NMS up-to-date and relevant?

Signs of a well-maintained NMS:

  • Most Operations Engineers have the console up at their desktop. It’s just part of the routine to look at it each morning.
  • When people report network issues, the first place the network engineers look is the NMS. It’s frequently used to prove that no, it’s not the network’s fault.
  • You check the NMS both before and after planned work, to check that everything is behaving as expected.
  • There’s a handful of alarms, and they’ve all been acknowledged, with a note added.
  • There’s more than one person that knows how to drive it

Signs of an NMS that’s losing the battle:

  • No-one uses the system, except perhaps on a big screen that’s only there because the boss wanted something to look at. When there is a fault, people start logging into routers first.
  • There’s a long list of alarms, including “Node Down” alarms for sites that have been decommissioned.
  • The person who installed it has left, and no-one else really knows how to manage it.

If your NMS regularly displays a long list of alarms like this, something’s going wrong:

Too Many Alarms

Too Many Alarms

No More Shelfware!

How can we ensure that our NMS is like the first one, and doesn’t become shelfware? Several things have to happen - some of them are straightforward technical maintenance, but the more important issues are cultural: NMS management needs to become core to your network management. It’s not an adjunct, a side thing done by the junior - it needs to be something that everyone does. Any network changes should include updates to the NMS.

What You Must Do

  • Assign responsibility: You may choose to use external resources for some elements of maintenance, but you need an internal person who is responsible.
  • Keep the NMS platform itself up to date. You must stay on top of your patching. The NMS can’t magically know about all new devices that get released - you need to keep it up to date. Plan on patching the application at least every 6 months. Don’t let it get more than one patch cycle behind.
  • Auto-discovery helps with maintaining your inventory, but you should still verify that all new devices show up in your system. Most networks don’t have a high rate of device change - it’s not that hard to check that new devices show up, and old ones are removed. Auto-discovery only works if your templates are up to date, and allow SNMP.
  • Look at it every day, and take action on all alarms. Never simply acknowledge an alarm without knowing why it occurred. Change the threshold, or suppress the alert - preferably at the source, or within the NMS if you must. If you’re repeatedly acknowledging the same alarm, you’re doing it wrong - and heading towards insanity.

Next up, we’ll take a look at how to take your NMS beyond the basics, giving you ideas on extending it to become a truly valuable platform.