
Using InfluxDB + Grafana to Display Network Statistics

I loathe MRTG graphs. They were cool in 2000, but now they’re showing their age. We have much better visualisation tools available, and we don’t need to be so aggressive with aggregating old data. I’ve been working with InfluxDB + Grafana recently. Much cooler, much more flexible. Here’s a walk-through on setting up InfluxDB + Grafana, collecting network throughput data, and displaying it.

Background – InfluxDB + Grafana

There are three parts to this:

  • Grafana: This is our main UI. Grafana is a “…graph and dashboard builder for visualizing time series metrics.” It makes it easy to create dashboards for displaying time-series data. It works with several different data sources such as Graphite, Elasticsearch, InfluxDB, and OpenTSDB.
  • InfluxDB: This is where we store the data that Grafana displays. InfluxDB is “…an open-source distributed time series database with no external dependencies.” It’s a relatively new project, and is not quite at 1.0 yet, but it shows a lot of promise. It can be used in place of Graphite. It is very flexible, and can store events as well as time series data.
  • Influxsnmp: We need to get data from the network into InfluxDB. There are a few options for doing this, but none of them are particularly good right now. Influxsnmp is a simple utility for collecting data via SNMP and storing it in InfluxDB.

You might ask why I’m using separate collection, persistent store and presentation layers, rather than using one system that does it all. The advantage of this model is that it is much more flexible – I can easily add collectors to get data from different sources, or I can use other data stores that I already have. If I later decide I don’t like Grafana, I can query InfluxDB from something else.

Installation

All steps are taken on an Ubuntu 14.04 system. Steps may vary for other systems. Note also that I’m using the nightly builds for InfluxDB + Grafana, so expect to see the outputs change slightly in future.

InfluxDB

Download, install and start InfluxDB:
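Roughly like this – note that the nightly .deb filename changes over time, so the URL here is a placeholder; check influxdb.com for the current build:

```
# Placeholder URL - grab the current nightly build from influxdb.com
wget https://s3.amazonaws.com/influxdb/influxdb_nightly_amd64.deb
sudo dpkg -i influxdb_nightly_amd64.deb
sudo service influxdb start
```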

Now use the InfluxDB CLI client to create two databases. We’re going to use one for SNMP data, and another for storing events. Note that we don’t need to declare schemas.
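In the CLI that looks like this – the database names match what we’ll configure in Grafana later:

```
$ influx
> CREATE DATABASE snmp
> CREATE DATABASE events
> SHOW DATABASES
```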

We now have InfluxDB running. You can connect to http://localhost:8083/ for a simple Web UI to InfluxDB, or query it with the CLI client.

Influxsnmp

Influxsnmp is written in Go. Today it is only distributed as source, so we need to build it. The build process includes some components that require Go >= v1.3. The default repositories for Ubuntu 14.04 have Go v1.2.1. So for this setup we’re going to download the latest Golang binary package, and use that.

First install git, then get Go 1.5.1, and set up our environment:
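For example – Go 1.5.1 was the current release at the time, and the workspace path is whatever you want $GOPATH to be:

```
sudo apt-get install -y git
wget https://storage.googleapis.com/golang/go1.5.1.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.5.1.linux-amd64.tar.gz
# Put Go on our path, and set up a workspace
export PATH=$PATH:/usr/local/go/bin
export GOPATH=$HOME/work
mkdir -p $GOPATH/bin
export PATH=$PATH:$GOPATH/bin
```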

Now download the source for influxsnmp and build it:
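go get fetches, builds and installs it in one step, dropping the binary into $GOPATH/bin:

```
go get github.com/paulstuart/influxsnmp
```

(If the build complains that code in github.com/influxdb/influxdb/client expects import github.com/influxdata/influxdb/client, the upstream client library has been renamed – see the comments below for the one-line import fix.)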

The packaging for influxsnmp isn’t quite right yet, so we have to manually copy over the oids.txt file. There are sample config.gcfg & ports.txt files that ship with influxsnmp. They’re easy to understand. Here’s what mine look like, for polling a few interfaces on my SRX. Note the location of ports.txt – it needs to be in the same directory as the binary file.
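Mine look roughly like this. Treat it as a sketch only – the section and key names below are illustrative, so work from the sample config.gcfg and ports.txt that ship with influxsnmp rather than copying this verbatim:

```
# config.gcfg (illustrative sketch - the real key names come from the shipped sample)
[snmp "srx"]
host      = 192.168.1.1
community = public
port      = 161

[influx]
host     = localhost
port     = 8086
database = snmp

# ports.txt - the interfaces to poll, as they'll appear in the "column" tag
ADSL
vlan0
```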

Finally we’ll start the poller:
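From the directory holding the binary and ports.txt, something like:

```
cd $GOPATH/bin
./influxsnmp &
```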

(Yes, it really should be properly installed as a service, with proper config file locations, logging, etc.)

There’s a simple HTTP interface listening on port 9501, with some basic stats on influxsnmp operations. We can also take a look at InfluxDB to check that we’re receiving data:
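For example:

```
$ influx
> USE snmp
> SHOW MEASUREMENTS
> SELECT * FROM "ifHCInOctets" LIMIT 5
```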

So far so good – we’re collecting data, storing it, and we can query it with InfluxDB. Now we need to create visualisations.

Grafana

First download, install & start Grafana. Note that I’m using a nightly build again.
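Along these lines – again, the nightly build filename changes, so the URL below is a placeholder; check grafana.org for the current one:

```
# Placeholder URL - grab the current nightly .deb from grafana.org
wget https://grafanarel.s3.amazonaws.com/builds/grafana_nightly_amd64.deb
sudo dpkg -i grafana_nightly_amd64.deb
sudo service grafana-server start
```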

Grafana is now listening on port 3000. Open a web browser, and login using the default admin/admin. Click on Data Sources on the left:

grafana_data_source

Click Add new and fill in these details:

  • Name: snmp
  • Default: selected
  • Type: InfluxDB 0.9.x
  • Http settings: – Url http://localhost:8086
  • InfluxDB Details: – set Database to ‘snmp’. A username & password aren’t used, but you need to put something in there to be able to save this. So use admin/admin

grafana_new_datasource

After you’ve clicked Add, you’ll be able to click Test Connection – try it, and make sure that it’s OK.

Repeat the above process for a new Data Source, with two differences – set the Database as “events” and leave “Default” unchecked. We’ll use this later for annotations.

Now we’re ready to go with creating dashboards.

Visualising the Data

Dashboards

Grafana lets you have multiple dashboards. Each dashboard is a set of panels, where a panel could be a graph, a single metric, or some text. You could have a dashboard with 10 separate graphs, or maybe just one huge number. We want to set up a simple dashboard that shows us a graph of the current traffic on my ADSL link.

Click on Dashboards on the left, then the drop-down arrow beside Home. Click New at the bottom:

new_dashboard

On the new blank dashboard, click the small bar at the top left. That will expand out. Choose Add Panel -> Graph

add_graph

Now you’ll get a graph with test data. Click on the title, where it says “no title (click here).” On the pop-up box, click Edit.

Click on the General tab, and set the title to “ADSL Throughput.” Click on Metrics. Grafana’s query editor works well for regular measurements, but it doesn’t currently work for derivative functions. We’ll use the editor to build up our query, and then switch to raw mode to finish off.

Next to FROM, click “select measurement” – this will change to a drop-down that should list the different measurements we’re collecting. We want “ifHCInOctets.” On the WHERE line, click +, and select “column” = “ADSL”. You’ll need to choose one of your own interfaces, as defined in ports.txt. On the ALIAS BY line, enter “Input.”

Click the “+ Query” button, and add a new query with the same settings as above, except select “ifHCOutOctets.”

Now we need to edit the queries in ‘raw’ mode. Your query screen should look like this:

query_editor

Click the menu button in the top left, and click “Switch editor mode.” We need to add the ‘derivative()‘ function, because we’re collecting a counter, not a rate. We also need to multiply the result by 8, to get bps. Your queries should look like this:
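Something like this, with ‘ADSL’ swapped for one of your own interface names from ports.txt:

```
SELECT 8 * derivative(mean("value"), 1s) AS "value" FROM "ifHCInOctets" WHERE "column" = 'ADSL' AND $timeFilter GROUP BY time($interval) fill(none)

SELECT 8 * derivative(mean("value"), 1s) AS "value" FROM "ifHCOutOctets" WHERE "column" = 'ADSL' AND $timeFilter GROUP BY time($interval) fill(none)
```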

One more thing to do – click on Axes & Grid and click “short” next to Unit. From the drop-down, go data rate -> bits/sec. If you like, you can also enable Legend values: Min, Max, Avg, Current.

Click Back to dashboard. Click on the Gear icon near the top centre of the screen, and choose Settings. Give the dashboard a title, and click the Save dashboard icon.

You should now have a screen that looks something like this:

adsl_throughput

Play around with the time picker in the top-left of the screen, and try selecting & zooming into a time window. It will get more interesting as you collect more data.

Templating

That graph is OK, but what about having a graph per interface? Or having a drop-down list of interfaces, so we can choose which ones get displayed? This is where the Grafana Template functionality is useful. Click on the Dashboard Gear icon, and click Templating. Click “+New”

Set Name: Interface, leave it at Type: query, Data Source: snmp. Set the Query to  SHOW TAG VALUES WITH KEY = "column" and enable Multi-value selection. Click Add. Close the Templating box.

Edit the original graph. Change each query so that it looks for the variable “$Interface” instead of ‘ADSL.’ On the General tab, change the title to “$Interface Throughput.” Set Repeat Panel to “Interface.” Save the Dashboard.

There will be a drop-down list of Interfaces at the top of the Dashboard. Choose a different interface, and watch the graph update. Select two interfaces at once, and see how you get two graphs:

repeating_graphs

Annotations

One of the nice things about InfluxDB + Grafana is that it is easy to add events to a graph. These are called annotations. They could be anything – e.g. a link state change, or maybe a production code deployment notification. Here I’ve stored my events in InfluxDB, but you could pull them from somewhere else – e.g. Elasticsearch. Ideal if you have your syslogs going into an ELK stack.

For this example I want to add annotations when I start & stop watching a TV show on Netflix. This should show up in my traffic stats.

Go to Dashboard settings -> Annotations. Click “+New” and call it “Netflix.” Make sure the Datasource is set to “events.” The query should be:  SELECT * FROM "alert" WHERE $timeFilter  and for the Column mappings, set Title to “show”, and Text to “text.” Close the Annotations box.

Go to the CLI, and manually stuff in some events:
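Something like this – the show name is just an example, and note that in line protocol the string field value needs double quotes while the tag value doesn’t. The measurement name ‘alert’ and the ‘show’/‘text’ keys match the column mappings set up above, and the timestamp defaults to now:

```
$ influx
> USE events
> INSERT alert,show=Narcos text="Started watching episode 1"
> INSERT alert,show=Narcos text="Finished watching episode 1"
```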

Now we get annotations on our graph. Hover over them, and the tool-tip displays some more info:

annotations_graph

One small niggle is that you can’t disable annotations on a per-graph basis, only on a per-dashboard basis. Still, it’s very handy.

Future improvements

Obviously there’s a lot that could be improved upon here:

  • Sort out the SNMP data capture – either package up influxsnmp properly, or (preferred answer): write a plugin for Telegraf.
  • Write Ansible + Vagrant config to package up the above
  • Even better: Do it with Docker, based upon Brent’s work.
  • Add more network data sources – e.g. system-level stats, or NOS data collected via API.

The point is that it’s a flexible framework, and it doesn’t need to just be network stats. It gets much more interesting once you start adding in other data sources.


42 Responses to Using InfluxDB + Grafana to Display Network Statistics

  1. codekoala November 2, 2015 at 7:14 pm #

    Thanks for this guide. I’ve been using InfluxDB, Grafana, and collectd for several months now. Network usage has been a particular pain point for me, and your ‘derivative(mean(“value”), 1s)’ is exactly what I’ve been missing all this time. Although for me, the graph was more satisfying by replacing “mean” with “last”.

  2. thedonfanning December 20, 2015 at 9:57 am #

    Excellent guide to bootstrap this. Now I’m pondering how I can graph traffic totals on a daily/weekly/monthly basis especially at home where I have to contend with data caps.

    • Lindsay Hill January 23, 2016 at 11:29 am #

      I wonder if it would be possible to do daily totals using difference(). Last time I looked that wasn’t wired up though.

  3. kmg January 10, 2016 at 6:31 am #

    Getting Below error if I’m adding templating

    {
    “error”: “error parsing query: found y, expected ) at line 1, char 145”,
    “message”: “error parsing query: found y, expected ) at line 1, char 145”
    }

    Query:
    SELECT 8 * derivative(mean(“value”),1s) AS “value” FROM “ifHCInOctets” WHERE “column” = ‘INT24PortLinkSys’ AND time > now() – 6h GROUP BY time(1y) fill(none) SELECT 8 * derivative(mean(“value”),1s) AS “value” FROM “ifHCOutOctets” WHERE “column” = ‘INT24PortLinkSys’ AND time > now() – 6h GROUP BY time(1y) fill(none)

  4. Peter January 18, 2016 at 7:00 am #

    Great article – InfluxDB and Grafana are useful tools that the network monitoring community should be aware of. For readers whose network gear supports sFlow, you can eliminate SNMP by streaming sFlow based telemetry (which includes the interface counters described in the article) to InfluxDB:
    http://blog.sflow.com/2014/12/influxdb-and-grafana.html

  5. Cullen Jennings January 22, 2016 at 6:19 pm #

    Great article. This all worked for me – thank you kindly.

    • Lindsay Hill January 23, 2016 at 11:31 am #

      Thanks, glad it worked for you. I’m thinking about doing an updated version, using telegraf with the new SNMP plugin for data collection.

      • Oleksii January 29, 2016 at 10:52 am #

        ThanX a lot, very good article. And i have one question. Why you don’t use collectd(with snmp module)? As i know collectd can write directly into influxDB.

        • Lindsay Hill January 29, 2016 at 12:11 pm #

          I can’t remember the exact reason I discounted collectd. I’m sure it would work, but someone put me off it for some reason.

          I’m looking at using Telegraf in future, now it has an SNMP plugin added.

  6. tux February 2, 2016 at 10:21 pm #

    Why 8* DERIVATIVE since the SNMP unit is in octet ?

    • Lindsay Hill February 2, 2016 at 10:39 pm #

      It’s because it is in octets that I multiply by 8. It gives me a speed in bytes/sec, but I want to know bits per second.

  7. zomper February 18, 2016 at 5:50 am #

    Hello , sorry for my English , I executed the line “go get github.com/paulstuart/influxsnmp” but with the following error:

    work/src/github.com/paulstuart/influxsnmp/influx.go:10:2: code in directory /root/work/src/github.com/influxdb/influxdb/client expects import “github.com/influxdata/influxdb/client”

    Can anybody help me?

    Thanks

    • erikespo February 24, 2016 at 5:22 am #

      Same Error for myself
      $ go get github.com/paulstuart/influxsnmp
      package github.com/influxdb/influxdb/client: code in directory /home/erik/work/src/github.com/influxdb/influxdb/client expects import “github.com/influxdata/influxdb/client”
      $ go install github.com/paulstuart/influxsnmp
      work/src/github.com/paulstuart/influxsnmp/influx.go:10:2: code in directory /home/erik/work/src/github.com/influxdb/influxdb/client expects import “github.com/influxdata/influxdb/client”

    • erikespo February 24, 2016 at 6:49 am #

      To fix this errors change line /work/src/github.com/paulstuart/influxsnmp/influx.go – “github.com/influxdb/influxdb/client” to “github.com/influxdata/influxdb/client”

      Same with /work/src/github.com/paulstuart/influxsnmp/main.go

      zstyblik recommended this but it hasn’t been implemented in the src yet

      • tofu00 February 24, 2016 at 6:52 am #

        Or much easy : use telegraf with snmp input.

        • Lindsay Hill February 24, 2016 at 7:06 am #

          My plan is to re-do this with Telegraf. Telegraf didn’t support SNMP when I was looking at this last year.

          • tofu00 February 24, 2016 at 8:04 am #

            If you want, I can help you.

          • Lindsay Hill February 24, 2016 at 8:21 am #

            That would be helpful, if you could write up the steps. I’ve been keeping an eye on Telegraf, but I haven’t had time to work with it. I’m sure it’s not too hard, it just needs some time spent on it.

          • tofu00 February 24, 2016 at 9:18 am #

            1. Install Influxdata repo (https://docs.influxdata.com/influxdb/v0.10/introduction/installation/ and check Installation part)
            2. apt-get install telegraf -y
            3. vim /etc/telegraf/telegraf.conf :
            [[inputs.snmp]]

            # SNMP request interval
            interval = “10s”
            # Use ‘oids.txt’ file to translate oids to names
            # To generate ‘oids.txt’ you need to run:
            # snmptranslate -m all -Tz -On | sed -e ‘s/”//g’ > /etc/telegraf/oids.txt
            # Or if you have an other MIB folder with custom MIBs
            # snmptranslate -M /mycustommibfolder -Tz -On -m all | sed -e ‘s/”//g’ > /etc/telegraf/oids.txt
            snmptranslate_file = “/etc/telegraf/oids.txt”

            # Declare your host
            [[inputs.snmp.host]]
            address = “192.168.138.1:161”
            community = “public”
            version = 2
            timeout = 2.0
            retries = 0
            collect = [“if_out_octets”, “if_in_octets”]

            # Interface output (octets)
            [[inputs.snmp.bulk]]
            name = “if_out_octets”
            max_repetition = 127
            oid = “.1.3.6.1.2.1.31.1.1.1.10”

            # Interface input (octets)
            [[inputs.snmp.bulk]]
            name = “if_in_octets”
            max_repetition = 127
            oid = “.1.3.6.1.2.1.31.1.1.1.6”
            4. systemctl start telegraf
            5. Grafana InfluxDB request : SELECT NON_NEGATIVE_DERIVATIVE(MEAN(“ifHCInOctets”)) FROM “ifHCInOctets” WHERE “host” = ‘192.168.138.1’ AND “instance” = ‘9’ AND $timeFilter GROUP BY time($interval) fill(null)
            6. Group by time interval : >10s

            Here the instance is the port number.

            To check snmp-input default config : telegraf -sample-config -input-filter snmp

          • Lindsay Hill February 24, 2016 at 9:31 am #

            Thanks!

      • Lindsay Hill February 24, 2016 at 7:07 am #

        Thanks for that. I haven’t had a chance to look at this yet, but figured it was something changing in the upstream code.

        • Sander Boele March 8, 2016 at 11:26 pm #

          replying to tofu00

          please adjust your query, it was driving me crazy

          SELECT non_negative_derivative(mean(“ifHCInOctets”), 1s) *8 FROM “1.3.6.1.2.1.31.1.1.1.10.557” WHERE $timeFilter GROUP BY time($interval) fill(null)

          • tofu00 March 9, 2016 at 7:24 am #

            yep, sorry, i’m using octets not bits..

          • Sander Boele March 10, 2016 at 9:19 am #

            Network admins only deal in bits. Link speeds are all declared in bits/s, so for me it would make no sense to display speeds in bytes/10s or some other metric.

            Also there is no declaration of the period over which you want to SELECT NON_NEGATIVE_DERIVATIVE(MEAN(“ifHCInOctets”)) or is this done through grouping?

  8. tux March 18, 2016 at 12:21 am #

    The NON_NEGATIVE_DERIVATIVE function use the default value of 1s and the GROUP BY time($interval) is a setting of Grafana.

  9. Tales Santos March 24, 2016 at 3:17 pm #

    I dont know…. Whats the difference between Zabbix history and this stuff? I dont see so advantages

  10. alfredocambera April 5, 2016 at 7:52 am #

    Templating is the key feature of Grafana. Being able to create panels dynamically gives great power to the users. I took me a while to understand how to use it but now I’m creating reports a lot of services that didn’t have any visibility.

    • Jason Hicks April 19, 2016 at 3:48 am #

      Agreed, templates are fantastic, but I’m still working on how to make them work best for my data…

      For diverse data sets, (i.e. linux host metrics vs network switch metrics), is it best to store each set in different InfluxDB databases? That way the template would only apply to, for example, the hosts that collect such metrics. Or is there some better way to filter based on other tags? (note: I’m not suggesting a different database per device, just per device type – linux hosts, windows hosts, routers, switches, access points, etc)

      For those collecting switch data via SNMP, is there an easy way to tell Grafana to plot all interfaces (that are active)? An example dashboard would automatically display appropriate graphs, whether an 8-port, 24-port, or 48-port switch…

  11. ganbold April 20, 2016 at 9:10 pm #

    current telegraf is buggy, graphs will become flat with high spikes, better use influxsnmp

    • Lindsay Hill April 20, 2016 at 9:15 pm #

      Ah, that’s a shame. I was just thinking about doing some Telegraf testing. Maybe I’ll give it a week or two

    • Andreas Schultz April 21, 2016 at 9:15 pm #

      Do you have a Telegraf bug report for that problem for tracking?

  12. Steve May 31, 2016 at 3:41 pm #

    Hi Lindsay,
    Im using InfluxDB for storage and Grafana for visualize, We have a few different Interfaces but they are on the same location. I can get statistic for each interface individually but how can i get the statistic for those which are on the same location.
    I use InfluxDBClient Python to insert performance data to InfluxDB
    For ex: These 2 interfaces are on the same location
    Interface1: Inbound/Outbound – 1555.52 /1862.34
    interface2: Inbound/Outbound – 100002 /180000
    What i wanna have is the graph(same style as individual interface) shows:
    Inbound = Inbound of Int1 + Inbound of Int2 = (1555.52 + 100002)
    Outbound = Outbound of Int1 + Outbound of Int2 = (1862.34+180000)
    and many numbers rest….

    • Lindsay Hill May 31, 2016 at 4:06 pm #

      I think you could either do something with continuous queries, or you could configure the display to stack specific series, e.g. those matching a specific regex.

      • Steve June 1, 2016 at 7:41 pm #

        Hi,
        How to configure the display to stack specific series? For ex: I have 4 lines(2 series) in one graph, 2 for inbound of int1+2 and 2 for outbound of int1+2. Thanks

        • Lindsay Hill June 1, 2016 at 8:51 pm #

          Edit the Graph panel. On the Display tab, there’s an option to Stack series. You can also choose series-specific over-rides. Use that to group series.

          • Steve June 1, 2016 at 9:31 pm #

            Thanks i got it, but it seems to aggregate value to one of series i stacked. Can i make it to a new line separately?

          • Lindsay Hill June 1, 2016 at 9:33 pm #

            You can group different series. On my Grafana 3.x system there’s options for grouping as A, B, C, D

  13. phalek June 15, 2016 at 7:05 pm #

    Hi,

    If you want a nice interface for collecting the SNMP ( or WMI, web or other data ) you can also use Cacti and the CereusTransporter plugin. The plugin will send all the data gathered by Cacti to an influxdb where you can process it with Grafana as described here.

    https://www.urban-software.com/products/nmid-plugins/cereustransporter/

  14. Robert December 20, 2016 at 2:45 pm #

    Hey I used your post as inspiration, but I FINALLY got the SNMP plugin built into Telegraf to work. Took me all day toying around with it. Thanks for taking the time to share.

    I was curious how you came to your derivative/mean of the HCInOctets chart? Is there some formula for that? Is it based on the polling interval? I noticed if I changed the derivative value I would get wildly different numbers, but I was using a different polling interval.

    • Lindsay Hill December 21, 2016 at 7:56 am #

      Ah, good to hear it can get working with Telegraf. I’ve been meaning to do it for ages but keep finding higher priority things to do…

      The whole derivative/mean thing was a bit complicated as I recall. I’m a bit rusty, but here goes: The first bit of it “derivative(mean(“value”),1s)” says “give me the rate of change per unit of time”, where the unit in this case is 1s. So it gives us the bits per second.

      This should not require anything specific for polling interval, since influxdb should be able to figure out the differential between values, since it knows the values & the times they were stored at. I think the more important bit is the “GROUP BY time ($interval)” clause. I think you can run into problems there, depending upon the relationship between your collection interval, and your display time interval.