Lindsay Hill

Advertising Bogons (Or Was I?)

2025-11-08T23:00:00+00:00

Been a while since I did a “War Stories” post - here’s one about a routing policy I screwed up recently. Gave me a fright that I’d really messed something up, but in the end it was no big deal, and it taught me something about who uses route collector info.

Uh-oh…we’re announcing bogons?

While looking at bgp.he.net/AS32590 for something unrelated, I saw this:

Investigating more, it tells me this:

What the hell is going on? We should never be announcing bogon ranges to any peer. I rushed off to check some of our peering sessions, e.g.:

lindsayh@rtr> show route advertising-protocol bgp 86.104.125.69

inet.0: 1009955 destinations, 8974886 routes (1008431 active, 2 holddown, 2770 hidden)
  Prefix		  Nexthop	       MED     Lclpref    AS path
* 155.133.226.0/24        Self                                    I
* 155.133.229.0/24        Self                                    I
* 155.133.250.0/24        Self                                    I
* 162.254.197.0/24        Self                                    I

We’re just advertising the normal set of prefixes I expect at that site. Definitely not advertising anything unusual to HE. So why do they think we’re advertising bogons?

Hmmm…Cloudflare Radar also says we’re announcing junk. Must be a clue there.

A NOC that responds

I reached out to HE, and their NOC is terrific - they responded very quickly on a Friday evening, even though we’re not a paying customer. Kudos to them.

They pointed out that their Super Looking Glass gives you an indication of where they’re learning those prefixes from. For example, if you look at 155.133.226.0/24, you’ll see “Learned from: route-views.sg”

This was the bit I’d missed - bgp.he.net is not just using information from what they see advertised to them. They are also looking at data received by route collector services, such as RouteViews. I had recently added sessions with RouteViews. They wanted to see our full tables, to get another view of what a well-peered edge network sees. I had somewhat carelessly advertised them our full tables…which includes some prefixes that we use internally. We of course strip those out for advertisements to regular peers, but I had reused a policy that should only be used for our internal sessions.
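A pre-flight check would have caught this kind of mistake. Below is a minimal sketch, assuming you can export the list of prefixes a policy would advertise as plain text. The `BOGON_RANGES` list and the `find_bogons` helper are illustrative names, not part of any vendor tooling. The list is IPv4-only and not exhaustive.

```python
import ipaddress

# Illustrative, non-exhaustive IPv4 ranges that should not be announced publicly.
BOGON_RANGES = [ipaddress.ip_network(n) for n in (
    "0.0.0.0/8", "10.0.0.0/8", "100.64.0.0/10", "127.0.0.0/8",
    "169.254.0.0/16", "172.16.0.0/12", "192.0.2.0/24", "192.168.0.0/16",
    "198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24",
    "224.0.0.0/4", "240.0.0.0/4",
)]


def find_bogons(prefixes):
    """Return the IPv4 prefixes that sit inside any range in BOGON_RANGES."""
    flagged = []
    for text in prefixes:
        net = ipaddress.ip_network(text)
        if net.version != 4:
            continue  # this sketch only knows about IPv4 bogons
        if any(net.subnet_of(bogon) for bogon in BOGON_RANGES):
            flagged.append(text)
    return flagged


if __name__ == "__main__":
    # First prefix is from the router output above; the others are hypothetical internal ranges.
    candidate = ["155.133.226.0/24", "10.20.0.0/16", "192.168.1.0/24"]
    print(find_bogons(candidate))
```

Run it against the prefix list for any new session, route collectors included, before turning the session up.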

No Real Harm

No real harm done, since we didn’t advertise those prefixes to any “real” peers. Plus of course everyone else is stripping bogons on received prefixes, right? Right?

But it was interesting for me to learn more about how sites like bgp.he.net and radar.cloudflare.com get their data.

This article is Part 13 in a 13-Part Series.

Part 1 - War Stories: Loops that Permanently Broke the Network
Part 2 - War Stories: Switches Lying about Duplex Mismatches
Part 3 - War Stories: Check Point Meltdown
Part 4 - War Stories: Dual-Vendor Firewall Strategy
Part 5 - War Stories: Proxy ARP Auto-Configuration
Part 6 - War Stories: Gratuitous ARP and VRRP
Part 7 - War Stories: Cursed VLANs
Part 8 - War Stories: Unix Security
Part 9 - War Stories: ITIL Process vs Practice
Part 10 - War Stories: Closing out Projects
Part 11 - War Stories: Backup NICs, DNS and AD
Part 12 - War Stories: Always Check Your Inputs
Part 13 - This Article

Juniper Release Process 2024 Redux

2024-10-28T03:00:00+00:00

I’ve written before about choosing a Juniper version. Juniper has a new release process. Well, two actually - the new official process, and what they’re actually doing…

First the good bits. Juniper started a new release process in 2023. Key points:

Numbering format remains the same - release, “R” plus a maintenance number, then “-S” plus a service release number, e.g. “23.4R2-S2”
New feature releases are only twice a year, in June & December - “YY.2” and “YY.4”. Not quarterly.
No more “R3” maintenance releases - just the initial R1 release, then a later R2 release.
Service Releases “-Sx” continue.
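The numbering is regular enough to parse in scripts. This is a rough sketch, assuming version strings of the form seen in this post, such as 23.4R2-S2 or 24.2R1. `JunosVersion` and `parse_version` are hypothetical names for illustration. Other formats that exist in the wild, e.g. older special releases, are deliberately rejected.

```python
import re
from typing import NamedTuple, Optional


class JunosVersion(NamedTuple):
    year: int                # two-digit year, e.g. 23
    release: int             # feature release within the year, e.g. 4
    maintenance: int         # the number after "R"
    service: Optional[int]   # the number after "-S", or None if absent


_PATTERN = re.compile(r"^(\d{2})\.(\d)R(\d+)(?:-S(\d+))?$")


def parse_version(text):
    """Parse strings such as '23.4R2-S2'; raise ValueError on anything else."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    year, release, maintenance, service = match.groups()
    return JunosVersion(
        int(year),
        int(release),
        int(maintenance),
        int(service) if service is not None else None,
    )


if __name__ == "__main__":
    print(parse_version("23.4R2-S2"))
    print(parse_version("24.2R1"))
```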

I like the new process. It simplifies the versions they have to maintain. We used to say that you should wait for the R3 release, but really there’s no difference between R3 and R2-S3. Now Juniper doesn’t have to maintain the quarterly releases, and all the maintenance and service releases below them. It avoids the confusion that happened when they kept patching -R2, even after releasing R3.

But here’s the thing with a simplified release process: you’ve got no excuses for not delivering. I have no issue with 6-monthly feature releases. But it feels like they’re doing annual releases these days.

Look at the current download page for the QFX5120-48Y, a very common ToR switch:

Hmmm, what’s that warning icon telling us? “Use for Lab Qualification only”:

OK, how about 24.2R1-S1, that update should have fixed it? Nope.

It is similar for other platforms - the 24.2R1 and 24.2R1-S1 releases are marked “lab only.” One platform where it is not marked lab only is the ACX7100-48L. But in that case I tried using 24.2R1 in production, hit multiple issues, and then JTAC slapped my hand for daring to use that version. That platform does not have any 24.2R1-S1 release published, lab only or not.

The most recent release for QFX5120-48Y is 23.4R2-S2. That’s also the “suggested” release, so it should be good, right?

Well no, not really. Amongst other bone-headed problems, it logs thousands of messages an hour like this about Unexpected TLVs. All Juniper customers know the standard TAC response “just filter the log messages” - the problem is the rate is so high that you still get a lot of syslog messages like EVENTD_KERNEL_LOGS_DROPPED [junos@2636.1.1.1.4.82.30 count="2"] Due to excessive logging, (2) kernel events dropped by eventd

I do not understand how Juniper QA continues to ship code that has huge increases in syslog messages, often at INFO and higher. These are not debug messages. This pattern happens again and again. You’d think “1000x increase in syslog volume” would be cause to investigate, but no.

The most obvious reason for this is the looming HPE acquisition. I know from firsthand experience how tough it is. The official word is always that everything is sunshine and rainbows, but it’s never that easy on the ground. The top execs get big payouts to stay, or payouts to leave. But the engineers on the ground have a big fat pile of uncertainty. Many will find a new job, to get greater certainty. Those that stay are distracted, waiting to hear if they still have a job.

I hope it gets better soon.

Why Single-Port LAGs?

2023-09-24T22:00:00+00:00

I recommend always using LACP for external connections. It will make your life easier, even when you only have a single connection. Here’s why we do it.

If you set up a PNI with AS32590, we will strongly recommend the use of LACP, even for a single link. If you have two PNIs with us, they will each be separate single-member LAGs, because they will be on different routers on our side.

It’s only once you have more than 2 links that we start using LACP in the way most people think of it.

It’s not just us. In Google’s Peering Policy, under “Private peering physical connection requirements”, it states

Link aggregation via LACP is required for all links, including single links

Ever wondered why that is? What’s the point in setting up a LAG if I only have one link? What does it give me? More lines of config for no operational enhancement? And I thought we should use L3 everywhere anyway?

I can’t speak for Google, only for the way we operate our network. But I’m pretty sure their reasons are similar to ours. The obvious reason is for future growth, but there are operational benefits too.

Easy Expansion

Traffic volumes only go one way: up and to the right. If I have an existing 2 x 1 x 100G PNIs to you, and we need more capacity, it’s easy - add another port to each LAG. No new IPs needed, no BGP changes.

I can order the cross-connect and patching, and preconfigure my port to be part of the LAG. Then when you patch your end, the new link starts working straight away, no further changes needed.

LAGs can also help when going from n x 10G to n x 100G. link-speed mixed seems like the Devil’s Work the first time you see it, but it is very useful. Set that option on a 2 x 10G LAG today. When you need to upgrade to 100G, you can add a 100G link to the bundle. It will start balancing traffic, and then you can remove the 10G links, no BGP flaps or changes.

Operational Benefits

The growth part is obvious. But there are other operational benefits. Consider a real example I am dealing with today. I have a flapping 100G port on a Juniper PTX. The issue only starts with 21.4R3, and only when the remote end is a CFP2 optic, but not all CFP2 optics.

ATAC wants me to move the connection from port 0/0/4 (a QSFP28 port) to 0/0/3 (a QSFP56-DD port). Yes, we are clutching at straws, but this one has been hard to reproduce in the lab, so we’re eliminating one more thing.

The router is at a remote site. I need to log a ticket to get remote hands to move the cross-connect. How can I do it with the shortest outage? I’d like to copy the IP address from port 0/0/4 to 0/0/3. That way when the patch cable gets moved, everything comes up:

lindsayh@ptx# show interfaces et-0/0/4
description "Transit: Potatotel AS64497 [100Gbit]"
unit 0 {
    family inet {
        address 198.51.100.122/31;
    }
    family inet6 {
        address  2001:db8:aa00:15::1/127;
    }
}

{master}[edit]
lindsayh@ptx# copy interfaces et-0/0/4 to et-0/0/3

{master}[edit]
lindsayh@ptx# show | compare
[edit interfaces]
+   et-0/0/3 {
+       unit 0 {
+           family inet {
+               address 198.51.100.122/31;
+           }
+           family inet6 {
+               address  2001:db8:aa00:15::1/127;
+           }
+       }
+   }

But I can’t do that:

{master}[edit]
lindsayh@ptx# commit check
re0:
[edit interfaces et-0/0/4 unit 0 family inet]
  'address 198.51.100.122/31'
     identical local address found on rt_inst [default], intfs [et-0/0/3.0 and et-0/0/4.0], family [inet].
error: configuration check-out failed

{master}[edit]
lindsayh@ptx#

I need to schedule a time to co-ordinate the move with Equinix, or accept it will be down between when they move it and I’m next online.

If we had used a single-member LAG, it would be easy. Just run set interfaces et-0/0/3 ether-options 802.3ad ae4. When they move the cable, everything will work. Later I can clean up the LAG config from the old port.

For your own internal links between devices, you might choose to make them all independent L3 links, and that’s OK. That may be the best choice. But any connection to a third party, e.g. PNIs, IXPs, Transit links, you should default to always using LACP, even if you only have one link. A couple of extra lines of config today will save you time later.

Enforcing First AS in BGP

2023-04-17T01:00:00+00:00

The BGP RFCs state that external BGP peers should insert their own AS into the AS PATH advertised to eBGP peers. Some peers strip their AS, generally for commercial gain. Juniper and Cisco have opposite default behaviors for handling this. Make sure you set bgp enforce-first-as on Juniper routers. Caveats apply.

Background: Traffic Anomalies

A few years ago I was looking at some traffic reporting anomalies. My IPFIX data said that traffic with next-hop AS was around 3Gb. But my SNMP data showed that a PNI to that peer was doing 8-10Gb.

I first doubted my router, because I had issues with IPFIX in the past on that specific platform. I also wondered about sampling rates. I have high flow rates, and need to set the sampling to be more coarse. But it was a big anomaly.

Slicing & dicing the data different ways, and chatting to colleagues about it, we saw what was going on. IPFIX showed the right volumes when reporting on destination interface. But some prefixes received from the peer did not contain the peer’s AS. We still accepted them.

Huh? Isn’t it normal behavior, to insert your own AS into any prefixes you advertise to external peers? It is a key part of BGP loop prevention. Why did my router accept those prefixes? What gives?

Always Check the RFC

When in doubt, always start with the RFC. They are very readable, and this is exactly the sort of behavior they should define.

RFC 4271 section 5.1.2 states that

When a BGP speaker originates a route then:

a) the originating speaker includes its own AS number in a path segment, of type AS_SEQUENCE, in the AS_PATH attribute of all UPDATE messages sent to an external peer.

Note that there is no “SHOULD”, “MAY” or “OPTIONAL” about it.

Legitimate Exception: Route Servers

Route Servers are a specific, legitimate exception to the above. RFC 7947 Section 2.2.2.1 states:

As a route server does not participate in the process of forwarding data between client routers, and because modification of the AS_PATH attribute could affect the route server client BGP Decision Process, the route server SHOULD NOT prepend its own AS number to the AS_PATH segment nor modify the AS_PATH segment in any other way.

Almost all IXPs operate this way today. I peer with a handful that don’t, and they annoy me. HKIX is one. PIT Chile changed default behavior this year.

Why would you strip your AS?

Route servers have a legitimate reason to not insert their AS. Why else would a network do it?

There are other use-cases where you need to manipulate the advertised path, e.g. AS migration. See Daniel’s blog for examples.

What about less than legitimate use-cases?

Imagine that I operate a CDN with extensive peering and transit connections.

And let’s say that you operate an eyeballs ISP, with two upstream providers. Your upstream providers charge you on a traffic volume basis. They in turn have transit agreements with other operators, and peer at IXPs. They might have bi-lateral peering at IXPs or PNIs with me.

All else being equal, if I have identical relationships with those networks, I will split traffic to you across them.

Disclaimer: BGP is a suggestion framework, not a prescriptive routing protocol like OSPF. I can and do route traffic according to my business needs. Your routing suggestions are just that: suggestions.

Now what if one of your transit networks is a bit shady, and wants to maximize traffic going via their network? They have two levers to pull: Either announce more specifics for your prefixes, or strip their own AS. I ignore any other BGP attributes. Announcing more specifics has other issues, and may not be possible. But they can strip their AS, and hope that I don’t notice.

Now I have two paths, with everything equal except the AS path length. Default BGP best path selection will make me send traffic via that provider.
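To see why stripping works, here is a toy model of the AS path length step. It is only a sketch: real best path selection has more steps, and this assumes everything before and after path length is equal. The AS numbers come from the documentation range and `shortest_paths` is a made-up helper.

```python
def shortest_paths(paths):
    """Given {name: [AS, AS, ...]}, return the names tied for shortest AS path."""
    shortest = min(len(as_path) for as_path in paths.values())
    return sorted(name for name, as_path in paths.items() if len(as_path) == shortest)


if __name__ == "__main__":
    EYEBALL = 64500     # hypothetical eyeball ISP
    TRANSIT_A = 64496   # hypothetical honest transit
    TRANSIT_B = 64497   # hypothetical shady transit

    honest = {"via_A": [TRANSIT_A, EYEBALL], "via_B": [TRANSIT_B, EYEBALL]}
    stripped = {"via_A": [TRANSIT_A, EYEBALL], "via_B": [EYEBALL]}

    print(shortest_paths(honest))    # both paths tie, so traffic is split
    print(shortest_paths(stripped))  # only via_B remains
```

With honest announcements both paths tie. Once transit B strips its own AS, its path is one hop shorter and it wins outright.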

Or what if I peer with both you and your transit provider at an IX? I see two paths, with the same AS path length. I will split my traffic between direct to you, and via that transit. That is not cool. If you’re at the IX, I should send everything direct to you.

AFAIK, this only works with bi-lateral sessions. All route servers drop announcements where the first AS in the path is not the advertising peer’s AS.

Juniper vs Cisco behaviour

Juniper and Cisco take different approaches to this. By default, Cisco will only accept prefixes where the first AS matches the eBGP peer AS. You can disable this using the no bgp enforce-first-as command.

Juniper allows peers to strip their AS by default. You must explicitly set enforce-first-as.

For a typical IXP scenario, if you have a Cisco router, you need to configure “no bgp enforce-first-as” for route server sessions. With a Juniper router you must set “enforce-first-as” for all sessions except the route server sessions. There is no equivalent to “no enforce-first-as” on Junos.

Effects of enforcing first-AS

When enforced, both Cisco and Juniper will discard any prefixes received where the first AS in the path is not the peer’s AS. They maintain the BGP session, and they will accept any other valid prefixes received.
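The check itself is simple. This is a minimal Python sketch of the logic, not either vendor's implementation: `accept_route` and `filter_updates` are hypothetical names, and the prefixes and AS numbers are from documentation ranges.

```python
def accept_route(peer_as, as_path, enforce_first_as=True):
    """Accept the route unless enforcement is on and the first AS is not the peer's."""
    if not enforce_first_as:
        return True  # e.g. a route server session, where the first AS is a client's
    return bool(as_path) and as_path[0] == peer_as


def filter_updates(peer_as, updates, enforce_first_as=True):
    """Drop only the offending prefixes; the other prefixes (and the session) survive."""
    return [
        (prefix, as_path)
        for prefix, as_path in updates
        if accept_route(peer_as, as_path, enforce_first_as)
    ]


if __name__ == "__main__":
    PEER = 64497
    updates = [
        ("198.51.100.0/24", [64497, 64500]),  # normal: peer AS is first
        ("203.0.113.0/24", [64500]),          # peer stripped its own AS
    ]
    print(filter_updates(PEER, updates))
    print(filter_updates(PEER, updates, enforce_first_as=False))
```

Passing `enforce_first_as=False` models the route server exception, where the first AS legitimately belongs to another client.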

Summary: If you peer at IXPs, and use Cisco gear, you’re OK. If you’re using Juniper, check that your config is enforcing first AS for all sessions except those from route servers.

Juniper Version Selection

2023-04-06T03:00:00+00:00

Picking the right Junos version is important. If you’re not familiar with Juniper, finding and downloading the right software package is confusing. Here’s some guidance on picking the right version.

TLDR: Check the Suggested Releases page, find the latest service release in the suggested version for your platform. Almost never use the very latest version. Never use the version the box shipped with. Pay attention on the Downloads page, there are traps.

It’s useful to understand Junos version numbering, and the upgrade policy. Then check the Suggested Releases page to see what they recommend, check if that makes sense, and figure out how to get from here to there.

Understanding Version Numbering

These days Juniper publishes a new release train every quarter. Versioning is simple: the two-digit year, a dot, the quarter, then “R” and a release number. So 21.4R1 is released in the 4th quarter of 2021. New releases add new features and support new hardware. Configs may break.

They then publish “service releases” on top of that, for example 21.4R1-S1 and 21.4R1-S2. These are supposed to only be bugfixes, but complacency breeds contempt. So sometimes they throw in breaking changes that may render your existing config non-bootable, because why the hell not? Be grateful if they’re documented, like the payload-protocol vs next-header change.

A few months later they publish an “R2” release that rolls up bugfixes, and may have some small changes in behavior. No more service releases to the R1 train after that. A few more service releases, then they introduce an R3 release. Again rolls up bugfixes, perhaps with small behavior changes. They might add some more service releases on R2 after R3 comes out. I wish they wouldn’t.

The R3 release will see service releases over the course of its lifetime, becoming further and further apart. No R4. Many engineers look for R3 releases before considering upgrades.

Pre 2017 versions, e.g. 12.x and 15.x had their own thing going on with release numbering, but you can ignore that if you’re working on supported hardware.

Extended End of Life

Juniper went a bit funny with “Extended End of Life” versions for a while. Their docs were full of references to those versions, but they were all years out of date. I couldn’t predict which versions would be considered extended. They fixed this in the last couple of years. Now it’s obvious - the “even” numbered releases such as 21.2, 21.4 are “Extended” - they have 3 years of Engineering support after first release. The odd-numbered releases like 21.1 and 21.3 have 2 years of support.

This is 3 years from the first R1 release. Given that the R3 release comes out around 9 months after R1, and you’ll wait for a couple of service releases on top of that, it’s often a year from release when you install it. So you’ll want to be on the extended end of life releases, to give yourself a couple of years of support.
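The arithmetic above, as a small sketch. It assumes the support periods described in this post (3 years for even-numbered releases, 2 for odd) and a default install delay of one year after R1. Both function names are made up for illustration. Check the Dates & Milestones page for real dates.

```python
def engineering_support_years(release):
    """release is a train such as '21.4'; even-numbered trains are 'Extended'."""
    _, number = release.split(".")
    return 3 if int(number) % 2 == 0 else 2


def support_left_at_install(release, years_after_r1=1.0):
    """Years of engineering support remaining if you install some time after R1."""
    return max(engineering_support_years(release) - years_after_r1, 0)


if __name__ == "__main__":
    for train in ("21.3", "21.4"):
        print(train, support_left_at_install(train))
```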

The Junos OS Dates & Milestones is your go-to page to check support lifecycles. Some releases will get even longer support, e.g. for specific EoL hardware.

JTAC Suggested Releases

Juniper has a great page “Junos Software Versions - Suggested Releases to Consider and Evaluate.” That page has a section for each of the main product lines, and model-specific guidance. Find your model, and check the listed versions, and “last updated” date.

They used to call it “Recommended” versions rather than “Suggested,” toning it down a little. It’s still your first and best place to start when picking a version.

For example, if you’re updating an EX3400, it says “Latest 21.4R3-Sx”, last updated 14 Nov 2022. Straightforward - find the latest available service release for the 21.4R3 release train. Last updated a few months ago, so it’s solid advice.

Yes, these versions will still have bugs. There’s no guarantee they are perfect. But if you don’t have any specific requirements, they are always a solid choice. They are mostly conservative, but there will be no surprises from JTAC if you log a case against them. They will almost always be extended end of life releases, except for specific hardware support.

You should bookmark that page, and sign up for change notifications.

OK, But What About

What if I don’t like the suggested release? What if I need a particular feature? Or what if they have multiple suggested versions, e.g. for MX80 today they suggest 20.2R3, 20.4R3, 21.2R3. Not helpful.

In those cases you’ll need to make your own decision. Remember that Juniper is offering suggested releases - as long as you are still running a supported release, they will still support you. They might question your choice, but if you have a solid rationale, it’s fine.

If you really need a feature that is only in the very latest code, that’s what you have to do. If running brand new hardware, you might need a brand-new software release.

My general advice in those situations is:

Pick an even-numbered release if possible
Pick the oldest even-numbered version that has the features you need
Take the latest service release in that train
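Those three rules can be sketched in a few lines. This is an illustration only: `pick_version` is a hypothetical helper, and the available versions in the example are made up apart from 21.4R3-S3. It assumes you already know the oldest train that has the feature you need.

```python
import re

_PATTERN = re.compile(r"^(\d{2})\.(\d)R(\d+)(?:-S(\d+))?$")


def _key(version):
    """Sortable key: (year, release, maintenance, service)."""
    match = _PATTERN.match(version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    year, release, maintenance, service = match.groups()
    return (int(year), int(release), int(maintenance), int(service or 0))


def pick_version(available, minimum_train):
    """minimum_train is (year, release) for the oldest train with the feature you need."""
    keys = {version: _key(version) for version in available}
    candidates = [v for v, k in keys.items() if k[:2] >= minimum_train]
    if not candidates:
        return None
    # Rule 1: prefer even-numbered trains if any qualify.
    even = [v for v in candidates if keys[v][1] % 2 == 0]
    pool = even or candidates
    # Rule 2: the oldest qualifying train.
    oldest_train = min(keys[v][:2] for v in pool)
    in_train = [v for v in pool if keys[v][:2] == oldest_train]
    # Rule 3: the latest maintenance/service release in that train.
    return max(in_train, key=lambda v: keys[v])


if __name__ == "__main__":
    available = ["21.2R3-S4", "21.3R2", "21.4R3-S2", "21.4R3-S3", "22.2R3"]
    print(pick_version(available, minimum_train=(21, 3)))
```

In the example, the feature first appears in 21.3, so the odd-numbered 21.3 train is skipped in favour of the latest service release in 21.4.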

If there are multiple suggested versions, look at the future support lifetime, and pick something you’re comfortable with. E.g. 20.2R3 is about to go end of support. That’s a poor choice if you’re upgrading today. In my case I would pick the newest branch, unless I had wide deployment of an older branch and was sticking with it for a while.

What’s this about hardened releases?

I saw a reference to hardened releases somewhere - what about those?

My advice is to ignore those. As best as I can tell, Juniper decided to do “hardened” releases for specific use-cases, e.g. EVPN-VXLAN, or VCF. But they did a poor job of explaining what a hardened release is, and have not kept their documentation up to date. You can see traces of it in this PDF, but I have not seen any clear public definitions. If you go to Pathfinder and click on Data Center, then select an architecture, e.g. MC-LAG, you’ll see a “hardened release.” At the time of writing, it says that is 20.2R3. Installing a release today that goes End of Engineering in June this year is a bad move. Don’t do that.

The “Suggested Releases” page has recommendations for use-cases for the QFX platforms. At 2023/03/30, for MC-LAG (QFX5K) it says “Latest 21.4R3-Sx / 22.2R3”. Those are better choices. At the time of writing, they suggest the same version for all use-cases. Probably tells us something about what happened to their plans for “hardened” releases.

Juniper Upgrade Policy

If you’re trying to get to there, I wouldn’t start from here

When I upgrade a standalone Arista switch, I copy the new software to it, tell it to use that image, and reload. I don’t care what version I’m on today. With Junos, your current version impacts how you get to your destination. Depending on your current version, you might officially need to go via interim steps to get to your target.

The official policy says:

For both EOL and EEOL releases, you can upgrade to the next three subsequent releases or downgrade to the previous three releases. For example, you can upgrade from 20.4 to the next three releases – 21.1, 21.2 and 21.3 or downgrade to the previous three releases – 20.3, 20.2 and 20.1.

For EEOL releases only, you have an additional option - you can upgrade directly from one EEOL release to the next two subsequent EEOL releases, even if the target release is beyond the next three releases…For example, 20.4 is an EEOL release. Hence, you can upgrade from 20.4 to the next two EEOL releases – 21.2 and 21.4
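Here is a sketch of that policy for planning upgrade hops. It assumes the quarterly numbering of the time (four releases a year, .1 to .4) and treats even-numbered releases as EEOL, as described earlier. The function names are hypothetical, downgrades are not modelled, and the path search is a simple greedy walk rather than a proven-optimal one.

```python
def _parse(release):
    year, number = release.split(".")
    return int(year), int(number)


def _index(release):
    """Position of a release in the quarterly sequence."""
    year, number = _parse(release)
    return year * 4 + (number - 1)


def _release(index):
    return f"{index // 4}.{index % 4 + 1}"


def _is_eeol(release):
    return _parse(release)[1] % 2 == 0


def can_upgrade_directly(src, dst):
    step = _index(dst) - _index(src)
    if step <= 0:
        return False
    if step <= 3:
        return True  # any of the next three releases
    # EEOL releases are two apart, so "next two EEOL releases" means +2 and +4.
    return step == 4 and _is_eeol(src) and _is_eeol(dst)


def upgrade_path(src, dst):
    """Greedy walk: always take the largest permitted hop that does not pass dst."""
    if _index(dst) < _index(src):
        raise ValueError("downgrades are not modelled in this sketch")
    path = [src]
    current = src
    while _index(current) != _index(dst):
        for step in (4, 3, 2, 1):
            candidate = _release(_index(current) + step)
            if _index(candidate) <= _index(dst) and can_upgrade_directly(current, candidate):
                current = candidate
                path.append(current)
                break
    return path


if __name__ == "__main__":
    print(upgrade_path("20.4", "21.4"))
    print(upgrade_path("18.4", "21.4"))
```

This only models the official policy. It says nothing about which larger jumps happen to work on a given platform.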

If you do annual upgrades, it’s easy. Going 20.4 to 21.4? Straight shot. What if you have 18.4? Do you really want to go 18.4 -> 19.4 -> 20.4 -> 21.4? Tedious. What if you’ve found an old box on the shelf and want to use that? Junos upgrades and reboots are very slow, do I have to go through all that? Can’t I skip some steps?

If you do a USB/TFTP install, you can go straight to any version. If you’re doing this online…it is very version- and platform-dependent. I have my own experience for what steps I can do. For example, I know I can take a QFX5110 from 18.4R2 -> 21.4R3 in one step. I also know that EX3400 18.4R2 -> 20.4 is impossible if it’s too early a service release on the EX3400. I’ve hit similar issues on MX, where some steps were too large. If you have a large fleet to upgrade, try it out on some low-risk systems. Generally it fails at install time. If you have a handful of systems, then sticking to the guidance is safer.

Some platforms specifically call out supported large jumps, e.g. SRX 15.1 -> 19.4R3. If Juniper says it’s OK, then go for it. Otherwise test first.

Downloading the right software

OK, we’ve figured out what version we want, and any interim releases we need. Let’s download it. Should be simple right? Yes and no. It’s easy once you know the tricks, but hard at first.

Let’s say we have a QFX5100, and we want to run the latest 21.4R3-Sx release, as suggested.

Start with the Downloads page, and put QFX5100 into the search box:

uh…do I want QFX 5e Series Switch, QFX 5e Series Switch with Enhanced Automation, Limited - MacSec Enabled QFX 5 Series, Limited - QFX 5 Series Switch with Enhanced Automation, QFX 5 Series Switch, QFX 5 Series Switch with Enhanced Automation?

Turns out you don’t want any of those. The first page of results is a trap. It only has the “R” releases. You almost never want a base “R” release, you want a Service Release, e.g. R3-S4.

Scroll up, go to the drop-down, select “Junos SR”, and make sure 21.4 is the selected version. Now we see the service releases, in reverse chronological order. 21.4R3-S3 is the latest at release time. But there’s 6 variations. Which one do I want?

As a general rule, you never want a “Limited” release unless you’re in specific restricted countries. If you’re in one of those places you’ll know about this, if you’ve never heard of it you can ignore it. So now the choices are 5 vs 5e, and with or without “Enhanced Automation.” If you don’t know, choose “Enhanced Automation” - it will help you later. Read this for more info. Last thing, 5 vs 5e? For QFX5110 and QFX5120, you must use the “5e” image, and it will be the only option you see. For QFX5100 you can install 5 or 5e images. This is a trap. Do not install 5e unless you know exactly why you’re doing it. Get the “5” image.
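The decision above, written out as a sketch. The variant names are the ones listed on the downloads page as quoted earlier. `pick_variant` is a hypothetical helper that encodes only the defaults in this post: no “Limited”, prefer “Enhanced Automation”, and 5e only for QFX5110/QFX5120.

```python
VARIANTS = [
    "QFX 5e Series Switch",
    "QFX 5e Series Switch with Enhanced Automation",
    "Limited - MacSec Enabled QFX 5 Series",
    "Limited - QFX 5 Series Switch with Enhanced Automation",
    "QFX 5 Series Switch",
    "QFX 5 Series Switch with Enhanced Automation",
]


def pick_variant(model, variants=VARIANTS):
    """Return the default image variant for a QFX5K model, or None if nothing matches."""
    needs_5e = model.startswith(("QFX5110", "QFX5120"))
    family = "QFX 5e Series" if needs_5e else "QFX 5 Series"
    for name in variants:
        if name.startswith("Limited"):
            continue  # only for specific restricted countries
        if "Enhanced Automation" not in name:
            continue  # default to Enhanced Automation if unsure
        if family in name:
            return name
    return None


if __name__ == "__main__":
    print(pick_variant("QFX5100"))
    print(pick_variant("QFX5120-48Y"))
```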

Other products like the PTX10001-36MR are much simpler. There are just one or two download variations.

One last thing before downloading that file. Make sure you are selecting the file from the “Install Package” section

Do not download the file from the “Install Media” section unless you are trying to create a USB image. For a while the default page for some products opened to the Install Media section. Made me download the wrong file quite a few times.

What about Junos vs Junos Evolved?

Go to the downloads page for PTX10003, and you’ll get this dropdown:

Can I choose if I want Junos Artisanal or Junos Evolved? No, this is just an artifact of the way they publish OpenConfig models. Those are listed under Junos, but the software you want to download is under “Junos Evolved SR.” Do not ask me why.

Once you’ve finally tracked down the right software, click on the package, login if required, accept the license, and download it. Then copy it to your switch, and install it. Exact methods vary by platform, start here for common methods. Install it, reboot, and cross your fingers!

New Juniper Rack Mount Kit

2023-04-03T03:00:00+00:00

Juniper has a new enhanced four-post rack mount kit “JNP-4PST-RMK-1U-E” for their 1RU datacenter devices. It works with devices like the QFX5120 and PTX10001-36MR. It is much improved over the legacy rack mount kit. It is not as good as some competitors, but it is backwards compatible. It makes switch installation quicker and safer.

Background: Current 4-post rail kit

Juniper has used the same 4-post kit for their 1RU datacenter switches and routers for many years. The same kit works on QFX5100, QFX5110 and QFX5120-48Y switches. The MX204 uses a slight variation, but is almost identical. Oddly, the QFX5120-32C uses something completely different. Devices are secured to the front and rear posts. 2-post mounting is unwise for modern deep devices with heavy PSUs. You can still get away with 2-post mounting for lighter, shallower access switches. Modern servers and deep switches/routers need 4-post mounting, or some sort of shelf.

The current kit “EX-4PST-RMK” has 2 parts per side:

One piece screws in to each side of the switch. Note that there are 8 holes per side, but Juniper supplies a total of 12 very small screws. As you can imagine, installing 12 very small screws per switch is no fun when you have a stack of 50. The other piece of the rail kit mounts from the rear, to connect the switch to the rear posts.

The switch installs from the front, and is screwed into…oh. Wait. Yes, you will need to install 8 cage nuts first (not supplied). Make your blood sacrifice to the networking gods.

Having installed the cage nuts, now you can supply your own screws, and screw in the front…no…wait. What’s going to hold this heavy switch up while the screws are going in? The documentation says you need two people for this step. But we all know that installation doesn’t work that way.

Can we pre-mount the rear rails, then slide the switch in, so it’s supported from the rear while we screw in the front? No chance. The rear rails are flimsy pieces of metal that wilt when you take them out of the box. Any slight bend and they bind up when sliding into the front rails. There’s no safe way to lift a switch in by yourself, slide it onto the rear rails, then screw in the front.

Your choices are: find a helper, install a temporary server below the switch, or: Patchbox setup.exe:

This acts as another pair of hands. Now we can screw in the front, then install the rails at the rear. Note how easily they flex. Force them in, screw in to the rack, go back and tighten the front screws, and remove the setup.exe.

This kit works, but it is dangerous for one person, and it wastes time and money. Juniper uses the same kit for the new PTX10001-36MR, a very dense 1RU router with 3kW PSUs. Even better, for the PTX10001-36MR, 4 of the little screws are different to the others. No guidance on which ones to use where. Those are very small screws and very flimsy rails for a system that weighs 18kg.

The Industry and the Competition

If you’ve spent your career installing network gear, you might assume “that’s the way it is.” Or it’s a challenge, see who can do the difficult task as quickly as possible. Or you take any suggested improvement as some sort of challenge to your engineering chops. Check this NANOG thread. Yes, I know, be very careful reading NANOG. A few quotes:

A 30-minute time to install a regular 1U ToR switch seems a bit excessive. Maybe the very first time a tech installs any specific model switch with a unique rail configuration. After that one, it should be around 10 minutes for most situations.

I’m an idiot for being so slow:

30 minutes to pull a switch from the box stick ears on it and mount it in the rack seems like a realllllly long time. I think at tops that portion it that’s a 5-10 minute job if I unbox it at my desk

Can someone explain to me how to install 8 x cage nuts, 12 x small screws, mount the device, and install the 8 x cage nut screws in 10 minutes? I’m sorry, I’m calling bullshit. Someone else claims to be even faster:

it usually takes about 3 minutes

And somehow, the tool-less kits are slower and more difficult? Why has no-one told the server engineers this?

Those speed rails as well are a bit of a challenge to install

I’m sorry what? No they are not. Not even close to a challenge to install compared to typical network vendor rails. I can install and remove Dell server rails from the front of the rack without even needing to walk around the back. They are explicitly designed to be quick and easy to work with, not “as cheap as possible.”

Other comments were along the lines of “considering overall lifetime TCO, it’s not a deal-breaker.” That I can understand. It’s not going to be the sole factor deciding on vendor X over Y. Some folks said they only rack a few switches once a year. Yes, for you, it doesn’t matter. For those of us doing this more than a few times, it does matter. “My switches run 15 years with no hardware replacements.” Some of us have let go of CatOS. There was no network hardware on the market 15 years ago that meets my needs today.

I can install an entire 384lb 21U core router in 30 minutes.

Yeah, well, good for you pal. My father walked uphill both ways to school, and on the day it snowed he still had to go to school. But here’s the thing: it doesn’t need to be that way. Just because you had to live with something doesn’t mean the rest of us have to. It doesn’t have to involve installing tiny fiddly screws and delicate balancing acts. There’s no reason other than stubbornness to resist improvement.

Server vendors solved this problem years ago. Dell, HPE, Lenovo have all had 4-post rail kits that work well for years. Yes, they’ve gone through iterations, and yes, they all have little tricks you need to learn to operate them. Yes I have caused a ball bearing explosion in DC11. But they are solid, and much quicker and safer than what network vendors do.

The key difference is that the rails are first installed into the rack, front and rear. They have clips and square lugs to fit into the posts, no cage nuts or screws needed. The server then slides into the rails. Quick, no tools needed.

Not all networking vendors have ignored progress. Arista rail kits work the same as server rail kits. Clip one piece to the side of the switch, install the rails into the rack, clip in, then slide the switch in. They also have adapters for 2-post rail kits. So it can be done. Dell does something similar.

Juniper’s New Rack Mount Kit

I have been telling Juniper this is a problem for years. Other customers too. Juniper has listened, and developed a new rack mount kit “JNP-4PST-RMK-1U-E.” No public documentation yet, but it is on the pricelist and orderable.

The first thing you’ll notice is there are 3 metal pieces per side. One piece screws onto the switch. Yes, you do still need to use some tools. The mounting points are the same as the old rails, so you can use this on all the same hardware that the old kit works on. The small bag there contains 16 screws! Install them all, or keep some for spares, leave a couple empty, for old time’s sake.

The other front and rear parts attach to the rack with no cage nuts required. The square lugs fit through the hole, and the retainer clip holds it in. The retainer clip design is a little suspect to me, but it is good enough.

Then slide the switch in, and tighten the thumbscrews. Done! No apprentice needed to hold the heavy switch while you faff about dropping cage nut screws.

Note there are no catches stopping the switch coming out once you undo the front thumbscrews. Most server rails let you slide the server almost all the way out, then you need to undo a catch to remove them all the way. It’s OK to not have one here, where you are not going to be opening the top of the switch while it’s in the rack.

Verdict

This new system is a real improvement over the old design.

Pros:

Safe for solo installs
No cage nuts needed
Saves time and money
Should handle higher weight devices better

Cons:

Still need to put the damn little screws in
Watch your fingers, the edges are a little sharp
Need to remember to add to order

I’m glad Juniper has listened to feedback. They were falling behind competitors. This will save me time and money. I plan to order these, and hope they are the default option soon. I’d like to see more improvements with new hardware, offering true tool-less install.

And for the curmudgeons who say I’m wrong, the old ways are the best, I am a poor excuse for an engineer, they will stick to the old ways? Good luck to you. You do that. I’ll move with the times. While you’re faffing around in the DC, I’ll go do something more useful.

EX3400 Disk Space and Upgrades

2021-12-21T04:00:00+00:00

The Juniper EX3400 switch series is a decent access switch. But a Product Manager chose to save $0.50 on COGS by choosing a 2GB disk. That’s just not enough space to handle normal Junos upgrades. This has wasted untold engineer hours on busywork. I hope that person (A) got a bonus, and (B) is never allowed to under-spec hardware again.

Here’s some tips I’ve learnt for manual and automated upgrades for EX3400s.

Manual Upgrades

Search for “Juniper EX3400 disk space” and you’ll find plenty of people complaining about this, and some suggestions. Juniper KB31198 looks like a good place to start. But it starts with request system storage cleanup and request system snapshot delete snap*.

Those might work if you’re upgrading from 15.1X -> 18.2. Maybe if you’re lucky it will be enough for upgrades within the 18.4 train. But it almost certainly won’t work if you’re going from 18.4.x -> 20.2.x.

There have been PRs that are supposed to fix this, and they might help around the edges, but they don’t help a lot.

With certain version combinations, you could get away with copying the new version to /mfs, and installing from there. It was dependent on your source & target image.

The only method I have found that works is this:

lindsayh@ex3400> start shell user root
Password:
pkg setop rm previous
pkg delete old
exit

This will completely remove the oldest installed image, freeing up plenty of space. These commands have not caused me any problems on the hundreds of systems I have run them on. It would mean that you can’t do a rollback from the current version to the previous version, but this is not a problem. You’re about to install a new version.

It’s not quite as terrifying as my early days of upgrading Cisco 3500s that only had enough space for one single image. If the upgrade failed, things went very bad.

ZTP/Automated Upgrades

The previous method is OK if you only have a handful of systems. It’s not practical if you have a large number of systems, or if you are using ZTP to set up new switches.

Juniper switches that have been zeroized will use DHCP to retrieve their config file, and the image to install. So you could have some config like this on your DHCP server:

  if ( substring (option vendor-class-identifier, 0,14) = "Juniper-ex3400") {
     option ztp-ops.config-file-name "ztp/conf/ex3400.txt";
     option ztp-ops.image-file-name "ztp/images/ex3400.tgz";
  }

With 15.1X systems, this would work. But new EX3400s ship with at least 18.2 code, and you want to run 18.4 or 20.2. Even on a brand new, out of the box system with no logs or other config, there is not enough space to run a regular upgrade. (Does that speak more to Juniper QA, or just how criminal the under-provisioning of hardware was?)

You need a different approach for ZTP. I use a modified version of the juniper-ztp scripts here. It is a little more complicated, with a few moving parts, but here’s how it works:

1/ The new (or newly zeroized) switch boots up. DHCP gives it an IP address, and a configuration file.

2/ The configuration file contains some basic config info, and this section:

event-options {
    generate-event {
        staging time-interval 300;
    }
    policy staging {
        events staging;
        then {
            execute-commands {
                commands {
                    "op url ftp://10.0.0.1/slax/ztp.slax";
                }
            }
        }                               
    }
}

3/ That says ‘every 5 minutes, generate an event called “staging”’. Whenever the switch sees that event, it runs the script at that URL.

4/ It is a SLAX script, which is a bit of a shit to work with. The script defines a target version. It then checks “am I already running that version?” If so, it removes the event policy and quits, so it will never run again. If it is not running the right version, it frees up disk space using the earlier pkg commands, pulls down the new version, installs it and reboots.

Once the switch reboots, it re-runs the script every 5 minutes. Assuming the upgrade worked, it should now clean itself up.

Downloading & installing the image takes more than 5 minutes, so the script also sets a flag when it runs, and checks for the presence of that flag, aborting if it is already running.
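The decision the script makes on each 5-minute run can be sketched in Python. This is not the SLAX script itself. The function name, the action strings, the example version strings, and the order of the two checks are my assumptions, purely to show the control flow:

```python
def ztp_next_action(running_version: str, target_version: str, lock_present: bool) -> str:
    """Decide what one run of the 5-minute 'staging' event should do."""
    if running_version == target_version:
        # Already on the target image: remove the event policy so this never runs again.
        return "remove-event-policy"
    if lock_present:
        # An earlier run is still downloading or installing: leave it alone.
        return "abort-already-running"
    # Otherwise: set the flag, free disk space, fetch the image, install, reboot.
    return "upgrade"


if __name__ == "__main__":
    # Hypothetical versions, for illustration only.
    print(ztp_next_action("18.2R3", "20.2R3", lock_present=False))
```

Keeping the decision separate from the side effects (pkg cleanup, download, install) makes it easy to reason about what happens when a run overlaps with a slow download.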

I use a variation of this script for updating existing systems. All I need to do is push out the event-options config, and the switches will free up space, pull down the new image, and reboot.

It sucks that we have to do this because someone saved $0.50 on a $2,000 switch. This is what happens when people don’t think through the total lifecycle of a device. But using these commands and/or this script will make that hoop-jumping a bit easier.

It does make me wonder though: What’s Juniper Mist doing for EX3400 software upgrades? There’s no magic to what Mist does, underneath it all it probably runs some very similar commands.

Juniper ARP Policer on PTX

2021-08-17T06:00:00+00:00

I’ve written before about the default ARP policer on Juniper MX. It can create some odd failure conditions when you’re connected to noisy networks such as large Internet Exchanges. Junos OS Evolved, as used on platforms like the PTX10003, has low default values for ARP and ICMPv6 ND DDoS protections. It will cause the same problems, but is easier to diagnose and mitigate.

Juniper DDoS Protection

Platforms like MX, QFX, PTX have Control Plane DDoS protections built in. These will automatically rate-limit various traffic types that hit the CPU. This is generally a Good Thing. Certain packet types get punted from the ASIC to the CPU, but the CPU can’t handle anywhere near the traffic levels that the forwarding ASIC can. Send enough special packets to a router, choke the CPU, and you might be able to knock things offline. So having default policies to rate-limit traffic makes sense.
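To get a feel for what a rate limit with a burst allowance does, here is a generic token-bucket sketch in Python. I am not claiming this is how Junos implements its policers internally. The class and function names are mine, and treating the limit as packets per second with a burst counted in packets is an assumption for illustration:

```python
class TokenBucket:
    """Generic token-bucket policer: `rate` packets/second, up to `burst` packets banked."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start with a full bucket
        self.last = 0.0      # timestamp of the previous packet, in seconds

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True   # packet goes on to the CPU
        return False      # packet is dropped by the policer


def count_allowed(rate: float, burst: float, offered_pps: int, seconds: int) -> int:
    """Offer `offered_pps` evenly spaced packets per second and count how many pass."""
    bucket = TokenBucket(rate, burst)
    allowed = 0
    for i in range(offered_pps * seconds):
        if bucket.allow(i / offered_pps):
            allowed += 1
    return allowed
```

Offer traffic below the rate and everything passes. Offer well above it and you get roughly the burst plus rate times duration, with the rest dropped. That dropped remainder is where the trouble starts when the "special packets" are legitimate.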

Platform Defaults

Juniper might have “One Junos” but we know it’s not that simple. Behavior varies between platforms. Check these default values for some DDoS protections for different platforms:

Protocol	MX	QFX	PTX
ARP	20,000	500	500
NDPv6	20,000	N/A	500
ICMP	20,000	N/A	500
BGP	20,000	3,000	5,000

Note how the PTX values are much closer to the QFX values than the MX.

Diagnosis and Mitigation

Those PTX ARP & NDPv6 values will cause you problems on a busy IX. This behavior shows up as flapping BGP sessions, especially IPv6 BGP sessions. It can be confusing at first, as you appear to have working connectivity. Most peers are unaffected. You might not pick up on it if you’re not looking at your logs. Exact symptoms will vary, and you will see some neighbors flap more than others.

show ddos-protection violations will show currently violating ddos-protections.

Run show ddos-protection protocols ndpv6 to see current traffic values, if/when it last triggered, and the number of times it has triggered.

Check your syslog server for DDOS_PROTOCOL_VIOLATION_SET entries.
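If you have a pile of logs to go through, a few lines of Python will count the violations per protocol. A minimal sketch: the tag is the one above, but the exact message wording after it is an assumption on my part (it may vary by platform and release), so anything the regex can't parse is counted as "unknown":

```python
import re
from collections import Counter

TAG = "DDOS_PROTOCOL_VIOLATION_SET"
# Assumed message shape: "... protocol/exception  NDPv6:aggregate exceeded ..."
PROTO_RE = re.compile(r"protocol/exception\s+(\S+)")


def count_violations(lines):
    """Count violation log lines per protocol group; 'unknown' if no name is found."""
    counts = Counter()
    for line in lines:
        if TAG not in line:
            continue
        match = PROTO_RE.search(line)
        counts[match.group(1) if match else "unknown"] += 1
    return counts
```

Pass it any iterable of lines, such as an open log file. A protocol that shows up far more often than the rest is the one to look at first.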

Raising the limits is easy:

lindsayh@ptx> show configuration system ddos-protection
protocols {
    arp {
        aggregate {
            bandwidth 8000;
            burst 8000;
        }
    }
    ndpv6 {
        aggregate {
            bandwidth 8000;
            burst 8000;
        }
    }
}

lindsayh@ptx>

If you have an active violation, it’s useful to run clear ddos-protection protocols states after making changes. Otherwise you have to wait a bit longer for the timer to expire.

But Why so Low?

The PTX platform started life as a high-bandwidth, low-featureset device. Typical use-case was an LSR, where you have P2P links, and low levels of ARP traffic. Picking 500 pps was a reasonable default.

But the PTX featureset has evolved, and now it’s suitable for edge peering. For 100G/400G platforms, the price is much better than MX. So people are starting to deploy it in new places in the network. Being on the leading edge means you’ll hit a few rough edges, bugs, or in this case simply defaults that don’t make sense.

No big deal. I expect that Juniper will change these defaults in the near future. With luck, this will be resolved before you ever hit it.

Juniper i40e NVM Firmware Upgrade

2021-05-20T04:30:00+00:00

Juniper Routing Engines with VM Host need an i40e NVM firmware upgrade. The procedure is a pain in the ass, and documentation is not great. But you can’t avoid the upgrade any more. New Junos versions need the firmware upgrade, and replacement REs will ship with it already installed. Here’s some tips on doing the upgrade.

Background

Newer Juniper Routing Engines use a Linux-based hypervisor, and Junos (still BSD-based) runs as a guest VM. This is mostly transparent for day to day operations. When you do a Junos upgrade, it will upgrade the underlying hypervisor if required.

Upcoming Junos versions ship with a new version of Wind River Linux that needs i40e firmware version 6.01. Older versions used v4.26. You need the new i40e firmware installed first, before you can install the latest Junos versions. You can’t put this upgrade off forever. Sooner or later you’ll want to upgrade to a Junos version that only supports the new firmware. Or you’ll get a replacement RE delivered with new firmware, and you can’t downgrade it.

For the last couple of years, Juniper has been shipping Junos versions that will work with both old & new firmware versions. You need one of these to do the upgrade.

So you need to do something like this:

Upgrade to a recent-ish Junos version (e.g. 18.4R3) that supports old & new firmware.
Install the new firmware
Now you can upgrade to future versions that only support the new firmware.

Firmware Upgrade Overview

As mentioned, the upgrade process is a hassle, especially for dual-RE systems, since you need to do at least 3 reboots per RE. Juniper tells you that you need console access and remote power control. It’s nice to have console access, but you can get away without it. You definitely don’t need remote power control for a dual-RE system, since you can power cycle the CB from the other RE.

Here’s a bit more detailed look at the steps:

Log a TAC case to get the correct jfirmware package for your Junos version. Copy that and the LLDP package (available at TSB17603) to your router, and copy them to the backup RE.
Disable GRES.
Install the jfirmware package, and start the firmware upgrade.
Re-enable GRES and wait for sync to complete.
Fail over to RE1, and reboot RE0. You’ll need to go through 3 reboot cycles. Use power controls to speed it up, or wait a bit longer.
Disable GRES to allow you to install the jfirmware package on RE1. After install, re-enable GRES
Fail over to make RE0 primary. Reboot RE1, go through 3 reboot cycles.
Install the LLDP package on each RE, and reboot them again.
Done - now you’re set up for future Junos upgrades

Tedious, right? There are some ways to make it slightly less painful, and incorporate a Junos upgrade along the way, with just one traffic interruption due to FPC reloads. But it’s a stupid process, and I hope I never have to go through this cycle again.

jfirmware install and reboot

Install the jfirmware package:

root@netboot> request vmhost software add /var/tmp/jfirmware-vmhost-x86-64-...
Verified jfirmware-vmhost-x86-64-18.4R3-S4.2 signed by PackageProductionEc_2018 method ECDSA256+SHA256
[ platform = ; re_name = RE-S-2X00x6 ]
Pushing /packages/db/jfirmware-vmhost-x86-64-18.4R3-S4.2/contents/vmhost-firmware-x86-64-18.4R3-S4.2.tgz to host ...done.
Extracting /packages/db/jfirmware-vmhost-x86-64-18.4R3-S4.2/contents/vmhost-firmware-x86-64-18.4R3-S4.2.tgz ...done.
Preparing... ##################################################
supported for kernel version: 3.10.100-ovp-rt110-WR6.0.0.38_preempt-rt
i40e_pkg ##################################################
Installation of /tmp/i40e_pkg-2.0-0.x86_64.rpm ... done
Installing i40e pkg on host ... done.
Preparing... ##################################################
supported for kernel version: 3.10.100-ovp-rt110-WR6.0.0.38_preempt-rt
bios_pkg ##################################################
Installing /tmp/bios_pkg-1.0-0.x86_64.rpm ... done.
Installing bios pkg on host ... done.
 
 
root@netboot>

This just makes it available for install - you still need to kick off the install. Note that request system firmware upgrade is a hidden command, and you need to type it out. You can tab-complete the last part:

root@netboot> request system firmware upgrade re i40nvm
Part              Type         Tag  Current  Available  Status
                                    version  version
Routing Engine 0  RE i40e-NVM  7    4.26     6.01       OK
Perform indicated firmware upgrade ? [yes,no] (no) yes
 
Firmware upgrade initiated, use "show system firmware" after vmhost reboot to verify the firmware version
 
root@netboot>

Now flip the master routing engine over, and reboot RE0 from RE1 with request vmhost reboot routing-engine other.

If you watch the console on RE0, you’ll see this:

Initializing platform services:              2.4.3
NVM_version: 4.26
DRV_version: 2.4.3
Need Manual procedure to upgrade i40e firmware revision from 4.26 to  6.01
upgrading i40e firmware .....
 
Intel(R) Ethernet NVM Update Tool
NVMUpdate version 1.30.2.11
Copyright (C) 2013 - 2017 Intel Corporation.
 
Unsupported device found - DeviceId: 153A
Config file read.
Inventory
[00:005:00:00]: Intel(R) Ethernet Controller XL710 for 40GbE backplane
    Flash inventory started
    Shadow RAM inventory started
Warning: VPD is not valid
Alternate MAC address is not set
    Shadow RAM inventory finished
    Flash inventory finished
    OROM inventory started
    OROM inventory finished
[00:005:00:01]: Intel(R) Ethernet Controller XL710 for 40GbE backplane
    Device already inventoried.
Update
[00:005:00:00]: Intel(R) Ethernet Controller XL710 for 40GbE backplane
    Flash update started
|======================[100%]======================|
    NVM image verification started
    Shadow RAM image verification started
|======================[100%]======================|
    Shadow RAM image verification finished
    Flash image verification started
|======================[100%]======================|
    Flash image verification finished
    NVM image verification finished
    Flash update successful
Config file doesn't have any OROM components specified for device 'XL710'. Tool will use current device's combo set for the OROM update.
Config file doesn't have any OROM components specified for device 'XL710'. Tool will use current device's combo set for the OROM update.
Post update inventory
[00:005:00:00]: Intel(R) Ethernet Controller XL710 for 40GbE backplane
    EEPROM inventory started
Alternate MAC address is not set
    EEPROM inventory finished
    OROM inventory started
    OROM inventory finished
[00:005:00:01]: Intel(R) Ethernet Controller XL710 for 40GbE backplane
    Device already inventoried.
Please Power Cycle your system now and run the NVM update utility again to complete the update. Failure to do so will result in an incomplete NVM update.
Upgrade complete please power reboot
You may notify to  power reboot again after reboot if required

At this point you can power-cycle the CB, which will power cycle the RE. The first time you do this, you might see that it takes a while before show chassis environment cb 0 shows it as Offline. Once it is Offline, bring it back online again.

root@netboot> request chassis cb offline slot 0
Offline initiated, use "show chassis environment cb" to verify
 
root@netboot> show chassis environment cb 0
CB 0 status:
  State                      Offline
  Power 1                    Disabled
  Power 2                    Disabled
 
root@netboot> request chassis cb online slot 0
Online initiated, use "show chassis environment cb" to verify
 
root@netboot>

If you’re watching the console, you’ll see very similar messages to last time.

What if you don’t have a console server? How do you know when to power cycle the RE? If you leave it long enough, the routing engine will boot on its own. When you see it back up, power cycle it again.

Power cycle the CB again. Eventually you’ll see this:

i40e firmware revision is the latest to 6.01

Fortville NVM Firmware Version: 6.01

Host reboot is required to load compatible i40e driver

You don’t need to do anything at this point. Eventually it will carry on to load Junos.

You can check that the new firmware shows up:

root@netboot> show system firmware
Part              Type         Tag  Current  Available  Status
                                    version  version
Routing Engine 0  RE BIOS      0    0.53.1   0.55.2     OK
Routing Engine 0  RE FPGA      1    41.0.0   41.0       OK
Routing Engine 0  RE SSD1      5    12028    12050      OK
Routing Engine 0  RE SSD2      5    12028    12050      OK
Routing Engine 0  RE i40e-NVM  7    6.1      6.01       OK
Routing Engine 1               0    0.0.0    0          OK

root@netboot>

Yes. Yes they did mix up “6.1” and “6.01”. Don’t worry about it. You’re running the right version.
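If you script a check against this output, a plain string comparison of “6.1” and “6.01” will say they differ. A minimal sketch of a numeric comparison, assuming both fields are dotted numeric versions (the function names are mine, not part of any Juniper tooling):

```python
def version_tuple(version: str) -> tuple:
    """Split a dotted version into integers, so '6.01' and '6.1' both become (6, 1)."""
    return tuple(int(part) for part in version.split("."))


def firmware_current(current: str, available: str) -> bool:
    """True when the running firmware is at least the available version."""
    return version_tuple(current) >= version_tuple(available)
```

With this, firmware_current("6.1", "6.01") is true and firmware_current("4.26", "6.01") is false, which matches what you want from the check.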

Of course you’re not done yet, now you need to load the LLDP fix, and go through the whole process again with the other RE.

LLDP package install

On each RE, run this:

root@netboot> request system software add /var/tmp/lldp-patch-for-i40e-upgrade.tgz
Verified lldp-patch-for-i40e-upgrade signed by PackageDevelopmentEc_2018 method ECDSA256+SHA256
[ re_name = RE-S-2X00x6 ]
Pushing script(s) to host ...
Install the script(s) under host-os....
Script(s) copy done.

root@netboot> show version | match lldp
lldp-patch-for-i40e-upgrade

root@netboot>

Note that it’s request system software add ..., not request vmhost software add ...

Then one more reboot of each RE, but just a regular reboot, with no need for any power cycling. When the system is back up, you’ll see a message like this when logging in:

FreeBSD/amd64 (netboot) (ttyu0)

login: root
Password:
Last login: Sun Oct 4 20:49:38 on ttyu0

--- JUNOS 18.4R3-S4.2 Kernel 64-bit JNPR-11.0-20200618.2bc7e35_buil
At least one package installed on this device has limited support.
Run 'file show /etc/notices/unsupported.txt' for details.
root@netboot:~ #

Don’t worry about it. Juniper TAC assures me that this is still a supported configuration.

Combining with Junos upgrade

Can I combine this with a Junos upgrade, and how do I minimise the number of FPC restarts, so I minimise user disruption?

Yes, if you pay attention to what you’re doing, and you give yourself a little more time. You can do the firmware upgrades, LLDP fix and Junos upgrade in one change window, with only one disruptive FPC restart.

Here’s how I would do it:

Disable GRES, and install the jfirmware package on RE0, but don’t reboot yet.
Re-enable GRES and wait for the REs to sync
Fail over to RE1 - this should be seamless.
Go through the 3 reboot cycles on RE0 to get the i40e firmware done
Install the LLDP package on RE0
Install the new Junos version on RE0 and reboot
Install the new jfirmware package on RE1.
Fail over to RE0. The new Junos version will take over, and all FPCs will restart. This is disruptive
Go through the reboot cycles on RE1. When i40e is done, install the LLDP package, followed by the new Junos version
Re-enable GRES

What about the LLDP fix with the new Junos version? No need to re-install it. Depending on which version you upgrade to, it will either still be there as a separate package, or it will be integrated into the main codebase.

The only good thing about this process? It worked on every RE I upgraded.

Hope this helps, and hope I never go through this again.

Juniper Direct vs Local Routes

2020-05-31T00:30:00+00:00

Juniper routers install a directly configured IP address as a “local” route, except when you configure it with a /32 mask (for IPv4). Then it is a “direct” route. This caused me some confusion when creating a policy to redistribute loopback IP addresses into BGP.

Route Protocol Types

A router learns routes from a variety of sources - networks configured on the box, those learned from IS-IS, rumors of prefixes from BGP or RIP, etc. You can see the full list here.

When routes are learned from different sources, Junos uses “Route Preference Values” to decide which route source to prefer. (Cisco refers to this as Administrative Distance). If routes are otherwise identical, the route with the lowest preference will be installed into the FIB.
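As a rough illustration of "lowest preference wins", here is a small Python sketch. The table holds a handful of the Junos default preference values. The function name is mine, and real route selection has more tie-breakers than this:

```python
# A few Junos default route preference values (lower wins).
DEFAULT_PREFERENCE = {
    "Direct": 0,
    "Local": 0,
    "Static": 5,
    "OSPF": 10,      # OSPF internal
    "IS-IS L1": 15,  # IS-IS Level 1 internal
    "IS-IS L2": 18,  # IS-IS Level 2 internal
    "RIP": 100,
    "BGP": 170,
}


def active_route(candidates):
    """Given protocols offering the same prefix, return the one with the lowest preference."""
    return min(candidates, key=lambda proto: DEFAULT_PREFERENCE[proto])
```

So if the same prefix is known via BGP, OSPF and a static route, active_route(["BGP", "OSPF", "Static"]) picks the static route.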

If you’re looking at the route table, you can narrow down displayed routes to look at a specific type, e.g. show route protocol direct to see locally connected networks:

vagrant@vqfx> show route protocol direct

inet.0: 7 destinations, 7 routes (7 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.0.2.0/24        *[Direct/0] 00:02:45
                    >  via em0.0
10.1.2.0/24        *[Direct/0] 00:00:59
                    >  via xe-0/0/0.0
169.254.0.0/24     *[Direct/0] 00:02:49
                    >  via em1.0

inet6.0: 2 destinations, 2 routes (2 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

fe80::205:860f:fc71:8500/128
                   *[Direct/0] 00:02:49
                    >  via lo0.0

{master:0}
vagrant@vqfx>

Route Filtering by Protocol

It’s not just about displaying routes, or selecting which route to prefer. We can also use the route type when filtering, to decide which routes we want to redistribute. Let’s say we wanted to redistribute static routes (and only static routes) into OSPF. Something like this does the trick:

[edit]
set policy-options policy-statement export-ospf term statics from protocol static
set policy-options policy-statement export-ospf term statics then accept
set protocols ospf export export-ospf

Route filters can get quite complex, matching on all sorts of things - prefix length, route origin, AS path, etc.

So far so good. What if we wanted to write a filter that would match on loopback addresses?

“Direct” vs “Local”

First a diversion - What’s the difference between “Direct” and “Local” routes?

The docs say this:

direct — Directly connected route

…

local — Local address

OK, so a “direct” route comes from configuring an IP + subnet mask on an interface. If we run set interfaces xe-0/0/1 unit 0 family inet address 100.100.100.1/24, then the router creates a “direct” route for 100.100.100.0/24 via that interface. It will also create a local entry for 100.100.100.1/32:

vagrant@vqfx> show route 100.100.100.0/24

inet.0: 9 destinations, 9 routes (9 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

100.100.100.0/24   *[Direct/0] 00:00:35
                    >  via xe-0/0/1.0
100.100.100.1/32   *[Local/0] 00:00:35
                       Local via xe-0/0/1.0

{master:0}
vagrant@vqfx>

What About Loopbacks?

What happens when we configure a loopback interface? We almost always configure these with a /32 (or /128) subnet mask. How does it show up in the routing table? Is that a “direct” or a “local” route? Should be a “local” route, right? Turns out it’s not. It’s a direct route:

{master:0}[edit]
vagrant@vqfx# set interfaces lo0 unit 0 family inet address 198.51.100.1/32

{master:0}[edit]
vagrant@vqfx# commit
configuration check succeeds
commit complete

{master:0}[edit]
vagrant@vqfx# run show route 198.51.100.1/32

inet.0: 10 destinations, 10 routes (10 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

198.51.100.1/32    *[Direct/0] 00:00:08
                    >  via lo0.0

{master:0}[edit]
vagrant@vqfx#

Hmm. Bit odd. What if we used a different prefix length on the loopback? Now it shows up a little differently:

{master:0}[edit]
vagrant@vqfx# delete interfaces lo0 unit 0 family inet address 198.51.100.1/32

{master:0}[edit]
vagrant@vqfx# set interfaces lo0 unit 0 family inet address 198.51.100.1/24

{master:0}[edit]
vagrant@vqfx# commit
configuration check succeeds
commit complete

{master:0}[edit]
vagrant@vqfx# run show route 198.51.100.0/24

inet.0: 11 destinations, 11 routes (11 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

198.51.100.0/24    *[Direct/0] 00:00:05
                    >  via lo0.0
198.51.100.1/32    *[Local/0] 00:00:05
                       Local via lo0.0

{master:0}[edit]
vagrant@vqfx#

It must be something to do with the /32 mask. There’s no need to have both a “direct” and a “local” entry for the same prefix, but the choice of “direct” is surprising, to me at least.
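The behavior seen in these outputs can be written down as a small Python function. This only models what I observed on this vQFX for IPv4 addresses. The function name is mine, and it is not a statement about how Junos decides internally:

```python
import ipaddress


def connected_routes(address: str):
    """Return (protocol, prefix) pairs mirroring the outputs above for an interface address."""
    iface = ipaddress.ip_interface(address)
    if iface.network.prefixlen == iface.max_prefixlen:
        # Host mask (/32): only a Direct route shows up.
        return [("Direct", str(iface.network))]
    # Shorter mask: Direct route for the subnet, Local route for the address itself.
    host = f"{iface.ip}/{iface.max_prefixlen}"
    return [("Direct", str(iface.network)), ("Local", host)]
```

connected_routes("198.51.100.1/24") gives a Direct /24 plus a Local /32, while connected_routes("198.51.100.1/32") gives a single Direct /32, matching the two loopback tests.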

Why Does it Matter?

The reason I noticed this was because I was configuring a policy to redistribute loopbacks into BGP. This was for a leaf-spine network, so I wanted to have the exact same policy configured on all devices. Each system had a /32 address taken from 198.51.100.0/24. OK, this should be easy. Let’s use this config:

set policy-options policy-statement CLOS-OUT term loopbacks from protocol local
set policy-options policy-statement CLOS-OUT term loopbacks from route-filter 198.51.100.0/24 prefix-length-range /32-/32
set policy-options policy-statement CLOS-OUT term loopbacks then accept
set protocols bgp group SPINES export CLOS-OUT

Nope. Doesn’t work. It would work if I had a shorter mask than /32 on my loopbacks, but most people aren’t going to do that.

The network & prefix length matches, but the protocol doesn’t. You have to change it to from protocol direct, and then it works.
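Here is the same term expressed as a Python check, to show why the /32 loopback falls through. It is a sketch of the matching logic in the CLOS-OUT term only (the function name and arguments are mine), not of how Junos evaluates policy:

```python
import ipaddress

LOOPBACK_RANGE = ipaddress.ip_network("198.51.100.0/24")


def term_matches(route_protocol: str, route_prefix: str, from_protocol: str) -> bool:
    """Emulate: from protocol X + route-filter 198.51.100.0/24 prefix-length-range /32-/32."""
    prefix = ipaddress.ip_network(route_prefix)
    protocol_ok = route_protocol.lower() == from_protocol.lower()
    filter_ok = prefix.subnet_of(LOOPBACK_RANGE) and prefix.prefixlen == 32
    return protocol_ok and filter_ok
```

A /32 loopback is a Direct route, so term_matches("Direct", "198.51.100.1/32", "local") is false and term_matches("Direct", "198.51.100.1/32", "direct") is true. The route-filter part passes either way. It is only the protocol condition that differs.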

Funnily enough, if you use something like test policy CLOS-OUT 198.51.100.1/32, it will tell you that it accepts the prefix, regardless of whether you use from protocol local or from protocol direct. But in practice I found it did not export the routes unless I used from protocol direct. This was on a recent Junos version. Behavior could be version-specific; I have not tested different versions of Junos.

No Big Deal. Just Another Gotcha

Ultimately it’s no big deal. Just one of those random little things that might confuse someone. If you find this through Google, hope it helps :)