<p>Lindsay Hill: automation, networking, worklife (feed: https://lkhill.com//feed/index.xml, updated 2023-09-24T22:34:49+00:00)</p>
<h1>Why Single-Port LAGs?</h1>
<p>Published 2023-09-24T22:00:00+00:00 · https://lkhill.com//why-single-port-lag</p>
<p>I recommend always using LACP for external connections. It will make your life easier, even when you only have a single connection. Here’s why we do it.</p>
<p>If you set up a PNI with AS32590, we will strongly recommend the use of LACP, even for a single link. If you have two PNIs with us, they will each be separate single-member LAGs, because they will be on different routers on our side.</p>
<p>It’s only once you have more than 2 links that we start using LACP in the way most people think of it.</p>
<p>It’s not just us. In Google’s <a href="https://peering.google.com/#/options/peering">Peering Policy</a>, under “Private peering physical connection requirements”, it states</p>
<blockquote>
<p>Link aggregation via LACP is required for all links, including single links</p>
</blockquote>
<p>Ever wondered why that is? What’s the point in setting up a LAG if I only have one link? What does it give me? More lines of config for no operational enhancement? And I thought we should use L3 everywhere anyway?</p>
<p>I can’t speak for Google, only for the way we operate our network. But I’m pretty sure their reasons are similar to ours. The obvious reason is for future growth, but there are operational benefits too.</p>
<h2 id="easy-expansion">Easy Expansion</h2>
<p>Traffic volumes only go one way: up and to the right. If I have existing 2 x 1 x 100G PNIs to you, and we need more capacity, it’s easy - add another port to each LAG. No new IPs needed, no BGP changes.</p>
<p>I can order the cross-connect and patching, and preconfigure my port to be part of the LAG. Then when you patch your end, the new link starts working straight away, no further changes needed.</p>
<p>LAGs can also help when going from <em>n</em> x 10G to <em>n</em> x 100G. <code class="language-plaintext highlighter-rouge">link-speed mixed</code> seems like the Devil’s Work the first time you see it, but it is very useful. Set that option on a 2 x 10G LAG today. When you need to upgrade to 100G, you can add a 100G link to the bundle. It will start balancing traffic, and then you can remove the 10G links, no BGP flaps or changes.</p>
<h2 id="operational-benefits">Operational Benefits</h2>
<p>The growth part is obvious. But there are other operational benefits. Consider a real example I am dealing with today. I have a flapping 100G port on a Juniper PTX. The issue only starts with 21.4R3, and only when the remote end is a CFP2 optic, but not all CFP2 optics.</p>
<p>JTAC wants me to move the connection from port 0/0/4 (a QSFP28 port) to 0/0/3 (a QSFP56-DD port). Yes, we are clutching at straws, but this one has been hard to reproduce in the lab, so we’re eliminating one more thing.</p>
<p>The router is at a remote site. I need to log a ticket to get remote hands to move the cross-connect. How can I do it with the shortest outage? I’d like to copy the IP address from port 0/0/4 to 0/0/3. That way when the patch cable gets moved, everything comes up:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre>lindsayh@ptx# show interfaces et-0/0/4
description "Transit: Potatotel AS64497 [100Gbit]"
unit 0 {
    family inet {
        address 198.51.100.122/31;
    }
    family inet6 {
        address 2001:db8:aa00:15::1/127;
    }
}

{master}[edit]
lindsayh@ptx# copy interfaces et-0/0/4 to et-0/0/3

{master}[edit]
lindsayh@ptx# show | compare
[edit interfaces]
+   et-0/0/3 {
+       unit 0 {
+           family inet {
+               address 198.51.100.122/31;
+           }
+           family inet6 {
+               address 2001:db8:aa00:15::1/127;
+           }
+       }
+   }
</pre></td></tr></tbody></table></code></pre></div></div>
<p>But I can’t do that:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre>{master}[edit]
lindsayh@ptx# commit check
re0:
[edit interfaces et-0/0/4 unit 0 family inet]
  'address 198.51.100.122/31'
    identical local address found on rt_inst [default], intfs [et-0/0/3.0 and et-0/0/4.0], family [inet].
error: configuration check-out failed

{master}[edit]
lindsayh@ptx#
</pre></td></tr></tbody></table></code></pre></div></div>
<p>I need to schedule a time to co-ordinate the move with Equinix, or accept it will be down between when they move it and I’m next online.</p>
<p>If we had used a single-member LAG, it would be easy. Just run <code class="language-plaintext highlighter-rouge">set interfaces et-0/0/3 ether-options 802.3ad ae4</code>. When they move the cable, everything will work. Later I can clean up the LAG config from the old port.</p>
<p>For your own internal links between devices, you might choose to make them all independent L3 links, and that’s OK. That may be the best choice. But any connection to a third party, e.g. PNIs, IXPs, Transit links, you should default to always using LACP, even if you only have one link. A couple of extra lines of config today will save you time later.</p>
<h1>Enforcing First AS in BGP</h1>
<p>Published 2023-04-17T01:00:00+00:00 · https://lkhill.com//enforce-first-as</p>
<p>The BGP RFCs state that external BGP peers should insert their own AS into the AS PATH advertised to eBGP peers. Some peers strip their AS, generally for commercial gain. Juniper and Cisco have opposite default behaviors for handling this. Make sure you set <code class="language-plaintext highlighter-rouge">bgp enforce-first-as</code> on Juniper routers. Caveats apply.</p>
<h2 id="background-traffic-anomalies">Background: Traffic Anomalies</h2>
<p>A few years ago I was looking at some traffic reporting anomalies. My IPFIX data said that traffic with next-hop AS &lt;dodgy-AS&gt; was around 3Gb/s. But my SNMP data showed that a PNI to that peer was doing 8-10Gb/s.</p>
<p>I first doubted my router, because I had issues with IPFIX in the past on that specific platform. I also wondered about sampling rates. I have high flow rates, and need to set the sampling to be more coarse. But it was a big anomaly.</p>
<p>Slicing & dicing the data different ways, and chatting to colleagues about it, we saw what was going on. IPFIX showed the right volumes when reporting on destination interface. But some prefixes received from the peer did <em>not</em> contain the peer’s AS. We still accepted them.</p>
<p>Huh? Isn’t it normal behavior, to insert your own AS into any prefixes you advertise to external peers? It is a key part of BGP loop prevention. Why did my router accept those prefixes? What gives?</p>
<h2 id="always-check-the-rfc">Always Check the RFC</h2>
<p>When in doubt, always start with the RFC. They are very readable, and this is exactly the sort of behavior they should define.</p>
<p><a href="https://www.rfc-editor.org/rfc/rfc4271#section-5.1.2">RFC 4271 section 5.1.2</a> states that</p>
<blockquote>
<p>When a BGP speaker originates a route then:</p>
<p>a) the originating speaker includes its own AS number in a path segment, of type AS_SEQUENCE, in the AS_PATH attribute of all UPDATE messages sent to an external peer.</p>
</blockquote>
<p>Note that there is no “SHOULD”, “MAY” or “OPTIONAL” about it.</p>
<h2 id="legitimate-exception-route-servers">Legitimate Exception: Route Servers</h2>
<p>Route Servers are a specific, legitimate exception to the above. <a href="https://www.rfc-editor.org/rfc/rfc7947#section-2.2.2.1">RFC 7947 Section 2.2.2.1</a> states:</p>
<blockquote>
<p>As a route server does not participate in the process of forwarding data between client routers, and because modification of the AS_PATH attribute could affect the route server client BGP Decision Process, the route server SHOULD NOT prepend its own AS number to the AS_PATH segment nor modify the AS_PATH segment in any other way.</p>
</blockquote>
<p>Almost all IXPs operate this way today. I peer with a handful that don’t, and they annoy me. HKIX is one. PIT Chile changed default behavior this year.</p>
<h2 id="why-would-you-strip-your-as">Why would you strip your AS?</h2>
<p>Route servers have a legitimate reason to not insert their AS. Why else would a network do it?</p>
<p>There are other use-cases where you need to manipulate the advertised path, e.g. AS migration. See <a href="https://lostintransit.se/2012/08/13/bgp-local-as-command/">Daniel’s blog</a> for examples.</p>
<p>What about less than legitimate use-cases?</p>
<p>Imagine that I operate a CDN with extensive peering and transit connections.</p>
<p>And let’s say that you operate an eyeballs ISP, with two upstream providers. Your upstream providers charge you on a traffic volume basis. They in turn have transit agreements with other operators, and peer at IXPs. They might have bi-lateral peering at IXPs or PNIs with me.</p>
<p>All else being equal, if I have identical relationships with those networks, I will split traffic to you across them.</p>
<div class="alert alert-info" role="alert"><i class="fas fa-info-circle"></i> Disclaimer: BGP is a suggestion framework, not a prescriptive routing protocol like OSPF. I can and do route traffic according to my business needs. Your routing suggestions are just that: suggestions.</div>
<p>Now what if one of your transit networks is a bit shady, and wants to maximize traffic going via their network? They have two levers to pull: Either announce more specifics for your prefixes, or strip their own AS. I ignore any other BGP attributes. Announcing more specifics has other issues, and may not be possible. But they can strip their AS, and hope that I don’t notice.</p>
<p>Now I have two paths, with everything equal except the AS path length. Default BGP best path selection will make me send traffic via that provider.</p>
<p>Or what if I peer with both you and your transit provider at an IX? I see two paths, with the same AS path length. I will split my traffic between direct to you, and via that transit. That is not cool. If you’re at the IX, I should send everything direct to you.</p>
<div class="alert alert-info" role="alert"><i class="fas fa-info-circle"></i> AFAIK, this only works with bi-lateral sessions. All route servers drop announcements where the first AS in the path is not the advertising peer’s AS.</div>
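<p>As a rough illustration of why stripping works, here is a minimal Python sketch of the AS-path-length step only. It is not a full best-path implementation. The dictionaries, the names <code class="language-plaintext highlighter-rouge">transit-a</code> and <code class="language-plaintext highlighter-rouge">transit-b</code>, and the AS numbers (from the documentation range) are made up for the example.</p>

```python
def shortest_as_path(paths):
    """Return the candidate with the shortest AS path.

    Stand-in for the AS-path-length comparison only. Real best-path
    selection has steps before and after this one, which are assumed
    to be equal here.
    """
    return min(paths, key=lambda path: len(path["as_path"]))


# 64511 is the eyeball ISP; 64496 and 64497 are its two transit providers.
honest = {"via": "transit-a", "as_path": [64496, 64511]}
stripped = {"via": "transit-b", "as_path": [64511]}  # transit-b removed its own AS

winner = shortest_as_path([honest, stripped])
print(winner["via"])
```

<p>With both providers behaving, the two paths have the same length and later tie-breakers decide. Once one provider strips its AS, its path is one hop shorter and wins on length alone.</p>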
<h2 id="juniper-vs-cisco-behaviour">Juniper vs Cisco behaviour</h2>
<p>Juniper and Cisco take different approaches to this. By default, Cisco will only accept prefixes where the first AS matches the eBGP peer AS. You can disable this using the <code class="language-plaintext highlighter-rouge">no</code> form of the <a href="https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/iproute_bgp/command/irg-cr-book/bgp-a1.html#wp1026344430">bgp enforce-first-as</a> command.</p>
<p>Juniper <em>allows</em> peers to strip their AS by default. You must explicitly set <a href="https://www.juniper.net/documentation/us/en/software/junos/bgp/topics/ref/statement/enforce-first-as-edit-protocols.html">enforce-first-as</a>.</p>
<p>For a typical IXP scenario, if you have a Cisco router, you need to configure “no bgp enforce-first-as” for route server sessions. With a Juniper router you must set “enforce-first-as” for all sessions <em>except</em> the route server sessions. There is no equivalent to “no enforce-first-as” on Junos.</p>
<h2 id="effects-of-enforcing-first-as">Effects of enforcing first-AS</h2>
<p>When enforced, both Cisco and Juniper will discard any prefixes received where the first AS in the path is not the peer’s AS. They maintain the BGP session, and they will accept any other valid prefixes received.</p>
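<p>The check itself is simple. The sketch below models it in Python, assuming received routes are held as a plain dict of prefix to AS path. That structure and the function names are illustrative only, not how either vendor implements it.</p>

```python
def first_as_matches(as_path, peer_as):
    """True when the leftmost AS in the received path is the peer's own AS."""
    return bool(as_path) and as_path[0] == peer_as


def accept_routes(received, peer_as, enforce=True):
    """Return the prefixes that survive the first-AS check.

    `received` maps prefix -> AS path (list of AS numbers). Failing
    prefixes are discarded one by one; nothing here resets the session,
    and valid prefixes from the same peer are still accepted.
    """
    if not enforce:
        return dict(received)
    return {
        prefix: path
        for prefix, path in received.items()
        if first_as_matches(path, peer_as)
    }


received = {
    "192.0.2.0/24": [64497, 64500],  # peer 64497 inserted its AS
    "198.51.100.0/24": [64500],      # peer stripped its AS
}
accepted = accept_routes(received, peer_as=64497)
```

<p>Passing <code class="language-plaintext highlighter-rouge">enforce=False</code> corresponds to a route server session, where the first AS is expected not to match.</p>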
<p>Summary: If you peer at IXPs, and use Cisco gear, you’re OK. If you’re using Juniper, check that your config is enforcing first AS for all sessions except those from route servers.</p>
<h1>Juniper Version Selection</h1>
<p>Published 2023-04-06T03:00:00+00:00 · https://lkhill.com//juniper-version-selection</p>
<p>Picking the right Junos version is important. If you’re not familiar with Juniper, finding and downloading the right software package is confusing. Here’s some guidance on picking the right version.</p>
<div class="alert alert-success" role="alert"><i class="fas fa-check-square"></i> TLDR: Check the Suggested Releases page, find the latest service release in the suggested version for your platform. Almost never use the very latest version. Never use the version the box shipped with. Pay attention on the Downloads page, there are traps.</div>
<p>It’s useful to understand Junos version numbering, and the upgrade policy. Then check the <a href="https://supportportal.juniper.net/s/article/Junos-Software-Versions-Suggested-Releases-to-Consider-and-Evaluate?language=en_US">Suggested Releases</a> page to see what they recommend, check if that makes sense, and figure out how to get from here to there.</p>
<h2 id="understanding-version-numbering">Understanding Version Numbering</h2>
<p>These days Juniper publishes a new release train every quarter. Versioning is simple: “&lt;year&gt;.&lt;quarter&gt;R&lt;release number&gt;”. So 21.4R1 is the first release of the train from the 4th quarter of 2021. New releases add new features and support new hardware. Configs may break.</p>
<p>They then publish “service releases” on top of that, for example 21.4R1-S1 and 21.4R1-S2. These are supposed to only be bugfixes, but complacency breeds contempt. So sometimes they throw in breaking changes that may render your existing config non-bootable, because why the hell not? Be grateful if they’re documented, like the <a href="https://supportportal.juniper.net/s/article/21-4R2-S1-EVO-IPv6?language=en_US">payload-protocol vs next-header</a> change.</p>
<p>A few months later they publish an “R2” release that rolls up bugfixes, and may have some small changes in behavior. No more service releases to the R1 train after that. A few more service releases, then they introduce an R3 release. Again rolls up bugfixes, perhaps with small behavior changes. They might add some more service releases on R2 after R3 comes out. I wish they wouldn’t.</p>
<p>The R3 release will see service releases over the course of its lifetime, becoming further and further apart. No R4. Many engineers look for R3 releases before considering upgrades.</p>
<div class="alert alert-success" role="alert"><i class="fas fa-check-square"></i> Pre 2017 versions, e.g. 12.x and 15.x had their own thing going on with release numbering, but you can ignore that if you’re working on supported hardware.</div>
<h3 id="extended-end-of-life">Extended End of Life</h3>
<p>Juniper went a bit funny with “Extended End of Life” versions for a while. Their docs were full of references to those versions, but they were all years out of date. I couldn’t predict which versions would be considered extended. They fixed this in the last couple of years. Now it’s obvious - the “even” numbered releases such as 21.2, 21.4 are “Extended” - they have 3 years of Engineering support after first release. The odd-numbered releases like 21.1 and 21.3 have 2 years of support.</p>
<p>This is 3 years from the first R1 release. Given that the R3 release comes out around 9 months after R1, and you’ll wait for a couple of service releases on top of that, it’s often a year from release when you install it. So you’ll want to be on the extended end of life releases, to give yourself a couple of years of support.</p>
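<p>The arithmetic is easy to script. This is a minimal sketch that assumes the 3-year and 2-year windows described above and takes the R1 release date as an input. The dates in the example are hypothetical placeholders, not real Juniper milestones, so check the official lifecycle page for actual dates.</p>

```python
from datetime import date


def is_extended(train):
    """Even-numbered trains such as 21.2 and 21.4 are the extended ones."""
    quarter = int(train.split(".")[1])
    return quarter % 2 == 0


def end_of_engineering(train, r1_release):
    """Three years after the first R1 release for extended trains, else two."""
    years = 3 if is_extended(train) else 2
    try:
        return r1_release.replace(year=r1_release.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use the 28th.
        return r1_release.replace(year=r1_release.year + years, day=28)


def support_days_left(train, r1_release, install):
    """Days of engineering support remaining on the day you install."""
    return (end_of_engineering(train, r1_release) - install).days


# Hypothetical dates: R1 ships mid-December, you install a year later.
r1 = date(2021, 12, 15)
install = date(2022, 12, 15)
print(end_of_engineering("21.4", r1), support_days_left("21.4", r1, install))
```

<p>Run the same numbers against an odd-numbered train and roughly half of the remaining window disappears, which is the argument for sticking to the extended releases.</p>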
<p>The <a href="https://support.juniper.net/support/eol/software/junos/">Junos OS Dates & Milestones</a> page is your go-to for checking support lifecycles. Some releases will get even longer support, e.g. for specific EoL hardware.</p>
<h2 id="jtac-suggested-releases">JTAC Suggested Releases</h2>
<p>Juniper has a great page “<a href="https://supportportal.juniper.net/s/article/Junos-Software-Versions-Suggested-Releases-to-Consider-and-Evaluate?language=en_US">Junos Software Versions - Suggested Releases to Consider and Evaluate</a>.” That page has a section for each of the main product lines, and model-specific guidance. Find your model, and check the listed versions, and “last updated” date.</p>
<div class="alert alert-info" role="alert"><i class="fas fa-info-circle"></i> They used to call it “Recommended” versions rather than “Suggested,” toning it down a little. It’s still your first and best place to start when picking a version.</div>
<p>For example, if you’re updating an EX3400, it says “Latest 21.4R3-Sx”, last updated 14 Nov 2022. Straightforward - find the latest available service release for the 21.4R3 release train. Last updated a few months ago, so it’s solid advice.</p>
<p>Yes, these versions will still have bugs. There’s no guarantee they are perfect. But if you don’t have any specific requirements, they are always a solid choice. They are mostly conservative, but there will be no surprises from JTAC if you log a case against them. They will almost always be extended end of life releases, except for specific hardware support.</p>
<p>You should bookmark that page, and sign up for change notifications.</p>
<h2 id="ok-but-what-about-">OK, But What About &lt;?&gt;</h2>
<p>What if I don’t like the suggested release? What if I need feature &lt;X&gt;? Or what if they have multiple suggested versions, e.g. for MX80 today they suggest 20.2R3, 20.4R3, 21.2R3. Not helpful.</p>
<p>In those cases you’ll need to make your own decision. Remember that Juniper is offering suggested releases - as long as you are still running a supported release, they will still support you. They might question your choice, but if you have a solid rationale, it’s fine.</p>
<p>If you really need feature &lt;X&gt; that is only in the very latest code, that’s what you have to do. If running brand new hardware, you might need a brand-new software release.</p>
<p>My general advice in those situations is:</p>
<ul>
<li>Pick an even-numbered release if possible</li>
<li>Pick the oldest even-numbered version that has the features you need</li>
<li>Take the latest service release in that train</li>
</ul>
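<p>Those three rules can be sketched in a few lines of Python. This assumes you already have a list of candidate version strings and know the first train that has your feature. Both inputs, and the function names, are hypothetical. The parser only handles the plain <code class="language-plaintext highlighter-rouge">21.4R3-S3</code> form, not EVO suffixes or older numbering.</p>

```python
import re

_VERSION_RE = re.compile(r"^(\d{2})\.([1-4])R(\d+)(?:-S(\d+))?$")


def version_key(text):
    """Turn '21.4R3-S3' into (21, 4, 3, 3); a base R release gets service 0."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised Junos version: {text!r}")
    year, quarter, release, service = match.groups()
    return (int(year), int(quarter), int(release), int(service or 0))


def pick_release(available, minimum_train):
    """Oldest even-numbered train at or after minimum_train, latest release in it."""
    year, quarter = (int(part) for part in minimum_train.split("."))
    candidates = [(version_key(v), v) for v in available]
    candidates = [c for c in candidates if c[0][:2] >= (year, quarter)]
    if not candidates:
        raise ValueError("no available version has the required features")
    # Prefer even-numbered trains, but fall back if there are none.
    even = [c for c in candidates if c[0][1] % 2 == 0]
    pool = even or candidates
    oldest_train = min(c[0][:2] for c in pool)
    in_train = [c for c in pool if c[0][:2] == oldest_train]
    return max(in_train)[1]


available = ["21.2R3-S2", "21.3R3", "21.4R3-S1", "21.4R3-S3", "22.2R1"]
print(pick_release(available, "21.3"))
```

<p>It deliberately ignores support lifetime and anything JTAC has flagged, so treat the result as a starting point for the decision, not the decision.</p>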
<p>If there are multiple suggested versions, look at the future support lifetime, and pick something you’re comfortable with. E.g. 20.2R3 is about to go end of support. That’s a poor choice if you’re upgrading today. In my case I would pick the newest branch, unless I had wide deployment of an older branch and was sticking with it for a while.</p>
<h2 id="whats-this-about-hardened-releases">What’s this about hardened releases?</h2>
<blockquote>
<p>I saw a reference to hardened releases somewhere - what about those?</p>
</blockquote>
<p>My advice is to ignore those. As best as I can tell, Juniper decided to do “hardened” releases for specific use-cases, e.g. EVPN-VXLAN, or VCF. But they did a poor job of explaining what a hardened release is, and have not kept their documentation up to date. You can see traces of it in this <a href="https://pathfinderstatic.s3-us-west-2.amazonaws.com/public/DC-Architecture-Hardened-release-recommendation.pdf">PDF</a>, but I have not seen any clear public definitions. If you go to Pathfinder and click on <a href="https://apps.juniper.net/home/segment?segment=Cloud%20Providers&subSegment=Data%20Center">Data Center</a>, then select an architecture, e.g. <a href="https://apps.juniper.net/home/architecture?type=Data%20Center%20Architectures&name=mc-lag&segment=Cloud%20Providers&subSegment=Data%20Center">MC-LAG</a>, you’ll see a “hardened release.” At the time of writing, it says that is 20.2R3. Installing a release today that goes End of Engineering in June this year is a bad move. Don’t do that.</p>
<p>The “Suggested Releases” page has recommendations for use-cases for the QFX platforms. At 2023/03/30, for MC-LAG (QFX5K) it says “Latest 21.4R3-Sx / 22.2R3”. Those are better choices. At the time of writing, they suggest the same version for all use-cases. Probably tells us something about what happened to their plans for “hardened” releases.</p>
<h2 id="juniper-upgrade-policy">Juniper Upgrade Policy</h2>
<blockquote>
<p>If you’re trying to get to there, I wouldn’t start from here</p>
</blockquote>
<p>When I upgrade a standalone Arista switch, I copy the new software to it, tell it to use that image, and reload. I don’t care what version I’m on today. With Junos, your current version impacts how you get to your destination. Depending on your current version, you might officially need to go via interim steps to get to your target.</p>
<p>The <a href="https://www.juniper.net/documentation/us/en/software/junos/release-notes/21.3/junos-release-notes-21.3r1/topics/upgrade-downgrade/ex-upgrade-downgrade.html">official policy</a> says:</p>
<blockquote>
<p>For both EOL and EEOL releases, you can upgrade to the next three subsequent releases or downgrade to the previous three releases. For example, you can upgrade from 20.4 to the next three releases – 21.1, 21.2 and 21.3 or downgrade to the previous three releases – 20.3, 20.2 and 20.1.</p>
<p>For EEOL releases only, you have an additional option - you can upgrade directly from one EEOL release to the next two subsequent EEOL releases, even if the target release is beyond the next three releases…For example, 20.4 is an EEOL release. Hence, you can upgrade from 20.4 to the next two EEOL releases – 21.2 and 21.4</p>
</blockquote>
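<p>Read literally, that policy is a small rule set. Here is a hedged Python sketch of it, treating each train as a “year.quarter” string and assuming even-numbered trains are the EEOL ones. The function names are mine, and the planner is a simple greedy walk. It only encodes the quoted policy, not what a given platform will actually accept.</p>

```python
def train_index(train):
    """Map '20.4' to a running count of quarterly trains."""
    year, quarter = (int(part) for part in train.split("."))
    return year * 4 + (quarter - 1)


def index_to_train(index):
    year, quarter = divmod(index, 4)
    return f"{year}.{quarter + 1}"


def is_eeol(train):
    """Assumption: even-numbered trains are the EEOL releases."""
    return int(train.split(".")[1]) % 2 == 0


def direct_upgrade_allowed(src, dst):
    """Next three trains, or EEOL to one of the next two EEOL trains."""
    diff = train_index(dst) - train_index(src)
    if 1 <= diff <= 3:
        return True
    # The next EEOL train is 2 ahead (covered above); the one after is 4 ahead.
    return diff == 4 and is_eeol(src) and is_eeol(dst)


def plan_upgrade(src, dst):
    """Greedy list of in-policy hops from src to dst, both inclusive."""
    current, target = train_index(src), train_index(dst)
    if target < current:
        raise ValueError("this sketch only plans upgrades")
    path = [src]
    while current < target:
        for step in (4, 3, 2, 1):
            candidate = current + step
            if candidate <= target and direct_upgrade_allowed(
                index_to_train(current), index_to_train(candidate)
            ):
                current = candidate
                break
        path.append(index_to_train(current))
    return path


print(plan_upgrade("18.4", "21.4"))
```

<p>For 18.4 to 21.4 this walks through 19.4 and 20.4, the same three-hop path discussed below.</p>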
<p>If you do annual upgrades, it’s easy. Going 20.4 to 21.4? Straight shot. What if you have 18.4? Do you really want to go 18.4 -> 19.4 -> 20.4 -> 21.4? Tedious. What if you’ve found an old box on the shelf and want to use that? Junos upgrades and reboots are very slow, do I have to go through all that? Can’t I skip some steps?</p>
<p>If you do a USB/TFTP install, you can go straight to any version. If you’re doing this online…it is very version- and platform-dependent. I have my own experience for what steps I can do. For example, I know I can take a QFX5110 from 18.4R2 -> 21.4R3 in one step. I also know that EX3400 18.4R2 -> 20.4 is impossible if it’s too early a service release on the EX3400. I’ve hit similar issues on MX, where some steps were too large. If you have a large fleet to upgrade, try it out on some low-risk systems. Generally it fails at install time. If you have a handful of systems, then sticking to the guidance is safer.</p>
<p>Some platforms specifically call out supported large jumps, e.g. SRX 15.1 -> 19.4R3. If Juniper says it’s OK, then go for it. Otherwise test first.</p>
<h2 id="downloading-the-right-software">Downloading the right software</h2>
<p>OK, we’ve figured out what version we want, and any interim releases we need. Let’s download it. Should be simple right? Yes and no. It’s easy once you know the tricks, but hard at first.</p>
<p>Let’s say we have a QFX5100, and we want to run the latest 21.4R3-Sx release, as suggested.</p>
<p>Start with the <a href="https://support.juniper.net/support/downloads/">Downloads</a> page, and put QFX5100 into the search box:</p>
<p><a href="/assets/2023/04/qfx5100_download.jpg"><img src="/assets/2023/04/qfx5100_download.jpg" alt="QFX5100 Download" /></a></p>
<p>uh…do I want QFX 5e Series Switch, QFX 5e Series Switch with Enhanced Automation, Limited - MacSec Enabled QFX 5 Series, Limited - QFX 5 Series Switch with Enhanced Automation, QFX 5 Series Switch, QFX 5 Series Switch with Enhanced Automation?</p>
<p>Turns out you don’t want any of those. The first page of results is a trap. It only has the “R” releases. You almost never want a base “R” release, you want a Service Release, e.g. R3-S4.</p>
<p>Scroll up, go to the drop-down, select “Junos SR”, and make sure 21.4 is the selected version. Now we see the service releases, in reverse chronological order. 21.4R3-S3 is the latest at the time of writing. But there are 6 variations. Which one do I want?</p>
<p>As a general rule, you never want a “Limited” release unless you’re in specific restricted countries. If you’re in one of those places you’ll know about this, if you’ve never heard of it you can ignore it. So now the choices are 5 vs 5e, and with or without “Enhanced Automation.” If you don’t know, choose “Enhanced Automation” - it will help you later. Read <a href="https://www.juniper.net/documentation/us/en/software/junos/automation-scripting/topics/concept/junos-flex-overview.html">this</a> for more info. Last thing, 5 vs 5e? For QFX5110 and QFX5120, you must use the “5e” image, and it will be the only option you see. For QFX5100 you can install 5 or 5e images. This is a trap. Do not install 5e unless you know exactly why you’re doing it. Get the “5” image.</p>
<p>Other products like the PTX10001-36MR are much simpler. There are just one or two download variations.</p>
<p>One last thing before downloading that file. Make sure you are selecting the file from the “Install Package” section:</p>
<p><a href="/assets/2023/04/install_package.jpg"><img src="/assets/2023/04/install_package.jpg" alt="QFX5100 Install Package section" /></a></p>
<p>Do not download the file from the “Install Media” section unless you are trying to create a USB image. For a while the default page for some products opened to the Install Media section. Made me download the wrong file quite a few times.</p>
<h3 id="what-about-junos-vs-junos-evolved">What about Junos vs Junos Evolved?</h3>
<p>Go to the downloads page for PTX10003, and you’ll get this dropdown:</p>
<p><a href="/assets/2023/04/ptx1003_download.jpg"><img src="/assets/2023/04/ptx10003_download.jpg" alt="PTX10003 Download" /></a></p>
<p>Can I choose if I want Junos Artisanal or Junos Evolved? No, this is just an artifact of the way they publish OpenConfig models. Those are listed under Junos, but the software you want to download is under “Junos Evolved SR.” Do not ask me why.</p>
<p>Once you’ve finally tracked down the right software, click on the package, log in if required, accept the license, and download it. Then copy it to your switch, and install it. Exact methods vary by platform, start <a href="https://www.juniper.net/documentation/us/en/software/junos/junos-install-upgrade/topics/topic-map/install-software-on-ex.html#id-installing-software-on-an-ex-series-switch-with-a-single-routing-engine-cli-procedure">here</a> for common methods. Install it, reboot, and cross your fingers!</p>
<h1>New Juniper Rack Mount Kit</h1>
<p>Published 2023-04-03T03:00:00+00:00 · https://lkhill.com//juniper-rack-mount-kit</p>
<p>Juniper has a new enhanced four-post rack mount kit “JNP-4PST-RMK-1U-E” for their 1RU datacenter devices. It works with devices like the QFX5120 and PTX10001-36MR. It is much improved over the legacy rack mount kit. It is not as good as some competitors’ kits, but it is backwards compatible. It makes switch installation quicker and safer.</p>
<h2 id="background-current-4-post-rail-kit">Background: Current 4-post rail kit</h2>
<p>Juniper has used the same 4-post kit for their 1RU datacenter switches and routers for many years. The same kit works on QFX5100, QFX5110 and QFX5120-48Y switches. The MX204 uses a slight variation, but is almost identical. Oddly, the QFX5120-32C uses something completely different. Devices are secured to the front and rear posts. 2-post mounting is unwise for modern deep devices with heavy PSUs. You can still get away with 2-post mounting for lighter, shallower access switches. Modern servers and deep switches/routers need 4-post mounting, or some sort of shelf.</p>
<p>The current kit <a href="https://www.juniper.net/documentation/us/en/hardware/qfx5120/topics/topic-map/qfx5120-unpack-mount.html#id-flush-mounting-qfx5120-48y-qfx5120-48t-on-four-posts-of-a-rack-or-cabinet">“EX-4PST-RMK”</a> has 2 parts per side:</p>
<p><a href="/assets/2023/04/legacy_kit.jpg"><img src="/assets/2023/04/legacy_kit.jpg" alt="Legacy rail kit" /></a></p>
<p>One piece screws in to each side of the switch. Note that there are 8 holes per side, but Juniper supplies a total of 12 very small screws. As you can imagine, installing 12 very small screws per switch is no fun when you have a stack of 50. The other piece of the rail kit mounts from the rear, to connect the switch to the rear posts.</p>
<p>The switch installs from the front, and screws into…oh. Wait. Yes, you will need to install 8 cage nuts first (not supplied). Make your blood sacrifice to the networking gods.</p>
<p><a href="/assets/2023/04/legacy_kit_ears.jpg"><img src="/assets/2023/04/legacy_kit_ears.jpg" alt="Legacy kit front ears" /></a></p>
<p>Having installed the cage nuts, now you can supply your own screws, and screw in the front…no…wait. What’s going to hold this heavy switch up while the screws are going in? The documentation says you need two people for this step. But we all know that installation doesn’t work that way.</p>
<p><a href="/assets/2023/04/one_hand_balance.jpg"><img src="/assets/2023/04/one_hand_balance.jpg" alt="One handed balancing act" /></a></p>
<p>Can we pre-mount the rear rails, then slide the switch in, so it’s supported from the rear while we screw in the front? No chance. The rear rails are flimsy pieces of metal that wilt when you take them out of the box. Any slight bend and they bind up when sliding into the front rails. There’s no safe way to lift a switch in by yourself, slide it onto the rear rails, then screw in the front.</p>
<p>Your choices are: find a helper, install a temporary server below the switch, or use a <a href="https://patchbox.com/setup-exe-installation-tool/">Patchbox setup.exe</a>:</p>
<p><a href="/assets/2023/04/patchbox_setupexe.jpg"><img src="/assets/2023/04/patchbox_setupexe.jpg" alt="Patchbox Setup.exe" /></a></p>
<p>This acts as another pair of hands. Now we can screw in the front, then install the rails at the rear. Note how easily they flex. Force them in, screw in to the rack, go back and tighten the front screws, and remove the setup.exe.</p>
<p><a href="/assets/2023/04/legacy_kit_bend.jpg"><img src="/assets/2023/04/legacy_kit_bend.jpg" alt="Flexible rails" /></a></p>
<p>This kit works, but it is dangerous for one person, and it wastes time and money. Juniper uses the same kit for the new PTX10001-36MR, a very dense 1RU router with 3kW PSUs. Even better, for the PTX10001-36MR, 4 of the little screws are different to the others. No guidance on which ones to use where. Those are very small screws and very flimsy rails for a system that weighs 18kg.</p>
<h2 id="the-industry-and-the-competition">The Industry and the Competition</h2>
<p>If you’ve spent your career installing network gear, you might assume “that’s the way it is.” Or it’s a challenge, see who can do the difficult task as quickly as possible. Or you take any suggested improvement as some sort of challenge to your engineering chops. Check <a href="https://mailman.nanog.org/pipermail/nanog/2021-September/215416.html">this NANOG thread</a>. Yes, I know, be very careful reading NANOG. A few quotes:</p>
<blockquote>
<p>A 30-minute time to install a regular 1U ToR switch seems a bit excessive. Maybe the very first time a tech installs any specific model switch with a unique rail configuration. After that one, it should be around 10 minutes for most situations.</p>
</blockquote>
<p>I’m an idiot for being so slow:</p>
<blockquote>
<p>30 minutes to pull a switch from the box stick ears on it and mount it in the rack seems like a realllllly long time. I think at tops that portion it that’s a 5-10 minute job if I unbox it at my desk</p>
</blockquote>
<p>Can someone explain to me how to install 8 x cage nuts, 12 x small screws, mount the device, and install the 8 x cage nut screws in 10 minutes? I’m sorry, I’m calling bullshit. Someone else claims to be even faster:</p>
<blockquote>
<p>it usually takes about 3 minutes</p>
</blockquote>
<p>And somehow, the tool-less kits are slower and more difficult? Why has no-one told the server engineers this?</p>
<blockquote>
<p>Those speed rails as well are a bit of a challenge to install</p>
</blockquote>
<p>I’m sorry what? No they are not. Not even close to a challenge to install compared to typical network vendor rails. I can install and remove Dell server rails <em>from the front of the rack</em> without even needing to walk around the back. They are explicitly designed to be quick and easy to work with, not “as cheap as possible.”</p>
<p>Other comments were along the lines of “considering overall lifetime TCO, it’s not a deal-breaker.” That I can understand. It’s not going to be the sole factor deciding on vendor <em>X</em> over <em>Y</em>. Some folks said they only rack a few switches once a year. Yes, for you, it doesn’t matter. For those of us doing this more than a few times, it does matter. “My switches run 15 years with no hardware replacements.” Some of us have let go of CatOS. There was no network hardware on the market 15 years ago that meets my needs today.</p>
<blockquote>
<p>I can install an entire 384lb 21U core router in 30 minutes.</p>
</blockquote>
<p>Yeah, well, good for you pal. My father walked uphill both ways to school, and on the day it snowed he still had to go to school. But here’s the thing: it doesn’t need to be that way. Just because <em>you</em> had to live with something doesn’t mean the rest of us have to. It doesn’t have to involve installing tiny fiddly screws and delicate balancing acts. There’s no reason other than stubbornness to resist improvement.</p>
<p>Server vendors solved this problem years ago. Dell, HPE, Lenovo have all had 4-post rail kits that work well for years. Yes, they’ve gone through iterations, and yes, they all have little tricks you need to learn to operate them. Yes I have caused a ball bearing explosion in DC11. But they are solid, and much quicker and safer than what network vendors do.</p>
<p>The key difference is that the rails are first installed into the rack, front and rear. They have clips and square lugs to fit into the posts, no cage nuts or screws needed. The server then slides into the rails. Quick, no tools needed.</p>
<p>Not all networking vendors have ignored progress. Arista rail kits work the same as server rail kits. Clip one piece to the side of the switch, install the rails into the rack, clip in, then slide the switch in. They also have adapters for 2-post rail kits. So it can be done. Dell does something similar.</p>
<h2 id="junipers-new-rack-mount-kit">Juniper’s New Rack Mount Kit</h2>
<p>I have been telling Juniper this is a problem for years. Other customers too. Juniper has listened, and developed a new rack mount kit “JNP-4PST-RMK-1U-E.” No public documentation yet, but it is on the pricelist and orderable.</p>
<p><a href="/assets/2023/04/new_kit.jpg"><img src="/assets/2023/04/new_kit.jpg" alt="New kit" /></a></p>
<p>The first thing you’ll notice is there are 3 metal pieces per side. One piece screws onto the switch. Yes, you do still need to use some tools. The mounting points are the same as the old rails, so you can use this on all the same hardware that the old kit works on. The small bag there contains 16 screws! Install them all, or keep some for spares, leave a couple empty, for old time’s sake.</p>
<p>The other front and rear parts attach to the rack with no cage nuts required. The square lugs fit through the hole, and the retainer clip holds it in. The retainer clip design is a little suspect to me, but it is good enough.</p>
<p><a href="/assets/2023/04/new_kit_attach.jpg"><img src="/assets/2023/04/new_kit_attach.jpg" alt="New kit mounting" /></a></p>
<p><a href="/assets/2023/04/new_front_clip.jpg"><img src="/assets/2023/04/new_front_clip.jpg" alt="Front retainer clips" /></a></p>
<p><a href="/assets/2023/04/new_front_closed.jpg"><img src="/assets/2023/04/new_front_closed.jpg" alt="Front clip closed" /></a></p>
<p><a href="/assets/2023/04/new_rear_1.jpg"><img src="/assets/2023/04/new_rear_1.jpg" alt="Rear View" /></a></p>
<p><a href="/assets/2023/04/new_rear_2.jpg"><img src="/assets/2023/04/new_rear_2.jpg" alt="Rear View" /></a></p>
<p><a href="/assets/2023/04/new_rear_3.jpg"><img src="/assets/2023/04/new_rear_3.jpg" alt="Rear View" /></a></p>
<p>Then slide the switch in, and tighten the thumbscrews. Done! No apprentice needed to hold the heavy switch while you faff about dropping cage nut screws.</p>
<p><a href="/assets/2023/04/new_insert.jpg"><img src="/assets/2023/04/new_insert.jpg" alt="Sliding in new switch" /></a></p>
<p><a href="/assets/2023/04/complete.jpg"><img src="/assets/2023/04/complete.jpg" alt="Complete" /></a></p>
<p>Note there are no catches stopping the switch coming out once you undo the front thumbscrews. Most server rails let you slide the server almost all the way out, then you need to undo a catch to remove them all the way. It’s OK to not have one here, where you are not going to be opening the top of the switch while it’s in the rack.</p>
<h2 id="verdict">Verdict</h2>
<p>This new system is a real improvement over the old design.</p>
<p>Pros:</p>
<ul>
<li>Safe for solo installs</li>
<li>No cage nuts needed</li>
<li>Saves time and money</li>
<li>Should handle higher weight devices better</li>
</ul>
<p>Cons:</p>
<ul>
<li>Still need to put the damn little screws in</li>
<li>Watch your fingers, the edges are a little sharp</li>
<li>Need to remember to add to order</li>
</ul>
<p>I’m glad Juniper has listened to feedback. They were falling behind competitors. This will save me time and money. I plan to order these, and hope they are the default option soon. I’d like to see more improvements with new hardware, offering true tool-less install.</p>
<p>And for the curmudgeons who say I’m wrong, the old ways are the best, I am a poor excuse for an engineer, they will stick to the old ways? Good luck to you. You do that. I’ll move with the times. While you’re faffing around in the DC, I’ll go do something more useful.</p>lindsayJuniper has a new enhanced four-post rack mount kit “JNP-4PST-RMK-1U-E” for their 1RU datacenter devices. It works with devices like the QFX5120 and PTX10001-36MR. It is much improved over the legacy rack mount kit. It is not as good as some competitors, but it is backwards compatible. It makes switch installation quicker and safer.EX3400 Disk Space and Upgrades2021-12-21T04:00:00+00:002021-12-21T04:00:00+00:00https://lkhill.com//ex3400-disk-space<p>The Juniper EX3400 switch series is a decent access switch. But a Product Manager chose to save $0.50 on COGS by choosing a 2GB disk. That’s just not enough space to handle normal Junos upgrades. This has wasted untold engineer hours on busywork. I hope that person (A) got a bonus, and (B) is never allowed to under-spec hardware again.</p>
<p>Here’s some tips I’ve learnt for manual and automated upgrades for EX3400s.</p>
<h2 id="manual-upgrades">Manual Upgrades</h2>
<p>Search for “Juniper EX3400 disk space” and you’ll find plenty of people complaining about this, and some suggestions. <a href="https://kb.juniper.net/InfoCenter/index?page=content&id=KB31198Juniper">Juniper KB31198</a> looks like a good place to start. But it starts with <code class="language-plaintext highlighter-rouge">request system storage cleanup</code> and <code class="language-plaintext highlighter-rouge">request system snapshot delete snap*</code>.</p>
<p>Those <em>might</em> work if you’re upgrading from 15.1X -> 18.2. Maybe if you’re lucky it will be enough for upgrades within the 18.4 train. But it almost certainly won’t work if you’re going from 18.4.x -> 20.2.x.</p>
<p>There have been PRs that are supposed to fix this, and they might help around the edges, but they don’t help a lot.</p>
<p>With certain version combinations, you could get away with copying the new version to <code class="language-plaintext highlighter-rouge">/mfs</code>, and installing from there. It was dependent on your source & target image.</p>
<p>The only method I have found that works is this:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
</pre></td><td class="rouge-code"><pre>lindsayh@ex3400> start shell user root
Password:
pkg setop <span class="nb">rm </span>previous
pkg delete old
<span class="nb">exit</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>This will completely remove the oldest installed image, freeing up plenty of space. These commands have not caused me any problems on the hundreds of systems I have run them on. It does mean that you can’t roll back from the current version to the previous version, but this is not a problem. You’re about to install a new version.</p>
<p>It’s not quite as terrifying as my early days of upgrading Cisco 3500s that only had enough space for one single image. If the upgrade failed, things went very badly.</p>
<h2 id="ztpautomated-upgrades">ZTP/Automated Upgrades</h2>
<p>The previous method is OK if you only have a handful of systems. It’s not practical if you have a large number of systems, or if you are using ZTP to set up new switches.</p>
<p>Juniper switches that have been zeroized will use DHCP to retrieve their config file, and the image to install. So you could have some config like this on your DHCP server:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
</pre></td><td class="rouge-code"><pre> <span class="k">if</span> <span class="o">(</span> substring <span class="o">(</span>option vendor-class-identifier, 0,14<span class="o">)</span> <span class="o">=</span> <span class="s2">"Juniper-ex3400"</span><span class="o">)</span> <span class="o">{</span>
option ztp-ops.config-file-name <span class="s2">"ztp/conf/ex3400.txt"</span><span class="p">;</span>
option ztp-ops.image-file-name <span class="s2">"ztp/images/ex3400.tgz"</span><span class="p">;</span>
<span class="o">}</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>With 15.1X systems, this would work. But new EX3400s ship with at least 18.2 code, and you want to run 18.4 or 20.2. Even on a brand new, out of the box system with no logs or other config, there is not enough space to run a regular upgrade. (Does that speak more to Juniper QA, or just how criminal the under-provisioning of hardware was?)</p>
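<p>As an aside on the class match in that DHCP snippet: <code class="language-plaintext highlighter-rouge">substring (option vendor-class-identifier, 0,14)</code> is a plain prefix comparison on the first 14 characters. Here’s a rough Python sketch of the same test. The function name and the example identifiers are made up for illustration; check what your own switches actually send.</p>

```python
PREFIX = "Juniper-ex3400"  # 14 characters, same string as in the dhcpd config


def matches_ex3400(vendor_class_identifier):
    """Mimic: substring(option vendor-class-identifier, 0, 14) = "Juniper-ex3400"."""
    # Offset 0, length 14: only the start of the identifier is compared,
    # so anything after the model prefix is ignored.
    return vendor_class_identifier[0:14] == PREFIX


if __name__ == "__main__":
    # Hypothetical identifiers, not captured from real hardware.
    print(matches_ex3400("Juniper-ex3400-48p-EXAMPLE"))
    print(matches_ex3400("Juniper-ex4300-48t-EXAMPLE"))
```

<p>Because only the prefix is compared, one stanza covers every EX3400 variant whose identifier starts with that string.</p>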
<p>You need a different approach for ZTP. I use a modified version of the <a href="https://github.com/kquilliam/juniper-ztp">juniper-ztp scripts here</a>. It is a little more complicated, with a few moving parts, but here’s how it works:</p>
<p>1/ The new (or newly zeroized) switch boots up. DHCP gives it an IP address, and a <a href="https://github.com/kquilliam/juniper-ztp/blob/master/dhcpd.conf#L30">configuration file</a>.
2/ The configuration file contains some basic config info, and <a href="https://github.com/kquilliam/juniper-ztp/blob/master/configs/access.conf#L33">this section</a>:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
</pre></td><td class="rouge-code"><pre>event-options <span class="o">{</span>
generate-event <span class="o">{</span>
staging time-interval 300<span class="p">;</span>
<span class="o">}</span>
policy staging <span class="o">{</span>
events staging<span class="p">;</span>
<span class="k">then</span> <span class="o">{</span>
execute-commands <span class="o">{</span>
commands <span class="o">{</span>
<span class="s2">"op url ftp://10.0.0.1/slax/ztp.slax"</span><span class="p">;</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>3/ That says ‘every 5 minutes, generate an event called “staging”’. Whenever you see that event, run the script at that URL.
4/ It is a <a href="https://github.com/kquilliam/juniper-ztp/blob/master/slax/test.slax">SLAX script</a>, which is a bit of a shit to work with. The script defines a target version. It then checks “am I already running that version?” If so, it removes the event policy and quits, and it will never run again.
If it is <strong>not</strong> running the right version, it will free up disk space using the earlier <code class="language-plaintext highlighter-rouge">pkg</code> commands, pull down the new version, install it and reboot.</p>
<p>Once the switch reboots, it re-runs the script every 5 minutes. Assuming the upgrade worked, it should now clean itself up.</p>
<p>Downloading & installing the image takes more than 5 minutes, so the script also sets a flag when it runs, and checks for the presence of that flag, aborting if it is already running.</p>
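<p>The decision logic is simple enough to sketch in a few lines. This is a Python illustration only, not the SLAX script itself. The function name, the action strings and the version numbers are all hypothetical, and it assumes the “already running” flag does not survive a reboot.</p>

```python
def ztp_actions(running_version, target_version, lock_present):
    """Return the ordered actions one scheduled run should take.

    Hypothetical sketch of the logic described above; on the switch the
    real work is done by a SLAX op script triggered by the event policy.
    """
    if lock_present:
        # An earlier run is still downloading or installing: do nothing.
        return ["abort"]
    if running_version == target_version:
        # Already upgraded: remove the event policy so this never runs again.
        return ["remove-event-policy"]
    return [
        "set-lock",
        "free-disk-space",  # the pkg commands from the manual method
        "download-image",
        "install-image",
        "reboot",
    ]


if __name__ == "__main__":
    print(ztp_actions("18.2R3", "20.2R3", lock_present=False))
    print(ztp_actions("20.2R3", "20.2R3", lock_present=False))
    print(ztp_actions("18.2R3", "20.2R3", lock_present=True))
```

<p>The lock check comes first so that a run triggered mid-download never starts a second cleanup or install.</p>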
<p>I use a variation of this script for updating existing systems. All I need to do is push out the <code class="language-plaintext highlighter-rouge">event-options</code> config, and the switches will free up space, pull down the new image, and reboot.</p>
<p>It sucks that we have to do this because someone saved $0.50 on a $2,000 switch. This is what happens when people don’t think through the total lifecycle of a device. But using these commands and/or this script will make that hoop-jumping a bit easier.</p>
<p>It does make me wonder though: What’s Juniper Mist doing for EX3400 software upgrades? There’s no magic to what Mist does, underneath it all it probably runs some very similar commands.</p>lindsayThe Juniper EX3400 switch series is a decent access switch. But a Product Manager chose to save $0.50 on COGS by choosing a 2GB disk. That’s just not enough space to handle normal Junos upgrades. This has wasted untold engineer hours on busywork. I hope that person (A) got a bonus, and (B) is never allowed to under-spec hardware again.Juniper ARP Policer on PTX2021-08-17T06:00:00+00:002021-08-17T06:00:00+00:00https://lkhill.com//juniper-arp-policer-ptx<p>I’ve written before about the <a href="/juniper-arp-policer/">default ARP policer on Juniper MX</a>. It can create some odd failure conditions when you’re connected to noisy networks such as large Internet Exchanges. <a href="https://www.juniper.net/documentation/us/en/software/junos/evo-overview/topics/concept/evo-overview.html">Junos OS Evolved</a>, as used on platforms like the <a href="https://www.juniper.net/us/en/products/routers/ptx-series/ptx10003-packet-transport-router.html">PTX10003</a> has low default values for ARP and ICMPv6 ND DDoS protections. It will cause the same problems, but is easier to diagnose and mitigate.</p>
<h2 id="juniper-ddos-protection">Juniper DDoS Protection</h2>
<p>Platforms like MX, QFX, PTX have <a href="https://www.juniper.net/documentation/us/en/software/junos/security-services/topics/concept/subscriber-management-ddos-protection.html">Control Plane DDoS protections</a> built in. These will automatically rate-limit various traffic types that hit the CPU. This is generally a Good Thing. Certain packet types get punted from the ASIC to the CPU, but the CPU can’t handle anywhere near the traffic levels that the forwarding ASIC can. Send enough special packets to a router, choke the CPU, and you might be able to knock things offline. So having default policies to rate-limit traffic makes sense.</p>
<h2 id="platform-defaults">Platform Defaults</h2>
<p>Juniper might have “One Junos” but we know it’s not that simple. Behavior varies between platforms. Check these default values for some DDoS protections for different platforms:</p>
<table>
<thead>
<tr>
<th>Protocol</th>
<th>MX</th>
<th>QFX</th>
<th>PTX</th>
</tr>
</thead>
<tbody>
<tr>
<td>ARP</td>
<td>20,000</td>
<td>500</td>
<td>500</td>
</tr>
<tr>
<td>NDPv6</td>
<td>20,000</td>
<td>N/A</td>
<td>500</td>
</tr>
<tr>
<td>ICMP</td>
<td>20,000</td>
<td>N/A</td>
<td>500</td>
</tr>
<tr>
<td>BGP</td>
<td>20,000</td>
<td>3,000</td>
<td>5,000</td>
</tr>
</tbody>
</table>
<p>Note how the PTX values are much closer to the QFX values than the MX.</p>
<h2 id="diagnosis-and-mitigation">Diagnosis and Mitigation</h2>
<p>Those PTX ARP & NDPv6 values will cause you problems on a busy IX. This behavior shows up as flapping BGP sessions, especially IPv6 BGP sessions. It can be confusing at first, as you appear to have working connectivity. Most peers are unaffected. You might not pick up on it if you’re not looking at your logs. Exact symptoms will vary, and you will see some neighbors flap more than others.</p>
<p><code class="language-plaintext highlighter-rouge">show ddos-protection violations</code> will show currently violating ddos-protections.</p>
<p>Run <code class="language-plaintext highlighter-rouge">show ddos-protection protocols ndpv6</code> to see current traffic values, if/when it last triggered, and the number of times it has triggered.</p>
<p>Check your syslog server for DDOS_PROTOCOL_VIOLATION_SET entries.</p>
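<p>If you have a pile of logs to go through, a few lines of Python will tell you which routers are tripping the policer most often. This is a minimal sketch: it assumes the traditional syslog layout where the hostname is the fourth field, and the sample lines are invented (process names and message text are placeholders, not real Junos output).</p>

```python
from collections import Counter

TAG = "DDOS_PROTOCOL_VIOLATION_SET"


def count_violations(lines):
    """Count log lines containing the violation tag, keyed by hostname.

    Assumes "Mon DD HH:MM:SS host ..." so the hostname is field four.
    """
    counts = Counter()
    for line in lines:
        if TAG not in line:
            continue
        fields = line.split()
        # Fall back to "unknown" if the line is too short to hold a hostname.
        host = fields[3] if fields[3:] else "unknown"
        counts[host] += 1
    return counts


# Invented sample lines for illustration only.
SAMPLE = [
    "Aug 17 06:00:01 ptx1 daemon[100]: DDOS_PROTOCOL_VIOLATION_SET: (text elided)",
    "Aug 17 06:00:05 ptx1 daemon[200]: some unrelated message",
    "Aug 17 06:03:10 ptx2 daemon[100]: DDOS_PROTOCOL_VIOLATION_SET: (text elided)",
    "Aug 17 06:07:42 ptx1 daemon[100]: DDOS_PROTOCOL_VIOLATION_SET: (text elided)",
]

if __name__ == "__main__":
    for host, count in count_violations(SAMPLE).most_common():
        print(host, count)
```

<p>Feed it the lines from whatever file your syslog server writes to; adjust the field index if your log format differs.</p>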
<p>Raising the limits is easy:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
</pre></td><td class="rouge-code"><pre>lindsayh@ptx> show configuration system ddos-protection
protocols {
arp {
aggregate {
bandwidth 8000;
burst 8000;
}
}
ndpv6 {
aggregate {
bandwidth 8000;
burst 8000;
}
}
}
lindsayh@ptx>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>If you have an active violation, it’s useful to run <code class="language-plaintext highlighter-rouge">clear ddos-protection protocols states</code> after making changes. Otherwise you have to wait a bit longer for the timer to expire.</p>
<h2 id="but-why-so-low">But Why so Low?</h2>
<p>The PTX platform started life as a high-bandwidth, low-featureset device. Typical use-case was an LSR, where you have P2P links, and low levels of ARP traffic. Picking 500 pps was a reasonable default.</p>
<p>But the PTX featureset has evolved, and now it’s suitable for edge peering. For 100G/400G platforms, the price is much better than MX. So people are starting to deploy it in new places in the network. Being on the leading edge means you’ll hit a few rough edges, bugs, or in this case simply defaults that don’t make sense.</p>
<p>No big deal. I expect that Juniper will change these defaults in the near future. With luck, this will be resolved before you ever hit it.</p>lindsayI’ve written before about the default ARP policer on Juniper MX. It can create some odd failure conditions when you’re connected to noisy networks such as large Internet Exchanges. Junos OS Evolved, as used on platforms like the PTX10003 has low default values for ARP and ICMPv6 ND DDoS protections. It will cause the same problems, but is easier to diagnose and mitigate.Juniper i40e NVM Firmware Upgrade2021-05-20T04:30:00+00:002021-05-20T04:30:00+00:00https://lkhill.com//juniper-i40e-upgrade<p>Juniper Routing Engines with VM Host need an <a href="https://kb.juniper.net/InfoCenter/index?page=content&id=TSB17603">i40e NVM firmware upgrade</a>. The procedure is a pain in the ass, and documentation is not great. But you can’t avoid the upgrade any more. New Junos versions need the firmware upgrade, and replacement REs will <a href="https://kb.juniper.net/InfoCenter/index?page=content&id=TSB17978">ship with it already installed</a>. Here’s some tips on doing the upgrade.</p>
<h2 id="background">Background</h2>
<p>Newer Juniper Routing Engines use a <a href="https://www.juniper.net/documentation/us/en/software/junos/junos-install-upgrade/topics/topic-map/vm-host-overview.html">Linux-based hypervisor</a>, and Junos (still BSD-based) runs as a guest VM. This is mostly transparent for day to day operations. When you do a Junos upgrade, it will upgrade the underlying hypervisor if required.</p>
<p>Upcoming Junos versions ship with a new version of Wind River Linux that needs i40e firmware version 6.01. Older versions used v4.26. You need the new i40e firmware installed first, before you can install the latest Junos versions. You can’t put this upgrade off forever. Sooner or later you’ll want to upgrade to a Junos version that only supports the new firmware. Or you’ll get a replacement RE delivered with new firmware, and you can’t downgrade it.</p>
<p>For the last couple of years, Juniper has been shipping Junos versions that will work with both old & new firmware versions. You need one of these to do the upgrade.</p>
<p>So you need to do something like this:</p>
<ol>
<li>Upgrade to a recent-ish Junos version (e.g. 18.4R3) that supports old & new firmware.</li>
<li>Install the new firmware</li>
<li>Now you can upgrade to future versions that only support the new firmware.</li>
</ol>
<h2 id="firmware-upgrade-overview">Firmware Upgrade Overview</h2>
<p>As mentioned, the upgrade process is a hassle, especially for dual-RE systems, since you need to do at least 3 reboots per RE. Juniper tells you that you need console access and remote power control. It’s nice to have console access, but you can get away without it. You definitely don’t need remote power control for a dual-RE system, since you can power cycle the CB from the other RE.</p>
<p>Here’s a bit more detailed look at the steps:</p>
<ol>
<li>Log a TAC case to get the correct jfirmware package for your Junos version. Copy that and the LLDP package (available at <a href="https://kb.juniper.net/InfoCenter/index?page=content&id=TSB17603">TSB17603</a>) to your router, and copy them to the backup RE.</li>
<li>Disable GRES.</li>
<li>Install the jfirmware package, and start the firmware upgrade.</li>
<li>Re-enable GRES and wait for sync to complete.</li>
<li>Fail over to RE1, and reboot RE0. You’ll need to go through 3 reboot cycles. Use power controls to speed it up, or wait a bit longer.</li>
<li>Disable GRES to allow you to install the jfirmware package on RE1. After install, re-enable GRES</li>
<li>Fail over to make RE0 primary. Reboot RE1, go through 3 reboot cycles.</li>
<li>Install the LLDP package on each RE, and reboot them again.</li>
<li>Done - now you’re set up for future Junos upgrades</li>
</ol>
<p>Tedious, right? There are some ways to make it slightly less painful, and incorporate a Junos upgrade along the way, with just one traffic interruption due to FPC reloads. But it’s a stupid process, and I hope I never have to go through this cycle again.</p>
<h2 id="jfirmware-install-and-reboot">jfirmware install and reboot</h2>
<p>Install the jfirmware package:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
</pre></td><td class="rouge-code"><pre>root@netboot> request vmhost software add /var/tmp/jfirmware-vmhost-x86-64-...
Verified jfirmware-vmhost-x86-64-18.4R3-S4.2 signed by PackageProductionEc_2018 method ECDSA256+SHA256
<span class="o">[</span> platform <span class="o">=</span> <span class="p">;</span> re_name <span class="o">=</span> RE-S-2X00x6 <span class="o">]</span>
Pushing /packages/db/jfirmware-vmhost-x86-64-18.4R3-S4.2/contents/vmhost-firmware-x86-64-18.4R3-S4.2.tgz to host ...done.
Extracting /packages/db/jfirmware-vmhost-x86-64-18.4R3-S4.2/contents/vmhost-firmware-x86-64-18.4R3-S4.2.tgz ...done.
Preparing... <span class="c">##################################################</span>
supported <span class="k">for </span>kernel version: 3.10.100-ovp-rt110-WR6.0.0.38_preempt-rt
i40e_pkg <span class="c">##################################################</span>
Installation of /tmp/i40e_pkg-2.0-0.x86_64.rpm ... <span class="k">done
</span>Installing i40e pkg on host ... <span class="k">done</span><span class="nb">.</span>
Preparing... <span class="c">##################################################</span>
supported <span class="k">for </span>kernel version: 3.10.100-ovp-rt110-WR6.0.0.38_preempt-rt
bios_pkg <span class="c">##################################################</span>
Installing /tmp/bios_pkg-1.0-0.x86_64.rpm ... <span class="k">done</span><span class="nb">.</span>
Installing bios pkg on host ... <span class="k">done</span><span class="nb">.</span>
root@netboot>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>This just makes it available for install - you still need to kick off the install. Note that <code class="language-plaintext highlighter-rouge">request system firmware upgrade</code> is a hidden command, and you need to type it out. You can tab-complete the last part:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
</pre></td><td class="rouge-code"><pre>root@netboot> request system firmware upgrade re i40nvm
Part Type Tag Current Available Status
version version
Routing Engine 0 RE i40e-NVM 7 4.26 6.01 OK
Perform indicated firmware upgrade ? <span class="o">[</span><span class="nb">yes</span>,no] <span class="o">(</span>no<span class="o">)</span> <span class="nb">yes
</span>Firmware upgrade initiated, use <span class="s2">"show system firmware"</span> after vmhost reboot to verify the firmware version
root@netboot>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Now flip the master routing engine over, and reboot RE0 from RE1 with <code class="language-plaintext highlighter-rouge">request vmhost reboot routing-engine other</code>.</p>
<p>If you watch the console on RE0, you’ll see this:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
</pre></td><td class="rouge-code"><pre>Initializing platform services: 2.4.3
NVM_version: 4.26
DRV_version: 2.4.3
Need Manual procedure to upgrade i40e firmware revision from 4.26 to 6.01
upgrading i40e firmware .....
Intel<span class="o">(</span>R<span class="o">)</span> Ethernet NVM Update Tool
NVMUpdate version 1.30.2.11
Copyright <span class="o">(</span>C<span class="o">)</span> 2013 - 2017 Intel Corporation.
Unsupported device found - DeviceId: 153A
Config file read.
Inventory
<span class="o">[</span>00:005:00:00]: Intel<span class="o">(</span>R<span class="o">)</span> Ethernet Controller XL710 <span class="k">for </span>40GbE backplane
Flash inventory started
Shadow RAM inventory started
Warning: VPD is not valid
Alternate MAC address is not <span class="nb">set
</span>Shadow RAM inventory finished
Flash inventory finished
OROM inventory started
OROM inventory finished
<span class="o">[</span>00:005:00:01]: Intel<span class="o">(</span>R<span class="o">)</span> Ethernet Controller XL710 <span class="k">for </span>40GbE backplane
Device already inventoried.
Update
<span class="o">[</span>00:005:00:00]: Intel<span class="o">(</span>R<span class="o">)</span> Ethernet Controller XL710 <span class="k">for </span>40GbE backplane
Flash update started
|<span class="o">======================[</span>100%]<span class="o">======================</span>|
NVM image verification started
Shadow RAM image verification started
|<span class="o">======================[</span>100%]<span class="o">======================</span>|
Shadow RAM image verification finished
Flash image verification started
|<span class="o">======================[</span>100%]<span class="o">======================</span>|
Flash image verification finished
NVM image verification finished
Flash update successful
Config file doesn<span class="s1">'t have any OROM components specified for device '</span>XL710<span class="s1">'. Tool will use current device'</span>s combo <span class="nb">set </span><span class="k">for </span>the OROM update.
Config file doesn<span class="s1">'t have any OROM components specified for device '</span>XL710<span class="s1">'. Tool will use current device'</span>s combo <span class="nb">set </span><span class="k">for </span>the OROM update.
Post update inventory
<span class="o">[</span>00:005:00:00]: Intel<span class="o">(</span>R<span class="o">)</span> Ethernet Controller XL710 <span class="k">for </span>40GbE backplane
EEPROM inventory started
Alternate MAC address is not <span class="nb">set
</span>EEPROM inventory finished
OROM inventory started
OROM inventory finished
<span class="o">[</span>00:005:00:01]: Intel<span class="o">(</span>R<span class="o">)</span> Ethernet Controller XL710 <span class="k">for </span>40GbE backplane
Device already inventoried.
Please Power Cycle your system now and run the NVM update utility again to <span class="nb">complete </span>the update. Failure to <span class="k">do </span>so will result <span class="k">in </span>an incomplete NVM update.
Upgrade <span class="nb">complete </span>please power reboot
You may notify to power reboot again after reboot <span class="k">if </span>required
</pre></td></tr></tbody></table></code></pre></div></div>
<p>At this point you can power-cycle the CB, which will power cycle the RE. The first time you do this, you might see that it takes a while before <code class="language-plaintext highlighter-rouge">show chassis environment cb 0</code> shows it as Offline. Once it is Offline, bring it back online again.</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
</pre></td><td class="rouge-code"><pre>root@netboot> request chassis cb offline slot 0
Offline initiated, use <span class="s2">"show chassis environment cb"</span> to verify
root@netboot> show chassis environment cb 0
CB 0 status:
State Offline
Power 1 Disabled
Power 2 Disabled
root@netboot> request chassis cb online slot 0
Online initiated, use <span class="s2">"show chassis environment cb"</span> to verify
root@netboot>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>If you’re watching the console, you’ll see very similar messages to last time.</p>
<div class="alert alert-success" role="alert"><i class="fas fa-check-square"></i> What if you don’t have a console server? How do you know when to power cycle the RE? If you leave it long enough, the routing engine will boot on its own. When you see it back up, power cycle it again.</div>
<p>Power cycle the CB again. Eventually you’ll see this:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
</pre></td><td class="rouge-code"><pre>i40e firmware revision is the latest to 6.01
Fortville NVM Firmware Version: 6.01
Host reboot is required to load compatible i40e driver
</pre></td></tr></tbody></table></code></pre></div></div>
<p>You don’t need to do anything at this point. Eventually it will carry on to load Junos.</p>
<p>You can check that the new firmware shows up:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
</pre></td><td class="rouge-code"><pre>root@netboot> show system firmware
Part Type Tag Current Available Status
version version
Routing Engine 0 RE BIOS 0 0.53.1 0.55.2 OK
Routing Engine 0 RE FPGA 1 41.0.0 41.0 OK
Routing Engine 0 RE SSD1 5 12028 12050 OK
Routing Engine 0 RE SSD2 5 12028 12050 OK
Routing Engine 0 RE i40e-NVM 7 6.1 6.01 OK
Routing Engine 1 0 0.0.0 0 OK
root@netboot>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Yes. Yes they did mix up “6.1” and “6.01”. Don’t worry about it. You’re running the right version.</p>
<p>Of course you’re not done yet. Now you need to load the LLDP fix, and go through the whole process again with the other RE.</p>
<h2 id="lldp-package-install">LLDP package install</h2>
<p>On each RE, run this:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre>root@netboot> request system software add /var/tmp/lldp-patch-for-i40e-upgrade.tgz
Verified lldp-patch-for-i40e-upgrade signed by PackageDevelopmentEc_2018 method ECDSA256+SHA256
<span class="o">[</span> re_name <span class="o">=</span> RE-S-2X00x6 <span class="o">]</span>
Pushing script<span class="o">(</span>s<span class="o">)</span> to host ...
Install the script<span class="o">(</span>s<span class="o">)</span> under host-os....
Script<span class="o">(</span>s<span class="o">)</span> copy <span class="k">done</span><span class="nb">.</span>
root@netboot> show version | match lldp
lldp-patch-for-i40e-upgrade
root@netboot>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Note that it’s <code class="language-plaintext highlighter-rouge">request system software add ...</code>, not <code class="language-plaintext highlighter-rouge">request vmhost software add ...</code></p>
<p>Then one more reboot of each RE, but just a regular reboot, with no need for any power cycling. When the system is back up, you’ll see a message like this when logging in:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre>FreeBSD/amd64 <span class="o">(</span>netboot<span class="o">)</span> <span class="o">(</span>ttyu0<span class="o">)</span>
login: root
Password:
Last login: Sun Oct 4 20:49:38 on ttyu0
<span class="nt">---</span> JUNOS 18.4R3-S4.2 Kernel 64-bit JNPR-11.0-20200618.2bc7e35_buil
At least one package installed on this device has limited support.
Run <span class="s1">'file show /etc/notices/unsupported.txt'</span> <span class="k">for </span>details.
root@netboot:~ <span class="c">#</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Don’t worry about it. Juniper TAC assures me that this is still a supported configuration.</p>
<h2 id="combining-with-junos-upgrade">Combining with Junos upgrade</h2>
<p>Can I combine this with a Junos upgrade, and how do I minimise the number of FPC restarts, so I minimise user disruption?</p>
<p>Yes, if you pay attention to what you’re doing, and you give yourself a little more time. You can do the firmware upgrades, LLDP fix and Junos upgrade in one change window, with only one disruptive FPC restart.</p>
<p>Here’s how I would do it:</p>
<ol>
<li>Disable GRES, and install the jfirmware package on RE0, but don’t reboot yet.</li>
<li>Re-enable GRES and wait for the REs to sync.</li>
<li>Fail over to RE1. This should be seamless.</li>
<li>Go through the 3 reboot cycles on RE0 to get the i40e firmware done.</li>
<li>Install the LLDP package on RE0.</li>
<li>Install the new Junos version on RE0 and reboot.</li>
<li>Install the new jfirmware package on RE1.</li>
<li>Fail over to RE0. The new Junos version will take over, and all FPCs will restart. This is disruptive.</li>
<li>Go through the reboot cycles on RE1. When i40e is done, install the LLDP package, followed by the new Junos version.</li>
<li>Re-enable GRES.</li>
</ol>
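<p>As a rough sketch, the GRES and mastership steps in that list map onto the commands below. These are standard Junos commands, not a transcript from the upgrades described here, and they assume GRES is configured under <code class="language-plaintext highlighter-rouge">chassis redundancy</code>. Check them against your own config first: if you run nonstop routing, for example, you will likely need to deactivate that too before the commit passes. Package installs and the reboot cycles are left out.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Step 1: disable GRES (configuration mode)
deactivate chassis redundancy graceful-switchover
commit synchronize

# Steps 2 and 10: re-enable GRES (configuration mode)
activate chassis redundancy graceful-switchover
commit synchronize

# Step 2: check switchover readiness (operational mode, run on the backup RE)
show system switchover

# Steps 3 and 8: move mastership to the other RE (operational mode)
request chassis routing-engine master switch
</code></pre></div></div>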
<p>What about the LLDP fix with the new Junos version? No need to re-install it. Depending on which version you upgrade to, it will either still be there as a separate package, or it will be integrated into the main codebase.</p>
<p>The only good thing about this process? It worked on every RE I upgraded.</p>
<p>Hope this helps, and hope I never go through this again.</p>lindsayJuniper Routing Engines with VM Host need an i40e NVM firmware upgrade. The procedure is a pain in the ass, and documentation is not great. But you can’t avoid the upgrade any more. New Junos versions need the firmware upgrade, and replacement REs will ship with it already installed. Here’s some tips on doing the upgrade.Juniper Direct vs Local Routes2020-05-31T00:30:00+00:002020-05-31T00:30:00+00:00https://lkhill.com//juniper-direct-local-routes<p>Juniper routers consider a directly configured IP as a “direct” route, except when you use a <code class="language-plaintext highlighter-rouge">/32</code> mask (for IPv4). Then it is a “local” route. This caused me some confusion when creating a policy to redistribute loopback IP addresses into BGP.</p>
<h2 id="route-protocol-types">Route Protocol Types</h2>
<p>A router learns routes from a variety of sources - networks configured on the box, those learned from IS-IS, rumors of prefixes from BGP or RIP, etc. You can see the full list <a href="https://www.juniper.net/documentation/en_US/junos/topics/reference/command-summary/show-route-protocol.html">here</a>.</p>
<p>When routes are learned from different sources, Junos uses “<a href="https://www.juniper.net/documentation/en_US/junos/topics/reference/general/routing-protocols-default-route-preference-values.html">Route Preference Values</a>” to decide which route source to prefer. (Cisco refers to this as <a href="https://www.cisco.com/c/en/us/support/docs/ip/border-gateway-protocol-bgp/15986-admin-distance.html">Administrative Distance</a>.) If routes are otherwise identical, the route with the lowest preference value will be installed into the FIB.</p>
<p>If you’re looking at the route table, you can narrow down displayed routes to look at a specific type, e.g. <code class="language-plaintext highlighter-rouge">show route protocol direct</code> to see locally connected networks:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre>vagrant@vqfx> show route protocol direct
inet.0: 7 destinations, 7 routes <span class="o">(</span>7 active, 0 holddown, 0 hidden<span class="o">)</span>
+ <span class="o">=</span> Active Route, - <span class="o">=</span> Last Active, <span class="k">*</span> <span class="o">=</span> Both
10.0.2.0/24 <span class="k">*</span><span class="o">[</span>Direct/0] 00:02:45
<span class="o">></span> via em0.0
10.1.2.0/24 <span class="k">*</span><span class="o">[</span>Direct/0] 00:00:59
<span class="o">></span> via xe-0/0/0.0
169.254.0.0/24 <span class="k">*</span><span class="o">[</span>Direct/0] 00:02:49
<span class="o">></span> via em1.0
inet6.0: 2 destinations, 2 routes <span class="o">(</span>2 active, 0 holddown, 0 hidden<span class="o">)</span>
+ <span class="o">=</span> Active Route, - <span class="o">=</span> Last Active, <span class="k">*</span> <span class="o">=</span> Both
fe80::205:860f:fc71:8500/128
<span class="k">*</span><span class="o">[</span>Direct/0] 00:02:49
<span class="o">></span> via lo0.0
<span class="o">{</span>master:0<span class="o">}</span>
vagrant@vqfx>
</pre></td></tr></tbody></table></code></pre></div></div>
<h2 id="route-filtering-by-protocol">Route Filtering by Protocol</h2>
<p>It’s not just about displaying routes, or selecting which route to prefer. We can also use the route type when filtering, to decide which routes we want to redistribute. Let’s say we wanted to redistribute static routes (and only static routes) into OSPF. Something like this does the trick:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre><span class="o">[</span>edit]
<span class="nb">set </span>policy-options policy-statement export-ospf term statics from protocol static
<span class="nb">set </span>policy-options policy-statement export-ospf term statics <span class="k">then </span>accept
<span class="nb">set </span>protocols ospf <span class="nb">export </span>export-ospf
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Route filters can get quite complex, matching on all sorts of things - prefix length, route origin, AS path, etc.</p>
<p>So far so good. What if we wanted to write a filter that would match on loopback addresses?</p>
<h2 id="direct-vs-local">“Direct” vs “Local”</h2>
<p>First a diversion - What’s the difference between “Direct” and “Local” routes?</p>
<p>The <a href="https://www.juniper.net/documentation/en_US/junos/topics/reference/command-summary/show-route-protocol.html">docs</a> say this:</p>
<blockquote>
<p><code class="language-plaintext highlighter-rouge">direct</code> — Directly connected route</p>
<p>…</p>
<p><code class="language-plaintext highlighter-rouge">local</code> — Local address</p>
</blockquote>
<p>OK, so a “direct” route comes from configuring an IP + subnet mask on an interface. If we run <code class="language-plaintext highlighter-rouge">set interfaces xe-0/0/1 unit 0 family inet address 100.100.100.1/24</code>, then the router creates a “direct” route for <code class="language-plaintext highlighter-rouge">100.100.100.0/24</code> via that interface. It will <strong>also</strong> create a <code class="language-plaintext highlighter-rouge">local</code> entry for <code class="language-plaintext highlighter-rouge">100.100.100.1/32</code>:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre>vagrant@vqfx> show route 100.100.100.0/24
inet.0: 9 destinations, 9 routes <span class="o">(</span>9 active, 0 holddown, 0 hidden<span class="o">)</span>
+ <span class="o">=</span> Active Route, - <span class="o">=</span> Last Active, <span class="k">*</span> <span class="o">=</span> Both
100.100.100.0/24 <span class="k">*</span><span class="o">[</span>Direct/0] 00:00:35
<span class="o">></span> via xe-0/0/1.0
100.100.100.1/32 <span class="k">*</span><span class="o">[</span>Local/0] 00:00:35
Local via xe-0/0/1.0
<span class="o">{</span>master:0<span class="o">}</span>
vagrant@vqfx>
</pre></td></tr></tbody></table></code></pre></div></div>
<h2 id="what-about-loopbacks">What About Loopbacks?</h2>
<p>What happens when we configure a <a href="https://www.juniper.net/documentation/en_US/junos/topics/concept/interface-security-loopback-understanding.html">loopback interface</a>? We almost always configure these with a <code class="language-plaintext highlighter-rouge">/32</code> (or <code class="language-plaintext highlighter-rouge">/128</code>) subnet mask. How does it show up in the routing table? Is that a “direct” or a “local” route? Should be a “local” route, right? Turns out it’s not. It’s a <strong>direct</strong> route:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre><span class="o">{</span>master:0<span class="o">}[</span>edit]
vagrant@vqfx# <span class="nb">set </span>interfaces lo0 unit 0 family inet address 198.51.100.1/32
<span class="o">{</span>master:0<span class="o">}[</span>edit]
vagrant@vqfx# commit
configuration check succeeds
commit <span class="nb">complete</span>
<span class="o">{</span>master:0<span class="o">}[</span>edit]
vagrant@vqfx# run show route 198.51.100.1/32
inet.0: 10 destinations, 10 routes <span class="o">(</span>10 active, 0 holddown, 0 hidden<span class="o">)</span>
+ <span class="o">=</span> Active Route, - <span class="o">=</span> Last Active, <span class="k">*</span> <span class="o">=</span> Both
198.51.100.1/32 <span class="k">*</span><span class="o">[</span>Direct/0] 00:00:08
<span class="o">></span> via lo0.0
<span class="o">{</span>master:0<span class="o">}[</span>edit]
vagrant@vqfx#
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Hmm. Bit odd. What if we used a different prefix length on the loopback? Now it shows up a little differently:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre><span class="o">{</span>master:0<span class="o">}[</span>edit]
vagrant@vqfx# delete interfaces lo0 unit 0 family inet address 198.51.100.1/32
<span class="o">{</span>master:0<span class="o">}[</span>edit]
vagrant@vqfx# <span class="nb">set </span>interfaces lo0 unit 0 family inet address 198.51.100.1/24
<span class="o">{</span>master:0<span class="o">}[</span>edit]
vagrant@vqfx# commit
configuration check succeeds
commit <span class="nb">complete</span>
<span class="o">{</span>master:0<span class="o">}[</span>edit]
vagrant@vqfx# run show route 198.51.100.0/24
inet.0: 11 destinations, 11 routes <span class="o">(</span>11 active, 0 holddown, 0 hidden<span class="o">)</span>
+ <span class="o">=</span> Active Route, - <span class="o">=</span> Last Active, <span class="k">*</span> <span class="o">=</span> Both
198.51.100.0/24 <span class="k">*</span><span class="o">[</span>Direct/0] 00:00:05
<span class="o">></span> via lo0.0
198.51.100.1/32 <span class="k">*</span><span class="o">[</span>Local/0] 00:00:05
Local via lo0.0
<span class="o">{</span>master:0<span class="o">}[</span>edit]
vagrant@vqfx#
</pre></td></tr></tbody></table></code></pre></div></div>
<p>It must be something to do with the <code class="language-plaintext highlighter-rouge">/32</code> mask. There’s no need to have both a “direct” and a “local” entry for the same prefix, but the choice of “direct” is surprising, to me at least.</p>
<h2 id="why-does-it-matter">Why Does it Matter?</h2>
<p>The reason I noticed this was because I was configuring a policy to redistribute loopbacks into BGP. This was for a leaf-spine network, so I wanted to have the exact same policy configured on all devices. Each system had a <code class="language-plaintext highlighter-rouge">/32</code> address taken from <code class="language-plaintext highlighter-rouge">198.51.100.0/24</code>. OK, this should be easy. Let’s use this config:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre>set policy-options policy-statement CLOS-OUT term loopbacks from protocol local
set policy-options policy-statement CLOS-OUT term loopbacks from route-filter 198.51.100.0/24 prefix-length-range /32-/32
set policy-options policy-statement CLOS-OUT term loopbacks then accept
set protocols bgp group SPINES export CLOS-OUT
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Nope. Doesn’t work. It <em>would</em> work if I had a shorter mask than <code class="language-plaintext highlighter-rouge">/32</code> on my loopbacks, but most people aren’t going to do that.</p>
<p>The network & prefix length matches, but the protocol doesn’t. You have to change it to <code class="language-plaintext highlighter-rouge">from protocol direct</code>, and then it works.</p>
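<p>For reference, the working version is the same config with only the protocol match changed:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set policy-options policy-statement CLOS-OUT term loopbacks from protocol direct
set policy-options policy-statement CLOS-OUT term loopbacks from route-filter 198.51.100.0/24 prefix-length-range /32-/32
set policy-options policy-statement CLOS-OUT term loopbacks then accept
set protocols bgp group SPINES export CLOS-OUT
</code></pre></div></div>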
<p>Funnily enough, if you use something like <code class="language-plaintext highlighter-rouge">test policy CLOS-OUT 198.51.100.1/32</code>, it will tell you that it accepts the prefix, regardless of whether you use <code class="language-plaintext highlighter-rouge">from protocol local</code> or <code class="language-plaintext highlighter-rouge">from protocol direct</code>. But in practice I found it did not export the routes unless I used <code class="language-plaintext highlighter-rouge">from protocol direct</code>. This was on a recent Junos version. Behavior could be version-specific; I have not tested other versions of Junos.</p>
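<p>Since <code class="language-plaintext highlighter-rouge">test policy</code> can be misleading here, a more direct check is to look at what the router is actually advertising to a neighbor. A minimal sketch, assuming a spine neighbor at <code class="language-plaintext highlighter-rouge">192.0.2.1</code> (a made-up address, purely for illustration):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># 192.0.2.1 is a hypothetical neighbor address; use one of your own spines
show route advertising-protocol bgp 192.0.2.1 | match 198.51.100
</code></pre></div></div>
<p>If the loopback <code class="language-plaintext highlighter-rouge">/32</code> shows up in that output, the policy is exporting it.</p>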
<h2 id="no-big-deal-just-another-gotcha">No Big Deal. Just Another Gotcha</h2>
<p>Ultimately it’s no big deal. Just one of those random little things that might confuse someone. If you find this through Google, hope it helps :)</p>lindsayJuniper routers consider a directly configured IP as a “direct” route, except when you use a /32 mask (for IPv4). Then it is a “local” route. This caused me some confusion when creating a policy to redistribute loopback IP addresses into BGP.Juniper Default ARP Policer2020-05-11T00:00:00+00:002020-05-11T00:00:00+00:00https://lkhill.com//juniper-arp-policer<p>Juniper devices have a default ARP policer that drops ARP requests and responses over 150kbps. By default, this is an aggregate policer that applies to <strong>all</strong> interfaces. This can lead to unexpected behavior when high levels of ARP on one interface lead to BGP session drops on another interface. You can’t change the default policer limits, but you can create a new policer, with higher limits.</p>
<p><strong>UPDATE</strong>: There is a similar issue with <a href="/juniper-arp-policer-ptx/">PTX</a>. A bit easier to diagnose & resolve.</p>
<h2 id="problem-ipv4-bgp-session-flaps-on-pni">Problem: IPv4 BGP Session Flaps on PNI</h2>
<p>I was investigating a problem reported by one of our Transit providers. Once a day or so, our IPv4 BGP session with them would flap. The interface itself was stable, and the IPv6 session remained up. One particular site was seeing this more than others. The sites used different platforms, but were running the same code version.</p>
<p>The curious thing was the logs. We received a notification message from the peer: <code class="language-plaintext highlighter-rouge">NOTIFICATION received from 192.0.2.188 (External AS 64498): code 4 (Hold Timer Expired Error)</code>. The same syslog entry included <code class="language-plaintext highlighter-rouge">hold timer 30s, hold timer remain 0s, last sent 2s</code>. So our router thought it was sending regular KEEPALIVE messages, but the remote end thought it had missed too many.</p>
<p>Looking more closely, we saw that we had BGP session flaps with other neighbors, including iBGP sessions. Clearly it was not a problem with a specific interface, or some other vendor or configuration.</p>
<h2 id="whats-happening">What’s Happening?</h2>
<p>Our routers were connected to an IXP that has a large subnet, with many peers. This can mean large amounts of ARP traffic. By default, Juniper routers police ARP to 150kbps. Any ARP requests or responses above that rate are dropped. The interesting wrinkle is that it is not per-interface, but per-PFE.</p>
<p>When the router saw large amounts of ARP traffic on the IX-facing interface, it started policing it. The Transit links we had used the same PFE. During the window when ARP was being policed, it affected ARP on the transit interface. Our router could not get an ARP response for our upstream provider’s IP, so it could not send keepalives. The upstream detected loss of keepalives, and sent a notification to us. The session was cleared and reset. By this time we had an ARP entry, and the session quickly came back up.</p>
<p>Of course, IPv6 was unaffected.</p>
<h2 id="detecting-arp-policing">Detecting ARP Policing</h2>
<p>There’s a couple of things to look at to see if this is affecting you:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre>lhill@mx.lab.net> show policer __default_arp_policer__
Policers:
Name Bytes Packets
__default_arp_policer__ 3091706 67211
<span class="o">{</span>master<span class="o">}</span>
lhill@mx.lab.net>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>That tells you the aggregate policer has been triggered. To check which FPCs are affected, you need to drill down:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre><span class="o">{</span>master<span class="o">}</span>
lhill@mx.lab.net> start shell pfe network fpc2
NGMPC platform <span class="o">(</span>1200Mhz QorIQ P2020 processor, 3584MB memory, 512KB flash<span class="o">)</span>
NGMPC2<span class="o">(</span>mx.lab.net vty<span class="o">)</span><span class="c"># show filter index 17000 counters</span>
Filter Counters/Policers:
Index Packets Bytes Name
<span class="nt">--------</span> <span class="nt">--------------------</span> <span class="nt">--------------------</span> <span class="nt">--------</span>
17000 67186 __default_arp_policer__
NGMPC2<span class="o">(</span>mx.lab.net vty<span class="o">)</span><span class="c">#</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>I have not seen any syslogs to indicate ARP policing. It is a different framework to the DDoS Protection capabilities.</p>
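<p>The counters are cumulative, so a non-zero value only tells you the policer has fired at some point. With no syslog to go on, one way to see whether it is dropping ARP right now is to re-run the command on a timer and watch whether the numbers climb. A small sketch using the standard <code class="language-plaintext highlighter-rouge">refresh</code> pipe; the 5 second interval is an arbitrary choice:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Re-run the command every 5 seconds; Ctrl-C to stop
show policer __default_arp_policer__ | refresh 5
</code></pre></div></div>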
<h2 id="changing-arp-limits">Changing ARP Limits</h2>
<p>150kbps of ARP traffic is not a huge amount on a router with multiple interfaces in large subnets. So you’ll probably want to change the limits.</p>
<p>You can’t just change the default ARP limit though. Instead, you have two choices:</p>
<ol>
<li>Create a new ARP policer, and associate the “busy” interface with that policer. The policer will only apply to that interface, and all the other interfaces will share the default 150kbps.</li>
<li>Create a new ARP policer, and associate the other interfaces with that. The “busy” interface will be left with the default bucket.</li>
</ol>
<p>Configuration is pretty simple. First create a new policer:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre>set firewall policer arp_limit if-exceeding bandwidth-limit 1m
set firewall policer arp_limit if-exceeding burst-size-limit 1m
set firewall policer arp_limit then discard
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Then associate it with an interface:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre>set interfaces ae1 unit 0 family inet policer arp arp_limit
</pre></td></tr></tbody></table></code></pre></div></div>
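<p>To cover more than one interface, repeat that line for each interface that should get the new limits. For example, the second interface that appears in the monitoring output below would be configured like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set interfaces xe-1/0/1 unit 0 family inet policer arp arp_limit
</code></pre></div></div>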
<h2 id="monitoring-the-new-policer">Monitoring the new Policer</h2>
<p>You can look at the output of <code class="language-plaintext highlighter-rouge">show policer</code>. Note that if you have created a new policer and associated it with more than one interface, you will see an entry per interface in the <code class="language-plaintext highlighter-rouge">show policer</code> output:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre>lhill@mx.lab.net> show policer ?
Possible completions:
<<span class="o">[</span>Enter]> Execute this <span class="nb">command</span>
<policer> Policer name
__auto_policer_template_1__
__auto_policer_template_2__
__auto_policer_template_3__
__auto_policer_template_4__
__auto_policer_template_5__
__auto_policer_template_6__
__auto_policer_template_7__
__auto_policer_template_8__
__auto_policer_template__
__default_arp_policer__
arp_limit-ae1.0-inet-arp
arp_limit-xe-1/0/1.0-inet-arp
detail Show filter statistics with enhanced policer statistics
logical-system Name of logical system, or <span class="s1">'all'</span>
| Pipe through a <span class="nb">command</span>
<span class="o">{</span>master<span class="o">}</span>
lhill@mx.lab.net> show policer arp_limit-ae1.0-inet-arp
Policers:
Name Bytes Packets
arp_limit-ae1.0-inet-arp 0 0
<span class="o">{</span>master<span class="o">}</span>
lhill@mx.lab.net>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>You can see that I associated the policer with <code class="language-plaintext highlighter-rouge">ae1.0</code> and <code class="language-plaintext highlighter-rouge">xe-1/0/1.0</code>.</p>
<p>Hope this helps someone else looking at flapping BGP or BFD sessions.</p>lindsayJuniper devices have a default ARP policer that drops ARP requests and responses over 150kbps. By default, this is an aggregate policer that applies to all interfaces. This can lead to unexpected behavior when high levels of ARP on one interface lead to BGP session drops on another interface. You can’t change the default policer limits, but you can create a new policer, with higher limits.Juniper Branch SRX LACP Weirdness2020-04-25T00:00:00+00:002020-04-25T00:00:00+00:00https://lkhill.com//juniper-srx-lacp<p><a href="https://www.juniper.net/us/en/products-services/security/srx-series/srx300/">Juniper SRX 300 Series</a> firewalls may stop forwarding traffic in some situations. The firewall says it is forwarding the traffic, but it doesn’t work. Monitoring traffic looks OK, ARP entries are present, but traffic never gets to the destination, until you clear ARP. Turns out the problem comes from using LACP with fast timers and active mode. Luckily the fix is simple.</p>
<h2 id="alert-firewall-offline">Alert: Firewall Offline</h2>
<p>Here’s the situation we saw: Our NMS reported a Juniper SRX320 offline. All other devices at the site were still working, but the firewall was unreachable. Traffic from the firewall to the NMS goes via the firewall’s default gateway. Firewall A in this diagram was unreachable, but Firewall B was fine.</p>
<p><a href="/assets/2020/04/srx320_layout.jpg"><img src="/assets/2020/04/srx320_layout.jpg" alt="network_overview" /></a></p>
<p>OK, what’s happening? Why is my firewall unreachable?</p>
<h2 id="firewall-says-its-fine">Firewall says it’s fine?</h2>
<p>Try to ping Firewall A, no response. From the default gateway, we can see an ARP entry for the firewall, but no response to ping. We can log in to Firewall B, and we see an ARP entry for Firewall A. Crucially: <strong>we can ping Firewall A from Firewall B</strong>. Hmmm. That’s strange. Why can we ping it from one locally connected device but not another?</p>
<p>From Firewall B, we SSH across to Firewall A. Everything looks fine - it’s up and running, it has not restarted, security policies are as expected, and it has a valid ARP entry for its default gateway. Why can’t we ping it?</p>
<p>When we monitor traffic, the firewall says it is generating packets with the right L2 headers, and forwarding them out the right interface. We see regular ARP traffic. But we have no reachability.</p>
<p>On a whim, we try clearing ARP on the firewall. Traffic starts working again. Check the ARP cache, it has the same entry it had before. It wasn’t like it had an invalid entry before.</p>
<p>The firewall starts working properly again, and might be OK for hours, days or weeks before it fails again.</p>
<h2 id="whats-happening">What’s Happening?</h2>
<p>We investigated it quite deeply, including a very long debugging session with JTAC with a firewall that was in a known-bad state. But we couldn’t work out why it wasn’t forwarding packets properly. Everything looked correct. Changing software versions made no difference. The only common factor was that this problem was only seen on SRX300-series devices. Not on old SRX200-series, or bigger iron boxes. Every few days or weeks a firewall would go offline, and clearing ARP restored it.</p>
<h2 id="lacp-timers">LACP Timers?</h2>
<p>We use LACP on these firewalls, and we set them up as active mode, with fast timers. JTAC suggested we should change that to passive, with slow timers. They didn’t think this was the cause of the problem, it’s just something that they suggest as a general good practice. This may be officially documented somewhere, but I haven’t come across it.</p>
<p>I resisted making that change at first, as LACP didn’t seem related. LACP was working, and we weren’t seeing any issues with the LACP interface.</p>
<p>On a whim, I made the change in one place. Couldn’t tell straight away if it solved anything, because the problem only occurred every few days or weeks. But after a week that system was still looking OK. So we rolled out the change everywhere else:</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre>set interfaces ae0 aggregated-ether-options lacp periodic slow
set interfaces ae0 aggregated-ether-options lacp passive
</pre></td></tr></tbody></table></code></pre></div></div>
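<p>One thing to keep in mind with that change: LACP only negotiates if at least one end is active, so the device at the other end of the bundle has to stay in active mode. A quick way to confirm the bundle is still up after the commit is the standard LACP show command (output not shown here, as it varies by platform and version):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>show lacp interfaces ae0
</code></pre></div></div>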
<p>A week goes by with no outages. Then another week…and another. We cracked it at last. I don’t know <strong>why</strong> this works, but it does.</p>
<p>Footnote: I missed a system when rolling out the change. 3 months later that firewall went offline. None of the others had broken in that period. Proves that this was the issue. I just wish I knew why.</p>lindsayJuniper SRX 300 Series firewalls may stop forwarding traffic in some situations. The firewall says it is forwarding the traffic, but it doesn’t work. Monitoring traffic looks OK, ARP entries are present, but traffic never gets to the destination, until you clear ARP. Turns out the problem comes from using LACP with fast timers and active mode. Luckily the fix is simple.