
January 26, 2023

How Does a Network Packet Reach Its Destination?

Visiting Mozilla's website is easy, right? Just enter mozilla.org in the browser's address bar, and everything's ready.

Let's stop and think for a minute. How can that be so easy? How can your computer know which server, out of the billions of servers on the planet, belongs to mozilla.org? How can the request then navigate through the Internet and manage to reach that server? Should we come to the conclusion that it is anything but "easy"?

This post will explain, at beginner and intermediate levels, what happens for one of these network packets to arrive at its destination. Some basic networking knowledge is assumed, but the post should otherwise be accessible to all. There are a lot of technologies in play: some will be silently skipped, others skimmed quickly, but I will focus on two:

  1. Hostname Resolution with DNS;
  2. Network Routing with BGP.

The explanations and examples are from a Debian Linux computer. It's possible to follow along from within a Docker container:

$ docker run -ti --rm debian:latest bash
root@debian$ apt update && apt install -y dnsutils iproute2 traceroute

Hostname Resolution

When we type curl example.com in a terminal, a network packet cannot yet be created, because the process doesn't know what example.com is, nor how to communicate with its server. For that, an IP address is required, so the first step is to convert a hostname, example.com, into an IP address.

Linux uses the Name Service Switch, or NSS, which is configured in the file /etc/nsswitch.conf. This file contains the hosts database, which tells the computer how to try resolving hostnames:

$ cat /etc/nsswitch.conf | grep hosts
hosts:          files dns

As seen in the previous output, our hosts database is set to use two services: files and dns. These services are tried in order until one successfully resolves the hostname. If none does, the hostname is unresolvable.
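The "try each service in order" behaviour can be sketched in a few lines of Python. This is only an illustration, not how glibc implements NSS: the two lookup functions and their hard-coded tables are hypothetical stand-ins for the real files and dns services, and per-service actions that nsswitch.conf also supports are ignored.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-ins for the real "files" and "dns" services.
def lookup_files(hostname: str) -> Optional[str]:
    return {"localhost": "127.0.0.1"}.get(hostname)

def lookup_dns(hostname: str) -> Optional[str]:
    return {"example.com": "93.184.216.34"}.get(hostname)

SERVICES: Dict[str, Callable[[str], Optional[str]]] = {
    "files": lookup_files,
    "dns": lookup_dns,
}

def resolve(hostname: str, hosts_line: str = "hosts:          files dns") -> Optional[str]:
    """Try each service listed after "hosts:" until one returns an address."""
    _, _, service_names = hosts_line.partition(":")
    for name in service_names.split():
        address = SERVICES[name](hostname)
        if address is not None:
            return address
    return None  # no service could resolve the hostname

print(resolve("localhost"))    # answered by "files"
print(resolve("example.com"))  # "files" fails, "dns" answers
```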

/etc/hosts

The service named files says that the hostname should be resolved using the file /etc/hosts. That file is a relic of how hostnames worked in the early days of ARPANET. Each line contains an IP address, followed by the hostname it belongs to and any aliases it may have.

$ cat /etc/hosts
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback

If example.com had been found on a line, that would give us its IP address, but since it wasn't, the next service is used.
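As a rough sketch of what the files service does, the following Python snippet parses the two lines shown above and looks up a name. The file content is embedded as a string so the example is self-contained; the function name parse_hosts is made up for this illustration.

```python
from typing import Dict, List

HOSTS_FILE = """\
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
"""

def parse_hosts(text: str) -> Dict[str, List[str]]:
    """Map every hostname or alias to the addresses listed for it."""
    table: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name, []).append(address)
    return table

table = parse_hosts(HOSTS_FILE)
print(table.get("localhost"))    # both the IPv4 and IPv6 loopback addresses
print(table.get("example.com"))  # None: the next service has to be tried
```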

DNS

The service named dns tells the computer to use the Domain Name System, or DNS, to find a hostname's IP address.

Note: To keep a nice flow in this section, I am ignoring that DNS queries are packets that go through a network, which we haven't yet seen.

DNS is a hierarchical system separated into zones of authority. At the top of this hierarchy is the root zone ., in control of all the top-level domain (TLD) zones (e.g. com., among many others), which themselves can be split into subzones (e.g. example.com.) in control of all domain names under them.
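To make the hierarchy concrete, here is a small Python sketch that lists the chain of names from the root down to a given domain. It assumes, for simplicity, that every label could be its own zone; in reality a zone boundary only exists where a delegation has been configured.

```python
from typing import List

def zone_chain(name: str) -> List[str]:
    """Return the names from the root zone down to the given domain name."""
    trimmed = name.rstrip(".")
    if not trimmed:
        return ["."]
    labels = trimmed.split(".")
    chain = ["."]
    # Walk from the rightmost label (the TLD) towards the full name.
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain

print(zone_chain("example.com."))
```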

A DNS server is called a nameserver and can be queried for a domain name's resource records. There are many types of records, but we'll focus on two: A, which defines an IPv4 address, and NS, which points to an authoritative nameserver.

An authoritative nameserver is one that actually knows the records that are configured on a domain name. Resource records can also be retrieved from other nameservers that have cached them, but these answers are not considered "authoritative".

To retrieve example.com.'s A record, we just need to make a DNS query to a nameserver. Which DNS server should be used? That's configured in /etc/resolv.conf:

$ cat /etc/resolv.conf
nameserver 192.168.1.1

This tells the computer that it should send its DNS queries to 192.168.1.1 (port 53). That could be the router's IP address (if it can handle DNS resolution), the ISP's IP address, or any other DNS resolver, e.g. Cloudflare's DNS service 1.1.1.1.
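A DNS query is a small binary message. As a minimal sketch of what gets sent to port 53, the Python function below builds a query for an A record following the classic message layout (a 12-byte header followed by one question). The transaction ID 0x1234 is an arbitrary value chosen for the example, and real clients usually add extra options, so this is not byte-for-byte what a given tool would send.

```python
import struct

def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a bare-bones DNS query asking for the A record of `name`."""
    # Header: ID, flags (only "recursion desired" set), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length, ending with a zero byte.
    qname = b""
    for label in name.rstrip(".").split("."):
        encoded = label.encode("ascii")
        qname += bytes([len(encoded)]) + encoded
    qname += b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = Internet.
    question = qname + struct.pack("!HH", 1, 1)
    return header + question

packet = build_query("example.com.")
print(len(packet), packet.hex())
```

Sending these bytes in a UDP datagram to the configured nameserver is, in essence, what a resolver library does on the computer's behalf.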

Now that we know the DNS server, all that's left is to query it for an A record, which would hopefully result in example.com's IP address. The command dig can be used to retrieve resource records:

$ dig +short example.com. A
93.184.216.34

Behind the scenes, that command triggers something similar to the following steps. Caching is ignored here to illustrate the recursive nature of DNS, and the actual steps depend on the nameserver's implementation.

  • the computer makes a DNS query to 192.168.1.1, asking for example.com.'s A record;
  • 192.168.1.1 is not an authority for example.com, so it doesn't know its records;
  • 192.168.1.1 wants to make a new DNS query to example.com.'s nameserver, but it doesn't know what it is, so it needs to first ask com. about example.com.;
  • 192.168.1.1 doesn't know either how to reach com., so it first needs to ask . about com.;
  • 192.168.1.1 actually knows where to reach ., because recursive loops need an exit point, and all DNS servers are configured with at least one hard-coded IP address for the root zone. It makes a DNS query to ., asking for com.'s NS records, and receives them (e.g. a.gtld-servers.net.) along with their IP addresses;
  • 192.168.1.1 makes a DNS query to one of com.'s nameservers, asking for example.com.'s NS records, and receives them (e.g. a.iana-servers.net.);
  • since the previous response did not include the nameservers' IP addresses as additional information, 192.168.1.1 needs to retrieve one of them itself, asking . about net., then net. about iana-servers.net., whose response helpfully includes the desired IP addresses as well;
  • 192.168.1.1 makes a DNS query to one of example.com.'s nameservers, asking for example.com.'s A record, and receives 93.184.216.34 as a response;
  • the computer receives the response 93.184.216.34 from 192.168.1.1.

The same steps can be reproduced by hand with dig:

# Already known: . A 198.41.0.4
dig +norecurse @198.41.0.4 com. NS              # a.gtld-servers.net. A 192.5.6.30
dig +norecurse @192.5.6.30 example.com. NS      # example.com. NS a.iana-servers.net.
dig +norecurse @198.41.0.4 net. NS              # a.gtld-servers.net. A 192.5.6.30
dig +norecurse @192.5.6.30 iana-servers.net. NS # a.iana-servers.net. A 199.43.135.53
dig +norecurse @199.43.135.53 example.com. A    # example.com. A 93.184.216.34

All of this, in order to get example.com's IP address! With proper caching, most of these steps are skipped, but they show an overview of how DNS works.
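The referral chain walked through above can also be simulated without touching the network. The Python sketch below is a toy model, not a real resolver: the REFERRALS and ANSWERS tables are hand-written from the dig output in this section, and only one nameserver per zone is kept.

```python
from typing import List, Optional, Tuple

ROOT = "198.41.0.4"

# Referrals: server IP -> {delegated zone: (nameserver name, glue IP or None)}.
REFERRALS = {
    "198.41.0.4": {
        "com.": ("a.gtld-servers.net.", "192.5.6.30"),
        "net.": ("a.gtld-servers.net.", "192.5.6.30"),
    },
    "192.5.6.30": {
        "example.com.": ("a.iana-servers.net.", None),  # no glue in this response
        "iana-servers.net.": ("a.iana-servers.net.", "199.43.135.53"),
    },
}

# Authoritative answers: server IP -> {name: A record}.
ANSWERS = {"199.43.135.53": {"example.com.": "93.184.216.34"}}

def resolve(name: str, log: Optional[List[Tuple[str, str]]] = None) -> str:
    """Follow referrals from the root until an address for `name` is found."""
    server = ROOT
    while True:
        if log is not None:
            log.append((server, name))  # one simulated query
        answer = ANSWERS.get(server, {}).get(name)
        if answer is not None:
            return answer
        referral = None
        for zone, candidate in REFERRALS.get(server, {}).items():
            if name == zone or name.endswith("." + zone):
                referral = candidate
                break
        if referral is None:
            raise LookupError(f"{server} knows nothing about {name}")
        ns_name, glue = referral
        if ns_name == name and glue is not None:
            return glue  # the referral's glue already answers the question
        # Without glue, the nameserver's own address must be resolved first.
        server = glue if glue is not None else resolve(ns_name, log)

queries: List[Tuple[str, str]] = []
print(resolve("example.com.", queries))
for server, name in queries:
    print(f"asked {server} about {name}")
```

The log contains five queries, in the same server order as the five dig commands: root, com., root again, net., and finally example.com.'s own nameserver.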

Now that we have the target IP address, a network packet can finally be created, but how can we send it to that IP address? How does it actually reach its destination?

Network Routing

The first step for the network packet to example.com is to leave the computer, and find out where to go next. From the point of view of the computer, this part is just a series of networking rules.

Let's first look at the routing table. I've removed IPv6 and broadcast entries for brevity.

$ ip route show table all
default via 192.168.1.1 dev enp5s0 proto dhcp metric 100
192.168.1.0/24 dev enp5s0 proto kernel scope link src 192.168.1.201 metric 100
local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1
local 192.168.1.201 dev enp5s0 table local proto kernel scope host src 192.168.1.201

With this routing table and the target IP address, a computer can find out where next to send the packet. On each line, the keyword following dev (for "device") determines the interface that the packet should go through. These usually lead to a network interface card (NIC), but don't need to.

Loopback

Let's pretend we're trying to send a network packet to localhost, which was resolved to 127.0.0.1 using /etc/hosts. Looking at the routing table, the target IP address 127.0.0.1 matches exactly with one of the lines:

local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1

This tells us that the packet will go through the lo interface. Now, what's the lo interface?

$ ip link show lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

The interface lo does not have any hardware behind it and is just a virtual interface that is used for loopback. Any packet sent out through lo is handed straight back to the same machine's network stack as an incoming packet, without ever touching a physical network.

Inside the Network

Let's say we try to reach another computer, one on the local network, whose IP address happens to be 192.168.1.210. There is a line in the routing table that matches this IP address with the CIDR 192.168.1.0/24:

192.168.1.0/24 dev enp5s0 proto kernel scope link src 192.168.1.201 metric 100

It means that these packets should be sent to the interface enp5s0.
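Matching an address against a CIDR block is plain bit arithmetic: the /24 means the first 24 bits must be identical. A minimal Python sketch, limited to IPv4 and with helper names chosen for this example:

```python
def ip_to_int(address: str) -> int:
    """Convert a dotted IPv4 address to its 32-bit integer value."""
    a, b, c, d = (int(part) for part in address.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def in_network(address: str, cidr: str) -> bool:
    """Check whether `address` falls inside the CIDR block `cidr`."""
    network, prefix_len = cidr.split("/")
    # A /24 yields the mask 255.255.255.0: 24 one-bits followed by 8 zero-bits.
    mask = (0xFFFFFFFF << (32 - int(prefix_len))) & 0xFFFFFFFF
    return (ip_to_int(address) & mask) == (ip_to_int(network) & mask)

print(in_network("192.168.1.210", "192.168.1.0/24"))  # the local computer
print(in_network("93.184.216.34", "192.168.1.0/24"))  # example.com
```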

$ ip link show enp5s0
2: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:01:02:03:04:05 brd ff:ff:ff:ff:ff:ff

Right now, this interface has the computer's NIC behind it, the one with MAC address 00:01:02:03:04:05, so all packets sent to enp5s0 will go out through that NIC.

Since 192.168.1.210 is in the same network, we only need the MAC address of the target computer's NIC. It may be found in the neighbor cache, but if it isn't, it can be retrieved via an ARP broadcast:

$ ip neigh show
192.168.1.210 dev enp5s0 lladdr 08:09:0a:0b:0c:0d REACHABLE

All of this stays within the local network, in my case keeping the communications in my router's switch.
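The "check the cache, otherwise broadcast" logic can be sketched as follows. This is a toy model: fake_arp_broadcast is a hypothetical stand-in that returns the MAC address from the output above instead of sending a real ARP request, and cache expiry is ignored.

```python
from typing import Callable, Dict, List

BROADCASTS_SENT: List[str] = []  # records every simulated ARP broadcast

def fake_arp_broadcast(ip: str) -> str:
    """Hypothetical stand-in for an ARP request/reply exchange."""
    BROADCASTS_SENT.append(ip)
    return {"192.168.1.210": "08:09:0a:0b:0c:0d"}[ip]

def resolve_mac(ip: str, cache: Dict[str, str],
                broadcast: Callable[[str], str] = fake_arp_broadcast) -> str:
    """Return the MAC address for `ip`, broadcasting only on a cache miss."""
    if ip not in cache:
        cache[ip] = broadcast(ip)
    return cache[ip]

neighbors: Dict[str, str] = {}
first = resolve_mac("192.168.1.210", neighbors)   # cache miss: broadcast
second = resolve_mac("192.168.1.210", neighbors)  # cache hit: no broadcast
print(first, second, BROADCASTS_SENT)
```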

Outside the Network

Let's finally continue with our network packet to example.com. No specific route matches 93.184.216.34, so what happens now? There's one route, though, that is a bit different from the others:

default via 192.168.1.1 dev enp5s0 proto dhcp metric 100

This line means that packets whose destination matches none of the other routes are sent to the gateway at 192.168.1.1 through the enp5s0 interface. This is the default gateway, in my case a router, whose job it is to find out where to route these packets.
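The three lookups in this section (loopback, local network, default gateway) all follow the same rule: among the routes that contain the destination, the most specific one wins. Here is a simplified Python sketch using the standard ipaddress module. The ROUTES list is a hand-copied subset of the routing table above, with the default route written as 0.0.0.0/0, and things the kernel also considers (multiple tables, policy rules, metrics) are left out.

```python
import ipaddress
from typing import List, Optional, Tuple

# (prefix, device, gateway) -- a subset of the routing table shown earlier.
ROUTES: List[Tuple[str, str, Optional[str]]] = [
    ("0.0.0.0/0", "enp5s0", "192.168.1.1"),  # the "default" route
    ("192.168.1.0/24", "enp5s0", None),
    ("127.0.0.0/8", "lo", None),
    ("127.0.0.1/32", "lo", None),
]

def lookup(destination: str) -> Tuple[str, Optional[str]]:
    """Return (device, gateway) of the most specific route containing `destination`."""
    address = ipaddress.ip_address(destination)
    matches = []
    for prefix, device, gateway in ROUTES:
        network = ipaddress.ip_network(prefix)
        if address in network:
            matches.append((network.prefixlen, device, gateway))
    # The longest prefix is the most specific match.
    _, device, gateway = max(matches, key=lambda match: match[0])
    return device, gateway

for target in ("127.0.0.1", "192.168.1.210", "93.184.216.34"):
    print(target, "->", lookup(target))
```

Because 0.0.0.0/0 contains every IPv4 address but has the shortest possible prefix, it is only selected when nothing else matches, which is exactly what makes it the default.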

Now typically, that packet will go out of the router, through the modem, reach an ISP, and go into the wild Internet to somehow finally arrive at the target server. That's way too high-level though, so let's dig deeper into the last piece of technology.

BGP

The path a network packet takes is determined by an interesting, albeit daunting, technology: the Border Gateway Protocol, or BGP. BGP is how routers across the Internet tell each other which IP address ranges they can reach, and the routing tables built from that information are what move a packet from router to router.

The Internet is separated into Autonomous Systems, or AS, which are network meshes of routers under the jurisdiction of an ISP, a large company, or a hosting provider. They are in charge of overseeing and/or assigning their allocated IP addresses.

When my router sends the packet to my ISP, their router receives it and matches it with a route in its routing table. There may be different paths available towards the destination, and the best one will be chosen. The network packet will then be sent to the next BGP router in the path, and the cycle continues, until the packet arrives at the target server.
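How "the best one" gets chosen is a long list of tie-breakers in practice. The Python sketch below keeps only two of them, in a heavily simplified form: prefer the route with the highest local preference (an operator-assigned value), then the one with the shortest AS path. The candidate routes are invented for the example; AS64500 and AS64501 come from the range reserved for documentation, and the peer names are hypothetical.

```python
from typing import Dict, List

def best_path(candidates: List[Dict]) -> Dict:
    """Pick the route with the highest local_pref, then the shortest AS path."""
    return max(candidates, key=lambda route: (route["local_pref"], -len(route["as_path"])))

CANDIDATES = [
    {"via": "peer-a", "local_pref": 100, "as_path": [1299, 15133]},
    {"via": "peer-b", "local_pref": 100, "as_path": [64500, 64501, 15133]},
]

print(best_path(CANDIDATES)["via"])  # shorter AS path wins at equal preference

# An operator can override path length by raising the local preference.
PREFERRED = [dict(CANDIDATES[0]), dict(CANDIDATES[1], local_pref=200)]
print(best_path(PREFERRED)["via"])
```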

Let's look at an actual example using traceroute:

$ traceroute example.com
traceroute to example.com (93.184.216.34), 30 hops max, 60 byte packets
1  pfSense.localdomain (192.168.1.1)  0.244 ms  0.268 ms  0.302 ms
2  198-84-254-97.cpe.teksavvy.com (198.84.254.97)  10.884 ms  10.867 ms  10.877 ms
3  10.170.192.53 (10.170.192.53)  14.954 ms  15.005 ms  15.038 ms
4  xe-0-1-0-0-bdr01-mtl.teksavvy.com (192.171.63.17)  14.318 ms  14.264 ms xe-5-2-1-0-bdr01-mtl.teksavvy.com (206.248.155.109)  13.974 ms
5  motl-b2-link.ip.twelve99.net (62.115.44.222)  14.930 ms  14.859 ms  14.901 ms
6  chi-b23-link.ip.twelve99.net (62.115.118.188)  33.839 ms  32.588 ms  32.610 ms
7  edgecast-ic315149-chi-b23.ip.twelve99-cust.net (213.248.97.182)  32.664 ms  33.045 ms  32.752 ms
8  ae-65.core1.chb.edgecastcdn.net (192.229.225.131)  32.955 ms ae-66.core1.chb.edgecastcdn.net (192.229.227.131)  29.664 ms  35.778 ms
9  93.184.216.34 (93.184.216.34)  31.456 ms  28.201 ms  32.559 ms
10  93.184.216.34 (93.184.216.34)  32.550 ms  30.099 ms  32.577 ms

What we can learn from this, using online tools like ipinfo.io, is that a network packet's path from my computer to example.com goes through three AS, and would arrive at its destination in 10 hops. It should be similar to the following:

Hops     ASN      Company             Router Location
1        -        -                   Quebec City
2, 3, 4  AS5645   TekSavvy Solutions  Montreal
5        AS1299   Arelion             Montreal
6, 7     AS1299   Arelion             Chicago
8        AS15133  Edgecast            Chicago
9, 10    AS15133  Edgecast            New York
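The "three AS" count can be derived mechanically from the table above by collapsing consecutive hops that belong to the same AS. A short Python sketch, with the hop-to-ASN mapping copied by hand from the table (hop 1, the home router, has no public ASN):

```python
from itertools import groupby
from typing import List, Optional

# ASN of hops 1 through 10, in order; None for the home router.
HOP_ASNS: List[Optional[str]] = [
    None,
    "AS5645", "AS5645", "AS5645",
    "AS1299", "AS1299", "AS1299",
    "AS15133", "AS15133", "AS15133",
]

def as_path(hop_asns: List[Optional[str]]) -> List[str]:
    """Collapse runs of identical ASNs into the sequence of AS traversed."""
    return [asn for asn, _ in groupby(hop_asns) if asn is not None]

path = as_path(HOP_ASNS)
print(len(path), path)
```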

The packets leave my router to my ISP, TekSavvy, move inside their network in Montreal for a couple of hops, continue towards Chicago through a Swedish Tier 1 network named Arelion, to finally enter Edgecast's network.

If I do the same from a server hosted in a Toronto data center:

Hops         ASN      Company              Router Location
1, 2, 3      AS14061  Digital Ocean        Toronto
4, 5         AS6453   Tata Communications  Toronto
6            AS6453   Tata Communications  Chicago
7, 8, 9, 10  AS6453   Tata Communications  New York
11, 12, 13   AS15133  Edgecast             New York

Because each BGP router makes its own decision about where next to send a packet, the path a packet will take is not deterministic. In the previous example (from a Digital Ocean droplet), the third hop can go through different routers: 138.197.249.78, 138.197.249.82, 138.197.249.86, or 138.197.249.90. Which one is chosen depends on the previous router's decision, which may take many things into account, including routing rules, latency, and congestion.

This also means that an IP address is not necessarily tied to a single location: multiple servers in different cities can receive requests targeting the same IP address, a technique known as anycast. This is one way to implement a Content Delivery Network, or CDN, which appears to be the case for example.com: the server in New York seems to be a CDN server for the origin server located in Los Angeles. I can trigger a different path if I use a VPN in Vancouver, which results in a path that stays near the west coast, indeed ending up in Los Angeles. Which server is the origin server may be impossible to verify, though, and could even change at any time.

From a VPN in Ukraine, the packets are routed through several cities in Europe, then through a transatlantic cable towards Boston, before ending up in New York.

The End at the Target Server

The network packet to example.com just reached the target server, which can build and send back a response, if it wants. For that packet to get all the way to that server, though, it had a lot of things to go through.

We've seen the computer figure out the IP address from a hostname, and followed the packet from the computer to the target server.

Resolving example.com to 93.184.216.34 was done with DNS, a hierarchical system that decentralizes domain name resolution. With more than 350 million domain names accessible from the Internet, DNS allows everyone to have access to most of them by only remembering a short name.

Finding the path from the ISP to the target server required BGP, a decentralized planetary network mesh that boils down to simple routing decisions. With more than 100,000 AS that take part in BGP, this makes for a constantly evolving network defined by routing tables containing a million entries.

It's easy to forget how the Internet works. Despite their complexity, these systems allow communications across the world in as little as a few tens of milliseconds.

Simon Bernier
