January 26, 2023
Visiting Mozilla's website is easy, right? Just enter mozilla.org in the browser's address bar, and everything's ready.
Let's stop and think for a minute. How can that be so easy? How can your computer know which server, out of the billions of servers on the planet, belongs to mozilla.org? How can the request then navigate through the Internet and manage to reach that server? Should we come to the conclusion that it is anything but "easy"?
This post will explain, at beginner and intermediate levels, what happens for one of these network packets to arrive at its destination. Some basic networking knowledge is assumed, but the post should otherwise be accessible to all. There are a lot of technologies in play: some will be silently skipped, others skimmed quickly, but I will focus on two, DNS and BGP.
The explanations and examples are from a Debian Linux computer. It's possible to follow along from within a Docker container:
$ docker run -ti --rm debian:latest bash
root@debian$ apt update && apt install -y dnsutils iproute2
When we type curl example.com in a terminal, a network packet cannot yet be created, because the process doesn't know what example.com is, nor how to communicate with its server. For that, an IP address is required, so the first step is to convert a hostname, example.com, into an IP address.
Linux uses the Name Service Switch, or NSS, which is configured in the file /etc/nsswitch.conf. This file contains the hosts database entry, which tells the computer how to try resolving hostnames:
$ cat /etc/nsswitch.conf | grep hosts
hosts: files dns
As seen in the previous output, our hosts database is set to use two services: files and dns. These services are tried in order until one successfully resolves the hostname; if none does, the hostname is unresolvable.
The service named files says that the hostname should be resolved using the file /etc/hosts. That file is a relic of how hostnames worked in the early days of ARPANET. Each line contains an IP address, along with the hostname it belongs to, and any aliases it may have.
$ cat /etc/hosts
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
If example.com had been found on a line, that would give us its IP address, but since it wasn't, the next service is used.
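The lookup order described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how glibc implements NSS: the function names are hypothetical, and the "dns" service is a stand-in that returns a placeholder address from the documentation range (192.0.2.0/24) instead of making a real query.

```python
def parse_hosts(text):
    """Map every hostname and alias in hosts-file text to its list of addresses."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), []).append(address)
    return table


def resolve(hostname, services):
    """Try each (name, lookup) pair in order, like the `hosts:` line of nsswitch.conf."""
    for service_name, lookup in services:
        addresses = lookup(hostname)
        if addresses:
            return service_name, addresses
    return None, []  # every service failed: the hostname is unresolvable


HOSTS_TEXT = """\
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
"""

hosts = parse_hosts(HOSTS_TEXT)


def files_lookup(name):
    return hosts.get(name.lower(), [])


def fake_dns_lookup(name):
    # Stand-in for a real DNS query; 192.0.2.10 is a placeholder address.
    return ["192.0.2.10"] if name == "example.com" else []


SERVICES = [("files", files_lookup), ("dns", fake_dns_lookup)]

print(resolve("localhost", SERVICES))    # answered by "files"
print(resolve("example.com", SERVICES))  # falls through to "dns"
```

Swapping the order of the entries in SERVICES is the equivalent of writing `hosts: dns files` in the configuration file.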
The service named dns tells the computer to use the Domain Name System, or DNS, to find a hostname's IP address.
Note: To keep a nice flow in this section, I am ignoring that DNS queries are packets that go through a network, which we haven't yet seen.
DNS is a hierarchical system separated into zones of authority. At the top of this hierarchy is the root zone ., in control of all the TLD zones (e.g. com., but many others), which themselves can be split into subzones (e.g. example.com.) in control of all domain names under them.
A DNS server is called a nameserver and can be queried for a domain name's resource records. There are many types of records, but we'll focus on two: A, which defines an IPv4 address, and NS, which points to an authoritative nameserver.
An authoritative nameserver is one that actually knows the records that are configured on a domain name. Resource records can also be retrieved from other nameservers that have cached them, but these answers are not considered "authoritative".
To get an A record, we just need to make a DNS query to a nameserver. Which DNS server should be used? That's configured in /etc/resolv.conf:
$ cat /etc/resolv.conf
nameserver 192.168.1.1
This tells the computer that it should send its DNS queries to 192.168.1.1 (on port 53). That could be the router's IP address (if it can handle DNS resolution), the ISP's IP address, or any other DNS resolver, e.g. Cloudflare's DNS service.
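As a rough illustration of how little there is to this file, here is a minimal Python sketch that extracts the configured resolvers. The function name is hypothetical, the second address in the sample is a placeholder, and other directives (search, options, and so on) are simply ignored.

```python
DNS_PORT = 53  # well-known port for DNS queries


def nameservers(resolv_conf_text):
    """Return the addresses on `nameserver` lines, in the order they appear."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", ";")):
            continue  # blank line or comment
        fields = stripped.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


SAMPLE = """\
# placeholder configuration
nameserver 192.168.1.1
search lan
nameserver 192.0.2.53
"""

for address in nameservers(SAMPLE):
    print(f"queries may be sent to {address} on port {DNS_PORT}")
```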
Now that we know the DNS server, all that's left is to query it for an A record, which would hopefully result in example.com's IP address. The command dig can be used to retrieve resource records:
$ dig +short example.com. A
220.127.116.11
What the previous command does would be similar to the following steps, but we'll ignore caching to illustrate the recursive nature of DNS. Actual steps depend on the nameserver's implementation.
1. A DNS query is made to 192.168.1.1, asking for example.com.'s A record;
2. 192.168.1.1 is not an authority for example.com, so it doesn't know its records;
3. 192.168.1.1 wants to make a new DNS query to example.com.'s nameserver, but it doesn't know what it is, so it needs to first ask com.;
4. 192.168.1.1 doesn't know either how to reach com., so it first needs to ask .;
5. 192.168.1.1 actually knows where to reach ., because recursive loops need an exit point, and all DNS servers are configured with at least one hard-coded IP address for the root zone. It makes a DNS query to ., asking for com.'s NS records, and receives them (e.g. a.gtld-servers.net.) along with their IP addresses;
6. 192.168.1.1 makes a DNS query to one of com.'s nameservers, asking for example.com.'s NS records, and receives them (e.g. a.iana-servers.net.). Their IP addresses are found by repeating the same process for net. and iana-servers.net., which helpfully includes the desired IP addresses as well;
7. 192.168.1.1 makes a DNS query to one of example.com.'s nameservers, asking for example.com.'s A record, and receives 18.104.22.168 as a response.
These steps are listed next:
# Already known: . A 22.214.171.124
dig +norecurse @126.96.36.199 com. NS
# a.gtld-servers.net. A 188.8.131.52
dig +norecurse @184.108.40.206 example.com. NS
# example.com. NS a.iana-servers.net.
dig +norecurse @220.127.116.11 net. NS
# a.gtld-servers.net. A 18.104.22.168
dig +norecurse @22.214.171.124 iana-servers.net. NS
# a.iana-servers.net. A 126.96.36.199
dig +norecurse @188.8.131.52 example.com. A
# example.com A 184.108.40.206
All of this, in order to get example.com's IP address! With proper caching, these steps are clearly a lot simpler, but they show an overview of how DNS works.
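The walk from the root zone down to the authoritative nameserver can be simulated without touching the network. The Python sketch below is a toy model under heavy assumptions: every server address and record is a made-up placeholder from the documentation ranges, each zone has a single nameserver, and referrals always come with the next server's address (so the extra detour through net. is skipped).

```python
# Toy data: nameserver address -> what that server knows. All values are placeholders.
ROOT_SERVER = "198.51.100.1"

SERVERS = {
    "198.51.100.1": {  # serves "."
        "delegations": {"com.": "198.51.100.2"},
        "a_records": {},
    },
    "198.51.100.2": {  # serves "com."
        "delegations": {"example.com.": "198.51.100.3"},
        "a_records": {},
    },
    "198.51.100.3": {  # serves "example.com."
        "delegations": {},
        "a_records": {"example.com.": "192.0.2.10"},
    },
}


def query(server, name):
    """Simulate one non-recursive query sent to `server` for the A record of `name`."""
    zone = SERVERS[server]
    if name in zone["a_records"]:
        return "answer", zone["a_records"][name]
    # Otherwise, refer the client to the most specific delegated zone, if any.
    matches = [z for z in zone["delegations"] if name == z or name.endswith("." + z)]
    if matches:
        return "referral", zone["delegations"][max(matches, key=len)]
    return "nxdomain", None


def resolve(name, max_steps=10):
    """Follow referrals from the root until an answer (or a dead end) is reached."""
    server, trail = ROOT_SERVER, []
    for _ in range(max_steps):
        kind, value = query(server, name)
        trail.append((server, kind, value))
        if kind == "answer":
            return value, trail
        if kind == "nxdomain":
            return None, trail
        server = value  # ask the server we were referred to
    raise RuntimeError("too many referrals")


address, trail = resolve("example.com.")
for step in trail:
    print(step)
print("resolved to", address)
```

A real resolver would also cache each referral and answer, which is why most lookups in practice need far fewer queries.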
Now that we have the target IP address, a network packet can finally be created, but how can we send it to that IP address? How does it actually reach its destination?
The first step for the network packet to example.com is to leave the computer, and find out where to go next. From the point of view of the computer, this part is just a series of networking rules.
Let's first look at the routing table. I've removed IPv6 and broadcast entries for brevity.
$ ip route show table all
default via 192.168.1.1 dev enp5s0 proto dhcp metric 100
192.168.1.0/24 dev enp5s0 proto kernel scope link src 192.168.1.201 metric 100
local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1
local 192.168.1.201 dev enp5s0 table local proto kernel scope host src 192.168.1.201
With this routing table and the target IP address, a computer can find out where next to send the packet. On each line, the keyword following dev (for "device") determines the interface that the packet should go through. These usually lead to a NIC but don't need to.
Let's pretend we're trying to send a network packet to localhost, which was resolved to 127.0.0.1 using /etc/hosts. Looking at the routing table, the target IP address 127.0.0.1 matches exactly with one of the lines:
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1
This tells us that the packet will go through the lo interface. Now, what's the lo interface?
$ ip link show lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
lo does not have any hardware behind it and is just a virtual interface that is used for loopback. Any packet sent through it is handed straight back to the same computer's network stack, so it never leaves the machine.
Let's say we try to reach another computer, one on the local network, whose IP address happens to be 192.168.1.210. There is a line in the routing table that matches this IP address with the CIDR 192.168.1.0/24:
192.168.1.0/24 dev enp5s0 proto kernel scope link src 192.168.1.201 metric 100
It means that these packets should be sent to the interface enp5s0:
$ ip link show enp5s0
2: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:01:02:03:04:05 brd ff:ff:ff:ff:ff:ff
Right now, this interface has the computer's NIC behind it, the one with MAC address 00:01:02:03:04:05, so all packets sent to enp5s0 will go out through that NIC.
Since 192.168.1.210 is in the same network, we only need the MAC address of the target computer's NIC. It may be found in the cache, but if it isn't, it can be retrieved via an ARP broadcast:
$ ip neigh show
192.168.1.210 dev enp5s0 lladdr 08:09:0a:0b:0c:0d REACHABLE
All of this stays within the local network, in my case keeping the communications in my router's switch.
Let's finally continue with our network packet to example.com. There are no routes that match 220.127.116.11, so what happens now? There's one route, though, that was a bit different from the others:
default via 192.168.1.1 dev enp5s0 proto dhcp metric 100
This line means that all IP addresses that match none of the routes are sent to the gateway at 192.168.1.1 through the enp5s0 interface. This is the default gateway, in my case a router, whose job it is to find out where to route these packets.
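The three cases just covered (loopback, local network, default gateway) all follow the same rule: the most specific matching route wins. Here is a minimal Python sketch of that rule, using the standard ipaddress module. It is a simplification that assumes a subset of the routing table shown earlier, treats the default route as 0.0.0.0/0, and ignores metrics and separate routing tables; 192.0.2.10 is a placeholder standing in for a remote server's address.

```python
import ipaddress

# (destination prefix, gateway or None when directly reachable, device)
ROUTES = [
    ("0.0.0.0/0", "192.168.1.1", "enp5s0"),  # "default via 192.168.1.1"
    ("192.168.1.0/24", None, "enp5s0"),
    ("127.0.0.0/8", None, "lo"),
    ("127.0.0.1/32", None, "lo"),
]


def lookup(destination):
    """Return (prefix, gateway, device) of the most specific route matching `destination`."""
    dest = ipaddress.ip_address(destination)
    candidates = []
    for prefix, gateway, device in ROUTES:
        network = ipaddress.ip_network(prefix)
        if dest in network:
            candidates.append((network, gateway, device))
    # Longest prefix match: the route with the most bits set in its netmask wins.
    network, gateway, device = max(candidates, key=lambda route: route[0].prefixlen)
    return str(network), gateway, device


for target in ("127.0.0.1", "192.168.1.210", "192.0.2.10"):
    print(target, "->", lookup(target))
```

Because the default route matches every address with a prefix length of zero, it is only ever chosen when nothing more specific applies.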
Now typically, that packet will go out of the router, through the modem, reach an ISP, and go into the wild Internet to somehow finally arrive at the target server. That's way too high-level though, so let's dig deeper into the last piece of technology.
The path a network packet takes is determined by an interesting, albeit daunting, technology: the Border Gateway Protocol, or BGP. BGP is how routers across the Internet tell each other which networks they can reach, and therefore how a network packet gets routed throughout the Internet, moving from router to router.
The Internet is separated into Autonomous Systems, or AS, which are network meshes of routers under the jurisdiction of an ISP, a large company, or a hosting provider. They are in charge of overseeing and/or assigning their allocated IP addresses.
When my router sends the packet to my ISP, their router receives it and matches it with a route in its routing table. There may be different paths available towards the destination, and the best one will be chosen. The network packet will then be sent to the next BGP router in the path, and the cycle continues, until the packet arrives at the target server.
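To give a feel for what "the best one will be chosen" can mean, here is a toy Python sketch of route selection. It only models two of the criteria real BGP implementations use, local preference first and then AS path length, and leaves out every other tie-breaker. The Route class, the AS numbers and the addresses are all made-up placeholders taken from ranges reserved for documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str            # destination network being advertised
    next_hop: str          # neighbouring router to forward packets to
    as_path: tuple         # AS numbers the advertisement travelled through
    local_pref: int = 100  # operator-assigned preference, higher is better


def best_route(routes):
    """Pick the route with the highest local preference, then the shortest AS path."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


via_a = Route("192.0.2.0/24", "198.51.100.1", (64496, 64497, 64499))
via_b = Route("192.0.2.0/24", "198.51.100.2", (64498, 64499))
via_c = Route("192.0.2.0/24", "198.51.100.3", (64500, 64501, 64502, 64499), local_pref=200)

print(best_route([via_a, via_b]).next_hop)         # shorter AS path wins
print(best_route([via_a, via_b, via_c]).next_hop)  # higher local preference wins
```

The second call shows why the shortest path is not always taken: an operator's policy can prefer a longer path, for commercial or technical reasons.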
Let's look at an actual example using traceroute:
$ traceroute example.com
traceroute to example.com (18.104.22.168), 30 hops max, 60 byte packets
 1 pfSense.localdomain (192.168.1.1) 0.244 ms 0.268 ms 0.302 ms
 2 198-84-254-97.cpe.teksavvy.com (22.214.171.124) 10.884 ms 10.867 ms 10.877 ms
 3 10.170.192.53 (10.170.192.53) 14.954 ms 15.005 ms 15.038 ms
 4 xe-0-1-0-0-bdr01-mtl.teksavvy.com (126.96.36.199) 14.318 ms 14.264 ms xe-5-2-1-0-bdr01-mtl.teksavvy.com (188.8.131.52) 13.974 ms
 5 motl-b2-link.ip.twelve99.net (184.108.40.206) 14.930 ms 14.859 ms 14.901 ms
 6 chi-b23-link.ip.twelve99.net (220.127.116.11) 33.839 ms 32.588 ms 32.610 ms
 7 edgecast-ic315149-chi-b23.ip.twelve99-cust.net (18.104.22.168) 32.664 ms 33.045 ms 32.752 ms
 8 ae-65.core1.chb.edgecastcdn.net (22.214.171.124) 32.955 ms ae-66.core1.chb.edgecastcdn.net (126.96.36.199) 29.664 ms 35.778 ms
 9 188.8.131.52 (184.108.40.206) 31.456 ms 28.201 ms 32.559 ms
10 220.127.116.11 (18.104.22.168) 32.550 ms 30.099 ms 32.577 ms
What we can learn from this, using online tools like ipinfo.io, is that a network packet's path from my computer to example.com goes through three AS, and would arrive at its destination in 10 hops. It should be similar to the following:
| Hops | AS | Operator | Location |
|---|---|---|---|
| 2, 3, 4 | AS5645 | TekSavvy Solutions | Montreal |
| 9, 10 | AS15133 | Edgecast | New York |
The packets leave my router for my ISP, TekSavvy, move inside their network in Montreal for a couple of hops, continue towards Chicago through a Swedish Tier 1 network named Arelion, and finally enter Edgecast's network.
If I do the same from a server hosted in a Toronto data center:
| Hops | AS | Operator | Location |
|---|---|---|---|
| 1, 2, 3 | AS14061 | Digital Ocean | Toronto |
| 4, 5 | AS6453 | Tata Communications | Toronto |
| 7, 8, 9, 10 | AS6453 | Tata Communications | New York |
| 11, 12, 13 | AS15133 | Edgecast | New York |
Because a BGP router makes its own decision about where next to send a packet, the path a packet will take is not deterministic. In the previous example (from a Digital Ocean droplet), the third hop can go through different routers, 22.214.171.124 being one of them. Which one is chosen depends on the previous router's decision, which may have taken many factors into account, including routing rules, latency, congestion, etc.
This also means that a router or server is not necessarily in a definite location, and multiple servers in different cities can receive requests targeting the same IP address. This is one way to implement a Content Delivery Network, or CDN, which appears to be the case for example.com: the server in New York seems to be a CDN server for the origin server located in Los Angeles. I can trigger a different path if I use a VPN in Vancouver, which results in a path that stays near the west coast, indeed ending up in Los Angeles. Which server is the origin server may be impossible to verify, though, and could even change at any time.
From a VPN in Ukraine, the packets are routed through several cities in Europe, then in a transatlantic cable towards Boston, before ending up in New York.
The network packet to example.com just reached the target server, which can build and send back a response, if it wants. For that packet to get all the way to that server, though, it had a lot of things to go through.
We've seen the computer figure out the IP address from a hostname, and followed the packet from the computer to the target server.
Resolving example.com to 126.96.36.199 was done with DNS, a hierarchical system that decentralizes domain name resolution. With more than 350 million domain names accessible from the Internet, DNS allows everyone to have access to most of them by only remembering a short name.
Finding the path from the ISP to the target server required BGP, a decentralized planetary network mesh that boils down to simple routing decisions. With more than 100,000 AS that take part in BGP, this makes for a constantly evolving network defined by routing tables containing a million entries.
It's easy to forget how the Internet works. Despite their complexity, these systems allow communications across the world in as little as a few tens of milliseconds.