
Commit 1f7a437

Update 2024-12-27-URL-encoding.md
1 parent 9adc1dd

1 file changed (+10, -10 lines)


_posts/2024-12-27-URL-encoding.md

@@ -51,12 +51,12 @@ Here's the breakdown:
 </p>


-- **Protocol/URI Scheme**(`https://`): Specifies how your browser should communicate with the internet server. Common protocols include `http`/`https` (secure `http`), and `ftp` (file transfer), `mailto` (email address) and more. While network protocols are not the focus of this blog post, you can learn more about them in these articles: [Types of Network Protocols and Their Uses](https://www.geeksforgeeks.org/types-of-network-protocols-and-their-uses/), [What is a network protocol?](https://www.cloudflare.com/learning/network-layer/what-is-a-protocol/), [Uniform Resource Identifier (URI) Schemes](https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml)
-- **Domain**(`www.internet.com`): Identifies the server hosting the internet resource. This is typically a human readable name that maps to an IP address via DNS (Domain Name System). For a given domain, you can easily look up the corresponding IP address in DNS. For instance, on my local machine, I can use the `host` command or the `nslookup` utility to take a look at the IP addresses for Google servers. Public tools like [DNS Checker](https://dnschecker.org/) and [MX Toolbox](https://mxtoolbox.com/SuperTool.aspx) are also handy for peeking at DNS records for domains.
-- **Port**(`:8080`): Optional and specifies which port the server should use. The default for `http` is 80, and for `https` it's 443. Other port numbers and the correspodning protocals are `21` for `ftp`, and `22` for `ssh`, etc. The port essentially specifies which application or service on a server to connect to via the URL. If the URL is your friend’s home address, the port number can be thought of as your friend’s room.
+- **Protocol/URI Scheme**(`https://`): Specifies how your browser should communicate with the internet server. Common protocols include `http`/`https` (secure `http`), `ftp` (file transfer), `mailto` (email address) and more. While network protocols are not the focus of this blog post, you can learn more about them in these articles: [Types of Network Protocols and Their Uses](https://www.geeksforgeeks.org/types-of-network-protocols-and-their-uses/), [What is a network protocol?](https://www.cloudflare.com/learning/network-layer/what-is-a-protocol/), [Uniform Resource Identifier (URI) Schemes](https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml)
+- **Domain**(`www.internet.com`): Identifies the server hosting the internet resource. This is typically a human-readable name that maps to an IP address via DNS (Domain Name System). For a given domain, you can easily look up the corresponding IP address in DNS. For instance, on my local machine, I can use the `host` command or the `nslookup` utility to take a look at the IP addresses for Google servers. Public tools like [DNS Checker](https://dnschecker.org/) and [MX Toolbox](https://mxtoolbox.com/SuperTool.aspx) are also handy for peeking at DNS records for domains.
+- **Port**(`:8080`): Optional and specifies which port the server should use. The default for `http` is 80, and for `https` it's 443. Other port numbers and the corresponding protocols are `21` for `ftp`, and `22` for `ssh`, etc. The port essentially specifies which application or service on a server to connect to via the URL. If the URL is your friend’s home address, the port number can be thought of as your friend’s room.
 - **Path**(`/path/to/resource`): Indicates the specific location of the resource on the server. It's like the folders and files on your computer.
-- [**Query**(`?query=parameter`)](https://www.branch.io/glossary/query-parameters/): A set of key-value pairs used to pass information to the server. For example, a search query when we want to learn about Näive Bayes or a users preferences when making a request from the server.
-- [**Fragment**(`#fragment`)](https://medium.com/@dhanukasn/understanding-query-parameters-and-uri-fragments-in-urls-f6c52034b634): Refers to a specific section within the resource. For instance, it could be specific lines in a text file or a particular section or bookmark on a webpage.
+- [**Query**(`?query=parameter`)](https://www.branch.io/glossary/query-parameters/): A set of key-value pairs used to pass information to the server. For example, a search query when we want to learn about Näive Bayes or a user's preferences when making a request from the server.
+- [**Fragment**(`#fragment`)](https://medium.com/@dhanukasn/understanding-query-parameters-and-uri-fragments-in-urls-f6c52034b634): Refers to a specific section within the resource. For instance, it could be specific lines in a text file, a particular section or a bookmark on a webpage.

 ## Where Did URLs Come From?

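The URL breakdown in this hunk can be checked programmatically. The following is a minimal sketch using Python's standard-library `urllib.parse.urlsplit`. The example URL is assembled from the placeholder pieces the post uses (`https://`, `www.internet.com`, `:8080`, `/path/to/resource`, `?query=parameter`, `#fragment`). The `describe` function and the `DEFAULT_PORTS` table are hypothetical helpers written for illustration, covering only the well-known ports the text mentions.

```python
from urllib.parse import parse_qs, urlsplit

# Example URL built from the placeholder components used in the post.
URL = "https://www.internet.com:8080/path/to/resource?query=parameter#fragment"

# Hypothetical helper table (not from the post): well-known default ports.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "ssh": 22}


def describe(url: str) -> dict:
    """Hypothetical helper: split a URL into the components from the breakdown."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,            # protocol / URI scheme
        "domain": parts.hostname,          # host name (lowercased by urlsplit)
        # urlsplit reports None when no port is written, so fall back to the default.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
        "query": parse_qs(parts.query),    # key-value pairs, values as lists
        "fragment": parts.fragment,
    }


if __name__ == "__main__":
    print(describe(URL))
    print(describe("https://www.internet.com/")["port"])  # 443
```

Note that the fragment is parsed out locally: browsers do not send the `#fragment` part to the server, which is why it can only point *within* a resource.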
@@ -69,7 +69,7 @@ Here's the breakdown:
 </em>
 </p>

-The concept of URLs was introduced alongside the HTTP protocol and HTML in 1992 by [Tim Berners-Lee](https://www.w3.org/People/Berners-Lee/), way for researchers to easily share and access documents.
+In 1992, [Tim Berners-Lee](https://www.w3.org/People/Berners-Lee/) introduced the concept of URLs alongside the HTTP protocol and HTML as a way for researchers to share and access documents easily.

 Berners-Lee has already proposed the ideas of the [World Wide Web](https://www.w3.org/History/1989/proposal.html). However, for this network to function, it needed a standardized way to identify and locate resources. To address this need, he proposed the idea of the [URL](http://1997.webhistory.org/www.lists/www-talk.1991/0018.html) to serve as a "document identifier." The URL became one of the three core components of the web, alongside **HTML** (for structuring documents) and **HTTP** (for transferring them).

@@ -83,7 +83,7 @@ This [128-character set](https://www.ascii-code.com/) included:
 - **Printable characters:** Uppercase (`A-Z`), lowercase (`a-z`), digits (`0-9`), and symbols like `@`, `#`, and `$`, etc.
 - **Control characters:** Instructions for managing text streams, such as newline (`\n`) and tab (`\t`).

-ASCII’s simplicity and universality made it the foundation for early computer systems and networks, including the internet. However, its biggest limitation was its inability to represent non-English characters, like `é`, `ß`, or ``, as well as other writing systems like Cyrillic, Arabic, and Chinese. The biggest reason for this limitation is because at the time it was invented, memory and processing power were incredibly expensive; hence every it mattered. By sticking to 7-bits, and thus, 128 characters, [ASCII struck a balance between functionality and efficiency](https://randomtechnicalstuff.blogspot.com/2009/05/unicode-and-oracle.html). It was small enough to fit into the limited storage and memory of the time, yet comprehensive enough to provide a range of characters to work with. Likewise, it was a light-weight, simple, easy-to-implement solution and universal (at least for English-speaking developers).
+ASCII’s simplicity and universality made it the foundation for early computer systems and networks, including the internet. However, its biggest limitation was its inability to represent non-English characters, like `é`, `ß`, or ``, as well as other writing systems like Cyrillic, Arabic, and Chinese. The biggest reason for this limitation is because, at the time it was invented, memory and processing power were incredibly expensive; hence every bit mattered. By sticking to 7-bits, and thus, 128 characters, [ASCII struck a balance between functionality and efficiency](https://randomtechnicalstuff.blogspot.com/2009/05/unicode-and-oracle.html). It was small enough to fit into the limited storage and memory of the time yet comprehensive enough to provide a range of characters to work with. Likewise, it was a light-weight, simple, easy-to-implement solution and universal (at least for English-speaking developers).

 ### Expanded Character Sets (Beyond ASCII)

@@ -98,7 +98,7 @@ ASCII’s simplicity and universality made it the foundation for early computer

 As the internet connected the world, the need for a broader character set became apparent. This led to the development of [Unicode](https://www.translationroyale.com/the-history-of-unicode/), which could represent virtually every character in [every language](https://youtu.be/MijmeoH9LT4?feature=shared). [Unicode](https://home.unicode.org/) works with [multiple encodings](https://youtu.be/GMF2Z1EZHXk?si=5Q2JBozHR_LY3UAJ), such as UTF-8, UTF-16, and UTF-32 (You can read more about them here: [Difference between UTF-8, UTF-16 and UTF-32 Character Encoding? Example](https://javarevisited.blogspot.com/2015/02/difference-between-utf-8-utf-16-and-utf.html))

-**UTF-8** is the most widely used encoding on the web today. It is backward-compatible with ASCII, meaning that all ASCII characters retain their original binary values, while non-ASCII characters are represented using additional bytes. For example:
+**UTF-8** is the most widely used encoding on the web today. It is backwards-compatible with ASCII, meaning that all ASCII characters retain their original binary values, while non-ASCII characters are represented using additional bytes. For example:

 - The ASCII character `A` remains `01000001` in binary under UTF-8.
 - The Unicode character `é` is represented as `11000011 10101001` in UTF-8.
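The two byte patterns in this hunk can be reproduced with Python's built-in `str.encode`. This minimal sketch prints each character's UTF-8 bytes as 8-bit binary groups. It also shows that the plain `ascii` codec has no code for `é`, which is why UTF-8 needs the extra byte. The helper `utf8_bits` is hypothetical, introduced only for illustration.

```python
def utf8_bits(ch: str) -> str:
    """Hypothetical helper: UTF-8 bytes of `ch` as space-separated 8-bit groups."""
    # Iterating over a bytes object yields ints, which format() renders in binary.
    return " ".join(format(byte, "08b") for byte in ch.encode("utf-8"))


for ch in ("A", "é"):
    print(ch, utf8_bits(ch))
# A 01000001
# é 11000011 10101001

# 7-bit ASCII only covers code points 0-127; é is U+00E9 (233).
try:
    "é".encode("ascii")
except UnicodeEncodeError as err:
    print("ASCII:", err.reason)  # ordinal not in range(128)
```

The ASCII-range character keeps its single original byte, while `é` becomes a two-byte sequence whose leading `110` and `10` bit prefixes mark it as multi-byte. This is the backwards compatibility the paragraph describes.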
@@ -112,8 +112,8 @@ To address this, the solution was **percent-encoding**, or **URL encoding**, whi
 URL encoding ensures that any character—whether it's [unsafe](https://www.ietf.org/rfc/rfc1738.txt#:~:text=Other%20characters%20are%20unsafe%20because,be%20encoded%20within%20a%20URL.), [reserved](https://www.ibm.com/docs/en/cics-ts/6.x?topic=concepts-reserved-excluded-characters), or [non-ASCII](https://rbutterworth.nfshost.com/Tables/compose/)—can be safely transmitted in a URL. Here's how it works:

 1. **Identity Characters to Encode:**
-- **Reserved Characters:** Characters with special meanings in URLs (e.g., `?` to start a query string, `&` to seperate parameters, `/` to seperate path components, etc.) must be encoded when used outside their context.
-- **Unsafe Characters:** Characters like `spaces`, `<`, `>`, `{`, `}`, etc. are unsafe because gateways and transport agents might modify them. Encoding prevents such misinterpretation.
+- **Reserved Characters:** Characters with special meanings in URLs (e.g., `?` to start a query string, `&` to separate parameters, `/` to separate path components, etc.) must be encoded when used outside their context.
+- **Unsafe Characters:** Characters like `spaces`, `<`, `>`, `{`, `}`, etc., are unsafe because gateways and transport agents might modify them. Encoding prevents such misinterpretation.
 - **Non-ASCII Characters:** These characters, which fall outside the ASCII set, must be encoded for compatibility across systems.

 2. **Convert Characters to Hexadecimal ASCII:**
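The steps in this hunk can be sketched end to end in Python. First identify the characters that need encoding, then replace each byte of their UTF-8 form with `%` followed by two hex digits. The function `percent_encode` is a hypothetical, hand-rolled helper for illustration. It assumes the RFC 3986 "unreserved" set (letters, digits, `-`, `.`, `_`, `~`) as the only characters left as-is. The standard library's `urllib.parse.quote` and `unquote` are used to cross-check and decode. By default `quote` leaves `/` unescaped, so `safe=""` is passed to match.

```python
from urllib.parse import quote, unquote

# RFC 3986 "unreserved" characters: these never need percent-encoding.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def percent_encode(text: str) -> str:
    """Hypothetical helper: encode every byte outside UNRESERVED as %XX."""
    out = []
    # Non-ASCII characters are first expanded to their UTF-8 bytes.
    for byte in text.encode("utf-8"):
        if byte < 128 and chr(byte) in UNRESERVED:
            out.append(chr(byte))
        else:
            out.append(f"%{byte:02X}")  # two uppercase hexadecimal digits
    return "".join(out)


print(percent_encode("Naïve Bayes"))  # Na%C3%AFve%20Bayes
print(percent_encode("a&b=c/d"))      # a%26b%3Dc%2Fd

# Cross-check against the standard library and decode back.
assert percent_encode("Naïve Bayes") == quote("Naïve Bayes", safe="")
print(unquote("Na%C3%AFve%20Bayes"))  # Naïve Bayes
```

In the first example, the space becomes `%20` and `ï` becomes its two UTF-8 bytes `%C3%AF`. In the second, the reserved characters `&`, `=`, and `/` are escaped so a server reading the value will not treat them as separators.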
