Url encoding in so many ways
After over ten years of web programming, I would have expected to know my way around various encoding methods. It turned out there was one more called UrlPathEncode that I hadn’t actively registered.
So here’s the hopefully final stab at the issue:
- When you’re encoding a query string parameter for an HTML link, use Server.UrlEncode (HttpServerUtility.UrlEncode). It uses the encoding known as application/x-www-form-urlencoded. Spaces are encoded as plus signs, as specified in section 17.13.4 of the HTML 4.01 specification.
- Example:
link.NavigateUrl = "search.aspx?what=" + Server.UrlEncode(searchString)
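- When you’re encoding a path segment of the URI for an HTML link, use Server.UrlPathEncode (HttpServerUtility.UrlPathEncode). Spaces will be encoded as "%20". Be aware that UrlPathEncode is less aggressive than UrlEncode: it escapes spaces and non-ASCII characters but leaves most other characters, including reserved ones such as "&" and "#", as they are.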
- Example:
link.NavigateUrl = "http://en.wikipedia.org/" + Server.UrlPathEncode(keyword)
- When you’re encoding for any other scenario (a non-HTTP URI), use Uri.EscapeDataString.
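The three recommendations above can be compared side by side. The following is a minimal sketch, assuming a console program rather than a page: it calls the static System.Web.HttpUtility methods, which the Server.UrlEncode and Server.UrlPathEncode wrappers use internally. The class name EncodingComparison, the helper names, the sample input and the urn:example: prefix are made up for illustration.

```csharp
using System;
using System.Web; // HttpUtility; on .NET Framework this needs a reference to System.Web.dll

static class EncodingComparison
{
    // Query string value: application/x-www-form-urlencoded, spaces become '+'.
    public static string ForQuery(string value) => HttpUtility.UrlEncode(value);

    // Path segment: spaces become "%20".
    public static string ForPath(string segment) => HttpUtility.UrlPathEncode(segment);

    // Protocol-agnostic escaping from System.Uri.
    public static string ForOther(string value) => Uri.EscapeDataString(value);

    static void Main()
    {
        const string input = "red wine"; // hypothetical sample input

        Console.WriteLine("search.aspx?what=" + ForQuery(input));       // search.aspx?what=red+wine
        Console.WriteLine("http://en.wikipedia.org/" + ForPath(input)); // http://en.wikipedia.org/red%20wine
        Console.WriteLine("urn:example:" + ForOther(input));            // urn:example:red%20wine
    }
}
```

For an input that only contains a space, the path and System.Uri variants produce the same text; the differences between them show up with reserved characters such as "+" or "&".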
Some background
The .NET Framework has plenty of tools for encoding data for URIs. URI encoding is specified in RFC 3986 (which superseded RFC 2396) and is intended to be protocol-agnostic. This RFC-compliant encoding has been implemented in the System.Uri class.
However, most of the URIs we create are actually intended for HTTP use. The HttpServerUtility class, exposed through the Server property in the ASP.NET Page and Control classes, provides the UrlEncode and UrlPathEncode methods, which are designed to comply with the HTTP conventions used in browsers and web servers.
One of these conventions is that the query part of the URI has different encoding rules than the path part. In practice, a plus sign in the path means a plus, while in the query part it means a space. Spaces can be encoded as %20 in both the path and the query part, but the additional plus sign encoding exists to make URIs more legible (after all, it is usually the query string that has spaces encoded into it). If you need to embed a literal plus sign into a query string, it must be represented as “%2B”.
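To make the plus sign convention concrete, here is a small sketch, again assuming the static HttpUtility methods in a console program. PlusSignDemo and DecodeQueryValue are made-up names. It shows that a query-style decoder reads both "+" and "%20" as a space, and that a literal plus has to travel as "%2B".

```csharp
using System;
using System.Web; // HttpUtility

static class PlusSignDemo
{
    // Decodes a query string value using the HTTP conventions described above.
    public static string DecodeQueryValue(string encoded) => HttpUtility.UrlDecode(encoded);

    static void Main()
    {
        Console.WriteLine(DecodeQueryValue("C+sharp"));   // C sharp
        Console.WriteLine(DecodeQueryValue("C%20sharp")); // C sharp
        Console.WriteLine(DecodeQueryValue("1%2B1"));     // 1+1

        // Encoding a literal plus. The comparison ignores case because the
        // casing of the hex digits is an implementation detail.
        string encoded = HttpUtility.UrlEncode("1+1");
        Console.WriteLine(encoded.Equals("1%2B1", StringComparison.OrdinalIgnoreCase)); // True
    }
}
```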
ASP.NET uses the HttpServerUtility family of methods to encode and decode strings internally. Microsoft recommends that you do not mix the System.Uri encoding/decoding methods with the equivalent methods in System.Web, as they are not fully compatible with each other. The HttpServerUtility methods will properly decode anything encoded with either the HttpServerUtility or the Uri class methods, but the decoding methods in the Uri class will not understand the HTTP-specific rules regarding plus signs in query strings. But to keep things simple, just don’t mix them.
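The asymmetry is easy to reproduce. The sketch below encodes a value with one family of methods and decodes it with the other. MixingDemo and its helper names are hypothetical, and the static HttpUtility methods stand in for the Server.UrlEncode and Server.UrlDecode wrappers so that the code runs outside a page.

```csharp
using System;
using System.Web; // HttpUtility

static class MixingDemo
{
    // Encode with System.Web, decode with System.Uri:
    // the '+' that stood for a space comes back as a literal plus.
    public static string WebThenUri(string value) =>
        Uri.UnescapeDataString(HttpUtility.UrlEncode(value));

    // Encode with System.Uri, decode with System.Web: the value round-trips.
    public static string UriThenWeb(string value) =>
        HttpUtility.UrlDecode(Uri.EscapeDataString(value));

    static void Main()
    {
        Console.WriteLine(WebThenUri("red wine")); // red+wine (the space is lost)
        Console.WriteLine(UriThenWeb("red wine")); // red wine
        Console.WriteLine(UriThenWeb("1+1"));      // 1+1
    }
}
```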
Special thanks to Stefan Schackow from the ASP.NET team for helping me clear this one up! (Any remaining errors or misunderstandings are, of course, still mine.)
September 15, 2009
· Jouni Heikniemi
Tags: URI, Web · Posted in: .NET