Wednesday, January 11, 2012

The problem with document.cookie

Accessing/modifying HTTP cookies with JavaScript is possible through the document.cookie interface which permits two operations: retrieving all of the cookies that are accessible from the current document domain and path; and setting/updating an individual cookie. This interface is limited and can cause unexpected behavior if used without a thorough understanding of the interface and the W3C spec for HTTP cookies.

Browser access/management of cookies is specified by the RFC 2109 spec which states:
  • Cookie uniqueness is controlled by the combination of the cookie's name, domain, and path.
  • When setting a cookie with a specified domain, it must begin with a dot (.jasoncust.com).
  • The domain will default to the host domain's location (www.jasoncust.com).
  • The path will default to host domain's path up to but not including the right most '/'.
  • Cookie access is restricted to the cookie path being a prefix of the document's full path (since paths were originally directory structures, cookies were accessible only from subdirectories).
  • Similarly, cookie access is restricted to the cookie domain being a suffix of the document's domain. Note this is only true for cookie domains starting with the required dot (.jasoncust.com). If a cookie domain was set by default to say 'jasoncust.com', the cookie would not be accessible from any subdomains.
That last two points could use a little more elaboration as people tend to gloss over them. Regarding just domains (assuming all cookie paths are set to '/'), if your document location is 'www.jasoncust.com', only cookies with domains set to either 'www.jasoncust.com' or '.jasoncust.com' are accessible. Any other subdomains including 'jasoncust.com' are not accessible. Combining paths with domains, if the document location is 'www.jasoncust.com/some/path/', only cookies with a domain of 'www.jasoncust.com' (or '.jasoncust.com') and paths of '/', '/some' or '/some/path' are accessible.

Retrieving cookie values is easy enough through the interface, simply go to www.nytimes.com, open a console and then enter document.cookie to see a string containing all of the cookies accessible from the current host domain and path. It will look something like:
> document.cookie
"RMID=0734175263b54f0d07f7801a; adxcs=s*2b53d=0:1; adxcl=t*2b53d=4f32014f:1326254071|ti=4f32014f:1326254071"
The format of the returned string is name=value with multiple cookies joined by a semicolon + space ('; '). So in order to get a particular cookie's value, a bit of parsing is required. One major caveat is only the cookie's value is returned. The cookie's path, domain, secure setting, and the expiration/max-age are not returned through the interface which can be problematic.

Why is this problematic? Lets walk through an example, albeit a trivial one, to illuminate this issue. Starting with two cookies:

name value domain path
uuid 19691231 .nytimes.com /
xid 42 .nytimes.com /

If the document location is 'www.nytimes.com', then the returned cookies will be:
> document.cookie
"uuid=19691231; xid=42"
To set or update a cookie a string representing the cookie is assigned to document.cookie. For example, to update the 'xid' cookie to 'lorum', a first approach might look like:
> document.cookie = "xid=lorum";
Checking the cookie value returns:
> document.cookie
"uuid=19691231; xid=42; xid=lorum"
What happened? There are now two cookies with the name 'xid'. Looking at the table again we see why this happened:

name value domain path
uuid 19691231 .nytimes.com /
xid 42 .nytimes.com /
xid lorum www.nytimes.com /

The two cookies while sharing the same name have a different subdomain. Since the domain wasn't specified in the assignment, it defaulted to the host domain ('www.nytimes.com') in the example. So in order to update an existing cookie the same domain and path originally assigned to the cookie must be used. However no meta data is returned via document.cookie. This means any cookies that are to be manipulated by JavaScript need to have a set domain and path (probably the root domain and path) that both the server and the JavaScript code use when setting cookies. This also applies to any optional cookie settings such as the expiration and the secure flag.

Assuming for some reason it was desirable to have two cookies with the same name but a different subdomain and/or path, how do you know which cookie is what? Since document.cookies does not return the domain or path of the cookies, there is no way to tell. This may seem trivial, but what if an operation needed to clear or overwrite the 'xid' cookie with the value 'lorum'? How would it do so?

tl;dr

When using the document.cookie interface, the lack of any meta data about the cookies returned can cause issues if not properly designed around. Two key points to remember are:
  1. Cookies with the same name but different subdomains and/or paths are allowed but this meta information is not returned by the interface. So there is no way to tell which cookie value is for what domain/path.
  2. When updating a cookie, the original cookie is replaced if the name/domain/path match. If they don't match, a new cookie with the same name but different domain/path will be created. Also, when replacing a cookie, any other meta data is overwritten even if not explicitly stated.
Should there be an update to the interface for ECMA5? I'm not positive since cookies should only be used if the data needs to be included in every request to/from the server. Otherwise newer APIs such as the Storage APIs probably are a better design choice and don't have this issue. Still, it would be a nice addition to have access to the meta data for each cookie when you do need to work with them from both the client and the server.

11 comments: