Thursday, April 26, 2012

The Lion Who Was An Insomniac

Sometime over the last couple of weeks my MacBook Pro running OS X Lion (10.7.3) decided it no longer needed to sleep. While peculiar, I didn't worry about it since I had just installed some software recently and figured that might be the reason for my laptop's newfound bout of insomnia. Well, two weeks later and my poor little guy hadn't slept a wink.

Every day I'd close the lid and go about my business only to find it still purring along waiting for my return. While I do love when my pets are excited to see me, this seemed to be a bit of a waste. So, like any good techie, I backtracked everything I had done over the last two weeks to see what could be the pea in my laptop's mattress. I checked the software I installed (nope), the updates I applied (nope), the accessories I had plugged in (nope), the second monitor (nope), on and on but nothing seemed to be the thorn in his side.

Now, I could have just restarted my computer to see if that resolved the issue. Two things prevented me from doing this:
1. I am not on Windows.
2. Where is the fun in that?

So, digging into my techie war chest, it was time to hit the old Terminal and get under the hood. First things first, pmset is a godsend for this exact issue. If something is keeping your Mac awake, BAM! you know what it is. Why I didn't just do this from the beginning I don't know, aside from maybe I am a masochist.

$ pmset -g assertions
4/25/12 11:35:13 PM EDT 
Assertion status system-wide:
   ChargeInhibit                           0
   PreventUserIdleDisplaySleep             0
   PreventUserIdleSystemSleep              1
   NoRealPowerSources_debug                0
   CPUBoundAssertion                       0
   EnableIdleSleep                         1
   PreventSystemSleep                      1
   DisableInflow                           0
   DisableLowPowerBatteryWarnings          0
   ExternalMedia                           0

Listed by owning process:
  pid 50: [0x0000012c00000032] PreventSystemSleep named: "org.cups.cupsd"

The last line in the output is the smoking gun. A cupsd process was preventing my laptop from taking a much deserved nap.

Who, what and why? CUPS allows Macs to act as print servers, essentially allowing the computer to queue up print jobs. So, more than likely there was a stalled print job that was keeping the process open, waiting to complete, before allowing the system to go to sleep (which makes logical sense to prevent a print job from being interrupted).

I opened up my print queue from the preferences pane and voila, a stalled job was sitting there mocking me. After quickly killing this unruly guest and checking pmset again:

$ pmset -g assertions
4/25/12 11:37:05 PM EDT 
Assertion status system-wide:
   ChargeInhibit                           0
   PreventUserIdleDisplaySleep             0
   PreventUserIdleSystemSleep              0
   NoRealPowerSources_debug                0
   CPUBoundAssertion                       0
   EnableIdleSleep                         1
   PreventSystemSleep                      0
   DisableInflow                           0
   DisableLowPowerBatteryWarnings          0
   ExternalMedia                           0

No more processes or assertion statuses indicating the system was preventing itself from sleeping. So, with bated breath I closed the lid and waited a couple seconds...pure silence. No purring, no whirring, just silence. My laptop had finally been cured of its insomnia.

Wednesday, January 11, 2012

The problem with document.cookie

Accessing/modifying HTTP cookies with JavaScript is possible through the document.cookie interface which permits two operations: retrieving all of the cookies that are accessible from the current document domain and path; and setting/updating an individual cookie. This interface is limited and can cause unexpected behavior if used without a thorough understanding of the interface and the IETF spec for HTTP cookies.

Browser access/management of cookies is specified by RFC 2109, which states:
  • Cookie uniqueness is controlled by the combination of the cookie's name, domain, and path.
  • When setting a cookie with a specified domain, it must begin with a dot (.jasoncust.com).
  • The domain will default to the host domain's location (www.jasoncust.com).
  • The path will default to the host domain's path up to but not including the rightmost '/'.
  • Cookie access is restricted to the cookie path being a prefix of the document's full path (since paths were originally directory structures, cookies were accessible only from subdirectories).
  • Similarly, cookie access is restricted to the cookie domain being a suffix of the document's domain. Note this is only true for cookie domains starting with the required dot (.jasoncust.com). If a cookie domain was set by default to say 'jasoncust.com', the cookie would not be accessible from any subdomains.
Those last two points could use a little more elaboration as people tend to gloss over them. Regarding just domains (assuming all cookie paths are set to '/'), if your document location is 'www.jasoncust.com', only cookies with domains set to either 'www.jasoncust.com' or '.jasoncust.com' are accessible. Any other subdomains including 'jasoncust.com' are not accessible. Combining paths with domains, if the document location is 'www.jasoncust.com/some/path/', only cookies with a domain of 'www.jasoncust.com' (or '.jasoncust.com') and paths of '/', '/some' or '/some/path' are accessible.
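
As a rough sketch, the domain and path rules described above can be written as a small check. The helper name isCookieAccessible and its arguments are made up for illustration, and it only implements the rules as listed here, not everything a real browser does:

```javascript
// Hypothetical helper: applies only the domain and path rules listed above.
function isCookieAccessible( cookie, docHost, docPath ) {
  var domainOk;

  if ( cookie.domain.charAt( 0 ) === '.' ) {
    // A dotted cookie domain must be a suffix of the document's host
    domainOk = docHost.slice( -cookie.domain.length ) === cookie.domain;
  }
  else {
    // Without the leading dot only an exact host match counts
    domainOk = docHost === cookie.domain;
  }

  // The cookie path must be a prefix of the document's path
  // (a plain string prefix test, kept simple on purpose)
  var pathOk = docPath.indexOf( cookie.path ) === 0;

  return domainOk && pathOk;
}

// Document location: www.jasoncust.com/some/path/
isCookieAccessible( { domain: '.jasoncust.com', path: '/' }, 'www.jasoncust.com', '/some/path/' );             // true
isCookieAccessible( { domain: 'jasoncust.com', path: '/' }, 'www.jasoncust.com', '/some/path/' );              // false
isCookieAccessible( { domain: 'www.jasoncust.com', path: '/some/path' }, 'www.jasoncust.com', '/some/path/' ); // true
isCookieAccessible( { domain: 'www.jasoncust.com', path: '/other' }, 'www.jasoncust.com', '/some/path/' );     // false
```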

Retrieving cookie values is easy enough through the interface, simply go to www.nytimes.com, open a console and then enter document.cookie to see a string containing all of the cookies accessible from the current host domain and path. It will look something like:
> document.cookie
"RMID=0734175263b54f0d07f7801a; adxcs=s*2b53d=0:1; adxcl=t*2b53d=4f32014f:1326254071|ti=4f32014f:1326254071"
The format of the returned string is name=value with multiple cookies joined by a semicolon + space ('; '). So in order to get a particular cookie's value, a bit of parsing is required. One major caveat is only the cookie's value is returned. The cookie's path, domain, secure setting, and the expiration/max-age are not returned through the interface which can be problematic.
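
The "bit of parsing" could look something like the sketch below. The helper names parseCookies and getCookieValues are hypothetical. It splits on the first '=' only, since values may contain '=' themselves, and it returns every value found for a name rather than assuming there is exactly one:

```javascript
// Hypothetical helper: split a document.cookie style string into name/value pairs.
function parseCookies( cookieString ) {
  var pairs = [];

  if ( !cookieString ) {
    return pairs;
  }

  cookieString.split( '; ' ).forEach(function( part ) {
    // Split on the first '=' only, values may contain '=' as well
    var idx = part.indexOf( '=' );

    pairs.push({
      name: idx === -1 ? part : part.slice( 0, idx ),
      value: idx === -1 ? '' : part.slice( idx + 1 )
    });
  });

  return pairs;
}

// Hypothetical helper: return all values stored under a given cookie name.
function getCookieValues( cookieString, name ) {
  return parseCookies( cookieString )
    .filter(function( pair ) { return pair.name === name; })
    .map(function( pair ) { return pair.value; });
}

// In a browser the first argument would be document.cookie
var sample = 'RMID=0734175263b54f0d07f7801a; adxcs=s*2b53d=0:1; adxcl=t*2b53d=4f32014f:1326254071|ti=4f32014f:1326254071';
getCookieValues( sample, 'adxcs' ); // [ 's*2b53d=0:1' ]
```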

Why is this problematic? Let's walk through an example, albeit a trivial one, to illuminate this issue. Starting with two cookies:

name  value     domain        path
uuid  19691231  .nytimes.com  /
xid   42        .nytimes.com  /

If the document location is 'www.nytimes.com', then the returned cookies will be:
> document.cookie
"uuid=19691231; xid=42"
To set or update a cookie a string representing the cookie is assigned to document.cookie. For example, to update the 'xid' cookie to 'lorum', a first approach might look like:
> document.cookie = "xid=lorum";
Checking the cookie value returns:
> document.cookie
"uuid=19691231; xid=42; xid=lorum"
What happened? There are now two cookies with the name 'xid'. Looking at the table again we see why this happened:

name  value     domain           path
uuid  19691231  .nytimes.com     /
xid   42        .nytimes.com     /
xid   lorum     www.nytimes.com  /

The two cookies, while sharing the same name, have different domains. Since the domain wasn't specified in the assignment, it defaulted to the host domain ('www.nytimes.com') in the example. So in order to update an existing cookie, the same domain and path originally assigned to the cookie must be used. However, no meta data is returned via document.cookie. This means any cookies that are to be manipulated by JavaScript need to have a set domain and path (probably the root domain and path) that both the server and the JavaScript code use when setting cookies. This also applies to any optional cookie settings such as the expiration and the secure flag.
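
One way to keep the domain and path consistent is to build every cookie string through a single helper. This is only a sketch: buildCookie and the shared settings object are hypothetical names, and the domain and path values are the ones from the example table:

```javascript
// Hypothetical helper: build the string that gets assigned to document.cookie.
function buildCookie( name, value, options ) {
  var parts = [ name + '=' + value ];

  if ( options.domain ) { parts.push( 'domain=' + options.domain ); }
  if ( options.path ) { parts.push( 'path=' + options.path ); }
  if ( options.expires ) { parts.push( 'expires=' + options.expires.toUTCString() ); }
  if ( options.secure ) { parts.push( 'secure' ); }

  return parts.join( '; ' );
}

// Agreed upon domain and path, shared by the server and the JavaScript code
var shared = { domain: '.nytimes.com', path: '/' };

// Updates the existing 'xid' cookie instead of creating a second one
var update = buildCookie( 'xid', 'lorum', shared );
// 'xid=lorum; domain=.nytimes.com; path=/'

// Removing a cookie needs the same domain and path plus a date in the past
var remove = buildCookie( 'xid', '', { domain: shared.domain, path: shared.path, expires: new Date( 0 ) } );
// 'xid=; domain=.nytimes.com; path=/; expires=Thu, 01 Jan 1970 00:00:00 GMT'

// In a browser: document.cookie = update;
```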

Assuming for some reason it was desirable to have two cookies with the same name but a different subdomain and/or path, how do you know which cookie is what? Since document.cookie does not return the domain or path of the cookies, there is no way to tell. This may seem trivial, but what if an operation needed to clear or overwrite the 'xid' cookie with the value 'lorum'? How would it do so?

tl;dr

When using the document.cookie interface, the lack of any meta data about the cookies returned can cause issues if not properly designed around. Two key points to remember are:
  1. Cookies with the same name but different subdomains and/or paths are allowed but this meta information is not returned by the interface. So there is no way to tell which cookie value is for what domain/path.
  2. When updating a cookie, the original cookie is replaced if the name/domain/path match. If they don't match, a new cookie with the same name but different domain/path will be created. Also, when replacing a cookie, any other meta data is overwritten even if not explicitly stated.
Should there be an update to the interface for ECMA5? I'm not positive since cookies should only be used if the data needs to be included in every request to/from the server. Otherwise newer APIs such as the Storage APIs probably are a better design choice and don't have this issue. Still, it would be a nice addition to have access to the meta data for each cookie when you do need to work with them from both the client and the server.

Friday, January 6, 2012

IIFEs, Closures, Unit Testing and Privacy

One of the toughest decisions within JavaScript code design is the choice between testability and privacy. It's definitely one of the most frustrating aspects of JavaScript for me personally (and that is definitely saying something). This choice can lead to a less than ideal design if unit testing is important to you... which it should be.

Let's start with an example to see how this choice can affect design. We are going to create an object that will represent a basic parallelogram that requires as input the measurements of two adjacent sides and an angle. Assuming the second measurement is the base, the object will internally compute the height and area of the parallelogram from the inputs on object creation, exposing the original measurements and the two new computed values.

A typical design pattern for an object in JavaScript is to use an IIFE (Immediately Invoked Function Expression) to create a closure so we don't pollute the global namespace. A simple design (we aren't concerning ourselves with optimizations for this discussion) could look like:
(function( win, undef ) {
  // Global namespace object name
  var objName = 'myAwesomeParallelogram';

  // Compute the height of a parallelogram where length is the adjacent side
  // of the base and the angle is acute in radians length * sin(angle)
  function height( length, angle ) {
    return length * Math.sin( Math.min( angle, 180 - angle ) * Math.PI/180 );
  }

  // Compute the area of a parallelogram b * h
  function area( base, height ) {
    return base * height;
  }

  var obj = function( sideA, sideB, angle ) {
    if ( angle < 0 || angle > 180 ) {
      throw new SyntaxError('Angle is not between 0 and 180 degrees');
    }

    this.sideA = sideA;
    this.sideB = sideB;
    this.angle = angle;

    this.height = height( sideA, angle );
    this.area = area( sideB, this.height );
  };

  win[ objName ] = obj;
})(window);
Sample usage:
> var p = new myAwesomeParallelogram( 15, 12, 45 );
undefined
> p.area
127.27922061357853
> p.height
10.606601717798211
Great! But aside from testing the object itself, how would we be able to test if height or area work correctly? How would we test the code flow? We could redesign our code to expose these functions and we could likewise include some logging calls to console to show the flow. But, is this what we want in production? Is it necessary?

There have been a few clever ways of handling this dilemma (I know I rolled a few of my own over the years) but Mr. Douglas Crockford (the author of JavaScript: The Good Parts and JSLint) has just released a new tool to help end the Sophie's choice between good design/security and testability. It's called JSDev and it allows you to include specially formatted comments that can be transformed into development unit testing code and then removed through normal minification processes.

To use the new tool we first need to download the raw file (or use git) and compile it locally for a CLI.
$ curl 'https://raw.github.com/douglascrockford/JSDev/master/jsdev.c' -o jsdev.c && gcc jsdev.c -o jsdev
Now that we have a compiled version of jsdev we can edit our original object code to include development only code for unit testing.
(function( win, undef ) {
  // Global namespace object name
  var objName = 'myAwesomeParallelogram';

  // Compute the height of a parallelogram where length is the adjacent side
  // of the base and the angle is acute in radians length * sin(angle)
  function height( length, angle ) {
    return length * Math.sin( Math.min( angle, 180 - angle ) * Math.PI/180 );
  }

  // Compute the area of a parallelogram b * h
  function area( base, height ) {
    return base * height;
  }

  var obj = function( sideA, sideB, angle ) {
    if ( angle < 0 || angle > 180 ) {
      throw new SyntaxError('Angle is not between 0 and 180 degrees');
    }

    this.sideA = sideA;
    this.sideB = sideB;
    this.angle = angle;

    this.height = height( sideA, angle );
    this.area = area( sideB, this.height );
  };

  // JSDev code comments
  /*dev
    obj.height = height;
    obj.area = area;
  */
 
  win[ objName ] = obj;
})(window);
Now if we process it through our handy-dandy new tool:
$ ./jsdev dev -comment "Development Version" < input.js > output-dev.js
The new file, output-dev.js looks like:
// Development Version
(function( win, undef ) {
  // Global namespace object name
  var objName = 'myAwesomeParallelogram';

  // Compute the height of a parallelogram where length is the adjacent side
  // of the base and the angle is acute in radians length * sin(angle)
  function height( length, angle ) {
    return length * Math.sin( Math.min( angle, 180 - angle ) * Math.PI/180 );
  }

  // Compute the area of a parallelogram b * h
  function area( base, height ) {
    return base * height;
  }

  var obj = function( sideA, sideB, angle ) {
    if ( angle < 0 || angle > 180 ) {
      throw new SyntaxError('Angle is not between 0 and 180 degrees');
    }

    this.sideA = sideA;
    this.sideB = sideB;
    this.angle = angle;

    this.height = height( sideA, angle );
    this.area = area( sideB, this.height );
  };

  // JSDev code comments
  {
    obj.height = height;
    obj.area = area;
  }
 
  win[ objName ] = obj;
})(window);
Before we continue there are a few things to notice between the input file, the command and the output file:
  • The comment from the CLI is now a header comment in the output file. This is handy to always include the type of output file in the file.
  • The other arguments for the CLI specify which multi line comments that match the format /*<argument> <code>*/ to include in the output file. This is handy if you have different testing levels or needs. Note that there cannot be a space between the opening comment notation and the argument.
  • Single line comment notations ("//") are unaffected by the jsdev tool as well as multi line comments that don't match the specified format above.
Now let's see if we can access our development code private functions using a console.
> myAwesomeParallelogram.area
  function area( base, height ) {
    return base * height;
  }
> myAwesomeParallelogram.height
  function height( length, angle ) {
    return length * Math.sin( Math.min( angle, 180 - angle ) * Math.PI/180 );
  }
And when we minify the original code we get this (using JSMin for this example):
(function(win,undef){var objName='myAwesomeParallelogram';function height(length,angle){return length*Math.sin(Math.min(angle,180-angle)*Math.PI/180);}
function area(base,height){return base*height;}
var obj=function(sideA,sideB,angle){if(angle<0||angle>180){throw new SyntaxError('Angle is not between 0 and 180 degrees');}
this.sideA=sideA;this.sideB=sideB;this.angle=angle;this.height=height(sideA,angle);this.area=area(sideB,this.height);};win[objName]=obj;})(window);
We no longer need to sacrifice our design and security for testability. Is it the most elegant solution? Maybe not, but it works well and can be easily added to a build process for testing and will not impact your current build process for production.
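
With height and area exposed in the development build, unit tests can call them directly. The sketch below is one way that might look. It inlines a trimmed copy of the development output so it can run on its own, and fakeWindow and assertClose are hypothetical stand-ins (for the browser's window and for whatever test framework you use):

```javascript
// Stand-in for the browser's window so the sketch runs anywhere (assumption)
var fakeWindow = {};

// Trimmed copy of the development build from above
(function( win ) {
  function height( length, angle ) {
    return length * Math.sin( Math.min( angle, 180 - angle ) * Math.PI/180 );
  }

  function area( base, height ) {
    return base * height;
  }

  var obj = function( sideA, sideB, angle ) {
    this.height = height( sideA, angle );
    this.area = area( sideB, this.height );
  };

  // What the /*dev ... */ comment becomes in the development build
  {
    obj.height = height;
    obj.area = area;
  }

  win.myAwesomeParallelogram = obj;
})( fakeWindow );

// Hypothetical test helper: compare floats with a small tolerance
function assertClose( actual, expected, message ) {
  if ( Math.abs( actual - expected ) > 1e-9 ) {
    throw new Error( message + ': expected ' + expected + ', got ' + actual );
  }
}

var P = fakeWindow.myAwesomeParallelogram;

// The private functions can now be tested on their own
assertClose( P.height( 15, 45 ), 15 * Math.SQRT1_2, 'height at 45 degrees' );
assertClose( P.height( 15, 135 ), P.height( 15, 45 ), 'obtuse angle uses its supplement' );
assertClose( P.area( 12, 10 ), 120, 'area is base * height' );
```

Since the production build never contains the exposed properties, tests like these only make sense against the jsdev output.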

For more information and more usage info, please refer to the README file on GitHub.

Thursday, January 5, 2012

Chrome, Prerender and Site Stats

As Chrome keeps inching towards market dominance, it is definitely helping to move the web forward and speeding it up along the way. One of the interesting ways they have helped speed up the web (as of Chrome 13) is through the use of "prerendering". Basically, prerendering will begin to download a page that is linked from the current page via link declarations on the current page like so:
<link rel="prerender" href="/someOtherAwesomeness.html">
<link rel="prerender" href="http://anotherAwesome.website.com">
These pages will now be downloaded in the background while the user is currently on the page these are linked from. When/if the user follows a link to one of those pages, it will feel nearly instantaneous to the user since the page is already loaded.

Pretty interesting trick on Google's part to make Chrome feel faster. As one can imagine, to avoid unnecessary overhead to websites linked from one page, discretion should be practiced in selecting what if any pages should be prerendered from the current page being viewed. Easy candidates are next-type pages and other links that have a high click through rate.

How does this impact site stats? It depends on whether you can truly count a page view if a page was prerendered but then never seen. If you just care about inflated numbers, then this is probably a welcome exploit for increasing views across a larger number of pages even though a person truly only viewed one page. But, if you do care about them for being able to understand user interaction with your content, this could skew your numbers in a horrible way. These false views will increase the denominator in your analytics formulas without a chance of increase to the numerator, thus making your impact for user actions on those pages seem less effective.
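
To put rough numbers on that, here is a tiny worked example. The figures are invented purely for illustration, and actionRate is a hypothetical helper:

```javascript
// Hypothetical helper: share of page views that led to a user action
function actionRate( actions, views ) {
  return actions / views;
}

// Made-up numbers, purely for illustration
var realViews = 1000;       // pages a person actually looked at
var prerenderedOnly = 250;  // pages that were prerendered but never seen
var clicks = 50;            // user actions, which can only happen on seen pages

var trueRate = actionRate( clicks, realViews );                      // 0.05
var skewedRate = actionRate( clicks, realViews + prerenderedOnly );  // 0.04
```

The same 50 clicks look like a 4% rate instead of 5% once the unseen prerendered views are counted.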

The good news is that individual sites will have to enable this in their code for now and hence adoption will be slow. But with Google search results pages utilizing this feature already and the Chrome Beta released today enabling it from the omnibox, these false views will quickly start to accumulate.

So how do you prevent that from happening? Depends on how you determine page views.

If you use Google Analytics, you are already covered with the 2011-07 release (July 26, 2011) as it uses the Page Visibility API (but only with the webkit prefix, hopefully they keep up with other vendor releases as well).

If not, you can roll your own by using the Page Visibility API yourself. Continuing on from the example from my previous post, we can build a simple check for determining whether or not a page has been prerendered:
var visibilityAPI = ( typeof document.hidden != 'undefined' && { 'visibilitystate': 'visibilityState', 'visibilitychange': 'visibilitychange' } ) || ( typeof document.webkitHidden != 'undefined' && { 'visibilitystate': 'webkitVisibilityState', 'visibilitychange': 'webkitvisibilitychange' } ) || ( typeof document.mozHidden != 'undefined' && { 'visibilitystate': 'mozVisibilityState', 'visibilitychange': 'mozvisibilitychange' } ) || ( typeof document.msHidden != 'undefined' && { 'hidden': 'msHidden', 'visibilitystate': 'msVisibilityState', 'visibilitychange': 'msvisibilitychange' } );

(function pageVisibilityChanged() {
    if ( document[ visibilityAPI.visibilitystate ] === "prerender" ) {
        // Application has been prerendered
        // Add listener to fire when page is no longer in prerender state
        document.addEventListener( visibilityAPI.visibilitychange, pageVisibilityChanged, false );
    }
    else {
        // Page is no longer in prerender state
        // A page view can be counted now
        document.removeEventListener( visibilityAPI.visibilitychange, pageVisibilityChanged, false );
    }
})();
There is still one problem with this example: the person may never actually see your site even after they follow one of the links. For example, they could open the page in a new tab behind their current one and then close it before they view it. Yes, this is probably an edge case, but you could easily expand the previous example to only count when the page is truly viewed.
var visibilityAPI = ( typeof document.hidden != 'undefined' && { 'hidden': 'hidden', 'visibilitystate': 'visibilityState', 'visibilitychange': 'visibilitychange' } ) || ( typeof document.webkitHidden != 'undefined' && { 'hidden': 'webkitHidden', 'visibilitystate': 'webkitVisibilityState', 'visibilitychange': 'webkitvisibilitychange' } ) || ( typeof document.mozHidden != 'undefined' && { 'hidden': 'mozHidden', 'visibilitystate': 'mozVisibilityState', 'visibilitychange': 'mozvisibilitychange' } ) || ( typeof document.msHidden != 'undefined' && { 'hidden': 'msHidden', 'visibilitystate': 'msVisibilityState', 'visibilitychange': 'msvisibilitychange' } );

(function pageVisibilityChanged() {
    if ( document[ visibilityAPI.visibilitystate ] === "prerender" || document[ visibilityAPI.hidden ] ) {
        // Page has been prerendered or is hidden (e.g. opened in a background tab)
        // Add listener to fire when the visibility state changes
        document.addEventListener( visibilityAPI.visibilitychange, pageVisibilityChanged, false );
    }
    else {
        // Page is no longer in prerender state or hidden
        // A true page view can be counted now
        document.removeEventListener( visibilityAPI.visibilitychange, pageVisibilityChanged, false );
    }
})();
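
If you want to exercise this flow outside of a browser, the same idea can be written as a function that takes the document as a parameter. This is only a sketch: countViewWhenSeen, the names object and the fake document below are all hypothetical and not part of any API:

```javascript
// Hypothetical helper: calls onView once the page is neither prerendered nor hidden.
// `doc` is the document (or a stand-in), `names` holds the property and event names.
function countViewWhenSeen( doc, names, onView ) {
  function check() {
    if ( doc[ names.visibilityState ] === 'prerender' || doc[ names.hidden ] ) {
      // Not seen yet, keep waiting for a visibility change
      doc.addEventListener( names.visibilitychange, check, false );
    }
    else {
      // Seen by the user, stop listening and count the view
      doc.removeEventListener( names.visibilitychange, check, false );
      onView();
    }
  }

  check();
}

// Minimal stand-in for document that remembers a single listener
var fakeDoc = {
  hidden: true,
  visibilityState: 'prerender',
  listener: null,
  addEventListener: function( type, fn ) { this.listener = fn; },
  removeEventListener: function() { this.listener = null; }
};

var standardNames = { hidden: 'hidden', visibilityState: 'visibilityState', visibilitychange: 'visibilitychange' };
var views = 0;

countViewWhenSeen( fakeDoc, standardNames, function() { views += 1; } );
// Prerendered: nothing counted yet

fakeDoc.visibilityState = 'hidden';
fakeDoc.listener();
// Opened in a background tab: still nothing counted

fakeDoc.hidden = false;
fakeDoc.visibilityState = 'visible';
fakeDoc.listener();
// Now visible: views is 1 and the listener has been removed
```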
Now you are one small step closer to more accurate reporting!

Sunday, January 1, 2012

Can you see me? Can you see me now?

What is the Page Visibility API and what is it good for? It is a JavaScript API that allows developers to check if their page/application is visible and to attach an event listener for visibility state changes. What is that useful for? Well, let's discuss web applications in general for a moment to see how and why it is something developers should be concerned with.

Every window or tab a user opens for services like Google+, Facebook, Gmail, Twitter, and Reddit increases the resource requirement for each application to poll, update, animate and perform other intensive operations. The big question is do end-users really have to pay the resource costs for each of these web applications?

To answer this question, we can ask a few more questions about web applications in general:
  1. Do they need to constantly run?
  2. What operations need to be run and at what rate?
  3. Can these operation rates be altered based on interaction types/levels?
  4. What does user interaction entail?
First, let me state this is not a blanket solution nor an advocacy that all applications should be handled in this manner. That said, developers should think about not only how their applications are actively used but also what they need to do when not in active use.

Second, let's use a simple example to provide a context for pondering these questions. A basic news or message feed would work well enough. It is based around a data stream that is updated both asynchronously and irregularly so some form of polling and updating the stream with new items is required. Also, dynamic animation showing new items will be used to display these updates to an end-user.

Finally, the first three questions are in one way or another predicated on the answer for the fourth so we will begin there.

What does user interaction entail? The obvious cases are when a user is interacting with the application using an input device (mouse, keyboard, mic, camera, finger, stylus, etc.). But can someone use an application without interacting with it directly? What if it is open and updating with new items that the user reads as they show up on the screen? So, the commonality amongst all of these use cases for our application is that the user needs to see the application in order to use it.

Thinking about that a bit more generally, consider all of the tabs a user has open at any given moment. If only one is visible at a time, why do the rest need to do anything at all (aside from pending operations a user has queued up)? Shouldn't the visible tab be the sole active window?

So a more general question arises: if a user cannot see our application, does it need to run at all?

For our application at least, this makes perfect sense. Why waste resources continuously polling and updating (especially with animations) if the user can't see it? Couldn't we fall back to less frequent polling with no special effects for updating the page? Seems reasonable with little to no user impact. In fact, since the user can't see the page, how would they know?

The only problem is how do we know if the user can see the page? Well, thanks to the Page Visibility API proposal, we can do this today (through browser prefixes at the moment). The API is available with a webkit prefix in Chrome 13+, an ms prefix in IE10+ and a moz prefix in FF10+.

The example below uses the non-prefixed version (which will be the standard) and only checks whether the document is visible or not.
function pageVisibilityChanged() {
  if ( document.hidden ) {
    // Application is not visible to the user
    // Adjust polling rates and display update for inactive display mode
  }
  else {
    // Application is visible to the user
    // Adjust polling rates and display update for active display mode
  }
}

document.addEventListener( 'visibilitychange', pageVisibilityChanged, false );
That's it. Really. That's all you need to do to be a great application neighbor and also help reduce energy use by reducing unnecessary resource use by your application.
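
What goes inside those two branches depends on the application. For the feed example, a minimal sketch might just switch between two sets of settings. The interval values and the feedSettings helper are hypothetical, so pick numbers that suit your own application:

```javascript
// Hypothetical polling intervals, in milliseconds
var ACTIVE_POLL_MS = 5000;
var HIDDEN_POLL_MS = 60000;

// Hypothetical helper: choose feed settings based on visibility
function feedSettings( isHidden ) {
  return {
    // Poll less often when nobody can see the feed
    pollMs: isHidden ? HIDDEN_POLL_MS : ACTIVE_POLL_MS,
    // No point animating new items on a hidden page
    animate: !isHidden
  };
}

var current = feedSettings( false );

// Only wire up the listener where a document exists (i.e. in a browser)
if ( typeof document !== 'undefined' ) {
  document.addEventListener( 'visibilitychange', function() {
    current = feedSettings( document.hidden );
  }, false );
}
```

The polling loop would then read current.pollMs each time it schedules its next request.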

UPDATE: Here is a browser prefix version for those who want to play today with Chrome 13+, IE10+ and FF10+ and any browser that implements the standard.
var visibilityAPI = ( typeof document.hidden != 'undefined' && { 'hidden': 'hidden', 'visibilitychange': 'visibilitychange' } ) || ( typeof document.webkitHidden != 'undefined' && { 'hidden': 'webkitHidden', 'visibilitychange': 'webkitvisibilitychange' } ) || ( typeof document.mozHidden != 'undefined' && { 'hidden': 'mozHidden', 'visibilitychange': 'mozvisibilitychange' } ) || ( typeof document.msHidden != 'undefined' && { 'hidden': 'msHidden', 'visibilitychange': 'msvisibilitychange' } );

function pageVisibilityChanged() {
  if ( document[ visibilityAPI.hidden ] ) {
    // Application is not visible to the user
    // Adjust polling rates and display update for inactive display mode
  }
  else {
    // Application is visible to the user
    // Adjust polling rates and display update for active display mode
  }
}

document.addEventListener( visibilityAPI.visibilitychange, pageVisibilityChanged, false );

Thursday, December 29, 2011

Bash sort... what the hell is up with tab separators?

UPDATE: I have updated the sort Wiki page to include an example for tab separated sorting.

In the world of Unix shells there exists a very common one called Bash. And within the Bash shell there are a whole host of commands that can be used both on the command line and in a script file. These commands run the gamut for what they can do, but there is a small subset that most people find their daily lives centered around when it comes to programming or hacking out solutions such as sed, cut, and sort.

Fairly common inputs to these commands are delimiter-separated values such as CSVs and TSVs. Thankfully almost every command that you would use for these formats allows you to specify the delimiter.

For instance, say we have a TSV file called phonebook that contains the name and number for each contact:
$ cat phonebook 
Smith, Brett 555-4321
Doe, John 555-1234
Doe, Jane 555-3214
Avery, Cory 555-4132
Fogarty, Suzie 555-2314
With cut you could get just the names if you wanted:
$ cut -f1 phonebook 
Smith, Brett
Doe, John
Doe, Jane
Avery, Cory
Fogarty, Suzie
How did it know what the delimiter was? Luckily with cut the default delimiter is the tab character. What if it's not though? Looking at the man page for cut gives us:
$ man cut
...
     -d delim
             Use delim as the field delimiter character instead of the tab character.
...
So, say you wanted just the last name for everyone. Well, you can pipe the output from the first command to a second command to do just that! The only difference is now we are going to specify the delimiter to be a comma for the second command.
$ cut -f1 phonebook | cut -f1 -d ','
Smith
Doe
Doe
Avery
Fogarty
Nice! Now, let's check out sort. First, let's sort our phonebook by last name:
$ sort -k1,1 phonebook 
Avery, Cory 555-4132
Doe, Jane 555-3214
Doe, John 555-1234
Fogarty, Suzie 555-2314
Smith, Brett 555-4321
That works well enough. Now let's sort by phone numbers:
$ sort -k2,2 phonebook 
Smith, Brett 555-4321
Avery, Cory 555-4132
Doe, Jane 555-3214
Doe, John 555-1234
Fogarty, Suzie 555-2314
Well... that's not right. It sorted by first name instead. Hmmmmm. Looking at the man page we see:
$ man sort
...
       -t, --field-separator=SEP
              use SEP instead of non-blank to blank transition
...
OK, let's add our trusty tab character:
$ sort -k2,2 -t '\t' phonebook 
sort: multi-character tab `\\t'
Uhhhhh... multi-character? Looks like sort doesn't interpret '\t' as a tab character, but instead a literal '\' and 't'. Put another way:
$ echo -n '\t' | hexdump -c
0000000   \   t                                                        
0000002
So, how do we set the separator to be a tab character? The beginner's bash guide provides some guidance on that:
3.3.5. ANSI-C quoting

Words in the form "$'STRING'" are treated in a special way. The word expands to a string, with backslash-escaped characters replaced as specified by the ANSI-C standard. Backslash escape sequences can be found in the Bash documentation.
Using our echo example again:
$ echo -n $'\t' | hexdump -c
0000000  \t                                                            
0000001
Yup, one character. Trying out phone number sort again:
$ sort -k2,2 -t $'\t' phonebook 
Doe, John 555-1234
Fogarty, Suzie 555-2314
Doe, Jane 555-3214
Avery, Cory 555-4132
Smith, Brett 555-4321
BINGO! Now we are truly sorting on the second column.

tl;dr

Turns out sort is similar to echo in that by default escaped characters are interpreted as two character literals rather than the intended escaped character. While echo -e does provide a means to do so, sort does not. So we must use the ANSI-C quoting (or some other means).
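
For comparison only, the same distinction can be sketched in JavaScript, where the escape is interpreted inside the string literal itself. The sortByField helper is hypothetical, and the phonebook entries are assumed to have a real tab between the name and the number:

```javascript
// A literal backslash followed by 't' is two characters, a tab is one
var twoChars = '\\t'.length; // 2
var oneChar = '\t'.length;   // 1

var phonebook = [
  'Smith, Brett\t555-4321',
  'Doe, John\t555-1234',
  'Doe, Jane\t555-3214',
  'Avery, Cory\t555-4132',
  'Fogarty, Suzie\t555-2314'
];

// Hypothetical helper: sort lines by one separator-delimited field (0-based index)
function sortByField( lines, separator, fieldIndex ) {
  return lines.slice().sort(function( a, b ) {
    var x = a.split( separator )[ fieldIndex ];
    var y = b.split( separator )[ fieldIndex ];
    return x < y ? -1 : ( x > y ? 1 : 0 );
  });
}

// Sort on the second column using a real tab as the separator
var byNumber = sortByField( phonebook, '\t', 1 );
// byNumber[ 0 ] is 'Doe, John\t555-1234'
```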

Saturday, July 9, 2011

Robot Chicken - Star Wars Episode III

Here we go again. I really cannot get enough of these guys. I won't break it down this time. Just sit back and enjoy as Palpatine looks back on his life: full video at adultswim.com.