UPDATE: I have updated the
sort Wiki page to include an example for tab-separated sorting.

In the world of Unix shells, Bash is one of the most common. From within Bash you have a whole host of commands available, both on the command line and in a script file. These commands run the gamut in what they can do, but most people find their daily programming and hacking centered around a small subset, such as sed, cut, and sort.

Fairly common inputs to these commands are delimiter-separated values such as CSVs and TSVs. Thankfully, almost every command that you would use for these formats allows you to specify the delimiter.

For instance, say we have a TSV file called
phonebook that contains the name and number for each contact:

$ cat phonebook
Smith, Brett    555-4321
Doe, John       555-1234
Doe, Jane       555-3214
Avery, Cory     555-4132
Fogarty, Suzie  555-2314

With cut you could get just the names if you wanted:

$ cut -f1 phonebook
Smith, Brett
Doe, John
Doe, Jane
Avery, Cory
Fogarty, Suzie

How did it know what the delimiter was? Luckily, with cut the default delimiter is the tab character. What if it's not, though? Looking at the man page for cut gives us:

$ man cut
...
-d delim
Use delim as the field delimiter character instead of the tab character.
...
So, say you wanted just the last name for everyone. Well, you can pipe the output from the first command to a second command to do just that! The only difference is that now we specify a comma as the delimiter for the second command.

$ cut -f1 phonebook | cut -f1 -d ','
Smith
Doe
Doe
Avery
Fogarty

Nice! Now, let's check out sort. First, let's sort our phonebook by last name:

$ sort -k1,1 phonebook
Avery, Cory     555-4132
Doe, Jane       555-3214
Doe, John       555-1234
Fogarty, Suzie  555-2314
Smith, Brett    555-4321

That works well enough. Now let's sort by phone numbers:

$ sort -k2,2 phonebook
Smith, Brett    555-4321
Avery, Cory     555-4132
Doe, Jane       555-3214
Doe, John       555-1234
Fogarty, Suzie  555-2314

Well... that's not right. It sorted by first name instead. Hmmmmm. Looking at the man page we see:

$ man sort
...
-t, --field-separator=SEP
use SEP instead of non-blank to blank transition
...
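To make "non-blank to blank transition" a bit more concrete, here is a rough sketch that uses awk as a stand-in. awk's default field splitting is close to sort's, though not identical (sort keeps the leading blanks as part of each field), so treat this as an illustration rather than a view into sort itself. The variable names are made up for the example, and only two phonebook rows are used to keep it short.

```shell
# Rebuild two rows of the sample phonebook; printf turns \t into a real tab.
phonebook=$(printf '%s\t%s\n' \
  'Smith, Brett' '555-4321' \
  'Doe, John' '555-1234')

# Default splitting: spaces and tabs both separate fields,
# so field 2 is the first name, not the phone number.
default_field2=$(printf '%s\n' "$phonebook" | awk '{ print $2 }')

# Tab-only splitting: field 2 is the phone number.
# Note: awk itself interprets the \t escape in -F, unlike sort's -t.
tab_field2=$(printf '%s\n' "$phonebook" | awk -F '\t' '{ print $2 }')

printf 'default field 2:\n%s\n' "$default_field2"
printf 'tab field 2:\n%s\n' "$tab_field2"
```

With the default splitting, the space after the comma already ends the first field, which is why the -k2,2 sort above ended up ordering by first name.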
OK, let's add our trusty tab character:

$ sort -k2,2 -t '\t' phonebook
sort: multi-character tab `\\t'

Uhhhhh... multi-character? Looks like sort doesn't interpret '\t' as a tab character. Inside single quotes the shell passes the backslash through unchanged, so sort receives a literal '\' and 't'. Put another way:

$ echo -n '\t' | hexdump -c
0000000   \   t
0000002

So, how do we set the separator to be a tab character? The beginner's bash guide provides some guidance on that:
3.3.5. ANSI-C quoting
Words in the form "$'STRING'" are treated in a special way. The word expands to a string, with backslash-escaped characters replaced as specified by the ANSI-C standard. Backslash escape sequences can be found in the Bash documentation.

Using our echo example again:

$ echo -n $'\t' | hexdump -c
0000000  \t
0000001

Yup, one character. Trying out phone number sort again:
$ sort -k2,2 -t $'\t' phonebook
Doe, John       555-1234
Fogarty, Suzie  555-2314
Doe, Jane       555-3214
Avery, Cory     555-4132
Smith, Brett    555-4321

BINGO! Now we are truly sorting on the second column.
tl;dr
Turns out sort is similar to echo in that, by default, a backslash escape such as '\t' is treated as two literal characters rather than the intended escaped character. While echo -e provides a way to interpret the escape, sort has no such option. So we must use ANSI-C quoting (or some other means) to hand sort a real tab character.
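As one possible take on "some other means", the sketch below lets printf produce the tab and stores it in a variable, which should also work in shells that lack $'...' quoting. The variable names (tab, phonebook, sorted_by_number) are only illustrative, and the phonebook is rebuilt inline so the snippet can run on its own.

```shell
# Let printf produce the tab character. Command substitution strips only
# trailing newlines, so the tab itself survives.
tab=$(printf '\t')

# Rebuild the sample phonebook as tab-separated lines.
phonebook=$(printf '%s\t%s\n' \
  'Smith, Brett' '555-4321' \
  'Doe, John' '555-1234' \
  'Doe, Jane' '555-3214' \
  'Avery, Cory' '555-4132' \
  'Fogarty, Suzie' '555-2314')

# Quote "$tab" so the shell does not word-split the tab away
# before sort sees it.
sorted_by_number=$(printf '%s\n' "$phonebook" | sort -t "$tab" -k2,2)

printf '%s\n' "$sorted_by_number"
```

The quoting around "$tab" matters here: unquoted, the tab would be treated as whitespace between arguments and sort would not receive it as the separator.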