UPDATE: I have updated the sort Wiki page to include an example for tab separated sorting.

In the world of Unix shells there exists a very common one called Bash. Within Bash there is a whole host of commands that can be used both on the command line and in a script file. These commands run the gamut in what they can do, but most people find their daily programming and hacking centered around a small subset, such as sed, cut, and sort.

Fairly common inputs to these commands are delimiter-separated values such as CSVs and TSVs. Thankfully, almost every command that you would use for these formats allows you to specify the delimiter.

For instance, say we have a TSV file called phonebook
that contains the name and number for each contact:

    $ cat phonebook
    Smith, Brett      555-4321
    Doe, John         555-1234
    Doe, Jane         555-3214
    Avery, Cory       555-4132
    Fogarty, Suzie    555-2314

With cut you could get just the names if you wanted:

    $ cut -f1 phonebook
    Smith, Brett
    Doe, John
    Doe, Jane
    Avery, Cory
    Fogarty, Suzie

How did it know what the delimiter was? Luckily, with cut the default delimiter is the tab character. What if it's not, though? Looking at the man page for cut gives us:

    $ man cut
    ...
    -d delim
            Use delim as the field delimiter character instead of the tab character.
    ...

So, say you wanted just the last name for everyone. Well, you can pipe the output from the first command to a second command to do just that! The only difference is now we are going to specify the delimiter to be a comma for the second command.
    $ cut -f1 phonebook | cut -f1 -d ','
    Smith
    Doe
    Doe
    Avery
    Fogarty

Nice! Now, let's check out sort. First, let's sort our phonebook by last name:

    $ sort -k1,1 phonebook
    Avery, Cory       555-4132
    Doe, Jane         555-3214
    Doe, John         555-1234
    Fogarty, Suzie    555-2314
    Smith, Brett      555-4321

That works well enough. Now let's sort by phone numbers:
    $ sort -k2,2 phonebook
    Smith, Brett      555-4321
    Avery, Cory       555-4132
    Doe, Jane         555-3214
    Doe, John         555-1234
    Fogarty, Suzie    555-2314

Well... that's not right. It sorted by first name instead. Hmmmmm. Looking at the man page we see:
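    $ man sort
    ...
    -t, --field-separator=SEP
            use SEP instead of non-blank to blank transition
    ...

So by default a new field starts at every transition from non-blank to blank, which means the space after the comma made the first name field 2. OK, let's add our trusty tab character: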
    $ sort -k2,2 -t '\t' phonebook
    sort: multi-character tab `\\t'

Uhhhhh... multi-character? Looks like sort doesn't interpret '\t' as a tab character, but instead as a literal '\' and 't'. Put another way:

    $ echo -n '\t' | hexdump -c
    0000000   \   t
    0000002

So, how do we set the separator to be a tab character? The beginner's bash guide provides some guidance on that:
3.3.5. ANSI-C quoting

Words in the form "$'STRING'" are treated in a special way. The word expands to a string, with backslash-escaped characters replaced as specified by the ANSI-C standard. Backslash escape sequences can be found in the Bash documentation.

Using our echo example again:

    $ echo -n $'\t' | hexdump -c
    0000000  \t
    0000001

Yup, one character. Trying out phone number sort again:

    $ sort -k2,2 -t $'\t' phonebook
    Doe, John         555-1234
    Fogarty, Suzie    555-2314
    Doe, Jane         555-3214
    Avery, Cory       555-4132
    Smith, Brett      555-4321

BINGO! Now we are truly sorting on the second column.
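If you want to try this without typing tabs by hand, here is a minimal sketch that rebuilds the sample phonebook and runs the final sort. It assumes Bash (for the $'\t' quoting) and a system with mktemp; the variable names phonebook and sorted_by_number are just ones I picked for illustration.

```shell
#!/usr/bin/env bash
# Build the sample phonebook with a real tab between name and number.
# printf reuses the format string for each name/number pair.
phonebook=$(mktemp)
printf '%s\t%s\n' \
  'Smith, Brett' '555-4321' \
  'Doe, John' '555-1234' \
  'Doe, Jane' '555-3214' \
  'Avery, Cory' '555-4132' \
  'Fogarty, Suzie' '555-2314' > "$phonebook"

# Sort on the second tab-separated field (the phone number),
# passing the tab to sort via ANSI-C quoting.
sorted_by_number=$(sort -t $'\t' -k2,2 "$phonebook")
printf '%s\n' "$sorted_by_number"

# Clean up the temporary file.
rm -f "$phonebook"
```

The rows should come out in the same phone number order as the transcript above, starting with Doe, John.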
tl;dr
Turns out sort is similar to echo in that by default escaped characters are interpreted as two-character literals rather than the intended escaped character. While echo -e does provide a means to interpret them, sort does not. So we must use ANSI-C quoting (or some other means).
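As for "some other means", one option that should also work in shells without $'...' is to let printf produce the tab and capture it in a variable. This is only a sketch: TAB, rows, and by_number are names I made up, and the three rows are a cut-down copy of the phonebook.

```shell
# Capture a literal tab once. Command substitution strips only
# trailing newlines, so the tab itself survives.
TAB=$(printf '\t')

# A few sample rows in the phonebook layout: name, tab, number.
rows=$(printf '%s\t%s\n' \
  'Smith, Brett' '555-4321' \
  'Doe, John' '555-1234' \
  'Avery, Cory' '555-4132')

# Hand the tab to sort through the quoted variable.
by_number=$(printf '%s\n' "$rows" | sort -t "$TAB" -k2,2)
printf '%s\n' "$by_number"
```

Quoting "$TAB" matters here: unquoted, the shell would treat the tab as whitespace and drop it before sort ever saw it.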
When I'm in a shell, I tend to escape these by pressing CTRL+V and then the key for the special character I need (usually tab, or enter for newline).
Thanks for the reminder about $'string', though. Clea{r,n}ly the way to go in scripts.