Posts Tagged ‘shell’

Skinning A Cat

May 17th, 2009

I was recently discussing sed usage with someone. They were having difficulty creating an appropriate regex to handle their problem:

(Orlando) sed regex kicks my ass.  I like to remove the second : in the
line. So that in 123:456:6789 it will only returns 123:456.  I can find the
first :, but I have not been able to use s/:{2}// to find the second
one, and remove the rest.

I enjoy regex (I know I’m weird, leave me alone), so I was able to provide an answer to this problem:

(@znx) Orlando: the trick is to use [^:] and \1 ..
(@znx) like:   s/\(:[^:]*\):.*/\1/

Now obviously at first glance this regex could be a bit intimidating to someone who is still picking up the skills, but as with most things, if we break it down it becomes easier.

Working from the outside in, \( \):.* says: match something followed by a : and every character after it. The . has the special meaning “any character” and * means “zero or more of the preceding item”. Whatever matches between \( and \) is stored by sed and made available as \1 in the replacement (that is what the escaped parentheses do). Inside the parentheses we have :[^:]*. The sequence [^ ] is a negated list, which matches any single character that is NOT in the list, so [^:]* matches a run of characters that are not :.

Putting it all together we are saying: match a : followed by any characters that are not :, then a second : with everything after it. The first : and the characters up to (but not including) the second : are held in memory. Finally we replace the whole match with what was held, which drops the second : and everything after it.
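To make the captured part visible, here is a small sketch that wraps \1 in square brackets in the replacement. The helper name show_capture and the extra sample inputs are made up for illustration; only the pattern itself comes from the conversation above.

```shell
# The pattern is the one from the conversation; the [ ] in the replacement
# are added here only to make the captured text visible.
show_capture() {   # hypothetical helper name
    sed 's/\(:[^:]*\):.*/[\1]/'
}

captured=$(echo '123:456:6789' | show_capture)   # two colons
longer=$(echo 'a:b:c:d' | show_capture)          # more than two colons
single=$(echo '123:456' | show_capture)          # only one colon, no match

printf '%s\n' "$captured" "$longer" "$single"
```

This should print 123[:456], a[:b] and 123:456. The capture includes the leading :, and a line with fewer than two colons is passed through untouched because the pattern never matches.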

% echo 123:456:6789 | sed 's/\(:[^:]*\):.*/\1/'
123:456

Success! However, as with most things, there is more than one way to skin a cat, and regex is rarely the prettiest method. So what other ways can we solve this problem?

With AWK:

% echo 123:456:6789 | awk -F: '{print $1":"$2}'
123:456

With cut:

% echo 123:456:6789 | cut -d: -f1,2
123:456
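A further option, not part of the original conversation, is to stay inside the shell and use POSIX parameter expansion. This is only a sketch; the variable names are made up for illustration.

```shell
line='123:456:6789'

first=${line%%:*}     # drop the longest suffix starting at a ':'  -> 123
rest=${line#*:}       # drop the shortest prefix ending at a ':'   -> 456:6789
second=${rest%%:*}    # same trick again on the remainder          -> 456

result="${first}:${second}"
echo "$result"
```

It avoids starting an external process, at the cost of being harder to read than the cut version.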

As always, experimenting with the mass of GNU tools you can find on your system will add a great deal of power to your tool chest. Mind you, then I wouldn’t get compliments for helping, would I?

(Orlando) Wow, When I grow up, I like to remember this thing like you do. :-) 

Haha, till next time!

File Oddity

February 13th, 2009

Today at work I was attempting to parse a file and discovered something odd happening. When I simply viewed the file with cat, I could see this:

<html><head><title>Status</title></head>
<table>
<tr><td>Failed</td><td>Backup Group</td></tr>
<tr><td>Success</td><td>Another Backup Group</td></tr>
</table>
</body>
</html>

Nothing odd there; the file looks normal. But when I tried this command:

$ grep -i failed status.html
$

Huh? No output, suggesting that there are no lines containing the word failed. The same occurred with awk and sed; indeed I could not find any tool able to pull out the status. So the next step was to check what was odd about the file:

$ file status.html
status.html: HTML document text

Still, nothing unusual. So now I am in full head-scratching mode. I opened the file with vim to see if I could discover anything strange about it, but found nothing. At this point I happened to switch to more rather than cat, and the result was the start of how I solved it.

$ more status.html
��<

$

Ah, so now I can see why grep and the others cannot find anything in the file. So this time I switch to vim again and check what file encoding we have:

:set fileencoding
fileencoding=ucs-2le

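For those that are unaware, this is UCS-2 (little endian), the fixed-width predecessor of UTF-16; for characters in the Basic Multilingual Plane the two encode identically. So the issue was simply that the file was stored as UTF-16 rather than the single-byte text the tools were searching for. Now for the trick to get around it: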

$ iconv -f UTF-16 -t UTF-8 status.html | grep -i failed
<tr><td>Failed</td><td>Backup Group</td></tr>
$
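A quicker way to spot this kind of file is to look at its first two bytes, since a UTF-16 file usually starts with a byte-order mark. The sketch below builds its own throwaway sample rather than using the real status.html, and assumes head -c, od and iconv are available, as they are on most GNU systems.

```shell
# Build a small UTF-16LE sample with a byte-order mark (temporary file).
sample=$(mktemp)
printf '\377\376' > "$sample"                               # BOM bytes FF FE
printf 'Failed\n' | iconv -f UTF-8 -t UTF-16LE >> "$sample"

# The first two bytes reveal the encoding: "fffe" means UTF-16, little endian.
bom=$(head -c 2 "$sample" | od -An -tx1 | tr -d ' \n')
echo "$bom"

# Converting to UTF-8 first makes the text searchable again.
match=$(iconv -f UTF-16 -t UTF-8 "$sample" | grep -i failed)
echo "$match"

rm -f "$sample"
```

In UTF-16LE every ASCII character is followed by a zero byte, which is why a search for the plain byte sequence "failed" finds nothing until the file is converted.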

Tada. Once more a solution!
