Archive for April, 2011

Two other classic problems… and their solutions

April 21, 2011

After the recent course I gave on Unix and awk basics, several of my colleagues came to my office with some strange cases. These are problems I ran into so long ago that I completely forgot to mention them.

First problem: wc does not count lines properly

You open a file, read it, and see that it has, let's say, 10 lines, but wc -l tells you that it counts 9, or 11, or more lines.

What's wrong?

The problem lies in the last line of your file. wc -l does not count lines of text, it counts newline characters. Some software does not write the last line properly: the file ends right after the last piece of text, whereas normally every line, including the last one, ends with a newline. The unterminated last line is not counted, which explains why wc -l thinks you have 9 lines when you actually have 10.
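To see the effect for yourself, here is a minimal sketch that builds two ten-line files, one complete and one whose last line has no trailing newline, and runs wc -l on both. The file names and the temporary directory are made up for the illustration.

```shell
#!/bin/sh
# Work in a throwaway directory (hypothetical file names).
tmpdir=$(mktemp -d)

# Ten lines of text, each one terminated by a newline.
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > "$tmpdir/complete.txt"

# The same ten lines, but the last one has no trailing newline.
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 > "$tmpdir/truncated.txt"
printf 'line 10' >> "$tmpdir/truncated.txt"

# wc -l counts newline characters, so the two files give different results.
# tr -d ' ' removes the padding that some wc implementations add.
count_complete=$(wc -l < "$tmpdir/complete.txt" | tr -d ' ')
count_truncated=$(wc -l < "$tmpdir/truncated.txt" | tr -d ' ')
echo "complete.txt:  $count_complete"
echo "truncated.txt: $count_truncated"

rm -r "$tmpdir"
```

Both files show ten lines in an editor, but only the first one is counted as ten by wc -l.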

An easy way to fix this is to open the problematic file with any editor, go to the end of the last line and press Enter, so that the cursor moves to a new empty line. This way your last line is properly terminated and wc -l counts it.

If wc -l tells you that you have 11 or more lines, you may have extra empty lines at the end of your file: just open the problematic file and delete them.
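If you would rather not open an editor, the same two checks can be sketched in the shell. This is only one possible approach, not the method described above: it appends a newline when the last byte of the file is not one, and it uses grep -c . to count non-empty lines when trailing blank lines inflate the result. The file names are hypothetical.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)

# Case 1: three lines, the last one unterminated (hypothetical file).
f="$tmpdir/unterminated.txt"
printf 'a\nb\nc' > "$f"
before=$(wc -l < "$f" | tr -d ' ')

# tail -c 1 prints the last byte. Command substitution strips a trailing
# newline, so the result is non-empty only when the file does not end with one.
if [ -n "$(tail -c 1 "$f")" ]; then
    echo >> "$f"
fi
after=$(wc -l < "$f" | tr -d ' ')
echo "unterminated.txt: $before before the fix, $after after"

# Case 2: three lines of text followed by two empty lines (hypothetical file).
g="$tmpdir/padded.txt"
printf 'a\nb\nc\n\n\n' > "$g"
all_lines=$(wc -l < "$g" | tr -d ' ')
# grep -c . counts only the lines that contain at least one character.
text_lines=$(grep -c . "$g")
echo "padded.txt: $all_lines lines in total, $text_lines with text"

rm -r "$tmpdir"
```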

Second problem: floating-point numbers are written with a comma by gawk

This bug is in fact related to the fact that I am French (nobody's perfect): on one server among all those available at the office, the LANG variable is set to “fr_FR”, so gawk uses the French decimal separator. The fix that worked for me was to change the value of LANG to either “en_US” or, better, “fr_FR.UTF-8”.

This can conveniently be done by adding this line to any script before using awk (or by adding it to your .profile or .bashrc).

export LANG="fr_FR.UTF-8"
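If you do not want to change LANG for the whole session, another option (my assumption here, not something tested on the server mentioned above) is to force the C locale for a single awk call. The C locale always uses a dot as the decimal separator. The sketch below picks gawk when it is installed and falls back to the system awk otherwise.

```shell
#!/bin/sh
# Use gawk when it is installed, otherwise fall back to the system awk.
AWK=$(command -v gawk || command -v awk)

# LC_ALL=C applies only to this one command and overrides LANG,
# so the decimal separator is "." whatever the server configuration.
value=$(LC_ALL=C "$AWK" 'BEGIN { printf "%.2f\n", 10 / 4 }')
echo "$value"
```

Setting the variable in front of the command leaves the rest of your environment (messages, sorting order, and so on) untouched.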


Should all our scripts be protected against such problems?

Honestly, these problems are pretty rare, and with well-designed scripts on a properly configured platform you shouldn't run into them. In fact, the wc problem generally appears after a file has been edited by hand, and the LANG problem only appears if you don't use UTF-8 encoding (which is not so common nowadays). So just remember that this kind of problem can show up from time to time, but don't worry too much about it.

Categories: Awk, Linux, Shell

Strange behaviour with awk

April 2, 2011

My colleague and I recently gave an introductory course on awk and the basics of shell scripting. The main aim was to present some tricks for writing scripts.

Here is an interesting question that came up after the course, with its answer (or should I say one answer among several)!

Awk seems to have a weird behaviour with a rather simple command.

To illustrate the problem, we had a file called “file.txt” containing this line:

1 2 3 4

And we applied this command:

gawk '{print $NF ; print $NF " something " $0}' file.txt

We got something like this:

4
something 1 2 3 4

when we were expecting something like this:

4
4 something 1 2 3 4

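So what's wrong here?

Found it?

It is an awfully classic problem: the file was encoded as a DOS file! A hidden character, the carriage return, sits at the end of each line and ends up as part of the last field. When it is printed, the terminal moves the cursor back to the start of the line, so the text that follows overwrites the “4”.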
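The hidden character is easy to make visible. The sketch below recreates the sample file with a DOS line ending in a temporary directory (the path is made up), dumps it with od -c, and asks awk for the length of the record: 8 characters instead of the 7 you can see.

```shell
#!/bin/sh
# Use gawk when it is installed, otherwise fall back to the system awk.
AWK=$(command -v gawk || command -v awk)
tmpdir=$(mktemp -d)

# Recreate the sample file with a DOS line ending (\r\n).
printf '1 2 3 4\r\n' > "$tmpdir/file.txt"

# od -c prints every byte, so the carriage return shows up as \r.
od -c "$tmpdir/file.txt"

# awk only removes the newline: the carriage return stays in the record,
# which is therefore 8 characters long instead of 7.
record_len=$("$AWK" '{ print length($0) }' "$tmpdir/file.txt")
echo "record length: $record_len"

rm -r "$tmpdir"
```

With gawk's default field splitting (spaces, tabs and newlines), that extra character belongs to the last field, which is why $NF misbehaves when printed.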

What are our solutions here?

We could use the dos2unix command, which converts our file into a Unix-style file, but we can also use the following trick:

gawk '{gsub("\015$","");print $NF ; print $NF " something " $0}' file.txt >file.unix

Explanations:

Here we use the global substitution function “gsub”, which replaces every occurrence of the ASCII character “\015” (octal 015, the carriage return) found at the end of the line with nothing (“”). As you have understood, this character is the one that turns our simple command into a devil-driven nightmare (anyone who has ever wasted several hours on such a problem will understand what I am talking about!).

This trick is a good thing to add to your command when you must process files that come from another platform: in the best case it is useless, and in the worst case it saves you from this problem!
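Here is a small variant of the trick that you can run as is. It is a sketch, not the exact command above: it uses sub with the regex /\r$/ instead of gsub with an octal escape, since the end-of-line anchor can match at most once per record. The sample file is recreated in a temporary directory with a made-up path.

```shell
#!/bin/sh
# Use gawk when it is installed, otherwise fall back to the system awk.
AWK=$(command -v gawk || command -v awk)
tmpdir=$(mktemp -d)

# Sample file with a DOS line ending (\r\n).
printf '1 2 3 4\r\n' > "$tmpdir/file.txt"

# Remove the trailing carriage return first. Modifying $0 makes awk split
# the record into fields again, so $NF is a clean "4" afterwards.
result=$("$AWK" '{ sub(/\r$/, ""); print $NF " something " $0 }' "$tmpdir/file.txt")
echo "$result"

rm -r "$tmpdir"
```

On a file that already has Unix line endings the sub call simply matches nothing, so it is safe to leave it in place.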

How to avoid this kind of problem?

  1. Use the “file” command to test your file
  2. Open the file with emacs and check that the encoding is not DOS (you can see this in the bottom left part of your buffer)
  3. Retrieve files over FTP with ASCII mode enabled by default
  4. Use dos2unix every time you have a doubt about a file you receive
  5. Change the world: work only with people using Linux!
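The first and fourth points can be sketched in the shell as follows. The file names are hypothetical, and tr -d '\r' is used here only as a rough stand-in for dos2unix in case it is not installed (it deletes every carriage return, not just the ones at the end of a line).

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
printf 'a b\r\nc d\r\n' > "$tmpdir/dos.txt"   # hypothetical DOS-style file

# When available, "file" reports CRLF line terminators for such a file.
if command -v file >/dev/null 2>&1; then
    file "$tmpdir/dos.txt"
fi

# Count the lines that end with a carriage return.
cr=$(printf '\r')
dos_lines=$(grep -c "$cr\$" "$tmpdir/dos.txt")
echo "lines ending with a carriage return: $dos_lines"

# Delete the carriage returns and check again.
# grep -c exits with status 1 when it counts 0 lines, hence the "|| true".
tr -d '\r' < "$tmpdir/dos.txt" > "$tmpdir/unix.txt"
unix_lines=$(grep -c "$cr\$" "$tmpdir/unix.txt" || true)
echo "after conversion: $unix_lines"

rm -r "$tmpdir"
```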
Categories: Awk, Linux, SNP