Archive for July, 2013

Software update

July 25, 2013 Leave a comment

So what are the last noticeable update of this latest months ?

GATK 2.6-5

At last HaplotypeCaller is now a real alternative for Big project ! <grumpy> The bad thing <\grumpy> is that now GATK need java7, which is still not so common nowadays and (thanks to oracle) no that easy to install !

I am actually running it on a batch of 100 bulls  @12&+X(so far so good, I will complete a whole genome variant calling in 2 weeks).

Beagle 4
Now allowing imputation ! And this just make everything so simple !


This tool has been released in its 5.0 version.


This is the yearly release. Not a lot of new things but it’s always worthy to upgrade.

Categories: emacs, Linux, NGS

Serial snapshots with IGV batch

July 16, 2013 Leave a comment

IGV is a very handy tool. Nevertheless, scrolling from one position to another may be fastidious. Second, the bad aspect of this very user-friendly software, is that you can spend hours looking here and there, with at last no backtrack of your discoveries.

Thankfully, IGV can run in batch mode, allowing, for a targeted list of positions, to take screenshots and store the later in a folder. We’ll illustrate in the following with a small example :

Setting up a session

To test our script, we will first download some publicly available data.

#Download a vcf file (on beagle4 website, you may have to change file name according to last release)
gunzip test.r1099.vcf.gz
#Create an index
igvtools index test.r1099.vcf
#(You can alternatively prefer to use igvtools from igv GUI)
#Launch igv

To check if everything is right, load the vcf file test.r1099.vcf. Then move to positions chr22:20,000,000-20,100,000.


Set everything to your taste, and save the session :  File, save session, you should obtain a xml file.

Running IGV in batch mode

Now, let’s consider you are interested in some particular position. Let’s say we’ve stored several positions of interest in a csv file. Our aim is to create an IGV batch file.

Basically, we’ll have to load the session, set a directory to store the screenshots, and then move from one position to the other. A very crude version could therefore be. (I know : Why csv ? Because a lot of person still use excel 😦 )

#Create a fake positions list
cat >Liste.csv <<EOF

#Create a ss directory
mkdir IMG
#Write the header of the script
cat >Batch.igv <<EOF
load igv_session.xml
snapshotDirectory IMG
#Now parse the csv file
gawk -F ";" -v R=10000 '{print "goto "$1":"$2 -R"-"$2 + R"\nsnapshot Screen"NR".png"}END{print "exit"}' Liste.csv >>Batch.igv

From, igv, go to Tools, Run a batch script. Load Batch.igv, when all the process will be done, IGV will terminate and you’ll find your screenshots in IMG.

For an even more automated version you can use the script “” available in the scriptotheque

More on polled

July 13, 2013 Leave a comment

iris.I’ve collected so far 23 answers to my little survey (by the way, a great “thank you” to all of you who answered). Just to remind to those of you who haven’t bet yet, why being polled and red carrier could be of interest for a breeder, here comes some R code. First you ‘ll have to download French’s holstein  gEBV here. Then start R in a terminal.

#Find file name (I suppose the file is unique and in the directory)


#Polled genotype===================================
#create a vector of status "pp=>Normal,Pp=>heterozygous polled,PP=>Homozygous polled"
#Identify heterozygous polled with grep applied to the name of the sire
PO[gEBV$NOTAUR[grep(" P | P$",gEBV$NOTAUR,perl=TRUE)]]="Pp"
#Identify homozygous polled
PO[gEBV$NOTAUR[grep(" PP | PP$",gEBV$NOTAUR,perl=TRUE)]]="PP"
#Draw a boxplot
boxplot(gEBV$X0 ~ PO ,col="blue")
#Add the Number of observation
for(i in 1:3){text(i,0.95*max(gEBV$X0),paste("N= ",table(PO)[i]))}

#=Now the Red factor================================
RF[gEBV$NOTAUR[grep(" RF | RF$",gEBV$NOTAUR,perl=TRUE)]]="Rr"
#Draw a boxplot
boxplot(gEBV$X0 ~ RF ,col="red")
#Add the Number of observation
for(i in 1:dim(table(RF))){text(i,0.95*max(gEBV$X0),paste("N= ",table(RF)[i]))}

So for a given genotype, chances to be in the “Elite” will vary a lot. Iris and her brother have pedigree (gEBV) of 155. Their parents reliability is .7.

mu=155 ; VarG=25 ; Rel=0.25*(0.7+0.7)
#Boxplot of gEBV depending on genotypes (in the actual sire population)
boxplot(gEBV$X0 ~ RF + PO ,col="grey",main="gEBV according to genotype")
#Add E(EBV) and trace a 90IC area

bpSo, we can clearly see that for Iris and her brother their career will mostly depend on the fact that they are red and polled carriers !

Categories: Agriculture, Funny science, R

Recent articles

July 8, 2013 Leave a comment

Just a small review of the articles I went through recently :

Sufficiently rare to be mentioned, Bayes theorem in Science  by  Efron , with a nice follow-up post on the og

Although the lab technicalities were far beyond my understanding, the questions raised by this article on Evolution of  essential gene, stroke me !

I wish I had time to have a look on these kind of procedure during my  Ph’D, a simple permutation algorithm to compute significance threshold. By the way, I also learned a new distribution : the Rademacher distribution

I was eager to see this article, the Rat Genome Sequencing and Mapping consortium  made a very interesting piece of work combining sequence and genetic mapping in outbred rats. A lot of questions came to my mind based on these results…yeah hunting the so called “causal mutation” may not be that easy.

And last the funny  article of the month ! This kind of question could have been seen on  Freakonometrics