Recent articles

February 28, 2014

Just a small review of the articles I went through recently:

  • A surprising result on sex bias in Holstein. It could have a huge impact on the industry, if the figures are confirmed and the impact is better estimated (I mean, conventional semen gives 50% of females, so the worst-case scenario, against which +450 kg of milk could be produced by the end of the second lactation… is not the common situation).
  • Analysis of selective sweeps in cattle with sequence data.
  • Some hints on ways to correct for population stratification.
  • The funny article of the month: a compression exercise with all the results included (a side effect of the OpenData policy? … lots of information you’ll never really use?)

Recent articles

January 21, 2014

Just a small review of the articles I went through recently:

Software update

November 30, 2013

Picard tools

The classical one ;-)

GATK 2.7-4

A new experimental feature to assess PK for reference sites, plus some bug fixes.


A new version of the algorithm. Shortened argument flags. New side tools (data checking and graph handling).


A new version, 2.12, has been released: faster and with additional features. Note that a GCTA forum is now available for discussion and technical problems.


A more flexible way to define the reference population and the way data are handled. Multi-threading is available during the sampling step.

Delly 0.12

Apart from the removal of translocation detection (not for long?), this version ships a lot of awaited features, like vcf output.

Varscan 2.3.6

Still some work on improving vcf compatibility.

Categories: Linux, NGS, QTL-detection, SNP

Recent articles

October 31, 2013

Just a small review of the articles I went through recently:

Recent article

September 9, 2013
  • Why partially identifiable covariance matrices should be considered with the utmost care. I wonder whether the phenomenon described also applies when dealing with genomic relationships based on haplotypes.
  • Hints on the results of my survey (soon on the blog)
Categories: Uncategorized

Software update

August 18, 2013

Picard tools

The classical one ;-) Note that some changes were made for Java 7 compatibility. So, after GATK, a switch to Java 7 as the default may be on its way.

Beagle 4

Another update (r1128), not documented so far.


The new version now uses all variants in the reference panel (SNPs, indels, SVs).

OpenMP

The OpenMP 4.0 specifications are out! They now support Fortran 2003 and prepare the support for accelerators. Note that the next Intel compiler version already supports a large number of the new specifications.


This library is now at version 1.4. Support for new GPUs was added, as well as additional subroutines. I wish more Fortran interfaces were added… maybe next time!

Cuda 5.5

As previously mentioned, the latest version of CUDA is now available as an rpm/deb package (allowing a much easier install).

Categories: emacs, Linux, NGS

More on variant distribution with dbSNP and Vep

August 12, 2013

Just a follow-up post: there are so many questions one can ask about a genome that I thought it would be nice to elaborate a bit on the dbSNP data. So, to move forward, we’ll see how to obtain sift scores for the dbSNP resources with Vep.

Install Vep

#Download VEP
wget -O Out && wget `gawk '/variant_effect_predictor.tar.gz/ && /latest/{split($0,T,"\"");print T[2]}' Out ` -O vep.tar.gz
#Extract it
gunzip vep.tar.gz ; tar -xvf vep.tar
#Go into the directory, create a cache folder
cd variant_effect_predictor ; mkdir .vep
#Run the installer: answer yes when asked whether to use cache files, then pick the number matching Bos taurus
perl -c .vep

Note:

  • We first download the Vep page
  • Then the html code is parsed, and the link to the latest Vep version is extracted and downloaded
  • During install, you’ll have to indicate a local Vep directory (here .vep)
  • Prefer a local cache file
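
To see what the parsing step does, here is a minimal sketch that runs the same awk pattern on a single made-up line of html. The URL and the function name extract_latest_link are hypothetical, chosen only for illustration; the real download page may well be laid out differently.

```shell
# Hypothetical html line, standing in for the downloaded page (file "Out")
SAMPLE='<a href="http://example.org/latest/variant_effect_predictor.tar.gz">Download latest</a>'

# Keep the lines mentioning both the tarball and "latest", split them on
# double quotes and print the second piece, i.e. the value of href
extract_latest_link() {
    awk '/variant_effect_predictor.tar.gz/ && /latest/{split($0,T,"\"");print T[2]}'
}

LINK=$(printf '%s\n' "$SAMPLE" | extract_latest_link)
echo "$LINK"
```

The trick only works as long as the link is the first quoted string on the matching line, so it is worth checking the extracted URL before feeding it to wget.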

Running Vep on vcf

#Download dbSNP vcf
for BTA in `seq 1 29 ` X MT
do
#Uncompress the vcf of the current chromosome
zcat vcf_chr_${BTA}.vcf.gz >$BTA.vcf
#Run Vep
perl --offline --species bos_taurus -i ${BTA}.vcf --vcf --html --sift b --dir .vep --output_file Vep${BTA}
done

Note:

  • We use Vep in local mode, so you’ll have to declare the .vep directory explicitly: “--dir .vep”
  • Output will be in vcf format (to avoid handling too many different file formats): “--vcf”
  • sift scores are available for cow since ensembl 71; nevertheless, you must ask for them in Vep: “--sift b”
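
As a quick sanity check on the annotated files, one can count the records that carry a “deleterious” call. The sketch below is only an illustration: it builds a tiny made-up vcf (all values are invented), assumes the sift prediction ends up as text in the INFO column (the 8th), and uses a hypothetical helper named count_deleterious.

```shell
# Build a tiny made-up vcf (all values invented for the example)
DEMO=$(mktemp)
{
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t1000\trs1\tA\tG\t.\t.\tVC=snp;CSQ=G|missense_variant|deleterious(0.01)\n'
  printf '1\t2000\trs2\tC\tT\t.\t.\tVC=snp;CSQ=T|missense_variant|tolerated(0.45)\n'
  printf '1\t3000\trs3\tG\tA\t.\t.\tVC=snp;CSQ=A|intergenic_variant|\n'
} > "$DEMO"

# Count the non-header records whose INFO column mentions "deleterious"
count_deleterious() {
    awk '!/^#/ && $8 ~ /deleterious/ {n++} END {print n+0}' "$1"
}

N_DEL=$(count_deleterious "$DEMO")
echo "$N_DEL"
rm -f "$DEMO"
```

On the real output, the same function could be called on each Vep${BTA} file in turn.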

Location of “deleterious” variant

The vcf now has some annotations appended. We just go back to the last post’s R code, but this time we wonder where the variations supposed to be deleterious are.

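#Open an output file (the file name is an assumption)
png("DeleteriousDistribution.png",width=1800,height=1500)
#split the screen in 30
par(mfrow=c(5,6))
#Create a vector of chromosomes (29 autosomes + X to fit the 30 panels, an assumption)
chr=c(1:29,"X")
#run a loop on chromosomes
for( i in chr){
#Read file (the Vep output of the previous step, header lines start with "#")
Pos=read.table(paste("Vep",i,sep=""),sep="\t",quote="",stringsAsFactors=FALSE)
#plot the snp density
plot(density(Pos$V2[grep("snp",Pos$V8,perl=TRUE)]),col="grey",main=paste("BTA : ",i,sep=""))
#Add in-del line (the "in-del" pattern and the colors are assumptions)
lines(density(Pos$V2[grep("in-del",Pos$V8,perl=TRUE)]),col="blue")
#Add the line of the variants flagged as deleterious
lines(density(Pos$V2[grep("deleterious",Pos$V8,perl=TRUE)]),col="red")
}
#Close the output file
dev.off()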

Note: A shortcut would be to download the (not so up-to-date) vcf available at the ensembl ftp site. The code is essentially the same as the one used in the previous post.

And you should obtain something like the following plot.

[Figure: DeleteriousDistribution]

Once again, this is still a quick and very dirty result. I wonder if there is any good story in these graphs (I mean a story that would not instantly vanish due to assembly problems or obvious bias!).
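
For a rough numeric counterpart to the density curves, the deleterious calls can also be counted per window along a chromosome. This is only a sketch under assumptions: the demo vcf is made up, the 1 Mb window size is arbitrary, and bin_deleterious is a hypothetical helper, not part of Vep.

```shell
# Build a tiny made-up vcf (positions and annotations are invented)
DEMO=$(mktemp)
{
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t1000\trs1\tA\tG\t.\t.\tCSQ=G|missense_variant|deleterious(0.01)\n'
  printf '1\t1500\trs2\tC\tT\t.\t.\tCSQ=T|missense_variant|deleterious(0.03)\n'
  printf '1\t2000\trs3\tG\tA\t.\t.\tCSQ=A|missense_variant|tolerated(0.45)\n'
  printf '1\t1200000\trs4\tT\tC\t.\t.\tCSQ=C|missense_variant|deleterious(0.00)\n'
} > "$DEMO"

# bin_deleterious FILE WINDOW: print "window_start count" for each window
# of WINDOW bp holding at least one record flagged as deleterious
bin_deleterious() {
    awk -v win="$2" '!/^#/ && $8 ~ /deleterious/ {c[int($2/win)*win]++}
        END {for (w in c) print w, c[w]}' "$1" | sort -n
}

BINS=$(bin_deleterious "$DEMO" 1000000)
echo "$BINS"
rm -f "$DEMO"
```

Windows with an unusually high count are the first places to check against the assembly before reading any biology into them.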