Archive

Archive for the ‘NGS’ Category

Software update

November 30, 2013 Leave a comment

Picards tools

The classical one šŸ˜‰

GATK 2.7-4

A new experimental feature to assess PK for reference site. Plus some bug fixing

Shapeit2

A new version of the algorithm. Shortened arguments flags. New sides tools (data checking and graph handling).

GCTA

A new version released 2.12 :Ā  faster and with additional feature. Note a GCTA forum is now available for discussion and technical problem.

Beagle4

A more flexible way to define reference population and the way data are handled. Multi-threading available during sampling step.

Delly 0.12

Apart from translocation detection removal (not for long ?), this version ship a lot of awaited features like vcf output.

Varscan 2.3.6

Still some work on vcf compatibility improvement.

Advertisements
Categories: Linux, NGS, QTL-detection, SNP

Software update

August 18, 2013 Leave a comment

Picards tools

The classical one šŸ˜‰Ā  Note that some changes were done for java 7 compatibility. So, after GATK, turning to java 7 as default may be on its way.

Beagle 4

Another update (r1128), not documented so far.

Minimac

The new version now use all variants in the reference panel (snp, indels, SV)

Open MP

The openmp specification 4.0 are out ! Now support Fortran2003 and prepare the support for accelerator. Note the next intel compiler version already support a large number of the new specifications.

Magma

This library now in its version 1.4 . Support for new GPUs were added as well as additional subroutines. I wish more Fortran interface were added…maybe next time !

Cuda 5.5

As previously mentioned, the last version of cuda is now available as rpm/deb package (allowing a much easier install).

Categories: emacs, Linux, NGS

More on variant distribution with dbsnp and Vep

August 12, 2013 Leave a comment

Just a follow-up post, there are so many questions one can wonder about genome, that I thought it would be nice to elaborate a bit on the dbSNP data. So to move forward, we’ll see how to obtain sift score for the dbnsp ressources with Vep.

Install Vep

#Download VEP
wget www.ensembl.org/info/docs/variation/vep/vep_script.html -O Out && wget `gawk '/variant_effect_predictor.tar.gz/ && /latest/{split($0,T,"\"");print T[2]}' Out ` -O vep.tar.gz
#Extract it
gunzip vep.tar.gz ; tar -xvf vep.tar
#Go into the directory, create a cache folder
cd variant_effect_predictor ; mkdir .vep
#Run install, answers should be yes to ask for the use of cache file, and the number <=> bos Taurus
perl INSTALL.pl -c .vep

Note :

  • We first download Vep page
  • Then the html code is parsed, and the link to the latest Vep verions is extracted and donwloaded
  • During install, you’ll have to indicate a local vep directory (here .vep)
  • PreferĀ  a local cache file

Running Vep on vcf

#Download dbSNP vcf
wget ftp://ftp.ncbi.nih.gov/snp/organisms/cow_9913/VCF/vcf_*vcf.gz
for BTA in `seq 1 29 ` X MT
do
#Decompress
zcat vcf_chr_${BTA}.vcf.gz >$BTA.vcf
#Run Vep
perl variant_effect_predictor.pl --offline --species bos_taurus -i ${BTA}.vcf --vcf --html --sift b --dir .vep --output_file Vep${BTA}
done

Note :

  • We use the Vep in local mode, so you’ll have to declare the .vep directory explicitly “–dir .vep”
  • Output will be in vcf format (to avoid handling too many different file format)Ā  “–vcf”
  • sift score are available for cow since ensembl 71 nevertheless you must ask for them in Vep “–sift b”

Location of “deleterious” variant

The vcf now have some annotations appended. We just go back to last post’s R code, butĀ  wonder this time where the variations supposed to be deleterious are ?

#Open an output file
png("DeleteriousDistribution.png",1920,1080)
#split the screen in 30
par(mfrow=c(3,10))
#Create a vector of chromosomes
chr=c(seq(1:29),"X")
#run a loop on chromosomes
for( i in chr){
#Read file
Pos=read.table(paste("Vep",i,sep=""),skip=15,fill=TRUE)
#plot the snp density
plot(density(Pos$V2[grep("snp",Pos$V8,perl=TRUE)]),col="grey",main=paste("BTA : ",i,sep=""))
polygon(density(Pos$V2[grep("snp",Pos$V8,perl=TRUE)]),col=rgb(0,0,0.3,0.04))
#Add in-del line
dense=density(Pos$V2[grep("deleterious",Pos$V8,perl=TRUE)])
lines(dense$x,0.5*dense$y,col="red")
}
dev.off()

Note: A shortcut would be to download the (not so up-to-date) vcf available at ensembl ftp site. The code is essentially the same as the one used in the previous post.

And you should obtain something like the following plot. DeleteriousDistribution

Once again this is still a quick and very dirty result. I wonder if there are any good story in these graphs (I mean one story that would not instantly vanished due to assembly problem or obvious bias !).

Hourglass distribution

August 6, 2013 1 comment

Some days ago, I went through an article on mutation rate mentioning the “hourglass distribution”. The illustration of this distribution is pretty obvious from the plot in the article, with for some chromosome a pretty clear reduction in number of SNP around centromeres. I just wondered if we could also observe this phenomenon on bos taurus. So i just had a look on dbSNP138 data to have some clues.

First version


#Retrieve dbSNP
 wget ftp://ftp.ncbi.nih.gov/snp/organisms/cow_9913/VCF/vcf_*vcf.gz

And then just run the following R code.

#Open an output file
png("hourglass.png",1920,1080)
#split the screen in 30
par(mfrow=c(3,10))
#Create a vector of chromosomes
 chr=c(seq(1:29),"X")
 #run a loop on chromosomes
 for( i in chr){
 #Read file
 Pos=read.table(paste("vcf_chr_",i,".vcf.gz",sep=""),skip=15,fill=TRUE)
 #plot the snp density
 plot(density(Pos$V2),col="grey",main=paste("BTA : ",i,sep=""))
 polygon(density(Pos$V2),col=rgb(0,0,0.3,0.04))
 }
 dev.off()

Notes on code :

  • read.table is used directly on a gzipped file (very handy trick)
  • dbsnp file have a 15 lines long header, so I use the skip =15 option
  • I had some glitches while reading some files, (a problem with # of fields) option fill=TRUE, just fix it
  • Plot are nice….but polygon are even better, so I first plot the density and then add a polygon on it
  • rgb function is a simple way to obtain transparency, so after the three values for red, green and blue, I add the alpha setting to a value of 0.04

And after a while….you ‘ll obtain something like this

hourglass

I must say I was a bit disappointed. At least, there are no clear pattern as can be seen in human. All the bovine chromosomes are acrocentricĀ  this may explain why generally no clear decrease in SNP density can be seen. The pattern observed on chromosome 12, 18 and 10 were even more surprising. I am wondering if there could be some sampling bias. Concerning the pattern on BTA23, the latter could be due to MHC, known to exhibit a great diversity. Density computation may also blur a bit things.

Second version

The basic work being done, we can try to investigate others hypotheses. As instance, are SNP and Indel distributed the same way along the genome ? With some slight changes, the code become :

#Open an output file
png("SNPandindelDistribution.png",1920,1080)
#split the screen in 30
par(mfrow=c(3,10))
#Create a vector of chromosomes
 chr=c(seq(1:29),"X")
 #run a loop on chromosomes
 for( i in chr){
 #Read file
 Pos=read.table(paste("vcf_chr_",i,".vcf.gz",sep=""),skip=15,fill=TRUE)
 #plot the snp density
 plot(density(Pos$V2[grep("snp",Pos$V8,perl=TRUE)]),col="grey",main=paste("BTA : ",i,sep=""))
 polygon(density(Pos$V2[grep("snp",Pos$V8,perl=TRUE)]),col=rgb(0,0,0.3,0.04))
 #Add in-del line
 dense=density(Pos$V2[grep("in-del",Pos$V8,perl=TRUE)])
 lines(dense$x,0.5*dense$y,col="red")
 }
dev.off()

Notes on code :

  • in dbsnp variant are coded either as snp or in-del, we extract line with the grep function accordingly
  • I tweaked a bit the indel line in order to avoid scale problems.

SNPandindelDistributionWe observe roughly the same pattern between snp and indel, albeit indel distribution may be smoother. I was expecting some discrepancies (relying on the article by Montgomery et al. but here again, we are only dealing with 1 base indel, which is not really representative of short indelĀ  in general). I may try to check this results with my own results.

Software update

July 25, 2013 Leave a comment

So what are the last noticeable update of this latest months ?

GATK 2.6-5

At last HaplotypeCaller is now a real alternative for Big project ! <grumpy> The bad thing <\grumpy> is that now GATK need java7, which is still not so common nowadays andĀ (thanks to oracle) no that easy to install !

I am actually running it on a batch of 100 bullsĀ  @12&+X(so far so good, I will complete a whole genome variant calling in 2 weeks).

Beagle 4
Now allowing imputation ! And this just make everything so simple !

eigenstrat

This tool has been released in its 5.0 version.

TexLive2013

This is the yearly release. Not a lot of new things but it’s always worthy to upgrade.

Categories: emacs, Linux, NGS

Serial snapshots with IGV batch

July 16, 2013 Leave a comment

IGV is a very handy tool. Nevertheless, scrolling from one position to another may be fastidious. Second, the bad aspect of this very user-friendly software, is that you can spend hours looking here and there, with at last no backtrack of your discoveries.

Thankfully, IGV can run in batch mode, allowing, for a targeted list of positions, to take screenshots and store the later in a folder. We’ll illustrate in the following with a small example :

Setting up a session

To test our script, we will first download some publicly available data.

#Download a vcf file (on beagle4 website, you may have to change file name according to last release)
wget http://faculty.washington.edu/browning/beagle/test.r1099.vcf.gz
gunzip test.r1099.vcf.gz
#Create an index
igvtools index test.r1099.vcf
#(You can alternatively prefer to use igvtools from igv GUI)
#Launch igv
igv.sh

To check if everything is right, load the vcf file test.r1099.vcf. Then move to positions chr22:20,000,000-20,100,000.

IGV

Set everything to your taste, and save the session :Ā  File, save session, you should obtain a xml file.

Running IGV in batch mode

Now, let’s consider you are interested in some particular position. Let’s say we’ve stored several positions of interest in a csv file. Our aim is to create an IGV batch file.

Basically, we’ll have to load the session, set a directory to store the screenshots, and then move from one position to the other. A very crude version could therefore be. (I know : Why csv ? Because a lot of person still use excel šŸ˜¦ )

#Create a fake positions list
cat >Liste.csv <<EOF
chr22;20070000
chr22;20081000
EOF

#Create a ss directory
mkdir IMG
#Write the header of the script
cat >Batch.igv <<EOF
new
load igv_session.xml
snapshotDirectory IMG
EOF
#Now parse the csv file
gawk -F ";" -v R=10000 '{print "goto "$1":"$2 -R"-"$2 + R"\nsnapshot Screen"NR".png"}END{print "exit"}' Liste.csv >>Batch.igv

From, igv, go to Tools, Run a batch script. Load Batch.igv, when all the process will be done, IGV will terminate and you’ll find your screenshots in IMG.

For an even more automated version you can use the script “PrepIgvBatch.sh” available in the scriptotheque

Software update

June 18, 2013 Leave a comment

Another month, another software update

samtools 0.1.19

I missed it in the last posts (this version was released in March). Multi-thread is now available. I play a bit with the the different displays (Html, Curse and Text) of tview, (pretty handy). I may need more time to get acustomed to bamcheck/plot-bamcheck output.

Delly 0.0.11

Some small fixes, a new progress bar to control process, better temporary file handling, and BWA-mem support…more here

BWA 0.7.5a

Several small fixes

Picard tools 1.93

Small bug fixes. Released some hours ago (I had to compensate samtools glitch)

llvm 3.3

Still some efficiency enhancement, clang is now totally C++11 complient. New Arch support…and more. Release announcement is still pending, but binary are already available.

GCTA

Apparently computing speed improvement are the newsworthy point of this release.

Coming soon…

I’ve had access to both next cuda 5.5 RC and Intel compiler….and so far the good new is that both now offer a simplified install procedure. Cuda 5.5 is available as a deb file, Intel compiler as a script with a GUI. I am eager to test these RC.