Home > Agriculture, Funny science, NGS, SNP > More on variant distribution with dbsnp and Vep

More on variant distribution with dbsnp and Vep

Just a follow-up post, there are so many questions one can wonder about genome, that I thought it would be nice to elaborate a bit on the dbSNP data. So to move forward, we’ll see how to obtain sift score for the dbnsp ressources with Vep.

Install Vep

#Download VEP
wget www.ensembl.org/info/docs/variation/vep/vep_script.html -O Out && wget `gawk '/variant_effect_predictor.tar.gz/ && /latest/{split($0,T,"\"");print T[2]}' Out ` -O vep.tar.gz
#Extract it
gunzip vep.tar.gz ; tar -xvf vep.tar
#Go into the directory, create a cache folder
cd variant_effect_predictor ; mkdir .vep
#Run install, answers should be yes to ask for the use of cache file, and the number <=> bos Taurus
perl INSTALL.pl -c .vep

Note :

  • We first download Vep page
  • Then the html code is parsed, and the link to the latest Vep verions is extracted and donwloaded
  • During install, you’ll have to indicate a local vep directory (here .vep)
  • Prefer  a local cache file

Running Vep on vcf

#Download dbSNP vcf
wget ftp://ftp.ncbi.nih.gov/snp/organisms/cow_9913/VCF/vcf_*vcf.gz
for BTA in `seq 1 29 ` X MT
zcat vcf_chr_${BTA}.vcf.gz >$BTA.vcf
#Run Vep
perl variant_effect_predictor.pl --offline --species bos_taurus -i ${BTA}.vcf --vcf --html --sift b --dir .vep --output_file Vep${BTA}

Note :

  • We use the Vep in local mode, so you’ll have to declare the .vep directory explicitly “–dir .vep”
  • Output will be in vcf format (to avoid handling too many different file format)  “–vcf”
  • sift score are available for cow since ensembl 71 nevertheless you must ask for them in Vep “–sift b”

Location of “deleterious” variant

The vcf now have some annotations appended. We just go back to last post’s R code, but  wonder this time where the variations supposed to be deleterious are ?

#Open an output file
#split the screen in 30
#Create a vector of chromosomes
#run a loop on chromosomes
for( i in chr){
#Read file
#plot the snp density
plot(density(Pos$V2[grep("snp",Pos$V8,perl=TRUE)]),col="grey",main=paste("BTA : ",i,sep=""))
#Add in-del line

Note: A shortcut would be to download the (not so up-to-date) vcf available at ensembl ftp site. The code is essentially the same as the one used in the previous post.

And you should obtain something like the following plot. DeleteriousDistribution

Once again this is still a quick and very dirty result. I wonder if there are any good story in these graphs (I mean one story that would not instantly vanished due to assembly problem or obvious bias !).

  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: