Home > Agriculture, Awk, Funny science, Misc, NGS, R, Shell, SNP > Hourglass distribution

Hourglass distribution

Some days ago, I went through an article on mutation rate mentioning the “hourglass distribution”. The illustration of this distribution is pretty obvious from the plot in the article, with for some chromosome a pretty clear reduction in number of SNP around centromeres. I just wondered if we could also observe this phenomenon on bos taurus. So i just had a look on dbSNP138 data to have some clues.

First version


#Retrieve dbSNP
 wget ftp://ftp.ncbi.nih.gov/snp/organisms/cow_9913/VCF/vcf_*vcf.gz

And then just run the following R code.

#Open an output file
png("hourglass.png",1920,1080)
#split the screen in 30
par(mfrow=c(3,10))
#Create a vector of chromosomes
 chr=c(seq(1:29),"X")
 #run a loop on chromosomes
 for( i in chr){
 #Read file
 Pos=read.table(paste("vcf_chr_",i,".vcf.gz",sep=""),skip=15,fill=TRUE)
 #plot the snp density
 plot(density(Pos$V2),col="grey",main=paste("BTA : ",i,sep=""))
 polygon(density(Pos$V2),col=rgb(0,0,0.3,0.04))
 }
 dev.off()

Notes on code :

  • read.table is used directly on a gzipped file (very handy trick)
  • dbsnp file have a 15 lines long header, so I use the skip =15 option
  • I had some glitches while reading some files, (a problem with # of fields) option fill=TRUE, just fix it
  • Plot are nice….but polygon are even better, so I first plot the density and then add a polygon on it
  • rgb function is a simple way to obtain transparency, so after the three values for red, green and blue, I add the alpha setting to a value of 0.04

And after a while….you ‘ll obtain something like this

hourglass

I must say I was a bit disappointed. At least, there are no clear pattern as can be seen in human. All the bovine chromosomes are acrocentric  this may explain why generally no clear decrease in SNP density can be seen. The pattern observed on chromosome 12, 18 and 10 were even more surprising. I am wondering if there could be some sampling bias. Concerning the pattern on BTA23, the latter could be due to MHC, known to exhibit a great diversity. Density computation may also blur a bit things.

Second version

The basic work being done, we can try to investigate others hypotheses. As instance, are SNP and Indel distributed the same way along the genome ? With some slight changes, the code become :

#Open an output file
png("SNPandindelDistribution.png",1920,1080)
#split the screen in 30
par(mfrow=c(3,10))
#Create a vector of chromosomes
 chr=c(seq(1:29),"X")
 #run a loop on chromosomes
 for( i in chr){
 #Read file
 Pos=read.table(paste("vcf_chr_",i,".vcf.gz",sep=""),skip=15,fill=TRUE)
 #plot the snp density
 plot(density(Pos$V2[grep("snp",Pos$V8,perl=TRUE)]),col="grey",main=paste("BTA : ",i,sep=""))
 polygon(density(Pos$V2[grep("snp",Pos$V8,perl=TRUE)]),col=rgb(0,0,0.3,0.04))
 #Add in-del line
 dense=density(Pos$V2[grep("in-del",Pos$V8,perl=TRUE)])
 lines(dense$x,0.5*dense$y,col="red")
 }
dev.off()

Notes on code :

  • in dbsnp variant are coded either as snp or in-del, we extract line with the grep function accordingly
  • I tweaked a bit the indel line in order to avoid scale problems.

SNPandindelDistributionWe observe roughly the same pattern between snp and indel, albeit indel distribution may be smoother. I was expecting some discrepancies (relying on the article by Montgomery et al. but here again, we are only dealing with 1 base indel, which is not really representative of short indel  in general). I may try to check this results with my own results.

Advertisements
  1. No comments yet.
  1. August 12, 2013 at 6:31 pm

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: