Archive

Archive for the ‘Misc’ Category

Recent articles

February 28, 2014

Just a small review of the articles I went through recently:

  • A surprising result on sex bias in Holstein. The impact on the industry could be huge if the figures are confirmed and the effect is better estimated (I mean, conventional semen gives 50% females, so the worst-case scenario against which +450 kg more milk at the end of the second lactation could be produced… is not the common situation).
  • Analysis of selective sweeps in cattle with sequence data.
  • Some hints on ways to correct for population stratification.
  • The funny article of the month, a compression exercise with all the results included (side effect of the OpenData policy? …lots of information you’ll never really use?)

Recent articles

January 21, 2014

Just a small review of the articles I went through recently:

Recent articles

October 31, 2013

Just a small review of the articles I went through recently:

Hourglass distribution

August 6, 2013

Some days ago, I went through an article on mutation rate mentioning the “hourglass distribution”. The distribution is well illustrated by the plot in the article: for some chromosomes there is a pretty clear reduction in the number of SNPs around the centromere. I just wondered whether we could also observe this phenomenon in Bos taurus, so I had a look at the dbSNP138 data to get some clues.

First version


#Retrieve dbSNP (the URL is quoted so that the shell leaves the wildcard to wget)
wget "ftp://ftp.ncbi.nih.gov/snp/organisms/cow_9913/VCF/vcf_*vcf.gz"

And then just run the following R code.

#Open an output file
png("hourglass.png",1920,1080)
#Split the screen in 30 panels (3 rows of 10)
par(mfrow=c(3,10))
#Create a vector of chromosomes
chr=c(1:29,"X")
#Run a loop on chromosomes
for(i in chr){
  #Read file (the gzipped file is read directly)
  Pos=read.table(paste("vcf_chr_",i,".vcf.gz",sep=""),skip=15,fill=TRUE)
  #Compute the SNP density once (na.rm guards against rows padded by fill=TRUE)
  dens=density(Pos$V2,na.rm=TRUE)
  #Plot the density, then shade it with a transparent polygon
  plot(dens,col="grey",main=paste("BTA : ",i,sep=""))
  polygon(dens,col=rgb(0,0,0.3,0.04))
}
dev.off()

Notes on the code:

  • read.table is used directly on a gzipped file (a very handy trick).
  • The dbSNP files have a 15-line header, so I use the skip=15 option.
  • I had some glitches while reading some files (a problem with the number of fields); the option fill=TRUE fixes it.
  • Plots are nice… but polygons are even better, so I first plot the density and then add a polygon on top of it.
  • The rgb function is a simple way to obtain transparency: after the three values for red, green and blue, I set the alpha value to 0.04.

And after a while… you’ll obtain something like this:

[Figure: hourglass.png, SNP density along each bovine chromosome]

I must say I was a bit disappointed. At least, there is no pattern as clear as the one seen in human. All the bovine autosomes are acrocentric (the centromere sits at one end of the chromosome), which may explain why generally no clear decrease in SNP density can be seen. The patterns observed on chromosomes 12, 18 and 10 were even more surprising; I am wondering if there could be some sampling bias. The pattern on BTA23 could be due to the MHC, known to exhibit great diversity. The density computation may also blur things a bit.
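Since kernel density smoothing may blur local features, raw counts in fixed windows are a simple cross-check. Here is a minimal shell sketch, assuming only the VCF layout used above (position in the second column, header lines starting with "#"). The function name snp_window_counts and the 1 Mb default window are arbitrary choices for illustration.

```shell
# Count variants per fixed-size window from a VCF stream.
# snp_window_counts is a hypothetical helper; $1 is the window size in bp.
snp_window_counts() {
    win="${1:-1000000}"   # assumed default window: 1 Mb
    awk -v W="$win" '
        /^#/ { next }                       # skip VCF header lines
        $2 ~ /^[0-9]+$/ {                   # keep rows with a numeric position
            bin = int(($2 - 1) / W)
            n[bin]++
            if (bin > max) max = bin
            seen = 1
        }
        END {
            if (!seen) exit                 # nothing to report on empty input
            # window start, window end, count (empty windows are printed as 0)
            for (b = 0; b <= max; b++)
                printf "%d\t%d\t%d\n", b * W + 1, (b + 1) * W, n[b] + 0
        }'
}
```

It could be used as `zcat vcf_chr_12.vcf.gz | snp_window_counts 1000000 > chr12_counts.txt`, and the three-column output can then be plotted as a bar chart to compare with the density curves.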

Second version

The basic work being done, we can try to investigate other hypotheses. For instance, are SNPs and indels distributed the same way along the genome? With some slight changes, the code becomes:

#Open an output file
png("SNPandindelDistribution.png",1920,1080)
#Split the screen in 30 panels (3 rows of 10)
par(mfrow=c(3,10))
#Create a vector of chromosomes
chr=c(1:29,"X")
#Run a loop on chromosomes
for(i in chr){
  #Read file
  Pos=read.table(paste("vcf_chr_",i,".vcf.gz",sep=""),skip=15,fill=TRUE)
  #Compute and plot the SNP density
  snp=density(Pos$V2[grep("snp",Pos$V8)],na.rm=TRUE)
  plot(snp,col="grey",main=paste("BTA : ",i,sep=""))
  polygon(snp,col=rgb(0,0,0.3,0.04))
  #Add the in-del line, scaled by 0.5 to avoid scale problems
  indel=density(Pos$V2[grep("in-del",Pos$V8)],na.rm=TRUE)
  lines(indel$x,0.5*indel$y,col="red")
}
dev.off()

Notes on the code:

  • In dbSNP, variants are coded either as snp or in-del; we extract the corresponding lines with the grep function.
  • I tweaked the indel line a bit (scaling it by 0.5) in order to avoid scale problems.
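Before comparing the two curves, it can help to know how many variants of each class a file actually contains. The following is a rough shell sketch, assuming (as the R code does) that the eighth column holds the string "snp" or "in-del". The function name variant_class_counts is made up for the example, and a line matching both strings is counted as an in-del here, whereas the R grep calls would pick it up twice.

```shell
# Tally variant classes found in the 8th (INFO) column of a VCF stream.
# variant_class_counts is a hypothetical helper name.
variant_class_counts() {
    awk -F '\t' '
        /^#/          { next }              # skip VCF header lines
        $8 ~ /in-del/ { indel++; next }     # checked first, see lead-in
        $8 ~ /snp/    { snp++;   next }
                      { other++ }           # anything else (or malformed rows)
        END { printf "snp\t%d\nin-del\t%d\nother\t%d\n", snp, indel, other }'
}
```

For instance, `zcat vcf_chr_23.vcf.gz | variant_class_counts` prints one count per class. A very low in-del count on a chromosome would be a hint that its red curve should be read with caution.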

[Figure: SNPandindelDistribution.png, SNP (grey) and in-del (red) densities along each bovine chromosome]

We observe roughly the same pattern for SNPs and indels, although the indel distribution may be smoother. I was expecting some discrepancies (relying on the article by Montgomery et al.), but here again we are only dealing with 1-base indels, which are not really representative of short indels in general. I may try to check these results against my own.

Serial snapshots with IGV batch

July 16, 2013

IGV is a very handy tool. Nevertheless, scrolling from one position to another can be tedious. Moreover, the downside of this very user-friendly software is that you can spend hours looking here and there and, in the end, keep no record of your discoveries.

Thankfully, IGV can run in batch mode, which makes it possible to take screenshots for a targeted list of positions and store them in a folder. We’ll illustrate this with a small example:

Setting up a session

To test our script, we will first download some publicly available data.

#Download a vcf file (on beagle4 website, you may have to change file name according to last release)
wget http://faculty.washington.edu/browning/beagle/test.r1099.vcf.gz
gunzip test.r1099.vcf.gz
#Create an index
igvtools index test.r1099.vcf
#(You can alternatively prefer to use igvtools from igv GUI)
#Launch igv
igv.sh

To check that everything is right, load the vcf file test.r1099.vcf. Then move to position chr22:20,000,000-20,100,000.

[Figure: IGV screenshot]

Set everything to your taste, and save the session: File, Save Session. You should obtain an xml file (the batch script below expects it to be named igv_session.xml).

Running IGV in batch mode

Now, let’s say you are interested in some particular positions, and that we’ve stored these positions of interest in a csv file. Our aim is to create an IGV batch file.

Basically, we’ll have to load the session, set a directory to store the screenshots, and then move from one position to the next. A very crude version could therefore be the following. (I know: why csv? Because a lot of people still use Excel 😦 )

#Create a fake positions list
cat >Liste.csv <<EOF
chr22;20070000
chr22;20081000
EOF

#Create a screenshot directory
mkdir -p IMG
#Write the header of the script
cat >Batch.igv <<EOF
new
load igv_session.xml
snapshotDirectory IMG
EOF
#Now parse the csv file: one "goto" (position +/- R bases) and one "snapshot" per line
gawk -F ";" -v R=10000 '{print "goto " $1 ":" ($2 - R) "-" ($2 + R) "\nsnapshot Screen" NR ".png"} END {print "exit"}' Liste.csv >>Batch.igv

From IGV, go to Tools, Run Batch Script, and load Batch.igv. When the whole process is done, IGV will terminate and you’ll find your screenshots in IMG.
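The crude version above can be wrapped in a small function so that the session file, the output directory and the flanking distance become parameters. This is only a sketch under assumptions: make_igv_batch is a made-up name (it is not the PrepIgvBatch.sh script), the position list is expected to be "chrom;position" as in Liste.csv, and snapshots are named after the position rather than the line number so that they stay recognisable when the list is re-ordered.

```shell
# Write an IGV batch script to stdout from a "chrom;position" list.
# make_igv_batch is a hypothetical helper; arguments:
#   $1 position list, $2 session file, $3 snapshot directory, $4 flank in bp
make_igv_batch() {
    list="$1"
    session="${2:-igv_session.xml}"
    dir="${3:-IMG}"
    flank="${4:-10000}"
    # Header: start a new session, load the saved one, set the output folder
    printf 'new\nload %s\nsnapshotDirectory %s\n' "$session" "$dir"
    awk -F ';' -v R="$flank" '
        { sub(/\r$/, "") }                   # tolerate Windows line endings
        NF >= 2 && $2 ~ /^[0-9]+$/ {         # ignore blank or malformed lines
            start = $2 - R
            if (start < 1) start = 1         # do not go before the first base
            printf "goto %s:%d-%d\n", $1, start, $2 + R
            printf "snapshot %s_%d.png\n", $1, $2
        }
        END { print "exit" }' "$list"
}
```

A typical call would be `make_igv_batch Liste.csv igv_session.xml IMG 10000 > Batch.igv`. The IGV documentation also describes a -b (batch) option on the command line, so `igv.sh -b Batch.igv` may save the trip through the Tools menu, depending on your IGV version.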

For an even more automated version, you can use the script “PrepIgvBatch.sh” available in the scriptotheque.

Recent articles

July 8, 2013

Just a small review of the articles I went through recently:

Sufficiently rare to be mentioned: Bayes’ theorem in Science, by Efron, with a nice follow-up post on the og

Although the lab technicalities were far beyond my understanding, the questions raised by this article on the evolution of essential genes struck me!

I wish I had had time to look at this kind of procedure during my PhD: a simple permutation algorithm to compute significance thresholds. By the way, I also learned a new distribution: the Rademacher distribution.

I was eager to see this article: the Rat Genome Sequencing and Mapping Consortium made a very interesting piece of work combining sequence and genetic mapping in outbred rats. A lot of questions came to my mind based on these results… yeah, hunting the so-called “causal mutation” may not be that easy.

And last, the funny article of the month! This kind of question could have been seen on Freakonometrics.

Recent articles

June 5, 2013

Just a small review of the articles I went through recently:

  • A good article on covariance matrix regularization; it points to another article that turns out to be a generalization of methods we’ve tested for genomic selection (I wish I had known about them some time ago).
  • Two articles on genomic selection. The first one made me think about a discussion we had with David after my PhD defense. The second one, although “less humourous than the original submission”, makes a nice review of “the Bayesian alphabet”, explaining and pointing out a lot of interesting facts about Bayesian analysis.
  • A nice article on a minute plant genome, which could be related to the ongoing ENCODE discussion.
  • An interesting article on long RNA.
  • The funny article of the month.