Tips: Reading compressed files with Fortran and named pipes

As genomic data keep accumulating, at some point we have to find ways to reduce data size as much as possible. Working with Beagle (a Java program) made me notice that most “modern languages” have a standard library for reading compressed files directly. Alas, most of my tools are written in Fortran, and the few examples I’ve seen of reading compressed files directly used sockets or bindings to C system calls, which are far from the ease of use of the Python or Java libraries.

Fortunately, I recently discovered named pipes. The idea is to create a “file” that behaves like a pipe: one process writes into the pipe and another reads whatever comes out of it.

The advantages here are:

  1. You don’t have to make changes in your Fortran program.
  2. You don’t need to fully decompress a file to read it (and therefore you save some disk space).

One example:

You have a compressed file containing genotypes. You need to read the genotypes from your program, but you don’t have enough disk space to decompress the whole file. The idea is then to create a named pipe.
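
A minimal shell sketch of this scenario, under stated assumptions: the file names (genotypes.gz, genotypes.fifo) are made up, gzip is used as the compressor, and wc stands in for the program that reads the genotypes. It only illustrates that the data go through the pipe without a decompressed copy ever being written to disk.

```shell
#!/bin/sh
# Sketch: read a gzip-compressed file through a named pipe.
# All file names here are hypothetical, chosen for the illustration.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Build a tiny compressed "genotype" file to play with.
printf '1 0 2\n2 1 1\n3 2 0\n' | gzip > genotypes.gz

# Create the pipe, then decompress into it in the background.
mkfifo genotypes.fifo
gzip -dc genotypes.gz > genotypes.fifo &

# Any program that reads a plain text file can now read the pipe;
# wc stands in for the Fortran reader here.
nlines=$(wc -l < genotypes.fifo)

# Wait for the background decompression, then clean up.
wait
rm -f genotypes.fifo
echo "lines read: $nlines"
```

The decompressor and the reader run at the same time, so only a small in-memory buffer sits between them.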

Named pipe creation

A named pipe can be created with the following command:

mkfifo MyNamedPipe

A simple ls -al should confirm the creation of the named pipe with a line like the following (the leading “p” marks a pipe):

prw-rw-r-- 1 francois francois     0 2012-01-07 19:53 MyNamedPipe
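
In a script, the same check can be done without parsing the ls output. The sketch below assumes a POSIX shell, where the test operator -p is true for named pipes; the temporary directory is only there to keep the example self-contained.

```shell
#!/bin/sh
# Sketch: create a named pipe only if needed and verify its type.
# MyNamedPipe is the name used in the article; the temporary
# directory is an assumption made for the illustration.
set -eu
workdir=$(mktemp -d)
pipe="$workdir/MyNamedPipe"

# Create the pipe unless one already exists at that path.
[ -p "$pipe" ] || mkfifo "$pipe"

# -p is true only for a FIFO (named pipe).
if [ -p "$pipe" ]; then
    kind="fifo"
else
    kind="other"
fi
echo "$pipe is a $kind"

# A named pipe is removed like any other file.
rm -f "$pipe"
```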

Redirections to named pipe

Suppose you want to compress on the fly a big file generated by some program (let’s say a Fortran one). We’ll proceed as follows (the two steps can be started in either order, since each side waits for the other):

  1. In a first terminal, start the compression process reading from the pipe (e.g. gzip < MyNamedPipe > T.gz)
  2. In a second terminal, write some data to the named pipe

Example with Fortran

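program testfifo

implicit none

integer :: i,j

! The named pipe is opened like any ordinary file
open(11,file="MyNamedPipe")

do i=1,10000
 do j=i,10000
  write(11,*)i,j,j*i
 end do
end do

! Closing the unit tells the reader that no more data will come
close(11)

end program testfifo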

Compilation

gfortran -o testfifo testfifo.f90

Execution

./testfifo

=> As nothing is reading from the named pipe yet, the program waits (blocks) until a reader shows up.

In another shell:

 cat MyNamedPipe | gzip >T.gz

And then voilà: the compression starts and, in turn, the Fortran program continues its normal execution.
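
The two terminals can also be folded into a single script by putting the compressor in the background. This is only a sketch: the awk one-liner stands in for ./testfifo (with a smaller range, 100 instead of 10000), and the temporary directory is an assumption made to keep the example self-contained.

```shell
#!/bin/sh
# Sketch: compress the output of a program on the fly via a named pipe.
set -eu
workdir=$(mktemp -d)
cd "$workdir"
mkfifo MyNamedPipe

# Reader: compress whatever arrives on the pipe (background job).
gzip < MyNamedPipe > T.gz &

# Writer: stand-in for ./testfifo, prints i, j and i*j for j >= i.
awk 'BEGIN { for (i = 1; i <= 100; i++)
                 for (j = i; j <= 100; j++)
                     print i, j, i * j }' > MyNamedPipe

# The writer has closed the pipe, so gzip sees end of file and finishes.
wait
rm -f MyNamedPipe

# Count the lines stored in the compressed file.
count=$(gzip -dc T.gz | wc -l)
echo "lines in T.gz: $count"
```

Because the compressor is started first and left in the background, the writer never has to wait for a second terminal.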

Fortran

Things are working, but it would be nice to use mkfifo more smoothly. This can be achieved from within a Fortran program with the EXECUTE_COMMAND_LINE subroutine.

Two problems then have to be fixed:

  1. We should not wait for the system call to finish before starting the reading process.
  2. The read loop should know when the file is fully decompressed (in fact, as any other stream could be redirected to the named pipe, no end-of-file signal will be received by the loop).

The first point is addressed with the “&” sign, which puts the process in the background. The second point is addressed by echoing an “end of file” marker to the named pipe before removing it: the read loop stops as soon as a line cannot be read as three integers. So basically, we end up with something like the following code.
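
The same two ideas can be tried out in the shell alone. The sketch below rests on several assumptions: the reader keeps the pipe open for both reading and writing (which is my understanding of what a default gfortran open does, and which works on Linux although POSIX leaves it undefined), gzip replaces xz, and the sentinel is a made-up END line rather than the \x4 used in the article.

```shell
#!/bin/sh
# Sketch: background writer plus sentinel line to end the read loop.
# T1.gz and the END marker are hypothetical names for the illustration.
set -eu
workdir=$(mktemp -d)
cd "$workdir"
printf '1 1 1\n1 2 2\n2 2 4\n' | gzip > T1.gz
mkfifo MyUnCompressedFile

# Open the pipe read-write on descriptor 3. With the write side held
# open by the reader itself, end of file is never reported.
exec 3<> MyUnCompressedFile

# Background writer ("&"): decompress, then append the sentinel.
( gzip -dc T1.gz > MyUnCompressedFile; echo END > MyUnCompressedFile ) &

# Normal reading loop, stopped by the sentinel instead of end of file.
rows=0
while read -r i j k <&3; do
    if [ "$i" = "END" ]; then
        break
    fi
    rows=$((rows + 1))
    echo "$i $j $k"
done

# Close the descriptor, wait for the writer, remove the pipe.
exec 3<&-
wait
rm -f MyUnCompressedFile
echo "rows read: $rows"
```

Without the sentinel line, the loop would hang after the last record, which is the situation described in the second point.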

program testfifo
implicit none
integer:: i,j,k,io,iostat
character(len=30)::filename
character(len=600)::Instruction
write(*,*)"Name of the xz file ?"
read(*,*)filename
!//Create a named pipe//
call EXECUTE_COMMAND_LINE("rm -f MyUnCompressedFile ;mkfifo MyUnCompressedFile")
!//Open a connection to it//
open(11,file="MyUnCompressedFile",iostat=iostat)
if(iostat/=0)stop "Cannot open MyUnCompressedFile"
!//Write decompression instruction (the final & runs it in background)//
write(Instruction,'(a,a,a)')"(xz -dc  ",trim(filename)," >MyUnCompressedFile; echo \x4 >MyUnCompressedFile ) &"
!//Execute//
call EXECUTE_COMMAND_LINE(Instruction)
!//Execute a normal reading loop (ends when a line is not three integers)//
do
 read(11,*,iostat=io)i,j,k
 if(io/=0)exit
 write(*,*)i,j,k
end do
close(11)
!//remove named pipe//
write(Instruction,'(a)')"( rm -f MyUnCompressedFile)"
CALL EXECUTE_COMMAND_LINE(Instruction)
end program testfifo

Compilation

gfortran -o testfifo testfifo.f90

Execution

echo T1.xz | ./testfifo

Note: We compile the program with gfortran. In fact, ifort doesn’t recognize (for the moment) the EXECUTE_COMMAND_LINE subroutine, which is part of the new Fortran 2008 standard. To get the above program working with ifort, you should use “call system” instead of “call EXECUTE_COMMAND_LINE” (the former being an extension to Fortran recognized by both ifort and gfortran).

One last step beyond

To get an even smoother way to read compressed data, we can write a module with functions that work like open and read. You can find here such a module. By the way, the reading part works fine, but I still haven’t found a good way to open a named pipe for output compression (any hints would be very appreciated!)

  1. FC
    February 28, 2014 at 4:13 pm

    Nice article, thanks for writing it up! It is certainly neat to be able to read / write to zipped formats directly. You should point out that it relies on Fortran 2008 features (EXECUTE_COMMAND_LINE), which may not be available in all compilers.

    If you are going to be doing this regularly in your code, why not consider a data library such as HDF5 or NETCDF which natively support compression inside their archives (as well as metadata and many other benefits)?