Saturday, July 6, 2013

3rd International Whale Shark Conference

The conference web site is here.


Tools for NGS analysis - fastq file processing

I have recently been working hands-on (although very slowly) with output from the HiSeq 1500 in our facility, so I needed to find, test and validate some tools for handling fastq files for various purposes. Below I list some of them for those who are starting, or will soon start, this sort of work.


For various kinds of filtering/trimming

seqtk - a fixed version that retains full sequence names (or 'comments') is here [see a post at BioStar]

FASTX-Toolkit (of the many tools therein, I am using fastx_trimmer and fastq_quality_filter)

prinseq (of many options, I use 'trim_left/right' and 'derep')

condetri - ... I could not get this working in the way I wanted
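To make concrete what "filtering/trimming" means for the tools above, here is a minimal Python sketch of fixed-position trimming and of a "percent of bases above a quality cutoff" filter. It is only an illustration of the idea, not how any of the listed tools is implemented. The names read_fastq, trim and passes_quality are made up for this example, and the sketch assumes 4-line fastq records with Phred+33 quality encoding.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str  # header line without the leading '@' (comment included)
    seq: str
    qual: str


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Parse 4-line fastq records (assumes sequences are not line-wrapped)."""
    stripped = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times groups the lines in fours
    for header, seq, _plus, qual in zip(*[stripped] * 4):
        yield FastqRecord(header[1:], seq, qual)


def trim(rec: FastqRecord, first: int = 1, last=None) -> FastqRecord:
    """Keep bases first..last (1-based, inclusive); last=None keeps to the end."""
    return FastqRecord(rec.name, rec.seq[first - 1:last], rec.qual[first - 1:last])


def passes_quality(rec: FastqRecord, min_q: int = 20,
                   min_percent: float = 80.0, offset: int = 33) -> bool:
    """True if at least min_percent of the bases have quality >= min_q."""
    if not rec.qual:
        return False
    good = sum(1 for c in rec.qual if ord(c) - offset >= min_q)
    return 100.0 * good / len(rec.qual) >= min_percent
```

With an open file handle fh, something like [trim(r, 1, 90) for r in read_fastq(fh) if passes_quality(r)] would filter first and then trim. Note that the full header, comment included, is carried through unchanged, which matters for re-pairing reads later.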


For merging overlapping paired-end reads

cope

See this external blog post (from Nov. 2012) for more info


For removing adaptor sequences etc.

tagdust

cutadapt - Can't this tool accept multiple adapter sequences in a multifasta file?


For retrieving paired reads after read filtering

cmpfastq_pe
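The re-pairing step can be sketched in a few lines of Python. When the two read files are filtered independently, some reads lose their mates, so the files have to be re-synchronised by read name. The sketch below is an assumption-laden illustration and not the algorithm of cmpfastq_pe: pair_key and repair are hypothetical names, reads are plain (name, seq, qual) tuples, and mates are assumed to share the part of the name before the first whitespace, optionally followed by /1 or /2.

```python
def pair_key(name):
    """Reduce a read name to the part shared by both mates."""
    key = (name.split() or [""])[0]  # drop the comment after whitespace
    if key.endswith(("/1", "/2")):   # older Illumina-style mate suffix
        key = key[:-2]
    return key


def repair(reads1, reads2):
    """Split two filtered read lists into synchronised pairs and singletons.

    reads1 and reads2 are lists of (name, seq, qual) tuples.
    Returns (paired1, paired2, single1, single2); paired1[i] and
    paired2[i] are mates, in the order they appear in reads1.
    """
    by_key2 = {pair_key(r[0]): r for r in reads2}
    paired1, paired2, single1 = [], [], []
    matched = set()
    for r in reads1:
        key = pair_key(r[0])
        mate = by_key2.get(key)
        if mate is None:
            single1.append(r)
        else:
            paired1.append(r)
            paired2.append(mate)
            matched.add(key)
    single2 = [r for r in reads2 if pair_key(r[0]) not in matched]
    return paired1, paired2, single1, single2
```

This holds the second file in memory, which is fine for a test subset but not for a full lane. It also shows why it helps when the upstream filtering tools keep read names intact: the matching depends entirely on them.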


For removing 'duplicates'

filterPCRdupl (I will not use this any more because 'prinseq -derep 4' does exactly the same thing much faster)
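For readers new to the idea, duplicate removal by sequence identity can be sketched as below. This is a generic illustration and makes no claim about what filterPCRdupl or prinseq do internally. The function names are invented for this example, reads are (name, seq, qual) tuples, and the use_revcomp switch (treating a read and its reverse complement as the same sequence) is an assumption added for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def remove_exact_duplicates(records, use_revcomp=False):
    """Keep the first read of every distinct sequence.

    records is an iterable of (name, seq, qual) tuples. With
    use_revcomp=True a read and its reverse complement count as
    the same sequence.
    """
    seen = set()
    kept = []
    for name, seq, qual in records:
        key = seq.upper()
        if use_revcomp:
            # use one canonical orientation as the lookup key
            key = min(key, reverse_complement(key))
        if key in seen:
            continue
        seen.add(key)
        kept.append((name, seq, qual))
    return kept
```

Keeping the first occurrence is the simplest policy. Keeping the copy with the best mean quality would be a reasonable alternative.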


For validating the tools' functions

fastqc [a tutorial video is also available on YouTube]

fastx_quality_stats

prinseq -stats_all
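A quick independent sanity check can also be scripted by hand, for example to confirm that a trimming step really produced the expected read lengths. The following is a minimal sketch and not a replacement for the tools above: summarize is a hypothetical helper that takes the quality strings of a fastq file, and Phred+33 encoding is assumed.

```python
def summarize(quals, offset=33):
    """Read count, length range and mean quality per position.

    quals is an iterable of fastq quality strings (Phred+offset).
    """
    lengths = []
    sums = []    # summed quality score at each position
    counts = []  # number of reads that reach each position
    for q in quals:
        lengths.append(len(q))
        for i, c in enumerate(q):
            if i == len(sums):
                sums.append(0)
                counts.append(0)
            sums[i] += ord(c) - offset
            counts[i] += 1
    return {
        "reads": len(lengths),
        "min_len": min(lengths, default=0),
        "max_len": max(lengths, default=0),
        "mean_q_by_pos": [s / n for s, n in zip(sums, counts)],
    }
```

Running this on the same file before and after a filtering step, and comparing the two summaries, is a cheap way to see whether a tool did what its options suggest.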


There are surely more useful tools that I did not list here. Please first search Google with some keywords and look into the 'Bioinformatics' forum at SEQanswers to get the latest info. The SEQanswers wiki also provides a list of tools.