Saturday, December 28, 2013

Japanese lamprey gene set now available at aLeaves

As announced in the change log, I have added the predicted gene set of the Japanese lamprey (Lethenteron japonicum) to database #6 of aLeaves, the homolog collection tool I have been maintaining.

Thursday, September 12, 2013

Sea Lamprey Consortium Gene Set at aLeaves

The other day I updated the web server I maintain, aLeaves, to include the latest information on protein-coding genes from public genome databases.

Database #6 at aLeaves now includes the Ensembl gene set and the consortium gene set for the sea lamprey (Petromyzon marinus) genome. It also contains the set of predicted proteins for the elephant shark genome, although the genome assembly for this species is highly incomplete.

Saturday, July 6, 2013

3rd International Whale Shark Conference

The conference web site is here.

Tools for NGS analysis - fastq file processing

As I have recently been working hands-on (although very slowly) with output from the HiSeq 1500 in our facility, I needed to find, test and validate some tools for handling fastq files for various purposes. Below I list some of them for those who are starting, or will soon start, this sort of work.

For various kinds of filtering/trimming

seqtk - a fixed version that retains full sequence names (or 'comments') is here [see a post at BioStar]

FASTX-Toolkit (of its many tools, I am using fastx_trimmer and fastq_quality_filter)

prinseq (of its many options, I use 'trim_left/right' and 'derep')

condetri - ... I could not get this to work the way I wanted
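To make the filtering/trimming step concrete, here is a minimal Python sketch of fixed-position trimming followed by a percentage-based quality filter. It is not the code of fastx_trimmer or fastq_quality_filter; the function names, the default thresholds (Q20 on at least 80% of bases) and the Phred+33 quality encoding are assumptions chosen for illustration.

```python
import io
from typing import Iterator, NamedTuple, Optional, TextIO


class FastqRecord(NamedTuple):
    name: str  # header line without the leading '@'
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a plain 4-lines-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield FastqRecord(header[1:], seq, qual)


def trim(rec: FastqRecord, first: int = 1, last: Optional[int] = None) -> FastqRecord:
    """Keep bases first..last (1-based, inclusive); last=None keeps to the end."""
    end = len(rec.seq) if last is None else last
    return FastqRecord(rec.name, rec.seq[first - 1:end], rec.qual[first - 1:end])


def passes_quality(rec: FastqRecord, min_q: int = 20,
                   min_percent: float = 80, offset: int = 33) -> bool:
    """True if at least min_percent % of bases have Phred quality >= min_q."""
    if not rec.qual:
        return False
    good = sum(1 for c in rec.qual if ord(c) - offset >= min_q)
    return 100 * good >= min_percent * len(rec.qual)


# Usage: trim every read to bases 1-6, then keep only reads passing the filter.
data = "@r1\nACGTACGT\n+\nIIIIII##\n@r2\nACGTACGT\n+\n########\n"
trimmed = (trim(r, 1, 6) for r in read_fastq(io.StringIO(data)))
kept = [r for r in trimmed if passes_quality(r)]
print(kept)  # only r1 survives, trimmed to ACGTAC
```

Trimming before filtering matters here: read r1 fails an 80% filter at full length (6 of 8 bases are good) but passes once its low-quality 3' end has been trimmed off.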

For merging overlapping paired-end reads


See this external blog post (from Nov. 2012) for more info
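The basic idea behind merging an overlapping pair can be sketched as follows: reverse-complement read 2, slide it against the 3' end of read 1, accept the longest overlap with few enough mismatches, and take the higher-quality base at each overlapping position. This is a toy illustration, not the algorithm of any particular merging tool; the names, the defaults (minimum overlap 10, at most 10% mismatches) and the assumption that the fragment is longer than each read are all choices made for this sketch.

```python
from typing import Optional, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1_seq: str, r1_qual: str, r2_seq: str, r2_qual: str,
               min_overlap: int = 10,
               max_mismatch_ratio: float = 0.1) -> Optional[Tuple[str, str]]:
    """Merge a read pair whose 3' ends overlap; return None if no overlap fits.

    Assumes the fragment is longer than each read, so the overlap is the 3' end
    of read 1 against the 5' end of read 2's reverse complement.
    """
    r2_rc, q2_rc = revcomp(r2_seq), r2_qual[::-1]
    best = None
    # Try the longest possible overlap first and accept the first one that fits.
    for olen in range(min(len(r1_seq), len(r2_rc)), min_overlap - 1, -1):
        a, b = r1_seq[len(r1_seq) - olen:], r2_rc[:olen]
        mismatches = sum(1 for x, y in zip(a, b) if x != y)
        if mismatches <= max_mismatch_ratio * olen:
            best = olen
            break
    if best is None:
        return None
    start = len(r1_seq) - best
    seq, qual = [r1_seq[:start]], [r1_qual[:start]]
    for i in range(best):
        # Inside the overlap, keep whichever base has the higher quality character.
        if q2_rc[i] > r1_qual[start + i]:
            seq.append(r2_rc[i])
            qual.append(q2_rc[i])
        else:
            seq.append(r1_seq[start + i])
            qual.append(r1_qual[start + i])
    seq.append(r2_rc[best:])
    qual.append(q2_rc[best:])
    return "".join(seq), "".join(qual)


# Usage: two 14-base reads from opposite ends of a 20-base fragment.
fragment = "ACGTTGCAAGGCTTAACCGA"
r1 = fragment[:14]
r2 = revcomp(fragment[6:])  # read 2 comes from the opposite strand
print(merge_pair(r1, "I" * 14, r2, "5" * 14, min_overlap=5))  # reconstructs fragment
```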

For removing adapter sequences etc.


cutadapt - Can't this tool accept multiple adapter sequences from a multi-FASTA file?
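For reference, trimming a read against several 3' adapters listed in a multi-FASTA file takes only a little code. The sketch below uses exact matching only (real adapter trimmers also tolerate sequencing errors), cuts the read at the earliest position where any adapter begins, and also catches an adapter prefix running off the 3' end of the read. The function names and the minimum partial match of 3 bases are assumptions for illustration; this is not how cutadapt works internally.

```python
from typing import Dict, Iterable, Tuple


def read_fasta(text: str) -> Dict[str, str]:
    """Parse a (multi-)FASTA string into {name: sequence}."""
    seqs: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = (line[1:].split() or [""])[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line.upper()
    return seqs


def adapter_start(seq: str, adapter: str, min_partial: int = 3) -> int:
    """Index where a 3' adapter begins in seq, or len(seq) if none is found.

    Exact matches only: the full adapter anywhere in the read, or an adapter
    prefix of at least min_partial bases running off the read's 3' end.
    """
    if not adapter:
        return len(seq)
    pos = seq.find(adapter)
    if pos != -1:
        return pos
    # Longest partial prefix first, so the earliest possible cut wins.
    for k in range(min(len(adapter) - 1, len(seq)), min_partial - 1, -1):
        if seq.endswith(adapter[:k]):
            return len(seq) - k
    return len(seq)


def trim_adapters(seq: str, qual: str, adapters: Iterable[str],
                  min_partial: int = 3) -> Tuple[str, str]:
    """Cut the read at the earliest position where any adapter starts."""
    cut = min((adapter_start(seq, a, min_partial) for a in adapters),
              default=len(seq))
    return seq[:cut], qual[:cut]


# Usage: two hypothetical adapters read from a multi-FASTA string.
adapters = read_fasta(">ad1\nAGATCGGAAG\n>ad2\nCTGTCTCTTA\n").values()
print(trim_adapters("ACGTACGTAGATCGGAAGAG", "I" * 20, adapters))
```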

For retrieving paired reads after read filtering
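After reads 1 and reads 2 have been filtered independently, the two files no longer line up. The re-pairing step amounts to matching reads by their shared name, as in this minimal sketch. It assumes read names either end in /1 and /2 or carry the mate number after a space (as in newer Illumina headers), that names are unique, and that the read 2 set fits in memory; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality)


def pair_key(name: str) -> str:
    """Name shared by both mates: first word, without a trailing /1 or /2."""
    parts = name.split()
    token = parts[0] if parts else ""
    if token.endswith(("/1", "/2")):
        token = token[:-2]
    return token


def repair(reads1: List[Read], reads2: List[Read]):
    """Split filtered mates into (pairs, orphans from file 1, orphans from file 2)."""
    mates2: Dict[str, Read] = {pair_key(r[0]): r for r in reads2}
    keys1 = {pair_key(r[0]) for r in reads1}
    pairs = [(r, mates2[pair_key(r[0])]) for r in reads1 if pair_key(r[0]) in mates2]
    orphans1 = [r for r in reads1 if pair_key(r[0]) not in mates2]
    orphans2 = [r for r in reads2 if pair_key(r[0]) not in keys1]
    return pairs, orphans1, orphans2


# Usage: r2 lost its mate in file 2, r4 lost its mate in file 1.
reads1 = [("r1/1", "AC", "II"), ("r2/1", "GG", "II"), ("r3 1:N:0:1", "TT", "II")]
reads2 = [("r1/2", "CA", "II"), ("r3 2:N:0:1", "AA", "II"), ("r4/2", "CC", "II")]
pairs, orphans1, orphans2 = repair(reads1, reads2)
print(len(pairs), orphans1, orphans2)
```

The output pairs keep the order of the read 1 file, so writing them out yields two files whose records line up again.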


For removing 'duplicates'

filterPCRdupl (I will not use this any more because 'prinseq -derep 4' does exactly the same thing much faster)
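As a rough illustration of sequence-based duplicate removal, the sketch below keeps the first read of each sequence and, optionally, treats a read and its reverse complement as the same sequence by using the lexicographically smaller of the two as a key. It is a simplified stand-in written for this post, not the implementation of prinseq or filterPCRdupl, and the function names are hypothetical.

```python
from typing import List, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality)
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def dedup(reads: List[Read], include_revcomp: bool = True) -> List[Read]:
    """Keep the first read of each identical sequence (optionally strand-insensitive)."""
    seen = set()
    kept = []
    for read in reads:
        key = read[1].upper()
        if include_revcomp:
            # Canonical key: the smaller of the sequence and its reverse complement,
            # so a read and its reverse complement map to the same key.
            key = min(key, revcomp(key))
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


# Usage: "b" is an exact duplicate of "a"; "c" is its reverse complement.
reads = [("a", "AACG", "IIII"), ("b", "AACG", "IIII"),
         ("c", "CGTT", "IIII"), ("d", "GGGA", "IIII")]
print([r[0] for r in dedup(reads)])
```

A set of canonical keys keeps the memory cost at one entry per distinct sequence, which is usually acceptable for a single lane but worth keeping in mind for very large runs.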

For checking that the tools did what they were supposed to do

FastQC [also, a tutorial video is available on YouTube]


prinseq -stats_all
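For a quick before/after comparison of each processing step, a few summary numbers often suffice. The sketch below computes the read count, mean length, GC content and mean quality per position, in the spirit of (but far simpler than) what FastQC and prinseq -stats_all report. The Phred+33 encoding and the output dictionary keys are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality)


def summarize(reads: List[Read], offset: int = 33) -> Dict[str, object]:
    """Basic stats: read count, mean length, GC %, mean Phred quality per position."""
    n = total_len = gc = 0
    qual_sums: List[int] = []
    qual_counts: List[int] = []
    for _name, seq, qual in reads:
        n += 1
        total_len += len(seq)
        gc += sum(1 for b in seq.upper() if b in "GC")
        for i, c in enumerate(qual):
            if i == len(qual_sums):  # first read reaching this position
                qual_sums.append(0)
                qual_counts.append(0)
            qual_sums[i] += ord(c) - offset
            qual_counts[i] += 1
    return {
        "reads": n,
        "mean_length": total_len / n if n else 0.0,
        "gc_percent": 100.0 * gc / total_len if total_len else 0.0,
        "mean_qual_by_pos": [s / c for s, c in zip(qual_sums, qual_counts)],
    }


# Usage: reads of different lengths, as after trimming.
stats = summarize([("a", "GGCC", "IIII"), ("b", "AT", "##")])
print(stats)
```

Averaging per position over only the reads that reach that position keeps the profile meaningful after variable-length trimming.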

There are surely more useful tools that I have not listed here. Please first search Google with some keywords and look at the 'Bioinformatics' forum at SEQanswers for the latest information. The SEQanswers wiki also provides a list of tools.

Tuesday, January 29, 2013

Complicated! - lamprey genome resource availability

Many people around me have asked about publicly available resources for the sea lamprey genome. I have just written down some facts and information on a new, separate page of this blog, titled 'Lamprey genome guide'. As I see it, the situation is somewhat complicated.

I will try to keep this page updated for the convenience of lamprey researchers.
I believe this third-party guide does not interfere with any database function or with the research activities of other researchers. But please let me know if there is any problem.

Saturday, January 26, 2013

'aLeaves' launched!

My team has launched a new tool, 'aLeaves', that allows researchers to collect sequences homologous to a query (using NCBI BLAST). The motivation for launching it is illustrated in the image below. I hope it leads to wider use of molecular phylogenetics and a better understanding of how sequences evolve and how gene families have diversified. I would appreciate any feedback from users of this tool.