Thursday, January 14, 2016

Tree-thinking for non-evolutionary biologists

Today I came across the book below, titled "Tree-thinking", on a bookshelf in my lab. I was looking for other books at the time, but it caught my eye. This is a trifle, but I wonder whether it is just a coincidence that one of the two authors has the family name 'Baum' (German for 'tree').

(That is all I will say about the 'Tree-thinking' book.)

Recently, together with current and past members of my lab, I published a review paper titled "Incorporating tree-thinking and evolutionary time scale into developmental biology". I had been invited by the journal Development, Growth & Differentiation to contribute to a special issue titled "Time in Development", which serves as the proceedings of a symposium held in Kobe, Japan, in spring 2015.

I know that for many of the other authors contributing to this issue, 'time' means something else, such as the time course of development. For me, it was the ideal format and timing to write about a long-standing irritation of mine: the underappreciation of basic evolutionary concepts and facts in non-evolutionary studies.

In the review article, I tried to include as many useful ideas and as much useful knowledge as possible. It gives an overview, on a time scale, of the evolutionary distances between model species frequently used in the life sciences. I also included some examples of changes in gene repertoires among vertebrates, such as the Bmp16 and Pax10 genes, which I call "cryptic pan-vertebrate genes". I hope the review paper helps foster evolutionary concepts and skills among a wide range of molecular biologists.

Wednesday, December 2, 2015

Optimizing de novo RNA-seq: coordinating library insert length with read length

My lab recently published an article on RNA-seq library preparation and the assessment of de novo transcriptome assemblies. For this technical improvement, we used the Madagascar ground gecko (or ocelot gecko, Paroedura picta), which is bred at our institute for developmental biology studies.

Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation

BMC Genomics 2015, 16:977. doi:10.1186/s12864-015-2007-1

We show the difference between the length distribution of the prepared library inserts and that of the fragments actually sequenced, and we provide some tips for coordinating insert length with read length. For example, we prepared libraries using RNA fragmentation times of 2 and 4 minutes (instead of the 8 minutes specified in the TruSeq RNA library prep protocol), to be sequenced with 2x 171 cycles on HiSeq Rapid Mode (also see this article for why we use '171 cycles').
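The arithmetic behind "coordinating insert length with read length" can be sketched as follows. This is a simplified model, not the analysis from the paper: it assumes both reads have the same length and are sequenced inward from opposite ends of the insert, with no trimming. For an insert of length I and reads of length L, the reads overlap by 2L - I bases when L <= I <= 2L, leave an unsequenced gap when I > 2L, and run into adapter sequence when I < L. The insert lengths in the usage line are made up for illustration.

```python
from collections import Counter


def pair_geometry(insert_len: int, read_len: int) -> dict:
    """Classify one read pair by comparing insert length with read length.

    Simplified model (assumption): both reads have the same length and are
    sequenced inward from opposite ends of the insert.
    """
    if insert_len <= 0 or read_len <= 0:
        raise ValueError("lengths must be positive")
    if insert_len < read_len:
        # Each read runs past the end of the insert into adapter sequence;
        # both reads cover the whole insert.
        return {"category": "read-through", "overlap": insert_len, "gap": 0,
                "adapter_bases_per_read": read_len - insert_len}
    overlap = 2 * read_len - insert_len
    if overlap >= 0:
        # The 3' ends of the two reads cover the same stretch of the insert.
        return {"category": "overlap", "overlap": overlap, "gap": 0,
                "adapter_bases_per_read": 0}
    # The middle of the insert is never sequenced.
    return {"category": "gap", "overlap": 0, "gap": -overlap,
            "adapter_bases_per_read": 0}


def summarize(insert_lengths: list, read_len: int) -> dict:
    """Fraction of read pairs falling into each category."""
    counts = Counter(pair_geometry(n, read_len)["category"] for n in insert_lengths)
    total = len(insert_lengths)
    return {cat: counts[cat] / total for cat in ("read-through", "overlap", "gap")}


# Hypothetical insert lengths, sequenced with 2x 171 cycles.
print(summarize([150, 250, 300, 400, 500], read_len=171))
# prints {'read-through': 0.2, 'overlap': 0.4, 'gap': 0.4}
```

Under this model, longer reads call for longer inserts if adapter read-through is to be avoided, which is consistent with using shorter fragmentation times for 171-cycle runs.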

The paper also deals with post-sequencing steps. You may find the de novo assembly part insufficiently explored, and we admit that there are more programs and settings that could be tested. The strength of the paper lies rather in the assessment of assembly results. For background: a long-standing solution for assessing assembly completeness was the CEGMA pipeline, developed by the Korf Lab. In May 2015, it was announced that CEGMA is no longer supported and that its role has been taken over by BUSCO. In our paper, we derived a reference set of 233 genes conserved throughout vertebrates, including the sea lamprey and the elephant shark (or ghost shark), which can be used as input for both CEGMA and BUSCO. This new gene set, CVG (core vertebrate genes), enables more accurate completeness assessment and, especially when used with BUSCO, saves a lot of time. In fact, during our benchmarking we noticed some shortcomings of BUSCO; one of them is that its original reference gene set supposedly intended for vertebrates, 'vBUSCO', excludes cyclostomes and cartilaginous fishes.
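Completeness scores of this kind are percentages of a fixed reference gene set (for CVG, 233 genes), which is why the choice of reference set matters. The sketch below only illustrates that bookkeeping; the status labels and function name are hypothetical, and BUSCO and CEGMA each produce their own report formats.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical status labels; BUSCO and CEGMA each use their own report formats.
CATEGORIES = ("complete", "fragmented", "missing")


def completeness(status_by_gene: dict[str, str], reference_size: int) -> dict[str, float]:
    """Percentage of a reference gene set recovered in each category.

    Genes of the reference set with no reported status count as missing.
    """
    unknown = set(status_by_gene.values()) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown status labels: {sorted(unknown)}")
    if len(status_by_gene) > reference_size:
        raise ValueError("more genes reported than the reference set contains")
    counts = Counter(status_by_gene.values())
    counts["missing"] += reference_size - len(status_by_gene)
    return {cat: 100.0 * counts[cat] / reference_size for cat in CATEGORIES}


# Toy example with a 4-gene reference set (the real CVG set has 233 genes).
print(completeness({"g1": "complete", "g2": "complete", "g3": "fragmented"}, 4))
# prints {'complete': 50.0, 'fragmented': 25.0, 'missing': 25.0}
```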

Saturday, May 9, 2015

Our mini-paper introducing iMate Protocol

The modified protocol from my unit is described in a mini-paper published online in the journal BioTechniques.

Optimization and cost-saving in tagmentation-based mate-pair library preparation and sequencing
Kaori Tatsumi, Osamu Nishimura, Kazu Itomi, Chiharu Tanegashima, and Shigehiro Kuraku
BioTechniques 58:253-257 (May 2015). doi:10.2144/000114288

In it, we also describe our strategy for obtaining 80-nt, 127-nt, or 171-nt reads using only 50-cycle SBS kits in HiSeq Rapid Run mode (v1).

Thursday, April 30, 2015

mate pair library prep made 'easy' (?)

The word 'easy' is probably not quite appropriate... My team has put some effort into modifying Illumina's Nextera Mate Pair Library Prep Kit protocol to achieve substantial cost savings.

It is a 'handy' guide with some tips, still based largely on the manufacturer's protocol. Our guide document is here, and BioTechniques will soon publish our mini-paper on it.

Saturday, December 28, 2013

Japanese lamprey gene set now available at aLeaves

As announced in the change log, I have included the predicted gene set of the Japanese lamprey (Lethenteron japonicum) in database #6 of aLeaves, the homolog collection tool I maintain.

Thursday, September 12, 2013

Sea Lamprey Consortium Gene Set at aLeaves

The other day I updated the web server I maintain, aLeaves, to include the latest information on protein-coding genes from public genome databases.

Database #6 at aLeaves now includes the Ensembl gene set and the consortium gene set for the sea lamprey (Petromyzon marinus) genome. It also contains the set of predicted proteins for the elephant shark genome, although the genome assembly for this species is highly incomplete.

Saturday, July 6, 2013

3rd International Whale Shark Conference

The conference web site is here.

Tools for NGS analysis - fastq file processing

As I have recently been working hands-on (albeit very slowly) with output from the HiSeq 1500 in our facility, I needed to find, test, and validate some tools for handling fastq files for various purposes. Below I list some of them for those who are starting, or will soon start, this sort of work.

For various kinds of filtering/trimming

seqtk - a fixed version that retains full sequence names (or 'comments') is here [see a post at BioStar]

fastx-tools (of many tools therein, I am using fastx_trimmer and fastq_quality_filter)

prinseq (of many options, I use 'trim_left/right' and 'derep')

condetri - ... I could not get this working in the way I wanted
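For readers writing their own scripts, the basic filtering and trimming operations above can be sketched in plain Python. The quality filter follows the same idea as fastq_quality_filter's -q/-p options (keep a read only if at least p% of its bases reach quality q), but it is not that tool's code. The parser keeps the full header, including the comment after the first space, which is the issue the seqtk fix addresses. Assumptions: 4-line FASTQ records, Phred+33 encoding, and illustrative default thresholds; all function names are hypothetical.

```python
from __future__ import annotations

import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str  # full header without '@'; comment after the first space is kept
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Parse 4-line FASTQ records (no line-wrapped sequences)."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header.strip()!r}")
        yield FastqRecord(header[1:].rstrip("\n"), seq, qual)


def passes_quality(rec: FastqRecord, min_q: int = 20, min_percent: int = 80,
                   offset: int = 33) -> bool:
    """Keep a read only if >= min_percent of its bases have Phred >= min_q."""
    if not rec.seq:
        return False
    good = sum(ord(c) - offset >= min_q for c in rec.qual)
    return 100 * good >= min_percent * len(rec.qual)


def trim(rec: FastqRecord, left: int = 0, right: int = 0) -> FastqRecord:
    """Remove a fixed number of bases from the 5' (left) and 3' (right) ends."""
    end = max(left, len(rec.seq) - right)  # never let the slice end before it starts
    return FastqRecord(rec.header, rec.seq[left:end], rec.qual[left:end])


data = io.StringIO("@r1 1:N:0\nACGTAC\n+\nIIIIII\n@r2\nACGTAC\n+\n!!!III\n")
kept = [trim(r, left=1, right=1) for r in read_fastq(data) if passes_quality(r)]
print(kept)
# prints [FastqRecord(header='r1 1:N:0', seq='CGTA', qual='IIII')]
```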

For merging overlapping paired-end reads


See this external blog post (from Nov. 2012) for more info
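The principle behind merging overlapping paired-end reads can be sketched as follows: reverse-complement read 2, then look for the longest suffix of read 1 that matches a prefix of it within a mismatch tolerance. This is a simplified illustration, not the algorithm of any particular merging tool. Assumptions: the insert is at least as long as each read (no adapter read-through), both quality strings use the same encoding, and at mismatches the higher-quality base wins; the parameter values and function names are hypothetical.

```python
from __future__ import annotations

from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(seq1: str, qual1: str, seq2: str, qual2: str,
               min_overlap: int = 10,
               max_mismatch_frac: float = 0.1) -> Optional[Tuple[str, str]]:
    """Merge a read pair whose 3' ends overlap; return None if no overlap is found.

    Assumes the insert is at least as long as each read (no adapter read-through).
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be >= 1")
    rc_seq2, rc_qual2 = revcomp(seq2), qual2[::-1]
    # Try the longest possible overlap first, then shorter ones.
    for olen in range(min(len(seq1), len(rc_seq2)), min_overlap - 1, -1):
        tail1, head2 = seq1[-olen:], rc_seq2[:olen]
        mismatches = sum(a != b for a, b in zip(tail1, head2))
        if mismatches > max_mismatch_frac * olen:
            continue
        cons_seq, cons_qual = [], []
        for a, qa, b, qb in zip(tail1, qual1[-olen:], head2, rc_qual2[:olen]):
            # Take the base with the higher quality character (ties go to read 1);
            # where the reads agree, this is the same base either way.
            cons_seq.append(a if qa >= qb else b)
            cons_qual.append(max(qa, qb))
        merged_seq = seq1[:-olen] + "".join(cons_seq) + rc_seq2[olen:]
        merged_qual = qual1[:-olen] + "".join(cons_qual) + rc_qual2[olen:]
        return merged_seq, merged_qual
    return None


insert = "AACGTGCTTAGC"  # toy 12-bp insert read with 8-bp reads (4-bp overlap)
print(merge_pair(insert[:8], "I" * 8, revcomp(insert[4:]), "I" * 8, min_overlap=4))
# prints ('AACGTGCTTAGC', 'IIIIIIIIIIII')
```

Real data need a larger minimum overlap than this toy example, since short overlaps match by chance.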

For removing adaptor sequences etc.


cutadapt - Can't this tool accept multiple adapter sequences in a multifasta file?
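Independently of how cutadapt handles this, the idea of trimming with several adapters taken from one multi-FASTA file can be sketched as: load all adapters, find where each one starts in the read (a full match anywhere, or a partial match at the very 3' end), and cut at the earliest position. This sketch uses exact matching only, whereas dedicated tools tolerate sequencing errors. The adapter sequences in the usage example are the commonly used Illumina TruSeq and Nextera adapter prefixes; check them against your own kit's documentation. Function names are hypothetical.

```python
from __future__ import annotations

import io
from typing import Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> dict[str, str]:
    """Load a (multi-)FASTA file into {name: sequence}; name = first word of header."""
    seqs: dict[str, str] = {}
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def adapter_cut_position(seq: str, adapter: str, min_partial: int = 3) -> Optional[int]:
    """Index where the adapter starts in seq, or None (exact matches only)."""
    if min_partial < 1:
        raise ValueError("min_partial must be >= 1")
    full = seq.find(adapter)
    if full != -1:
        return full
    # Partial adapter at the 3' end: longest adapter prefix that is a suffix of seq.
    for k in range(min(len(adapter) - 1, len(seq)), min_partial - 1, -1):
        if seq.endswith(adapter[:k]):
            return len(seq) - k
    return None


def trim_adapters(seq: str, qual: str, adapters: dict[str, str],
                  min_partial: int = 3) -> Tuple[str, str, Optional[str]]:
    """Cut at the earliest adapter hit; also report which adapter was found."""
    best_pos, best_name = len(seq), None
    for name, adapter in adapters.items():
        pos = adapter_cut_position(seq, adapter, min_partial)
        if pos is not None and pos < best_pos:
            best_pos, best_name = pos, name
    return seq[:best_pos], qual[:best_pos], best_name


adapters = read_fasta(io.StringIO(">truseq\nAGATCGGAAGAGC\n>nextera\nCTGTCTCTTATACACATCT\n"))
read = "ACGTACGTAGATCGGAAGAGCTTT"
print(trim_adapters(read, "I" * len(read), adapters))
# prints ('ACGTACGT', 'IIIIIIII', 'truseq')
```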

For retrieving paired reads after read filtering


For removing 'duplicates'

filterPCRdupl (I will not use this any more because 'prinseq -derep 4' does exactly the same thing much faster)
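The core of exact-duplicate removal is simple enough to sketch: keep the first occurrence of each sequence and, optionally, also treat a read identical to the reverse complement of an earlier read as a duplicate. This is an illustration of the idea, not prinseq's or filterPCRdupl's implementation; it works on single reads (paired-end data would need both mates considered together), and the function names are hypothetical.

```python
from __future__ import annotations

from typing import Iterable

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def canonical(seq: str) -> str:
    """Lexicographically smaller of a sequence and its reverse complement."""
    seq = seq.upper()
    return min(seq, seq.translate(_COMPLEMENT)[::-1])


def dereplicate(seqs: Iterable[str], include_revcomp: bool = True) -> list[str]:
    """Keep the first occurrence of each sequence, dropping exact duplicates.

    With include_revcomp=True, a read identical to the reverse complement of
    an earlier read also counts as a duplicate.
    """
    seen, kept = set(), []
    for seq in seqs:
        key = canonical(seq) if include_revcomp else seq.upper()
        if key not in seen:
            seen.add(key)
            kept.append(seq)
    return kept


print(dereplicate(["ACGG", "ACGG", "CCGT", "TTTT"]))
# prints ['ACGG', 'TTTT']  ('CCGT' is the reverse complement of 'ACGG')
```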

For validating the tools' functions

fastqc [a tutorial video is also available on YouTube]


prinseq -stats_all
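When validating what a filtering step actually did, it helps to compare a few basic numbers before and after. The sketch below computes some of the kind of summary these tools report (read count, total bases, length range, GC content, mean per-base quality); it is not a reimplementation of fastqc or prinseq -stats_all. Assumptions: 4-line FASTQ records and Phred+33 encoding; the function names are hypothetical.

```python
from __future__ import annotations

import io
from typing import Iterator, TextIO, Tuple


def iter_fastq(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality) from 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield seq, qual


def fastq_stats(handle: TextIO, offset: int = 33) -> dict:
    """Basic summary statistics; mean_quality is the mean per-base Phred score."""
    lengths, gc, qual_sum = [], 0, 0
    for seq, qual in iter_fastq(handle):
        lengths.append(len(seq))
        gc += sum(base in "GCgc" for base in seq)
        qual_sum += sum(ord(c) - offset for c in qual)
    total = sum(lengths)
    if total == 0:
        return {"reads": len(lengths), "bases": 0}
    return {
        "reads": len(lengths),
        "bases": total,
        "min_len": min(lengths),
        "max_len": max(lengths),
        "mean_len": total / len(lengths),
        "gc_percent": 100 * gc / total,
        "mean_quality": qual_sum / total,
    }


print(fastq_stats(io.StringIO("@r1\nGGCC\n+\nIIII\n@r2\nAT\n+\n!!\n")))
```

Running it on the input and output of a filtering step gives a quick check that reads were removed or trimmed as intended.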

There are certainly more useful tools that I have not listed here. For the latest information, first try a Google search with some keywords and look into the 'Bioinformatics' forum at SEQanswers. The SEQanswers wiki also provides a list of tools.

Tuesday, January 29, 2013

Complicated! - lamprey genome resource availability

Many people around me have asked me about publicly available resources for the sea lamprey genome. I have just written down some facts and information on a separate new page in this blog, titled 'Lamprey genome guide'. As you can see, the situation is somewhat complicated.

I will try to keep this page updated for the convenience of lamprey researchers.
I believe this third-party guide does not interfere with any database function or with research activities by other researchers, but please let me know if there is any problem.

Saturday, January 26, 2013

'aLeaves' launched !

My team has launched a new tool, 'aLeaves', that allows researchers to collect sequences homologous to a query (using NCBI BLAST). The motivation for launching it is illustrated in the image below. I hope it leads to wider use of molecular phylogenetics and a better understanding of how sequences evolve and how gene families have diversified. I would appreciate any feedback from users of this tool.

Wednesday, October 3, 2012

Sunday, August 26, 2012

Recent addition to my collection

After I moved to Japan in early spring, I visited aquariums in Japan several times. The most impressive one was the Churaumi Aquarium in Okinawa.

The latest one I visited was the Sumida Aquarium in Tokyo. It opened in May 2012 next to the Tokyo Skytree, which opened at the same time.

Below are the miniature figures I bought at the souvenir shop of the Sumida Aquarium. On the left you see the zebra shark (Stegostoma fasciatum), and on the right, the leopard shark (Triakis semifasciata).

The one on the right is my favorite. I didn't expect to find figures of these species at all. I got very excited and ended up spending two thousand yen on them.

Friday, August 17, 2012

EEA 2012 in Milan, Italy

The European Elasmobranch Association's 16th Annual Conference will be held in Milan, Italy, 22-25 November 2012. [conference web site]