Wednesday, December 2, 2015

Optimizing de novo RNA-seq: coordinating library insert length with read length

My lab recently published an article on RNA-seq library preparation and the assessment of de novo transcriptome assemblies. For this technical study, we used the Madagascar ground gecko (or ocelot gecko, Paroedura picta), which is bred in our institute for developmental biology studies.

Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation

BMC Genomics 2015, 16:977 doi:10.1186/s12864-015-2007-1


We showed that the length distribution of the prepared library inserts differs from that of the fragments actually sequenced, and we provide some tips for coordinating library insert length with read length. For example, we prepared libraries starting with RNA fragmentation for 2 or 4 minutes (instead of 8 minutes, according to the TruSeq RNA library prep protocol), to be sequenced with 2 x 171 cycles in HiSeq Rapid Mode (also see this article for the reason why we do '171 cycles').
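To make the idea of coordinating insert length with read length concrete, here is a minimal sketch of the arithmetic for one paired-end fragment. It is not taken from the paper: the function name and the example insert lengths are hypothetical, and it assumes both mates have the same length and ignores quality trimming.

```python
def paired_end_layout(insert_len, read_len):
    """Describe how two mates of read_len bases cover an insert of insert_len bases.

    Illustrative only: assumes equal-length mates and no trimming.
    """
    if insert_len <= 0 or read_len <= 0:
        raise ValueError("lengths must be positive")
    # Bases of each read that actually fall on the insert
    on_insert = min(read_len, insert_len)
    return {
        # Bases sequenced by both mates (redundant information)
        "overlap": max(0, 2 * on_insert - insert_len),
        # Bases in the middle of the insert that neither mate reaches
        "unsequenced_gap": max(0, insert_len - 2 * read_len),
        # Bases per read that run past the insert into the adapter
        "adapter_readthrough": max(0, read_len - insert_len),
    }


# Hypothetical insert lengths, sequenced with 171-nt paired reads
for insert_len in (100, 300, 500):
    print(insert_len, paired_end_layout(insert_len, 171))
```

With 171-nt reads, an insert shorter than 171 nt is read through into the adapter, an insert between 171 and 342 nt is partly sequenced twice, and only longer inserts use every cycle on new sequence. That is the motivation for shortening the fragmentation time when reads get longer.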

The paper deals with post-sequencing steps, too. You may find the de novo assembly part insufficiently explored, and we admit that there are more programs and settings to test. The strength of the paper lies rather in the assessment of assembly results. As background, the long-standing solution for assessing assembly completeness was the pipeline CEGMA, developed by the Korf Lab. It was announced in May 2015 that CEGMA is no longer supported and that its function has been taken over by BUSCO. In our paper, we derived a reference gene set consisting of 233 genes conserved throughout vertebrates, including sea lamprey and elephant shark (or ghost shark), which can be fed into CEGMA and BUSCO. This new gene set, CVG (core vertebrate genes), enables more accurate completeness assessment, and it saves a lot of time, especially when used with BUSCO. In fact, in the course of our benchmarks, we noticed some shortcomings of BUSCO, one of which is the exclusion of cyclostomes and cartilaginous fishes from its original reference gene set, 'vBUSCO', which is supposed to target vertebrates.
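As a rough illustration of what a completeness score means, the sketch below counts how many genes of a reference set were detected in an assembly. It only shows the final counting step under simplifying assumptions: the gene IDs are hypothetical placeholders, and real tools such as CEGMA and BUSCO decide whether a gene is "detected" through homology searches and report finer categories than found or missing.

```python
def completeness(reference_genes, detected_genes):
    """Return (percent of reference genes detected, sorted list of missing genes)."""
    reference = set(reference_genes)
    if not reference:
        raise ValueError("reference gene set is empty")
    # Detected IDs that are not in the reference set are ignored
    found = reference & set(detected_genes)
    missing = sorted(reference - found)
    percent = 100.0 * len(found) / len(reference)
    return percent, missing


# Hypothetical IDs standing in for the 233 core vertebrate genes
reference = [f"CVG{i:03d}" for i in range(1, 234)]
# Pretend the assembly contains the first 220 of them plus an unrelated gene
detected = reference[:220] + ["not_a_core_gene"]

percent, missing = completeness(reference, detected)
print(f"{percent:.1f}% complete, {len(missing)} genes missing")
```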

Saturday, May 9, 2015

Our mini-paper introducing iMate Protocol

The modified protocol from my unit is introduced in a mini-paper published online in the journal BioTechniques.


Optimization and cost-saving in tagmentation-based mate-pair library preparation and sequencing
Kaori Tatsumi, Osamu Nishimura, Kazu Itomi, Chiharu Tanegashima, and Shigehiro Kuraku
BioTechniques 58:253-257 (May 2015) doi 10.2144/000114288



In that paper, we also introduce our strategy for obtaining 80-nt, 127-nt or 171-nt reads using only SBS kits for 50 cycles in HiSeq Rapid Run mode (v1).

Thursday, April 30, 2015

mate pair library prep made 'easy' (?)

The word 'easy' is probably not appropriate... My team has made some effort to modify the protocol for Illumina's Nextera Mate Pair Library Prep kit and to achieve substantial cost savings.

It is a 'handy' guide with some tips, still based largely on the manufacturer's protocol. Our guide document is here, and the journal BioTechniques will soon publish our mini-paper on it.