Today I came across the book below, titled "Tree-thinking", on a bookshelf in my lab. I was actually looking for other books, but this one caught my eye. It is a trivial point, but I wonder whether it is just a coincidence that the family name of one of the two authors is 'Baum' (which means 'tree' in German).
(Here I stop writing about the 'Tree-thinking' book)
I know that 'time' here means something else, such as a developmental time course, to many of the other authors contributing to this issue. For me, it was an ideal format and moment to write about a long-standing irritation of mine: the underappreciation of basic evolutionary concepts and facts in non-evolutionary studies.
In the review article, I tried to include as many useful ideas and as much knowledge as possible. It gives an overview, with a time scale, of the evolutionary distances between model species frequently used in the life sciences. I also included some examples of changes in gene repertoires among vertebrates, such as the Bmp16 and Pax10 genes, which I usually call "cryptic pan-vertebrate genes". I hope the review paper helps foster evolutionary concepts and skills among a wide range of molecular biologists.
My lab recently published an article on RNA-seq library preparation and the assessment of de novo transcriptome results. For this technical improvement, we used the Madagascar ground gecko (or ocelot gecko, Paroedura picta), which is bred in our institute for developmental biology studies.
We showed that the length distribution of the prepared library inserts differs from that of the fragments actually sequenced, and we provide some tips for matching the length of library inserts to the read length. For example, we prepared libraries with RNA fragmentation of 2 and 4 minutes (instead of the 8 minutes given in the TruSeq RNA library prep protocol), to be sequenced with 2 x 171 cycles in HiSeq Rapid Mode (also see this article for the reason why we run '171 cycles').
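To make the relation between insert length and read length concrete, here is a minimal Python sketch. It is not taken from our paper. It assumes that the insert length excludes adapters and that both reads of a pair have the same length (171 by default, as in the run above). The function name paired_end_coverage is made up for this illustration.

```python
def paired_end_coverage(insert_len: int, read_len: int = 171) -> dict:
    """Summarize how one read pair covers an insert of the given length.

    Assumptions (for illustration only): insert_len excludes adapters,
    and both reads of the pair have the same length.
    """
    if insert_len <= 0 or read_len <= 0:
        raise ValueError("lengths must be positive")
    # A read cannot yield more insert-derived bases than the insert contains.
    effective = min(read_len, insert_len)
    sequenced = 2 * effective
    # Bases read twice (once from each end) when the insert is short.
    overlap = max(0, sequenced - insert_len)
    # Bases in the middle of the insert that neither read reaches.
    gap = max(0, insert_len - sequenced)
    return {
        "overlap": overlap,
        "unsequenced_gap": gap,
        "covered": insert_len - gap,
    }
```

For a 200-base insert this reports an overlap of 142 bases, meaning most of the sequencing output is redundant, whereas a 500-base insert leaves 158 bases in the middle unread. Longer inserts, such as those from a shorter fragmentation time, make better use of long reads.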
The paper deals with post-sequencing steps, too. You may find the de novo assembly part insufficiently explored, and we admit that there are more programs and settings that could be tested. The strength of the paper lies rather in the assessment of assembly results. As background, the long-standing solution for assessing assembly completeness was the CEGMA pipeline, developed by the Korf Lab. It was announced in May 2015 that CEGMA is no longer supported and that its function has been taken over by BUSCO. In our paper, we derived a reference gene set consisting of 233 genes conserved throughout vertebrates, including the sea lamprey and the elephant shark (or ghost shark), which can be fed into CEGMA and BUSCO. This new gene set, CVG (core vertebrate genes), enables more accurate completeness assessment, and it saves a lot of time, especially when used with BUSCO. In the course of our benchmarks, we also noticed some shortcomings of BUSCO, one of which is the exclusion of cyclostomes and cartilaginous fishes from its original reference gene set, 'vBUSCO', which is supposedly aimed at vertebrates.
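The idea behind a completeness score can be shown with a very small Python sketch. Please note that this is not how CEGMA or BUSCO work internally: the step of deciding whether a reference gene is detected in an assembly is left out entirely, and only the final bookkeeping is shown. The function name completeness and the gene identifiers in the usage note are hypothetical.

```python
from typing import Iterable, List, Tuple


def completeness(reference_ids: Iterable[str],
                 detected_ids: Iterable[str]) -> Tuple[float, List[str]]:
    """Return (percent of reference genes detected, sorted list of missing genes).

    The detection itself is assumed to have been done elsewhere;
    this function only compares two collections of identifiers.
    """
    reference = set(reference_ids)
    if not reference:
        raise ValueError("reference gene set is empty")
    # Identifiers outside the reference set are ignored.
    found = reference & set(detected_ids)
    missing = sorted(reference - found)
    return 100.0 * len(found) / len(reference), missing
```

As a hypothetical example, completeness(["g1", "g2", "g3", "g4"], ["g2", "g4"]) returns 50.0 together with the missing genes g1 and g3. With a reference set of 233 genes, each missing gene lowers the score by a little over 0.4 percentage points.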
As announced in the change log, I have included the predicted gene set of the Japanese lamprey (Lethenteron japonicum) in the database #6 of aLeaves, the homolog collection tool I have been maintaining.
The other day I updated aLeaves, the web server I am maintaining, to include the latest information on protein-coding genes from public genome databases.
Database #6 at aLeaves now includes the Ensembl gene set and the consortium gene set for the sea lamprey (Petromyzon marinus) genome. It also contains the set of predicted proteins for the elephant shark genome, although the genome assembly for this species is highly incomplete.
As I have recently been working hands-on (although very slowly) with the output of the HiSeq1500 in our facility, I needed to find, test and validate some tools for handling fastq files for various purposes. Below I list some of them for those who are starting, or will start, this sort of work.
For various kinds of filtering/trimming
seqtk - a fixed version that retains full sequence names (or 'comments') is here
[see a post at BioStar]
fastx-tools (of many tools therein, I am using fastx_trimmer and fastq_quality_filter)
prinseq (of many options, I use 'trim_left/right' and 'derep')
condetri - ... I could not get this working in the way I wanted
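For those who want to see what this kind of trimming and filtering does to each read, here is a minimal Python sketch. It is only an illustration and not the implementation of any of the tools above. It assumes plain 4-line fastq records (no wrapped sequence lines) and Phred+33 quality encoding, and the function names read_fastq, trim and passes_quality are made up for this example.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, Tuple

Record = Tuple[str, str, str]  # (header line, sequence, quality string)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from 4-line fastq text, keeping the full header line."""
    handle = iter(lines)
    while True:
        chunk = list(islice(handle, 4))
        if len(chunk) < 4:  # end of input (or a truncated last record)
            return
        header, seq, _plus, qual = (x.rstrip("\n") for x in chunk)
        yield header, seq, qual


def trim(record: Record, first: int = 1, last: Optional[int] = None) -> Record:
    """Keep bases from position `first` to `last` (1-based, inclusive)."""
    header, seq, qual = record
    return header, seq[first - 1:last], qual[first - 1:last]


def passes_quality(record: Record, min_q: int = 20,
                   min_percent: float = 80.0, offset: int = 33) -> bool:
    """True if at least `min_percent` of bases have quality >= `min_q`."""
    qual = record[2]
    if not qual:
        return False
    good = sum(1 for ch in qual if ord(ch) - offset >= min_q)
    return 100.0 * good / len(qual) >= min_percent
```

A typical use is to open a fastq file, pass the file handle to read_fastq, trim each record, and keep only those for which passes_quality returns True. Because the header line is carried through unchanged, the full sequence name including the 'comment' part is retained. For real data I would still recommend the dedicated tools listed above, which are much faster.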
There are surely more useful tools that I did not list here. Please first search Google with some keywords and look into the 'Bioinformatics' forum at SEQanswers to get the latest information. The Wiki page there also provides a list of tools.
I will try to keep this page updated for the lamprey researchers' convenience.
I believe this third-party guide does not interfere with any database function or with the research activity of other researchers. However, please let me know if there is any problem.
My team has launched a new tool, 'aLeaves', which allows researchers to collect sequences that are homologous to a query (with NCBI Blast). The motivation for launching it is illustrated in the image below. I hope it leads to wider use of molecular phylogenetics and a better understanding of how sequences evolve and how gene families have diversified. I would appreciate any feedback from users of this tool.