Towards an improved apple reference transcriptome using RNA-seq (Bai et al. 2014)
Overview
Abstract from Bai et al., 2014: The reference genome of apple (Malus × domestica) has been available since 2010. Despite being a milestone in apple genomics, the reference genome is difficult to be used as a reference in RNA-seq (RNA sequencing) analysis, a widespread technology in transcriptomic studies. One of the major limitations appears to be the low coverage of the reference transcriptome in RNA-seq mapping of reads. To improve the reference transcriptome, we obtained 14 sets of strand-specific RNA-seq data of 168.5 million reads (filter passed) in total from fruit of Golden Delicious (GD, the source of the reference genome) in varying growth and developmental stages. Using a combination of genome-guided assembly and de novo assembly, the apple reference transcriptome was improved to a collection of 71,178 genes or transcripts, which includes 53,654 genes predicted originally (with MDP prefixed in their IDs) and 17,524 novel transcripts. Of these novel transcripts, 8,144 were identified from reads directly mapped to the reference genome while the remaining 9,380 were extracted from de novo assemblies of reads that could not be initially mapped to the reference genome. Evaluating the improved apple reference transcriptome with reads from Golden Delicious and other genotypes used in this and other studies showed that it allowed 62.5 ± 9.3-82.3 ± 2.7 % of reads to be mapped, a marked increase from the low rates of 37.4 ± 7.7-46.6 ± 7.1 % offered by the original reference transcriptome. The improved reference transcriptome therefore represents a step forward towards a complete reference transcriptome in apple.
Downloads
All assembly and annotation files are available for download by selecting the desired data type in the right-hand sidebar. Each data type page provides a description of the available files and links to download them. Alternatively, browse the project's FTP directory for all files.
Assembly
Reads and contigs from the RNA-seq data that did not map to the original M. x domestica v1.0 genome contigs were assembled into novel contigs. This set consisted of 9,605 new contigs on which de novo transcripts were later identified. A FASTA file of these new contigs is available below. Additionally, a file containing 131,712 contigs (122,107 original contigs + 9,605 new contigs) is also available:
Procedures
Three rounds of analyses were performed to confirm existing gene models or to identify new transcripts:
Genes/Transcripts
Three types of gene models/transcripts are available from this project:
Gene models from the original M. x domestica v1.0 assembly have had their names shortened for brevity, but the unique numerical identifier is the same (e.g. MDP0000122515 is shortened to M122515). Each set of gene models or novel transcripts is available below, along with a single file containing all models.
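As an illustration of the naming convention, the following minimal Python sketch converts an original identifier to its shortened form. It is based only on the single example given above (MDP0000122515 becomes M122515). The function name `shorten_mdp_id` is hypothetical, and the handling of leading zeros is an assumption: the sketch strips all of them, which may not match the project files for identifiers with a smaller numeric part. Check the result against the downloaded files before relying on it.

```python
import re

# Assumption: original IDs are "MDP" followed by a zero-padded number,
# as in the example MDP0000122515 given in the text.
_MDP_PATTERN = re.compile(r"^MDP(\d+)$")


def shorten_mdp_id(gene_id: str) -> str:
    """Return the shortened form of an original gene model ID.

    Hypothetical helper: the rule (drop "DP" and all leading zeros)
    is inferred from one example and may not hold for every ID.
    IDs that do not match the MDP pattern are returned unchanged.
    """
    match = _MDP_PATTERN.match(gene_id)
    if match is None:
        return gene_id
    # Strip leading zeros; keep a single "0" if the number is all zeros.
    digits = match.group(1).lstrip("0") or "0"
    return "M" + digits


if __name__ == "__main__":
    print(shorten_mdp_id("MDP0000122515"))
```

Identifiers of novel transcripts, which do not carry the MDP prefix, pass through this helper unchanged.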
Alignments
There are two alignment files in BAM format that can be accessed using samtools or other BAM file viewers. These files are large, but it may not be necessary to download them. If the desired tool (e.g. samtools) supports remote access to BAM files, simply copy and paste the URLs below for use by the tool.
Alignment of transcriptome to genomic contigs
Alignment of all RNA-Seq reads (from the 14 samples) to genes
Acknowledgements
The data and information for this project were provided to GDR by the Kenong Xu lab of Cornell University and were formatted by the GDR team for public access through GDR.