Sorghum bicolor RTx430
Sorghum bicolor RTx430 is a grain sorghum inbred line commonly used as a pollinator parent in hybrid production; its genome is known to be rich in repeats.
Plant Introduction (PI) number for Sorghum bicolor (L.) Moench subsp. bicolor, ‘RTx430’ in the U.S. National Plant Germplasm System (GRIN-Global): PI 655996.
Image source: The GRIN database, https://npgsweb.ars-grin.gov/gringlobal/ImgDisplay?id=1125783
Statistics (Source: NCBI, April 2021)
|Statistic|Value|
|---|---|
|Sequencing technologies|Oxford Nanopore MinION|
|Assembly methods|SMARTdenovo (June 2017 version); Canu v1.6|
|Construction of pseudomolecules| |
|Publication|Deschamps et al. (2018)|
|Number of contigs|723|
|Total assembly length (Mb)|666|
|Contig N50 (Mb)|3|
|Total number of genes|36,937|
|Total number of transcripts|49,928|
|Average gene length (bp)|3,252|
|Exons per transcript (average)|4|
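Gene, transcript and exon statistics like those in the table are usually derived from the structural annotation file. The sketch below shows one way such summaries could be computed from GFF3 lines; it is not how NCBI produced the figures above. It assumes `gene`, `mRNA`/`transcript` and `exon` feature types linked by `ID`/`Parent` attributes, and the helper names (`parse_attributes`, `annotation_stats`) and toy records are hypothetical.

```python
from collections import defaultdict

# Feature types treated as transcripts. This is an assumption: pipelines
# differ, and some also emit tRNA or ncRNA features.
TRANSCRIPT_TYPES = {"mRNA", "transcript"}


def parse_attributes(field):
    """Split a GFF3 column-9 string such as 'ID=g1;Parent=x' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def annotation_stats(gff_lines):
    """Count genes and transcripts, and average gene length and exon count."""
    gene_lengths = []
    transcripts = set()
    exons_per_parent = defaultdict(int)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a nine-column feature line
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        if ftype == "gene":
            # GFF3 coordinates are 1-based and inclusive
            gene_lengths.append(end - start + 1)
        elif ftype in TRANSCRIPT_TYPES and "ID" in attrs:
            transcripts.add(attrs["ID"])
        elif ftype == "exon":
            # an exon may list several comma-separated parents
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    exons_per_parent[parent] += 1
    n_tx = len(transcripts)
    total_exons = sum(exons_per_parent[t] for t in transcripts)
    return {
        "genes": len(gene_lengths),
        "transcripts": n_tx,
        "mean_gene_length": sum(gene_lengths) / len(gene_lengths) if gene_lengths else 0.0,
        "exons_per_transcript": total_exons / n_tx if n_tx else 0.0,
    }


# Toy annotation: two genes (1,000 bp and 500 bp), one transcript each.
example = [
    "##gff-version 3",
    "chr1\tdemo\tgene\t1\t1000\t.\t+\t.\tID=g1",
    "chr1\tdemo\tmRNA\t1\t1000\t.\t+\t.\tID=t1;Parent=g1",
    "chr1\tdemo\texon\t1\t200\t.\t+\t.\tID=e1;Parent=t1",
    "chr1\tdemo\texon\t500\t1000\t.\t+\t.\tID=e2;Parent=t1",
    "chr1\tdemo\tgene\t2001\t2500\t.\t-\t.\tID=g2",
    "chr1\tdemo\tmRNA\t2001\t2500\t.\t-\t.\tID=t2;Parent=g2",
    "chr1\tdemo\texon\t2001\t2500\t.\t-\t.\tID=e3;Parent=t2",
]
print(annotation_stats(example))
# {'genes': 2, 'transcripts': 2, 'mean_gene_length': 750.0, 'exons_per_transcript': 1.5}
```

Gene length here is measured on the `gene` feature span (introns included); other definitions, for example CDS length, would give different averages.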
The Sorghum bicolor RTx430 genome was generated by combining Oxford Nanopore reads from a MinION sequencer with Bionano Genomics Direct Label and Stain (DLS) optical maps, as described by Deschamps et al. (2018) (see also https://www.corteva.com). The final chromosome-scale de novo assembly consists of 29 scaffolds, most of which span entire chromosome arms. It has a scaffold N50 of 33.28 Mb and covers 90% of the expected genome length.
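Both the contig N50 (3 Mb) and the scaffold N50 (33.28 Mb) follow the same definition: the length L such that sequences of length L or longer together contain at least half of the assembled bases. A minimal sketch of that calculation, using hypothetical toy lengths rather than the real RTx430 contigs or scaffolds:

```python
def n50(lengths):
    """Return the N50 of a non-empty collection of sequence lengths.

    Sequences are sorted from longest to shortest and their lengths summed;
    N50 is the length at which the running total first reaches half of
    the total assembly size.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length


# Toy example: total = 30, half = 15; 10 + 8 = 18 >= 15, so N50 = 8.
print(n50([2, 4, 6, 8, 10]))  # 8
```

Applied to contig lengths this gives the contig N50; applied to scaffold lengths, the scaffold N50. Some tools differ slightly in how they treat the exact halfway point.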
NCBI accession: https://www.ncbi.nlm.nih.gov/assembly/GCA_003482435.1
Genome annotation was carried out as described by Deschamps et al. (2018). First, genome repeats were masked with RepeatMasker using a curated, sorghum-specific repeat library from Repbase. The repeat-masked genome was then used as input to two categories of gene predictors. The de novo gene predictors Fgenesh (Solovyev et al., 1994), Augustus (Stanke and Waack, 2003) and SNAP (Korf, 2004) were run with default parameters, using the monocot, maize and rice training sets, respectively. Evidence-based gene structure modelers were also run: GMAP and PASA, which use EST, cDNA and long-read transcript evidence, and Spaln, which uses protein evidence. Long-read sequences from the sorghum line BTx623 available at NCBI, along with other sorghum ESTs and cDNAs, were used as the evidence set for PASA. ESTs and cDNA sequences from other (non-sorghum) Poales species at NCBI, together with monocot transcripts from Phytozome, were used with GMAP as additional evidence from closely related species. UniRef100 plant protein sequences were used as the evidence dataset for gene structure prediction with Spaln. All gene annotation files were combined with EvidenceModeler, and its output was used to polish the gene boundaries in PASA. The final PASA annotation file was combined with tRNA predictions from tRNAscan-SE to obtain the final structural annotation file, together with FASTA files of protein, CDS, cDNA and gene sequences. For additional details, see Deschamps et al. (2018).
Casa, Alexandra M., Gael Pressoir, Patrick J. Brown, Sharon E. Mitchell, William L. Rooney, Mitchell R. Tuinstra, Cleve D. Franks, and Stephen Kresovich. 2008. “Community Resources and Strategies for Association Mapping in Sorghum.” Crop Science 48 (1): 30–40. https://doi.org/10.2135/cropsci2007.02.0080.
Deschamps, Stéphane, Yun Zhang, Victor Llaca, Liang Ye, Gregory May, and Haining Lin. 2018. “A Chromosome-Scale Assembly of the Sorghum Genome Using Nanopore Sequencing and Optical Mapping.” PMID: 30451840. https://doi.org/10.1101/327817.
Korf, Ian. 2004. “Gene Finding in Novel Genomes.” BMC Bioinformatics 5: 59. https://doi.org/10.1186/1471-2105-5-59.
Solovyev, V. V., A. A. Salamov, and C. B. Lawrence. 1994. “Predicting Internal Exons by Oligonucleotide Composition and Discriminant Analysis of Spliceable Open Reading Frames.” Nucleic Acids Research 22 (24): 5156–63. PMID: 7816600. https://doi.org/10.1093/nar/22.24.5156.
Stanke, Mario, and Stephan Waack. 2003. “Gene Prediction with a Hidden Markov Model and a New Intron Submodel.” Bioinformatics 19 Suppl 2 (October): ii215–25. PMID: 14534192. https://doi.org/10.1093/bioinformatics/btg1080.