The draft genome sequence of an upland wild rice species, _Oryza granulata_

ABSTRACT

Exploiting novel gene sources from wild relatives has proven to be an efficient approach to advance crop genetic breeding efforts. _Oryza granulata_, with the GG genome type,
occupies the basal position of the _Oryza_ phylogeny and has the second largest genome (~882 Mb). As an upland wild rice species, it possesses renowned traits that distinguish it from other
_Oryza_ species, such as tolerance to shade and drought, immunity to bacterial blight and resistance to the brown planthopper. Here, we generated a 736.66-Mb genome assembly of _O.
granulata_ with 40,131 predicted protein-coding genes. With Hi-C data, for the first time, we anchored ~98.2% of the genome assembly to the twelve pseudo-chromosomes. This chromosome-length
genome assembly of _O. granulata_ will provide novel insights into rice genome evolution, enhance our efforts to search for new genes for future rice breeding programmes and facilitate the
conservation of germplasm of this endangered wild rice species.

Measurement(s): DNA • RNA • transcriptome • genome coverage • sequence_assembly • sequence feature annotation
Technology Type(s): DNA sequencing • RNA sequencing • flow cytometry method • computational modeling technique • sequence assembly process • sequence annotation
Sample Characteristic - Organism: Oryza granulata
Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.12063198

BACKGROUND & SUMMARY

As one of the most important
crops in the world, rice is the most water-consuming cereal. Rice cultivation and yield depend greatly on water resources. The genetic breeding of drought-tolerant rice is a promising
direction under the currently mounting water shortage. However, most species in the genus _Oryza_ prefer moist and even aquatic habitats, and thus, upland rice breeding is very demanding due
to the scarcity of gene sources with drought tolerance in the genus _Oryza_. The genus _Oryza_ contains more than twenty species, including two cultivated species domesticated independently
from different wild species1. Compared to the majority of other grass species, _Oryza_ species have relatively small genomes and abundant morphological and ecological diversity. _Oryza
granulata_ occupies the basal position of the _Oryza_ phylogeny, first diverging from other members of the genus 8.8–10.2 million years ago2. _O. exasperata_ (A. Braun) Heer was identified
according to a spikelet fossil, which was found in an excavation site of Miocene age in Germany and appears to resemble the spikelet of extant _O. granulata_ based on its morphology3,4. As
an upland wild rice species, _O. granulata_ possesses renowned traits that distinguish it from other _Oryza_ species, such as tolerance to shade and drought, immunity to bacterial blight and
resistance to the brown planthopper5,6,7. Because of the distant evolutionary relationship of this species with cultivated rice, it has long been challenging to apply conventional methods
used in rice breeding programmes to it. Compared to that for other wild species closely related to cultivated rice, little effort has been made to perform genetic studies and germplasm
exploitation in _O. granulata_ due to the lack of a high-quality genome assembly. Among the diploid _Oryza_ species, _O. granulata_ (GG genome type) has the second largest genome (~882 Mb),
smaller than only that of _O. australiensis_ (~965 Mb, EE genome type)8, and approximately twice as large as the rice genome (~420 Mb, AA genome type)9. The two-fold increase in
genome size is mainly due to the accumulation of transposable elements (TEs) in _O. granulata_, which may have seriously eroded genome collinearity compared with that in other related rice
species8,10,11. In the last decade, great progress has been made in comparative genomics of cultivated rice and its wild relatives1,12,13,14,15,16,17, with much of this work performed at the
chromosome scale. In the first released genome assembly of _O. granulata_, the assembled genome sequences were not anchored to the chromosomes11. This undoubtedly limits the use of _O.
granulata_ as a basal _Oryza_ lineage for accurately inferring genome evolution relative to other rice species at the chromosome level. _O. granulata_ is naturally
distributed in South Asia, including China, India, Cambodia, Indonesia, Laos, Myanmar, Nepal, the Philippines, Sri Lanka, and Thailand18. It is seriously threatened due to ongoing human
disturbance and rapid deforestation19. Previous population genetic studies revealed that this species possesses fairly low levels of genetic diversity within populations but high genetic
differentiation among populations20,21. The considerable genomic diversity detected through pan-genome analysis demonstrates that _de novo_ assembly of more than one genome helps reveal the
origin and evolutionary forces of population structure and levels of genomic diversity22. Thus, sequencing an additional genome of _O. granulata_ from a genetically different population
compared with the previously sequenced accession collected in India11 is needed. The availability of a chromosome-scale genome of _O. granulata_ will lay the foundation for further
evolutionary studies as well as the improvement of desired agronomic traits relevant to rice breeding programmes. Here, we present a new chromosome-scale genome of _O. granulata_ assembled
_de novo_ using the Illumina and Hi-C sequencing platforms. In contrast to the previously sequenced _O. granulata_ accession (IRGC Acc. No. 102117) from India, the plant sequenced here was
collected in Yunnan, China, and thus, the two accessions are geographically separated. The obtained genome assembly will provide novel insights into the genomic diversity and genome evolution of
the genus _Oryza_ and enhance the exploration of precious wild rice germplasm resources.

METHODS

PLANT MATERIAL COLLECTION, TOTAL DNA ISOLATION AND GENOME SEQUENCING

For genome sequencing,
we collected dozens of _O. granulata_ plants from Menghai County, Yunnan Province, China, which were planted in the greenhouse of the Kunming Institute of Botany, Chinese Academy of
Sciences. Fresh and healthy leaves were harvested from the best-growing individual and immediately frozen in liquid nitrogen, followed by preservation at −80 °C in the laboratory prior to
DNA extraction. High-quality genomic DNA was extracted from leaves using a modified CTAB method23. RNase A was used to remove RNA contaminants. The quality and quantity of the extracted DNA
were examined using a NanoDrop 2000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA) and electrophoresis on a 0.8% agarose gel, respectively. A total of three 260-bp
short-insert libraries and five long-insert libraries (3 kb, 10 kb and 20 kb) were prepared following Illumina’s instructions. Then, the Illumina HiSeq 2000 (PE100 and PE101) and HiSeq
2500 (PE125 and PE150) platforms were employed for whole-genome sequencing according to the standard Illumina protocols (Illumina, San Diego, CA, USA). In total, we generated approximately
133.38 Gb (~168.41×) of raw data (Table 1).

HI-C DATA PREPARATION

We constructed Hi-C libraries using young leaves collected from the same individual plant of _O. granulata_ for high-quality
DNA isolation by following the standard protocol described previously with certain modifications24. Approximately 5-g leaf samples were cut into minute pieces and cross-linked by 2%
formaldehyde solution at room temperature for 15 minutes. Then, the sample was mixed with excess 2.5 M glycine to stop the cross-linking reaction and neutralize the remaining formaldehyde.
The Hi-C library was constructed and sequenced by Annoroad Genomics (Beijing, China) with the standard procedure described as follows. The cross-linked DNA was extracted and then digested
with _Mbo_I restriction enzyme. The sticky ends of the digested fragments were biotinylated and proximity ligated to form ligation junctions that were enriched for and then ultrasonically
sheared to a size of 200–500 bp. The biotin-labelled DNA fragments were pulled down and ligated with Illumina paired-end adapters and then amplified by PCR to produce the Hi-C sequencing
library. The library was sequenced with the Illumina HiSeq X Ten (PE150) platform, and a total of ~109.41 Gb (~138.15×) of raw sequencing data was produced (Table 1).

RNA ISOLATION AND TRANSCRIPTOME SEQUENCING

A total of seven tissues representing different developmental stages of _O. granulata_ were sampled to generate the RNA-Seq data needed for subsequent genome
annotation. These tissues included panicles at three different stages of flower development, flag leaves, stems, and the shoots and roots of three-leaf seedlings. Because of the low
germination rate of _O. granulata_, seedlings were germinated from seeds harvested from multiple plants, while the remaining tissues were sampled from the individual used for genome
sequencing. All collected samples were quickly frozen in liquid nitrogen and stored in a refrigerator at −80 °C before RNA extraction. RNA was individually extracted from each tissue using
TRI reagent (Molecular Research Centre, Inc., Cincinnati, OH, USA), according to the instructions provided with the reagents. Seven libraries were constructed and sequenced by Biomarker
Technologies (Beijing, China) on the Illumina HiSeq 2500 platform with a read length of 126 bp. In total, ~21.8 Gb of high-quality data was obtained and used for subsequent assembly after
filtering the low-quality and duplicated reads caused by PCR amplification (Table 2).

ESTIMATION OF GENOME SIZE

The genome size of _O. granulata_ was estimated using two methods:
_k_-mer frequency distribution and flow cytometric analysis. We first estimated and validated the genome size of _O. granulata_ using flow cytometric analysis. A total of 40–50 mg of fresh
leaves was collected for sample preparation using the OTTO method25,26. Nuclear samples were analysed using a BD FACSCalibur (BD Biosciences, USA) flow cytometer. CellQuest software (BD
Biosciences, USA) was used to analyse the flow cytometry results and gate all cells of interest. The average coefficient of variation (CV) was used to evaluate the results, with CV < 5%
considered reliable. Here, CV = D/M × 100%, where D is the standard deviation of the cell distribution and M is the average of the cell distribution. Nuclear DNA content was calculated from
the ratio of the 2C-value peaks of the sample and the standard, assuming a linear relationship. When _O. sativa_ ssp. _japonica_ cv. Nipponbare (~389 Mb)9,12 and _Zea mays_ ssp. _mays_ var. B73 (2,300
Mb)27 were employed as internal standards, the estimated genome size of _O. granulata_ was ~672 Mb and ~707 Mb, respectively, both of which were smaller than the previous estimate (882 Mb)
(Fig. 1). Meanwhile, we generated the 17-mer occurrence distribution of sequencing reads from the short-insert libraries (Fig. 2). From this distribution, we estimated the genome size to be ~792
Mb, and the proportion of repeat sequences and heterozygosity rate of the genome to be approximately 70.7% and 0.76%, respectively, using GCE28.

GENOME ASSEMBLY

We assembled
the _O. granulata_ genome using ALLPATHS-LG29 and SSPACE30. First, the high-quality paired-end Illumina reads from short-insert-size libraries were assembled into contig sequences using
ALLPATHS-LG. This process yielded assembly results with a contig N50 value of 22,359 bp and total length of ~732.33 Mb. Second, all mate-pair reads with large insert sizes (≥2 kb) were
aligned onto the preassembled contigs. According to the order and distance information, the assembled contigs were further elongated and eventually combined into scaffolds using SSPACE. We
closed the gaps that might be repeat sequences masked during the construction of scaffolds using GapCloser31. Briefly, all paired-end sequencing reads were first mapped onto the assembled
scaffolds, and then those read pairs with one read well aligned to the contigs and another located in a gap region were retrieved and locally assembled to close gaps. Consequently, the _O.
granulata_ genome assembly had a total length of ~736.66 Mb, which accounted for ~93% of the genome size estimated by _k_-mer analysis, containing 2,393 scaffolds (N50 = 916.3 kb; N90 =
239.8 kb) and 29,963 contigs (N50 = 43.9 kb; N90 = 12.0 kb). There were 1,146 scaffolds with lengths >100 kb, among which the largest scaffold had a sequence length of 4,040,447 bp (Table
3). Three approaches were used to evaluate the completeness and accuracy of this genome assembly. First, we mapped all high-quality reads (~186.8 million, ~87×) from short-insert-size
libraries back to the assembly using BWA (Burrows-Wheeler Aligner)32, showing good alignments with an average mapping rate of ~99.46%. Second, the completeness of genome assembly and gene
prediction was assessed with BUSCO (Benchmarking Universal Single-Copy Orthologs)33 according to collections from the Embryophyta lineage. Our gene predictions revealed 1,390 (96.53%) of the
1,440 highly conserved core proteins in the Embryophyta lineage. Third, the RNA sequencing reads generated in this study were assembled into a total of 137,380 transcripts using Trinity34,
which had an N50 value of 1,035 bp and a total length of ~88.8 Mb. Then, they were aligned back to the genome assembly using GMAP35. Our results showed that a total of 89,977 transcripts
could be successfully aligned to the genome assembly with a mapping rate of 65.5%. After filtering the low-quality reads using Trimmomatic36, clean paired-end reads of Hi-C data were mapped
to the assembled scaffolds by BWA-MEM32. Finally, 1,265 (723.2 Mb, 98.2% of the assembled length) of 2,393 scaffolds were mapped, grouped and ordered into 12 chromosomes using LACHESIS37
(Table 4).

ANNOTATION OF PROTEIN-CODING GENES

We predicted protein-coding genes of the _O. granulata_ genome using three methods: _ab initio_ gene prediction, homology-based gene
prediction and RNA-Seq-aided gene prediction. Prior to gene prediction, the assembled _O. granulata_ genome was hard and soft masked using RepeatMasker38. We adopted Augustus39,40,41 and
SNAP42 to perform _ab initio_ gene prediction. Models used for each gene predictor were trained from a set of high-quality proteins generated from the RNA-Seq dataset. We used Exonerate43
and GeneWise44,45 to conduct homology-based gene prediction. First, the protein sequences were aligned to the _O. granulata_ genome assembly using Exonerate with the default parameters.
Second, given that GeneWise is a time-consuming program to run, we mapped the protein sequences from _O. sativa_ ssp. _japonica_ cv. Nipponbare (MSU 7.0) to the _O. granulata_ genome using
GenBlastA46 prior to GeneWise prediction. Homologous genomic fragments of the target genes together with their 5-kb upstream and downstream flanking sequences were then extracted using an
in-house Perl script. Finally, GeneWise was used to align them against the corresponding proteins to determine gene structures. To carry out RNA-Seq-aided gene prediction, we first assembled
clean RNA-Seq reads into transcripts using Trinity34, which were then aligned to our genome assembly using PASA47. The output included a set of consistent and non-overlapping sequence
assemblies, which were used to describe the gene structures. We combined all gene structures obtained from the three above-mentioned sets of predictions, including _ab initio_ gene
predictions and protein and transcript alignments, into a weighted consensus gene set using EVidenceModeler (EVM)48. To perform further filtering, the genes with peptide lengths shorter
than 50 amino acids and/or containing internal stop codons were removed. In total, 40,131 protein-coding genes with an average length of 3,152 bp were predicted in the assembled _O. granulata_
genome. To assess the quality of gene prediction, we compared the length distributions of protein-coding genes, coding sequences (CDS), exons and introns with those from four other
species (_Arabidopsis thaliana, Sorghum bicolor, Z. mays_ and _O. sativa_), among which we did not observe any obvious differences in the length distribution of gene features (Fig. 3; Table
5). Then, we surveyed the proportion of our predicted _O. granulata_ gene sets supported by RNA-Seq and homologous proteins. We aligned the assembled transcripts against our gene predictions
using the BLAST program49,50. Only hits with a coverage ≥80% and an identity ≥90% were retained. Our analysis showed that approximately 47.58% (19,094) of the predicted gene models were
supported by RNA-Seq data. Next, we downloaded protein sequences of _O. sativa_ ssp. _japonica_ cv. Nipponbare and aligned them to the predicted gene models using BLAST. We filtered out
hits with an identity <30% or a gene coverage <80%. We found that 23,871 gene models, accounting for approximately 59.48% of the total genes, were supported by evidence of homologous
proteins in rice. Combining genes validated by the two above-described methods, 28,823 genes, representing ~71.82% of the total _O. granulata_ gene set, were supported by RNA-Seq and/or
homologous proteins (Table 6). Gene functions were inferred according to the best match of the alignments to the National Center for Biotechnology Information (NCBI) Non-Redundant (NR) and
Swiss-Prot protein databases using BLASTP49,50 and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database with an E-value threshold of 1E-5. The motifs and domains within gene models
were identified by PFAM databases51. Gene Ontology (GO) IDs for each gene were obtained from Blast2GO52. In total, approximately 85.81% of the predicted protein-coding genes of _O.
granulata_ could be functionally annotated with known genes, conserved domains, and Gene Ontology terms (Table 6).

ANNOTATION OF NON-CODING RNA GENES

Five different types of non-coding RNA
genes, namely, transfer RNA (tRNA) genes, ribosomal RNA (rRNA) genes, small nucleolar RNA (snoRNA) genes, small nuclear RNA (snRNA) genes and microRNA (miRNA) genes, were predicted using _de
novo_ and homology search methods. We used tRNAscan-SE algorithms53 with default parameters to identify the genes associated with tRNA, which is an adaptor molecule composed of RNA used in
biology to bridge the three-letter genetic code in messenger RNA (mRNA) with the twenty-letter code of amino acids in proteins. The rRNA genes (8S, 18S, and 28S), which are the RNA
components of the ribosome and associated with the enzyme representing the site of protein synthesis in all living cells, were predicted using RNAmmer algorithms54 with default parameters.
snoRNAs are a class of small RNA molecules that guide chemical modifications of other RNAs, mainly ribosomal RNAs, transfer RNAs and small nuclear RNAs. The snoRNA genes were annotated using
Snoscan55 with the yeast rRNA methylation sites and yeast rRNA sequences provided by the Snoscan distribution. snRNAs are a class of small RNA molecules found within the nucleus of
eukaryotic cells. They are involved in a variety of important processes, such as RNA splicing (removal of introns from hnRNA), regulation of transcription factors (7SK RNA) or RNA polymerase
II (B2 RNA), and maintenance of telomeres. The snRNA genes were identified by Infernal software against the Rfam database with default parameters56,57. The miRNA genes were annotated in two
steps. First, we downloaded the existing rice miRNA entries from miRBase58. Then, the conserved miRNAs were identified by mapping all miRBase-recorded rice miRNA precursor sequences against
the assembled _O. granulata_ genome using BLASTN with cut-offs at an identity >60% and a query coverage >60%. Second, when a miRNA was mapped to the target _O. granulata_ genome, the
surrounding sequence was checked for hairpin structures. The loci with miRNA precursor secondary structures were annotated as miRNA genes. We annotated a total of 1,003 tRNA genes, 221 rRNA
genes, 295 snoRNA genes, 101 snRNA genes and 257 miRNA genes belonging to 50 miRNA families in the _O. granulata_ genome (Table 7). To investigate miRNA-target genes involved in important
biological pathways, the target genes of miRNAs were predicted using the psRNATarget server with default parameters. Finally, 963 miRNA-target sites were identified. The protein sequences of
these target genes were blasted against _O. sativa_ proteins in the Rice Genome Annotation Project Database59 using the BLASTP program. The results were imported into the agriGO60 server by
comparing them with the whole set of protein-coding genes of _O. sativa_ as a background. KO (KEGG Orthology) annotation of target genes was implemented using the BlastKOALA program61 with
the eukaryote gene database.

ANNOTATION OF REPEAT SEQUENCES

We identified the known TEs within the _O. granulata_ genome using RepeatMasker with the Repbase TE library62,63.
RepeatProteinMask searches were also conducted using the TE protein database as a query library. The annotation of repeat sequences of the _O. granulata_ genome is summarized in Table 8. The
annotation showed that approximately 61.98% (456.6 Mb) of the assembled genome consisted of repeat sequences, and the proportion varied widely among repeat types.
DNA transposons and other repeats contributed only ~9.83% and ~1.1% to the assembled genome, respectively. In contrast, retrotransposons represented about half (~51.05%) of the genome assembly.
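
The percentages quoted above can be cross-checked with simple arithmetic. The sketch below is only an illustration: the variable and function names are hypothetical, the figures are the ones quoted in the text, and it assumes that the three repeat classes are non-overlapping and that the percentages are taken over the 736.66-Mb assembly.

```python
# Figures quoted in the text; all names here are illustrative, not from the authors' pipeline.
ASSEMBLY_MB = 736.66   # total assembly length (Mb)
REPEAT_MB = 456.6      # total annotated repeat length (Mb)
CLASS_PCT = {          # share of the assembly per repeat class (%), assumed non-overlapping
    "retrotransposons": 51.05,
    "DNA transposons": 9.83,
    "other repeats": 1.10,
}


def repeat_fraction_pct(repeat_mb: float, assembly_mb: float) -> float:
    """Share of the assembly annotated as repeats, in percent."""
    return 100.0 * repeat_mb / assembly_mb


def summed_class_pct(class_pct: dict) -> float:
    """Sum of the per-class shares, in percent."""
    return sum(class_pct.values())


if __name__ == "__main__":
    # Both values round to 61.98, matching the total repeat content reported above.
    print(f"{repeat_fraction_pct(REPEAT_MB, ASSEMBLY_MB):.2f}")
    print(f"{summed_class_pct(CLASS_PCT):.2f}")
```

Under these assumptions, the length-based ratio and the sum of the class shares agree at two decimal places, which suggests that the quoted class percentages partition the total repeat content.
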
We constructed a _de novo_ repeat library of the _O. granulata_ genome using RepeatModeler, which can automatically execute two core _de novo_ repeat-finding programs, namely, RECON64 and
RepeatScout65, to comprehensively conduct, refine and classify consensus models of putative interspersed repeats for the _O. granulata_ genome. Furthermore, we performed a _de novo_ search
for long terminal repeat (LTR) retrotransposons against the _O. granulata_ genome sequences using LTR_STRUC66. All intact LTR retrotransposons were classified into Ty1/_copia_, Ty3/_gypsy_
and unclassified groups according to both reverse transcriptase (RT) sequence similarity and the order of ORFs using Pfam51. The RT sequences were retrieved from each retrotransposon element
and further checked by homology searches using ClustalW67 against the published RTs that were downloaded from the _Gypsy_ Database (GyDB)68. LTR retrotransposons (~50.83%) represented most
of the RNA transposons in the _O. granulata_ genome, accounting for approximately 43.41% of the assembly. They belonged to two types of LTR retrotransposon superfamilies: Ty1/_copia_ and
Ty3/_gypsy_ (~5.58% and ~37.83%, respectively) (Table 8). We also identified tandem repeats using the Tandem Repeat Finder (TRF) package69 and the non-interspersed repeat sequences,
including low-complexity repeats, satellites and simple repeats, using RepeatMasker (Table 8). A total of six types of simple sequence repeats (SSRs), from mono- to hexa-nucleotides, were
identified using the MISA (MIcroSAtellite) identification tool70. The minimum number of repeat units was set at twelve for mono-nucleotides, at six for di-nucleotides, at four for tri-nucleotides,
and at three for tetra- to hexa-nucleotides. As a result, a total of 183,339 SSRs were detected in the _O. granulata_ genome. Of these, tri-nucleotide SSRs accounted for the largest
proportion, both in quantity and sequence length, followed by tetra-nucleotides, di-nucleotides and other types (Table 9). These SSRs will provide valuable molecular markers to assist rice
breeding programmes.

DATA RECORDS

All sequencing reads have been deposited into the NCBI Sequence Read Archive (SRA)71 and BIG Genome Sequence Archive72. The assembled genome sequence is
available from the NCBI73,74 and BIG Genome Warehouse75. The protein-coding gene, non-coding gene, and repeat sequence annotation results and functional prediction results are available from
the Figshare database76.

TECHNICAL VALIDATION

RNA INTEGRITY

Before constructing RNA-Seq libraries, the concentration and amount of total RNA were separately evaluated using a NanoDrop 2000
UV-VIS spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA), and the rRNA ratio and RNA integrity were estimated using an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto,
CA, USA). For each tissue, only total RNAs with a total amount ≥15 μg, a concentration ≥400 ng/μl, an RNA integrity number (RIN) ≥7, and an rRNA ratio ≥1.4 were used to construct a cDNA
library according to the manufacturer’s instructions (Illumina, USA).

QUALITY FILTERING OF ILLUMINA SEQUENCING RAW READS

To eliminate adapter contaminants and potential sequencing errors,
using Trimmomatic36, we removed the following five types of reads: (1) reads with ≥10 bp derived from the adapter sequences (allowing ≤10% mismatches); (2) reads with unidentified bases (Ns)
constituting ≥10% of their length; (3) reads with ≥40% low-quality bases (Phred score ≤5); (4) reads caused by PCR duplications (i.e., read 1 and read 2 of two paired-end reads that were
completely identical); and (5) reads with a _k_-mer frequency ≤3 (aiming to minimize the influences of sequencing errors). These five filtering processes resulted in a total of ~108.72 Gb
(~137.27×) of high-quality data, which were retained and used for subsequent analysis (Table 1).

COMPARISONS OF THE GENOME ASSEMBLIES AND ANNOTATION

We produced high-depth sequencing data
for _O. granulata_ using the Illumina and Hi-C sequencing platforms. Then, we assembled _de novo_ a ~736.66-Mb genome of _O. granulata_ comprising 2,393 scaffolds with a scaffold
N50 of ~916.3 kb (Online-only Table 1). The contig N50 was ~43.9 kb, which was higher than that obtained for the genome assemblies of other _Oryza_ species with similar second-generation
sequencing technology17. With Hi-C data, for the first time, we anchored approximately 98.2% of the genome assembly into the twelve pseudo-chromosomes. Because the Illumina platform produces
short reads and the genome contains a large number of repeat sequences, the total lengths of our genome assembly (~736.66 Mb) and its repetitive sequences (~456.57 Mb) are shorter than those of the previous
genome assembly (~776.96 Mb and ~528.04 Mb, respectively)11 (Online-only Table 1). This difference may be attributed to the ~21× PacBio data used for the previous assembly, which overcame the above-mentioned difficulties to some extent
and allowed an additional portion of repetitive sequences to be assembled. However, we obtained fewer scaffolds and a longer scaffold N50 than the previous genome
assembly11. We predicted 40,131 protein-coding genes and observed that the gene and ncRNA annotations were somewhat better than those for the previous genome assembly (Online-only Table 1).
This was also evidenced by the evaluation using BUSCO, showing that 1,390 genes (~96.53%) were completely identified, which is somewhat better than the number for the previous genome
assembly11. Thus, the newly released genome assembly, which has good continuity and integrity, is comparable to other sequenced _Oryza_ genomes. CODE AVAILABILITY The sequence data were
generated using the software provided by the sequencing platform manufacturer and processed with commands provided for the public software cited in the manuscript. No custom computer code
was generated in this work. The following bioinformatic tools and versions were used to generate all results as described in the main text. Default parameters were used if not stated. 1.
CellQuest version 5.1. 2. GCE (Genome Characteristics Estimation) version 1.0.0 was used to estimate genome size, ftp://ftp.genomics.org.cn/pub/gce/. 3. ALLPATHS-LG version 48894 was used
for genome assembly, http://software.broadinstitute.org/allpaths-lg/blog/. 4. SSPACE version 3.0 was used for genome assembly scaffolding,
https://www.baseclear.com/services/bioinformatics/basetools/sspace-standard/. 5. GapCloser version 1.12 was used to fill the gaps between scaffolds, http://soap.genomics.org.cn/about.html.
6. BWA (Burrows-Wheeler Aligner) version 0.7.15 was used for short read mapping, https://github.com/lh3/bwa/. 7. BUSCO (Benchmarking Universal Single-Copy Orthologs) was used to check the
completeness of the genome assembly, with coverage ≥ 90% and identity ≥ 90% parameters, https://gitlab.com/ezlab/busco/. 8. Trinity version v2.0.6 was used to assemble the RNA sequencing
reads, https://github.com/trinityrnaseq/trinityrnaseq. 9. GMAP version 2014-10-2 was used to map the assembled transcripts to the genome sequence with coverage ≥ 90% and identity ≥ 90%
parameters, http://research-pub.gene.com/gmap. 10 LACHESIS was used for ultra-long-range scaffolding with Hi-C data with CLUSTER_N = 12, CLUSTER_MIN_RE_SITES = 300, CLUSTER_MAX_LINK_DENSITY
= 8, ORDER_MIN_N_RES_IN_TRUNK = 100, and ORDER_MIN_N_RES_IN_SHREDS = 10 parameters, http://shendurelab.github.io/LACHESIS/. 11. RepeatMasker version 4.0.3 was used to mask the repeat
sequences in the genome, http://repeatmasker.org/. 12. Augustus version 2.7 was used for _de novo_ gene prediction, http://augustus.gobics.de/. 13. SNAP version 2006-07-28 was used for _de
novo_ gene prediction, https://github.com/KorfLab/SNAP. 14. Exonerate version 2.2.0 was used to align proteins to the genome sequence, https://www.ebi.ac.uk/~guy/exonerate/. 15. GeneWise
version 2-2-0 was used to predict gene structure using similar protein sequences, http://www.ebi.ac.uk/~birney/wise2. 16. GenBlastA version 1.0.1 was used to link the high-scoring pairs
(HSPs), http://genome.sfu.ca/genblast/download.html. 17. PASA (Program to Assemble Spliced Alignments) was used to exploit gene structure using transcripts, http://pasapipeline.github.io/.
18. EVidenceModeler (EVM) version 1.1.1 was used to combine gene predictions generated from different methods into consensus gene structures, http://evidencemodeler.github.io/. 19. BLAST
version 2.2.28 was used to find regions of local similarity between sequences, https://blast.ncbi.nlm.nih.gov/Blast.cgi/. 20. KEGG (Kyoto Encyclopedia of Genes and Genomes),
https://www.kegg.jp/. 21. Pfam database: http://pfam.xfam.org/. 22. Blast2GO: https://www.blast2go.com/. 23. The tRNAscan-SE algorithm (version 1.23) was used for the identification of tRNA
genes, http://lowelab.ucsc.edu/tRNAscan-SE. 24. The RNAmmer algorithm was used for the identification of rRNA genes, http://www.cbs.dtu.dk/services/RNAmmer/. 25. Snoscan version 1.0 was used
for the identification of snoRNA genes, http://lowelab.ucsc.edu/snoscan/. 26. INFERNAL version 1.1.2 was used for the identification of snRNA genes, http://eddylab.org/infernal/. 27. Rfam
database release 9.1, rfam.xfam.org/. 28. miRBase release 21, www.mirbase.org/. 29. psRNATarget server; parameters: maximum expectation = 3.0, length for complementary scoring = 20 bp,
target accessibility – allowed maximum energy to unpair the target site = 25.0, flanking length around the target site for target accessibility analysis: 17 bp upstream and 13 bp downstream,
and range of central mismatch leading to translational inhibition = 9~11 bp, http://plantgrn.noble.org/psRNATarget/. 30 Rice Genome Annotation Project Database,
http://rice.plantbiology.msu.edu/. 31. agriGO server, http://bioinfor.cau.edu.cn/agiGO/. 32. BlastKOALA, http://kegg.jp/blastkoala/. 33. Repbase TE library (version released on January 31,
2014). 34. RepeatProteinMask, http://www.repeatmasker.org/RepeatProteinMask.html. 35. RepeatModeler version 1.0.10 was used for _de novo_ repeat family identification and modelling,
http://www.repeatmasker.org/RepeatModeler/. 36. RECON version 1.08. 37. RepeatScout version 1.0.5. 38. LTR_STRUC was used for the identification of LTR retrotransposons,
http://www.mcdonaldlab.biology.gatech.edu/ltr_struc.htm. 39. ClustalW was used to perform multiple sequence alignment, https://www.genome.jp/tools-bin/clustalw. 40. _Gypsy_ Database (GyDB),
http://gydb.org/. 41. Tandem Repeats Finder (TRF) version 4.07b was used to find the tandem repeats in the genome with the parameters Match = 2, Mismatch = 7, Delta = 7, PM = 80, PI = 10,
Minscore = 50, and MaxPeriod = 12, https://tandem.bu.edu/trf/trf.html. 42. RepeatMasker version 4.0.3 was used to mask the repeat sequences in the genome with the parameter -noint,
http://www.repeatmasker.org. 43. The MISA (MIcroSAtellite) identification tool was used for the identification and localization of microsatellites, http://pgrc.ipk-gatersleben.de/misa/. 44.
Trimmomatic version 0.33 was used for the quality filtering of sequencing reads, http://www.usadellab.org/cms/index.php?page=trimmomatic.
ACKNOWLEDGEMENTS This work was supported by the Yunnan Innovation Team Project
and Natural Science Foundation of Yunnan (to L.-Z.G.) and the Natural Science Foundation of China (31501025 to Y.-L.L. and 31601045 to Q.-J.Z.). AUTHOR INFORMATION Author notes * These
authors contributed equally: Cong Shi, Wei Li, Qun-Jie Zhang. AUTHORS AND AFFILIATIONS * Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species in Southwestern China, Kunming
Institute of Botany, Chinese Academy of Sciences, Kunming, 650204, China Cong Shi, Yun Zhang, Yan Tong, Yun-Long Liu & Li-Zhi Gao * University of Chinese Academy of Sciences, Beijing,
100039, China Cong Shi * Institution of Genomics and Bioinformatics, South China Agricultural University, Guangzhou, 510642, China Wei Li, Qun-Jie Zhang, Kui Li & Li-Zhi Gao
CONTRIBUTIONS L.-Z.G. conceived and designed the study; C.S. contributed to the collection and preparation of the samples; C.S. and Y.T. performed the flow cytometry
experiment; W.L. and K.L. performed the genome assembly; C.S. performed RNA preparation and transcriptome sequencing; W.L. assembled and analysed the RNA-Seq data; C.S. performed the Hi-C
experiment and high-throughput sequencing; W.L. and K.L. analysed the Hi-C data; W.L., Q.-J.Z., Y.Z. and Y.-L.L. performed genome annotation; C.S. and W.L. drafted the manuscript; and
L.-Z.G. wrote and revised the manuscript. CORRESPONDING AUTHOR Correspondence to Li-Zhi Gao. ETHICS DECLARATIONS COMPETING INTERESTS The authors declare no competing interests. ADDITIONAL
INFORMATION PUBLISHER’S NOTE Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. RIGHTS AND PERMISSIONS
OPEN ACCESS This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or
format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or
other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in
the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the
copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver
http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article. ABOUT THIS ARTICLE CITE THIS ARTICLE Shi, C., Li, W.,
Zhang, QJ. _et al._ The draft genome sequence of an upland wild rice species, _Oryza granulata_. _Sci Data_ 7, 131 (2020). https://doi.org/10.1038/s41597-020-0470-2 *
Received: 30 May 2019 * Accepted: 31 March 2020 * Published: 29 April 2020 * DOI: https://doi.org/10.1038/s41597-020-0470-2