Striped unifrac: enabling microbiome analysis at unprecedented scale

Nature

Select a language for the TTS:
UK English Female
UK English Male
US English Female
US English Male
Australian Female
Australian Male
Language selected: (auto detect) - EN

$Striped unifrac: enabling microbiome analysis at unprecedented scale$

Play all audios:

Access through your institution Buy or subscribe TO THE EDITOR — The UniFrac metric is used frequently in microbiome research, but it does not scale to today’s large datasets. We propose a

new algorithm, Striped UniFrac, which produces results identical to those of previous algorithms but requires dramatically less memory and computing power. A BSD-licensed implementation is

available that produces a C shared library linkable by any programming language (Supplementary Software and https://github.com/biocore/unifrac). UniFrac1 is a phylogenetic distance metric

used to compare pairs of microbiome profiles. Microbiome studies now encompass tens of thousands of samples, such as the 27,751-sample Earth Microbiome Project (EMP)2 and the 15,096-sample

American Gut Project3. Existing algorithms for UniFrac computation cannot scale in time or space to these study designs. For example, Fast UniFrac with the EMP was projected to take months.

Striped UniFrac produces results identical to those of other existing algorithms, shows >30-fold improvement in single-threaded performance and near-linear parallel scaling (Supplementary

Fig. 1a,b), and can process the EMP dataset on a laptop in less than 24 hours. It can enable scientists to derive new biological insights, as shown by a meta-analysis3 of the American Gut

Project and EMP. To demonstrate the utility of the algorithm, we computed UniFrac on 113,721 public samples in Qiita4 in less than 48 hours using 256 CPUs (an interactive plot is available

at https://bit.ly/2LHMDFC). This is a preview of subscription content, access via your institution ACCESS OPTIONS Access through your institution Access Nature and 54 other Nature Portfolio

journals Get Nature+, our best-value online-access subscription $32.99 / 30 days cancel any time Learn more Subscribe to this journal Receive 12 print issues and online access $259.00 per

year only $21.58 per issue Learn more Buy this article * Purchase on SpringerLink * Instant access to full article PDF Buy now Prices may be subject to local taxes which are calculated

during checkout ADDITIONAL ACCESS OPTIONS: * Log in * Learn about institutional subscriptions * Read our FAQs * Contact customer support DATA AVAILABILITY The datasets analyzed during the

current study are available in the Qiita repository with the specific study accessions in Supplementary Data 1, and were extracted with Qiita’s redbiom interface. REFERENCES * Lozupone, C.

& Knight, R. _Appl. Environ. Microbiol._ 71, 8228–8235 (2005). Article CAS Google Scholar * Thompson, L. R. et al. _Nature_ 551, 457–463 (2017). Article CAS Google Scholar *

McDonald, D. et al. _mSystems_ 3, e00031-18 (2018). Article Google Scholar * Gonzalez, A. et al. _Nat. Methods_ 15, 796–798 (2018). Article CAS Google Scholar * Caporaso, J. G. et al.

_Nat. Methods_ 7, 335–336 (2010). Article CAS Google Scholar * Chang, Q., Luan, Y. & Sun, F. _BMC Bioinformatics_ 12, 118 (2011). Article Google Scholar * Chen, J. et al.

_Bioinformatics_ 28, 2106–2113 (2012). Article CAS Google Scholar * McMurdie, P. J. & Holmes, S. _PLoS One_ 8, e61217 (2013). Article CAS Google Scholar * Amir, A. et al.

_mSystems_ 2, e00191-16 (2017). Article Google Scholar Download references ACKNOWLEDGEMENTS This work was supported by the NSF (grant DBI-1565100 to D.M., Y.V.-B., Z.X., A.G., and R.K.;

award 1664803 to D.K and J.M.), the Alfred P. Sloan Foundation (G-2017-9838 to D.M., Y.V.-B., A.G., and R.K.; G-2015-13933 to A.G. and R.K.), ONR (grant N00014-15-1-2809 to D.M., A.G., and

R.K.), and NIH–NIDDK (grant P01DK078669 to A.G. and R.K.). This work was partially supported by XSEDE resource grant BIO150043. Additional support was provided by CRISP, one of six centers

in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA. AUTHOR INFORMATION Author notes * Nicolai Reeve Present address: Biota Technology Inc., La Jolla, CA, USA

AUTHORS AND AFFILIATIONS * Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA Daniel McDonald, Yoshiki Vázquez-Baeza, Nicolai Reeve, Zhenjiang Xu, Antonio

Gonzalez & Rob Knight * Mathematics Department, Oregon State University, Corvallis, OR, USA David Koslicki & Jason McClelland * Department of Computer Science and Engineering,

University of California, San Diego, La Jolla, CA, USA Rob Knight * Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA Rob Knight * Department of

Bioengineering, University of California, San Diego, La Jolla, CA, USA Rob Knight Authors * Daniel McDonald View author publications You can also search for this author inPubMed Google

Scholar * Yoshiki Vázquez-Baeza View author publications You can also search for this author inPubMed Google Scholar * David Koslicki View author publications You can also search for this

author inPubMed Google Scholar * Jason McClelland View author publications You can also search for this author inPubMed Google Scholar * Nicolai Reeve View author publications You can also

search for this author inPubMed Google Scholar * Zhenjiang Xu View author publications You can also search for this author inPubMed Google Scholar * Antonio Gonzalez View author publications

You can also search for this author inPubMed Google Scholar * Rob Knight View author publications You can also search for this author inPubMed Google Scholar CONTRIBUTIONS D.M. designed

Striped UniFrac, planned the study, analyzed data, and wrote the manuscript. Y.V.-B. integrated Striped UniFrac with QIIME 2 and contributed to the manuscript. D.K. and J.M. contributed to

the proof. N.R. contributed language interface code. Z.X. contributed to the manuscript. A.G integrated Striped UniFrac with Qiita. R.K. planned the study and wrote the manuscript.

CORRESPONDING AUTHOR Correspondence to Rob Knight. ETHICS DECLARATIONS COMPETING INTERESTS R.K. is a founder and CSO of Biota Technology Inc. D.M. is a consultant with Biota Technology Inc.

INTEGRATED SUPPLEMENTARY INFORMATION SUPPLEMENTARY FIGURE 1 PARALLEL SCALING AND HEURISTIC CORRELATIONS. (A-B) Walltime and memory distributions of independent processes operating on the

full Earth Microbiome Project dataset (_n_ = 26,181) executing on shared compute nodes. An individual partition represents a single independent process, and each process was run with two

threads; 32 partitions indicates 32 processes using two threads each. A higher partition count means each individual process is doing less work. Box plots show the median, whiskers are 1.5

times the proportion of the interquartile range past the 25th and 75th percentiles; the number of data points in each box plot is the number of partitions in the processing run. (C) An

empirical assessment of the number of proportion vectors required to be retained in memory over increasing tree sizes. This assessment was performed by randomly sampling tips from the

Greengenes 99% OTU tree, and counting the maximum number of nodes required to hold proportion vectors resident in memory. Box plots show the median, whiskers are 1.5 times the proportion of

the interquartile range past the 25th and 75th percentiles; each box plot represents 10 independent experiments. (D) Empirical assessment of the runtime of Striped UniFrac for 1,024 samples

over increasing numbers of tips in a phylogeny. (E) Mantel tests (Pearson) between Striped UniFrac in exact mode, which produces identical results to UniFrac, versus fast mode, in which the

UniFrac distances are not computed at the tips of the tree during traversal. Each data point represents _n_ = 10 random subsets (independent experiments) of the Earth Microbiome Project

Deblur 90-nt dataset, with the mean R2 value depicted. Error bars are 95% CI around the mean. The figure data can be found in Supplementary Data 3. SUPPLEMENTARY INFORMATION SUPPLEMENTARY

TEXT AND FIGURES Supplementary Figure 1 and Supplementary Note 1 REPORTING SUMMARY SUPPLEMENTARY DATA 1 table_s1.xlsx, the Qiita study accessions used. SUPPLEMENTARY DATA 2

figure1-data.xlsx, the data necessary to re-create panels C and D in Fig. 1. SUPPLEMENTARY DATA 3 figureS1-data.xlsx, the data necessary to re-create Supplementary Fig. 1. SUPPLEMENTARY

SOFTWARE Supplementary SoftwareUnifrac.tar.gz, the version of UniFrac used in the study. RIGHTS AND PERMISSIONS Reprints and permissions ABOUT THIS ARTICLE CITE THIS ARTICLE McDonald, D.,

Vázquez-Baeza, Y., Koslicki, D. _et al._ Striped UniFrac: enabling microbiome analysis at unprecedented scale. _Nat Methods_ 15, 847–848 (2018). https://doi.org/10.1038/s41592-018-0187-8

Download citation * Published: 30 October 2018 * Issue Date: November 2018 * DOI: https://doi.org/10.1038/s41592-018-0187-8 SHARE THIS ARTICLE Anyone you share the following link with will

be able to read this content: Get shareable link Sorry, a shareable link is not currently available for this article. Copy to clipboard Provided by the Springer Nature SharedIt

content-sharing initiative

Latest News

Striped unifrac: enabling microbiome analysis at unprecedented scale

Access through your institution Buy or subscribe TO THE EDITOR — The UniFrac metric is used frequently in microbiome res...
Hypoxia-induced mir-210 modulates the inflammatory response and fibrosis upon acute ischemia

ABSTRACT Hypoxia-induced miR-210 is a crucial component of the tissue response to ischemia, stimulating angiogenesis and...
Molecular spikes: a gold standard for single-cell rna counting

ABSTRACT Single-cell sequencing methods rely on molecule-counting strategies to account for amplification biases, yet no...
Nelsen: new zealand should join asian confederation

"I think New Zealand should go the whole hog and apply to follow Australia into the Asian confederation," defe...
UR Ananthamurthy : Here lies the man who changed Bangalore’s name

"If I am a writer," said UR Ananthamurthy once in an interview, "it is because of the memories of my youth. It is like a...
An ideally designed deep trust network model for heart disease prediction based on seagull optimization and Ruzzo Tompa algorithm

Download PDF Article Open access Published: 19 February 2025 An ideally designed deep trust network model for heart dise...
Efficient tuberculosis treatment in mice using chemotherapy and immunotherapy with the combined dna vaccine encoding ag85b, mpt-64 and mpt-83

ABSTRACT Although most cases of tuberculosis (TB) can be cured with antibiotics, relapse is common if patients do not co...