A rapid and accurate MinION-based workflow for tracking species biodiversity in the field
Abstract
Genetic markers (DNA barcodes) are often used to support and confirm species identification. Barcode sequences can be generated in the field using portable systems based on the Oxford Nanopore Technologies (ONT) MinION platform. However, current proof-of-principle workflows for on-site barcoding analysis need to be standardized before they can be applied more broadly. Standardization should ensure reliable and robust performance under suboptimal field conditions without increasing costs. Here we demonstrate the implementation of a new on-site workflow for DNA extraction, PCR-based barcoding and the generation of consensus sequences. The portable laboratory features inexpensive instruments that can be carried as hand luggage. It uses standard molecular biology protocols and reagents that tolerate adverse environmental conditions. Barcodes are sequenced using MinION technology and analyzed with ONTrack, an original de novo assembly pipeline that requires as few as 500 reads per sample. ONTrack-derived consensus barcodes have high accuracy, ranging from 99.8% to 100%, despite the presence of homopolymer runs. The ONTrack pipeline has a user-friendly interface and returns consensus sequences in minutes. The ONTrack pipeline offers remarkable accuracy and low computational demand. Combined with inexpensive equipment and simple protocols, this makes the proposed workflow particularly suitable for tracking species under field conditions.
1. Introduction
Recent advances in molecular biology allow the use of genetic markers (DNA barcodes) to support and confirm morphological evidence for species identification. Barcodes also make it possible to quantify interspecific differences in order to compare species in terms of evolutionary distance. Most barcodes are still generated using the Sanger sequencing method, which requires access to a well-equipped molecular biology laboratory.
Second-generation sequencing technologies are also used for barcoding, but they depend on expensive equipment and the reads are often too short to distinguish species reliably. The nanopore-based third-generation sequencer MinION from Oxford Nanopore Technologies (ONT) has proven successful for sequencing under extreme field conditions. These include the tropical rainforests of Tanzania, Ecuador and Brazil [1–3], the hot savannah of West Africa [4], and the ice floes of Antarctica [5]. Bringing the laboratory to the field avoids the transport of samples to sequencing facilities. This greatly reduces the analysis time and the need to export genetic material from collection sites. Although several groups have reported successful on-site barcoding, it remains difficult to perform molecular biology procedures in suboptimal and extreme environments. In our first expeditions, the quality of sequences generated in the field was consistently lower than that achieved in the laboratory. This suggests that reagents and flow cells were affected by unstable shipping and/or environmental conditions [1]. Furthermore, a recent on-site MinION run produced a low output consisting primarily of adapter sequences. This probably reflects the deterioration of the ligation enzyme and flow cells during suboptimal storage [2]. Some groups used lyophilized reagents to cope with adverse environments [1]. However, equipment can also be affected by extreme conditions. On two different expeditions to Borneo, one of the two models of portable PCR machine we brought with us lost its temperature calibration, which caused overheating and consequently the failure of barcode amplification. Robust protocols and equipment that tolerate suboptimal transport and operating conditions, while remaining simple, inexpensive and portable, are therefore highly desirable to exploit the full potential of barcode sequencing in the field.
MinION-based sequencing is advantageous because it is portable, but it has a higher error rate than other methods. Appropriate analysis workflows are therefore needed to generate high-quality barcode sequences [1,6]. High accuracy is particularly important in DNA-based taxonomy, because the threshold for intra- versus interspecific divergence of the COI gene is usually about 2% [7]. In evolutionarily 'young' species it is even lower [8]. We have previously attempted to reduce the high error rate of MinION by using more accurate 2D reads derived from the consensus of the forward and reverse strands. However, 2D sequencing kits are no longer available and have been replaced by 1D² kits, which have yet to be optimized for amplicon sequencing. Even so, new ONT chemistries and software updates have greatly improved the throughput and 1D read accuracy of nanopore sequencing in the last 2 years [8, 9]. Building on this reduced error rate (10–15%, R9.4 chemistry), several groups have developed their own data analysis pipelines for barcoding, but none of these methods has yet achieved the status of 'the gold standard' [1,2,6,9]. Two main strategies are used to generate high-quality barcode sequences: reference-based and de novo pipelines. During the early development of nanopore sequencing, the high error rate in homopolymer runs made reference-based methods the better approach [1,2]. In a typical workflow, sequence reads are mapped to a reference sequence selected according to a priori knowledge, and the consensus sequence is then determined by majority rule. Reference-based pipelines are useful when matching a target sequence to similar existing ones. However, they struggle to reconstruct an accurate barcode if the organism of interest has not been sequenced before. Notably, if the target species carries an insertion relative to the reference species, the additional nucleotides are not included in the final consensus sequence [2].
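The majority-rule step of a reference-based workflow can be sketched as follows. This is a minimal illustration, not the code of any of the cited pipelines. It assumes that the reads have already been mapped and projected onto reference coordinates (for example from a pileup), so every read is a string of the same length. In each string, '-' marks a deletion relative to the reference and '.' marks a position the read does not cover. Insertions relative to the reference cannot be represented in this layout, which is why such bases are lost from the final consensus. The function name and the `min_depth` parameter are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads, min_depth=1):
    """Column-wise majority-rule consensus of reads aligned to a reference.

    aligned_reads: equal-length strings in reference coordinates.
      'A', 'C', 'G', 'T' = base call, '-' = deletion vs. the reference,
      '.' = position not covered by this read (ignored when voting).
    Positions covered by fewer than min_depth reads become 'N'.
    A winning '-' (majority deletion) is dropped from the output.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    consensus = []
    for i in range(length):
        # Collect the symbols of all reads that cover this reference position
        column = [read[i] for read in aligned_reads if read[i] != "."]
        if len(column) < min_depth:
            consensus.append("N")
            continue
        # most_common(1) returns the most frequent symbol; ties go to the
        # symbol encountered first in the column
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != "-":
            consensus.append(symbol)
    return "".join(consensus)
```

For example, `majority_consensus(["ACGTTA", "ACGATA", "AC-TTA", "..GTTA"])` returns `"ACGTTA"`. The substitution in the second read and the deletion in the third are simply outvoted, which is how a majority rule absorbs random sequencing errors.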
Unlike the reference-based approach, de novo assembly pipelines rely only on the newly generated reads. They are therefore more affected by sequencing errors, especially when these are distributed in a nonrandom manner, and ad hoc error correction methods are needed to generate barcodes by de novo assembly [2]. Recently, hybrid methods incorporating aspects of both approaches have been described [1,6]. One example is our ONtoBAR pipeline [1]. It creates a draft consensus sequence by assembling MinION reads de novo, then uses the draft to retrieve the most similar sequence from the NCBI nt database, from which the final consensus is generated. The pipeline assumes that closely related species differ mainly through the accumulation of single-nucleotide polymorphisms (SNPs) rather than insertion/deletion polymorphisms (INDELs), which can generate frameshifts. On this basis, it uses the reference sequence as a scaffold, allowing the correction of mismatches derived from MinION errors. Another hybrid method, the aacorrection pipeline [6], is based on similar principles: a draft consensus sequence is used to retrieve matching sequences from the NCBI nt database. These sequences are used to determine the correct reading frame. Ambiguous bases (N) are then introduced into the MinION-derived consensus to preserve amino acid assignments. A recent study compared reference-based and de novo approaches and found that the de novo approach was more accurate, because the reference-based approach can introduce bias by missing INDELs [2]. However, the filtering step in that pipeline relied on quality scores (Q-scores), which are often recalibrated after basecaller updates. As a result, its output depends strongly on the sequencing chemistry and the basecaller version. To fully exploit the potential of barcoding in the field, the proof-of-principle workflows reported thus far must be translated into standardized systems allowing on-site sequencing by professional users.
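A simple reading-frame check shows why an uncorrected INDEL matters for a protein-coding barcode such as COI. The sketch below is not the aacorrection algorithm of [6]. It assumes that the barcode is translated with the invertebrate mitochondrial genetic code (NCBI translation table 5), in which TAA and TAG are stop codons and TGA encodes tryptophan. For each of the three forward frames, it reports the positions of stop codons. A single-base insertion or deletion shifts the frame downstream of the error and often exposes premature stop codons. The function names are hypothetical.

```python
# Stop codons of the invertebrate mitochondrial code (NCBI translation table 5).
# Assumption: the barcode comes from an invertebrate mitochondrial gene such as COI.
STOP_CODONS_TABLE5 = {"TAA", "TAG"}


def internal_stops(seq, frame):
    """Return 0-based offsets of stop codons when reading seq in frame 0, 1 or 2."""
    seq = seq.upper()
    # Step through complete codons only; a trailing partial codon is ignored
    return [i for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS_TABLE5]


def open_frames(seq):
    """Return the forward frames (0, 1, 2) that contain no stop codon."""
    return [frame for frame in range(3) if not internal_stops(seq, frame)]
```

With the toy sequence "ATGCTAGCA", frame 0 is open. Deleting the C at offset 3 gives "ATGTAGCA", whose frame 0 now runs into TAG at offset 3. A consensus that still contains a MinION deletion would therefore no longer translate into a plausible COI protein in its expected frame.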
Our involvement in conservation projects has motivated us to keep improving the analytical precision of the pipeline, so that biodiversity can be tracked more accurately at the species level. It has also motivated us to identify simple, rapid and inexpensive protocols. Here we present the results achieved using an updated barcoding workflow that features improvements both to the molecular biology field laboratory components and to the subsequent data analysis.
2. Materials and Methods
2.1 Portable genomics laboratory
The portable genomics laboratory included the following equipment: three micropipettes (P1000, P200 and P20, Eppendorf), a mini-microcentrifuge (Labnet Prism Mini Centrifuge, Labnet), a thermal cycler (MiniOne PCR System, MiniOne), an electrophoresis system (MiniOne Electrophoresis System, MiniOne), a fluorometer (Qubit 2.0, Thermo Fisher Scientific), the nanopore sequencer (MinION, ONT) and an ASUS laptop (i7 processor, 16 GB RAM, 500 GB SSD) (Figure 1). The equipment was wrapped in bubble wrap and transported in a single Peli case (55 × 45 × 20 cm) (Figure 1). The case was checked in as standard hold baggage on domestic and international flights, except for the laptop, which was carried in the cabin. Standard molecular biology reagents were selected and used as described below. Reagents requiring storage at 4 °C or −20 °C were transported in a foam box containing ice packs, and MinION flow cells were stored in a thermal bag in the same box. PCR primers were transported lyophilized, then resuspended in 10 mM Tris-HCl (pH 8.0) supplemented with 1 mM EDTA and kept at room temperature.
3. Results
3.1 COI barcode sequencing
To perform barcode sequencing in the field, the portable genomics laboratory we previously described [1] was further optimized. It now includes equipment and reagents that are more stable and perform better in tropical environments (up to 35 °C and 90% humidity) after transport on standard domestic and international flights. The laboratory currently comprises seven portable devices that fit into one standard luggage item measuring 55 × 45 × 20 cm (Figure 1). We collected two snails and five insects during a workshop held by Taxon Expeditions (https://taxonexpeditions.com/) at the Ulu Temburong National Park (Borneo, Brunei) in October 2018. We then dissected the tissue and extracted DNA. PCR products obtained by amplifying ∼710 bp of the COI gene were sequenced in the field using the MinION device with R9.4 sequencing chemistry. The MinION flow cell showed 995 active pores during the pre-run quality control (compared with 1005 on delivery by the manufacturer) and produced 600,000 reads in 3.5 h. Raw fast5 reads were basecalled, demultiplexed and trimmed offline, resulting in 9,000–77,000 reads per sample (Table 1). After we returned to Europe, the same genomic fragments were amplified from the same DNA extracts and sequenced using the Sanger method to evaluate the accuracy of the MinION-based barcoding pipeline.
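Comparing a MinION consensus with its Sanger counterpart comes down to computing identity over a pairwise alignment. The sketch below assumes that the two sequences have already been aligned, for example with a global aligner, and are given as gapped strings of equal length. It counts identical columns over all informative columns, so both mismatches and gaps count as errors; a base missed in a homopolymer run is one example. This is one common definition of identity and not necessarily the exact metric behind the accuracy values reported in this study. The function name is hypothetical.

```python
def alignment_identity(consensus_aln, sanger_aln):
    """Percent identity between two gapped, equal-length aligned sequences.

    Columns where both sequences carry a gap are ignored; every other column
    counts, so mismatches and indels both lower the identity.
    """
    if len(consensus_aln) != len(sanger_aln):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(consensus_aln.upper(), sanger_aln.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment contains no informative columns")
    # A column is a match only if both sequences carry the same base
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)
```

For instance, a consensus that lost one T from a TTTT homopolymer (`"ACG-TTTAGC"` aligned against the Sanger `"ACGTTTTAGC"`) scores 90.0% over this 10-column toy alignment. Over a ∼710-column COI alignment, the same single error would cost only about 0.14 percentage points.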
4. Discussion
We have described the implementation of a new workflow for barcoding in the field, from DNA extraction to the generation of consensus sequences. The selected protocols allowed the extraction of DNA from tiny snail-tissue biopsies. They also worked on whole beetles after cutting the abdomen to release soft tissues, which preserves the integrity of the specimens for detailed morphological evaluation. PCR products were successfully obtained even though our equipment was transported in a standard Peli case. Molecular biology reagents were stored in local fridges and freezers that were powered for only 10 h per day. The MinION flow cells were not adversely affected by the transportation and storage conditions: they retained most of their active pores and produced 600,000 reads in 3.5 h. These results indicate that the field laboratory workflow was robust. It allowed us to barcode organisms at the collection site even under adverse environmental conditions (in this case a rainforest characterized by high temperatures and humidity). On the software side, the new bioinformatics pipeline allowed us to analyze MinION reads using open-source and custom-developed scripts that run locally on a Linux virtual machine. Sequencing and data analysis could therefore be combined on a standard Windows laptop with a user-friendly interface. Most importantly, the improvements address some of the weaknesses of earlier pipelines, such as their dependence on sequence databases and on Q-score calibration. The ONTrack pipeline works with as few as ∼500 reads per sample and achieves high accuracy on MinION sequencing data obtained from COI barcode amplicons. Moreover, starting from processed MinION reads, ONTrack returns consensus sequences in a few minutes, making it particularly suitable for work in the field. The residual error rate in our consensus sequences never exceeded ∼0.2%.
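To put the ∼0.2% worst-case residual error in perspective against the ∼2% COI divergence threshold [7], here is a back-of-the-envelope calculation over the ∼710 bp amplicon. It assumes, for simplicity, that both errors and true interspecific differences are counted over the whole amplicon:

```latex
\begin{aligned}
\text{residual consensus errors} &\le 0.002 \times 710 \approx 1.4 \ \text{bases},\\
\text{typical interspecific differences} &\gtrsim 0.02 \times 710 \approx 14 \ \text{bases}.
\end{aligned}
```

Even in the worst case, the consensus error is therefore roughly one tenth of the distance that typically separates two species. The margin is smaller for evolutionarily young species, which may diverge by less than 2% [8].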
The proposed workflow can therefore be considered a powerful tool for species identification, given that most species pairs show sequence divergence exceeding 2% [7]. Further improvements may come from the software and chemistry enhancements regularly released by ONT. A new flip-flop basecalling algorithm (https://github.com/nanoporetech/flappie) was recently implemented in the Guppy production basecaller. It should further reduce the error rate, albeit at the expense of basecalling time. A new sequencing chemistry (R10) is due to be released soon. It should increase accuracy, especially in homopolymer runs, bringing on-site sequencing ever closer to the quality of Sanger analysis. Sequencing and basecalling currently remain the most time-consuming steps in the pipeline, but both the hardware and the software provided by ONT are likely to become much faster in the near future. Indeed, ONT recently released MinIT, a rapid analysis and device-control accessory that connects to the MinION sequencer and performs GPU-accelerated, real-time basecalling. Moreover, the Medaka tool (https://github.com/nanoporetech/medaka) is expected to create polished consensus sequences faster than Nanopolish, because it starts from basecalled data rather than raw signals. Finally, new MinION flow cells (Flongle) were recently made available. They are suitable for experiments that do not require massive throughput, which substantially reduces sequencing costs for small datasets. The ONTrack pipeline provides high-quality results with as few as ∼500 reads per sample (0.35 Mbp). Multiple samples could therefore be multiplexed in a single run and still fit within Flongle specifications (1 Gbp), further reducing the cost. Considering a multiplex of 12 samples in a Flongle run, currently the maximum supported by standard ONT kits, we estimated a cost of about 30 € per sample to generate a barcode sequence with the workflow described herein.
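The Flongle argument is simple arithmetic, sketched below with the figures given in the text: ∼500 reads per sample, a nominal 1 Gbp per Flongle run and 12 multiplexed samples. The ∼700-base read length is an assumption derived from 0.35 Mbp / 500 reads, and the function name is hypothetical. The sketch ignores reads lost to demultiplexing and trimming, as well as uneven barcode representation, all of which in practice call for a generous safety margin. In our MinION run, for example, reads per sample ranged from 9,000 to 77,000.

```python
def flongle_budget(samples=12, reads_per_sample=500, read_length_bp=700,
                   run_capacity_bp=1_000_000_000):
    """Back-of-the-envelope check that a multiplexed barcode run fits a Flongle.

    Returns (bases needed per sample, bases needed for the whole run,
    fraction of the nominal run capacity used). Defaults mirror the text;
    read_length_bp is an assumption derived from 0.35 Mbp / 500 reads.
    """
    per_sample_bp = reads_per_sample * read_length_bp
    total_bp = samples * per_sample_bp
    return per_sample_bp, total_bp, total_bp / run_capacity_bp


per_sample, total, fraction = flongle_budget()
print(f"{per_sample / 1e6:.2f} Mbp per sample, {total / 1e6:.1f} Mbp in total, "
      f"{fraction:.2%} of a 1 Gbp Flongle run")
# prints: 0.35 Mbp per sample, 4.2 Mbp in total, 0.42% of a 1 Gbp Flongle run
```

Under these assumptions, the 12-sample minimum uses well under 1% of the nominal Flongle output. That is why the limiting factor is the number of barcodes supported by the kit rather than the flow-cell throughput.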
This is not far from the cost of standard Sanger sequencing (∼15 € per sample when sequencing both strands, excluding the extra shipment costs). Remarkably, the entire portable genomics laboratory described in this article can be acquired with a modest budget of 6000 €, compared with ∼80,000 € for a Sanger sequencer (ABI capillary). The latter instrument requires dedicated, expert personnel, whereas the MinION sequencer is very simple and requires no special training. An additional significant advantage is that, unlike other sequencing technologies, the real-time MinION device does not require the number of sequenced reads to be set before the experiment begins. The sequencing run can therefore be stopped as soon as the necessary number of reads has been generated, achieving further cost and time savings.
Author Contributions
Conceptualization, M.D. and M.S.; methodology, S.M., E.C., M.M., M.R. and M.D.; software, S.M.; validation, S.M. and E.C.; formal analysis, S.M. and E.C.; investigation, E.C., M.R., M.P., H.F., M.A., I.N., L.M., M.S., F.S., J.G.; writing–original draft preparation, S.M., M.R. and M.D.; writing–review and editing, M.R. and M.D.; visualization, S.M.; supervision, M.R. and M.D.; project administration, M.R. and M.D.; funding acquisition, M.D., M.S. and I.N.
Funding
This research received no external funding.
Conflicts of Interest
The authors declare no conflict of interest.
Acknowledgments
We gratefully acknowledge the Ulu Temburong National Park (Brunei, Borneo) for permission to conduct research in the field. Biological materials were exported under permit BioRIC/HoB/TAD/51 from the Ministry of Primary Resources and Tourism, Brunei Darussalam. We thank Davide Canevazzi for support with the bioinformatic analysis.
References
1. Menegon M; Cantaloni C; Rodriguez-Prieto A; Centomo C; Abdelfattah A; Rossato M; Bernardi M; Xumerle L; Loader S; Delledonne M. On site DNA barcoding by nanopore sequencing. PLoS ONE 2017, 12.
2. Pomerantz A; Peñafiel N; Arteaga A; Bustamante L; Pichardo F; Coloma LA; Barrio-Amorós CL; Salazar-Valenzuela D; Prost S. Real-time DNA barcoding in a rainforest using nanopore sequencing: opportunities for rapid biodiversity assessments and local capacity building. Gigascience 2018, 7.
3. Faria NR; Quick J; Claro IM; Thézé J; de Jesus JG; Giovanetti M; Kraemer MUG; Hill SC; Black A; da Costa AC, et al. Establishment and cryptic transmission of Zika virus in Brazil and the Americas. Nature 2017, 546, 406–410.
4. Quick J; Loman N; Duraffour S; Simpson JT; Severi E; Cowley L; Bore JA; Koundouno R; Dudas G; Mikhail A, et al. Real-time, portable genome sequencing for Ebola surveillance. Nature 2016, 530, 228–232.
5. Edwards A; Debbonaire AR; Nicholls SM; Rassner SME; Sattler B; Cook JM; Davy T; Soares AR; Mur LAJ; Hodson AJ. In-field metagenome and 16S rRNA gene amplicon nanopore sequencing robustly characterize glacier microbiota. bioRxiv 2019, doi: https://doi.org/10.1101/073965.
6. Srivathsan A; Baloğlu B; Wang W; Tan WX; Bertrand D; Ng AHQ; Boey EJH; Koh JJY; Nagarajan N; Meier R. A MinION™-based pipeline for fast and cost-effective DNA barcoding. Mol Ecol Resour 2018.
7. Hebert PDN; Ratnasingham S; deWaard JR. Barcoding animal life: cytochrome c oxidase subunit 1 divergences among closely related species. Proc Biol Sci 2003, 270, S96–S99.
8. Freitag H; Kodaka J. A taxonomic review of the genus Ancyronyx Erichson, 1847 from Sulawesi (Insecta: Coleoptera: Elmidae). Journal of Natural History 2017, 51, 561–606.
9. Krehenwinkel H; Pomerantz A; Henderson JB; Kennedy SR; Lim JY; Swamy V; Shoobridge JD; Patel NH; Gillespie RG; Prost S. Nanopore sequencing of long ribosomal DNA amplicons enables portable and simple biodiversity assessments with high phylogenetic resolution across broad taxonomic scale. Gigascience 2019, doi: 10.1093/gigascience/giz006.
10. Freitag H. Adaptation of an Emergence Trap for Use in Tropical Streams. International Review of Hydrobiology 2004, 89, 363–374.
11. Folmer O; Black M; Hoeh W; Lutz R; Vrijenhoek R. DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Mol Mar Biol Biotechnol 1994, 3, 294–299.
12. Hebert PDN; Penton EH; Burns J; Janzen DH; Hallwachs W. Ten species in one: DNA barcoding reveals cryptic species in the neotropical skipper butterfly, Astraptes fulgerator. Proc Natl Acad Sci USA 2004, 101, 14812–14817.
13. Rognes T; Flouri T; Nichols B; Quince C; Mahé F. VSEARCH: a versatile open source tool for metagenomics. PeerJ 2016, 4.
14. Katoh K; Misawa K; Kuma K; Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 2002, 30, 3059–3066.
15. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 2018, 191.
16. Li H; Handsaker B; Wysoker A; Fennell T; Ruan J; Homer N; Marth G; Abecasis G; Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics 2009, 25, 2078–2079.
17. Altschul SF; Gish W; Miller W; Myers EW; Lipman DJ. Basic local alignment search tool. J Mol Biol 1990, 215, 403–410.
18. Kang AR; Kim MJ; Park IA; Kim KY; Kim I. Extent and divergence of heteroplasmy of the DNA barcoding region in Anapodisma miramae (Orthoptera: Acrididae). Mitochondrial DNA A DNA Mapp Seq Anal 2016, 27, 3405–3414.
19. Meza-Lázaro RN; Poteaux C; Bayona-Vásquez NJ; Branstetter MG; Zaldívar-Riverón A. Extensive mitochondrial heteroplasmy in the neotropical ants of the Ectatomma ruidum complex (Formicidae: Ectatomminae). Mitochondrial DNA A DNA Mapp Seq Anal 2018, 29, 1203–1214.