Sequencing Environmental Samples

The development and continued improvement of high-throughput sequencing have revolutionised the study of microbial communities in environmental samples. Sequencing-based methods enable the direct study of microbial diversity, functional potential and activity in complex microbiomes such as soil, water, plant rhizospheres and host-associated communities, whereas cultivation captures only a minor portion of environmental microbes. Among these techniques, amplicon sequencing, shotgun metagenomic sequencing and metatranscriptomics are three conceptually different methods that vary in their targets, their resolution and the biological questions to which they can be applied.

Amplicon sequencing has historically been the most commonly used technique for profiling environmental microbiomes. It is based on PCR amplification of highly conserved phylogenetic marker genes (the 16S rRNA gene of bacteria and archaea, the ITS region of fungi or the 18S rRNA gene of eukaryotes). By sequencing the amplified fragments, scientists can infer the taxonomic makeup of a microbial community. Amplicon sequencing owes its popularity to its low cost, scalability and methodological maturity. Thousands of samples can be multiplexed in a single run using platforms like Illumina MiSeq, which makes the approach especially appealing for large ecological or longitudinal studies. In health research, 16S rRNA gene sequencing has formed the foundation of human microbiome research, making it possible to compare gut, oral or skin microbial communities across populations and disease conditions. In agriculture, it has been widely applied to compare soil or rhizosphere microbiomes across crops, fertilisation regimes or environmental stresses.
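
The idea of multiplexing can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each read begins with a short inline barcode identifying its sample and that barcodes are matched exactly; in practice, demultiplexing is usually handled by the sequencing platform's own software, often from separate index reads and with some tolerance for mismatches. The barcodes, sample names and reads below are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len):
    """Group reads by sample using an exact match on the leading barcode.

    Assumption for this sketch: the barcode is the first `barcode_len`
    bases of each read. Reads with an unknown barcode go to "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "soil_A", "TGCA": "soil_B"}
reads = [
    "ACGTGGATTAGATACCC",
    "TGCAGGATTAGATACCC",
    "ACGTCCTACGGGAGGCA",
    "NNNNGGATTAGATACCC",
]
assigned = demultiplex(reads, barcodes, barcode_len=4)
for sample, inserts in assigned.items():
    print(sample, len(inserts))
```

Each sample's reads can then be processed separately, which is what allows many samples to share a single sequencing run.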

Despite its usefulness, amplicon sequencing offers a limited perspective on microbial communities. Because it targets only a short region of a single gene, its taxonomic resolution is frequently restricted to the genus level, and very closely related species cannot be distinguished. The dependence on PCR adds another layer of bias, because primer mismatches, differences in amplification efficiency and chimera formation can distort the apparent community structure. In addition, amplicon sequencing does not measure functional genes directly. Any conclusion about metabolic capacity has to be inferred from taxonomic identity, an approach that can be misleading in environments where horizontal gene transfer or strain-level variation is prevalent. Such shortcomings have led more researchers to adopt shotgun metagenomic sequencing, especially where functional understanding is needed.

Shotgun metagenomics does not amplify specific regions; rather, it sequences all the DNA extracted from an environmental sample. This untargeted method captures genetic material from bacteria, archaea, viruses and microbial eukaryotes simultaneously. With shotgun metagenomics, it is possible not only to profile communities taxonomically at high resolution but also to identify functional genes directly, including genes involved in metabolism, nutrient cycling, virulence and antimicrobial resistance. Comparative studies have repeatedly shown that shotgun sequencing identifies much higher taxonomic richness than amplicon-based methods, frequently including rare or unexpected taxa that primer-based methods miss. In clinical microbiology, shotgun metagenomics has become an effective tool for monitoring pathogens and their resistance profiles and for studying the resistome, whereas in agriculture and environmental science it has been applied to characterise the functional potential of soils, the microbial ecology of wastewater and the biogeochemical processes of marine ecosystems.

The strengths of shotgun metagenomics come at the expense of added complexity. Because all DNA is sequenced at random, host or plant DNA can dominate libraries prepared from host-associated samples, requiring either depletion techniques in the laboratory or extensive computational filtering. The technique also needs greater sequencing depth and more sophisticated bioinformatics procedures than amplicon sequencing, especially where assembly and genome binning are conducted. Nevertheless, by making it possible to reconstruct metagenome-assembled genomes and to attribute functions to particular taxa, shotgun sequencing has become one of the most important tools in present-day microbial ecology.
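
The effect of host DNA on sequencing depth can be made concrete with simple arithmetic. The sketch below assumes that reads are drawn in proportion to the DNA in the library, so that a library with a given host fraction yields that same fraction of host reads. The function names and the example numbers are hypothetical and are not taken from any particular study.

```python
def microbial_reads(total_reads, host_fraction):
    """Expected number of non-host reads in a library.

    Assumes reads are sampled in proportion to DNA content.
    """
    if not 0 <= host_fraction < 1:
        raise ValueError("host_fraction must be in [0, 1)")
    return round(total_reads * (1 - host_fraction))


def total_reads_needed(target_microbial_reads, host_fraction):
    """Total reads to sequence in order to expect the target microbial depth."""
    if not 0 <= host_fraction < 1:
        raise ValueError("host_fraction must be in [0, 1)")
    return round(target_microbial_reads / (1 - host_fraction))


# Hypothetical example: a plant-associated sample with 90% host DNA.
print(microbial_reads(50_000_000, 0.9))
print(total_reads_needed(10_000_000, 0.9))
```

Under these assumptions, a library that is 90% host DNA needs roughly ten times as many reads as a host-free library to reach the same microbial depth, which is why host depletion before sequencing can be worthwhile.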

Both amplicon and shotgun sequencing describe the genetic capability of a sample, but they cannot distinguish between active and inactive organisms. This gap is filled by metatranscriptomics, which sequences RNA rather than DNA. By targeting messenger RNA, metatranscriptomics captures the genes that are being transcribed at the time of sampling, providing a picture of microbial activity and metabolic state. This method is especially useful for studying how microbial communities respond to environmental manipulations such as nutrient addition, stress or host interactions. Metatranscriptomic surveys of the rhizosphere in agricultural systems have found that soil amendments can change the expression of microbial genes, increase beneficial plant-associated activities and reduce pathogenicity. In health research, metatranscriptomics has been used to link microbial activity to inflammatory responses and disease pathogenesis.

Nevertheless, metatranscriptomics is not a simple technique. RNA is an inherently fragile molecule that degrades easily, so samples have to be preserved immediately and handled with great care. Ribosomal RNA makes up the majority of total RNA, so efficient rRNA depletion is important for enriching informative mRNA. Unlike in eukaryotic transcriptomics, poly(A) selection is not dependable in microbial communities, because bacterial mRNAs generally lack the stable poly(A) tails on which it relies. These issues, together with the greater sequencing depth required and the increased complexity of data analysis, have restricted the use of metatranscriptomics compared with DNA-based methods. However, when functional activity rather than mere presence is the object of research, metatranscriptomics offers an invaluable perspective.

In all three methodologies, the choice of sequencing platform and analytical pipeline influences the data obtained. Illumina short-read sequencing is the prevailing technology because of its accuracy and scale, but long-read technologies such as PacBio and Oxford Nanopore are increasingly being used to obtain full-length marker genes or to improve metagenomic assemblies. Bioinformatic analysis usually starts with quality control and contaminant removal, followed by taxonomic classification and then functional annotation or transcript quantification, depending on the technique. The completeness of reference databases is a further constraint, especially for poorly studied environments, where a fraction of the sequences will remain unclassified.
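
As a rough illustration of the quality-control step that opens most pipelines, the following Python sketch filters reads by their mean base quality. It assumes quality strings in the common Phred+33 encoding, in which each character's ASCII code minus 33 gives the quality score. The threshold of 20, the function names and the example records are hypothetical choices for illustration; established tools apply more refined criteria such as sliding-window trimming and adapter removal.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a read, assuming Phred+33 encoded qualities."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def quality_filter(records, min_mean_quality=20):
    """Keep (name, sequence, quality) records whose mean quality passes.

    Records with an empty quality string are dropped.
    """
    return [
        (name, seq, qual)
        for name, seq, qual in records
        if qual and mean_phred(qual) >= min_mean_quality
    ]


# Hypothetical reads: 'I' encodes Phred 40 and '#' encodes Phred 2.
records = [
    ("read1", "ACGTACGT", "IIIIIIII"),  # mean quality 40
    ("read2", "ACGTACGT", "########"),  # mean quality 2
    ("read3", "ACGTACGT", "IIII####"),  # mean quality 21
]
kept = quality_filter(records)
print([name for name, _, _ in kept])
```

Reads that pass this step would then move on to contaminant removal and classification, so the threshold chosen here affects every downstream result.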

In practice, these sequencing strategies should be regarded not as competitors but as complementary tools. Amplicon sequencing is an economical entry point for large-scale community surveys, shotgun metagenomics provides a comprehensive overview of taxonomic diversity and functional potential, and metatranscriptomics shows how much of that potential is realised in the environment. Together they form a methodological continuum that lets researchers move from asking who is there, to what they could do, and finally to what they are actually doing in complex environmental systems.