Evolution of DNA Sequencing Technologies

The modern story of DNA sequencing began in 1977, when Frederick Sanger introduced the dideoxy chain termination method, one of the first practical approaches to reading the genetic code. At its core, Sanger sequencing relied on an elegant biochemical principle. Specially modified nucleotides known as dideoxynucleotides lack the 3′ hydroxyl group that DNA polymerase needs to attach the next nucleotide, so once one is incorporated, chain elongation stops. By running four parallel reactions, each containing a small amount of one of the four dideoxynucleotides, researchers generated collections of fragments ending at every position occupied by that base. Separating these fragments by size using electrophoresis allowed the sequence to be read one base at a time, from the shortest fragment to the longest. In the same year, Sanger’s group published the complete sequence of the roughly 5,400-nucleotide genome of bacteriophage φX174, determined largely with their earlier plus-and-minus method. It was the first DNA genome to be decoded in full, demonstrating that entire genomes could be read.
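The reading logic can be captured in a toy model. The sketch below is a hypothetical illustration, not a laboratory protocol: it works directly on the newly synthesised strand, ignores primer length, template orientation, and noise, and the function names `terminated_fragments` and `read_gel` are invented for this example.

```python
# Toy model of dideoxy chain termination (hypothetical illustration only).

def terminated_fragments(synthesised: str) -> dict[str, list[int]]:
    """For each ddNTP reaction ("lane"), list the lengths of terminated fragments.

    In the reaction containing ddX, synthesis can stop wherever base X is
    incorporated, so each such position yields a fragment of that length.
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for position, base in enumerate(synthesised, start=1):
        lanes[base].append(position)  # raises KeyError for non-ACGT input
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Recover the sequence by reading bands from shortest to longest.

    Electrophoresis orders fragments by size; noting which lane each band
    belongs to gives the sequence one base at a time.
    """
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = terminated_fragments("GATTACA")
print(lanes)            # {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
print(read_gel(lanes))  # GATTACA
```

Each base in the read corresponds to exactly one band, which is why the method reads sequence "one base at a time" and why read length was bounded by how finely electrophoresis could separate fragments differing by a single nucleotide.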

Over the following decades, Sanger sequencing was refined through fluorescently labelled terminators and automation: Applied Biosystems commercialised the first automated fluorescent sequencers in the late 1980s, and capillary electrophoresis instruments followed in the 1990s. These improvements greatly increased throughput and reproducibility, enabling large-scale projects such as the Human Genome Project, which was completed in 2003. Despite its extraordinary accuracy, approaching 99.99% per base, Sanger sequencing remained slow, labour-intensive, and prohibitively expensive for large-scale genomics, with a single human genome costing millions of dollars. These limitations created the demand for a radical technological shift.

That shift arrived in 2005, when 454 Life Sciences (acquired by Roche in 2007) launched its pyrosequencing platform, marking the birth of next-generation sequencing. This technology abandoned electrophoresis entirely and adopted massively parallel sequencing by synthesis. DNA fragments were clonally amplified on microscopic beads, and the four nucleotides were supplied one type at a time; whenever a nucleotide was incorporated, the released pyrophosphate triggered an enzymatic reaction that emitted light, which was recorded by a camera. Although individual reads were relatively short, typically between 100 and 200 base pairs, hundreds of thousands of fragments could be sequenced simultaneously. Compared with Sanger sequencing, both cost and time dropped by orders of magnitude, transforming sequencing from a multi-year endeavour into a process that could be completed in days. This acceleration enabled early microbial genomics and cancer exome studies. However, pyrosequencing struggled with homopolymer regions: because the light signal scales with the number of identical bases incorporated in a single step, long runs such as AAAAAA were difficult to count precisely. Together with a relatively high per-base cost, this limited its long-term adoption.
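The homopolymer problem can be illustrated with an idealised model. This minimal sketch assumes a noise-free signal and a cyclic flow order of TACG (both assumptions for illustration); `flowgram` and `call_bases` are hypothetical names. In each flow one nucleotide type is supplied, and the signal equals the number of bases incorporated in that flow.

```python
FLOW_ORDER = "TACG"  # assumed cyclic order in which nucleotides are supplied


def flowgram(sequence: str, flow_order: str = FLOW_ORDER) -> list[int]:
    """Idealised pyrosequencing signal: one value per nucleotide flow.

    The polymerase extends through every consecutive matching base in a
    flow, so the noise-free signal equals the homopolymer run length.
    """
    if set(sequence) - set(flow_order):
        raise ValueError("sequence contains bases not in the flow order")
    signals: list[int] = []
    i = flow = 0
    while i < len(sequence):
        nucleotide = flow_order[flow % len(flow_order)]
        run = 0
        while i < len(sequence) and sequence[i] == nucleotide:
            run += 1
            i += 1
        signals.append(run)
        flow += 1
    return signals


def call_bases(signals: list[float], flow_order: str = FLOW_ORDER) -> str:
    """Decode a flowgram by rounding each signal to the nearest run length."""
    return "".join(flow_order[k % len(flow_order)] * round(s) for k, s in enumerate(signals))


print(flowgram("TTGAAAAAC"))  # [2, 0, 0, 1, 0, 5, 1]
noisy = [2.1, 0.1, 0.0, 0.9, 0.1, 5.6, 1.0]  # made-up signal with an inflated A run
print(call_bases(noisy))      # TTGAAAAAAC
```

A small excess in the signal for the five-base A run rounds up to six, inserting an extra base. The longer the run, the smaller the relative difference between n and n + 1 bases, which is why homopolymer errors dominate this kind of chemistry.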

Building on the momentum of next-generation sequencing, Solexa commercialised its sequencing-by-synthesis platform, the Genome Analyzer, in 2006, shortly before the company was acquired by Illumina in early 2007. Illumina’s approach combined massive parallelisation with exceptional accuracy by sequencing millions of DNA clusters immobilised on a glass flow cell. In each sequencing cycle, one of four fluorescently labelled reversible terminator nucleotides was incorporated and detected by optical imaging, after which the fluorescent label and terminating group were cleaved to allow the next cycle. Early Illumina systems produced short reads of around 30 to 75 base pairs, but rapid technological advances soon extended read lengths to 100 to 150 base pairs and beyond. Crucially, Illumina sequencing achieved very low error rates, dominated by substitutions rather than insertions or deletions, making it highly reliable for quantitative genomics. The platform scaled rapidly from the first Genome Analyzer to later systems such as HiSeq, MiSeq, NextSeq, and NovaSeq. Key milestones included the HiSeq X Ten system in 2014, which enabled human genomes to be sequenced for approximately one thousand dollars, and the NovaSeq series launched in 2017, which pushed output into the multi-terabase range. These developments firmly established short-read sequencing as the backbone of modern genomics.
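The per-cycle detection step can be sketched as picking the brightest of four dye channels. This assumes the four-colour scheme described above (later instruments such as NextSeq and NovaSeq use a two-colour variant) and made-up intensity values. The confidence measure, the brightest intensity divided by the sum of the two brightest, is a simplification chosen for illustration, not Illumina’s actual base-calling algorithm; `call_cycle` and `call_read` are hypothetical names.

```python
CHANNELS = "ACGT"  # one dye channel per base, in this assumed order


def call_cycle(intensities: list[float]) -> tuple[str, float]:
    """Call one cycle: the brightest channel wins.

    Confidence is brightest / (brightest + runner-up), so a clean signal
    approaches 1.0 and a near-tie between two channels approaches 0.5.
    """
    if len(intensities) != len(CHANNELS):
        raise ValueError("expected one intensity per channel")
    ranked = sorted(range(len(CHANNELS)), key=lambda k: intensities[k], reverse=True)
    brightest = intensities[ranked[0]]
    runner_up = intensities[ranked[1]]
    total = brightest + runner_up
    confidence = brightest / total if total > 0 else 0.0
    return CHANNELS[ranked[0]], confidence


def call_read(cycles: list[list[float]]) -> tuple[str, list[float]]:
    """Call a whole read, one base per sequencing cycle."""
    calls = [call_cycle(c) for c in cycles]
    return "".join(base for base, _ in calls), [score for _, score in calls]


# Made-up intensities for three cycles (columns: A, C, G, T).
cycles = [
    [900.0, 50.0, 40.0, 30.0],
    [60.0, 80.0, 850.0, 70.0],
    [100.0, 700.0, 650.0, 90.0],  # ambiguous: C and G nearly tie
]
sequence, confidences = call_read(cycles)
print(sequence, [round(c, 2) for c in confidences])
```

Because exactly one terminator is incorporated per cycle, each cycle maps to one base; the third cycle shows how crosstalk between channels produces a low-confidence call, the kind of base that would receive a low quality score.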

In parallel with Illumina’s rise, alternative chemistries were explored. In 2007, Applied Biosystems introduced the SOLiD platform, which used a ligation-based sequencing strategy rather than synthesis. DNA was sequenced through iterative ligation of fluorescently labelled oligonucleotide probes, and a two-base encoding scheme meant that each base was interrogated twice, in two overlapping dinucleotide readouts known as colours. This redundancy produced very high accuracy: a true single-base variant alters two adjacent colours, whereas a random measurement error usually alters only one, so the two can be distinguished. However, the platform suffered from short read lengths and a complex workflow, including the need to analyse data in colour space. Although SOLiD demonstrated the feasibility of high-accuracy parallel sequencing, it remained a niche technology compared with Illumina’s more flexible sequencing-by-synthesis approach.
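Two-base encoding is commonly described with a simple rule: give each base a two-bit code (A=0, C=1, G=2, T=3) and take the XOR of adjacent codes to obtain the colour. The sketch below assumes that representation; `encode` and `decode` are illustrative names, and decoding requires the first base to be known.

```python
# Assumed two-bit codes; with A=0, C=1, G=2, T=3 the colour of a
# dinucleotide is the XOR of the two codes (e.g. AA, CC, GG, TT -> 0).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode(sequence: str) -> list[int]:
    """Translate bases into colours (0-3), one per adjacent pair of bases."""
    return [CODE[a] ^ CODE[b] for a, b in zip(sequence, sequence[1:])]


def decode(first_base: str, colours: list[int]) -> str:
    """Translate colours back into bases, starting from a known first base."""
    bases = [first_base]
    for colour in colours:
        bases.append(BASES[CODE[bases[-1]] ^ colour])
    return "".join(bases)


print(encode("ATGGCA"))             # [3, 1, 0, 3, 1]
print(encode("ATGTCA"))             # [3, 1, 1, 2, 1]  one true base change alters two adjacent colours
print(decode("A", [3, 1, 0, 3, 1])) # ATGGCA
print(decode("A", [3, 1, 1, 3, 1])) # ATGTAC  one miscalled colour corrupts every later base
```

The last line shows why a single colour error, decoded naively, propagates to the end of the read, and why SOLiD reads were usually aligned in colour space before being translated into bases.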

Another innovation emerged in 2010 with Ion Torrent’s semiconductor sequencing technology. By measuring the pH changes caused by the hydrogen ions released during nucleotide incorporation, Ion Torrent eliminated the need for optical detection entirely. DNA fragments amplified on beads were sequenced in microwells equipped with semiconductor sensors that directly converted chemical signals into digital data. This design enabled compact instruments, relatively low costs, and rapid run times measured in hours rather than days. Early Ion Torrent systems produced modest data outputs and, like pyrosequencing, struggled with homopolymers, because the signal from a single nucleotide flow grows with the number of identical bases incorporated. Nevertheless, they demonstrated that sequencing could be fast, accessible, and decoupled from complex optical systems.

A more profound conceptual leap occurred around 2011 with the commercial release of single-molecule real-time (SMRT) sequencing by Pacific Biosciences. Unlike previous approaches, this method directly observed individual DNA polymerase molecules synthesising DNA in real time within nanophotonic structures known as zero-mode waveguides. Fluorescently labelled nucleotides emitted brief flashes of light as they were incorporated, allowing sequencing without prior amplification. This approach yielded unprecedented read lengths, which as the chemistry matured reached tens of kilobases, but early implementations suffered from high single-pass error rates. The development of circular consensus sequencing transformed the technology: by sequencing the same circularised DNA template repeatedly and combining the passes into a consensus, it produces highly accurate HiFi reads, defined as having at least 99% (Q20) predicted accuracy and frequently exceeding 99.9%. These reads combined long length with high accuracy, revolutionising genome assembly, structural variant detection, haplotype phasing, and full-length transcript sequencing. Continued improvements in throughput and instrumentation have made long-read, high-accuracy sequencing increasingly practical.
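Why repeated passes help can be illustrated with an idealised model: assume each pass miscalls a given base independently with probability p, and count the consensus as correct only if a strict majority of passes call the true base. Real circular consensus software uses alignment-aware statistical models, and single-pass errors in this technology are mostly insertions and deletions, so this is only a back-of-the-envelope sketch. The 12% single-pass error rate is an assumed value, and `consensus_error` is a hypothetical helper.

```python
from math import comb, log10


def consensus_error(p_error: float, passes: int) -> float:
    """Probability that a strict-majority vote over independent passes is wrong.

    The consensus is correct only if more than half of the passes call the
    true base; ties (possible with an even number of passes) count as errors.
    """
    if not 0.0 <= p_error <= 1.0 or passes < 1:
        raise ValueError("p_error must be in [0, 1] and passes must be >= 1")
    needed = passes // 2 + 1  # smallest strict majority
    p_correct = 1.0 - p_error
    # Sum the binomial probabilities of having fewer than `needed` correct passes.
    return sum(
        comb(passes, k) * p_correct**k * p_error ** (passes - k)
        for k in range(needed)
    )


SINGLE_PASS_ERROR = 0.12  # assumed value, for illustration only

for n in (1, 3, 5, 9, 15):
    err = consensus_error(SINGLE_PASS_ERROR, n)
    print(f"{n:2d} passes: error {err:.2e}, Phred Q{-10 * log10(err):.1f}")
```

Under these assumptions the consensus error falls from 0.12 with one pass to about 0.04 with three and about 0.014 with five, and keeps shrinking as passes are added, which is the intuition behind turning noisy single passes into HiFi reads.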

In 2014, Oxford Nanopore Technologies introduced a radically different sequencing paradigm. Instead of synthesising DNA, nanopore sequencing measures changes in ionic current as single DNA or RNA molecules are threaded through a protein nanopore embedded in a membrane. Because several nucleotides occupy the narrowest part of the pore at any moment, the current at each step reflects a short stretch of sequence rather than a single base, and the resulting signal is decoded computationally by basecalling software. The method has no inherent read-length limit beyond the length of the molecule itself, enabling ultra-long reads that can span hundreds of kilobases or even megabases. Early nanopore reads exhibited lower accuracy than short-read technologies, but continuous improvements in pore chemistry, motor proteins, and machine-learning-based basecalling have dramatically improved performance. Duplex sequencing, which reads both strands of a DNA molecule, now allows nanopore data to approach short-read accuracy while retaining its unique advantages. Real-time data streaming, portability through devices such as the MinION, and direct detection of DNA and RNA modifications have enabled applications ranging from field-based outbreak surveillance to epigenomics.

Between 2015 and 2016, BGI Group entered the sequencing market with platforms based on DNA nanoball technology and combinatorial probe anchor synthesis. By circularising DNA fragments and amplifying them by rolling circle amplification into compact nanoballs, BGI created dense, uniform arrays that reduced amplification bias and index hopping. Sequencing by combinatorial probe anchor synthesis produced short paired-end reads with accuracy comparable to Illumina. Systems such as the BGISEQ-500 and later DNBSEQ instruments offered high throughput at competitive cost, supporting large-scale population genomics and clinical screening applications, particularly non-invasive prenatal testing.

The most recent chapter in this evolution began in 2022 with the launch of Element Biosciences’ AVITI system. AVITI introduced avidity sequencing, a variant of sequencing by synthesis in which base identification and strand extension are separated into distinct steps: bases are read by binding multivalent fluorescent nucleotide complexes, and the strand is then extended with unlabelled reversible terminator nucleotides. This separation improves signal-to-noise ratios and reduces error propagation, enabling exceptionally high per-base accuracy, with a large proportion of bases at or above Q40, equivalent to an error probability of at most 1 in 10,000. Combined with a dual flow cell design that offers operational flexibility, AVITI is positioned as a high-accuracy, cost-competitive alternative within the short-read sequencing landscape.
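Quality values such as Q40 follow the Phred scale, Q = -10 log10(p), where p is the probability that a base call is wrong. A small sketch of the conversion follows; the helper names are hypothetical, and the FASTQ decoding assumes the common Phred+33 ASCII offset used by current instruments.

```python
from math import log10


def phred_to_error(q: float) -> float:
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality score implied by an error probability (p must be > 0)."""
    return -10 * log10(p)


def fastq_qualities(quality_string: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding by default."""
    return [ord(ch) - offset for ch in quality_string]


for q in (20, 30, 40):
    p = phred_to_error(q)
    print(f"Q{q}: error probability {p:g}, accuracy {100 * (1 - p):.2f}%")

print(fastq_qualities("5?I"))
```

Each 10-point step on the scale is a tenfold drop in error probability: Q40 corresponds to one expected error in 10,000 bases, the same 99.99% figure quoted for Sanger sequencing, and `fastq_qualities("5?I")` returns [20, 30, 40].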

Modern DNA sequencing technologies can therefore be understood as the outcome of two major evolutionary paths: short-read, massively parallel sequencing by synthesis, and long-read, single-molecule sequencing. Each platform represents a distinct compromise between read length, accuracy, throughput, cost, and biological resolution. Short-read platforms such as Illumina, BGI, and Element excel in applications that require depth, precision, and scale. Long-read platforms such as Pacific Biosciences and Oxford Nanopore reveal genomic structures that short-read platforms cannot resolve, including complex structural variation, haplotype organisation, and long repetitive regions.

Taken together, the evolution of DNA sequencing is not a linear progression towards a single superior technology, but a diversification of tools optimised for different biological questions. Each sequencing method uncovers different layers of genomic information, shaped by its underlying chemistry and physics. In practice, the most powerful insights increasingly arise not from choosing one platform over another, but from strategically combining technologies, aligning sequencing capability with scientific purpose.