Read, Write, Edit, Accelerate

A paradigm for understanding emerging biotechnologies and their impacts

Dec 10, 2024

Over the next century, biotechnology is poised to revolutionize how we live, work, and address some of humanity's most pressing challenges. In fact, breakthroughs in biotechnology and related emerging technologies are already producing targeted cancer treatments, engineering more resilient crops, creating sustainable materials, and developing solutions to mitigate pollution.

But why now? After decades of scientific progress, why are we just beginning to see biology reshape industries and daily life? The answer lies in the exponential advancements across three core capabilities: our ability to read, write, and edit DNA, the "source code" for life.

Advancements in sequencing technology have significantly enhanced our ability to read DNA, enabling scientists to decode genomes with unprecedented speed and accuracy. Additionally, improvements in DNA synthesis technologies have allowed scientists to write new genetic code, creating novel biological sequences for the first time. Finally, breakthrough discoveries like CRISPR have revolutionized the field of DNA editing, allowing scientists to reprogram genes with high fidelity and at a scale previously unimaginable.

Together, these tools and discoveries have set the stage for breakthrough biotechnologies, which have further been accelerated by advancements in artificial intelligence (AI). In this article, we'll explore the foundational "read, write, edit" model underpinning innovative biotechnologies and examine how the convergence of biology and AI is driving new future-shaping breakthroughs and technologies at an unprecedented rate.

🧬 Read

To understand why we are just now witnessing a wave of breakthrough discoveries and world-changing biotechnologies, we need to first understand the concept of sequencing, which I briefly wrote about in a previous article, From Biomolecules to Bytes: Next Generation Sequencing and The Digitization Of Biology. Notably, sequencing can refer to the reading of DNA or RNA. However, this section will strictly focus on the former, though RNA-sequencing technologies are no less influential.

DNA reading, or sequencing, is the process of determining the precise order of nucleotide bases (adenine, "A"; thymine, "T"; guanine, "G"; and cytosine, "C") in a DNA sample. By reading the genetic code in a DNA sample, scientists can identify microorganisms, discover disease-causing genes, and infer the biological properties and traits of organisms, including the bacteria whose molecular machinery forms the basis of countless research tools used in molecular biology.

At its core, DNA sequencing is a simple process. A researcher begins by obtaining a DNA sample with an unknown genetic sequence. The sample is then prepared, loaded into a DNA sequencer, amplified, and analyzed. Finally, the sequencer generates a digital representation of the DNA sequence (i.e., a string of A's, T's, C's, and G's), offering insights ranging from detecting genetic mutations to identifying pathogens. This ability to read DNA is foundational to modern biology and fuels innovations in personalized medicine, synthetic biology, and alternative food production.
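To make the idea of a "digital representation" concrete, here is a minimal Python sketch of what can be done once a read exists as a string of A's, T's, C's, and G's. The function names and the example sequences are hypothetical, chosen only for illustration; real analysis pipelines are far more involved.

```python
from collections import Counter
from typing import Dict, List, Tuple


def base_counts(sequence: str) -> Dict[str, int]:
    """Count how often each of the four bases appears in a DNA read."""
    seq = sequence.upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    counts = Counter(seq)
    return {base: counts.get(base, 0) for base in "ACGT"}


def gc_content(sequence: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty read)."""
    counts = base_counts(sequence)
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total if total else 0.0


def point_differences(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """List (position, reference_base, sample_base) wherever two
    equal-length sequences disagree. Toy stand-in for mutation detection."""
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    pairs = zip(reference.upper(), sample.upper())
    return [(i, r, s) for i, (r, s) in enumerate(pairs) if r != s]
```

Comparing a sample read against a reference in this way is, in a very simplified form, how a single-base mutation shows up in sequencing data.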

The modern revolution in DNA sequencing is driven by next-generation sequencing (NGS) technologies, also known as high-throughput sequencing. Unlike traditional Sanger sequencing, which processes one DNA fragment at a time, NGS enables the simultaneous sequencing of millions of DNA fragments, rapidly producing massive quantities of data. This high-throughput approach is essential for large-scale projects like sequencing entire genomes or analyzing complex microbial communities.

Next-generation sequencing technologies follow the same general process: First, DNA or RNA is extracted from biological samples (tissues, blood, cells), which involves breaking cells open and isolating their genetic material from other cellular components. After being extracted, DNA or RNA strands are often too long for many experimental purposes; as a result, they must be fragmented, which involves using enzymes to break the long strands of genetic material into smaller, more manageable pieces. Lastly, adapters, which are short synthetic DNA sequences that serve as attachment points, are attached to the ends of the fragmented DNA or RNA to create a sequencing library, allowing the sequencer to recognize and bind to the DNA or RNA fragments, thus facilitating the sequencing process.
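The fragmentation and adapter steps can be sketched in a few lines of Python. This is a deliberately simplified model: it cuts the sequence into fixed-size pieces, whereas real enzymatic fragmentation produces a range of sizes, and the adapter sequences in the usage note are placeholders, not real adapters.

```python
from typing import List


def fragment(sequence: str, size: int) -> List[str]:
    """Split a long sequence into consecutive pieces of at most `size` bases.
    Assumption: fixed-size cuts, unlike the variable cuts made by enzymes."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def build_library(sequence: str, size: int,
                  adapter_5: str, adapter_3: str) -> List[str]:
    """Attach an adapter to each end of every fragment, giving the
    sequencer a known sequence to recognize on both sides."""
    return [adapter_5 + piece + adapter_3 for piece in fragment(sequence, size)]
```

For example, `build_library("ATGCAT", 3, "GG", "CC")` yields two library molecules, each with the placeholder adapters `GG` and `CC` flanking a three-base fragment.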

The prepared sequencing library is then loaded onto a flow cell, where the DNA or RNA fragments are copied millions of times through a process called cluster generation. Then, a technique called sequencing by synthesis is performed, where chemically modified nucleotides bind to complementary bases on the sequencing library, which serves as a DNA template. These modified nucleotides are labeled with fluorescent dyes, and a different color dye is used for each of the four nucleotides (A, T, G, C). As each fluorescently labeled nucleotide is incorporated, a fluorescent light signal corresponding to that base is emitted and captured by the sequencer's optical detector, allowing the sequencer to identify the specific nucleotide added to the growing DNA strand. The sequencer translates these signals into digital data, recording the order of the nucleotides in the DNA strand. The final output is a digitized file containing the sequence data, quality metrics, and metadata for further analysis.
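As a rough illustration of how light signals become letters, the Python sketch below picks the brightest of four color channels in each cycle and converts error probabilities into the Phred quality scores commonly stored alongside sequence data in FASTQ files. The channel order and the intensity values are assumptions made up for this example; commercial base-calling software is considerably more sophisticated.

```python
import math
from typing import List, Sequence

# Assumed channel order: one fluorescent color per base.
CHANNELS = "ACGT"


def call_bases(intensities: Sequence[Sequence[float]]) -> str:
    """For each sequencing cycle, call the base whose channel is brightest."""
    calls = []
    for cycle in intensities:
        if len(cycle) != len(CHANNELS):
            raise ValueError("each cycle needs one intensity per channel")
        brightest = max(range(len(CHANNELS)), key=lambda i: cycle[i])
        calls.append(CHANNELS[brightest])
    return "".join(calls)


def phred_quality(error_probability: float) -> int:
    """Phred score: Q = -10 * log10(probability the base call is wrong)."""
    return round(-10 * math.log10(error_probability))


def decode_fastq_quality(quality_string: str) -> List[int]:
    """Decode a FASTQ quality line that uses the common Phred+33 encoding."""
    return [ord(symbol) - 33 for symbol in quality_string]
```

A Phred score of 30 corresponds to a 1-in-1,000 chance that a base was called incorrectly, which is why quality metrics travel with the sequence in the output file.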

This explosion in sequencing technology has been nothing short of transformative, with companies like Illumina, Oxford Nanopore Technologies, and Pacific Biosciences driving rapid advancements, enabling the sequencing of longer DNA fragments with unparalleled speed and accuracy. With economies of scale, costs of DNA sequencing have plummeted, and the sequencing of entire genomes now costs only a few hundred dollars, a stark contrast to the millions of dollars it cost in the mid-2000s.

These advancements in sequencing technology, combined with the reduced cost of sequencing, have generated massive quantities of biological data that form the foundation for countless research applications. For example, the data captured via NGS can illuminate the links between genes and their functions, enabling breakthroughs in personalized medicine, use cases in synthetic biology, and the development of resilient crops that can thrive in challenging environments. The insights gained from DNA sequencing are also pivotal in pandemic prevention, offering tools to track pathogens and develop targeted therapies.

By digitizing the blueprint of life, DNA sequencing has ushered in a new era of biotechnology. Its integration with artificial intelligence is further accelerating this transformation, driving innovations that were unimaginable just a decade ago, such as lab-grown meat, personalized cancer therapies, and even de-extinction technologies. The possibilities are as vast as the data we now generate. Understanding how DNA is read is not just a technical endeavor; it allows us to grasp the foundation on which the future of biotechnology is being built.

🧬 Write

DNA writing, or synthesis, allows scientists to create custom genetic sequences tailored to specific research and development needs. For example, synthesis technologies enable researchers to engineer organisms with new properties, such as producing biopharmaceuticals and creating more resilient strains of crops, among other applications.

The process of 'writing' DNA begins with designing a genetic sequence of interest, which is then uploaded to a DNA synthesizer. The DNA synthesizer then assembles the DNA nucleotide bases for the sequence of interest in the correct order, producing a DNA strand that can be used to advance numerous biotechnological applications.

DNA synthesis is the workhorse of modern biotechnology research, enabling techniques like polymerase chain reaction (PCR) and gene editing with CRISPR. These methodologies and technologies rely on oligonucleotides, which are short DNA strands that serve as primers or templates in various molecular biology protocols. The commoditization of oligo synthesis has made these sequences widely accessible, allowing researchers to design and order custom oligos, thus driving the proliferation of oligonucleotide-based technologies and fueling innovation across synthetic biology, medicine, and more.
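To show what designing a custom oligo can look like in practice, here is a small Python sketch that derives a pair of PCR primers from a template and estimates their melting temperature. It uses the Wallace rule of thumb (2 °C per A or T, 4 °C per G or C), which is only a rough estimate for short oligos; the function names and the default primer length are illustrative assumptions, not a recommended design procedure.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


def wallace_tm(oligo: str) -> int:
    """Rough melting temperature in degrees Celsius (Wallace rule)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def design_primer_pair(template: str, length: int = 18) -> Tuple[str, str]:
    """Naive primer pair: the first `length` bases of the template and the
    reverse complement of its last `length` bases."""
    if len(template) < length:
        raise ValueError("template is shorter than the primer length")
    forward = template[:length].upper()
    reverse = reverse_complement(template[-length:])
    return forward, reverse
```

The two resulting strings are exactly the kind of short sequences a researcher would submit to an oligo synthesis provider.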

The technology behind DNA synthesis has seen remarkable advancements over the decades, starting in the 1950s when scientists in academic labs manually synthesized DNA using early chemical methods. In the early 1980s, phosphoramidite chemistry was introduced, enabling automated solid-phase DNA synthesis. Array-based DNA synthesis methods followed in the 1990s, using spatially localized, light-activated chemistries to produce large pools of DNA sequences. This innovation, initially developed for creating DNA microarrays, mirrored the trajectory of NGS by applying extensive process engineering to massively parallelize a fundamental reaction.

Modern companies, such as Twist Bioscience, have refined DNA synthesis techniques, enabling them to produce large numbers of longer and more complex DNA strands at scale. In parallel, new enzymatic DNA synthesis methods, which use biological catalysts rather than purely chemical reactions, promise more efficient and accurate construction of DNA. Together, these innovations have led to the development of benchtop DNA printers and new business models that offer "synthesis as a service."

Synthetic biology companies like Ginkgo Bioworks, which is among the largest consumers of synthetic DNA, have capitalized on these innovations. Over four years, Ginkgo Bioworks has ordered an estimated one billion base pairs of synthetic DNA from Twist Bioscience, demonstrating the demand for custom DNA and its role in driving large-scale biotechnological projects.

Outside of large, venture capital-backed companies like Ginkgo Bioworks, the democratization of DNA synthesis technology and advancements in economies of scale have also enabled researchers to conduct rapid and large-scale genetic design and testing, generating massive datasets ripe for analysis. By integrating artificial intelligence, researchers can further optimize genetic designs and predict their outcomes, significantly enhancing the impact of DNA synthesis technologies. For example, researchers can now design large libraries of synthetic genes, pathways, or genomes and utilize multiplexed assays, which measure multiple analytes simultaneously in a single sample, to evaluate their functions in parallel. This level of experimentation was nearly unimaginable just a decade ago.

Despite these advancements, our ability to read DNA, as discussed in the previous section, still surpasses our ability to write it. However, as the cost of DNA synthesis continues to decline, researchers will continue to push the boundaries of synthetic biology, ultimately expanding the scope of what can be achieved with engineered life. DNA synthesis is not just a tool; it is the driving force behind the biotechnology revolution. By enabling the engineering of genetic material, DNA synthesis lays the foundation for innovations that can transform industries, improve lives, and tackle some of the most pressing challenges of our time.

🧬 Edit

DNA editing is the process of altering an organism's genetic material by changing, adding, or removing specific DNA nucleotide bases. DNA editing allows scientists to modify the properties of a cell and its abilities, alter gene expression patterns, and potentially correct disease-causing mutations in complex organisms, including humans. Editing DNA requires using a gene-editing tool, such as CRISPR-Cas9 or TALENs, to target a specific section of a DNA sample and then alter that section with a new or corrected DNA sequence. These interventions have become transformative tools in biotechnology, enabling advancements across medicine, agriculture, and industry.

While DNA editing represents a modern marvel, it builds on billions of years of natural evolution. Organisms' DNA has evolved through random mutations and natural selection, shaping the genetic diversity we see between and within species today. For millennia, humans have influenced this process through selective breeding, gradually shaping the traits of plants and animals to better suit agricultural and societal needs. Recombinant DNA technologies introduced in the 20th century revolutionized genetic manipulation by allowing scientists to combine DNA from different sources, creating new combinations of genes with novel functions.

Gene-editing tools like CRISPR have further advanced this field by offering unprecedented precision in modifying an organism's own DNA. CRISPR, which stands for Clustered Regularly Interspaced Short Palindromic Repeats, relies on a guide RNA (gRNA) and a protein called Cas9 to locate and cut specific DNA sequences. Once the DNA is cut, the cell's natural repair machinery can introduce changes, ranging from single base-pair modifications to inserting entire genetic sequences. This process enables targeted interventions that were unimaginable just a few decades ago, fundamentally altering how we study and engineer life.
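The "locate and cut" step can be sketched as a simple string search. The Python example below adds two details not covered above, both specific to the commonly used Cas9 from Streptococcus pyogenes: the target must be followed by a short "NGG" motif (the PAM), and the cut falls three bases before that motif. For simplicity, the guide is written as DNA rather than RNA, only one strand is searched, and the function names are hypothetical.

```python
from typing import List, Tuple


def find_cas9_sites(genome: str, protospacer: str) -> List[int]:
    """Return start positions where the guide's target sequence occurs and is
    immediately followed by an NGG motif (any base, then two G's).
    Assumption: only the given strand is searched."""
    genome, protospacer = genome.upper(), protospacer.upper()
    sites = []
    start = genome.find(protospacer)
    while start != -1:
        pam_start = start + len(protospacer)
        pam = genome[pam_start:pam_start + 3]
        if len(pam) == 3 and pam[1:] == "GG":
            sites.append(start)
        start = genome.find(protospacer, start + 1)
    return sites


def cut(genome: str, site_start: int, protospacer_length: int) -> Tuple[str, str]:
    """Split the genome three bases upstream of the PAM, where Cas9 cuts."""
    cut_index = site_start + protospacer_length - 3
    return genome[:cut_index], genome[cut_index:]


def insert_at_cut(genome: str, site_start: int,
                  protospacer_length: int, new_sequence: str) -> str:
    """Toy model of repair that places a new sequence at the cut site."""
    left, right = cut(genome, site_start, protospacer_length)
    return left + new_sequence + right
```

In a real cell, the outcome of the repair step is not deterministic in the way `insert_at_cut` suggests; the sketch only shows where the change is introduced relative to the guide's target.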

The potential applications of DNA editing are vast. In medicine, CRISPR is being used to develop gene therapies for conditions once thought incurable, such as certain types of cancer and genetic disorders like sickle cell anemia. In agriculture, these tools are being used to create plants and animals that are more resilient to pests, diseases, and changing climates. In industrial biotechnology, DNA editing is enabling the creation of microorganisms that can produce advanced materials, biofuels, and environmentally friendly chemicals.

Used responsibly, DNA editing offers solutions to humanity's most pressing challenges while expanding our understanding of biology and what's possible. However, it also raises ethical questions and requires robust governance to ensure that these powerful tools are applied for the benefit of society. As we continue to refine our ability to read, write, and edit DNA, the possibilities for innovation are nearly limitless, ushering in an era of rapid scientific discovery and technological progress.

By integrating the capabilities of reading, writing, and editing DNA, scientists are transforming how we interact with the fundamental building blocks of life. DNA editing represents the final piece in this triad, providing the tools needed to reshape the genetic code with precision and purpose, ultimately driving biotechnology into a future filled with promise and potential.

🧬 Accelerate

The convergence of artificial intelligence (AI) and biotechnology, termed AIxBio, is transforming the pace and scope of advancements in our ability to sequence, synthesize, and edit DNA. By leveraging machine learning algorithms to analyze, interpret, and predict biological phenomena, AI is revolutionizing biology research and accelerating breakthroughs across a wide range of biotechnological applications. Additionally, the relationship between AI and biotechnology is mutually reinforcing, as massive datasets generated through DNA sequencing are the substrate needed to develop more sophisticated AI models.

Within the field of AIxBio, there are a number of types of AI systems that differ in purpose, scope of practice, and accessibility. On one end of the spectrum are Large Language Models (LLMs), such as ChatGPT, which are trained on vast amounts of written data combed from the internet and designed to produce natural language outputs. These models are accessible and user-friendly, requiring little to no technical expertise from end-users. On the other end of the spectrum are Biological Design Tools (BDTs). These specialized AI models are trained on biological data, such as genetic sequences, microscopy images, and 3D molecular structures. Unlike LLMs, BDTs demand that end-users possess expertise in both biology and computer science in order to use them effectively. Additionally, BDTs are designed for solving domain-specific problems in biology, including predictive modeling, running simulations, and interpreting complex datasets. Although BDTs are less accessible to the average user, they are highly effective in augmenting researchers' ability to solve technical problems and have become essential tools in biology research.

The impact of AIxBio is already profound and continues to grow. Many large biotechnology firms have already integrated AI and machine learning into their product development pipelines, while startups worldwide are developing AIxBio platforms to solve specific challenges in drug discovery, precision medicine, synthetic biology, and biomanufacturing. For example, AIxBio is transforming drug discovery by enabling algorithms to digitally screen millions of chemical compounds and predict their interactions with biological targets. This reduces the time and cost of development while increasing the likelihood of finding effective treatments for diseases. Similarly, AI-powered diagnostic tools are enhancing the accuracy of medical imaging analysis, processing data from MRIs, CT scans, and X-rays to detect conditions like aneurysms, cancer, and strokes earlier and more reliably than traditional methods, resulting in improved patient outcomes.

In the laboratory itself, AI is automating routine experimental tasks through systems like those developed by Emerald Cloud Lab. For example, autonomous biology labs can now create, analyze, and validate biological components such as DNA and proteins, freeing researchers to focus on creative and exploratory scientific work. Additionally, generative AI, commonly associated with creating text or images, is being adapted to produce entirely new biological information. When trained on biological datasets, these platforms can generate novel genetic sequences and molecular designs, pushing the boundaries of synthetic biology and biomolecular engineering.

One of the most groundbreaking applications of AIxBio is in protein structure prediction. Tools like AlphaFold and RoseTTAFold use AI to accurately predict the 3D structures of proteins, a task that once required years of painstaking laboratory work. This capability has unlocked new insights into protein function, enabling researchers to develop therapies for complex diseases such as cancer and rare genetic disorders.

The integration of AI into biotechnology has created a powerful feedback loop. The ability to read and write DNA generates vast quantities of data that feed AI models, making them more robust and capable. In turn, these advanced AI systems enable new discoveries, further expanding the frontiers of what is possible in biology. This symbiotic relationship is not just accelerating progress in the present but also unlocking possibilities for a future defined by rapid innovation, transformative solutions, and unprecedented scientific discovery.

The convergence of AI and biology represents a paradigm shift in life sciences, offering tools to revolutionize industries, improve human health, and tackle global challenges with unparalleled speed and precision. As this field continues to evolve, it will redefine the boundaries of biotechnology and reshape our understanding of the biological world.

Gairik Sachdeva

Dec 10

These incredible tools of biology are ready, though, one wonders if the "how" is preceding the "what."

The highest impact from Write and Edit technologies for DNA would come from a deeper understanding on how the complexity of biology leads to new functions. Although the DNA sequencing and AI for protein structure are good steps in that direction, there seems to be a lot more to learn there.

I can't wait for that knowledge to start emerging, with the help of more powerful computational and empirical tools, so that the tinkering can truly begin!