When Perfect Code Produces Imperfect Science

Programming Literacy Is More Important Than Ever for Bioinformaticians and Computational Biologists

May 15, 2025

The RNA-seq analysis looked perfect. The LLM had generated streamlined code that transformed raw FASTQ files into count matrices, identified differentially expressed genes between treatment and control groups (complete with wonderfully colorful volcano plots), and even performed comprehensive functional enrichment analyses, generating a series of dot plots, UpSet plots, and alluvial diagrams. It was hard not to be impressed.

Trusting this seemingly flawless output, the researcher submitted the results for publication. It wasn’t until two months later, during peer review, that they made a devastating realization: the LLM-generated code had silently reversed the treatment and control groups because of a quirk in the file labeling. Every single biological conclusion in the paper was backwards. The code ran perfectly; it just answered the wrong question.

This scenario, drawn from a real researcher's experience shared with me at a recent conference, illustrates why asking "Should I still learn to code when AI writes better code than humans?" misses the point entirely. The answer becomes clearer when you consider how modern biology research works.

PIs—What Are They Good For? Actually, A Lot

Before starting my PhD in Bioinformatics & Computational Biology, I worked with a principal investigator who shattered my illusions about scientific leadership. He micromanaged every analysis while offering no constructive feedback, argued endlessly about computational and mathematical nuances he clearly didn't understand, and seemed to collect credit while contributing nothing. I left thinking PIs were glorified figureheads.

My experience early in my PhD revealed how wrong I was. While I could dive deep into specific problems, going an inch wide and a mile deep, I worked with PIs who could operate on a different scale entirely. They didn’t just have breadth. They could maintain technical depth across that breadth. The best PIs rarely do hands-on benchwork anymore, yet their deep understanding of lab methods, earned through years of running experiments and troubleshooting failures, allows them to design clever experiments, spot subtle anomalies in data, and guide their teams through complex problems.

The same principle applies to computational biology in the era of LLMs. Like PIs choreographing laboratory research, future bioinformaticians and computational biologists will orchestrate analyses at higher levels of abstraction. But this orchestration requires fluency in the underlying methods. The need to understand how to code is greater than ever, not because you’ll outperform AI, but because programming literacy lets you direct these systems toward meaningful biological insights.

Beyond Code— Understanding Biological Context

Consider what happens when an experienced bioinformatics scientist uses AI. They can rapidly prototype analysis pipelines, delegate routine tasks while focusing on algorithm design, and leverage their domain knowledge to spot when LLM-generated suggestions miss biological constraints. They know enough to ask the right questions, evaluate answers critically, and integrate AI-generated components into their code.

Eric Ma, a senior principal data scientist at Moderna Therapeutics, captured this perfectly in a recent post where he wrote "Just last Friday, I implemented a custom contrastive learning model with simulated data in one afternoon, with GitHub Copilot assistance", later noting that prompting Copilot required his nine years of ML experience to "know what to ask an LLM to do for me, in which order, with a particular code organization." His conclusion? Judgment, experience, and taste, the factors that are uniquely human, make all the difference.

Contrast this with someone lacking programming foundations. They might get functional code that appears to work but can't verify whether it actually solves their biological problem. When edge cases emerge—and in biology, edge cases are the norm—they can't debug their code effectively. More critically, they often don't know what they don't know, leading to fundamental misunderstandings about what their analysis actually does.

This observation reveals something important about bioinformatics. Coding is perhaps the easiest part of the job [1]. The real challenge is in understanding the vast network of connections: how data flows through analytical pipelines, why certain analyses have to precede others, and which statistical assumptions underpin different methods. All of these interdependencies make bioinformatics projects less like building a house and more like baking a cake.

When you build a house, specifications translate directly to outcomes. If you add an extra wall, the finished product is just the house with an extra wall. When you bake a cake, on the other hand, small variations in technique or timing can produce dramatically different results. Add a little extra butter and the result may not just be an extra buttery cake; instead, it could be an oily glob. Similarly, when performing bioinformatics analysis, small details make all the difference: choose Bonferroni correction over Benjamini-Hochberg, use a t-test over a Wilcoxon rank-sum test, or select a different normalization approach, and your biological conclusions can transform entirely, often for the worse. These aren't just technical choices but scientific judgments requiring deep understanding of both computational methods and biological systems.
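To make the Bonferroni versus Benjamini-Hochberg point concrete, here is a minimal sketch in plain Python. The p-values are made up for illustration, and the two helper functions are my own hand-rolled versions of the standard adjustments, not calls into any particular library. In a real analysis you would lean on an established statistics package instead.

```python
# Hypothetical p-values from ten tests (illustrative numbers, not real data).
pvalues = [0.001, 0.008, 0.012, 0.021, 0.03, 0.04, 0.2, 0.5, 0.7, 0.9]
alpha = 0.05


def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvals)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


n_bonf = sum(p <= alpha for p in bonferroni(pvalues))
n_bh = sum(p <= alpha for p in benjamini_hochberg(pvalues))
print(f"Bonferroni keeps {n_bonf} tests; Benjamini-Hochberg keeps {n_bh}")
```

With these made-up inputs, Bonferroni keeps one test and Benjamini-Hochberg keeps three at the 0.05 threshold. Neither answer is wrong in the abstract. They control different error rates, and which one fits depends on what a false positive costs in your experiment.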

Developing Scientific Taste

AI excels at generating code but struggles with the meta-cognitive tasks that define real bioinformatics work. When you ask AI to analyze RNA-seq data, it can produce a working pipeline. But does that pipeline account for batch effects specific to your experimental design? Does it align with your biological hypothesis? These questions require what we might call "computational taste"—an intuition developed through hands-on experience coding and conducting biological research.
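The batch-effect question is one you can partly check before any pipeline runs. The sketch below uses a hypothetical sample sheet (the sample IDs, condition labels, and batch names are invented for illustration) and asks whether condition is completely determined by batch. If it is, no downstream model can separate the treatment effect from the batch effect, however clean the generated code looks.

```python
from collections import defaultdict

# Hypothetical sample sheet: (sample_id, condition, batch).
# Here every control was run in batch1 and every treated sample in batch2.
samples = [
    ("s1", "control", "batch1"),
    ("s2", "control", "batch1"),
    ("s3", "control", "batch1"),
    ("s4", "treated", "batch2"),
    ("s5", "treated", "batch2"),
    ("s6", "treated", "batch2"),
]

# A second hypothetical design where each batch contains both conditions.
balanced = [
    ("s1", "control", "batch1"),
    ("s2", "treated", "batch1"),
    ("s3", "control", "batch2"),
    ("s4", "treated", "batch2"),
]


def conditions_per_batch(sample_sheet):
    """Map each batch to the set of conditions observed in it."""
    seen = defaultdict(set)
    for _sample_id, condition, batch in sample_sheet:
        seen[batch].add(condition)
    return dict(seen)


def is_fully_confounded(sample_sheet):
    """True when knowing the batch tells you the condition exactly."""
    per_batch = conditions_per_batch(sample_sheet)
    return all(len(conds) == 1 for conds in per_batch.values())


print("first design confounded:", is_fully_confounded(samples))
print("balanced design confounded:", is_fully_confounded(balanced))
```

A check like this only tells you whether the design can separate batch from condition at all. Whether the pipeline actually models batch, and models it sensibly, is still a judgment call.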

Tommy Tang, Director of Bioinformatics at AstraZeneca and author of a great bioinformatics blog, Diving Into Genetics and Genomics, emphasizes this critical distinction in his writing. AI can generate working code and produce directionally correct visualizations, but that doesn't guarantee the code is doing what you want or answering the right question. As Tommy frequently notes, understanding the assumptions of models and statistical methods is what allows bioinformaticians to derive correct biological interpretations [2]. The ability to interpret data, to extract meaningful biological insights from the outputs of our code, remains fundamentally human.

This taste manifests in subtle ways. Experienced bioinformaticians develop a sixth sense for when results "feel wrong" despite appearing correct. Like PIs who can glance at a gel and immediately spot problems, computational biologists who understand code can quickly identify when AI-generated analyses miss crucial biological nuances. This intuition doesn’t come from memorizing syntax and formulas, but rather from understanding the deep structure, assumptions, and limitations of different computational approaches.

The parallel to laboratory leadership extends further. Just as the most innovative PIs occasionally return to the bench for important experiments or to develop new techniques, the most innovative bioinformaticians will still hand-code new algorithms they’re working on or novel analytical approaches. AI can handle the routine implementation, freeing them to focus on genuinely novel computational methods.

The Computational PI — An Evolution of Expertise

We're witnessing the emergence of the computational PI: scientists who understand coding and analysis methods well enough to get their hands dirty when needed, but who focus their energy on the conceptual architecture of analyses rather than the minute details of implementation [3]. Even individual contributors can adopt this workflow, strategically prompting AI "lab members" to write certain segments of code, using their expertise to review or modify it before integrating it into their analysis, and then interpreting the outputs and working at higher levels of abstraction to see where these new results fit within the big picture.

This process mirrors the evolution that molecular biology went through in the 1990s. Whereas molecular biologists used to produce their own plasmids and sequence DNA manually, today’s researchers can order plasmids online and use automated DNA sequencers. Notably, though, this automation of routine tasks didn’t reduce the need for molecular biology expertise. Instead, it expanded the scope of questions that experts could ask and freed them to work on other creative pursuits. Similarly, AI-generated code elevates, rather than eliminates, the role of bioinformatics scientists and computational biologists.

Future bioinformaticians may not spend as much time mastering the quirks of R and Python or optimizing the structure of nested loops for better runtime. Instead, they’ll need to be master choreographers, understanding how different components within their code and analysis pipelines interact, which biological assumptions underlie various algorithms, and how to validate that AI-generated code actually answers their scientific questions. They'll need to read code fluently (though they may not write all of it from scratch), audit AI outputs, identify logical flaws, and ensure biological validity.
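As one example of what that validation can look like, here is a rough sketch of a direction check aimed at the reversed-groups error from the opening story. It assumes you have a positive-control gene that the treatment is already known to induce. The sample names, group labels, counts, and the log2_fold_change helper are all hypothetical, and the fold change here is a naive ratio of group means, not what a dedicated differential expression package would compute.

```python
import math

# Hypothetical sample-to-group labels and normalized counts for one
# positive-control gene that the treatment is known to induce.
sample_groups = {"s1": "control", "s2": "control", "s3": "treated", "s4": "treated"}
control_gene_counts = {"s1": 48.0, "s2": 52.0, "s3": 410.0, "s4": 390.0}


def log2_fold_change(counts, groups, numerator, denominator, pseudocount=1.0):
    """Naive log2 ratio of group means, with a pseudocount to avoid log(0)."""

    def group_mean(label):
        values = [counts[s] for s, g in groups.items() if g == label]
        if not values:
            raise ValueError(f"no samples labeled {label!r}")
        return sum(values) / len(values)

    return math.log2(
        (group_mean(numerator) + pseudocount) / (group_mean(denominator) + pseudocount)
    )


# The contrast we intend: treated relative to control.
lfc = log2_fold_change(control_gene_counts, sample_groups, "treated", "control")

# The same data with the groups swapped, as in the reversed analysis.
swapped = log2_fold_change(control_gene_counts, sample_groups, "control", "treated")

# A known-induced gene should come out positive under the intended contrast.
direction_ok = lfc > 0
print(f"intended contrast: {lfc:.2f}, swapped contrast: {swapped:.2f}")
```

The check is deliberately simple. Its value comes from knowing which gene should go up, which is biological knowledge the code cannot supply for you.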

Navigating the Transition

We're in a unique transition period. Today’s senior bioinformatics scientists learned to code through trial and error, relying on textbooks, courses, and forums like Stack Overflow to help them troubleshoot errors and pick up new skills. They built a firm foundation of skills before LLMs hit the scene, allowing them to take advantage of their deep expertise and leverage these new tools to great effect. Today’s bioinformatics students are coming up in a radically different environment. How do we prepare them for an uncertain future that’s still taking shape?

Part of the answer lies in shifting educational focus. After students learn the fundamentals, we should emphasize understanding data structures that mirror biological relationships, developing statistical intuition for biological inference, learning to decompose complex biological questions into computational components, and practicing code review and validation skills.

This transition demands new teaching approaches that give students the skills they need to traverse multiple layers of abstraction from thinking about the big picture (what needs to be done and why) down to the details of execution (how do we do it). Students should also be learning how to combine manually written code with strategic prompting and how to debug AI-generated code and identify when it violates biological constraints. The goal isn't to create better programmers than AI, but better directors of AI's programming capabilities.

A New Wave of Computational Leadership

The most successful bioinformaticians of the next decade won't be those who can implement the fastest sorting algorithm or write the most elegant recursive functions. They'll be the scientists who understand computation deeply enough to decompose complex biological questions into analytical components, orchestrate AI tools to implement their vision, and maintain the scientific skepticism to question whether their code truly answers their biological questions.

Much like how experienced PIs developed their scientific intuition through years at the bench, computational biologists need to develop their "computational taste" by understanding the fundamental structures of code, data, and algorithms. This taste allows them to immediately sense when an analysis seems off—despite perfect syntax and beautiful visualizations—because some underlying biological assumption has been violated.

Even with powerful LLMs, learning to code proficiently remains essential—not because you'll outcode AI, but because coding literacy is your passport to directing these powerful tools toward meaningful biological discovery. Like a PI who needs to understand benchwork to lead a lab effectively, you need to understand programming to lead computational analyses in the age of AI.

The focus should shift toward learning data structures that mirror biological relationships, algorithms that capture biological processes, and statistical principles that underlie biological inference. Future bioinformaticians should learn to read code like a PI reads experimental protocols, understanding not just what it does but why it's structured that way and what assumptions it embodies.

As we navigate this transition period, educational approaches need to evolve. Students should practice moving between different layers of abstraction—from big-picture biological questions down to small details of implementation — all while developing the judgment to determine when AI assistance is helpful and when human expertise is irreplaceable. The future belongs not to those who can write the best code, but to those who can think computationally about biology—and teach AI to implement their vision while catching those critical moments when perfect code produces imperfect science.

Did you enjoy this piece? If so, you may also want to check out other articles in Decoding Biology’s Programming Fundamentals collection.

[1] This is part of the reason why you can’t learn bioinformatics from courses and books alone. Coding, tool use, mathematics, and statistics are all important skills for a bioinformatics scientist, but the most important parts of the job can only be learned by doing. As a result, aspiring bioinformatics scientists should start working on projects as soon as possible after mastering the basics. Dean Lee’s Figure 1 Lab is a great place to get started. I’ve also created a handful of bioinformatics analysis tutorials and project guides, which you can find in my Bioinformatics Toolkit.

[2] Tommy published a great piece titled Heatmaps Are Lying to You that hammers this point home. Of all the skills a bioinformatician needs to master, creating a heatmap seems pretty trivial. But, as Tommy points out, little details make all the difference: "If your heatmap uses a nonlinear or poorly scaled color gradient, the viewer might see a pattern that isn’t real. Or miss one that is. For example: Mapping values from -3 to 3 on a red-blue scale sounds okay. But if most of your data lies between -0.5 and 0.5, your heatmap will look empty — mostly white."

[3] To be clear, the "computational PI" need not be an actual laboratory PI. Rather, it’s a framework for thinking about the evolving role of bioinformatics scientists and computational biologists. As AI can handle more and more of the rote aspects of coding, meta-cognitive skills and the ability to see the full picture become increasingly important, even for junior scientists.

Ming Tang

May 15 (edited)

Evan, this is so well written! We should all learn coding and use deep biology understanding to guide our way to analyze the data.

Tommy

Joydeep Nag

May 18

This is amazing read! Thanks Evan...

Couple of years ago I realized that Western Blots in the lab though powerful but can only take me so far. I NEED to learn a little bioinformatics. Today I am (quite) proficient in RNA-seq and scRNA-seq, but deep inside I feel I am still scratching the surface, as I don't really know how a certain function works (with what assumptions) behind the scenes. Always gets me the imposter syndrome.
