DNA nucleobases with LaTeX’s chemfig

Chemistry has never been my strong point. Once you get past electrons, protons, and neutrons, i.e. to anything useful, it all gets a bit complicated. And no-one likes anything to be complicated. You know, you have to start making up names, or defining naming conventions… no thank you. At that point in my undergraduate education I pretty much gave up and left that sort of thing to the chemists and biologists, who seemed to be having a jolly time of it (at least they preferred that to doing the maths, which is fine).

However, in radiation biology one of the fundamental things we consider is how radiation interacts with DNA - deoxyribonucleic acid (which I just had to Google to check the spelling). Now, I knew/know that DNA is amazing. That’s a given. The fact that so much information can be encoded using a few different types of molecules - and the chemistry and biology behind all of that - is staggering, and far beyond the two-body interactions of high-energy physics that my brain can (just about) cope with. So, when I took a closer look at how DNA actually works, it became clear that it might be useful to look at the molecular structure of its components - in the first instance, the nucleobases (see, I’m getting the hang of the terminology already!) that make up the base pairs used in the encoding mechanism.

So the big question then is: how do I make this interesting for a physicist? Obviously, the answer is to use LaTeX to draw out beautiful diagrams of the molecular structure with code. To cut a long story short, I found the TikZ-based chemfig package on CTAN and had a quick go at drawing them out.

Let’s start with the hexagon-y bit of adenine (technically known as the pyrimidine ring - an aromatic heterocycle of two nitrogen and four carbon atoms). Drawing a hexagon in chemfig is actually pretty straightforward:

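% Preamble
\usepackage{tikz}    % assumed: the package name was lost here, and chemfig loads TikZ itself
\usepackage{chemfig}

% Document body
\begin{figure}[htbp]
\begin{center}
\chemfig{
  *6(------)
}
\caption{A hexagon in \texttt{chemfig}, default orientation.}
\end{center}
\end{figure}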

This makes the following LaTeX figure:

A hexagon in chemfig, default orientation.

chemfig works out that we’re drawing a hexagonal ring from the *6 notation (the number after the asterisk is the ring size, and the brackets that follow hold one bond per side) and does all the hard work with TikZ for us behind the scenes. Nice.
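
To see the ring notation in isolation, here is a minimal, self-contained sketch. The document class and the five-membered ring shown for comparison are assumptions added for illustration; they are not part of the figure above.

```latex
\documentclass{article}
\usepackage{chemfig} % chemfig loads TikZ itself

\begin{document}
% *6 asks for a six-membered ring; the six "-" are its six single bonds.
\chemfig{*6(------)}
\qquad
% *5 asks for a five-membered ring, so it takes five bonds.
\chemfig{*5(-----)}
\end{document}
```

Changing the number after the asterisk (and the number of bonds to match) is all it takes to change the ring size.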

We’re going to do two things next: 1) add the first of two nitrogen atoms to the ring (it is, I think, having the pair of them that makes it a pyrimidine), and 2) rotate the ring so that the nitrogen atom is to the right of the diagram (bonus points if you can guess why that might be useful). I’ll assume you have the figure environment code etc. in what follows, and just update the \chemfig bit:

\chemfig
A rotated hexagon in chemfig, with a nitrogen atom at position 1.

The square brackets specify the rotation angle (in degrees, with the sign setting the direction), while placing the N before the ring code sets it as the starting atom (i.e. at position 1 of the ring). Rotating the ring by 210 degrees puts the nitrogen atom at the right-most point of the figure.
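
A small sketch for experimenting with the rotation, assuming the angle is passed as chemfig’s optional [:angle] argument at the very start of the molecule. The three variants are only there so you can compare how the angle and its sign move the nitrogen around; the exact code behind the figure above may differ.

```latex
\documentclass{article}
\usepackage{chemfig}

\begin{document}
% Default orientation, with N as the starting atom (position 1) of the ring.
\chemfig{N*6(------)}
\qquad
% The same ring with a rotation angle of 210 degrees.
\chemfig{[:210]N*6(------)}
\qquad
% The opposite sign, to compare the direction of rotation.
\chemfig{[:-210]N*6(------)}
\end{document}
```

Compiling the three side by side is a quick way to pick the angle that puts the nitrogen where you want it.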

We can make it a pyrimidine ring by adding the double bonds and second nitrogen atom (at position 5) with the following:

\chemfig
A pyrimidine ring as made by chemfig.

You can see how the double bonds have been made by replacing the “-”s with “=”s, and how the nitrogen atom has been inserted at position 5 in the curvy-bracketed “ring string”. Satisfyingly, it’s pretty straightforward to add the amino group at position 2 using the same syntax:

\chemfig
The pyrimidine ring with an amino group added at position 2.
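
As a rough sketch of the last two steps, the following shows a ring string with double bonds and a second nitrogen, then the same ring with an amino branch. The placement of the double bonds is an assumption, chosen to match the six-membered ring of adenine, and the rotation is left out for brevity, so this is not necessarily the exact code behind the figures above.

```latex
\documentclass{article}
\usepackage{chemfig}

\begin{document}
% The ring string is read in order from the starting N (position 1).
% "=" draws a double bond, and an atom name typed after a bond replaces
% the implicit carbon at that position (here, N at position 5).
\chemfig{N*6(=-=-N=-)}
\qquad
% A branch goes in parentheses straight after the atom it hangs from:
% (-NH_2) attaches an amino group to the carbon at position 2.
\chemfig{N*6(=(-NH_2)-=-N=-)}
\end{document}
```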

Adenine is the purine of the purine-pyrimidine pair of adenine and thymine (A-T). This is the nucleobase pairing that features two hydrogen bonds. Purines consist of two rings - pyrimidine and imidazole - fused together. We can add the pentagon-y imidazole to our molecule as follows:

\chemfig
Adding the imidazole ring.

You can see how the imidazole ring code has been inserted into the pyrimidine ring code, and how chemfig interprets this to fuse the rings together without us having to think about it. That’s a clever touch.
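
Here is a hedged sketch of the fused-ring idea, assuming the same atom order and double-bond pattern as an adenine skeleton without the rotation; the code behind the figure above may be arranged differently. The nested ring is written immediately after the vertex where it starts, and it lists one bond fewer than its size, because the bond it shares with the outer ring is drawn by the outer ring.

```latex
\documentclass{article}
\usepackage{chemfig}

\begin{document}
% Outer ring: N*6(...), as before.
% Nested ring: *5(-N=-N-) starts at position 3 of the outer ring and
% has four bonds, not five; the fifth side is the outer ring's bond
% between positions 3 and 4, which chemfig reuses to fuse the rings.
\chemfig{N*6(=(-NH_2)-*5(-N=-N-)=-N=-)}
\end{document}
```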

Finally, we’ll use a neat little trick from here to indicate where our adenine connects to the deoxyribose molecule without showing the whole “D” molecule itself. First, you need to define some things in the LaTeX preamble - see the final code for this (there’s too much to put inline). One of these things is a “sub-molecule” that we can then use as shorthand for the wavy line representing the deoxyribose. Then it’s simply a case of adding this to the imidazole ring at position 9 in the fused ring system:

\chemfig
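
For readers without the full code to hand, here is one possible way to get a similar effect; it is a sketch, not necessarily the trick linked above. It assumes a sub-molecule named sugar (a hypothetical name) defined with chemfig’s \definesubmol, whose only content is a bond drawn with TikZ’s snake decoration, and it reuses the assumed adenine skeleton without the rotation.

```latex
\documentclass{article}
\usepackage{chemfig}
% Provides the "snake" decoration used for the wavy bond.
\usetikzlibrary{decorations.pathmorphing}

% Hypothetical sub-molecule "sugar": a single wavy bond standing in for
% the deoxyribose. The fifth optional field of a bond takes TikZ options.
\definesubmol{sugar}{-[,,,,decorate,decoration=snake]}

\begin{document}
% !{sugar} inserts the sub-molecule; here it is a branch on the second
% nitrogen of the five-membered ring (position 9 of the fused system).
\chemfig{N*6(=(-NH_2)-*5(-N=-N(!{sugar})-)=-N=-)}
\end{document}
```

Keeping the wavy bond in a named sub-molecule means the same shorthand can be reused when drawing the other nucleobases.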

And that’s adenine! You have to hand it to chemfig - it looks beautiful. Constructing the molecule in code was also a very useful way for me to see how it’s actually put together. I didn’t have to worry too much about things like bond angles and spacings (at least not for now), and I learned a bit of chemical terminology along the way to boot.

I’ve put the full LaTeX code here if you’re interested - making the other nucleobases is, of course, left as an exercise for the reader.

Enjoy!

Tom W