Space and Physics

Jane Austen Quote Encoded Into Plastic Polymer Sequence


Francesca Benson

Junior Copy Editor and Staff Writer

Apr 21 2021, 18:07 UTC

This molecular method of storing data could be a big improvement on prior methods. Image Credit: Sarah Moor

Considering the sheer amount of data humanity is producing in the digital age, improving how we store it is a task scientists are racing to tackle. Encoding data in strings of molecules – aka polymers – is a promising avenue, overcoming issues of longevity and durability associated with the usual silicon-based storage. One method of doing this involves coding info into DNA, with things such as music videos and the entirety of Wikipedia represented in genetic code.

However, although promising, DNA digital data storage is slow, expensive, and could have issues with long-term stability. Researchers at the University of Texas at Austin have showcased how information could be stored using polymers of a urethane-like plastic, potentially a more stable, efficient, and cost-effective medium than DNA.


In a paper published in Cell Reports Physical Science, they demonstrate their method by encoding and then decoding a passage from the novel Mansfield Park by Jane Austen. The quote reads: "If one scheme of happiness fails, human nature turns to another; if the first calculation is wrong, we make a second better: we find comfort somewhere."

In a statement, paper author Samuel Dahlhauser explained that "This particular passage was chosen because we felt it was uplifting in these trying times, and it is easily understood without the context in the book."

First, the text was converted to binary code, and this binary code was then converted into hexadecimal code. The team produced 16 distinct monomers to act as the building blocks of short polymer strings called oligourethanes. Using the Mol.E-coder software developed by the team, the hexadecimal code was then converted into 18 polymer strings, made up of a total of 176 monomers.
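The text-to-polymer pipeline described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's Mol.E-coder software: it assumes plain 8-bit ASCII (the team used a more compact conversion, so the monomer counts will not match theirs), it assumes one monomer per hexadecimal digit, and the monomer labels `M0` to `MF` and the `chain_length` parameter are hypothetical.

```python
# Hypothetical monomer labels, one per hexadecimal digit (not real chemical names).
HEX_DIGITS = "0123456789abcdef"
MONOMER_FOR_DIGIT = {d: f"M{d.upper()}" for d in HEX_DIGITS}
DIGIT_FOR_MONOMER = {m: d for d, m in MONOMER_FOR_DIGIT.items()}


def text_to_hex(text: str) -> str:
    """Text -> bytes (binary) -> hexadecimal digits, assuming plain ASCII."""
    return text.encode("ascii").hex()


def encode(text: str, chain_length: int = 10) -> list[list[str]]:
    """Map each hex digit to a monomer and split the result into short chains.

    chain_length is an assumed value chosen for illustration only.
    """
    monomers = [MONOMER_FOR_DIGIT[d] for d in text_to_hex(text)]
    return [monomers[i:i + chain_length]
            for i in range(0, len(monomers), chain_length)]


def decode(chains: list[list[str]]) -> str:
    """Reverse the mapping: monomers -> hex digits -> bytes -> text."""
    digits = "".join(DIGIT_FOR_MONOMER[m] for chain in chains for m in chain)
    return bytes.fromhex(digits).decode("ascii")


chains = encode("If one scheme")
print(len(chains), "chains; first chain:", chains[0])
print(decode(chains))
```

Under these assumptions every character costs two monomers, which is why a more compact conversion to binary matters for keeping the polymer strings short.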


A process called self-immolation was used to decode the sequence. Luckily, despite the name, this doesn’t involve anyone setting themselves on fire – instead, this process involves one monomer at a time being cleaved off the polymer. The difference in the molecular weight of the string before and after each cleavage allows the individual monomers – and eventually, the entire polymer sequence – to be identified.
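The idea of reading a sequence from successive weight measurements can be illustrated with a small Python sketch. Everything numeric here is an assumption made for the example: the four monomer names, their masses, the end-group mass, and the matching tolerance are hypothetical, and real measurements would be noisier than this simulation.

```python
# Hypothetical monomer masses in daltons; the article does not give real values.
MONOMER_MASS = {"A": 101.0, "B": 115.0, "C": 129.0, "D": 143.0}


def identify(delta: float, tolerance: float = 0.5) -> str:
    """Return the single monomer whose mass matches the measured drop."""
    matches = [name for name, mass in MONOMER_MASS.items()
               if abs(mass - delta) <= tolerance]
    if len(matches) != 1:
        raise ValueError(f"no unique monomer for mass difference {delta}")
    return matches[0]


def read_sequence(masses: list[float]) -> list[str]:
    """Each drop between consecutive measurements identifies one cleaved monomer."""
    return [identify(before - after)
            for before, after in zip(masses, masses[1:])]


def simulate_masses(sequence: list[str], end_group: float = 18.0) -> list[float]:
    """Simulate the string's mass before and after each cleavage (assumed end group)."""
    total = end_group + sum(MONOMER_MASS[m] for m in sequence)
    masses = [total]
    for monomer in sequence:
        total -= MONOMER_MASS[monomer]
        masses.append(total)
    return masses


measured = simulate_masses(list("CABD"))
print(measured)
print(read_sequence(measured))
```

The sketch only works because every monomer has a distinct mass, so each drop in weight points to exactly one building block.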

To test whether their system would be usable by someone without intimate knowledge of their coding system, the team tasked a colleague who was not involved in the project with decoding the quote. On their second attempt (the first having two mistakes due to human error), the passage was decoded fully.

This molecular method of storing data could be a big improvement on prior methods. The limited number of monomers that previous methods use (for example, the four bases of DNA) means that long strings of monomers must be used to represent relatively small amounts of information. The sixteen monomers of the current method, together with the team's more efficient way of converting the information into binary, mean that this method can be more information-dense and efficient.
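The density argument comes down to simple arithmetic: a monomer chosen from an alphabet of N options can carry at most log2(N) bits. The Python sketch below works this out for alphabets of four and sixteen monomers. It is an idealized upper bound that ignores error correction and chemical constraints, and the function names are hypothetical.

```python
import math


def bits_per_monomer(alphabet_size: int) -> float:
    """Upper bound on information per monomer for a given alphabet size."""
    return math.log2(alphabet_size)


def monomers_needed(n_bits: int, alphabet_size: int) -> int:
    """Fewest monomers that can represent n_bits at the idealized bound."""
    return math.ceil(n_bits / bits_per_monomer(alphabet_size))


for label, size in [("4-letter alphabet (e.g. DNA bases)", 4),
                    ("16-monomer alphabet", 16)]:
    print(label, "->", bits_per_monomer(size), "bits per monomer;",
          monomers_needed(8, size), "monomers per 8-bit character")
```

At this bound, a sixteen-monomer alphabet needs half as many monomers as a four-letter one to store the same data.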


