In the previous tutorial, we talked about the
ways in which biotechnology is revolutionizing the production of important materials like
plastics. Instead of having to perform chemical reactions in the traditional way, like we do
in a laboratory, we can refer to a catalog of enzymes, and select a series of enzymes that
catalyze a sequence of chemical reactions which will transform some readily available
starting material, often plant-based, into an industrial material of our choosing.
We can then encode those enzymes in a plasmid, insert the plasmid into an
appropriate microorganism, and allow it to express the genes that result in
the enzymes of interest. With these enzymes now present inside the microorganism, it will rapidly
generate mass quantities of the desired product, in a way that has more in common with brewing
beer than with traditional chemical manufacturing. As far as industrial processes go, it’s cheap,
it’s environmentally friendly, it’s everything we want in a new technology.
So now let’s get a little
bit more specific. Let’s examine the production of a particular compound that is currently being
optimized, as well as the challenges associated with this process and its optimization.
As we know, petroleum, or crude oil, is pumped from the ground and refined so as
to isolate individual organic components. Some of these components are used to
synthesize materials like plastics. One such component is methyl methacrylate, an
alpha-beta unsaturated methyl ester, abbreviated as MMA. This compound can undergo polymerization
to generate polymethyl methacrylate, or PMMA, which is a durable transparent plastic also
known by brand names like Plexiglas, Acrylite, and others, though we can generally refer to it as
acrylic plastic. This is a useful material in that it acts as a lightweight and shatter-resistant
alternative to glass, as well as an additive in acrylic paints.
However, this method of producing
PMMA is non-sustainable. Extraction of petroleum from the ground is harmful to the environment,
as indicated by the 1.8 billion metric tons of carbon dioxide emissions due to plastics
production in 2015. Also, oil in the ground will eventually run out, meaning this process can’t be
carried out indefinitely. So what is the solution? One solution is being developed by a
protein engineering company named Arzeda, and I reached out to project lead Aaron Korkegian,
who was gracious enough to fill me in on some of the details. As it happens, there is a compound
named alpha-methylene gamma-butyrolactone, or MBL, also known as Tulipalin A. It is named
as such because it is produced in tulips. This compound has properties that
are extremely similar to those of MMA. Like MMA, it can polymerize to form
PMBL, which is also a transparent plastic. But PMBL is more resistant to heat and scratching
than PMMA, which makes it more durable, so this is a more desirable material.
The only
problem is that it is not a major product of tulips. Growing enough tulips to harvest
and purify an industrially useful amount of Tulipalin A
is not even remotely practical. However, given our new technique utilizing enzymes, we
should be able to find a biological approach to synthesizing this compound ourselves. After all,
tulips use enzymes to make it, so why can’t we? Ok, so let’s get to work. As per the scheme
we outlined in the previous tutorial, we can start with some biologically-derived
sugars that are found in plants. Then we select a fermentation host, which is some microorganism
like yeast or a bacterial species. Then we have to engineer a metabolic pathway. We need some number
of enzyme-catalyzed reactions which will transform that sugar, or more likely some metabolite the
organism naturally generates from the sugar, into mass quantities of Tulipalin A, which we can
then purify and polymerize to make our plastics. As we mentioned, these enzymes do not need
to be native to this microorganism.
They can be any enzymes from any biological species,
which can be referred to as source organisms. In fact, they don’t even have to exist in nature
at all; completely novel enzymes can be designed. They just have to do the chemistry we are
looking for, because we can insert the genes that encode these enzymes into the host organism
via a plasmid, and it will make those enzymes, thereby promoting the relevant chemistry.
Now of course, this task should not be portrayed as trivial.
In fact, it represents the bulk of
the challenge. What is the sequence of enzymes that will be successful? Finding the answer to
this question would have been near-impossible just a few decades ago. But thanks to advancements in
computer science, many new avenues have opened up for solving such a problem. Recent developments
in computation have allowed us to take massive enzyme databases and combine them with efficient
algorithms which can search and identify potential enzymatic pathways from any starting material to
any potential product.
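To make the idea of a pathway search concrete, here is a minimal sketch of how such an algorithm might work. It treats each enzyme as a link from a substrate to a product and uses a breadth-first search to find the route with the fewest steps. The catalog, the enzyme names, and the compound names are all made up for illustration; a real search would run over a far larger database and would also have to weigh the chemistry of each step.

```python
from collections import deque

# Hypothetical toy catalog: (enzyme name, substrate, product).
# None of these names are real database entries.
CATALOG = [
    ("enzyme_A", "glucose", "metabolite_1"),
    ("enzyme_B", "metabolite_1", "metabolite_2"),
    ("enzyme_C", "metabolite_2", "target"),
    ("enzyme_D", "metabolite_1", "target"),
    ("enzyme_E", "glucose", "side_product"),
]


def find_shortest_pathway(catalog, start, target):
    """Breadth-first search for the pathway with the fewest enzymatic steps.

    Returns a list of (enzyme, substrate, product) steps, or None if the
    target cannot be reached from the starting material.
    """
    # Index the catalog so we can look up every reaction a compound can enter.
    by_substrate = {}
    for enzyme, substrate, product in catalog:
        by_substrate.setdefault(substrate, []).append((enzyme, product))

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        compound, path = queue.popleft()
        if compound == target:
            return path
        for enzyme, product in by_substrate.get(compound, []):
            if product not in visited:
                visited.add(product)
                queue.append((product, path + [(enzyme, compound, product)]))
    return None


pathway = find_shortest_pathway(CATALOG, "glucose", "target")
for enzyme, substrate, product in pathway:
    print(f"{substrate} --[{enzyme}]--> {product}")
```

In this toy catalog there are two routes from the sugar to the target, and the search returns the two-step route rather than the three-step one.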
It should be made clear that although enzymes tend to have a highly
specific substrate within biological systems, when removed from their typical role in
biosynthesis, they can technically operate on a variety of molecules. Anything that fits into the
active site and promotes the achievement of the transition state for the enzymatic reaction will
be a suitable substrate. This means that slight structural variants of the natural substrate may
be totally viable. For example, say there is an enzyme class that is known to operate on phenol.
It may be the case that some enzymes within that class work just as well on 2-methylphenol, as it
is possible that when inserting into the active site, this additional methyl group can point
outwards and not interfere with the chemistry. By testing a wide variety of enzymes within
that class, we may find some that function on 2-methylphenol, even though they are
only documented as working on phenol. This adds an element of testing for the
researcher: if we are set on this compound as an intermediate in our pathway, we may need to
screen a multitude of enzymes that are known to operate on phenol and see how they work with this
slightly altered substrate.
For some of them the methyl group will clash in the active site and
it won’t work. For others there will be plenty of space, and the reaction will proceed without a
problem. Such an enzyme can be described as being “promiscuous” between the substrates.
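One way to picture how candidates for such a screen might be shortlisted is to score how closely each enzyme's documented substrate resembles the molecule we actually have. The sketch below uses a Tanimoto (Jaccard) similarity over hand-written feature sets. The feature sets, the enzyme names, and the 0.5 threshold are all assumptions made for illustration, not data from any real catalog, and a shortlist like this only nominates enzymes for testing; it does not predict which ones will work.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) similarity: shared features divided by total distinct features."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hand-written, illustrative feature sets (not computed from real structures).
FEATURES = {
    "phenol": {"aromatic_ring", "hydroxyl"},
    "2-methylphenol": {"aromatic_ring", "hydroxyl", "ortho_methyl"},
    "cyclohexanol": {"aliphatic_ring", "hydroxyl"},
}

# Hypothetical enzymes mapped to the substrate they are documented to act on.
DOCUMENTED = {
    "enzyme_1": "phenol",
    "enzyme_2": "phenol",
    "enzyme_3": "cyclohexanol",
}


def shortlist(query, documented, features, threshold=0.5):
    """Return enzymes whose documented substrate resembles the query, most similar first."""
    scored = [
        (tanimoto(features[query], features[substrate]), enzyme)
        for enzyme, substrate in documented.items()
    ]
    return [enzyme for score, enzyme in sorted(scored, reverse=True) if score >= threshold]


candidates = shortlist("2-methylphenol", DOCUMENTED, FEATURES)
print(candidates)
```

Here the two phenol enzymes make the shortlist for 2-methylphenol, while the cyclohexanol enzyme falls below the threshold.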
Now imagine the multitude of potential intermediates, and the ways in which they could
resemble one known enzymatic substrate or another. The reality of the matter is that it will almost
never be the case that each intermediate in some desired pathway lines up flawlessly with the
precise structures that are documented as known substrates for enzymes in a catalog.
We just need them to be close enough that the chemistry may work the same way. So
when building the pathway, for each step, it may be necessary to select hundreds or even
thousands of different enzymes that each carry out the transformation you are looking for, but
on a substrate that is similar to what you’ve got. Again, say we have 2-methylphenol.
Enzymes that
are documented to promote the desired reaction on phenol could be tested, as could any number of
other untested proteins whose sequences closely resemble these known ones. Or perhaps activity on
2-methylphenol is also documented for a particular enzyme, but the activity is poor. Then perhaps
the structure of the enzyme can be modified, again driven by computational design. We can
model the active site and predict where that additional methyl group must be sitting,
such that it interferes with the activity. Perhaps we can see that there is a bit of
space for the methyl group, but one amino acid in particular seems to be getting in the
way, perhaps due to a bulky side chain.
We can swap that residue out for a different one with a
smaller side chain, and this single modification may result in a hundredfold improvement in activity.
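At the level of the sequence, a swap like this is just a one-letter change. Here is a small sketch using the common shorthand for point mutations, where "F5A" means the phenylalanine (F) at position 5 is replaced by alanine (A). The eight-residue sequence is made up, and the function name is our own; nothing here predicts whether the change actually helps, which still has to be modeled and tested.

```python
def apply_point_mutation(sequence, mutation):
    """Apply a mutation such as 'F5A' (wild-type residue, 1-based position, new residue)."""
    wild_type, position, new_residue = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = position - 1
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    # Guard against numbering mistakes: the stated wild-type residue must match.
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + new_residue + sequence[index + 1:]


# Made-up peptide; position 5 holds a bulky phenylalanine (F).
wild_type_seq = "MKTLFGSA"
variant_seq = apply_point_mutation(wild_type_seq, "F5A")
print(wild_type_seq, "->", variant_seq)
```

The same function can be applied repeatedly to build the more aggressive multi-residue designs, one substitution at a time.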
In fact, such modifications may result in the enzyme preferring the novel substrate over
the native one. And it doesn’t have to be just one residue. Our developing understanding of
protein structure and function relationships also allows for more aggressive designs that
change twenty to thirty residues at once, thereby significantly influencing enzyme
activity, selectivity, and expression. So as you can see, designing the pathway
is a mountain of work.
The possibilities are staggering, given how many enzymes there
are, and the wide range of substrates they could potentially act upon, particularly once
enzyme modification is taken into account. But one way or another, the pathway is
planned with specific reactions in mind, like for example, the reduction of an aldehyde to
an alcohol. For each step, some number of enzymes which perform that reaction on substrates similar
to yours are screened meticulously, and options that work for each step are either identified or
designed. This process continues until the whole pathway is covered. Pathways with fewer steps
are preferred, and once a pathway is selected, each enzyme is tested individually to ensure that
it can perform the transformation intended for it. To confirm their activity, the enzymes first
have to be generated. This entails identifying DNA sequences that encode each enzyme, and then
optimizing them for expression within the host organism. There are many ways we can do this,
and one example involves codon-optimizing for the intended host organism, like perhaps E. coli.
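As a rough sketch of what codon optimization can look like in practice, the snippet below rewrites a short coding sequence so that every codon is replaced by the synonymous codon the host uses most often, and then confirms that the encoded protein is unchanged. The codon assignments follow the standard genetic code, but the usage frequencies are invented for an imaginary host and only a handful of amino acids are covered. Real tools weigh other factors as well, so this should be read as the simplest possible version of the idea.

```python
# Hypothetical relative codon usage for an imaginary host (values are made up).
# The codon-to-amino-acid assignments follow the standard genetic code.
HOST_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.75, "AAG": 0.25},
    "F": {"TTT": 0.40, "TTC": 0.60},
    "G": {"GGT": 0.35, "GGC": 0.40, "GGA": 0.10, "GGG": 0.15},
    "*": {"TAA": 0.60, "TAG": 0.10, "TGA": 0.30},  # stop codons
}

# Reverse lookup: codon -> amino acid.
CODON_TO_AA = {codon: aa for aa, codons in HOST_USAGE.items() for codon in codons}


def translate(dna):
    """Translate a coding sequence codon by codon (toy table only)."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def codon_optimize(dna, usage=HOST_USAGE):
    """Swap every codon for the synonymous codon the host uses most often."""
    optimized = []
    for i in range(0, len(dna), 3):
        aa = CODON_TO_AA[dna[i:i + 3]]
        optimized.append(max(usage[aa], key=usage[aa].get))
    return "".join(optimized)


original = "ATGAAGTTTGGATAG"  # M K F G stop, written with less-used codons
optimized = codon_optimize(original)
print(original, "->", optimized)
print(translate(original) == translate(optimized))  # the protein is unchanged
```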
What this means is that since multiple codons can code for the same amino acid residue, as we recall
from learning about transcription and translation, there is significant variability that is possible
in the DNA sequence without altering the resulting protein. But certain sequences may be preferential
over others in a particular host organism for subtle reasons, such as the ratios of specific
tRNA production in that host. Essentially, in a particular organism, some codons are
used frequently, and some very infrequently. So if we introduce a gene that contains many
instances of the codons that are almost never used, the necessary tRNA molecules may not be
available and gene expression becomes difficult. Once the sequence is finalized, each enzyme can
then be individually expressed and purified, and the desired reaction is tested in vitro to
confirm that the intended chemistry is occurring. Thousands of different variants of a given
enzyme or enzyme expression system can be screened all at once in massive screening sets
so that quantitative data on their ability to promote the desired reaction can be gathered
quickly and efficiently. Once enzymes have been identified that can conduct the desired series
of reactions, their properties can be further optimized by computational design.
This entails
using computational models to predict how we might modify an enzyme so as to improve reaction
rates, stability at a certain temperature, binding affinity, or selectivity. In
other words, we can use our growing knowledge of the relationship between enzyme
structure and function to tweak the enzymes so as to improve their activity on the desired
substrate, and therefore their ability to generate the desired product. These novel designs can again
be converted into corresponding DNA sequences for expression and testing, and this process can
be repeated, with further refinement each time. Once the activity of the complete enzymatic
pathway is confirmed in vitro, the genes for its enzymes can be assembled together on an operon, and transformed
into the manufacturing host for further testing. What this means is that because the genes
are part of the same operon, all the enzymes will be expressed by the microorganism at
once, in a single act of transcription. After the initial test, further refinement and
iteration will typically be required to tune the expression of each enzyme in the context
of the overall efficiency of the pathway. This could involve modifying the individual
ribosome binding site sequences upstream of each enzyme’s coding sequence in order to adjust the translation
initiation rate separately for each enzyme, so as to modify the relative rates of enzyme production.
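To see why the relative rates matter, consider a deliberately simple toy model, which is our own assumption and not a description of how any particular team tunes a pathway. Suppose each enzyme's level is proportional to its translation initiation rate, each step's capacity is that level multiplied by the enzyme's turnover number, and the slowest step caps the whole pathway. With a fixed total amount of translation to spend, we can then ask how to divide it so that no single step is the bottleneck. All names and numbers below are hypothetical.

```python
def step_capacities(initiation_rates, turnover):
    """Capacity of each step, modeled as (relative enzyme level) x (turnover number)."""
    return {name: initiation_rates[name] * turnover[name] for name in turnover}


def bottleneck(capacities):
    """Return the name and capacity of the slowest step, which caps the pathway in this model."""
    name = min(capacities, key=capacities.get)
    return name, capacities[name]


def rebalance(turnover, total_rate):
    """Split a fixed total initiation rate so that every step ends up with equal capacity."""
    # A slower enzyme (smaller turnover) needs proportionally more copies.
    inverse = {name: 1.0 / k for name, k in turnover.items()}
    scale = total_rate / sum(inverse.values())
    return {name: scale * inv for name, inv in inverse.items()}


# Hypothetical turnover numbers (arbitrary units) for a three-enzyme pathway.
turnover = {"enzyme_1": 10.0, "enzyme_2": 2.0, "enzyme_3": 5.0}
equal_rates = {"enzyme_1": 1.0, "enzyme_2": 1.0, "enzyme_3": 1.0}

before = bottleneck(step_capacities(equal_rates, turnover))
tuned_rates = rebalance(turnover, total_rate=3.0)
after = bottleneck(step_capacities(tuned_rates, turnover))
print("before tuning:", before)
print("after tuning: ", after)
```

In this model, shifting translation toward the slowest enzyme raises the capacity of the limiting step from 2.0 to about 3.75 without increasing the total amount of protein the cell has to make. Real pathways are far messier, which is why this tuning is done by iteration and testing.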
Additional work may also be required to engineer the host organism to better push carbon flow into
the desired product and away from other existing metabolic processes to increase the yield of the
target.
Once the numbers look good, scale-up can proceed in stages with further refinement,
until in the long run, the large-scale process looks pretty much like a vat of beer brewing,
and is essentially just as simple to execute. Though this technology is in its budding
stages, it promises to completely change the way entire industries operate. Whether aiming for a
target molecule like Tulipalin A or any other, the better these designer organisms become,
the closer we can get to supplanting current manufacturing techniques, thereby leading to
more diverse products that can be produced not only cheaply and efficiently, but from
simple plant-based starting materials which are grown from nothing but sunlight, water, and
air.
Any reaction can be run in the same vat with precisely the same materials; all that
changes is the specific engineered microorganism that is employed. This means we don’t need
a variety of expensive industrial machinery, and there are no significant reagent expenses, nor any
reagent waste. Scaling up in this case just means buying a bigger vat. And again, the
whole process is carbon-capturing, or at the very least carbon-neutral, rather than
carbon-releasing.
It’s as clean and sustainable as manufacturing can get, and with companies like
Arzeda leading the way, it’s only a matter of time before we begin to see the societal
ramifications of these daring innovations.