Hi, my name is Jim Wells. I'm with the departments of Pharmaceutical Chemistry and Cellular and Molecular Pharmacology at UCSF. And I'm going to tell you today about the process of drug discovery and development in two parts. Part one will be screening of compounds, and in that regard I'll be joined by my colleague Michelle Arkin. Hi, I'm going to talk after Jim gives an early history of drug discovery and talks a little bit about target identification, then I'll talk about the process of screening and hit identification. Great, see you in a bit. This slide shows some of the products of modern drug discovery, such as Lipitor, which is used for cholesterol lowering, or a more recent drug, Gleevec, which is an anti-cancer drug.
These compounds were discovered through a very rational, systematic process that involves a lot of exciting scientific discoveries as well as a lot of serendipity, luck and hard work. To understand how we found these compounds, and the modern drug discovery and development process today, it's useful to briefly review the history of drug discovery, as I'll show you on the following slide. Prior to 1900, most drugs, in fact only a few really, were identified through human screening. Natural products, for instance aspirin, were discovered from tree bark. Quinine was discovered. And even illicit drugs like cocaine were discovered. Along about the turn of the century, in 1906, the Food and Drug Administration was established because a number of these kinds of potions or elixirs were found to be neither safe nor efficacious and it was necessary to regulate these in a systematic way. And this led to the development of animal-based screening, for example, to discover anesthetics; bacterial screening to identify antibiotics and the like; tissue screening to identify compounds that could react with neurological receptors, like GPCRs; and HTS, high throughput screening, now a very common discovery technology, as you'll hear a lot more about in this talk, for discovering target-based compounds.
And then, lastly, mechanism-based discovery, which was used for HIV drugs and the like, as well as molecular and cellular based screening for kinase inhibitors. And finally, genomics, to actually profile patients to determine who will be affected and who won't be affected. So, in fact, this process, the history of drug discovery, has gone from the human to the molecular target, and this, now in reverse, reflects what we actually do today. Shown on this slide is, in general terms, the modern drug discovery process. And this process starts off with a disease, and from that disease one tries, through a lot of biochemical, cell-based, genetic and other means, to identify what is the target or the molecular species in a cell, in an organism, that's causing that disease. One then develops a drug to that target, as you can see here, and then, having identified that target, one then needs to identify a compound that will interact with that target in a phase called lead discovery. This is a chemical process where we identify the first compounds that are actually important in modifying a disease, and then once compounds are identified, typically in cells and then in animals, they're prepared for clinical trials in this process called drug development. This phase is really about interfacing the compounds that we discovered here to the human biology that we wish to affect here.
And if successful, we'll come out of this process with a drug. Now, this process is a long-winded process. It now typically takes about 10-15 years to discover a drug, and it's expensive too. It's about half a billion to a billion dollars to develop a drug. So, when you're thinking about the pills that you take in a bottle, think about a shopping mall, because that's easily the cost that it takes to get to the drugs that we end up using.
Ok, I wanted to just review quickly what kind of classes, what kind of molecules constitute drugs. There's actually three basic classes and they include the small molecule, organic compounds, typically, these are compounds whose molecular weight is less than five hundred and they're taken, generally, orally, although they can also be taken as an injectable. And they represent the kind of classic drug that you think of when you go to a pharmacy, that you would buy over the counter, for example.
There's another class of very important drugs known as the protein therapeutic drugs. These are typically injectable drugs, molecular weights of over ten thousand, often up to a hundred thousand, or even higher. And they are the important class of biotherapeutics and they represent about thirty percent of drug sales today. The other class of drugs, actually one of the very first to be developed are the vaccines. And these are basically viruses, pieces of viruses, that are used to elicit an immune response to a disease. So these are the basic categories and today I'm going to focus on small molecule drug discovery, leaving these other two categories alone for another talk. So, the process of small molecule discovery is a long and winding road. And it starts off with identifying what is the most critical target that's involved in mediating the disease. So, identifying the disease target. Having identified that target, generally a protein target, one then goes through a process known as hit identification, shown here, and the role of hit identification is to get the first compounds that actually engage the target.
Which compounds actually bind to the protein of interest and can begin our drug discovery process. From there, taking that isolated protein in a test tube, we need to show that that compound actually works in a cell. And so, this begins this process called hit to lead which is to generate a compound which has cellular potency. The next stage, sort of drawing from there, to a larger scale, is the lead optimization stage. This is a critical stage in which one actually shows that these compounds that have been generated have animal efficacy and actually work in a pharmacological model for the disease. The next stage after that is the IND enabling stage, this is the stage that is preparing compounds for clinical trials.
Primarily, it involves animal tox experiments and, in addition, chemical synthesis, scale, and formulation experiments, and at the end of this process, one would hope to have a package with which you could convince the Food and Drug Administration that you have a compound that is going to be both safe and efficacious when administered to humans. Then, if the FDA agrees with you, begin the all-important human clinical trials. The first trial is for human safety. This is typically done as a dose escalation kind of trial with healthy volunteers, although in certain disease settings, like cancer, you can use people with cancer. And the goal of this is really to find out what is the circulatory lifetime of the drug in humans and how safe the drug is as its dose is increased. The next phase, phase two, is involved in determining the efficacy of the drug in a disease setting. So, this would be taking patients with the disease, treating them with your drug at a level that's below any toxicity that was observed in phase one, and ranging the doses to find out what is the efficacy of the drug as a function of its dose and what's the best dose to affect the disease.
So, from these small trials, one then moves into a much larger trial, what's called a registration trial, phase three, in which one fixes the dose, fixes the disease, fixes the formulation, and then treats a large number of cohorts, both with and without the drug, to determine how effective the drug is. And at the end of this time, if your drug is safe and efficacious, you'll submit what's known as a new drug application, an NDA, to the FDA. They will review it and either agree with you or not that you have a drug that's ready to go into humans, and at the end of that process, you have this pill down here, which will then be launched with great fanfare, because this process is, as you'll see, a very long and arduous one. Ok, so I'm going to start at the beginning here with target ID. What's causing the disease? What is the actual molecular target that we want to go after? What is its link to the disease of interest? And it can be a very long process to find out what's causing the disease.
Many diseases we don't have a clue as to what their cause is. And in fact, ironically, even with all the tests that we might do to validate a target, the final validation of a target is not known until you get down here with the pill itself, to see if that is actually effective in a human. Ok, so just briefly, what are the general causes of disease, what are the things that we think about. First thing is, I like to think is it a bug or is it in the body? Is it an infectious agent or is it a host imbalance? So, for instance, if it's an infectious agent, that's causing the disease, generally these days, we have sequences of the pathogenic bacteria.
We'll find a target that's not in humans and then we'll take that protein target and go after that in the drug discovery process. If however, it's a disease like host imbalance, maybe it's a metabolic disease or cardiovascular disease or cancer you first have to decide is it due, is the disease caused by an underactive protein, for instance, people with diabetes, they're not as responsive to insulin and so by giving them back insulin you can hope to modify and ameliorate that condition.
Other diseases, for instance, here, many cancers, are caused by overactive proteins such as kinases, and so there's a lot of interest in discovering drugs that would inhibit specific kinases for cancer. Ok, that is just a very skimmed view of this process, but just to give you a sense. Identifying the target can actually be a very complex process. So, the human genome is vast; there are some twenty thousand genes that code for proteins. And finding exactly which one is causing the disease can be challenging. So, once you've come up with the protein, and the gene that encodes it, for that particular disease, you're ready to go on to another very important consideration. Which is that not only do you need to go after the biology of the target, the target itself, which is causing the disease, but that target itself has to be amenable to small molecule discovery.
And by that I mean it has to be something that we think could bind a small molecule, because that in the end is ultimately what we want to do. What I show here is a recent drug target known as the BCR-Abl protein, shown in this space-filling view here, and in it is the small molecule known as Gleevec, which was found to bind to it. It was thought that this would be a good drug target because it was known that kinases such as this bind ATP, and ATP targets have pockets in them for which we can find small molecule surrogates. And indeed, they did find them for Gleevec. Another property to look at is whether the site has a cavity or a hot-spot, an energetic region in the molecule that can bind a small molecule. So these kinds of considerations sort of define the druggability of the target. So, first you have to determine the biology of it, and its link to a disease, and next the druggability of the target.
Based on whether or not we think we can find compounds that will engage the target. Ok, now certain targets really like to bind small molecules. And in fact, one of the favored classes of disease targets is the GPCRs, G protein-coupled receptors. They naturally bind small molecules and they've traditionally been great targets to go after with small molecules. And in fact about forty five percent of the approved drugs target GPCRs. Other targets which are of great interest in terms of the biology are kinases, which I had mentioned, proteases, and protein-protein targets. These targets can bind small molecules, but in general, they bind them more weakly than GPCRs do, and that probably accounts for why we see so much more activity in finding GPCR inhibitors than others. We then want to talk about what it is that we're looking for at the end. What is it we hope this drug will be? Well, first of all, for most oral drugs, we'd love them to be just a single, daily dose.
That you only have to take it once, and typically, that would mean something less than a hundred milligrams of the drug that you would take. Just for example, here's a pill bottle of Ibuprofen; in here are tablets which are in total about two hundred milligrams or so, and less than half of that is the drug itself because the rest is the formulation for the drug. Now, in order for that to be the case, in order for us to be at a drug dose of a hundred milligrams per day, there are certain molecular properties that a compound's going to have to show. And one of them is it's going to have to bind to the target with a high affinity and selectivity, so that it ideally binds just one target, and does so with a great deal of potency, so that you don't need much compound to trigger that. We can measure that in this process over here, where we show direct binding of a compound to a protein. So, what we can do is we can titrate the compound in, increasing the concentration of the compound from left to right, and measure the binding of the compound to the target. In this case we can then measure at what concentration we get fifty percent binding; that's called the Kd, and in this case it's ten nanomolar.
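The titration just described can be illustrated with a short calculation. This is a minimal sketch, assuming the simplest one-site binding model with the compound in large excess over the protein (an assumption not stated in the talk); the function name and the concentration series are made up for illustration, and only the ten nanomolar Kd comes from the talk.

```python
def fraction_bound(ligand_nM: float, kd_nM: float) -> float:
    """Fraction of target occupied, assuming simple one-site binding
    and free compound concentration ~ total compound concentration."""
    return ligand_nM / (ligand_nM + kd_nM)


KD_NM = 10.0  # Kd quoted in the talk

# Hypothetical titration series in ten-fold steps (nM).
concentrations_nM = [0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0]
occupancy = {c: fraction_bound(c, KD_NM) for c in concentrations_nM}

for conc, frac in occupancy.items():
    print(f"{conc:>8.1f} nM -> {frac:.1%} of target bound")
```

Under this model the compound concentration that gives fifty percent binding is, by definition, the Kd, and ten-fold above the Kd the target is a little over ninety percent occupied.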
That's a nice binding compound. We would like our compounds to be that potent or more. In addition, we can measure the potency of the compound in a cell, because that's obviously, going to have to bind to the protein in the cell. And in order to do that, we would like potency of this in a cell based assay to be at least a hundred nanomolar for the midpoint here in binding to the cell. If it meets those criteria, then it has the potency potential to be at a once daily, less than a hundred milligrams dose. But there's some other very important considerations too. And we'll get to that in the lead optimization part of this talk. Which is the half-life of the compound in the body. So, we would like the compound to not be cleared too rapidly, the body wants to clear small molecule compounds, does so through the liver and the kidney, and in other means.
And we would like that compound to have a half-life in blood of greater than about three hours in order to have an effectiveness over twenty four hours for the drug. Also, we do want the drug, once we take it, to be orally active, so we would want the oral uptake to be at least fifty percent of the drug ingested. Ok, well with these considerations in mind, let me just go also into the chemical considerations of the drug that we want. So those are some of the biological considerations; what are some of the chemical considerations? And here we have a list of four guidelines that were provided by Chris Lipinski and his colleagues at Pfizer, who studied a whole variety of orally active drugs and identified several properties that are important for making good, potent and orally active drugs. So, for example, one of the properties of most orally active drugs is that they have a molecular weight less than five hundred Daltons. So, we would look to be building compounds that are less than that. They have good solubility; things that are not very soluble don't dissolve so well, so they don't go through the gut.
And so we'd like to have this parameter called cLog P less than 5. Looking at the structure of the drug too, they found that drugs that had fewer than ten hydrogen bond acceptors, such as these moieties here, and here, were better at being orally active drugs, as were compounds that had fewer than 5 hydrogen bond donors, such as these species here. And so, these are just general guidelines, they're not absolute; there are many drugs that actually break these rules, and more recently, people are realizing that we can finesse these rules, finesse compounds so that they can actually exceed these guidelines. But they do serve as useful guidelines in thinking about it. Ok, the next consideration that one needs to make is chemical space; that is, we now have to move on to finding compounds that will actually engage our target. The chemome, as it were, meaning compounds of less than five hundred molecular weight, turns out to be a huge chemical space. If we were to calculate, as has been done, all the compounds that can be made with a molecular weight of up to five hundred from carbon, hydrogen, nitrogen and oxygen, typical constituents in an organic drug, one could build compounds that would be about ten to the sixty in diversity.
Now, this number ten to the sixty two is greater than the number of particles in the universe, so we're not going to be able to build all these things. And so what we want is methods where we can actually search chemical space effectively to find the area where we want the drug to be. Generally, it turns out that most targets will yield hits when we screen on the order of a thousand or a few thousand compounds; we'll find a handful of hits from that process. So the more compounds you screen, the better the likelihood is of getting hits to your target. And there's no way that we're going to screen all the compounds that could be made out there because there's just far too much chemical diversity for us to sample everything. In this regard, bringing a small molecule to a big molecule involves a lot of the considerations that I talked about and also involves a lot of faith, the same kind of faith that Michelangelo had when drawing this picture.
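The four Lipinski guidelines mentioned a moment ago lend themselves to a simple check. The following is a minimal sketch, not a validated filter: the `Compound` class, the function name and the two example compounds are hypothetical, with made-up property values, and treating a value exactly at a limit as acceptable is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Compound:
    name: str
    mol_weight: float       # Daltons
    clogp: float            # calculated log P
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(c: Compound) -> List[str]:
    """Return the guidelines a compound exceeds (empty list means none).
    Values exactly at a limit are treated as passing (an assumption)."""
    violations = []
    if c.mol_weight > 500:
        violations.append("molecular weight above 500")
    if c.clogp > 5:
        violations.append("cLogP above 5")
    if c.h_bond_donors > 5:
        violations.append("more than 5 hydrogen bond donors")
    if c.h_bond_acceptors > 10:
        violations.append("more than 10 hydrogen bond acceptors")
    return violations


# Made-up compounds, for illustration only.
small = Compound("example-small", 350.0, 2.5, 2, 5)
large = Compound("example-large", 820.0, 6.1, 7, 14)

for cpd in (small, large):
    print(cpd.name, lipinski_violations(cpd) or "no violations")
```

As the talk notes, these are guidelines rather than hard rules, so a check like this is better used to flag compounds for a closer look than to discard them outright.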
And I'd now like to turn it over to my colleague, Michelle Arkin, who's going to tell you about how we actually make this happen in the laboratory. Alright, so today we're going to talk about hit identification, or as I say, getting on the board. So, getting a small molecule that has some of the initial properties that you want in your final drug. And as you see, it's very early in the process. And so having a good starting point here with good molecular properties and good selectivity is really going to help as you winnow down based on other properties that Jim will talk about in the second part of his lecture. So, how are we going to identify that initial chemical starting point? One way is to start with a natural substrate and make it drug-like. If you know the substrate, for example, the ATP-like analogues that Jim gave as an example earlier, you can make these more drug-like. That's a sensible place to start. You can start with somebody else's starting point. We call this, jokingly, patent busting, but it's also a very interesting way of getting a late stage compound.
If you know the liabilities of your competitors or your own compounds, then you can improve those liabilities and really get a better molecule right off the bat. You can also design a hit from scratch, de novo, using structure based design if you really know a lot about your binding site and really know a lot about that protein's structure and this is difficult but it is an area of a lot of improvement and a lot of research. And finally, probably the most widely used method nowadays is screening. HTS stands for high throughput screening, a more technical, and a little bit new approach is called fragment based screening, but we're going to focus today on screening by high throughput screening.
And there are two reasons to do this; one is it's very generally applicable, widely used in the industry, and also more and more people in academics are using this technique as well, to find molecules, not only as starting points for drugs, but also as tools to investigate their biology. So you can think, yourself, your own biology, would it be helped by having a small molecule that specifically interrogates that biology and high throughput screening is one approach for getting those tool compounds.
So, especially when you're starting with high throughput screening, you have to screen a lot of molecules to find a drug. So, we're starting here, a high throughput screen can be anywhere from a hundred thousand to a million compounds. Maybe even more than a million compounds. And from that you're going to have some hit rate, so most of those molecules will not be active against your biology. But then a large subset will, say five hundred, to five thousand. Then we have other metrics that we're going to use to winnow those compounds down, based on potency, other in vitro properties, to a hundred to five hundred and you can see that the process keeps going down, here you can see that we're doing toxicology and safety studies, and now through clinical trials, we have attrition all the way until we get to that final molecule.
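The numbers quoted for the top of the funnel imply a range of hit rates. This small sketch simply divides the hit counts by the library sizes mentioned above; the function name is made up for illustration, and pairing every hit count with every library size is an assumption about how the ranges combine.

```python
def hit_rate(n_hits: int, n_screened: int) -> float:
    """Fraction of screened compounds that score as hits."""
    if n_screened <= 0:
        raise ValueError("n_screened must be positive")
    return n_hits / n_screened


library_sizes = (100_000, 1_000_000)  # compounds screened, as quoted
hit_counts = (500, 5_000)             # initial hits, as quoted

for n_screened in library_sizes:
    for n_hits in hit_counts:
        rate = hit_rate(n_hits, n_screened)
        print(f"{n_hits:>5} hits / {n_screened:>9,} screened = {rate:.2%}")
```

Even at the upper end of these combinations, the large majority of the library is inactive, which is why the later winnowing steps work with hundreds of compounds rather than hundreds of thousands.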
So, we're going to be testing a lot of compounds, we're going to be synthesizing a lot of compounds. So, we need to have robust ways of doing that, precisely and with high throughput. So, what do these molecules look like? We start with libraries of drug-like molecules, and they can come from several places. Three common places that these libraries can be assembled from include natural products, so compounds that come out of extracts or now partially purified compounds from fungi and other species. Then there is structure-based design, so if we know something about the molecular targets that we're looking at, for example, GPCRs are a very common target, as Jim mentioned, and there are a lot of libraries that have compounds that look like other GPCR agonists or antagonists. So, you again have a designed library based on particular biological and structural classes. And then finally, there are diversity libraries. These are just selections of molecules selected to be as different from each other as possible while still trying to fall within these drug-like properties, like Lipinski's rules, as Jim mentioned, or other drug-like parameters that you might consider. And this is a schematic example of what those drugs would look like.
The straight lines are scaffolds, so you'll have different shapes to the molecules, and then the different polygons around those are different functional groups. So, you'll have an array of different shapes, different sizes, and different functionalities to come to about half a million compounds plus or minus. You're then going to screen those compounds in some assay that is amenable to screening half a million compounds, and then at the end, hope that you get some number of molecules that bind to the target in the way that you want. So, how do we handle this many compounds? There are two general ways to think about handling them, one is miniaturization and the other is automation.
So, in miniaturization, here we have a standard format plate, and each of these wells is another experiment. Like a test tube or an Eppendorf tube, but they're arrayed here in 96, 96 times 4, 384 well format, and even smaller, 4 times smaller than that, fifteen thirty six well format. So, here we can do one thousand five hundred and thirty six experiments on a single plate. And we'll go into the lab in a minute and I'll show you what these plates look like. And then there's automation, so we have automation, that allows us to prosecute many examples of these plates, one right after the other, we have parallel pipetters, so that we can add reagents with high reproducibility to each of these experiments. Robotics to handle those plates and those pipettes, and then plate readers that are able to read all of these experiments in a high-throughput fashion. So, you can hear now and see now that we're in the laboratory, can you hear the hum of activity? We were just talking about miniaturization into small format plates and automation, so I thought we would show you where we keep our libraries and then show you some automation.
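To get a feel for what miniaturization buys, one can count the plates needed for a library of about half a million compounds in each format. This is a rough sketch, assuming one compound per well at a single concentration; the `control_wells` parameter and the function name are hypothetical additions for illustration.

```python
# Wells per plate for the three formats mentioned in the talk.
PLATE_FORMATS = {"96-well": 96, "384-well": 384, "1536-well": 1536}


def plates_needed(n_compounds: int, wells_per_plate: int,
                  control_wells: int = 0) -> int:
    """Plates required if each compound occupies one well.
    control_wells (hypothetical) reserves wells per plate for controls."""
    usable = wells_per_plate - control_wells
    if usable <= 0:
        raise ValueError("no usable wells left on the plate")
    return -(-n_compounds // usable)  # integer ceiling division


LIBRARY_SIZE = 500_000  # "about half a million compounds"

for label, wells in PLATE_FORMATS.items():
    print(f"{label:>9}: {plates_needed(LIBRARY_SIZE, wells):>5} plates")
```

Each four-fold increase in well density cuts the plate count roughly four-fold, which is the practical argument for moving from 96 to 384 to 1536 wells.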
We have several of these freezers and in each freezer we have five shelves, and in each shelf we have large numbers of racks of compounds. Here you can see they're stored in 96- or 384- well format and they're barcoded so that we can directly go back to our database and find out what's in those wells. So that we can identify the structures of the compounds and we know what to do next. Ok, now let's go look at some automation. Now we're in the automation room, and I wanted to show you what the plates look like.
At the small molecule discovery center, we work mostly in 96-well format and 384-well format and different color plates depending on the assay. Here's a 96-well plate used for luminescence assay, you want to have maximum scatter, to get all of your photons from the luminescence. Then we have black plates, here's a 384-well format, the black bottom plate, this is used for fluorescence, which we measure from the top. And then here's a 384-black sided plate with a clear bottom, and this we use for doing high content imaging for microscopy.
Here we have the liquid handler, so these liquid handlers are set up to do automatic pipetting, when you have a lot of plates to work with. We have an incubated plate hotel so your cells can stay warm and cozy, in their plates while they're waiting to be processed. And then over in the back, you probably can't see it here, but in the back, we have a large plate hotel that holds tip boxes, so pipette tips, and also plates for processing. So, now we're going to go through one of these routines, so you can see how the pipetter works. Now we're watching an automation run, where we're going to be adding media to a plate. That will eventually contain cells.
So you see the 384-well plate is coming out of an incubated plate hotel; this is where the cells can sit while they're being processed, so that they can stay warm and cozy. You'll notice we're going from a 96-well plate into a 384-well plate. So we're using a 96-tip head, and the pipetter is going to pull media from the 96-well plate and pipette it into all four quadrants of the 384-well plate. Now we say thank you to the tips, thank you to the 96-well plate, and we'll send that 384-well plate of cells back into its hotel. And this plate hotel can hold 42 plates at a time. So you can imagine that pipetting 384 experiments 42 times over would be quite laborious without an instrument like this. Fluorescence is one of the most popular assay formats to use both for biochemical and cell-based assays. There are a lot of great things about fluorescence. It's very sensitive, often in the nanomolar range of sensitivity, and it has a very wide dynamic range, so you can see over several orders of magnitude of intensity.
And you can also tune the wavelengths so that you can look at multiple probes at one time that fluoresce in different colors, or you can look at the interaction between probes. So, naturally, we don't look at a fluorescent 384-well plate visually and score it; we use a plate reader. And so here's the very same plate on that plate reader, which collects the data very quickly, and you can see the image coming up on the screen as we go. The red is going to be the lowest fluorescence and the blue is among the highest fluorescence on the plate. From there, if you're screening a whole bunch of compounds, you can see that you would have a range of colors in between those.
And the top and the bottom of this plate show a dose response curve. So that you can see how the fluorescence changes intensity as the concentration drops. So, this allows us to quantify how much fluorescence we have and therefore how much active enzyme or inactive enzyme we have present in the mix. We just looked at some of the automation equipment that we have in the laboratory, showed you the freezers, where the compounds are stored. Now let's talk about what assays you're actually going to put into those little wells. What little miniaturized experiments are we going to do? So that depends very much on your biology and what's known about the biology. Now, if you have a purified protein and you can make an activity that you can visualize, very typically fluorescence or luminescence, you can do a biochemical based assay. So here's a very simple example where we have an enzyme that binds to a substrate and this substrate has this caged fluorophore in green so that it doesn't fluoresce when it's not bound, then when it binds to the enzyme and is activated by the enzyme, is acted on by the enzyme, it releases this fluorophore, which we can now detect as an increase in fluorescence here at 520 nanometers.
If we're looking for an activator of this enzyme, we'd be looking for a low signal that becomes a high signal. If we're looking for an inhibitor of the enzyme, we'll be starting with a situation like this, with a high fluorescence signal, and looking for things that inhibit that signal. So there are several instances where a biochemical assay is really an appropriate high-throughput screening assay especially if you know what protein you want to inhibit. Bcr-Abl is a good example, it's the known agent of causing CML. So inhibiting that enzyme is going to inhibit that disease progression. You also need to be able to express and purify that protein in large enough quantities to run many experiments on it and that protein needs to be similar enough to what it is in the cell to make sense. Also, the selectivity among similar proteins is important. So, again, the kinase example. There are hundreds of kinases in the body, some you want to inhibit for your disease, some you don't.
They're critically involved in homeostasis. So, if you have a whole panel of those similar proteins, then you can test activity against your protein of interest without having activity against the proteins you're not interested in. Ok, there are other examples. So when you have a biochemical assay, you come out of it knowing the molecular target. You know what protein your compound is binding to. What you don't know coming out of that assay is whether that compound is going to survive in the cell, whether it's still going to bind to your target when it's in the cell, whether it will even be permeable and get into that cell.
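The inhibitor readout and the panel comparison described above can be sketched in a few lines. This is an illustrative sketch only: normalizing each well to high-signal (uninhibited) and low-signal (fully inhibited) control wells is a common convention but is an assumption here, and the function names, the fifty percent cutoff, the protein labels and all signal values are made up.

```python
from typing import Dict


def percent_inhibition(signal: float, high_control: float,
                       low_control: float) -> float:
    """Percent loss of signal relative to the control window.
    high_control: uninhibited enzyme; low_control: fully inhibited."""
    window = high_control - low_control
    if window <= 0:
        raise ValueError("high control must exceed low control")
    return 100.0 * (high_control - signal) / window


def is_selective(inhibition: Dict[str, float], target: str,
                 cutoff: float = 50.0) -> bool:
    """True if the target is inhibited at or above the (hypothetical)
    cutoff while every other protein in the panel stays below it."""
    on_target = inhibition[target] >= cutoff
    off_targets_clean = all(v < cutoff for k, v in inhibition.items()
                            if k != target)
    return on_target and off_targets_clean


# Made-up fluorescence readings for one compound across a small panel.
HIGH, LOW = 10_000.0, 500.0
raw_signals = {"target_kinase": 1_450.0,
               "related_kinase_A": 9_050.0,
               "related_kinase_B": 8_100.0}

panel = {name: percent_inhibition(s, HIGH, LOW)
         for name, s in raw_signals.items()}
print(panel)
print("selective:", is_selective(panel, "target_kinase"))
```

A single-concentration comparison like this is only a first pass; it says nothing about whether the compound reaches or binds the target inside a cell, which is the limitation of biochemical assays raised in the talk.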
On the other hand, if you use a cell-based assay, then you'll know the opposite. You'll know that the compound is active in the cell, because that's how you identified it, but you won't necessarily know the molecular target, the protein or other biomolecule that that molecule is targeting. So, when would you want to screen in a whole-cell context? There are several cases where you'd want to screen in cells, and this is an area that's really gotten to be much more popular recently, as high throughput cell culture has become available. For example, if you know the molecular target, but you can't isolate it from the cell; GPCRs are a good example, ion channels are an example, and more subtly, some kinases even are regulated by subcellular localization or by time in the cell cycle, and if you take that protein out of its context, it might have a very different inhibitory profile than in the cellular context.
Also, sometimes your goal is to alter a cellular phenotype. Maybe you're not that interested in the molecular target, but you want to turn on a whole pathway, or you want to kill a bacteria, and you're a little less concerned about how you kill that bacteria or for example, you get your phenotype and you'll use that molecule to understand the biology well enough, use that molecule to tell you what the important targets are to cause that phenotype.
For example, death of a bacterium. Also, if we know the pathway that we want to inhibit, but we don't know the best or most druggable target in that pathway, we can allow the molecules to tell us that information by using an assay that measures some parameter at the bottom of that pathway and seeing which molecules inhibit or activate that function. How are we going to read out these cell-based assays? So, remember we have cells cultured in these small microtiter plates now, and we want to read them out in some way that a plate reader will be able to detect. There are two very common, basic approaches to looking at cell-based assays. One is to look at a whole-well parameter, such as fluorescence or luminescence. So, this would be an example of reporter cell lines, where you have a luciferase or you have a green fluorescent protein that's downstream of a promoter that you want to turn on or off. And if you alter the state of that promoter, then you're going to get more or less of that fluorescent or luminescent protein.
Another more recent advance that's getting to be very popular also is to look at those properties cell by cell, and we're going to go through some examples of high-content imaging, where we're using an automated microscope so we can look at each cell in that well, and at multiple parameters that are affected by the drugs. So this instrument is our high-content screening instrument, it's an In Cell Analyzer, by GE Healthcare. It's basically automated microscopy. It allows us to automatically collect data, fluorescence and absorbance, transmitted light data, across several wavelengths, sequentially, in a multi-well plate. Several plates at a time. And then later, we'll analyze that data.
So the assay we're going to look at today is a pretty simple, straightforward assay. It just has one stain, although we can do up to six stains. This is just a nuclear stain, which we'll see in blue, and we're looking for trypanosomes. Trypanosomes are parasites, they're the causative agent of Chagas disease, which is a terrible disease that infects people in South America. And these trypanosomes, these little parasites, live inside host cells. So, this assay is going to have a host cell that's infected with parasites and we'll be able to detect and quantify the number of parasites per host cell based on the nucleic acid staining. So, let's get started. So, now Kenny, our high-content screening expert, is starting the instrumentation, and it's loading the plate in the plate reader, it's setting the objective, the lens, to the 10x objective, and the correct filters to detect fluorescence in the blue channel.
And now you can see the data are coming up. We're scanning twenty fields per well, as you see here, and as the data are collected, they're coming up on the screen. And they're also being saved into the computer. They'll go to our database for analysis later. You can see the large spots are host nuclei and the smaller spots in some of them are trypanosome kinetoplastid DNA. This is very dense DNA that is next to the nucleus. We just looked at this experiment on the high-content imager and now we're going to look at how we analyze that data. So, you saw that we can automatically collect these experiments and in the blue channel, here, we're looking at nucleic acid. We see the large circles here are the host nuclei, and the small dots are kinetoplastid DNA from the parasite.
So it's very straightforward for us to teach the instrument which of these blobs belong to host nuclei and which of these blobs belong to the parasite. And here you see circled in green what the instrument has identified as host nucleus and, circled in yellow, what the instrument has identified as kinetoplastid, or parasite, DNA. We can then very simply count the number of cells that have parasites in them and look at a reduction in the number of infected cells with drug treatment. Or look at the ratio of parasites to host cells. One really nice thing about an assay like this is that, unlike a whole-well assay, where we just see the properties overall, here we can look as well at the health of the cells.
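To make that counting step concrete, here is a minimal sketch in Python. It is not the In Cell Analyzer software; it assumes the image has already been segmented into spots, and everything in it (the Spot record, the area cutoff of 100 pixels, the 30 pixel assignment distance) is a hypothetical stand-in chosen for illustration. Large spots are treated as host nuclei, small spots as parasite DNA, and each small spot is assigned to the nearest host nucleus.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Spot:
    """One segmented blob from the blue (nucleic acid) channel. Hypothetical record."""
    x: float
    y: float
    area: float  # in pixels


def summarize_field(spots: List[Spot],
                    area_cutoff: float = 100.0,   # assumed: host nuclei are larger than this
                    max_dist: float = 30.0) -> Dict[str, float]:  # assumed assignment radius
    # Split blobs by size: large ones are host nuclei, small ones parasite DNA.
    hosts = [s for s in spots if s.area >= area_cutoff]
    parasites = [s for s in spots if s.area < area_cutoff]

    # Assign each parasite spot to its nearest host nucleus, if one is close enough.
    per_host = [0] * len(hosts)
    for p in parasites:
        if not hosts:
            break
        dists = [math.hypot(p.x - h.x, p.y - h.y) for h in hosts]
        nearest = min(range(len(hosts)), key=dists.__getitem__)
        if dists[nearest] <= max_dist:
            per_host[nearest] += 1

    n_hosts = len(hosts)
    infected = sum(1 for n in per_host if n > 0)
    return {
        "hosts": n_hosts,
        "parasites": sum(per_host),
        "infected_fraction": infected / n_hosts if n_hosts else 0.0,
        "parasites_per_host": sum(per_host) / n_hosts if n_hosts else 0.0,
    }


# Made-up field: two host nuclei, two parasite spots near the first, one stray spot.
field = [Spot(0, 0, 400), Spot(100, 0, 400),
         Spot(5, 5, 10), Spot(8, 2, 12), Spot(500, 500, 9)]
summary = summarize_field(field)
```

Comparing the infected fraction or the parasites-per-host ratio between treated and untreated wells gives the drug effect, and a drop in the host count itself is an early warning that the compound is hurting the host cells.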
So, are we killing the host nuclei? Are we causing them to be apoptotic? Or to otherwise change their morphology? We don't want to alter that cell, we only want to reduce the number of parasites that are infecting those cells. So, we can get a lot of selectivity and some early toxicity information from an experiment like this. So, now we have run our screen and this is an example of a ninety thousand compound screen. So, a sort of moderate-size high-throughput screen. And you can see the three types of experiments that you get out of this. Here, on the y-axis, we have percent inhibition. So, we're looking at an enzyme that's active at the bottom and fully inhibited, or inactive, at the top.
In green, we see what the average fluorescence is, these are maximum signal controls on each plate that show you the variance in the activity of the enzyme across that plate. And then a fully inhibited enzyme, here these are minimum signal controls going across the plate and you see some variation, but overall, this is a minimum signal, a highly inhibited enzyme signal. And it's normalized, with the average here being 100 percent inhibited and the average of the green being 100 percent active. Now, in blue, are the compounds that we've actually screened. So, how do we identify which of these compounds are inhibitors, or partial inhibitors, that we want to follow up in further assays? We have an assay parameter that's commonly used that's called a Z-prime value, and the Z-prime value measures the width of the minimum signal, the width of the maximum signal, and the distance between them to come up with an assay parameter that is at most one, and can go negative for a poor assay, one being a perfect assay, point five being adequate for high-throughput screening. To give you an example, this has a Z-prime factor across the entire assay of about point eight.
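As a rough illustration of that parameter, the commonly used definition of Z-prime is one minus three times the sum of the two control standard deviations, divided by the distance between the two control means. The sketch below is in Python, and the control readings in it are made up for the example, not data from this screen.

```python
from statistics import mean, stdev
from typing import Sequence


def z_prime(max_signal: Sequence[float], min_signal: Sequence[float]) -> float:
    """Z' = 1 - 3 * (sd_max + sd_min) / |mean_max - mean_min|."""
    separation = abs(mean(max_signal) - mean(min_signal))
    if separation == 0:
        raise ValueError("control means are identical; Z-prime is undefined")
    # stdev() is the sample standard deviation: the "width" of each control band.
    return 1.0 - 3.0 * (stdev(max_signal) + stdev(min_signal)) / separation


# Made-up control wells: tight bands, far apart.
good = z_prime([100, 102, 98, 100], [10, 12, 8, 10])

# Made-up control wells: wide bands that overlap almost completely.
bad = z_prime([10, 20, 30], [12, 22, 32])
```

With these invented numbers the first set of controls comes out near point nine and the second comes out far below zero, which shows that the value has a ceiling of one but no real floor.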
So this is a very high quality assay. Now we go to find hits in that assay. Here the grey line is that average minimum signal, it's the average fully inhibited enzyme. And here, this black line shows you three standard deviations above the noise. So, most of the compounds are totally inactive in this assay, the enzyme was fully active. And then compounds more than three standard deviations above that would be considered statistically relevant changes from fully active, that is, partially inhibited. We would go in and select several of these compounds and test them some more.
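The normalization and the three-standard-deviation cutoff can be sketched in a few lines of Python. This is an illustration under assumptions, not the analysis used for this screen: the noise is estimated here from the uninhibited (maximum signal) control wells, and the compound names and readings are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, Sequence, Tuple


def percent_inhibition(signal: float, max_mean: float, min_mean: float) -> float:
    # 0 percent at the average maximum-signal control (fully active enzyme),
    # 100 percent at the average minimum-signal control (fully inhibited enzyme).
    return 100.0 * (max_mean - signal) / (max_mean - min_mean)


def call_hits(compounds: Dict[str, float],
              max_controls: Sequence[float],
              min_controls: Sequence[float],
              n_sd: float = 3.0) -> Tuple[Dict[str, float], float]:
    max_mean, min_mean = mean(max_controls), mean(min_controls)
    # Assumption: the "noise" is the spread of the uninhibited controls,
    # expressed on the same percent-inhibition scale as the compounds.
    noise = [percent_inhibition(s, max_mean, min_mean) for s in max_controls]
    cutoff = mean(noise) + n_sd * stdev(noise)
    scored = {name: percent_inhibition(s, max_mean, min_mean)
              for name, s in compounds.items()}
    hits = {name: pct for name, pct in scored.items() if pct > cutoff}
    return hits, cutoff


# Hypothetical raw fluorescence readings for three compounds on one plate.
readings = {"cpd_A": 99.0, "cpd_B": 55.0, "cpd_C": 12.0}
hits, cutoff = call_hits(readings,
                         max_controls=[100, 102, 98, 100],
                         min_controls=[10, 12, 8, 10])
```

In this toy plate cpd_A sits inside the noise and is not called, while cpd_B (a partial inhibitor at 50 percent) and cpd_C (nearly complete inhibition) clear the cutoff and would go forward for re-testing.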
So the next steps are to re-test those compounds to see if they recapitulate, run dose responses of those compounds to see how potent they are, and determine the mechanism of inhibition. This is a really important aspect of high throughput screening because a lot of compounds will inhibit a cellular function or protein for reasons that are not drug-like. So we need to tease those apart before we put a lot of effort into the compounds. Then once we're confident that the compounds are truly active, and acting in the way we want them to be acting, then we go on to optimize them. And this is what Jim will be talking about in the next segment of the seminar. So, just to go back to where we started, hit identification takes some time, and there are a lot of experiments that go into it. But it's just the first step to discovering a drug. So, if we start here with a hundred thousand to a million compounds, we really want to make sure that what's going through this funnel has a chance of being successful at the end.