VIDEO: Emma Lundberg on Mapping the Spatiotemporal Proteome Architecture of Human Cells
| Author | Director External Innovation, Dewpoint Therapeutics |
|---|---|
| Type | Kitchen Table Talk |
Emma Lundberg, Director of the Cell Atlas and a founding member of the Human Protein Atlas program, joined Dewpoint and Condensates.com on June 1st for a lively discussion about her lab’s work as part of the Kitchen Table Talk series. Emma is an Associate Professor of Bioengineering and Pathology at Stanford, having recently moved from KTH Royal Institute of Technology in Sweden.
Her research focuses on bioimaging and spatial proteomics, and her efforts have led to the systematic assessment of cell biology at the single-cell level using an antibody-based approach. This work has resulted in an image-based map of the subcellular distribution of the human proteome, which highlights the complexity of subcellular organization and has implications for characterizing and cataloging the components of biomolecular condensates. You’re going to see some of this work in the video below. Emma was also kind enough to provide written answers to a couple of questions she didn’t have time to answer during the live show; you can find those further below. Enjoy!

TRANSCRIPT
Michael Fenn (00:00:00):
Thanks for joining us, Emma. We really appreciate it, and we’re really excited to have you. Just a real quick intro, then I’m going to turn it over to Emma and let her go. Emma is an associate professor of bioengineering and pathology at Stanford, having recently moved from KTH Royal Institute of Technology in Sweden. Emma is also director of the Cell Atlas and a founding member of the Human Protein Atlas program, which I think many of you are probably aware of. Her research focuses on spatial proteomics and cell biology. She works at the interface of bioimaging and proteomics, where her efforts have led to the systematic assessment of cell biology at the single-cell level using an antibody-based approach. You’re going to see some of this work today. This work has resulted in an image-based map of the subcellular distribution of the human proteome, which really highlights the complexity of subcellular organization.
Michael Fenn (00:00:56):
So today we’re going to hear a bit about Emma’s work on subcellular spatiotemporal variability, as well as temporal changes in cellular compartments and their composition, which I think we all recognize has many implications for characterizing and cataloging biomolecular condensates. She’ll also talk a bit about the computational techniques she uses to process the vast amount of data these techniques generate. So with that, I’m going to turn it over to Emma. And again, thanks so much for joining us today.
Emma Lundberg (00:01:27):
Thank you so much for inviting me, and thanks for that introduction. To all of you watching, thanks for joining; I look forward to hearing your questions. Let’s get started. As you heard, I’m going to talk about our work to map the spatiotemporal architecture of human cells, the proteome architecture. And this work is, as you also heard, to a great extent based on the use of antibodies. The reason my lab is interested in the subcellular organization of proteins, as I’m sure this crowd is very aware, is of course that cellular processes are partitioned in time and space, and knowing where a protein is located gives us good clues about its function. It’s also known that relocalization of proteins is an important form of functional regulation, and mislocalization may be a cause of disease…
Emma Lundberg (00:02:17):
And despite this, I would say that most systems-level descriptions of cells today do not take the spatial distribution of proteins into account, and I think we need to in order to understand cells as systems. That’s what I’m going to try to convince you of today, if you’re not already with me on that. Sorry, I have a bit of a cold, so please bear with my voice through this talk.
Emma Lundberg (00:02:42):
My work is closely connected to the Human Protein Atlas, where I’m one of the directors, and I’m sure you’ve used this database. It’s freely available at proteinatlas.org for both commercial and non-commercial use, and we have about 1.5 million users a year. We’re mapping protein expression in cells and tissues using our in-house generated, proteome-wide collection of antibodies, using… I could also talk for a long time about our antibodies, how we generated them, how we validated them, and all of that. It’s not in my presentation, but I’m happy to discuss it later if you want. What I do want to point out is that the most important raw data we have in the Protein Atlas is images.
Emma Lundberg (00:03:25):
We have about 20 million images that have been systematically acquired and systematically annotated. These are a great resource for developing machine learning models, for example for image interpretation. To the left here, you can see the immunohistochemical images that outline cell type expression in situ, in tissue sections. To the right, you see fluorescent images outlining the subcellular expression of proteins. Today I will mainly focus on the right side, the subcellular partitioning of proteins, starting with the spatial aspect, then moving into temporal aspects, and finally ending with computational approaches.
Emma Lundberg (00:04:05):
When we started this project over 15 years ago, we realized we wanted high-resolution insight into where in the cell our proteins are distributed, so we decided to build an automated workflow using oil-immersion confocal microscopes and a fixed set of reference markers to outline the cells. We have DAPI for the nucleus, a marker for microtubules, and a marker for the endoplasmic reticulum, and these have stayed constant throughout most of the project. On top of this, we use our in-house generated antibodies to localize proteins one by one, always together with the reference markers. This is CD44, and as expected, we see it on the plasma membrane of the cell. We systematically generated data like this; for this image, the conclusion would be that the protein is in actin filaments. You can see it in green; the protein of interest will most often be shown in green in my talk. Here we can see it in nuclear speckles. If it looks like this, we conclude the protein is in the centrosome, and so on.
Emma Lundberg (00:05:06):
This is what we’ve been doing systematically for a very long time, and it resulted in this paper published in Science in 2017, “A subcellular map of the human proteome,” where at that point we had localized 12,000 proteins to 32 different cellular structures. This is more or less all the intracellular proteins that are expressed in cultured cell lines; it doesn’t go much higher than this. At that point, the location of half of these proteins was not known in databases such as UniProt, so it’s a rich resource of basic biological information. Based on this, we could define organelle proteomes, and we’re still using this data set today as a starting point for in-depth cell biology studies, because images are very rich in information. I’ll get back to that later.
Emma Lundberg (00:05:55):
One thing I do want to point out to this crowd: we have not focused that much on phase-separated structures, but we’ve done some work. This is a paper published last year. For a long time we had seen this strange rim-like pattern in the nucleolus, and we call it a new compartment, although maybe it’s not a compartment so much as an enrichment of certain types of proteins. In the literature, this pattern had been described as an artifact of cell fixation, but we could see it with several different antibodies. We could also validate it with GFP-tagged proteins: fixation, of course, made the pattern even more pronounced, but it was still there in live cells.
Emma Lundberg (00:06:39):
We can also quantify these patterns and identify all the proteins that show them. In total, we found that 157 of the nucleolar proteins are enriched at the rim, and these proteins have much higher intrinsic disorder than the rest of the nucleolar proteome, which itself has a lot of intrinsic disorder. We hypothesize that the nucleolar rim proteins are associated with the perinucleolar chromatin and aid in tethering the nucleolus to the chromatin, but this we have not verified. What we do see is that many of these rim proteins relocate to line the mitotic chromosomes during mitosis. So that is the little work we’ve done directly on phase-separated structures, and maybe we’ll follow up more going forward.
Emma Lundberg (00:07:30):
Right now, the group is working on mapping two organelles that are not yet in the Protein Atlas. The first is primary cilia, which you can see to the left: very interesting structures that show a lot of heterogeneity in proteome composition between different cell types. The second is micronuclei, which also show a lot of proteome heterogeneity, often even between micronuclei within the same cell line. So those are some of the locations you’ll see added to the Protein Atlas, maybe this year or next.
Emma Lundberg (00:08:08):
One of the findings from this earlier paper that we still find very fascinating is that more than 50% of all proteins localize to more than one compartment in the cell. Here you can see a protein that is in the nucleus and on the plasma membrane. I should also say that other studies based on mass spectrometry have since indicated that this number could be as high as 69% of the proteome, and it has been shown for yeast and mouse as well, so we believe it is accurate. What is interesting, of course, are the pleiotropic effects that may arise from these multi-localizing proteins. They may have context-specific functions, interact with different proteins, and moonlight in different parts of the cell, thereby increasing the functionality of the proteome and the complexity of the cell from a systems perspective. If I had to give one good reason why we need to take spatial distribution into account when we model cell activity, this would be it: if we don’t, we’re losing out on a lot of potential protein functions.
Emma Lundberg (00:09:13):
But related to these multi-localizing proteins: we can observe that more than half of all proteins localize to multiple compartments, but there are still a lot of questions. How is this localization, this sorting, achieved? Are the populations static or dynamic? Are they constantly interchanging? And what are the functional consequences? Here we have no systems-level answers yet; it all depends. We have a handful of examples where we know that if we perturb the nucleus, some proteins will go to the Golgi apparatus and vice versa. We have indications that there’s a lot of dynamics going on here, but of course it has to be studied for every single protein in order to answer it. I think that in this context we also need to think about proteoforms and modifications, something that we cannot easily resolve with our antibodies. This is where we would turn to mass spec-based approaches.
Emma Lundberg (00:10:06):
But to get back to moonlighting activity, I still think this is an interesting phenomenon. If we look in the literature again, some models predict that 23% of the human proteome has moonlighting activity, and this estimate has been increasing over the years, so I wouldn’t be surprised if it continues to increase a bit. But if we look at well-characterized moonlighting proteins, there are only about 200 proteins whose dual functions are known and characterized, and most of these are enzymes. Many of these are also well-studied proteins and drug targets. This hints at a bias in biology: we know more about the well-studied proteins, whereas there’s a whole bunch of proteins that we don’t know much about.
Emma Lundberg (00:10:51):
On this topic, we published a little opinion piece last week about the consequences of there being so many understudied proteins, and about the biases this introduces into all your gene set enrichment analyses and similar analyses. Even though it’s been two decades since the Human Genome Project was published, this is still the case. So if you’re interested, please check it out. We also have some ideas on how we could systematically address this.
Emma Lundberg (00:11:20):
But this is a plot I do want to show. If we see that proteins have multiple locations, how can we figure out whether they also perform different functions? Here we’ve been working quite a lot with protein-protein interaction data, and we actually find protein-protein interaction data and image data very complementary. These are just two examples of enzymes involved in collagen synthesis. We see them in vesicles in the cell, as you can see here, but we also see them in the nucleus, which has not been described in the literature.
Emma Lundberg (00:11:51):
Then, if we look at the protein-protein interaction networks for both of these enzymes, we see that they interact with the expected enzymes and proteins in the cytoplasmic compartment, but both also interact with other proteins in the nuclear compartment, and these proteins are involved in mRNA processing. So here, for example, we can start to form the hypothesis that both enzymes, in addition to serving as enzymes in the cytosol, may perform some other function, maybe a scaffolding function, in mRNA processing in the nucleus. We believe this combination is very fruitful, and we look forward to more protein-protein interaction data sets being generated for different cell lines so that we can assess this systematically.
Emma Lundberg (00:12:39):
Yes, but to add to the complexity: we’ve talked about spatial distribution and proteins being in multiple places, but of course we also have temporal heterogeneity. If you have temporal heterogeneity and you work with static images, it will look something like this: the cells in your image will look slightly different, so you’ll see population heterogeneity. Here, for example, is one cell line. The cells are more or less genetically identical but not synchronized, and we can see that this protein is in the nucleus in some cells, in the cytosol in others, sometimes in both, and also at the plasma membrane. This protein is enolase 1 (ENO1), one of the roughly 200 well-characterized moonlighting enzymes. We know that it’s a glycolytic enzyme in the cytosol, functions as a plasminogen receptor on the cell surface, and binds DNA in the nucleus, with some kind of tumor-suppressing activity.
Emma Lundberg (00:13:31):
I just think this is a cool example of what I’m saying: these genetically identical cells present great functional heterogeneity by modulating the location and abundance of this protein, because we can also see that some cells have a higher abundance of it than others. So there’s great heterogeneity here. When we started to think about this, we realized that we can actually assess this kind of heterogeneity from our images. So we asked ourselves: what is the extent of heterogeneity across the human proteome in cultured cells?
Emma Lundberg (00:14:04):
We searched through our image database looking for images like this, where the protein is only expressed in a subset of the cells, where we see expression-level differences (high- and low-abundance cells), or where we see translocations, like here: the protein is in the Golgi apparatus in most cells, but in these two cells that have just divided, it’s in the endoplasmic reticulum. In total, we found that 19% of the human proteome shows such single-cell variability, or heterogeneity, if you will. This really made us think about the temporal partitioning of proteins and how we could resolve it despite working with antibodies.
Emma Lundberg (00:14:45):
This led us into several years of work on temporal protein profiling. Given this last image, and since these cells are not synchronized, we thought that maybe most of the variation we see actually relates to the cell cycle. So we started by figuring out which of these proteins correlate with cell cycle progression. This is work led by Diana and Anthony in my lab.
Emma Lundberg (00:15:13):
The easiest thing to start with in the images, of course, is to identify the proteins that are spatiotemporally restricted to mitotic cellular structures. We found 98 proteins that were not previously known to be in any of these compartments, such as the cytokinetic bridge, mitotic spindle, midbody, mitotic chromosomes, kinetochores, and so on. This is nothing my lab has followed up on, but if you’re interested in mitosis, it’s a good data set, and maybe there are interesting proteins you haven’t thought of as being involved in these processes.
Emma Lundberg (00:15:42):
But to resolve protein expression over interphase, we realized we had to do more experiments. We chose to work with the U-2OS FUCCI cell line, a cell line with genetically encoded fluorescent ubiquitination-based cell cycle indicators (FUCCI) that report cell cycle position. The cells start out red and become increasingly red, then they turn on the green probe and become yellow, and then become increasingly green as time goes on. So if we measure the red and green fluorescence, we get this trajectory of the cell cycle, and we can fit a model to this horseshoe shape and make a linear representation of cell cycle pseudotime.
Emma Lundberg (00:16:16):
Here we’re sorting cells and doing single-cell sequencing, and we also co-stain these cells with antibodies. We chose the antibodies that had shown variability in the Protein Atlas before, restained with them here, and mapped the protein heterogeneity. Just to be very clear: we’re not following one cell over the cell cycle; rather, we’re measuring hundreds of cells, determining the cell cycle position of each single cell, and mapping that onto cell cycle pseudotime. That’s how we get this trajectory.
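The pseudotime idea can be illustrated with a short Python sketch. This is not the model used in the study: as a stand-in, it assumes each cell’s red and green FUCCI intensities are available as positive numbers, log-transforms them, orders cells by polar angle around the centroid of the horseshoe, and treats the largest empty arc as the division point. The function name `fucci_pseudotime` and the counter-clockwise direction of travel are assumptions for illustration.

```python
import math
from typing import List


def fucci_pseudotime(red: List[float], green: List[float]) -> List[float]:
    """Map each cell's FUCCI red/green intensities to a pseudotime in [0, 1].

    Hypothetical sketch, not the published pipeline: intensities are
    log-transformed, centered on the centroid of the point cloud, and cells
    are ordered by polar angle. The largest angular gap between neighbouring
    cells is treated as the open end of the horseshoe (the jump at division).
    Assumes cells travel counter-clockwise in the (log red, log green) plane,
    i.e. red first, then yellow, then green.
    """
    if len(red) != len(green) or len(red) < 2:
        raise ValueError("need at least two cells with matching red/green values")
    xs = [math.log10(r) for r in red]    # log red signal (rises first)
    ys = [math.log10(g) for g in green]  # log green signal (switches on later)
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    two_pi = 2 * math.pi
    # Polar angle of each cell around the centroid, folded into [0, 2*pi).
    angles = [math.atan2(y - cy, x - cx) % two_pi for x, y in zip(xs, ys)]

    # Largest gap between consecutive sorted angles, including the wrap-around.
    srt = sorted(angles)
    n = len(srt)
    max_gap, k_max = max(((srt[(k + 1) % n] - srt[k]) % two_pi, k) for k in range(n))
    start = srt[(k_max + 1) % n]  # first cell after the gap gets pseudotime 0
    span = two_pi - max_gap       # angular extent actually covered by cells
    # Counter-clockwise angular distance from the start, rescaled to [0, 1].
    return [min(1.0, ((a - start) % two_pi) / span) for a in angles]
```

Each cell’s pseudotime could then be paired with its antibody staining intensity to build a population-level protein profile over interphase; no individual cell is tracked over time.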
Emma Lundberg (00:16:48):
To show you some real data: at the top here, you see two well-known cell cycle regulators, BUB1B and CCNE1, and these are typical profiles. They’re a bit noisy because the FUCCI system is a bit noisy, but still accurate enough that we can see temporal profiles. We see the temporal profile at the protein level and also at the RNA level for both proteins, and they are known to be regulated by transcript cycling, so this all makes sense. At the bottom, you see two phosphatases, DUSP18 and DUSP19; previously there was nothing in the literature about them being involved in proliferation or cell cycling in any way. We saw these striking temporal profiles at the protein level, but a flat line at the RNA level. So here we would hypothesize that DUSP18 and DUSP19 are likely regulated at the translational or post-translational level. We’ve now generated a phosphoproteomics data set using mass spectrometry, resolved over interphase as well, so we can confirm that these are likely driven by phosphorylation. These types of profiles were very common among the new proteins we discovered in this project.
Emma Lundberg (00:18:08):
In total, we discovered about 500 cell cycle proteins, and about 300 of these were new, never described in the literature before. Most of these, in turn, were stable at the RNA level, which is probably why they had not been discovered before: most single-cell studies of the cell cycle had been done at the transcript level. If we look at this whole set of cell cycle genes and proteins in protein-protein interaction data, they all form a highly connected network, which is good; it indicates that they are indeed involved in the same cellular process. Then, if we look at which ones are regulated by transcript cycling (it’s a bit hard to see the colors here), we get a super tightly connected core of transcriptionally regulated cell cycle machinery. The new proteins we found are connected to this core but form an extended network, and we think this is how, for example, cell cycle signaling is propagated out to the rest of the cell.
Emma Lundberg (00:19:08):
For a handful of examples, we also did knockdown and gene silencing experiments to show that these non-transcriptionally regulated new cell cycle proteins can also influence cell growth, and that is the case shown here for DUSP18 and JPH3. This was a huge study, and the paper is packed with data, so this is a very short overview of it. But besides all the new cell cycle proteins, which are of course interesting, we could also determine that the average temporal delay between the peak of RNA expression and the peak of protein expression, for the genes that are transcriptionally regulated, is 8.6 hours.
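The RNA-to-protein delay can be sketched as follows, under assumptions not stated in the talk: each transcriptionally regulated gene has RNA and protein profiles binned on the same pseudotime axis, and pseudotime maps linearly onto a cell cycle of known length (`cycle_hours` is a hypothetical input). The lag wraps around because the cycle is periodic, and its resolution is limited by the bin width. Function names are illustrative, not from the published analysis.

```python
from statistics import mean
from typing import Dict, List, Tuple


def peak_lag_hours(rna: List[float], protein: List[float], cycle_hours: float) -> float:
    """Hours from the RNA peak to the following protein peak for one gene.

    Both profiles must use the same evenly spaced pseudotime bins spanning one
    cell cycle. Because the cycle is periodic, a protein peak that appears to
    come 'before' the RNA peak is counted as occurring in the next cycle.
    """
    if len(rna) != len(protein) or not rna:
        raise ValueError("profiles must be non-empty and use the same bins")
    n_bins = len(rna)
    rna_peak = max(range(n_bins), key=lambda i: rna[i])
    protein_peak = max(range(n_bins), key=lambda i: protein[i])
    lag_bins = (protein_peak - rna_peak) % n_bins  # wrap-around lag in bins
    return lag_bins * cycle_hours / n_bins


def mean_peak_lag(profiles: Dict[str, Tuple[List[float], List[float]]],
                  cycle_hours: float) -> float:
    """Average RNA-to-protein peak delay over a set of genes (hypothetical helper)."""
    return mean(peak_lag_hours(rna, prot, cycle_hours) for rna, prot in profiles.values())
```

Taking the argmax is the simplest choice; a circular cross-correlation between the two profiles would be less sensitive to a single noisy bin.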
Emma Lundberg (00:19:48):
At first this sounded long, but it’s actually very similar to what has been reported for the circadian cycle, so it makes sense. For the cell cycle, though, it’s good to be aware of this when you’re doing single-cell measurements and want to correlate RNA and protein expression: they might not always correlate. For the cell cycle in particular, you will often see a peak of RNA in G1, whereas the corresponding protein peaks in G2, so the delay actually bridges phases of the cell cycle. There are of course many proteins of potential clinical interest here, particularly potential novel markers of cell proliferation that stain slightly different cells than Ki67. And, of particular interest to me, cell cycle-dependent translocations are very common, particularly to the nucleus in later phases of the cell cycle. This was published in Nature at the end of last year if you want to check it out, and the data are available in the Protein Atlas.
Emma Lundberg (00:20:48):
But to get back to our initial hypothesis, that most of the proteins where we saw heterogeneity in the images would be explained by the cell cycle: in that assumption, we were wrong. Only about one third of the proteins could be explained by the cell cycle, which is very interesting, because it means that two thirds of the proteins that show single-cell heterogeneity are not explained by the cell cycle. So what is this, people tend to ask us. Is it stochasticity? We don’t think so, although of course we don’t know; it might be. But there are some observations here. One is that we see great enrichment for metabolic enzymes. Looking at cell cycle-dependent and non-cell cycle-dependent variation together, I told you that 19% of the human proteome showed such heterogeneity, but for metabolic enzymes this number is 40%, so significantly higher, and we see this across many different types of metabolism.
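The talk does not say which statistical test underlies “significantly higher.” One common way to ask whether a subset, such as metabolic enzymes, is over-represented among variable proteins is a one-sided hypergeometric test; the sketch below implements it with the Python standard library only. The function name is hypothetical, and any counts passed to it would have to come from the actual annotation data.

```python
from math import comb


def hypergeom_enrichment_p(population: int, hits_in_population: int,
                           subset: int, hits_in_subset: int) -> float:
    """One-sided hypergeometric p-value, P(X >= hits_in_subset).

    population:          all proteins considered (e.g. the mapped proteome)
    hits_in_population:  proteins flagged as variable in that population
    subset:              size of the group being tested (e.g. metabolic enzymes)
    hits_in_subset:      variable proteins within that group
    """
    if not (0 <= hits_in_population <= population and 0 <= subset <= population
            and 0 <= hits_in_subset <= subset):
        raise ValueError("inconsistent counts")
    total = comb(population, subset)
    upper = min(subset, hits_in_population)
    # Sum exact hypergeometric counts for every outcome >= the observed one;
    # math.comb returns 0 for impossible draws, so no extra bounds are needed.
    tail = sum(comb(hits_in_population, k) * comb(population - hits_in_population, subset - k)
               for k in range(hits_in_subset, upper + 1))
    return tail / total  # integers are exact, so only this final division rounds
```

A small p-value would support over-representation; when many subsets are tested, a multiple-testing correction would also be needed.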
Emma Lundberg (00:21:42):
There are also some other interesting observations. If we call these groups cell cycle-dependent proteins and non-cell cycle-dependent proteins and ask how the two groups differ, we see, for example, that the magnitude of cell-to-cell variation in populations is similar for the two groups and significantly higher than for the rest of the proteome. Both sets of proteins also have more intrinsically disordered regions and lower thermal stability than the rest of the proteome. But there are some differences; for example, they localize to slightly different compartments. The cell cycle-dependent proteins tend to be in the cytosolic and nuclear compartments, whereas the non-cell cycle-dependent proteins are also in those compartments but, in addition, in the mitochondria and intermediate filaments.
Emma Lundberg (00:22:36):
We can also see that these two sets of proteins are regulated by different upstream kinases. The cell cycle-dependent proteins are regulated by kinases known to be involved in the cell cycle, whereas the non-cell cycle-dependent proteins are often regulated by kinases known to be involved in metabolism. So these sets of proteins are similar in some ways, maybe indicating that they serve multiple functions or are highly flexible and dynamic so that they can adapt to dynamic transition points in the cell, but they’re different in terms of the biological functions they are involved in. I think we have a question now, sorry, Jill?
Bede Portz (00:23:18):
This has been really great so far. Thanks. So when you see heterogeneity with respect to protein localization in a population of cells, is there ever a correlation or perhaps anti-correlation spatially, such that you might be able to infer some sort of epigenetic phenomenon by which a daughter cell was more likely to behave similar to the mother? In the field of cells?
Emma Lundberg (00:23:43):
Yes. Yes. I’ll get back to that very shortly. I’ll just show you some more images and then I’ll try to answer that. We don’t really know yet, but it’s a very good question.
Bede Portz (00:23:52):
Okay. And then…
Emma Lundberg (00:23:54):
Oh, go ahead.
Bede Portz (00:23:55):
I have another question. You just mentioned something about metabolic enzymes and an enrichment for IDRs. Have you examined whether there is an enrichment of specific subtypes of IDRs? People are beginning to try to classify the biophysical or sequence properties of these things.
Emma Lundberg (00:24:17):
No, we have not. We actually haven’t looked into that at all. Do you have a specific paper or something you could refer us to?
Bede Portz (00:24:25):
Yeah, we can talk about this offline.
Emma Lundberg (00:24:25):
That would be wonderful. Yeah, yeah, yeah. That would be super interesting, so I’d love to look into that. Just to show you one example of what this might look like: this is the enzyme HMGCS1, which is involved in cholesterol biosynthesis, and its known function is in the cytosol. This is a typical example of enzyme heterogeneity, and it resembles the example I showed you earlier: we have differences in abundance, bright cells and less bright cells. Sometimes it’s only in the cytosol, as expected; sometimes we see it mainly in the nucleus; and sometimes we see it in both the nucleus and the cytosol. Here we were thinking: before, we could map the static images to the cell cycle, but here we don’t really have a general time indicator in our images, right, so we can’t map the data to time.
Emma Lundberg (00:25:14):
So we realized that maybe it’s better to actually do live cell imaging. We also started to question whether this is biologically relevant. Why have people not observed this before? Maybe it’s just an artifact of cell lines. So let’s start by confirming it with a different technology. We teamed up with the Chan Zuckerberg Biohub and Manu Leonetti there, and used his CRISPR pipeline to do biallelic GFP tagging of a set of enzymes, including HMGCS1, in HEK cells. We then isolated a single cell in which we had confirmed the biallelic tagging and asked: can a single cell recapitulate the entire variability within a couple of cell divisions? So we’re starting to get to your question there.
Emma Lundberg (00:26:01):
This is the resulting image. Of course we were super excited, because the high and low abundance is even more pronounced in this image, and we also have the different locations: nuclear, cytosolic, and both. So we could clearly see that a single cell can reproduce all the distinct protein expression phenotypes, if you will. But at this point we started worrying about the cell line: maybe this is an artifact of cell lines and not relevant in tissues, and so on. We did do some live cell imaging here, although I don’t have the data to present to you. If we do live cell imaging, we can of course also look at inheritance between mother and daughter cells.
Emma Lundberg (00:26:39):
I would say we’ve looked at maybe 50 proteins of this type so far. Most often we do not see strong signs of inheritance, but sometimes we do, and sometimes we also see asymmetry in cell division, which is interesting as well. What is complicated is that we often see different temporal modes being mixed. For example, translocations are often pretty fast and might happen several times, even within one cell cycle, whereas expression levels might change more slowly, be somewhat inherited over one cell doubling, and then go back to a lower state. And within that state, the translocations are still mixed in.
Emma Lundberg (00:27:23):
So when you start to think about how to design imaging assays to really measure this, it’s hard, because we need tightly spaced time lapses and we also need to cover at least three cell divisions, 72 hours. It becomes really, really massive, so we haven’t really gotten to work on those screens yet. Out of those 50 enzymes we studied, I would say that for only a handful do we see some kind of inheritance across populations, but this might also be due to how we imaged the cells to start with and how we identified these proteins. So I don’t know; I’m happy to hear what you think. Did this answer your question?
Michael Fenn (00:28:08):
I think so. We have one more question, from Anthony.
Emma Lundberg (00:28:09):
Yep.
Jill Bouchard (00:28:11):
Anthony, are you around? You want to unmute yourself?
Anthony Vega (00:28:16):
Yeah. Sorry, let me see my question. So I was curious about the study with DUSP18, and what you had shown, I think it’s surprising that you were able to find 300 additional cell cycle proteins. And in the case of DUSP18, there’s a very strong phenotype, but would you be able to differentiate for those that might have a weak phenotype, whether these proteins have some sort of redundant role in this connection, or whether these are just false positives?
Emma Lundberg (00:28:52):
I don’t think they’re false positives. I rather think that we have false negatives. We know we have some false negatives because of the noise in the FUCCI cell system and the limited number of cells we’ve mapped, so there are definitely false negatives; that’s our main problem. I’m pretty sure there are weaker phenotypes, and maybe some of these proteins are only passengers of the cell cycle. We don’t know how they influence the cell cycle yet; that’s absolutely correct. Since we haven’t done any… I think a lot of the subtle phenotypes actually involve translocations, and I think that’s pretty interesting.
Emma Lundberg (00:29:39):
And one example is an enzyme that actually mapped just at the border of what we classified as cell cycle dependent. It was really close to being non-cell cycle dependent. But when we looked closer into it, the actual translocation happened at a very distinct time point in the cell cycle. So the translocation to the nucleus was quite cell cycle dependent, and maybe that phenotype is strongly connected to cell cycling, whereas the other phenotype relates to a different function, because this was actually an enzyme that is involved in the production of extracellular matrix. So the hard thing is, if you do CRISPR knockouts, which phenotype are you measuring? Which function are you measuring? So I think that we’re suffering from the fact that the current CRISPR-based screens are really blunt tools if you want to figure out location-specific functions, right?
Anthony Vega (00:30:28):
Right.
Emma Lundberg (00:30:28):
So this is where I think we fail; these are the examples whose function we can’t really deduce. So we’ve been discussing ways of doing screens where, instead of knocking genes out, we take proteins like the ones we’re looking at here, which localize to the cytosol and nucleus, add a nuclear export tag or a nuclear import tag, and basically see how that shifts cell function.
Anthony Vega (00:30:55):
Oh, that’s very cool.
Emma Lundberg (00:30:57):
To force proteins into different compartments. But that’s the only way I can think of. I would love to see tools developed where we could, for example, say, can we knock down HMGCS1 in the nucleus only, and then look at phenotypes? Because that’s what we need in order to find these weaker phenotypes as well. So maybe not the answer you were looking for, but…
Anthony Vega (00:31:16):
No, no, it’s very interesting. I think.
Emma Lundberg (00:31:19):
Yeah. So yes, at this point we know that for HMGCS1 the variations are real. We see it with antibodies, we see it with GFP tagging, and we can see it in different cell lines, but what about tissues? So here we resorted to the Human Protein Atlas Tissue Atlas. And actually, if you look at HMGCS1, it looks like this in glandular cells of the stomach, healthy tissue, or acinar cells of the pancreas, again healthy tissue. So again we see high variability between cells that are seemingly identical. And we also see different localizations in the stomach, for example, whereas in the pancreas we only see the cytosolic location. So for most of the enzymes where we see heterogeneity in the cell lines, we can also confirm this heterogeneity in human metabolic tissues.
Emma Lundberg (00:32:06):
So at this point, while we were struggling with the temporally resolved imaging assays, we decided to take a step back, and Christian in the lab spent, I think, the last year really assessing the heterogeneity in liver and pancreas, two metabolic organs. That is the paper that we’re currently wrapping up and will submit at the end of the summer. I’m not showing any of that data here, but I can say that there is extensive single cell heterogeneity related to metabolism among cells that we think are identical in these two organs, which is super interesting. It also seems to be dynamic, and we can study it in cell lines as a model system.
Emma Lundberg (00:32:50):
Of course, we see the same heterogeneity in tumors as well. I guess everyone expects that everything is heterogeneous in tumors. So here are some examples of enzymes involved in lipid metabolism, carbohydrate metabolism, and amino acid metabolism, where we only see expression in a subset of the cells. So of course this might be important when you talk about persister cells, or why some cells don’t respond to certain drug treatments. Maybe they’re just not receptive because at that time point they are in a metabolic state that doesn’t make them sensitive, for example. So I think this is very interesting. And then there are the weak phenotypes of cell cycle dependent proteins that are also involved in metabolism and that involve translocations. There’s an intricate interconnection there that we haven’t really understood.
Emma Lundberg (00:33:34):
So at this point we have many questions, lots of data, probably need more data, and we have few answers, but it’s exciting. So what’s the cause and consequence of these spatiotemporal variations? What’s the level of cellular regulation? Is it RNA or proteins? How do intrinsic and extrinsic factors influence the cell’s metabolic states? Here we can do some fun image analysis, for example, pull one cell out and see if we can predict the state of that cell given the surrounding cells, and things like that. Maybe there are intrinsic oscillatory systems in cells that we don’t know of yet. And how do metabolic and proliferative states interconnect? And here, of course, 40% of the metabolic enzymes show heterogeneity, so that’s more than a thousand proteins, right? A thousand genes. So this is very hard to study one by one.
Emma Lundberg (00:34:27):
Ideally we want to look at all of these proteins at the same time. And for this reason we are working a lot on building assays for multiplex protein imaging. As you can see, there are many people in my lab involved here, and we work with DNA-barcoded technologies like CODEX, or with 4i, iterative indirect immunofluorescence imaging. And of course we cannot do a thousand-plex. We can do maybe a 30- or 40-plex with these methods. But what we aim to do is to use them in a slightly different way than the most common application.
Emma Lundberg (00:35:00):
So for example, we look for changes in morphology and metabolic states in single cells, basically multiplexing the expression of 30 markers that are expressed in every cell in the tissue. But we’re also building, for example, cell type specific panels. As you can see in the image here, this is a 29-plex that we’ve built to visualize all the known cell types and states of the human pancreas. And the idea is to use this panel to guide us to identify all these different known cell states, and then to map the rest of the variability on top of this. But how do we do that, when we still can’t do a thousand-plex?
Emma Lundberg (00:35:37):
So here we’re working with a technology that is called Deep Visual Proteomics. It was developed together with Matthias Mann and Peter Horvath, and they are really the leading authors of this first paper, published in Nature Biotechnology two weeks ago. We call it Deep Visual Proteomics, and basically, we work with archival tissue samples.
Emma Lundberg (00:35:58):
And in this first paper, we demonstrate that you can use fluorescence imaging or IHC to image a tissue section, then use AI-based image analysis to identify cell types and cluster them based on whatever parameter you want (size of the cell, shape of the cell, interactors of the cell, and so on). Then we do automated laser microdissection, but instead of collecting single cells, because from single cells we can still only get shallow proteomes, we pool about 100 to 200 contours of cells that are highly similar. Then we can do ultra-high-sensitivity proteomics and detect about 4,000 proteins per cell type.
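The pooling step described above can be illustrated with a minimal sketch. It assumes every segmented cell already carries a cluster label from the image analysis. `pool_contours` and its default thresholds, which echo the 100 to 200 contours mentioned in the talk, are hypothetical and are not part of the Deep Visual Proteomics software.

```python
from collections import defaultdict


def pool_contours(cell_ids, cluster_labels, min_pool=100, max_pool=200):
    """Group segmented cells by cluster label into pools for microdissection.

    Hypothetical helper: each cluster is split into consecutive pools of at
    most `max_pool` contours. Any pool with fewer than `min_pool` contours,
    whether a small cluster or a leftover remainder, is dropped.
    """
    by_cluster = defaultdict(list)
    for cell_id, label in zip(cell_ids, cluster_labels):
        by_cluster[label].append(cell_id)

    pools = {}
    for label, members in by_cluster.items():
        chunks = [members[i:i + max_pool] for i in range(0, len(members), max_pool)]
        kept = [chunk for chunk in chunks if len(chunk) >= min_pool]
        if kept:
            pools[label] = kept
    return pools
```

Dropping undersized remainders is one design choice, made here to keep every pool large enough for deep proteomics. Merging the remainder into the last pool would be an equally reasonable alternative.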
Emma Lundberg (00:36:39):
So what we are doing now, which is not part of this first paper, is replacing this initial imaging step with highly multiplexed imaging. For example, with the pancreas panel we can visualize all known cell types and states in the pancreas. Then we can automatically isolate all those populations and do deep mass spectrometry to detect metabolic enzymes and map them onto these cell types. This has been more challenging than you might think. It’s rarely talked about, but most multiplex imaging methods are cyclic, and you lose a lot of the proteome content in these processes. Usually people only talk about losing the epitope of the antigen that they’re looking at, but you actually lose massive amounts of total protein content. So we’ve been trying to optimize that, to minimize the proteome loss and allow this type of workflow. And now we got it, or we hope that we got it working, because otherwise I don’t know how we could possibly look at this many proteins at the same time. Yes. Any more questions now, or otherwise I’ll move on to the computational parts?
Michael Fenn (00:37:45):
We had another question by Raeline.
Raeline Valbuena (00:37:49):
Hi. Yeah. Thanks for the awesome talk. I was wondering if your team has quantified which proteins have different localization patterns with the antibody versus when you do GFP tagging. And I’m curious if there are specific subsets of proteins that are more prone to mislocalization upon tagging.
Emma Lundberg (00:38:10):
Yes, we did that. There’s a paper by Charlotte Stadler, maybe, what can it be, 10 years back or something, in Nature Methods, where we systematically compared this for 500 proteins. And of course there’s a bias; there are pros and cons with both. I would say the general problem is that you tend to get things localized to the ER with tagging, whereas antibodies tend to give unspecific binding to the nucleus because it’s sticky and dense and charged. So those would be the most common artifacts. But please check out this paper; there are some more detailed insights as well. I also know that some people who work with these dual localizing proteins have a hard time capturing the dual localization when they tag with GFP, but we can validate it with many different kinds of antibodies. So I think that with GFP it is sometimes easier to interfere with the dual localization, and maybe with the trafficking if there are dynamic populations.
Michael Fenn (00:39:11):
And we have one more question here from Karthik, which I think is quite interesting. Karthik?
Karthik Balakrishnan (00:39:19):
Thank you for this great, fascinating talk for the first half of it. So I was just wondering about the effect of activity dependent, translocalization of proteins and enzymes, especially in cells that are really also electrophysiologically active like neurons or cardiomyocytes for that matter.
Emma Lundberg (00:39:38):
Yes. So the question is if we see more of it or…
Karthik Balakrishnan (00:39:43):
How do you account for the spatial and especially temporal heterogeneity when it comes to this translocation problem when you image these cells?
Emma Lundberg (00:39:56):
Yes. So basically, we don’t work a lot with neurons or cardiomyocytes at this point, and we have not captured data sets to answer anything related to that. And sorry, I don’t really get the question of how we account for it, but of course we try to capture all the heterogeneity that we see. Even if it’s only one cell that shows a different location, we try to capture it. So instead of capturing only the main location, we try to capture everything. I don’t know if that answers your question.
Karthik Balakrishnan (00:40:24):
Thank you. And following up on that, you mentioned about RNAs, right? I was wondering even in U2OS cells or HeLa cells, how does the cellular, not just cell cycle dependent, but just cell energy dependent and micro environment dependent contribution of RNAs or long non-coding RNAs, pre-mRNAs, in this temporal heterogeneity, have you looked at any of those aspects of these?
Emma Lundberg (00:40:51):
No, we have not yet. I’d love to do that. I think there’s a great bioRxiv paper, or at least we’re looking forward to the data set from Kathryn Lilley’s lab in Cambridge, where they’ve done spatial proteomics to identify RNA binding proteins and the distribution in the cell of both the RNA and the proteins. I think those data sets will really help us start to understand that as well, and how it influences the distribution of proteins. And with the in situ sequencing data sets that are coming out now, maybe we can start to answer questions like: can we predict protein localization in the cell based on RNA localization? For some genes we know that we can, but overall we don’t really know the extent of this. So I think there are a lot of emerging data sets that will really help us to not only look at proteins as isolated entities, but also together with RNAs.
Karthik Balakrishnan (00:41:49):
Great. Thank you.
Michael Fenn (00:41:52):
And we had one more question here from Ashish.
Ashish Bihani (00:41:59):
Hi, great talk. I was wondering about the proteins that have multiple localizations. There are new techniques for proteomics through Nanopore, where we could get the species, the post-translational modifications, and in what combinations they occur. So we could find multiple species of a protein and then correlate them with that data.
Emma Lundberg (00:42:22):
Yes, that would be great. I’d love to do that. We haven’t generated anything but bulk, basically bottom-up, PTM proteomics data sets. So I think that type of assay will be super valuable in trying to figure out the proteoform differences and how they contribute to differential localization.
Ashish Bihani (00:42:43):
Thanks.
Karthik Balakrishnan (00:42:45):
Sorry if I may just interrupt again, you mentioned a few mapping efforts in pancreas, right, so I was just wondering about the energy centers within the pancreas, because there are multiple short or very small energy localizations in terms of various pathways with mitochondrial or glycolysis or whatever it is. So various biochemical pathway dependent energy hubs within a cell. So have you seen some kind of distribution for these enzymes or for these protein localizations based on this energy hub attraction or repulsion properties?
Emma Lundberg (00:43:21):
No, we have not. I would be a bit cautious here and almost say that the Allen Institute type of cell modeling that they’re doing would be more suited for it. Maybe here we are limited by the 2D type of imaging that we’re doing. I assume that for these energy hubs you would want to resolve the mitochondria in three dimensions as well, and it might relate to how they’re distributed in three dimensions in the cell, which we don’t really have the resolution to map. So I would say a cautious no, but I’m not sure our data can resolve it.
Karthik Balakrishnan (00:43:58):
Great. Thank you.
Emma Lundberg (00:43:58):
Yeah. Thank you. Very good question though. Actually, I should say yes, we have seen some things in the pancreas, for example, that cells surrounding the islets have more mitochondrial content, and they also have a certain metabolic profile compared to other cells with lower mitochondrial content. So there are some higher-level correlations there. Yes.
Karthik Balakrishnan (00:44:20):
And also the pancreatic beta cells, the alpha, beta, and delta, if I remember correctly, those are like… Their secrete[inaudible 00:44:29] change upon depletion of beta cells. The alpha then morph into insulin secreting cells and the delta morph into insulin secreting cells and things like that, right. So I’m not sure if it is possible to recapitulate that in a culture-based system, but it would also be super interesting to see how this spatial temporal localization of enzymes and proteins changed when the cell fate changes based on its micro environment.
Emma Lundberg (00:44:55):
Yeah, absolutely. This is one of the reasons we tried to build this organelle panel for highly multiplexed imaging: to be able to quantify the different organelles, mitochondrial content, Golgi volume, things like that in the cells. We tried to design this panel to be very generic so that it would be applicable to all cells, but it’s very hard to find markers that are expressed at the same level across all different cell types. But basically we have a panel, kind of like Cell Painting, that we can apply in situ to start to look at things like this. I also agree it’s a correlation that would be very interesting to look at. We’re at the starting point, but at least we now have an eight-plex organelle panel that we can apply in situ.
Karthik Balakrishnan (00:45:38):
Super cool. Thank you.
Emma Lundberg (00:45:40):
Well, thank you for good questions. So basically everything boils down to your ability to classify and quantify spatial distribution patterns. This is also something that people in my lab, Wei, Casper, and Trang, are working a lot on. These fluorescent images of cells are of course a lot easier to work with than the tissues we just discussed, but it’s still not trivial. We have a great class imbalance. Some patterns are common, like the cytosol; some are rare, like microtubule ends. We see great morphological differences, because we work with many different cell lines, so we need machine learning models that are very generalizable. This is the same protein in actin filaments in three different cell lines, as you see here. And then we have single cell variations for 19% of all proteins, and we know that half of all proteins localize to two or more compartments in the cells.
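Class imbalance and multi-localization together are usually handled with multi-hot targets, one independent sigmoid per class, and a loss that up-weights rare classes. The sketch below shows that generic setup under stated assumptions. The negatives-to-positives weighting and the 0.5 threshold are common defaults rather than the HPA models’ documented settings, and all function names are hypothetical.

```python
import math


def multi_hot(labels, classes):
    """Encode a set of location labels as a 0/1 vector over `classes`."""
    return [1.0 if c in labels else 0.0 for c in classes]


def positive_weights(targets):
    """Per-class weight = negatives / positives, so rare classes count more.

    `targets` is a list of multi-hot vectors. A class with no positives
    gets weight 1.0 to avoid dividing by zero.
    """
    n = len(targets)
    weights = []
    for j in range(len(targets[0])):
        pos = sum(t[j] for t in targets)
        weights.append((n - pos) / pos if pos else 1.0)
    return weights


def weighted_bce(logits, target, pos_weight):
    """Binary cross-entropy with one independent sigmoid per class."""
    loss = 0.0
    for z, y, w in zip(logits, target, pos_weight):
        p = 1.0 / (1.0 + math.exp(-z))
        p = min(max(p, 1e-7), 1 - 1e-7)  # clamp to keep log() finite
        loss -= w * y * math.log(p) + (1 - y) * math.log(1 - p)
    return loss / len(logits)


def predict(logits, classes, threshold=0.5):
    """Return every class whose sigmoid probability clears the threshold."""
    return {c for z, c in zip(logits, classes) if 1 / (1 + math.exp(-z)) >= threshold}
```

Unlike a softmax, which forces exactly one winning class, independent sigmoids let a protein be predicted in the nucleoplasm and the cytosol at the same time.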
Emma Lundberg (00:46:27):
So we really need models that are capable of multi-label classification. A couple of years back, we worked with many different strategies to do this. And let’s say five years back, the best published model could classify the 10 main organelles in one cell line, which is very far from what we need. So here we decided to do a citizen science approach, where we basically tapped into a science fiction computer game to crowdsource image annotations. That was a very successful approach: we had 300,000 gamers giving us 32 million image classifications within a year in the game. So that was one approach, but I’m not going to talk about it today.
Emma Lundberg (00:47:07):
And after that we’ve done two Kaggle challenges to develop machine learning models. I think it’s fun to work with crowdsourcing; that’s one of the reasons. But also, being a public database, we of course track our users, and we know that a lot of our users never even look at the images, and very, very few of our users actually compute on images or work with images as raw data. So we wanted to encourage this through these Kaggle competitions. I’ll go into them in a bit more detail.
Emma Lundberg (00:47:35):
The first Kaggle competition was simple, basically designed to have Kagglers beat the gamers. We assembled a benchmark data set that had not been public on the Protein Atlas before. We have our labels, like I showed you in the beginning of my talk, and we had people develop convolutional neural networks, deep neural networks, to classify the labels in these images. At that point it was, I think, one of the first challenges, or maybe the first, with a multi-label classification problem. This challenge turned out very well. The number one Kaggler, bestfitting, actually won it.
Emma Lundberg (00:48:08):
What I do want to point out is that besides that model being able to accurately predict labels for cells, we can actually use it as a feature extractor. If we take the penultimate layer, extract the features there from this model, and then reduce the dimensions into this UMAP plot, this is all images in the Protein Atlas Cell Atlas: about 80,000 images corresponding to 13,000 proteins from a mixture of different cell lines. So it’s very generalizable. The colors here are our manual annotations, and we can see that it distinguishes all of these different labels, which it was trained to do. So that’s great.
Emma Lundberg (00:48:45):
But then there are two other things here that we were super excited about. One is all these grey dots, which are the multilocalizing proteins. If we look closer at those, F, for example, is a protein in the nucleus and the nucleoli, and we can see that it ends up right in between the clusters of pure nucleus and pure nucleoli patterns. Same thing for D here, a protein that is in three locations, the nucleus, the cytosol, and the plasma membrane, and it’s right in between. So this model seems to capture, or embed, the spatial information for multilocalizing proteins very well, too, which is good for modeling purposes.
Emma Lundberg (00:49:18):
The other important thing here is to think about our images. We’re basically asking people to train models to recognize patterns. And a punctate pattern in the nucleus and a punctate pattern in the cytosol are, pattern-wise, very, very similar, right? So you could expect all the punctate patterns to cluster together, for example, but that’s not the case. Here we have a distinct group of cytosolic punctate patterns, whereas the nuclear punctate patterns are over here. And if we think about it, this entire purple, red, blue cloud is nuclear structures, and these are the cytosolic structures. So this plot, even in these reduced dimensions, separates proteins by their pattern and by their location relative to the reference landmarks that we had in the images, the microtubules and the ER. So what these features describe is not just the pattern per se, but also the distribution in the cell, which makes them perfect for modeling. I’ll get back to that, but that’s why we were super, super excited when we got these plots.
Emma Lundberg (00:50:22):
But before I get into the modeling, I want to talk a bit about Kaggle challenge number two. Here we realized that, of course, 19% of the proteins show single cell heterogeneity. If we classify patterns based on whole images, we’re basically producing bulk data from our nice single cell data, and we don’t want to do that; we really want to explore spatial heterogeneity as well. So we need models that can assign labels to single cells in images. But the problem is that we don’t want to label a million single cells, and we don’t have the ground truth data sets. So instead we designed a very fun, very difficult challenge. We could see that mainly expert Kagglers were participating, but it also had a very, very successful output. We asked them to develop models under weakly supervised settings: given mainly image-level labels, train a model that can predict cell-level labels accurately.
Emma Lundberg (00:51:15):
For the hidden test data set used to rank them, we had selected images where we knew that there was heterogeneity among different cells, so it really had to work. And the key here, I would say, is to develop models that ensure that the attention is spread to all cells in an image, because most often your machine learning models will focus on the most discriminative features. Based on the first competition, and inspecting its models, we realized that the most discriminative features of an image most often only include a couple of cells and not all of the cells. So this was really where we expected to see innovations, and we saw a lot of different approaches. This paper will soon be out as well, and you can also read the Kaggle discussion forums if you’re interested. A very successful challenge too. The same person won this competition, and it was actually a transformer model this time. And sorry, now I have to move the video.
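One generic way to see why attention collapses onto a few cells in weakly supervised (multiple-instance) training is to look at how per-cell scores are pooled into the image-level score that the image label supervises. The sketch below assumes log-sum-exp pooling, a standard technique swapped in purely for illustration. It is not the winning Kaggle solution, which the talk describes only as a transformer model. The gradient each cell receives from the image-level loss is `softmax(r * scores)`, so a large `r` (close to max pooling) trains almost only the top-scoring cell, while a small `r` (close to mean pooling) spreads the signal over all cells.

```python
import math


def lse_pool(cell_scores, r):
    """Log-sum-exp pooling of per-cell scores into one image-level score.

    Large r approaches max pooling; small r approaches mean pooling.
    Subtracting the max keeps exp() from overflowing.
    """
    m = max(cell_scores)
    n = len(cell_scores)
    return m + math.log(sum(math.exp(r * (s - m)) for s in cell_scores) / n) / r


def cell_weights(cell_scores, r):
    """Share of the image-level gradient each cell receives: softmax(r * s)."""
    m = max(cell_scores)
    exps = [math.exp(r * (s - m)) for s in cell_scores]
    total = sum(exps)
    return [e / total for e in exps]
```

For scores `[3.0, 1.0, 0.5, 0.2]`, `r = 10` sends more than 99% of the gradient to the first cell, while `r = 0.01` gives every cell roughly a quarter.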
Emma Lundberg (00:52:11):
But before I move further on, I just want to show that when inspecting these models, we can use, for example, class activation mapping to visualize with heat maps the regions of the image that are important for making the prediction and the classification. I just want to show how beautiful this is, also to make biologists trust these types of models. These are all single cells where we see a mixture of two patterns, like here, mitochondria and nucleus. We can see that when the model is classifying the mitochondria, it’s looking in the correct regions in the cell, and when it’s classifying the nucleus, it’s looking in the correct region. Even for overlapping patterns, like a nuclear body, one nuclear body here, and the nucleoplasm, we can see that the model actually focuses on the right regions when making these decisions. So I think that this single cell model is very, very good at focusing its attention in the right places, and we’re quite happy with it.
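Class activation mapping in its original form applies to networks that end in global average pooling followed by a linear classifier. The heat map for class c is the class’s weighted sum of the last convolutional feature maps, $M_c(x, y) = \sum_k w_k^c f_k(x, y)$. The sketch below computes that sum on plain nested lists. It assumes the feature maps and class weights have already been pulled out of a trained model, and the function name is hypothetical. The talk does not say which CAM variant was used.

```python
def class_activation_map(feature_maps, class_weights):
    """CAM for one class: M(x, y) = sum_k w_k * f_k(x, y).

    feature_maps: K maps from the last convolutional layer, each a list of rows.
    class_weights: K weights linking the pooled features to the chosen class.
    Returns the map min-max rescaled to [0, 1] for display as a heat map.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[sum(w * fmap[y][x] for w, fmap in zip(class_weights, feature_maps))
            for x in range(width)]
           for y in range(height)]
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = (hi - lo) or 1.0  # a constant map would otherwise divide by zero
    return [[(v - lo) / span for v in row] for row in cam]
```

Running the same feature maps with the weights of two different classes, for example mitochondria and nucleus, yields two heat maps. Checking whether each one lights up the expected region is the inspection described above.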
Emma Lundberg (00:53:03):
And again, if we use this model to extract features, the plots become very big, because imagine the number of cells that we have in total in the Human Protein Atlas Cell Atlas, but the separation looks very similar. We still get this nuclear cloud, we get this cytosolic cloud, and we can distinguish all the different locations. So it’s basically a similar UMAP distribution, but with slightly better separation of the classes. Sorry, I’m removing some of the clusters here, just to make it easier to look at. What is interesting, of course, is that we can look at certain classes of locations where we see a large spread in this UMAP pattern. For example, if we look at punctate or vesicular patterns, this is where we see a great spread.
Emma Lundberg (00:53:45):
And we haven’t really focused a lot on resolving different phase-separated structures or punctate structures in the cell, and this is basically what we’re seeing here. So maybe we’re now at the point where we can start to resolve them. We’re also quite excited to use this machine learning model to study the dynamic spatial regulation of the cellular proteome and to identify proteins that move to different compartments of the cells, maybe in relation to cell shape as well. But this is very much work in progress.
Emma Lundberg (00:54:16):
So the final couple of minutes here, before I end, are about our work to try to model cell shape and structure. If we think of cells, they are really hierarchical systems, and we measure protein distribution here at the organelle level. But if we really want to understand function, we also want to understand how proteins form protein complexes and assemble into pathways and everything here. We want to model all of this. So we were thinking: what if we try to integrate image data with protein-protein interaction data systematically? Then we can actually capture the hierarchical architecture of cells.
Emma Lundberg (00:54:51):
This has been a collaboration with Professor Trey Ideker at UCSD, also published at the end of last year. Here we used the HEK cell model and only 700 proteins for this proof of concept work; it’s a pilot study to demonstrate the computational concept, basically. We used protein-protein interaction data from the BioPlex database and images from our own database, embedded these different data modes using our machine learning model or a node2vec model, and basically used these feature descriptors of each protein’s distribution in the cell: either the distribution in relation to cellular landmarks, or the distribution in relation to its immediate neighboring proteins. We can measure pairwise protein distances, we can calibrate them to physical distances in nanometers, which is cool, and then we can build this type of multi-scale hierarchy of the cell. You can think of it like the cellular component ontology in Gene Ontology, for example, but a purely data-driven one.
Emma Lundberg (00:55:53):
So just the first sanity check, to the left here: of course, similar pairs in each data set are enriched for similarity in the other data set, so that makes sense. I often get questions about the calibration to scale. We realized that we can do this very naively: the size of a system in the literature often correlates pretty well with the number of known protein components in that system. So we generated a calibration function and trained a supervised machine learning model to estimate physical proximity from the IF and AP-MS embeddings.
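The talk does not give the form of the calibration function. One simple way to turn "size correlates with number of known components" into a calibration curve is a power law fitted by least squares in log-log space, $\log d = \log a + b \log n$, where $d$ is the system diameter and $n$ is its component count. The sketch below assumes that form purely for illustration. The real MuSIC calibration may differ, the function names are hypothetical, and the supervised model that predicts proximity from embeddings is not shown.

```python
import math


def fit_size_calibration(component_counts, diameters_nm):
    """Least-squares fit of log(diameter) = log(a) + b * log(n).

    Assumes, hypothetically, that system diameter scales as a power of
    the number of known protein components. Returns (a, b).
    """
    xs = [math.log(n) for n in component_counts]
    ys = [math.log(d) for d in diameters_nm]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return a, b


def predict_diameter(n_components, a, b):
    """Calibrated diameter (nm) for a system with n_components proteins."""
    return a * n_components ** b
```

Fitting in log space keeps large organelle-scale systems from dominating the fit over small complexes. Holding some systems out of the fit and comparing predicted with known diameters mirrors the check described in the next passage.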
Emma Lundberg (00:56:28):
And then, in the typical Trey Ideker manner, we built this hierarchical model of the cell, starting with the most stringent, tight complexes at the bottom of the hierarchy, then gradually relaxing the parameters to find larger and larger systems. We could also verify the predicted versus actual diameter for systems that we did not use for the calibration, and it holds up pretty well.
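The idea of starting from tight complexes and relaxing a threshold to find larger systems can be illustrated with single-linkage grouping. At each distance cutoff, proteins joined by a chain of pairs closer than the cutoff form one system, and raising the cutoff only merges systems, so the levels nest. The published pipeline uses its own community-detection method. Single linkage via union-find is swapped in here as an assumption, and the function names are hypothetical.

```python
def components_at(distances, proteins, cutoff):
    """Systems at one cutoff: connected components of pairs with distance <= cutoff.

    distances: dict mapping (protein_a, protein_b) -> calibrated distance.
    """
    parent = {p: p for p in proteins}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving keeps trees shallow
            p = parent[p]
        return p

    for (a, b), d in distances.items():
        if d <= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for p in proteins:
        groups.setdefault(find(p), set()).add(p)
    return sorted((frozenset(g) for g in groups.values()),
                  key=lambda g: (-len(g), sorted(g)))


def build_hierarchy(distances, proteins, cutoffs):
    """Multi-protein systems at each cutoff, from most stringent to most relaxed."""
    return [(c, [g for g in components_at(distances, proteins, c) if len(g) > 1])
            for c in sorted(cutoffs)]
```

Single linkage is the simplest way to guarantee nesting, but it can chain unrelated proteins through one close pair. That weakness is one reason dedicated community-detection methods are usually preferred.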
Emma Lundberg (00:56:50):
In the end, we generated this map that we call MuSIC, the Multi-Scale Integrated Cell, and we could reveal, out of these 700 proteins, 69 different cellular systems. Half of these systems were not documented before in, for example, Gene Ontology or the literature; those are all the purple systems here. Then we spent most of the time doing validation experiments and revealing processes, for example, cross-talk between cytoplasmic and mitochondrial ribosomes, and identifying roles for poorly characterized proteins.
Emma Lundberg (00:57:20):
I should point out that one of the keys to this type of modeling is, of course, that MuSIC captures the pleiotropic effects of multi-localizing proteins, so one protein can be in several nodes here, several systems, and it gives protein-protein interactions a spatial dimension. We believe that this is a very interesting way of modeling cell activity or cell systems to get insights about potential mechanisms, for example. We’re working on a whole proteome model, and later we plan to add dynamics and other things to this. So this has been a very fun project.
Emma Lundberg (00:57:56):
And I just want to point out that if we try to build the same map with image data alone, we find three systems; with PPI data alone, we find 11 systems; but with both, we really get this synergistic effect of 69 different systems. But, a few more minutes. MuSIC modeling is great, but again, from my perspective, we’re basically turning our single cell data into bulk data, and we’re losing information about shape. So at the same time, we’re also trying to see if we can map all the protein expression from our images into a common shape space to start to explore similarities in protein expression.
Emma Lundberg (00:58:32):
So here, very similarly to the work that the Allen Institute for Cell Science has done (they are collaborators on this project), we’ve built this shape mode pipeline, where we basically assume that the two-dimensional shape of our flat cells is relevant. We can capture the outlines of the cell through a Fourier transformation, and then with a PCA, the top 12 PCs for U2OS, for example, explain 95% of the shape variance. We can then look at the variance along these different shape modes.
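The two steps described here, Fourier-transforming the outline and then choosing enough principal components to reach 95% of the shape variance, can be sketched as below. Treating the outline as a complex signal and taking normalized DFT magnitudes is an assumed, textbook descriptor, not necessarily the lab’s. The PCA itself (an eigendecomposition of the descriptor covariance) is left out, and only the component-count rule is shown. Both function names are hypothetical.

```python
import cmath


def fourier_descriptors(outline, n_harmonics=4):
    """Shape descriptors from the DFT of a closed 2D outline.

    outline: (x, y) points sampled in order around the cell boundary,
    assumed counterclockwise so that the k = 1 harmonic dominates.
    Dropping c_0 removes position, dividing by |c_1| removes size, and
    taking magnitudes removes rotation and the choice of start point.
    Returns [|c_1|, |c_-1|, |c_2|, |c_-2|, ...] divided by |c_1|.
    """
    z = [complex(x, y) for x, y in outline]
    n = len(z)

    def coeff(k):
        # k-th discrete Fourier coefficient of the complex outline signal
        return sum(zt * cmath.exp(-2j * cmath.pi * k * t / n) for t, zt in enumerate(z)) / n

    scale = abs(coeff(1))
    descriptors = []
    for k in range(1, n_harmonics + 1):
        descriptors.append(abs(coeff(k)) / scale)
        descriptors.append(abs(coeff(-k)) / scale)
    return descriptors


def n_components_for(explained_variances, target=0.95):
    """Smallest number of leading PCs whose cumulative variance share reaches target."""
    total = sum(explained_variances)
    cumulative = 0.0
    for k, v in enumerate(sorted(explained_variances, reverse=True), start=1):
        cumulative += v
        if cumulative / total >= target:
            return k
    return len(explained_variances)
```

For a circle every descriptor after the first is zero. For an ellipse with semi-axes 3 and 1, the k = -1 term is 0.5, so elongation shows up directly in the descriptor vector. Because magnitudes discard phase, two different outlines can occasionally share descriptors. A real pipeline would likely keep aligned complex coefficients before running PCA.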
Emma Lundberg (00:59:02):
This was supposed to be a moving image; sorry, it seems to be static, but we can identify these different shape modes. For example, some of them might relate to the elongation of the cell, and some might relate to the size of the cell or the size of the nucleus in relation to the cell. So we have these different modes, and we can now go back to our image collection and map the protein intensity onto these different shape modes. Then we can start to ask questions. For example, in D here, it’s just a sanity check: we ask, show us all the proteins that do not change their abundance with orientation or any of the shape modes here besides the relative nuclear size. And we found a subset of nuclear proteins that are involved in the cell cycle, which makes sense. In similar ways, maybe we can look for proteins that are not involved with size changes of cells but are involved in elongation changes, and might be involved in migration, for example. So it’s just a different way of aggregating the data, finding common patterns, and doing guilt-by-association mapping of proteins, basically.
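A query such as "proteins whose abundance changes with relative nuclear size but with no other shape mode" can be approximated with a simple correlation screen over per-cell measurements. The sketch below is a minimal version of that idea. It assumes per-cell intensities and per-cell shape-mode coordinates are already available in the same cell order. The mode names, the 0.5 threshold, and the function names are all hypothetical, and the talk does not describe the statistics actually used.

```python
def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def mode_specific_proteins(intensities, mode_scores, target_mode, threshold=0.5):
    """Proteins whose intensity tracks target_mode but no other shape mode.

    intensities: {protein: [per-cell intensity]}
    mode_scores: {mode_name: [per-cell coordinate along that shape mode]}
    """
    hits = []
    for protein, values in intensities.items():
        r = {mode: abs(pearson(values, scores)) for mode, scores in mode_scores.items()}
        others_flat = all(v < threshold for mode, v in r.items() if mode != target_mode)
        if r[target_mode] >= threshold and others_flat:
            hits.append(protein)
    return sorted(hits)
```

Swapping `target_mode` to an elongation mode gives the migration-oriented query mentioned above. With many proteins and few cells, a real analysis would also need a significance test and multiple-testing correction rather than a fixed threshold.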
Emma Lundberg (01:00:03):
And of course, long term, my lab, especially here at Stanford, will work towards spatial integrated cells, where we basically generate a super-multiplexed image from the images that we have, and a digital cell model that we can use to predict spatial perturbations in the cell that we can later verify experimentally. So with that, I’d just like to end and thank my group both here at Stanford and in Sweden; they’re all amazing. Thanks also to the Human Protein Atlas, funders, collaborators, and to all of you for listening.
Michael Fenn (01:00:37):
Thank you so much, Emma. That was fantastic.
Jill Bouchard (01:00:40):
Super engaging.
Michael Fenn (01:00:42):
If you have time, I think we had one or two additional questions. Kamran, I don’t know if you’re still here with us. I know you had a question.
Kamran Rizzolo (01:00:51):
I am. Thank you, Emma. This is an amazing talk. I do recall a lot of the work that you’ve done; I’ve been a huge fan. I’m very curious. I myself come more or less from the yeast world, working with yeast, synthetic genetic arrays, and genetic interactions, and we really found that to be a very powerful tool in that model organism for understanding what lies beyond physical interactions. I know it’s obviously not as straightforward in mammalian systems, but in the past five years or so there have been some pretty relevant studies trying to build genetic interaction maps. So I was just curious whether you’ve considered genetic interactions and, if you have, what you’ve found to be the most challenging parts.
Emma Lundberg (01:01:49):
Yeah, that’s a very good question, and maybe we should look into it again. We have not considered doing that type of assay ourselves, basically because there are other big efforts doing it. We do encourage them to use our cell lines, though. We did use some of that data from DepMap and other projects for the validation of the systems and the MuSIC modeling, and it was very, very powerful there. So I think that’s where we found the most use of it, but maybe we can think of adding it early on in the modeling in some way as well. It’s definitely one of the most valuable, let’s say orthogonal, data sets that we’ve found so far.
Kamran Rizzolo (01:02:30):
Okay, cool. Thank you.
Michael Fenn (01:02:34):
Great. And I think one last question here from Anthony, and then I think we’ll be able to follow up with additional questions via email, correct?
Emma Lundberg (01:02:45):
Yes, absolutely. Great.
Anthony Vega (01:02:48):
Yeah. So I think the transformer model that you introduced is super exciting; that technology in general is very cool, and I’m curious to see how it’s growing in the image field. Now that you’ve shown success with cellular populations, are there also plans to generalize this to tissue, or have you tried? I ask just because that’s the world I live in at the moment, and that’s desperately what we need.
Emma Lundberg (01:03:18):
Yeah, I know. So far, we have not. I think the shape-mode pipeline is interesting from that perspective because it’s really agnostic to imaging modality. Of course, in tissue, cells will have a 3D shape, but I think shape is maybe where we’ll see the most integration first, where you can move between different contexts, let’s say organoids, tissue, and cultured cells. Besides that, I have not seen any work where people try to classify patterns in both cells and tissues, except maybe a little with in situ sequencing and those types of methods. We’d love to do that with this organelle cell paint assay, but, as you know, it’s very challenging.
Anthony Vega (01:03:59):
Sure, sure. To get it at that resolution.
Emma Lundberg (01:04:01):
In tissue, yes. That’s why we’ve chosen to work with both pancreas and liver, which I think are pretty good because the cells are fairly uniform in size. It’s not like working with neurons; they’re among the easier tissues to work with. So we’re slowly getting there, but yeah.
Anthony Vega (01:04:17):
Yeah, because that’s really been the question when looking at tumors, for instance: what’s the difference in morphology, and which features in this tissue actually correlate with different genetics or whatever, right? Having transformers give us these more interpretable features would be super powerful. Anyway, great talk.
Emma Lundberg (01:04:39):
Yeah. And I think we’re seeing more and more highly multiplexed data sets being generated all over the world. So I think public repositories like the BioImage Archive, the one being run out of EBI, and maybe one in the US as well, will really help push that field forward.
Anthony Vega (01:04:56):
Yeah.
Michael Fenn (01:04:58):
Great. Okay. Thank you everyone for joining us today. Emma, we really appreciate this. This was a fantastic talk. Again, if you have questions that you’d like to follow up on with Emma via email, please send them our way, and Jill and I will help facilitate getting those questions to Emma. Okay. And again, we really appreciate it. Thank you, everyone.
Jill Bouchard (01:05:17):
Thanks everyone for coming. And thanks again, Emma, and join us next week for Bede. Thanks, Emma. Have a good one, everyone.
EXTENDED Q&A
Question from Shruti Jha: Is Enolase1 GFP-tagged or IFed?
Emma’s Response: Enolase1 was visualized with an antibody in the image that I showed…
Question from Alan Underhill: Amazing work. I’m curious: how do the approaches deal with low-abundance proteins?
Emma’s Response: Most imaging approaches deal OK with low-abundance proteins, provided the affinity of the antibody is high enough. A high local concentration of a protein also helps a lot, even if the protein is low-abundance (for example, a centrosomal protein).