Scientists across many therapeutic areas are striving to solve complex biological problems by measuring multiple analytes, on the premise that together these data will power deeper discovery. However, analyzing each data type independently is less effective and more time-consuming than analyzing them together. Cytobank’s new DROP feature allows scientists to apply machine learning algorithms to many data types, including these multi-analyte datasets, and to develop integrated insights quickly.
With bulk data, unsupervised machine learning algorithms on Cytobank can help you identify clinically relevant groups of samples by combining information from all of your markers at the same time. We’ll illustrate that here with NanoString® 3D Biology™ technology, which simultaneously analyzes up to 800 SNVs, RNAs, proteins, and phospho-proteins from the same sample. In this example, the assays profile 104 SNVs and small InDels, 192 RNAs, and 28 total and phospho-proteins in 144 samples.
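Combining analytes in a single analysis usually means putting them on a comparable scale first, since RNA and protein measurements may sit on very different numeric ranges. The sketch below shows one common approach: z-score each marker across samples, then concatenate the scaled values per sample. This is an illustrative assumption, not a description of how Cytobank processes the data; the values and the `zscore_columns` and `combine` helpers are hypothetical.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Z-score each column (marker) across rows (samples)."""
    columns = list(zip(*rows))
    scaled_columns = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        # A constant marker carries no information; map it to zeros.
        scaled_columns.append([(v - mu) / sigma if sigma else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_columns)]

def combine(rna_rows, protein_rows):
    """Concatenate scaled RNA and protein values sample by sample."""
    if len(rna_rows) != len(protein_rows):
        raise ValueError("both tables must describe the same samples")
    scaled_rna = zscore_columns(rna_rows)
    scaled_protein = zscore_columns(protein_rows)
    return [r + p for r, p in zip(scaled_rna, scaled_protein)]

# Hypothetical measurements: 3 samples, 2 RNA markers, 1 protein marker.
rna = [[1200.0, 15.0], [300.0, 45.0], [900.0, 30.0]]
protein = [[5.0], [9.0], [7.0]]
combined = combine(rna, protein)
```

Each row of `combined` is one sample described by all of its markers, which is the shape of input that dimensionality reduction methods generally expect.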
Applying viSNE to the protein and RNA markers shows us that there are groups of samples that have similar expression across the protein and RNA markers. Coloring by expression of each marker illustrates why the samples are segregated this way. For example, we see that only one of the large “islands” contains samples that express RAF1 mRNA, whereas another contains samples that express MAPK1 mRNA (Figure 1, top). We can see similar patterns with protein expression, where large islands on the viSNE map contain samples with differential expression of p53 and phospho-ERK. We also see differential expression of markers within the samples in the large islands. For example, phospho-GSK is expressed only in the samples that separate to the inside of each island, and MAPK3 mRNA is expressed only in some of the samples in one of the islands (Figure 1, bottom). The SNV data revealed that these samples are a mix of homozygous wild-type BRAF, heterozygous BRAF V600E, and homozygous BRAF V600E. Overlaying this information on the viSNE map supports the conclusion that these mutations drive the RNA and protein expression differences that segregate these samples.
To define groups of samples that we could further characterize and analyze, we applied the hierarchical clustering method SPADE to the viSNE map coordinates (Figure 2). The resulting groups can then be overlaid on the viSNE map, and marker expression can be viewed on the SPADE tree, to visualize how these groups capture the patterns of expression we explored above.
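The idea of clustering on map coordinates can be sketched with a plain average-linkage agglomerative clustering over 2D points. This is a simplified stand-in, not SPADE itself (which involves additional steps and produces a tree), and the coordinates and the `agglomerative` function are hypothetical.

```python
from math import dist

def agglomerative(points, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between members of two clusters.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [(x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))]
        x, y = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        # y > x, so popping y leaves the cluster at index x in place.
        clusters[x].extend(clusters.pop(y))

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Hypothetical 2D map coordinates for six samples forming two "islands".
coords = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
labels = agglomerative(coords, n_clusters=2)
```

Each entry of `labels` assigns one sample to a group, which can then be overlaid on the map or compared with sample annotations.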
After defining groups of samples based on combined 3D Biology marker expression, we used tools in Cytobank’s platform to see how these groups correlate with known characteristics of the samples. The samples came from 3 different melanoma cell lines that were treated with vemurafenib, DMSO or Calyculin A for varying lengths of time.
Overlaying these variables on the viSNE map and inspecting Cytobank’s summary tables shows us that these variables correlate with the groups we defined based on marker expression (Figure 3). A quick verification of marker expression patterns with heatmaps and histogram overlays stratified by these known characteristics confirms that the expression signatures driving the unsupervised sample groups match those seen across the known sample characteristics (Figure 4).
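One simple way to quantify how well unsupervised groups line up with a known variable is to cross-tabulate the two and compute a purity score. The sketch below is a generic illustration, not a reproduction of Cytobank’s summary tables; the labels and the `cross_tabulate` and `purity` helpers are hypothetical.

```python
from collections import Counter

def cross_tabulate(groups, annotations):
    """Count how many samples fall into each (group, annotation) pair."""
    if len(groups) != len(annotations):
        raise ValueError("one group and one annotation are needed per sample")
    return Counter(zip(groups, annotations))

def purity(groups, annotations):
    """Fraction of samples whose annotation is the most common one in their group."""
    table = cross_tabulate(groups, annotations)
    best = {}
    for (group, _), count in table.items():
        best[group] = max(best.get(group, 0), count)
    return sum(best.values()) / len(groups)

# Hypothetical group assignments and treatment labels for six samples.
groups = [0, 0, 0, 1, 1, 1]
treatment = ["DMSO", "DMSO", "vemurafenib", "vemurafenib", "vemurafenib", "vemurafenib"]
table = cross_tabulate(groups, treatment)
score = purity(groups, treatment)
```

A purity close to 1 means each group is dominated by a single annotation value; lower values suggest the groups are driven by something other than that variable.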
This example illustrates the power of applying unsupervised machine learning algorithms on the Cytobank platform to NanoString’s 3D Biology data in order to efficiently extract relevant information and understand the similarities and differences between samples based on combined expression of hundreds of multi-analyte markers.
Want to try it for yourself?
- DROP is coming soon to Cytobank Enterprise! Contact us at info@cytobank.org if you want to enroll in our Beta test.
- Get ahead of the game: start generating some data with NanoString’s powerful 3D Biology assays.
- Create a new account to access a free 30-day trial of Cytobank Premium, and test out our machine learning algorithms with demo datasets or your own cytometry data.