Starting Data Analysis and Obtaining Fluorometry Data
This past week I decided to finally sift through the crazy DLS data and see if I could shed some light on what in the world is happening. I've included links to the Google Colab sessions I used to code everything up, and I highly recommend taking a look at those if you're interested. I've also included some of the primary charts and graphs in this post. Let's get started!
Starting DLS Data Analysis
My first step in data analysis was to assemble all of the raw Excel sheets I exported from the DLS instrument into one coherent, analysis-ready file. I deleted redundant data and data that didn't add any value for my current purposes (like time data and viscosity, for example). I mainly focused on intensity, amplitude, PDI (polydispersity index), and a whole host of other things that go into calculating the hydrodynamic radius. I stored the whole file as a .csv file.
I loaded the .csv file into Google Colab and then got to work using pandas (a Python library for data work-up). I pretty much just had to fill in empty cells with something that wasn't empty, but also not 0 (which would skew the statistics). I ultimately filled those cells with the mean of each column, so the code had something it could work with (floating point numbers) without shifting each column's average. The trade-off is that this shrinks the spread of a column a little, since the filled-in cells all sit exactly at the mean.
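As a rough illustration of that fill-in step, here is a minimal pandas sketch. The table, column names, and values are made up for the example and are not the real DLS export, and the actual notebook may do this differently.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the exported DLS table (made-up names and values).
# With a real file this would be something like pd.read_csv("your_file.csv").
df = pd.DataFrame({
    "pdi": [0.21, np.nan, 0.35, 0.28],
    "radius_nm": [110.0, 95.0, np.nan, 130.0],
    "amplitude": [0.80, 0.75, 0.90, np.nan],
})

# Replace each missing cell with the mean of its own column.
filled = df.fillna(df.mean(numeric_only=True))

# The column means are unchanged, but the standard deviation drops
# because every filled-in cell sits exactly at the column mean.
spread_change = df.std() - filled.std()

print(filled)
print(spread_change)
```

Filling with 0 instead would drag every column mean toward zero, which is why the mean is the gentler choice here.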
My first analysis step was to perform what's called Principal Component Analysis (PCA) on my data. PCA is super useful when your data has a ton of features (different columns) like mine does. Each feature (PDI, radius, amplitude, etc.) can be thought of as a dimension, so when I have 35 features per data sample, things can get complicated fast. PCA takes higher-dimensional data and projects it into a 2D or 3D space, choosing the new axes to be the directions along which the data varies the most (I still don't fully understand the math behind this).
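For anyone curious about the math, the core of PCA fits in a few lines of NumPy. This is only a sketch on random synthetic data with the same 35-feature shape, not the code from the Colab notebook; the function name `pca_project` is made up for this example.

```python
import numpy as np

def pca_project(X, n_components=3):
    """Project the rows of X onto the top principal components."""
    X = np.asarray(X, dtype=float)
    # Standardize each feature so large-unit columns do not dominate.
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns alone instead of dividing by 0
    Z = (X - X.mean(axis=0)) / std
    # SVD of the standardized data: the rows of Vt are the principal directions,
    # ordered from most to least variance.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    # Fraction of the total variance captured by each component.
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]

# Synthetic data: 40 samples with 35 features each (random, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 35))

scores, explained = pca_project(X, n_components=3)
print(scores.shape)  # one 3D point per sample, ready for a 3D scatter plot
print(explained)     # share of variance along each of the three axes
```

The `explained` values are worth checking on real data: if the first three components only capture a small share of the variance, the 3D plot is hiding a lot of what's going on.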
Additionally, PCA can also provide some information (but not comprehensively) on how the data clusters together. This is really what I want because I want to see how buffer with mercury ions is different than, say, buffer with both mercury and DNA in solution. Below is the first PCA I did on the first dataset I collected:
Figure 1: PCA of DLS dataset including four different categories
I know it looks a little goofy here and if you want to see the whole graph and actually be able to rotate it in 3D please take a look at the colab link!
The yellow clusters correspond to buffer with DNA and mercury, so those are hopefully the nanoparticles I'm looking for. Purple corresponds to buffer with just mercury, orange corresponds to just buffer with DNA, and blue corresponds to just the background buffer.
As you can see, it's not perfect, but there is definitely a separation, and it's not too bad either. This is exactly what I want. I want to see DNA and mercury behave differently than just mercury or just DNA.
When I did this PCA, I just used HEPES as the buffer. When I tested it with PIPES, I got different variance and clustering and it actually wasn't as separated. It looked like with the PIPES, a lot of the signals seemed suspiciously close to that of just buffer and mercury.
Figure 2: PCA of DLS dataset including both HEPES and PIPES data
You can see that the yellow data points are not as separated from the other colors as they were in Figure 1. A lot of them occupy the same territory as the buffer and mercury data points. Again, please see the full figure in this colab page.
Additionally, in the Colab notebook linked above, I included a UMAP graph and an HDBSCAN graph of the PCA data. UMAP is another dimensionality reduction method, and it can pick up more complex patterns in the data than PCA can. PCA only captures linear relationships, though even so it's a good overall indicator of what's going on. However, knowing how my DLS data behaves, it is definitely not linear all the time. UMAP is good at finding nonlinear patterns, and it tries to keep nearby points together while still preserving some of the global structure. HDBSCAN is a separate tool: it's a density-based clustering algorithm that groups points sitting in dense regions and marks the stragglers as noise. You can see all of that in the Google Colab notebooks.
The next step is to do even more data analysis and then hopefully move on to developing a machine learning classifier to characterize each of the four types of signals! That way, when I feed it new data straight from the DLS instrument, I can have a good idea of whether a signal is mostly DNA/mercury nanoparticles, or more likely just buffer/mercury interactions.
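To make the classifier idea concrete, here is about the simplest baseline there is: a nearest-centroid classifier, which assigns a new sample to whichever category's average point it is closest to. This is not the model I've built (I haven't built one yet), and the data below is synthetic 2D data with made-up cluster positions, standing in for PCA-reduced DLS measurements.

```python
import numpy as np

def fit_centroids(X, y):
    """Compute the mean point (centroid) of each labeled category."""
    labels = np.unique(y)
    centroids = np.array([X[y == lab].mean(axis=0) for lab in labels])
    return labels, centroids

def predict(X, labels, centroids):
    """Assign each sample to the category with the nearest centroid."""
    # Distance from every sample to every centroid.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return labels[np.argmin(dists, axis=1)]

# Synthetic training data: four well-separated clusters (made-up positions).
rng = np.random.default_rng(1)
names = np.array(["buffer", "buffer+DNA", "buffer+Hg", "buffer+DNA+Hg"])
centers = np.array([[0, 0], [4, 0], [0, 4], [4, 4]], dtype=float)
X = np.vstack([c + 0.3 * rng.normal(size=(20, 2)) for c in centers])
y = np.repeat(names, 20)

labels, centroids = fit_centroids(X, y)

# Two hypothetical new measurements to classify.
new_samples = np.array([[3.9, 4.2], [0.1, -0.2]])
pred = predict(new_samples, labels, centroids)
print(pred)
```

Real DLS clusters overlap far more than these, so a baseline like this is mainly useful as a yardstick for whatever fancier model comes next.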
Moving on to Fluorometry
This week I also looked at analyzing the DNA/mercury solution with a spectrofluorometer. This instrument (in the way I used it) basically operates by exciting a given fluorophore (a fluorescent molecule) at its excitation wavelength and then scanning across a range of wavelengths to see where the fluorophore emits light. A fluorophore normally emits light at a longer wavelength (lower energy) than it's excited at, because some of the absorbed energy is lost before the light is emitted. This gap is called the Stokes shift.
So what exactly did I do? I used a fluorophore called Thiazole Orange (TO). TO works by intercalating between the base pairs of DNA. When it does so, it emits light around 535 nm. However, when it is not able to bind to DNA and is just floating around in solution, it doesn't really give off much light. This is because when it is excited at 510 nm, it usually sheds that energy by twisting around a bond within the molecule, so the energy is lost as motion instead of light. However, when it is wedged between stacked DNA bases, this bond is not free to rotate, and the energy is "forced" to escape as light, which the machine picks up on.
So I made samples with HEPES buffer, mercury, and my single-stranded DNA and tested several controls in the instrument (just buffer with TO, buffer with mercury and TO, and buffer with single-stranded DNA and TO). Then, I tested buffer with DNA, mercury, and TO to see whether the mercury was inducing the single-stranded DNA to form particles in which the strands are close enough together for TO to intercalate. Long story short, the data was wonderful and pretty much confirmed this!
Figure 3: TO intensity vs. wavelength in sample with single stranded DNA
From this graph we can see that even the single-stranded DNA results in some emission. This could just be because some of the single-stranded DNA is interacting with itself, which is somewhat expected. This graph is a close-up of the 520-560 nm range. However, when we look at the next graph for DNA and mercury over a range of 450-700 nm, we get this:
Figure 4: TO intensity vs. wavelength in sample with single stranded DNA and mercury.
From this graph we can definitely see an immense peak right around 530-535 nm, which is about the literature value for TO emission.
Therefore, mercury is definitely complexing with the DNA and bringing the strands so close together that the Thiazole Orange is able to intercalate between them!
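Reading a peak position off a scan like Figure 4 can also be done in code. The sketch below uses a synthetic, noise-free Gaussian spectrum with an assumed peak at 533 nm; it is not my measured data. The only values taken from the experiment are the 450-700 nm scan range and the 510 nm excitation wavelength.

```python
import numpy as np

# Synthetic emission scan for illustration only (not measured data).
wavelengths = np.arange(450.0, 701.0, 1.0)   # scan range in nm, 1 nm steps
peak_center, width = 533.0, 15.0             # assumed peak position and width
baseline = 20.0                              # assumed flat background signal
intensity = 1000.0 * np.exp(-0.5 * ((wavelengths - peak_center) / width) ** 2) + baseline

# The emission maximum is simply the wavelength with the highest intensity.
peak_idx = np.argmax(intensity)
peak_wavelength = wavelengths[peak_idx]

# Distance between the excitation wavelength and the emission maximum.
excitation_nm = 510.0
stokes_shift_nm = peak_wavelength - excitation_nm

print(peak_wavelength)   # 533.0
print(stokes_shift_nm)   # 23.0
```

On a real, noisy scan it would be safer to smooth the curve first or fit a peak shape, since a single noisy point can otherwise win the `argmax`.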
So, that's where I'm at now. Soon I may start preparing for Scanning Electron Microscopy analysis!
Thank you for reading and see you next time!