Analyzing Star Data with Power BI and Python
Observing the stars with a telescope is a humbling and relaxing activity, but I'm no astronomer, or even an amateur astronomer. But I wanted to get a better understanding of what I was looking at in the night sky. I wanted to dig into some of the data that's publicly available to us - data like size, luminosity, temperature, and distance of the stars. I ended up finding a great website which has a lot of the juicy data that I wanted on the exo-planet archives here: https://exoplanetarchive.ipac.caltech.edu/
I found this website through a YouTube video by Anton Petrov here.He does a great job at explaining what the data is and how you can do your own analysis of the data. I really recommend you watch his video!
If you are already an amateur astronomer, or even a professional, you can use Power BI to host all of your graphical data. It's great for formal presentations or just personal projects. The nice thing is, if you already have R or Python scripts, (which a lot of astronomers do, I hear) you can use those same scripts directly inside the software! If you are not familiar with Power BI, it is a powerful business intelligence tool used to quickly find trends in your data. The graphs are all interactive so you can essentially 'find' hidden patterns and relationships. It's also a free tool to use, assuming you have a subscription to the Microsoft Suite. You can find more information on Power BI at PowerBI.com.
This blog will go over how you can create Python graphs within Power BI. If you are already familiar with Power BI, but don't know anything about Python, this blog may help you as well!
The website I mentioned above contains data about stars and their exoplanets. Some of the fields you can grab are the distance of the star, brightness, mass, radius and temperature, just to name a few. The data I pulled only contained stars with confirmed exoplanets only. So, there are also fields specific to the planet, like planet density, orbital period, radius, etc.. You can easily select the columns you want for your own project via the "Select Columns" button on the top left. To download the data, simply use the "Download Data" button.
After importing my data set from the archives, I created some graphs from the standard Power BI set (note, this data set is only comprised of stars with confirmed exoplanets - hence it does not contain any stars without exoplanets):
With some Googling help, I tried my best to recreate the same charts with Python using the Seaborn library (with some variation):
Note I'm using Python 2.7 for my visuals. If you want to read more about getting started with Python and Power BI, please read my other article here: Using Python To Get Your Data. If you are interested in creating Power Query functions using Python, check out this article: Creating a Function with Python.
Using a bit of DAX, I created some calculated columns to get the log 10 for several fields like temperature and mass. I wanted to do this because the log 10 puts all data points on a relative axis. Trends that may be hidden due to a large range of data points may be discovered using this method. I also categorized the stars mass and radius into "0-1", "1-2", "2-5", and "5+ " buckets:
Stellar Radius Range = var r = Data[st_rad] return if(r<1,"0 - 1 times",if(r<2,"1 - 2 times",if(r<5,"2 - 5 times","5 + times")))
The relationship between temperature and mass was an expected positive correlation, however, I was surprised to find that radius does not always correlate to temperature. For example, stars which have a radius of 5+ times our sun don't typically get hotter than 5000 kelvin (note that's radius not mass).
As of this writing, Python graphs are not interactive like the standard Power BI graphs, but all filters work as together. So if you create a filter in Power BI, selecting a category in the filter will also filter any Python or R graphs you have on your dashboard. Similar to the above screenshot, you can highlight and expand to fill the screen with any selected graph. For example, here is an expanded view of the Python boxplot:
We can see that stars 1-2 times the mass of our sun account for the majority of our data points. This category also seems to have the widest variance in brightness (G-band mag).
One advantage of using the Python scatterplot, is that it gives you a regression line to see the trend better. The standard Power BI also has a trend line, but it doesn't have the cool translucent bands around it:
To accomplish this graph, I'm using the simple Python code here:
import matplotlib.pyplot as plt import seaborn as sns sns.set_style("darkgrid") sns.regplot(x = dataset["Log_st_teff"], y= dataset["Log_st_mass"], data=dataset) plt.xlabel("Log Temp (K)") plt.ylabel("Log Stellar Mass") plt.show()
Note, you need to add the "plt.show()" in order for Power BI to render the graph. Power BI does a bit of housecleaning for us automatically when we initiate a Python visual.
You'll notice Power BI removes duplicates. If you want to keep duplicates, you can work around this by adding an index to your data set first in the Power Query editor. Then include that index in your Python visual. For more information on how to add an index, see here: https://msdn.microsoft.com/en-us/query-bi/m/table-addindexcolumn
We can add a category to this scatterplot, but it creates a bit of chaos with the regression lines as it lays out 4 of them (1 for each category). Turning them off gives us a nice, clean version which mimics the standard Power BI scatterplot:
Comparing Power BI and Python Visuals
For me, the Python visuals are great for custom charts and graphs. The amount of detail you can add far exceeds the customization of Power BI's visuals. In this article, I only explored Seaborn visuals, but I know there are many more options available within the Python library. The down side of using Python visuals, (at least for me) is that they don't update / refresh as fast as Power BI's native visuals. On average, it took about 5 seconds for my Python visuals to refresh, whereas the Power BI visuals react almost instantly when a filter is applied. Another thing to note is, as of this writing Python visuals are not supported on the web version of Power BI - they only work on the desktop version. Hopefully the developers over at Power BI update this soon.
As for the Power BI visuals, the interactive elements can't be beat. It makes finding trends so easy. And it's actually fun playing around with it to find new and unexpected patterns. If anything, Power BI can serve as a hub to host all of your Python graphs in one central location.
If you want to see the dashboard I made for this article for yourself, please feel free to check it out on my dashboards page, as well as some others.