# Lecture 26 (5/25/2022)¶

Announcements

• Today’s lab + OH on zoom as well! See announcement from Purva

• This week’s lab due Friday 5/27!

• Plan for upcoming lectures

• Friday 5/27: last “official course content” lecture

• Monday 5/30: no class (holiday)

• Wednesday 6/1: class in ERC 117 for final project presentations

• Sign up for a slot here

• Friday 6/3: special topic: APIs

Last time we covered:

• Dimensionality reduction: intro to Principal Components Analysis

Today’s agenda:

• Interpreting PCA results (cont’d from last time) + Evaluating PCA

• These are kind of interchangeable so we’ll mostly be presenting interpretation and evaluation measures together

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


# Interpreting Principal Components Analysis¶

## Review¶

On Monday, we walked through the basics of how PCA works and how to implement it with the sklearn PCA class.

As a reminder, we’re looking for lines like the blue and red ones below, which form the principal components of our data.

(Source)

These lines have two key properties:

1. They represent the axes on which our data has the highest variance (the first principal component is the highest, the second is the second highest, …)

2. They are orthogonal to each other, meaning they are independent as predictors of our data

Because of these properties, when we project our data onto the principal components, we can often describe most of the variance in our high-dimensional data with only a few principal component axes. In other words, they provide a high fidelity summary of what our data is doing without needing all the original dimensions.

(Source)

For this reason, PCA is one of the most popular dimensionality reduction techniques in modern data science.

Today, we’re going to talk about how to interpret and evaluate our PCA resuls.

## Example: low-dimensional representation of pokemon attributes¶

Today, we’ll use the pokemon dataset, which we’ve discussed in previous lectures and assignments, to create a low-dimensional encoding of pokemon attributes.

In the data below, take a look at the columns indicating each pokemon’s effectiveness (HP, Attack, Defense, Sp. Atk, Sp. Def, Speed); we need a very high-dimensional representation of each pokemon if we use all these columns to cluster or classify them!

# Read in the data
pokemon

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 318 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 405 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 525 80 82 83 100 100 80 1 False
3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 122 120 80 1 False
4 4 Charmander Fire NaN 309 39 52 43 60 50 65 1 False
... ... ... ... ... ... ... ... ... ... ... ... ... ...
795 719 Diancie Rock Fairy 600 50 100 150 100 150 50 6 True
796 719 DiancieMega Diancie Rock Fairy 700 50 160 110 160 110 110 6 True
797 720 HoopaHoopa Confined Psychic Ghost 600 80 110 60 150 130 70 6 True
798 720 HoopaHoopa Unbound Psychic Dark 680 80 160 60 170 130 80 6 True
799 721 Volcanion Fire Water 600 80 110 120 130 90 70 6 True

800 rows × 13 columns

Let’s find the principal components of these pokemon behavior attributes to create a lower-dimensional representation:

from sklearn.decomposition import PCA

# Use these columns as the basis for PCA
cols = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']

# Fit the PCA class to our data
pca = PCA(random_state = 1).fit(pokemon.loc[:, cols])
pca

PCA(random_state=1)


How effective is PCA here?

• Look at the explained variance ratio to see how many principal components we need to account for a large amount of the variance in our data

### Proportion of explained variance: how well do our principal components summarize the data?¶

A good first step after running PCA is to see how well each successive component accounts for variance in our original data. Remember, the principal components are identified in order of how much of the variance in the data they can explain.

A good PCA result will show that you can account for a large percentage of the variance in the underlying data with much fewer dimensions.

sns.pointplot(x = np.arange(1, pca.n_components_ + 1), y = pca.explained_variance_ratio_)
plt.xlabel("Principal component")
plt.ylabel("Proportion of additional variance explained")
plt.show()


The plot above suggests that we can explain 70-80% of variance in our (6-dimensional) data with just 2-3 dimensions. In other words, a lof the general pattern of our data is captured by a couple key axes.

We can confirm this by adding up the actual values from the graph above:

pca.explained_variance_ratio_[0] + pca.explained_variance_ratio_[1]

0.6484827653819596


So what are these axes?

Remember that a “principal component” is just a line through our data, expressed via weights on each of the existing dimensions that are a lot like regression coefficients. Like regression coefficients, these weights tell us about the pattern in our original variables that each principal component is capturing.

### Principal component weights: what are the key “axes” along which our data varies?¶

Below, we’ll plot the weights applied to each of our original dimensions to create the principal components. Ideally, these should give us some indication of the smaller number of “axes” that our data varies along.

sns.barplot(x = pca.components_[0], y = cols)
plt.title("Component 1")
plt.show()

sns.barplot(x = pca.components_[1], y = cols)
plt.title("Component 2")
plt.show()

sns.barplot(x = pca.components_[2], y = cols)
plt.title("Component 3")
plt.show()


How do we interpret these plots? What does each one mean?

Ideally, should have some interpretation in terms of more abstract patterns in the data.

Note: PCA can sometimes present challenges in interpreting the principal components. In some cases, they may index really clear aspects of the data. In other cases, they may be more ambiguous. For this reason, it’s best to have a grasp of the domain of the data when interpreting PCA. Like unsupervised clustering we talked about last week, it requires more subjective interpretation than our supervised methods.

### Transforming data onto principal components: how well do they summarize our data?¶

Since our principal components are new orthogonal lines drawn through our data, we can plot the value that each of our original data points takes on when projected onto these lines.

If the first 2-3 principal components describe our data well, we should see it line up in a fairly orderly way along these axes.

Our first step is to transform our original data into a position on each of the principal components that our PCA identified:

pokemon_transform = pca.transform(X = pokemon.loc[:, cols])
pokemon_transform = pd.DataFrame(pokemon_transform, columns = ['Component ' + str(i) for i in np.arange(1, pca.n_components_ + 1)])

pokemon_transform

Component 1 Component 2 Component 3 Component 4 Component 5 Component 6
0 -45.860728 -5.384432 18.925550 -0.988558 -12.398527 10.548700
1 -11.152937 -5.805620 20.848717 0.269407 -5.800877 7.175004
2 36.946009 -5.236130 21.520463 1.531646 2.445413 3.159865
3 80.128413 18.995343 29.313909 -11.228419 -8.684840 0.214346
4 -50.385905 -21.792797 3.921880 -12.581893 -7.357519 3.041302
... ... ... ... ... ... ...
795 72.196952 67.431919 44.284620 -34.857821 -10.971975 26.977909
796 120.944879 -20.303238 -8.390285 -38.395104 -44.341807 21.930314
797 75.999885 -27.270786 37.017466 19.106076 -28.247968 39.369910
798 114.096713 -36.870567 6.750875 17.902908 -45.622767 54.767251
799 72.883550 15.152616 10.180516 -3.206397 -32.026195 -11.208742

800 rows × 6 columns

The dataframe above shows us the values of each row of our original data projected onto the principal components.

In other words, instead of each Pokemon’s values in $$x_1$$, $$x_2$$, …, $$x_n$$, it shows us each pokemon’s new value on $$pc_1$$, $$pc_2$$, …, $$pc_n$$.

Let’s add these to our original dataframe so we can do interesting comparisons:

pokemon = pd.concat([pokemon, pokemon_transform], axis = 1)
pokemon

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Component 1 Component 2 Component 3 Component 4 Component 5 Component 6
0 1 Bulbasaur Grass Poison 318 45 49 49 65 65 45 1 False -45.860728 -5.384432 18.925550 -0.988558 -12.398527 10.548700
1 2 Ivysaur Grass Poison 405 60 62 63 80 80 60 1 False -11.152937 -5.805620 20.848717 0.269407 -5.800877 7.175004
2 3 Venusaur Grass Poison 525 80 82 83 100 100 80 1 False 36.946009 -5.236130 21.520463 1.531646 2.445413 3.159865
3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 122 120 80 1 False 80.128413 18.995343 29.313909 -11.228419 -8.684840 0.214346
4 4 Charmander Fire NaN 309 39 52 43 60 50 65 1 False -50.385905 -21.792797 3.921880 -12.581893 -7.357519 3.041302
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
795 719 Diancie Rock Fairy 600 50 100 150 100 150 50 6 True 72.196952 67.431919 44.284620 -34.857821 -10.971975 26.977909
796 719 DiancieMega Diancie Rock Fairy 700 50 160 110 160 110 110 6 True 120.944879 -20.303238 -8.390285 -38.395104 -44.341807 21.930314
797 720 HoopaHoopa Confined Psychic Ghost 600 80 110 60 150 130 70 6 True 75.999885 -27.270786 37.017466 19.106076 -28.247968 39.369910
798 720 HoopaHoopa Unbound Psychic Dark 680 80 160 60 170 130 80 6 True 114.096713 -36.870567 6.750875 17.902908 -45.622767 54.767251
799 721 Volcanion Fire Water 600 80 110 120 130 90 70 6 True 72.883550 15.152616 10.180516 -3.206397 -32.026195 -11.208742

800 rows × 19 columns

Now, let’s get a sense of how well our first couple principal components summarize our data by plotting the data projected onto these components:

In other words, we plot each of our data points but instead of plotting them on our original axes, we plot them on the new principal component axes:

sns.scatterplot(data = pokemon, x = "Component 1", y = "Component 2", alpha = 0.5)
plt.show()


How should we interpret this plot? What does it show?

PC1 does a really nice job capturing variance in our data long its axis. PC2 as well.

## Applying Principal Components: can we understand our data better by looking at the primary axes it varies along?¶

In the plot above, there seemed to be an intriguing discontinuity in our data along the first two principal components.

One way to evaluate PCA is to see how well it affords the analyses we want to do with our high-dimensional data, like classification and clustering.

sns.scatterplot(data = pokemon, x = "Component 1", y = "Component 2", alpha = 0.5)
plt.axvline(x = 50, c = "r", ls = "--")
plt.show()


Is the discontinuity in our first principal component telling us something useful about how our data is arranged in high-dimensional space?

One way we can approach this question is by applying a clustering algorithm to our data, but now we’ll cluster along the principal components rather than the original data points.

This tells us how our low-dimensional data representation can be clustered.

Does a Gaussian Mixture Model cluster according to the discontinuity we detected above?

from sklearn.mixture import GaussianMixture

# Fit a GMM with 4 clusters
gm = GaussianMixture(n_components = 4, random_state = 1)

# Then, generate labels for each of our data points based on these clusters
preds = gm.fit_predict(X = pokemon.loc[:, ('Component 1', 'Component 2')])

# Finally, let's add these labels to our original dataframe
pokemon['pca_lab'] = preds

pokemon

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Component 1 Component 2 Component 3 Component 4 Component 5 Component 6 pca_lab
0 1 Bulbasaur Grass Poison 318 45 49 49 65 65 45 1 False -45.860728 -5.384432 18.925550 -0.988558 -12.398527 10.548700 1
1 2 Ivysaur Grass Poison 405 60 62 63 80 80 60 1 False -11.152937 -5.805620 20.848717 0.269407 -5.800877 7.175004 3
2 3 Venusaur Grass Poison 525 80 82 83 100 100 80 1 False 36.946009 -5.236130 21.520463 1.531646 2.445413 3.159865 3
3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 122 120 80 1 False 80.128413 18.995343 29.313909 -11.228419 -8.684840 0.214346 0
4 4 Charmander Fire NaN 309 39 52 43 60 50 65 1 False -50.385905 -21.792797 3.921880 -12.581893 -7.357519 3.041302 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
795 719 Diancie Rock Fairy 600 50 100 150 100 150 50 6 True 72.196952 67.431919 44.284620 -34.857821 -10.971975 26.977909 2
796 719 DiancieMega Diancie Rock Fairy 700 50 160 110 160 110 110 6 True 120.944879 -20.303238 -8.390285 -38.395104 -44.341807 21.930314 0
797 720 HoopaHoopa Confined Psychic Ghost 600 80 110 60 150 130 70 6 True 75.999885 -27.270786 37.017466 19.106076 -28.247968 39.369910 0
798 720 HoopaHoopa Unbound Psychic Dark 680 80 160 60 170 130 80 6 True 114.096713 -36.870567 6.750875 17.902908 -45.622767 54.767251 0
799 721 Volcanion Fire Water 600 80 110 120 130 90 70 6 True 72.883550 15.152616 10.180516 -3.206397 -32.026195 -11.208742 0

800 rows × 20 columns

Now, let’s see how well the clustering above did with our data arranged along the first two principal components:

sns.scatterplot(data = pokemon,
x = "Component 1",
y = "Component 2",
hue = "pca_lab",
alpha = 0.5)
plt.show()


This seems to do a decently good job clustering our data.

Interestingly, it does somewhat capture the discontinuity we observed at $$pc_1 = 50$$, though not perfectly.

As an aside, we can show that 4 clusters is a pretty good choice using the “elbow method” with the GMM’s “Akaike Information Criterion” below:

clusters = np.arange(1, 11)
scores = []

for k in clusters:
scores.append(GaussianMixture(n_components = k, random_state = 1).fit(
X = pokemon.loc[:, ('Component 1', 'Component 2')]).aic(
X = pokemon.loc[:, ('Component 1', 'Component 2')]))
scores

sns.lineplot(x = clusters, y = scores)

<AxesSubplot:>


So what’s happening at that discontinuity in our first principal component?

Is there anything interpretable in our data that the PCA identified?

Note, there’s nothing guaranteeing that this will be the case, but here, it looks like this discontinuity may in part reflect whether a pokemon is “Legendary” or not!

sns.scatterplot(data = pokemon,
x = "Component 1",
y = "Component 2",
hue = "Legendary",
alpha = 0.5)
plt.show()


That’s pretty cool! Our Principal Components Analysis showed us a pattern in our data along the primary axes that our data varies on.

And, when we cluster based on those primary axes, we do a decent job separating out the “Legendary” pokemon based on this discontinuity:

sns.scatterplot(data = pokemon,
x = "Component 1",
y = "Component 2",
hue = "pca_lab",
style = "Legendary",
alpha = 0.5)
plt.show()


It’s not perfect, but pretty good!

Now, if you’re skeptical of the above, you might be thinking, “maybe we could have done all this without our fancy PCA and clustering based on principal components.”

When we look at our data with “Legendary” pokemon highlighted in each of the two-dimensional representations of our original data, it seems kind of unlikely that a clustering solution would easily isolate those labels…

cols.append('Legendary')
sns.pairplot(pokemon.loc[:, cols],
hue = "Legendary",
plot_kws = {"alpha": 0.5}
)

<seaborn.axisgrid.PairGrid at 0x7fcdc546d9a0>


But, just for thoroughness, we can run a similar GMM with 4 clusters on the original high-dimensional data and see if we do as clean a job separating out the “Legendary” pokemon:

# Use these columns as the basis for our high-dimensional GMM
cols = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']

# Then, fit a GMM and assign the labels to our original data
# we call them 'highd_lab' to differentiate from the PCA labels ('pca_lab')
pokemon['highd_lab'] = GaussianMixture(n_components = 4, random_state = 1).fit_predict(X = pokemon.loc[:, cols])
pokemon

# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def ... Generation Legendary Component 1 Component 2 Component 3 Component 4 Component 5 Component 6 pca_lab highd_lab
0 1 Bulbasaur Grass Poison 318 45 49 49 65 65 ... 1 False -45.860728 -5.384432 18.925550 -0.988558 -12.398527 10.548700 1 1
1 2 Ivysaur Grass Poison 405 60 62 63 80 80 ... 1 False -11.152937 -5.805620 20.848717 0.269407 -5.800877 7.175004 3 1
2 3 Venusaur Grass Poison 525 80 82 83 100 100 ... 1 False 36.946009 -5.236130 21.520463 1.531646 2.445413 3.159865 3 2
3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 122 120 ... 1 False 80.128413 18.995343 29.313909 -11.228419 -8.684840 0.214346 0 3
4 4 Charmander Fire NaN 309 39 52 43 60 50 ... 1 False -50.385905 -21.792797 3.921880 -12.581893 -7.357519 3.041302 1 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
795 719 Diancie Rock Fairy 600 50 100 150 100 150 ... 6 True 72.196952 67.431919 44.284620 -34.857821 -10.971975 26.977909 2 3
796 719 DiancieMega Diancie Rock Fairy 700 50 160 110 160 110 ... 6 True 120.944879 -20.303238 -8.390285 -38.395104 -44.341807 21.930314 0 3
797 720 HoopaHoopa Confined Psychic Ghost 600 80 110 60 150 130 ... 6 True 75.999885 -27.270786 37.017466 19.106076 -28.247968 39.369910 0 3
798 720 HoopaHoopa Unbound Psychic Dark 680 80 160 60 170 130 ... 6 True 114.096713 -36.870567 6.750875 17.902908 -45.622767 54.767251 0 3
799 721 Volcanion Fire Water 600 80 110 120 130 90 ... 6 True 72.883550 15.152616 10.180516 -3.206397 -32.026195 -11.208742 0 3

800 rows × 21 columns

How did we do here? Did these clusters also identify our “Legendary” pokemon?

One drawback of doing the high-dimensional clustering is that it’s not easy to visualize our data in this many dimensions!

Instead, we can resort to a summary based on counting the percent of “Legendary” pokemon assigned to each of our high dimensional clusters:

highd_summary = pokemon.groupby("highd_lab").agg(
Legendary = ("Legendary", "sum"),
).reset_index()

highd_summary['Legendary_pct'] = highd_summary['Legendary'] / np.sum(highd_summary['Legendary'])
highd_summary

highd_lab Legendary Legendary_pct
0 0 4 0.061538
1 1 0 0.000000
2 2 33 0.507692
3 3 28 0.430769

We did… not great. We have about 23% of our “Legendary” pokemon in one cluster, 58% in another, and 19% in a third.

How does this compare to our PCA-based clusters above, using only the first two principal components?

lowd_summary = pokemon.groupby("pca_lab").agg(
Legendary = ("Legendary", "sum"),
).reset_index()

lowd_summary['Legendary_pct'] = lowd_summary['Legendary'] / np.sum(lowd_summary['Legendary'])
lowd_summary

pca_lab Legendary Legendary_pct
0 0 53 0.815385
1 1 0 0.000000
2 2 6 0.092308
3 3 6 0.092308

With PCA and only two dimensions, we put 83% of our “Legendary” pokemon into a single cluster and less than 10% in the others. This is much better!

What’s the point of all this?

When we take our high-dimensional data and use a dimensionality reduction technique like PCA to identify the primary axes that our data varies on, this can actually help us understand our data better than in its original form.

Here, for example, we saw that the primary axis along which our data varies does a pretty good job of separating out “Legendary” pokemon from the others. Clustering algorithms based on just the first two principal components do a better job isolating these pokemon than the same algorithms applied to our original data!

### Evaluating PCA along individual dimensions¶

cols = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']

pokemon_subset = pokemon.loc[:, cols]

# Fit the PCA class to our data with just two principal components
pca2 = PCA(n_components = 2).fit(pokemon_subset)

# Transform our data onto these two principal components
pca2_vals = pd.DataFrame(pca2.transform(pokemon.loc[:, cols]), columns = ["PC1", "PC2"])
pca2_vals

# Add the transformed data to our dataframe
pokemon_subset = pd.concat([pokemon_subset, pca2_vals], axis = 1)
pokemon_subset

# Run the "inverse transform" of our data projected onto the principal components
inv_transform = pca2.inverse_transform(pca2.transform(pokemon.loc[:, cols]))
inv_transform

# Make a dataframe of the new predictions and add it to our original dataframe for comparison
pca2_preds = pd.DataFrame(inv_transform, columns = [elem + "_pred" for elem in cols])
pca2_preds

pokemon_subset = pd.concat([pokemon_subset, pca2_preds], axis = 1)
pokemon_subset

HP Attack Defense Sp. Atk Sp. Def Speed PC1 PC2 HP_pred Attack_pred Defense_pred Sp. Atk_pred Sp. Def_pred Speed_pred
0 45 49 49 65 65 45 -45.860728 -5.384432 55.236205 55.984724 52.642982 51.541692 52.880090 56.370858
1 60 62 63 80 80 60 -11.152937 -5.805620 65.658802 73.059669 65.561149 69.368731 66.494554 67.972058
2 80 82 83 100 100 80 36.946009 -5.236130 80.151381 96.810835 84.265187 93.631871 85.562359 83.384973
3 80 100 123 122 120 80 80.128413 18.995343 94.163805 119.949881 117.548003 106.322678 106.805912 83.557711
4 39 52 43 60 50 65 -50.385905 -21.792797 53.182394 52.498326 39.513188 55.527978 48.242175 64.342455
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
795 50 100 150 100 150 50 72.196952 67.431919 93.822481 119.748098 148.202887 83.719427 112.100828 53.058732
796 50 160 110 160 110 110 120.944879 -20.303238 104.782914 137.059880 105.763167 142.161063 116.068896 119.554513
797 80 110 60 150 130 70 75.999885 -27.270786 90.969004 114.373532 83.811600 121.955673 97.132328 108.859557
798 80 160 60 170 130 80 114.096713 -36.870567 102.023620 132.416331 91.638649 145.025926 110.487221 126.857459
799 80 110 120 130 90 70 72.883550 15.152616 91.822291 116.084807 112.118834 104.108145 103.280528 83.400454

800 rows × 14 columns

# What is the difference between our original values and our "predicted" values?
pokemon_subset[cols] - pca2.inverse_transform(pca2.transform(pokemon_subset[cols]))

HP Attack Defense Sp. Atk Sp. Def Speed
0 -10.236205 -6.984724 -3.642982 13.458308 12.119910 -11.370858
1 -5.658802 -11.059669 -2.561149 10.631269 13.505446 -7.972058
2 -0.151381 -14.810835 -1.265187 6.368129 14.437641 -3.384973
3 -14.163805 -19.949881 5.451997 15.677322 13.194088 -3.557711
4 -14.182394 -0.498326 3.486812 4.472022 1.757825 0.657545
... ... ... ... ... ... ...
795 -43.822481 -19.748098 1.797113 16.280573 37.899172 -3.058732
796 -54.782914 22.940120 4.236833 17.838937 -6.068896 -9.554513
797 -10.969004 -4.373532 -23.811600 28.044327 32.867672 -38.859557
798 -22.023620 27.583669 -31.638649 24.974074 19.512779 -46.857459
799 -11.822291 -6.084807 7.881166 25.891855 -13.280528 -13.400454

800 rows × 6 columns

# What is the mean of these differences squared in each column?
((pokemon_subset[cols] - pca2.inverse_transform(pca2.transform(pokemon_subset[cols])))**2).mean()

HP         425.808290
Attack     445.921493
Defense    127.285682
Sp. Atk    281.520945
Sp. Def    358.746080
Speed      245.169533
dtype: float64


What does this tell us?

The mean squared error for each of our columns when we project them onto the principal components gives us an indication of how much information we lose when we project our original data in high dimensions onto (in this case) just two principal components.

Now, we can calculate something kind of like $$R^2$$ to see how much of the variance in each column’s individual values is accounted for by the principal component predictions.

# This is the mean total sum of squares for each column (essentially the variance)
((pokemon_subset[cols] - pokemon_subset[cols].mean())**2).mean()

HP          651.204298
Attack     1052.163748
Defense     971.195194
Sp. Atk    1069.410100
Sp. Def     773.480494
Speed       843.455494
dtype: float64

# When we divide the mean sum of squared "residuals" by the mean total sum of squares (and subtract from 1),
# this gives us an R^2 metric for our PCA broken out by each column
1 - (((pokemon_subset[cols] - pca2.inverse_transform(pca2.transform(pokemon_subset[cols])))**2).mean() / ((pokemon_subset[cols] - pokemon_subset[cols].mean())**2).mean())

HP         0.346122
Attack     0.576186
Defense    0.868939
Sp. Atk    0.736751
Sp. Def    0.536192
Speed      0.709327
dtype: float64