Exploring Data With Pandas

In this notebook, you'll explore a dataset with pandas. You'll choose the dataset yourself from a shared datasets folder that I'll provide.

Step 1 - Getting Access to the Drive

First, you'll need to open this shared folder and add a shortcut to it in your Google Drive.

1. Open this link to our shared datasets folder in a different tab.
2. In that folder, open the drop-down at the top that shows the folder's name, "datasets".
3. From that drop-down, go to "Organize" and then select "Add Shortcut".
4. In the list of locations for the shortcut, select "My Drive".

Step 2 - Connecting Colab to Your Drive

Next, we'll mount our Google Drive in our notebook environment (we'll have …)


# Connect to Google Drive
from google.colab import drive
drive.mount("/content/gdrive")
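To confirm the mount worked and see which CSVs you can pick from, you could list the folder's contents. This is a minimal sketch. It assumes the Step 1 shortcut appears at `/content/gdrive/MyDrive/datasets`, which combines the mount point above with the `MyDrive/datasets` path used in Step 3. `list_csvs` is a hypothetical helper and is not part of Colab or pandas.

```python
from pathlib import Path

# Assumed location: the "datasets" shortcut in My Drive, under the
# /content/gdrive mount point used above.
DATASETS_DIR = Path("/content/gdrive/MyDrive/datasets")


def list_csvs(folder: Path) -> list[str]:
    """Hypothetical helper: return the sorted CSV file names in `folder`.

    Returns an empty list if the folder doesn't exist (for example, when
    the drive isn't mounted or the shortcut wasn't added).
    """
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob("*.csv"))


csv_files = list_csvs(DATASETS_DIR)
if csv_files:
    print("Available datasets:", csv_files)
else:
    print(f"No CSV files found in {DATASETS_DIR}; check the mount and shortcut.")
```

An empty result usually means either Step 1 or the mount cell needs to be rerun.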

Step 3 - Read Your Selected CSV

In this cell, we're going to import the pandas module:

import pandas as pd

Next, pick the CSV you want to read, specify its file path in the Colab notebook's filesystem, and read it with pandas into a DataFrame variable called df.

df = pd.read_csv("gdrive/MyDrive/datasets/pokemon.csv")

To confirm that the data loaded correctly, look at the .head() (first 5 rows) of the DataFrame. Call it as the last line of your cell to see the results.
```
df.head()
```
Alternatively, if you want to see a random sample of rows, you can use:
```
df.sample(5)
```
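This self-contained sketch runs without the Drive. It reads a few made-up rows from an in-memory string. The columns `name`, `type`, `attack`, and `defense` are hypothetical and do not come from the real `pokemon.csv`. `pd.read_csv` accepts file-like objects as well as paths, so `io.StringIO` stands in for the file. Passing `random_state` to `.sample()` makes the "random" rows identical on every run.

```python
import io

import pandas as pd

# Made-up stand-in for a CSV file (NOT the real pokemon.csv).
csv_text = """name,type,attack,defense
Alpha,Fire,52,43
Bravo,Water,48,65
Charlie,Grass,49,49
Delta,Fire,84,78
Echo,Water,63,80
Foxtrot,Fire,82,83
"""

# read_csv takes a path or any file-like object; StringIO wraps the text.
df = pd.read_csv(io.StringIO(csv_text))

first_rows = df.head()                       # first 5 rows
random_rows = df.sample(3, random_state=42)  # 3 random rows, repeatable

print(first_rows)
print(random_rows)
```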


Step 4 - Inspect the DataFrame

In this cell, use a few different methods and attributes on the DataFrame to inspect it and get a feel for it.

df.info()  # Note: This is a method (you call it)
df.describe() # Note: This is a method
df.columns # Note: This is an attribute (you don't call it)
df.dtypes  # Note: This is an attribute
df.index   # Note: This is an attribute
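A notebook cell only displays the value of its last line automatically. If you try several of these in one cell, wrap the others in `print()`. The sketch below runs each one on a small made-up DataFrame with hypothetical columns and values. Note that `.info()` prints its own summary and returns `None`.

```python
import pandas as pd

# Small made-up DataFrame (hypothetical values, not a real dataset).
df = pd.DataFrame({
    "name": ["Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot"],
    "type": ["Fire", "Water", "Grass", "Fire", "Water", "Fire"],
    "attack": [52, 48, 49, 84, 63, 82],
    "defense": [43, 65, 49, 78, 80, 83],
})

info_result = df.info()  # method: prints row count, columns, dtypes, memory
summary = df.describe()  # method: returns a new DataFrame of statistics
print(summary)
print(df.columns)        # attribute: the column labels
print(df.dtypes)         # attribute: one dtype per column
print(df.index)          # attribute: the row labels (here 0 to 5)
```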

Ask yourself: what does this tell you about the DataFrame?

For context, here's the documentation for:


Step 5 - Pull Out a Single Series From the DataFrame

Next, try pulling out a single Series from the DataFrame. A Series holds one column: a one-dimensional sequence of values, like a list, except that each value also has an index label.

column_name = ... # This should be a string
df[column_name]
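Here is a sketch on made-up data. A single column name in square brackets gives a Series. A list of names (double brackets) gives a DataFrame. The Series keeps the DataFrame's index as its labels. `column_name = "attack"` is a hypothetical choice; use a column from your own dataset.

```python
import pandas as pd

# Made-up data with hypothetical column names.
df = pd.DataFrame({
    "name": ["Alpha", "Bravo", "Charlie"],
    "attack": [52, 48, 49],
})

column_name = "attack"              # hypothetical; pick one of your columns
attack = df[column_name]            # single brackets -> Series
attack_frame = df[[column_name]]    # list of names -> one-column DataFrame

print(type(attack).__name__)        # Series
print(type(attack_frame).__name__)  # DataFrame
print(attack.tolist())              # [52, 48, 49]
```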


Step 6 - Use Methods to Describe That Series

Try some of the Series Descriptive Methods.

For example:

df[column_name].mode()

Try a few different aggregate methods. What do they tell you about the Series?
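Here is a sketch of several Series methods applied to made-up columns with hypothetical values. Numeric methods such as `.mean()` and `.median()` return single numbers. `.mode()` returns a Series, because several values can tie for most common.

```python
import pandas as pd

# Made-up data (hypothetical values).
df = pd.DataFrame({
    "type": ["Fire", "Water", "Grass", "Fire", "Water", "Fire"],
    "attack": [52, 48, 49, 84, 63, 82],
})

attack = df["attack"]
print(attack.mean())               # 63.0
print(attack.median())             # 57.5 (average of the middle values 52 and 63)
print(attack.min(), attack.max())  # 48 84

types = df["type"]
type_mode = types.mode()            # a Series, since ties are possible
type_counts = types.value_counts()  # count of each distinct value, largest first
print(type_mode.tolist())           # ['Fire']
print(type_counts)
print(types.nunique())              # 3
```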


Step 7 - Use Methods to Describe All of the Series in the DataFrame

Try some of the DataFrame Descriptive Methods.
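Most Series methods also exist on DataFrames, where they return one result per column. Here is a sketch on made-up data. In recent pandas versions (2.0 and later), methods like `.mean()` can raise a `TypeError` when text columns are present, so this sketch passes `numeric_only=True`.

```python
import pandas as pd

# Made-up data (hypothetical values).
df = pd.DataFrame({
    "name": ["Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot"],
    "type": ["Fire", "Water", "Grass", "Fire", "Water", "Fire"],
    "attack": [52, 48, 49, 84, 63, 82],
    "defense": [43, 65, 49, 78, 80, 83],
})

means = df.mean(numeric_only=True)  # one mean per numeric column
maxes = df.max(numeric_only=True)   # one max per numeric column
counts = df.count()                 # non-missing values per column
distinct = df.nunique()             # distinct values per column

print(means)
print(maxes)
print(counts)
print(distinct)
```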


Step 8 - Try the DataFrame `.describe()` Method

Try df.describe().

Ask yourself: what does this tell you about the DataFrame?
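Here is a sketch of what `.describe()` returns on made-up data. By default it summarizes only the numeric columns: count, mean, standard deviation, min, quartiles, and max. Calling `.describe()` on a single text column gives a different summary. That summary reports the count, the number of unique values, the most common value (`top`), and how often it appears (`freq`).

```python
import pandas as pd

# Made-up data (hypothetical values).
df = pd.DataFrame({
    "type": ["Fire", "Water", "Grass", "Fire", "Water", "Fire"],
    "attack": [52, 48, 49, 84, 63, 82],
})

numeric_summary = df.describe()       # numeric columns only, by default
type_summary = df["type"].describe()  # count / unique / top / freq

print(numeric_summary)
print(type_summary)
```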


Step 9 - Try Filtering a DataFrame by a Boolean

One of the major features of pandas is that we can filter (keep a subset of) the rows based on a comparison, such as a column compared against a value:

df[df[str_col_name] == "Target Value"]
df[df[num_col_name] >= 200]

For now, do only one comparison at a time; don't combine conditions with `and` or `or`.

Try assigning these filtered DataFrames to variables, then use the descriptive methods from above on them.
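Here is a sketch on made-up data, with hypothetical column names standing in for `str_col_name` and `num_col_name`. The comparison inside the brackets produces a boolean Series with one `True`/`False` per row. Indexing with it keeps only the `True` rows. The filtered result is an ordinary DataFrame, so the descriptive methods from earlier steps work on it.

```python
import pandas as pd

# Made-up data (hypothetical values).
df = pd.DataFrame({
    "name": ["Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot"],
    "type": ["Fire", "Water", "Grass", "Fire", "Water", "Fire"],
    "attack": [52, 48, 49, 84, 63, 82],
})

is_fire = df["type"] == "Fire"   # boolean Series: True where type is "Fire"
fire = df[is_fire]               # keeps only the True rows

strong = df[df["attack"] >= 60]  # one comparison against a number

print(fire)
print(fire["attack"].mean())     # mean attack of the Fire rows only
print(strong["name"].tolist())   # ['Delta', 'Echo', 'Foxtrot']
```

The filtered rows keep their original index labels (here 0, 3 and 5 for `fire`), which is why the index is not renumbered.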
