Ha Khanh Nguyen (hknguyen)
import pandas as pd
ramen = pd.read_csv('https://stat107.hknguyen.org/files/datasets/clean-ramen.csv')
ramen
| | Brand | Variety | Style | Country | Stars |
|---|---|---|---|---|---|
0 | New Touch | T's Restaurant Tantanmen | Cup | Japan | 3.75 |
1 | Just Way | Noodles Spicy Hot Sesame Spicy Hot Sesame Guan... | Pack | Taiwan | 1.00 |
2 | Nissin | Cup Noodles Chicken Vegetable | Cup | USA | 2.25 |
3 | Wei Lih | GGE Ramen Snack Tomato Flavor | Pack | Taiwan | 2.75 |
4 | Ching's Secret | Singapore Curry | Pack | India | 3.75 |
... | ... | ... | ... | ... | ... |
2570 | Vifon | Hu Tiu Nam Vang ["Phnom Penh" style] Asian Sty... | Bowl | Vietnam | 3.50 |
2571 | Wai Wai | Oriental Style Instant Noodles | Pack | Thailand | 1.00 |
2572 | Wai Wai | Tom Yum Shrimp | Pack | Thailand | 2.00 |
2573 | Wai Wai | Tom Yum Chili Flavor | Pack | Thailand | 2.00 |
2574 | Westbrae | Miso Ramen | Pack | USA | 0.50 |
2575 rows × 5 columns
Use the describe() function of the DataFrame object to get the summary statistics of the variables in the DataFrame: dataframe_name.describe().
ramen.describe()
| | Stars |
|---|---|
count | 2575.000000 |
mean | 3.654893 |
std | 1.015641 |
min | 0.000000 |
25% | 3.250000 |
50% | 3.750000 |
75% | 4.250000 |
max | 5.000000 |
ramen.dtypes
Brand       object
Variety     object
Style       object
Country     object
Stars      float64
dtype: object
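The dtypes output shows that Stars is the only numeric column, which is why describe() reports on Stars alone: by default it summarizes numeric columns only. The sketch below illustrates this on the first five rows displayed above (typed in by hand so it runs without the CSV; the name sample is just for illustration) and uses the include="all" option to bring the text columns into the summary.

```python
import pandas as pd

# First five rows as displayed above, entered by hand (illustrative subset only).
sample = pd.DataFrame({
    "Brand": ["New Touch", "Just Way", "Nissin", "Wei Lih", "Ching's Secret"],
    "Style": ["Cup", "Pack", "Cup", "Pack", "Pack"],
    "Country": ["Japan", "Taiwan", "USA", "Taiwan", "India"],
    "Stars": [3.75, 1.00, 2.25, 2.75, 3.75],
})

# Default behaviour: only numeric columns are summarized.
numeric_summary = sample.describe()
print(numeric_summary)

# include="all" adds the text columns (count, unique, top, freq rows).
full_summary = sample.describe(include="all")
print(full_summary)
```

For text columns the numeric rows (mean, std, quartiles) are shown as NaN, and for numeric columns the unique/top/freq rows are NaN.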
Use the unique() function to find out which brands appear in the data!
ramen['Brand'].unique()
len(ramen['Brand'].unique())
355
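As a side note, pandas also offers nunique() and value_counts() for the same kind of question. Here is a small sketch using the Brand values from the last five rows shown above (entered by hand; the variable names are just for illustration).

```python
import pandas as pd

# Brand values of rows 2570-2574 as displayed above.
brands = pd.Series(
    ["Vifon", "Wai Wai", "Wai Wai", "Wai Wai", "Westbrae"], name="Brand"
)

distinct = brands.unique()      # distinct values, in order of first appearance
n_distinct = brands.nunique()   # number of distinct values
counts = brands.value_counts()  # rows per brand, most frequent first

print(distinct)
print(n_distinct)
print(counts)
```

One difference to keep in mind: nunique() skips missing values by default, while len(unique()) counts a missing value as one more entry, so the two can disagree on columns that contain NaN.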
ramen['Stars'].hist()
<AxesSubplot:>
The plot can be customized with the matplotlib module:
import matplotlib.pyplot as plt
ramen['Stars'].hist(color = 'darkorange')
plt.xlabel('Rating (0-5)')
plt.ylabel('Count')
plt.title('Histogram of Ramen Ratings')
plt.show()
ramen.hist(column='Stars')
plt.show()
iris = pd.read_csv('https://stat107.hknguyen.org/files/datasets/iris.csv')
iris.hist(column=['Sepal.Length', 'Sepal.Width'])
plt.show()
A pandas DataFrame also has a boxplot() function.
ramen.boxplot(column='Stars')
<AxesSubplot:>
ramen['Stars'].boxplot()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-c8ec82afe18c> in <module>
----> 1 ramen['Stars'].boxplot()

~/miniconda3/lib/python3.8/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5460             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5461                 return self[name]
-> 5462             return object.__getattribute__(self, name)
   5463
   5464     def __setattr__(self, name: str, value) -> None:

AttributeError: 'Series' object has no attribute 'boxplot'
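The error above says that boxplot() lives on DataFrame, not on Series. The sketch below checks this without drawing anything and shows one possible workaround, turning the Series back into a one-column DataFrame with to_frame(). The values are the first five Stars ratings shown earlier; the variable names are illustrative.

```python
import pandas as pd

# First five Stars ratings as displayed above.
stars = pd.Series([3.75, 1.00, 2.25, 2.75, 3.75], name="Stars")

series_has_boxplot = hasattr(stars, "boxplot")  # Series: no boxplot()
series_has_hist = hasattr(stars, "hist")        # Series: hist() is available

# A one-column DataFrame does have boxplot().
stars_frame = stars.to_frame()
frame_has_boxplot = hasattr(stars_frame, "boxplot")

print(series_has_boxplot, series_has_hist, frame_has_boxplot)
```

With matplotlib installed, stars_frame.boxplot(column='Stars') then draws the same kind of plot as ramen.boxplot(column='Stars').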
iris.boxplot(column=['Sepal.Length', 'Sepal.Width'])
<AxesSubplot:>
[]
ramen['Brand']
0            New Touch
1             Just Way
2               Nissin
3              Wei Lih
4       Ching's Secret
             ...
2570             Vifon
2571           Wai Wai
2572           Wai Wai
2573           Wai Wai
2574          Westbrae
Name: Brand, Length: 2575, dtype: object
ramen.Brand
0            New Touch
1             Just Way
2               Nissin
3              Wei Lih
4       Ching's Secret
             ...
2570             Vifon
2571           Wai Wai
2572           Wai Wai
2573           Wai Wai
2574          Westbrae
Name: Brand, Length: 2575, dtype: object
ramen[['Brand', 'Stars']]
| | Brand | Stars |
|---|---|---|
0 | New Touch | 3.75 |
1 | Just Way | 1.00 |
2 | Nissin | 2.25 |
3 | Wei Lih | 2.75 |
4 | Ching's Secret | 3.75 |
... | ... | ... |
2570 | Vifon | 3.50 |
2571 | Wai Wai | 1.00 |
2572 | Wai Wai | 2.00 |
2573 | Wai Wai | 2.00 |
2574 | Westbrae | 0.50 |
2575 rows × 2 columns
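The outputs above differ in type: a single column name inside [] gives back a Series, while a list of names gives back a DataFrame. A short sketch on the first five rows displayed earlier (entered by hand; names are illustrative) makes this visible.

```python
import pandas as pd

# First five rows as displayed above (Brand and Stars only).
sample = pd.DataFrame({
    "Brand": ["New Touch", "Just Way", "Nissin", "Wei Lih", "Ching's Secret"],
    "Stars": [3.75, 1.00, 2.25, 2.75, 3.75],
})

one_col = sample["Brand"]              # a string -> Series
one_col_frame = sample[["Brand"]]      # a list with one name -> DataFrame
two_cols = sample[["Brand", "Stars"]]  # a list of names -> DataFrame

print(type(one_col).__name__, type(one_col_frame).__name__)
print(two_cols.shape)
```

The dot form (sample.Brand) returns the same Series as sample["Brand"], but it only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method.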
iloc
iloc is short for integer-location based indexing. The iloc syntax is:
dataframe_name.iloc[<row index>, <column index>]
For example, to get the Stars rating of the first observation (index 0):
ramen.iloc[0, 4]
3.75
ramen.iloc[0, 1:5]
Variety    T's Restaurant Tantanmen
Style                           Cup
Country                       Japan
Stars                          3.75
Name: 0, dtype: object
ramen.iloc[0, :]
Brand                     New Touch
Variety    T's Restaurant Tantanmen
Style                           Cup
Country                       Japan
Stars                          3.75
Name: 0, dtype: object
ramen.iloc[0:5, 4]
0    3.75
1    1.00
2    2.25
3    2.75
4    3.75
Name: Stars, dtype: float64
ramen.iloc[[0, 5, 10, 15], :]
| | Brand | Variety | Style | Country | Stars |
|---|---|---|---|---|---|
0 | New Touch | T's Restaurant Tantanmen | Cup | Japan | 3.75 |
5 | Samyang Foods | Kimchi song Song Ramen | Pack | South Korea | 4.75 |
10 | Tao Kae Noi | Creamy tom Yum Kung Flavour | Pack | Thailand | 5.00 |
15 | KOKA | Mushroom Flavour Instant Noodles | Cup | Singapore | 3.50 |
What would ramen.iloc[:, :] return?
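To experiment with the iloc patterns above without loading the full dataset, here is a small sketch on the first five rows as displayed earlier (typed in by hand; the second Variety value is kept truncated exactly as it appears in the display). The variable names are illustrative.

```python
import pandas as pd

# First five rows as displayed above.
sample = pd.DataFrame({
    "Brand": ["New Touch", "Just Way", "Nissin", "Wei Lih", "Ching's Secret"],
    "Variety": [
        "T's Restaurant Tantanmen",
        "Noodles Spicy Hot Sesame Spicy Hot Sesame Guan...",  # truncated in the display
        "Cup Noodles Chicken Vegetable",
        "GGE Ramen Snack Tomato Flavor",
        "Singapore Curry",
    ],
    "Style": ["Cup", "Pack", "Cup", "Pack", "Pack"],
    "Country": ["Japan", "Taiwan", "USA", "Taiwan", "India"],
    "Stars": [3.75, 1.00, 2.25, 2.75, 3.75],
})

first_stars = sample.iloc[0, 4]    # row position 0, column position 4 (Stars)
first_three = sample.iloc[0:3, 4]  # positions 0, 1, 2: the end of the slice is excluded
last_row = sample.iloc[-1, :]      # negative positions count from the end
everything = sample.iloc[:, :]     # a bare colon means "all" on that axis

print(first_stars)
print(first_three)
print(last_row)
print(everything.shape)
```

Slices in iloc follow the usual Python convention, so 0:3 covers three rows, not four.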
loc
The i in iloc stands for integer. That is why with iloc, we always use numbers for indexing. With loc, we use labels (names) or a boolean array/list for indexing instead.
ramen.loc[0, 'Stars']
3.75
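In ramen the row labels happen to be 0, 1, 2, ..., so labels and positions coincide and loc and iloc can look alike. The sketch below uses the four rows from the ramen.iloc[[0, 5, 10, 15], :] output shown earlier, where labels and positions differ, to make the distinction visible. The rows are typed in by hand and the name subset is illustrative.

```python
import pandas as pd

# Rows with labels 0, 5, 10, 15 as displayed earlier (Brand and Stars only).
subset = pd.DataFrame(
    {
        "Brand": ["New Touch", "Samyang Foods", "Tao Kae Noi", "KOKA"],
        "Stars": [3.75, 4.75, 5.00, 3.50],
    },
    index=[0, 5, 10, 15],
)

by_label = subset.loc[5, "Stars"]  # row whose label is 5
by_position = subset.iloc[1, 1]    # second row, second column: the same cell

label_slice = subset.loc[0:10, "Stars"]  # labels 0, 5, 10: the end label is included
position_slice = subset.iloc[0:2, 1]     # positions 0 and 1: the end is excluded

print(by_label, by_position)
print(label_slice)
print(position_slice)
```

Note the asymmetry: a loc slice includes its end label, while an iloc slice excludes its end position.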
Try to use loc to select the following rows, columns:
- Brand for the 1st observation.
- Stars for the first 5 observations.
- All observations where Country is Vietnam.

ramen['Country'] == 'Vietnam'
0       False
1       False
2       False
3       False
4       False
        ...
2570     True
2571    False
2572    False
2573    False
2574    False
Name: Country, Length: 2575, dtype: bool
# method 1
ramen[ramen['Country'] == 'Vietnam']
| | Brand | Variety | Style | Country | Stars |
|---|---|---|---|---|---|
18 | Binh Tay | Mi Hai Cua | Pack | Vietnam | 4.00 |
52 | Uni-President | Mushroom Flavor | Pack | Vietnam | 0.00 |
143 | Mum Ngon | Lau Tom Chua Cay | Pack | Vietnam | 3.50 |
224 | Vifon | Viet Cuisine Bun Rieu Cua Sour Crab Soup Insta... | Bowl | Vietnam | 5.00 |
365 | Acecook | Oh! Ricey Pork Flavour | Pack | Vietnam | 4.00 |
... | ... | ... | ... | ... | ... |
2484 | Binh Tay | Mi Chay Mushroom | Pack | Vietnam | 2.75 |
2533 | Ve Wong | Kung-Fu Chicken Flavor | Pack | Vietnam | 2.75 |
2568 | Ve Wong | Mushroom Pork | Pack | Vietnam | 1.00 |
2569 | Vifon | Nam Vang | Pack | Vietnam | 2.50 |
2570 | Vifon | Hu Tiu Nam Vang ["Phnom Penh" style] Asian Sty... | Bowl | Vietnam | 3.50 |
108 rows × 5 columns
# method 2
ramen.loc[ramen['Country'] == 'Vietnam']
| | Brand | Variety | Style | Country | Stars |
|---|---|---|---|---|---|
18 | Binh Tay | Mi Hai Cua | Pack | Vietnam | 4.00 |
52 | Uni-President | Mushroom Flavor | Pack | Vietnam | 0.00 |
143 | Mum Ngon | Lau Tom Chua Cay | Pack | Vietnam | 3.50 |
224 | Vifon | Viet Cuisine Bun Rieu Cua Sour Crab Soup Insta... | Bowl | Vietnam | 5.00 |
365 | Acecook | Oh! Ricey Pork Flavour | Pack | Vietnam | 4.00 |
... | ... | ... | ... | ... | ... |
2484 | Binh Tay | Mi Chay Mushroom | Pack | Vietnam | 2.75 |
2533 | Ve Wong | Kung-Fu Chicken Flavor | Pack | Vietnam | 2.75 |
2568 | Ve Wong | Mushroom Pork | Pack | Vietnam | 1.00 |
2569 | Vifon | Nam Vang | Pack | Vietnam | 2.50 |
2570 | Vifon | Hu Tiu Nam Vang ["Phnom Penh" style] Asian Sty... | Bowl | Vietnam | 3.50 |
108 rows × 5 columns
The 2nd method is PREFERRED and HIGHLY RECOMMENDED! The reason? We will learn in future lectures!
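The notes leave the reason for a later lecture, so the sketch below does not try to give it. It only shows one difference that is already visible: loc accepts a column selector next to the boolean mask, so rows and columns can be picked in a single step. The four rows are taken from the tables above and typed in by hand; the variable names are illustrative.

```python
import pandas as pd

# Rows 2, 18, 52 and 2572 as displayed above (three columns only).
sample = pd.DataFrame(
    {
        "Brand": ["Nissin", "Binh Tay", "Uni-President", "Wai Wai"],
        "Country": ["USA", "Vietnam", "Vietnam", "Thailand"],
        "Stars": [2.25, 4.00, 0.00, 2.00],
    },
    index=[2, 18, 52, 2572],
)

is_vietnam = sample["Country"] == "Vietnam"  # boolean Series, one value per row

rows_method_1 = sample[is_vietnam]           # method 1
rows_method_2 = sample.loc[is_vietnam]       # method 2: same rows

# With loc, a column label can follow the mask.
vietnam_stars = sample.loc[is_vietnam, "Stars"]

print(rows_method_2)
print(vietnam_stars)
```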
Now, what if I want only the ones where Brand is Gau Do (which means Red Bear)?
ramen.loc[(ramen['Country'] == 'Vietnam') & (ramen['Brand'] == 'Gau Do')]
| | Brand | Variety | Style | Country | Stars |
|---|---|---|---|---|---|
1776 | Gau Do | Hot Sour Shrimp | Pack | Vietnam | 3.75 |
1840 | Gau Do | Chicken Shrimp | Pack | Vietnam | 2.50 |
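The filter above combines two boolean Series with &. Each condition needs its own parentheses because & binds more tightly than ==; without them Python would try to evaluate 'Vietnam' & ramen['Brand'] first, which typically fails. The sketch below shows &, | and ~ on four rows taken from the tables above (typed in by hand; names are illustrative).

```python
import pandas as pd

# Rows 1776, 1840, 2569 and 2571 as displayed above (three columns only).
sample = pd.DataFrame(
    {
        "Brand": ["Gau Do", "Gau Do", "Vifon", "Wai Wai"],
        "Country": ["Vietnam", "Vietnam", "Vietnam", "Thailand"],
        "Stars": [3.75, 2.50, 2.50, 1.00],
    },
    index=[1776, 1840, 2569, 2571],
)

is_vietnam = sample["Country"] == "Vietnam"
is_gau_do = sample["Brand"] == "Gau Do"
is_vifon = sample["Brand"] == "Vifon"

both = sample.loc[is_vietnam & is_gau_do]  # AND: both conditions hold
either = sample.loc[is_gau_do | is_vifon]  # OR: at least one holds
not_vietnam = sample.loc[~is_vietnam]      # NOT: flips the mask

print(both)
print(either)
print(not_vietnam)
```

Storing each condition in its own variable, as done here, is one way to sidestep the parentheses issue. The Python keywords and, or and not do not work element-wise on a Series, so the symbols &, | and ~ are needed.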
Let's look up the rating for the 2 ramen packs I bought recently! (see video)