1. Importing Modules/Libraries

# import a library/module
import pandas

# call read_csv() function
pandas.read_csv('https://stat107.hknguyen.org/files/datasets/rivers.csv')
##      length
## 0       735
## 1       320
## 2       325
## 3       392
## 4       524
## ..      ...
## 136     720
## 137     270
## 138     430
## 139     671
## 140    1770
## 
## [141 rows x 1 columns]

# import pandas but now call it pd (the conventional alias)
import pandas as pd

# call read_csv() function
pd.read_csv('https://stat107.hknguyen.org/files/datasets/rivers.csv')
##      length
## 0       735
## 1       320
## 2       325
## 3       392
## 4       524
## ..      ...
## 136     720
## 137     270
## 138     430
## 139     671
## 140    1770
## 
## [141 rows x 1 columns]

# import only the read_csv function by name: not recommended for pandas!
from pandas import read_csv

read_csv('https://stat107.hknguyen.org/files/datasets/rivers.csv')
##      length
## 0       735
## 1       320
## 2       325
## 3       392
## 4       524
## ..      ...
## 136     720
## 137     270
## 138     430
## 139     671
## 140    1770
## 
## [141 rows x 1 columns]
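
As a minimal sketch (not part of the original notes), the block below checks that the three import styles above all reach the same read_csv function. The inline text in csv_text reuses the first three river lengths from the output above so the example runs without a network connection; the variable names are illustrative. One common reason to avoid the third style, offered here as an assumption rather than the notes' stated rationale, is that a bare read_csv call no longer shows which library it comes from.

```python
import io

import pandas
import pandas as pd
from pandas import read_csv

# All three names point to the same function object
same_function = (pandas.read_csv is pd.read_csv) and (pd.read_csv is read_csv)
print(same_function)

# Illustrative inline data standing in for the rivers.csv file
csv_text = "length\n735\n320\n325\n"
rivers = pd.read_csv(io.StringIO(csv_text))
print(rivers.shape)
```

Whichever style is used, the result is the same DataFrame; only the name used to reach the function changes.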

1.1 What is pandas?

“pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.” - pandas.pydata.org

  • pandas provides the data structures and functions needed to clean data and conduct data analysis.
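
To make "data structures and functions" concrete, here is a small sketch assuming a made-up table with one missing value. It builds a DataFrame (the table structure), removes the incomplete row with dropna(), and computes a mean on one column (a Series). The data and variable names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical river lengths; None marks a missing measurement
rivers = pd.DataFrame({"length": [735, 320, None, 392]})

# Cleaning: drop rows that contain missing values
clean = rivers.dropna()

# Analysis: selecting one column gives a Series, which has a mean() method
mean_length = clean["length"].mean()

print(len(clean))
print(mean_length)
```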

1.2 What is a csv file?

  • csv stands for comma-separated values. Each line of the file is one row, and the values within a row are separated by commas.
  • We will work mainly with csv files in this class.
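
The following sketch shows what csv text looks like and how pandas turns it into a table. The file contents are made up for illustration, and io.StringIO is used only so that a string can stand in for a file on disk or a URL.

```python
import io

import pandas as pd

# Made-up csv text: the first line is the header, each later line is one row
csv_text = "name,length\nA,735\nB,320\n"

# read_csv accepts a file path, a URL, or a file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))

print(df.columns.tolist())
print(df.shape)
```

Here the two comma-separated values on each line become the two columns of the resulting DataFrame.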