Pandas

郭耀仁

Inspired by the R language

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more.

Source: https://github.com/pandas-dev/pandas

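Data structures provided by pandas

Name        Description
Series      A one-dimensional array with an index
DataFrame   A two-dimensional dataset with a row index and column labels
Panel       A three-dimensional dataset with an item index, a row index, and column labels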

Gapminder exercise

  • We start with a dataset called gapminder
In [1]:
import pandas as pd

csv_file = "https://storage.googleapis.com/learn_pd_like_tidyverse/gapminder.csv"
gapminder = pd.read_csv(csv_file)

Inspecting the data

In [2]:
gapminder.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1704 entries, 0 to 1703
Data columns (total 6 columns):
country      1704 non-null object
continent    1704 non-null object
year         1704 non-null int64
lifeExp      1704 non-null float64
pop          1704 non-null int64
gdpPercap    1704 non-null float64
dtypes: float64(2), int64(2), object(2)
memory usage: 80.0+ KB
In [3]:
gapminder.shape
Out[3]:
(1704, 6)
In [4]:
gapminder.head()
Out[4]:
country continent year lifeExp pop gdpPercap
0 Afghanistan Asia 1952 28.801 8425333 779.445314
1 Afghanistan Asia 1957 30.332 9240934 820.853030
2 Afghanistan Asia 1962 31.997 10267083 853.100710
3 Afghanistan Asia 1967 34.020 11537966 836.197138
4 Afghanistan Asia 1972 36.088 13079460 739.981106

Filtering observations

  • Select the observations for Taiwan
In [5]:
is_tw = gapminder['country'] == 'Taiwan'
gapminder.loc[is_tw, :]
Out[5]:
country continent year lifeExp pop gdpPercap
1500 Taiwan Asia 1952 58.50 8550362 1206.947913
1501 Taiwan Asia 1957 62.40 10164215 1507.861290
1502 Taiwan Asia 1962 65.20 11918938 1822.879028
1503 Taiwan Asia 1967 67.50 13648692 2643.858681
1504 Taiwan Asia 1972 69.39 15226039 4062.523897
1505 Taiwan Asia 1977 70.59 16785196 5596.519826
1506 Taiwan Asia 1982 72.16 18501390 7426.354774
1507 Taiwan Asia 1987 73.40 19757799 11054.561750
1508 Taiwan Asia 1992 74.26 20686918 15215.657900
1509 Taiwan Asia 1997 75.25 21628605 20206.820980
1510 Taiwan Asia 2002 76.99 22454239 23235.423290
1511 Taiwan Asia 2007 78.40 23174294 28718.276840
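
Several conditions can be combined into one mask. The sketch below is illustrative only: `mini`, `mask`, and `tw_recent` are hypothetical names, and the four rows are copied by hand from the outputs above rather than read from the CSV.

```python
import pandas as pd

# A tiny stand-in for gapminder; values copied from the outputs shown above.
mini = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Taiwan", "Taiwan"],
    "year": [1952, 1957, 1952, 2007],
    "lifeExp": [28.801, 30.332, 58.50, 78.40],
})

# Join conditions with & (and) or | (or); each condition needs its own parentheses.
mask = (mini["country"] == "Taiwan") & (mini["year"] >= 2000)
tw_recent = mini.loc[mask, :]
print(tw_recent)
```

Only the Taiwan row for 2007 satisfies both conditions here.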

Selecting variables

  • Select country and continent
In [6]:
gapminder.loc[:, ['country', 'continent']]
Out[6]:
country continent
0 Afghanistan Asia
1 Afghanistan Asia
2 Afghanistan Asia
3 Afghanistan Asia
4 Afghanistan Asia
5 Afghanistan Asia
6 Afghanistan Asia
7 Afghanistan Asia
8 Afghanistan Asia
9 Afghanistan Asia
10 Afghanistan Asia
11 Afghanistan Asia
12 Albania Europe
13 Albania Europe
14 Albania Europe
15 Albania Europe
16 Albania Europe
17 Albania Europe
18 Albania Europe
19 Albania Europe
20 Albania Europe
21 Albania Europe
22 Albania Europe
23 Albania Europe
24 Algeria Africa
25 Algeria Africa
26 Algeria Africa
27 Algeria Africa
28 Algeria Africa
29 Algeria Africa
... ... ...
1674 Yemen, Rep. Asia
1675 Yemen, Rep. Asia
1676 Yemen, Rep. Asia
1677 Yemen, Rep. Asia
1678 Yemen, Rep. Asia
1679 Yemen, Rep. Asia
1680 Zambia Africa
1681 Zambia Africa
1682 Zambia Africa
1683 Zambia Africa
1684 Zambia Africa
1685 Zambia Africa
1686 Zambia Africa
1687 Zambia Africa
1688 Zambia Africa
1689 Zambia Africa
1690 Zambia Africa
1691 Zambia Africa
1692 Zimbabwe Africa
1693 Zimbabwe Africa
1694 Zimbabwe Africa
1695 Zimbabwe Africa
1696 Zimbabwe Africa
1697 Zimbabwe Africa
1698 Zimbabwe Africa
1699 Zimbabwe Africa
1700 Zimbabwe Africa
1701 Zimbabwe Africa
1702 Zimbabwe Africa
1703 Zimbabwe Africa

1704 rows × 2 columns

Adding a derived variable

In [7]:
gapminder['country_abb'] = gapminder['country'].apply(lambda x: x[:3])
gapminder
Out[7]:
country continent year lifeExp pop gdpPercap country_abb
0 Afghanistan Asia 1952 28.801 8425333 779.445314 Afg
1 Afghanistan Asia 1957 30.332 9240934 820.853030 Afg
2 Afghanistan Asia 1962 31.997 10267083 853.100710 Afg
3 Afghanistan Asia 1967 34.020 11537966 836.197138 Afg
4 Afghanistan Asia 1972 36.088 13079460 739.981106 Afg
5 Afghanistan Asia 1977 38.438 14880372 786.113360 Afg
6 Afghanistan Asia 1982 39.854 12881816 978.011439 Afg
7 Afghanistan Asia 1987 40.822 13867957 852.395945 Afg
8 Afghanistan Asia 1992 41.674 16317921 649.341395 Afg
9 Afghanistan Asia 1997 41.763 22227415 635.341351 Afg
10 Afghanistan Asia 2002 42.129 25268405 726.734055 Afg
11 Afghanistan Asia 2007 43.828 31889923 974.580338 Afg
12 Albania Europe 1952 55.230 1282697 1601.056136 Alb
13 Albania Europe 1957 59.280 1476505 1942.284244 Alb
14 Albania Europe 1962 64.820 1728137 2312.888958 Alb
15 Albania Europe 1967 66.220 1984060 2760.196931 Alb
16 Albania Europe 1972 67.690 2263554 3313.422188 Alb
17 Albania Europe 1977 68.930 2509048 3533.003910 Alb
18 Albania Europe 1982 70.420 2780097 3630.880722 Alb
19 Albania Europe 1987 72.000 3075321 3738.932735 Alb
20 Albania Europe 1992 71.581 3326498 2497.437901 Alb
21 Albania Europe 1997 72.950 3428038 3193.054604 Alb
22 Albania Europe 2002 75.651 3508512 4604.211737 Alb
23 Albania Europe 2007 76.423 3600523 5937.029526 Alb
24 Algeria Africa 1952 43.077 9279525 2449.008185 Alg
25 Algeria Africa 1957 45.685 10270856 3013.976023 Alg
26 Algeria Africa 1962 48.303 11000948 2550.816880 Alg
27 Algeria Africa 1967 51.407 12760499 3246.991771 Alg
28 Algeria Africa 1972 54.518 14760787 4182.663766 Alg
29 Algeria Africa 1977 58.014 17152804 4910.416756 Alg
... ... ... ... ... ... ... ...
1674 Yemen, Rep. Asia 1982 49.113 9657618 1977.557010 Yem
1675 Yemen, Rep. Asia 1987 52.922 11219340 1971.741538 Yem
1676 Yemen, Rep. Asia 1992 55.599 13367997 1879.496673 Yem
1677 Yemen, Rep. Asia 1997 58.020 15826497 2117.484526 Yem
1678 Yemen, Rep. Asia 2002 60.308 18701257 2234.820827 Yem
1679 Yemen, Rep. Asia 2007 62.698 22211743 2280.769906 Yem
1680 Zambia Africa 1952 42.038 2672000 1147.388831 Zam
1681 Zambia Africa 1957 44.077 3016000 1311.956766 Zam
1682 Zambia Africa 1962 46.023 3421000 1452.725766 Zam
1683 Zambia Africa 1967 47.768 3900000 1777.077318 Zam
1684 Zambia Africa 1972 50.107 4506497 1773.498265 Zam
1685 Zambia Africa 1977 51.386 5216550 1588.688299 Zam
1686 Zambia Africa 1982 51.821 6100407 1408.678565 Zam
1687 Zambia Africa 1987 50.821 7272406 1213.315116 Zam
1688 Zambia Africa 1992 46.100 8381163 1210.884633 Zam
1689 Zambia Africa 1997 40.238 9417789 1071.353818 Zam
1690 Zambia Africa 2002 39.193 10595811 1071.613938 Zam
1691 Zambia Africa 2007 42.384 11746035 1271.211593 Zam
1692 Zimbabwe Africa 1952 48.451 3080907 406.884115 Zim
1693 Zimbabwe Africa 1957 50.469 3646340 518.764268 Zim
1694 Zimbabwe Africa 1962 52.358 4277736 527.272182 Zim
1695 Zimbabwe Africa 1967 53.995 4995432 569.795071 Zim
1696 Zimbabwe Africa 1972 55.635 5861135 799.362176 Zim
1697 Zimbabwe Africa 1977 57.674 6642107 685.587682 Zim
1698 Zimbabwe Africa 1982 60.363 7636524 788.855041 Zim
1699 Zimbabwe Africa 1987 62.351 9216418 706.157306 Zim
1700 Zimbabwe Africa 1992 60.377 10704340 693.420786 Zim
1701 Zimbabwe Africa 1997 46.809 11404948 792.449960 Zim
1702 Zimbabwe Africa 2002 39.989 11926563 672.038623 Zim
1703 Zimbabwe Africa 2007 43.487 12311143 469.709298 Zim

1704 rows × 7 columns
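
The same three-letter abbreviation can also be built with the vectorized `.str` accessor instead of `apply` with a lambda. A minimal sketch on a three-row stand-in frame (`countries` is a hypothetical name):

```python
import pandas as pd

# Stand-in for the country column of gapminder.
countries = pd.DataFrame({"country": ["Afghanistan", "Albania", "Algeria"]})

# .str[:3] slices every string in the column without a Python-level lambda.
countries["country_abb"] = countries["country"].str[:3]
print(countries)
```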

Sorting by variables

In [8]:
gapminder.sort_values(['year', 'continent', 'country'])
Out[8]:
country continent year lifeExp pop gdpPercap country_abb
24 Algeria Africa 1952 43.077 9279525 2449.008185 Alg
36 Angola Africa 1952 30.015 4232095 3520.610273 Ang
120 Benin Africa 1952 38.223 1738315 1062.752200 Ben
156 Botswana Africa 1952 47.622 442308 851.241141 Bot
192 Burkina Faso Africa 1952 31.975 4469979 543.255241 Bur
204 Burundi Africa 1952 39.031 2445618 339.296459 Bur
228 Cameroon Africa 1952 38.523 5009067 1172.667655 Cam
252 Central African Republic Africa 1952 35.463 1291695 1071.310713 Cen
264 Chad Africa 1952 38.092 2682462 1178.665927 Cha
312 Comoros Africa 1952 40.715 153936 1102.990936 Com
324 Congo, Dem. Rep. Africa 1952 39.143 14100005 780.542326 Con
336 Congo, Rep. Africa 1952 42.111 854885 2125.621418 Con
360 Cote d'Ivoire Africa 1952 40.477 2977019 1388.594732 Cot
420 Djibouti Africa 1952 34.812 63149 2669.529475 Dji
456 Egypt Africa 1952 41.893 22223309 1418.822445 Egy
480 Equatorial Guinea Africa 1952 34.482 216964 375.643123 Equ
492 Eritrea Africa 1952 35.928 1438760 328.940557 Eri
504 Ethiopia Africa 1952 34.078 20860941 362.146280 Eth
540 Gabon Africa 1952 37.003 420702 4293.476475 Gab
552 Gambia Africa 1952 30.000 284320 485.230659 Gam
576 Ghana Africa 1952 43.149 5581001 911.298937 Gha
612 Guinea Africa 1952 33.609 2664249 510.196492 Gui
624 Guinea-Bissau Africa 1952 32.500 580653 299.850319 Gui
816 Kenya Africa 1952 42.270 6464046 853.540919 Ken
876 Lesotho Africa 1952 42.138 748747 298.846212 Les
888 Liberia Africa 1952 38.480 863308 575.572996 Lib
900 Libya Africa 1952 42.723 1019729 2387.548060 Lib
912 Madagascar Africa 1952 36.681 4762912 1443.011715 Mad
924 Malawi Africa 1952 36.256 2917802 369.165080 Mal
948 Mali Africa 1952 33.685 3838168 452.336981 Mal
... ... ... ... ... ... ... ...
119 Belgium Europe 2007 79.441 10392226 33692.605080 Bel
155 Bosnia and Herzegovina Europe 2007 74.852 4552198 7446.298803 Bos
191 Bulgaria Europe 2007 73.005 7322858 10680.792820 Bul
383 Croatia Europe 2007 75.748 4493312 14619.222720 Cro
407 Czech Republic Europe 2007 76.486 10228744 22833.308510 Cze
419 Denmark Europe 2007 78.332 5468120 35278.418740 Den
527 Finland Europe 2007 79.313 5238460 33207.084400 Fin
539 France Europe 2007 80.657 61083916 30470.016700 Fra
575 Germany Europe 2007 79.406 82400996 32170.374420 Ger
599 Greece Europe 2007 79.483 10706290 27538.411880 Gre
683 Hungary Europe 2007 73.338 9956108 18008.944440 Hun
695 Iceland Europe 2007 81.757 301931 36180.789190 Ice
755 Ireland Europe 2007 78.885 4109086 40675.996350 Ire
779 Italy Europe 2007 80.546 58147733 28569.719700 Ita
1019 Montenegro Europe 2007 74.543 684736 9253.896111 Mon
1091 Netherlands Europe 2007 79.762 16570613 36797.933320 Net
1151 Norway Europe 2007 80.196 4627926 49357.190170 Nor
1235 Poland Europe 2007 75.563 38518241 15389.924680 Pol
1247 Portugal Europe 2007 78.098 10642836 20509.647770 Por
1283 Romania Europe 2007 72.476 22276056 10808.475610 Rom
1343 Serbia Europe 2007 74.002 10150265 9786.534714 Ser
1379 Slovak Republic Europe 2007 74.663 5447502 18678.314350 Slo
1391 Slovenia Europe 2007 77.926 2009245 25768.257590 Slo
1427 Spain Europe 2007 80.941 40448191 28821.063700 Spa
1475 Sweden Europe 2007 80.884 9031088 33859.748350 Swe
1487 Switzerland Europe 2007 81.701 7554661 37506.419070 Swi
1583 Turkey Europe 2007 71.777 71158647 8458.276384 Tur
1607 United Kingdom Europe 2007 79.425 60776238 33203.261280 Uni
71 Australia Oceania 2007 81.235 20434176 34435.367440 Aus
1103 New Zealand Oceania 2007 80.204 4115771 25185.009110 New

1704 rows × 7 columns

Aggregation

In [9]:
gapminder[gapminder['year'] == 2007][['pop']].sum()
Out[9]:
pop    6251013179
dtype: int64

Aggregation by group

In [10]:
gapminder[gapminder['year'] == 2007].groupby(by = 'continent')['pop'].sum()
Out[10]:
continent
Africa       929539692
Americas     898871184
Asia        3811953827
Europe       586098529
Oceania       24549947
Name: pop, dtype: int64
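
`agg` can return more than one summary per group. A sketch on four of the 2007 rows copied from the sorted output above; `sample` and `summary` are hypothetical names.

```python
import pandas as pd

# Four 2007 rows copied by hand from the sorted output shown earlier.
sample = pd.DataFrame({
    "country": ["Australia", "New Zealand", "France", "Germany"],
    "continent": ["Oceania", "Oceania", "Europe", "Europe"],
    "pop": [20434176, 4115771, 61083916, 82400996],
})

# One row per continent, one column per aggregation.
summary = sample.groupby("continent")["pop"].agg(["sum", "max"])
print(summary)
```

The Oceania sum agrees with the grouped result above because Oceania has only these two countries in the dataset.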

Series

Creating a Series

  • Series() creates a Series
  • data can be:
    • an ndarray
    • a dict
    • a single value (scalar)
import pandas as pd

ser = pd.Series(data, index = idx)

Creating a Series (2)

  • data is an ndarray
In [11]:
import numpy as np
import pandas as pd

arr = np.array(("Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"))
ser = pd.Series(arr) # default index
print(type(ser))
print("\n")
print(ser)
<class 'pandas.core.series.Series'>


0      Monkey D. Luffy
1         Roronoa Zoro
2                 Nami
3                Usopp
4       Vinsmoke Sanji
5    Tony Tony Chopper
6           Nico Robin
7               Franky
8                Brook
dtype: object
In [12]:
# use a custom index

crew_idx = []
for i in range(9):
    crew_idx.append("crew " + str(i + 1))
ser = pd.Series(arr, index = crew_idx)
print(ser)
crew 1      Monkey D. Luffy
crew 2         Roronoa Zoro
crew 3                 Nami
crew 4                Usopp
crew 5       Vinsmoke Sanji
crew 6    Tony Tony Chopper
crew 7           Nico Robin
crew 8               Franky
crew 9                Brook
dtype: object

Creating a Series (3)

  • data is a dict
  • By default the keys become the index values
In [13]:
import pandas as pd

crew_dict = {
    "captain": "Monkey D. Luffy",
    "swordsman": "Roronoa Zoro",
    "navigator": "Nami",
    "sniper": "Usopp",
    "chef": "Vinsmoke Sanji",
    "doctor": "Tony Tony Chopper",
    "archaeologist": "Nico Robin",
    "shipwright": "Franky",
    "musician": "Brook"
}

ser = pd.Series(crew_dict) # older pandas sorts by key, as shown below; pandas 0.23+ on Python 3.6+ keeps insertion order
print(ser)
archaeologist           Nico Robin
captain            Monkey D. Luffy
chef                Vinsmoke Sanji
doctor           Tony Tony Chopper
musician                     Brook
navigator                     Nami
shipwright                  Franky
sniper                       Usopp
swordsman             Roronoa Zoro
dtype: object
In [14]:
import pandas as pd

crew_dict = {
    "captain": "Monkey D. Luffy",
    "swordsman": "Roronoa Zoro",
    "navigator": "Nami",
    "sniper": "Usopp",
    "chef": "Vinsmoke Sanji",
    "doctor": "Tony Tony Chopper",
    "archaeologist": "Nico Robin",
    "shipwright": "Franky",
    "musician": "Brook"
}

ser = pd.Series(crew_dict, index = crew_dict.keys()) # same order as the original dict
print(ser)
captain            Monkey D. Luffy
swordsman             Roronoa Zoro
navigator                     Nami
sniper                       Usopp
chef                Vinsmoke Sanji
doctor           Tony Tony Chopper
archaeologist           Nico Robin
shipwright                  Franky
musician                     Brook
dtype: object

Creating a Series (4)

  • data can be a single value
In [15]:
import pandas as pd

luffy = "Monkey D. Luffy"
ser = pd.Series(luffy, index = range(5))
print(ser)
0    Monkey D. Luffy
1    Monkey D. Luffy
2    Monkey D. Luffy
3    Monkey D. Luffy
4    Monkey D. Luffy
dtype: object

Working with a Series

  • Select data by position or by label
In [16]:
import pandas as pd

crew_dict = {
    "captain": "Monkey D. Luffy",
    "swordsman": "Roronoa Zoro",
    "navigator": "Nami",
    "sniper": "Usopp",
    "chef": "Vinsmoke Sanji",
    "doctor": "Tony Tony Chopper",
    "archaeologist": "Nico Robin",
    "shipwright": "Franky",
    "musician": "Brook"
}

ser = pd.Series(crew_dict, index = crew_dict.keys()) # same order as the original dict
print(ser[0])
print(ser['captain'])
print("\n")
print(ser[[0, 3, 6]])
print(ser[['captain', 'sniper', 'archaeologist']])
Monkey D. Luffy
Monkey D. Luffy


captain          Monkey D. Luffy
sniper                     Usopp
archaeologist         Nico Robin
dtype: object
captain          Monkey D. Luffy
sniper                     Usopp
archaeologist         Nico Robin
dtype: object
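
Plain brackets accept both positions and labels here, which can be ambiguous, and recent pandas releases warn about integer positions on a label index. A hedged alternative is to state the intent with `.iloc` (position) and `.loc` (label). The three-member `ser` below is a shortened stand-in for the crew Series.

```python
import pandas as pd

ser = pd.Series(
    ["Monkey D. Luffy", "Usopp", "Nico Robin"],
    index=["captain", "sniper", "archaeologist"],
)

first_by_position = ser.iloc[0]       # position-based lookup
first_by_label = ser.loc["captain"]   # label-based lookup
subset_by_position = ser.iloc[[0, 2]] # several positions at once
subset_by_label = ser.loc[["captain", "archaeologist"]]

print(first_by_position, first_by_label)
print(subset_by_position)
```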

Working with a Series (2)

  • Slice quickly with :
In [17]:
import pandas as pd

crew_dict = {
    "captain": "Monkey D. Luffy",
    "swordsman": "Roronoa Zoro",
    "navigator": "Nami",
    "sniper": "Usopp",
    "chef": "Vinsmoke Sanji",
    "doctor": "Tony Tony Chopper",
    "archaeologist": "Nico Robin",
    "shipwright": "Franky",
    "musician": "Brook"
}

ser = pd.Series(crew_dict, index = crew_dict.keys()) # same order as the original dict
print(ser[:3])
print("\n")
print(ser['sniper':])
captain      Monkey D. Luffy
swordsman       Roronoa Zoro
navigator               Nami
dtype: object


sniper                       Usopp
chef                Vinsmoke Sanji
doctor           Tony Tony Chopper
archaeologist           Nico Robin
shipwright                  Franky
musician                     Brook
dtype: object
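
Worth noting when slicing: a positional slice leaves out its end point, while a label slice includes it. A small sketch on a shortened Series (`by_position` and `by_label` are hypothetical names):

```python
import pandas as pd

ser = pd.Series(
    ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp"],
    index=["captain", "swordsman", "navigator", "sniper"],
)

by_position = ser.iloc[0:2]                  # positions 0 and 1; position 2 is excluded
by_label = ser.loc["captain":"navigator"]    # "navigator" itself is included

print(by_position)
print(by_label)
```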

Working with a Series (3)

  • Boolean filtering with a condition also works
In [18]:
import pandas as pd

crew_dict = {
    "captain": "Monkey D. Luffy",
    "swordsman": "Roronoa Zoro",
    "navigator": "Nami",
    "sniper": "Usopp",
    "chef": "Vinsmoke Sanji",
    "doctor": "Tony Tony Chopper",
    "archaeologist": "Nico Robin",
    "shipwright": "Franky",
    "musician": "Brook"
}

ser = pd.Series(crew_dict, index = crew_dict.keys()) # same order as the original dict
name_filter = ser.isin(("Nami", "Nico Robin"))
print(ser[name_filter])
navigator              Nami
archaeologist    Nico Robin
dtype: object

Working with a Series (4)

  • Element-wise operations apply
In [19]:
import pandas as pd

crew_age = {
    "Monkey D. Luffy": 19,
    "Roronoa Zoro": 21,
    "Nami": 20,
    "Usopp": 19,
    "Vinsmoke Sanji": 21,
    "Tony Tony Chopper": 17,
    "Nico Robin": 30,
    "Franky": 36,
    "Brook": 90
}

ser = pd.Series(crew_age, index = crew_age.keys())
print(ser - 2)
Monkey D. Luffy      17
Roronoa Zoro         19
Nami                 18
Usopp                17
Vinsmoke Sanji       19
Tony Tony Chopper    15
Nico Robin           28
Franky               34
Brook                88
dtype: int64
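
When both operands are Series, the arithmetic is aligned on the index labels rather than on position, and a label missing from either side gives NaN. A sketch with two hypothetical Series, `age_now` and `years_back`:

```python
import pandas as pd

age_now = pd.Series({"Nami": 20, "Usopp": 19, "Franky": 36})
years_back = pd.Series({"Usopp": 2, "Nami": 2})  # different order, no entry for Franky

# Values are matched by label; Franky has no partner, so the result is NaN.
result = age_now - years_back
print(result)
```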

DataFrame

Creating a DataFrame

  • DataFrame() creates a DataFrame
  • data can be:
    • a dict
    • an ndarray
import pandas as pd

df = pd.DataFrame(data)
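
The slides that follow build a DataFrame from a dict. For the ndarray case, a minimal sketch might look like this; the array values and the column labels `x` and `y` are made up for illustration.

```python
import numpy as np
import pandas as pd

# A 3 x 2 array: each row becomes a DataFrame row, each column a variable.
arr = np.array([[1, 2], [3, 4], [5, 6]])

# Without columns=, the labels would default to 0 and 1.
df_from_arr = pd.DataFrame(arr, columns=["x", "y"])
print(df_from_arr)
```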

Creating a DataFrame (2)

  • Here data is a dict
In [20]:
import pandas as pd

straw_hat_dict = {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
                  "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
                  "is_male": [True, True, False, True, True, True, False, True, True]
}

df = pd.DataFrame(straw_hat_dict) # default column order; older pandas sorts the labels alphabetically, as shown below
print(type(df))
df
<class 'pandas.core.frame.DataFrame'>
Out[20]:
age is_male name
0 19 True Monkey D. Luffy
1 21 True Roronoa Zoro
2 20 False Nami
3 19 True Usopp
4 21 True Vinsmoke Sanji
5 17 True Tony Tony Chopper
6 30 False Nico Robin
7 36 True Franky
8 90 True Brook
In [21]:
import pandas as pd

straw_hat_dict = {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
                  "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
                  "is_male": [True, True, False, True, True, True, False, True, True]
}

df = pd.DataFrame(straw_hat_dict, columns = ["name", "age", "is_male"]) # set the column order explicitly
print(type(df))
df
<class 'pandas.core.frame.DataFrame'>
Out[21]:
name age is_male
0 Monkey D. Luffy 19 True
1 Roronoa Zoro 21 True
2 Nami 20 False
3 Usopp 19 True
4 Vinsmoke Sanji 21 True
5 Tony Tony Chopper 17 True
6 Nico Robin 30 False
7 Franky 36 True
8 Brook 90 True

Working with a DataFrame

  • Holds columns of different types, unlike an ndarray, which holds a single type
In [22]:
import pandas as pd

straw_hat_dict = {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
                  "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
                  "is_male": [True, True, False, True, True, True, False, True, True]
}

df = pd.DataFrame(straw_hat_dict, columns = ["name", "age", "is_male"]) # set the column order explicitly
df.dtypes
Out[22]:
name       object
age         int64
is_male      bool
dtype: object

Working with a DataFrame (2)

  • A new variable can be added by direct assignment
In [23]:
import pandas as pd

straw_hat_dict = {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
                  "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
                  "is_male": [True, True, False, True, True, True, False, True, True]
}

df = pd.DataFrame(straw_hat_dict, columns = ["name", "age", "is_male"]) # set the column order explicitly
df['age_2_yr_ago'] = df['age'] - 2
df
Out[23]:
name age is_male age_2_yr_ago
0 Monkey D. Luffy 19 True 17
1 Roronoa Zoro 21 True 19
2 Nami 20 False 18
3 Usopp 19 True 17
4 Vinsmoke Sanji 21 True 19
5 Tony Tony Chopper 17 True 15
6 Nico Robin 30 False 28
7 Franky 36 True 34
8 Brook 90 True 88
In [24]:
import pandas as pd

straw_hat_dict = {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
                  "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
                  "is_male": [True, True, False, True, True, True, False, True, True]
}

df = pd.DataFrame(straw_hat_dict, columns = ["name", "age", "is_male"]) # set the column order explicitly
df['favorite_food'] = ["Meat", "Food matches wine", "Orange", "Fish", "Food matches black tea", "Sweets", "Food matches coffee", "Food matches coke", "Milk"]
df
Out[24]:
name age is_male favorite_food
0 Monkey D. Luffy 19 True Meat
1 Roronoa Zoro 21 True Food matches wine
2 Nami 20 False Orange
3 Usopp 19 True Fish
4 Vinsmoke Sanji 21 True Food matches black tea
5 Tony Tony Chopper 17 True Sweets
6 Nico Robin 30 False Food matches coffee
7 Franky 36 True Food matches coke
8 Brook 90 True Milk

Working with a DataFrame (3)

  • Use .insert() to choose the position of the new variable
In [25]:
import pandas as pd

straw_hat_dict = {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
                  "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
                  "is_male": [True, True, False, True, True, True, False, True, True]
}

df = pd.DataFrame(straw_hat_dict, columns = ["name", "age", "is_male"]) # set the column order explicitly
df.insert(1, 'favorite_food', ["Meat", "Food matches wine", "Orange", "Fish", "Food matches black tea", "Sweets", "Food matches coffee", "Food matches coke", "Milk"])
df
Out[25]:
name favorite_food age is_male
0 Monkey D. Luffy Meat 19 True
1 Roronoa Zoro Food matches wine 21 True
2 Nami Orange 20 False
3 Usopp Fish 19 True
4 Vinsmoke Sanji Food matches black tea 21 True
5 Tony Tony Chopper Sweets 17 True
6 Nico Robin Food matches coffee 30 False
7 Franky Food matches coke 36 True
8 Brook Milk 90 True

Working with a DataFrame (4)

  • Use del to delete a variable
In [26]:
import pandas as pd

straw_hat_dict = {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
                  "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
                  "is_male": [True, True, False, True, True, True, False, True, True]
}

df = pd.DataFrame(straw_hat_dict, columns = ["name", "age", "is_male"]) # set the column order explicitly
del df['is_male']
df
Out[26]:
name age
0 Monkey D. Luffy 19
1 Roronoa Zoro 21
2 Nami 20
3 Usopp 19
4 Vinsmoke Sanji 21
5 Tony Tony Chopper 17
6 Nico Robin 30
7 Franky 36
8 Brook 90

Working with a DataFrame (5)

  • Use .pop() to remove a variable and assign it to a Series
In [27]:
import pandas as pd

straw_hat_dict = {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
                  "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
                  "is_male": [True, True, False, True, True, True, False, True, True]
}

df = pd.DataFrame(straw_hat_dict, columns = ["name", "age", "is_male"]) # set the column order explicitly
ser = df.pop('is_male')
print(type(ser))
print(ser)
<class 'pandas.core.series.Series'>
0     True
1     True
2    False
3     True
4     True
5     True
6    False
7     True
8     True
Name: is_male, dtype: bool
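
`del` and `.pop()` both change `df` in place. When the original should stay intact, `.drop(columns=...)` returns a new DataFrame instead. A sketch on a shortened crew table (`trimmed` is a hypothetical name):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Monkey D. Luffy", "Nami", "Nico Robin"],
    "age": [19, 20, 30],
    "is_male": [True, False, False],
})

# drop() leaves df untouched and returns a copy without the listed columns.
trimmed = df.drop(columns=["is_male"])
print(trimmed)
print(df.columns.tolist())
```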

Panel

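Creating a Panel

  • Less commonly used than Series and DataFrame; deprecated in pandas 0.20 and removed in 0.25, so the code in this section runs only on older versions
  • Has three dimensions:
    • items (index of the data frames)
    • major_axis (row index of each data frame)
    • minor_axis (column index of each data frame)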

Creating a Panel (2)

  • Create a Panel holding two DataFrames
In [28]:
import pandas as pd

df_2_yr_ago = pd.DataFrame(
    {
        "name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
        "age": [17, 19, 18, 17, 19, 15, 28, 34, 88],
        "mastered_haki": [False, False, False, False, False, False, False, False, False]
    }
)
df_now = pd.DataFrame(
    {
        "name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
        "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
        "mastered_haki": [True, True, False, True, True, False, False, False, False]
    }
)
panel_data = pd.Panel({
    '2 years ago': df_2_yr_ago,
    'now': df_now
})
In [29]:
panel_data
Out[29]:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 9 (major_axis) x 3 (minor_axis)
Items axis: 2 years ago to now
Major_axis axis: 0 to 8
Minor_axis axis: age to name
In [30]:
panel_data['now']
Out[30]:
age mastered_haki name
0 19 True Monkey D. Luffy
1 21 True Roronoa Zoro
2 20 False Nami
3 19 True Usopp
4 21 True Vinsmoke Sanji
5 17 False Tony Tony Chopper
6 30 False Nico Robin
7 36 False Franky
8 90 False Brook
In [31]:
panel_data['2 years ago']
Out[31]:
age mastered_haki name
0 17 False Monkey D. Luffy
1 19 False Roronoa Zoro
2 18 False Nami
3 17 False Usopp
4 19 False Vinsmoke Sanji
5 15 False Tony Tony Chopper
6 28 False Nico Robin
7 34 False Franky
8 88 False Brook
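
On pandas versions without Panel, a similar structure can be approximated by stacking the frames under a two-level row index with `pd.concat`. This is a sketch of one substitute, not what the original slides used; the two-row frames and the level names `period` and `row` are assumptions for illustration.

```python
import pandas as pd

df_2_yr_ago = pd.DataFrame({"name": ["Monkey D. Luffy", "Nami"], "age": [17, 18]})
df_now = pd.DataFrame({"name": ["Monkey D. Luffy", "Nami"], "age": [19, 20]})

# The dict keys become the outer index level, playing the role of Panel items.
stacked = pd.concat(
    {"2 years ago": df_2_yr_ago, "now": df_now},
    names=["period", "row"],
)

# Selecting an outer label returns the corresponding DataFrame.
now_again = stacked.loc["now"]
print(stacked)
print(now_again)
```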

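Selecting elements in a DataFrame

  • Elements can be selected with square brackets []
  • A variable can also be selected as an attribute with .
In [32]:
import pandas as pd # import the package under the alias pd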

df = pd.DataFrame(
    {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
     "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
     "is_male": [True, True, False, True, True, True, False, True, True]
    }
)

print(df['name'])
print("\n")
print(df.name)
0      Monkey D. Luffy
1         Roronoa Zoro
2                 Nami
3                Usopp
4       Vinsmoke Sanji
5    Tony Tony Chopper
6           Nico Robin
7               Franky
8                Brook
Name: name, dtype: object


0      Monkey D. Luffy
1         Roronoa Zoro
2                 Nami
3                Usopp
4       Vinsmoke Sanji
5    Tony Tony Chopper
6           Nico Robin
7               Franky
8                Brook
Name: name, dtype: object

Selecting elements in a DataFrame (2)

  • Multiple variables can be selected
In [33]:
import pandas as pd # import the package under the alias pd

df = pd.DataFrame(
    {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
     "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
     "is_male": [True, True, False, True, True, True, False, True, True]
    }
)
print(df[['name', 'is_male']])
                name  is_male
0    Monkey D. Luffy     True
1       Roronoa Zoro     True
2               Nami    False
3              Usopp     True
4     Vinsmoke Sanji     True
5  Tony Tony Chopper     True
6         Nico Robin    False
7             Franky     True
8              Brook     True

Selecting elements in a DataFrame (3)

  • [:] range slicing works on rows
In [34]:
import pandas as pd # import the package under the alias pd

df = pd.DataFrame(
    {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
     "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
     "is_male": [True, True, False, True, True, True, False, True, True]
    }
)
df[:5]
Out[34]:
age is_male name
0 19 True Monkey D. Luffy
1 21 True Roronoa Zoro
2 20 False Nami
3 19 True Usopp
4 21 True Vinsmoke Sanji
In [35]:
df[5:]
Out[35]:
age is_male name
5 17 True Tony Tony Chopper
6 30 False Nico Robin
7 36 True Franky
8 90 True Brook
In [36]:
df[0:7:2]
Out[36]:
age is_male name
0 19 True Monkey D. Luffy
2 20 False Nami
4 21 True Vinsmoke Sanji
6 30 False Nico Robin

Selecting elements in a DataFrame (4)

  • Different selection methods:
    • .loc (label-based)
    • .iloc (position-based)
In [37]:
import pandas as pd # import the package under the alias pd

df = pd.DataFrame(
    {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
     "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
     "is_male": [True, True, False, True, True, True, False, True, True]
    },
    index = list(range(5)) + list(range(10, 14))
)

df
Out[37]:
age is_male name
0 19 True Monkey D. Luffy
1 21 True Roronoa Zoro
2 20 False Nami
3 19 True Usopp
4 21 True Vinsmoke Sanji
10 17 True Tony Tony Chopper
11 30 False Nico Robin
12 36 True Franky
13 90 True Brook
In [38]:
df.loc[:7, ['age', 'name']]
Out[38]:
age name
0 19 Monkey D. Luffy
1 21 Roronoa Zoro
2 20 Nami
3 19 Usopp
4 21 Vinsmoke Sanji
In [39]:
df.iloc[:7, [0, 2]]
Out[39]:
age name
0 19 Monkey D. Luffy
1 21 Roronoa Zoro
2 20 Nami
3 19 Usopp
4 21 Vinsmoke Sanji
10 17 Tony Tony Chopper
11 30 Nico Robin

Selecting elements in a DataFrame (5)

  • Boolean values can be used to filter
In [40]:
import pandas as pd # import the package under the alias pd

df = pd.DataFrame(
    {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
     "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
     "is_male": [True, True, False, True, True, True, False, True, True]
    }
)
# keep crew members younger than 30
age_filter = df.age < 30
df[age_filter]
Out[40]:
age is_male name
0 19 True Monkey D. Luffy
1 21 True Roronoa Zoro
2 20 False Nami
3 19 True Usopp
4 21 True Vinsmoke Sanji
5 17 True Tony Tony Chopper

Selecting elements in a DataFrame (6)

  • Exercise: use Boolean filtering to pick out the mature men of the Straw Hat Pirates:
    • age >= 30
    • is_male == True
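
One possible solution to the exercise, as a sketch: wrap each condition in parentheses and join them with `&`. `mature_men` is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji",
             "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
    "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
    "is_male": [True, True, False, True, True, True, False, True, True],
})

# is_male is already Boolean, so it can be used as a condition directly.
mature_men = df[(df["age"] >= 30) & (df["is_male"])]
print(mature_men)
```

Nico Robin meets the age condition but not the second one, so only Franky and Brook remain.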

Getting an overview of a DataFrame

  • .shape
  • .index
  • .columns
  • .info()
  • .count()
In [41]:
import pandas as pd # import the package under the alias pd

df = pd.DataFrame(
    {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
     "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
     "is_male": [True, True, False, True, True, True, False, True, True]
    }
)
print(df.shape)
print("\n")
print(df.index)
print("\n")
print(df.columns)
print("\n")
print(df.info())
print("\n")
print(df.count())
(9, 3)


RangeIndex(start=0, stop=9, step=1)


Index(['age', 'is_male', 'name'], dtype='object')


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 3 columns):
age        9 non-null int64
is_male    9 non-null bool
name       9 non-null object
dtypes: bool(1), int64(1), object(1)
memory usage: 233.0+ bytes
None


age        9
is_male    9
name       9
dtype: int64

Getting an overview of a DataFrame (2)

  • .head()
  • .tail()
  • .describe()
In [42]:
print(df.head(3))
print("\n")
print(df.tail(3))
print("\n")
print(df.describe())
   age  is_male             name
0   19     True  Monkey D. Luffy
1   21     True     Roronoa Zoro
2   20    False             Nami


   age  is_male        name
6   30    False  Nico Robin
7   36     True      Franky
8   90     True       Brook


             age
count   9.000000
mean   30.333333
std    23.205603
min    17.000000
25%    19.000000
50%    21.000000
75%    30.000000
max    90.000000

Sorting a DataFrame

  • .sort_index()
  • .sort_values()
In [43]:
import pandas as pd # import the package under the alias pd

df = pd.DataFrame(
    {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
     "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
     "is_male": [True, True, False, True, True, True, False, True, True]
    }, columns = ['name', 'age', 'is_male']
)

df.sort_index(axis = 0, ascending = False)
Out[43]:
name age is_male
8 Brook 90 True
7 Franky 36 True
6 Nico Robin 30 False
5 Tony Tony Chopper 17 True
4 Vinsmoke Sanji 21 True
3 Usopp 19 True
2 Nami 20 False
1 Roronoa Zoro 21 True
0 Monkey D. Luffy 19 True
In [44]:
df.sort_index(axis = 1, ascending = False)
Out[44]:
name is_male age
0 Monkey D. Luffy True 19
1 Roronoa Zoro True 21
2 Nami False 20
3 Usopp True 19
4 Vinsmoke Sanji True 21
5 Tony Tony Chopper True 17
6 Nico Robin False 30
7 Franky True 36
8 Brook True 90
In [45]:
df.sort_values(by = 'age')
Out[45]:
name age is_male
5 Tony Tony Chopper 17 True
0 Monkey D. Luffy 19 True
3 Usopp 19 True
2 Nami 20 False
1 Roronoa Zoro 21 True
4 Vinsmoke Sanji 21 True
6 Nico Robin 30 False
7 Franky 36 True
8 Brook 90 True
In [46]:
df.sort_values(by = ['is_male', 'age'])
Out[46]:
name age is_male
2 Nami 20 False
6 Nico Robin 30 False
5 Tony Tony Chopper 17 True
0 Monkey D. Luffy 19 True
3 Usopp 19 True
1 Roronoa Zoro 21 True
4 Vinsmoke Sanji 21 True
7 Franky 36 True
8 Brook 90 True
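
`ascending` also accepts one flag per sort key, so each key can have its own direction. A sketch on four crew members (`crew` and `ranked` are hypothetical names):

```python
import pandas as pd

crew = pd.DataFrame({
    "name": ["Nami", "Tony Tony Chopper", "Nico Robin", "Franky"],
    "age": [20, 17, 30, 36],
    "is_male": [False, True, False, True],
})

# is_male ascending (False first), then age descending within each group.
ranked = crew.sort_values(by=["is_male", "age"], ascending=[True, False])
print(ranked)
```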

Handling missing values

  • .dropna()
  • .fillna()
In [47]:
import pandas as pd
import numpy as np

name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook", np.nan]
age = [19, 21, 20, 19, 21, 17, 30, 36, np.nan, np.nan]
is_male = [True, True, False, True, True, np.nan, False, True, True, np.nan]

straw_hat_dict = {"name": name,
                  "age": age,
                  "is_male": is_male
}

df = pd.DataFrame(straw_hat_dict, columns = ["name", "age", "is_male"])
df
Out[47]:
name age is_male
0 Monkey D. Luffy 19.0 True
1 Roronoa Zoro 21.0 True
2 Nami 20.0 False
3 Usopp 19.0 True
4 Vinsmoke Sanji 21.0 True
5 Tony Tony Chopper 17.0 NaN
6 Nico Robin 30.0 False
7 Franky 36.0 True
8 Brook NaN True
9 NaN NaN NaN
In [48]:
df.dropna(how = 'all')
Out[48]:
name age is_male
0 Monkey D. Luffy 19.0 True
1 Roronoa Zoro 21.0 True
2 Nami 20.0 False
3 Usopp 19.0 True
4 Vinsmoke Sanji 21.0 True
5 Tony Tony Chopper 17.0 NaN
6 Nico Robin 30.0 False
7 Franky 36.0 True
8 Brook NaN True
In [49]:
df.dropna(how = 'any')
Out[49]:
name age is_male
0 Monkey D. Luffy 19.0 True
1 Roronoa Zoro 21.0 True
2 Nami 20.0 False
3 Usopp 19.0 True
4 Vinsmoke Sanji 21.0 True
6 Nico Robin 30.0 False
7 Franky 36.0 True
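
`how = 'all'` and `how = 'any'` look at every column. As a rough sketch, assuming a hypothetical three-row `crew` frame, the `subset` and `thresh` parameters of `.dropna()` allow a more targeted rule:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with different missing-value patterns per row
crew = pd.DataFrame({
    "name": ["Tony Tony Chopper", "Brook", np.nan],
    "age": [17, np.nan, np.nan],
    "is_male": [np.nan, True, np.nan],
})

# Drop a row only when 'age' is missing
has_age = crew.dropna(subset=["age"])

# Keep rows that have at least two non-missing values
at_least_two = crew.dropna(thresh=2)

print(has_age)
print(at_least_two)
```

Here `has_age` keeps only row 0, while `at_least_two` keeps rows 0 and 1.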

Handling missing values (2)

In [50]:
import pandas as pd
import numpy as np

# np.nan marks a missing value (the np.NaN alias was removed in NumPy 2.0)
name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook", np.nan]
age = [19, 21, 20, 19, 21, 17, 30, 36, np.nan, np.nan]
is_male = [True, True, False, True, True, np.nan, False, True, True, np.nan]

straw_hat_dict = {"name": name,
                  "age": age,
                  "is_male": is_male
}

df = pd.DataFrame(straw_hat_dict, columns = ["name", "age", "is_male"])
df = df.dropna(how = "all")
In [51]:
df['is_male'] = df['is_male'].fillna(True)
df['age'] = df['age'].fillna(90)
df
Out[51]:
name age is_male
0 Monkey D. Luffy 19.0 True
1 Roronoa Zoro 21.0 True
2 Nami 20.0 False
3 Usopp 19.0 True
4 Vinsmoke Sanji 21.0 True
5 Tony Tony Chopper 17.0 True
6 Nico Robin 30.0 False
7 Franky 36.0 True
8 Brook 90.0 True
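
The cell above fills each column with a hard-coded value. A possible alternative, sketched here with a hypothetical `crew` frame and an `"Unknown"` placeholder that are assumptions rather than part of the original notebook, is to pass a dict to `.fillna()` and derive the fill value from the data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing name and one missing age
crew = pd.DataFrame({
    "name": ["Franky", "Brook", np.nan],
    "age": [36.0, np.nan, 30.0],
})

# One fill value per column: a placeholder string and the column median
filled = crew.fillna({"name": "Unknown", "age": crew["age"].median()})
print(filled)
```

`.median()` skips missing values by default, so the fill value is computed from the observed ages only.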

Combining DataFrames

  • pandas.concat()
  • Vertical concatenation (stacking rows)
In [52]:
df = pd.DataFrame(
    {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
     "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
     "is_male": [True, True, False, True, True, True, False, True, True]
    }, columns = ['name', 'age', 'is_male']
)

# .loc slices by label and includes both endpoints, so stop at 4 to avoid repeating row 5
upper_df = df.loc[:4, :]
lower_df = df.loc[5:, :]
pd.concat([upper_df, lower_df], axis = 0)
Out[52]:
name age is_male
0 Monkey D. Luffy 19 True
1 Roronoa Zoro 21 True
2 Nami 20 False
3 Usopp 19 True
4 Vinsmoke Sanji 21 True
5 Tony Tony Chopper 17 True
6 Nico Robin 30 False
7 Franky 36 True
8 Brook 90 True
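
`pd.concat()` keeps the original row labels of each piece. As a small sketch with two hypothetical frames, `upper` and `lower`, `ignore_index = True` renumbers the rows when the pieces carry overlapping labels:

```python
import pandas as pd

# Two hypothetical pieces that both start their index at 0
upper = pd.DataFrame({"name": ["Nami", "Usopp"], "age": [20, 19]})
lower = pd.DataFrame({"name": ["Franky", "Brook"], "age": [36, 90]})

# Default: row labels are carried over, so 0 and 1 appear twice
stacked = pd.concat([upper, lower], axis=0)

# ignore_index=True discards the old labels and assigns 0..n-1
renumbered = pd.concat([upper, lower], axis=0, ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```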

Combining DataFrames (2)

  • pandas.concat()
  • Horizontal concatenation (adding columns)
In [53]:
df = pd.DataFrame(
    {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
     "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
     "is_male": [True, True, False, True, True, True, False, True, True]
    }, columns = ['name', 'age', 'is_male']
)

left_df = df.loc[:, "name":"age"]
right_df = df.loc[:, "is_male"]
pd.concat([left_df, right_df], axis = 1)
Out[53]:
name age is_male
0 Monkey D. Luffy 19 True
1 Roronoa Zoro 21 True
2 Nami 20 False
3 Usopp 19 True
4 Vinsmoke Sanji 21 True
5 Tony Tony Chopper 17 True
6 Nico Robin 30 False
7 Franky 36 True
8 Brook 90 True
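
Horizontal concatenation matches rows by index label, not by position. The sketch below, which uses hypothetical `left` and `right` objects with deliberately different indexes, shows how the default outer alignment and `join = "inner"` differ:

```python
import pandas as pd

# Hypothetical pieces whose indexes only partly overlap
left = pd.DataFrame({"name": ["Nami", "Usopp", "Franky"]}, index=[0, 1, 2])
right = pd.Series([19, 36, 90], index=[1, 2, 3], name="age")

# Default (outer): union of the labels, gaps filled with NaN
outer = pd.concat([left, right], axis=1)

# join="inner": only labels present in both pieces
inner = pd.concat([left, right], axis=1, join="inner")

print(outer)
print(inner)
```

In the notebook's own example both pieces come from the same `df`, so the indexes match exactly and no missing values appear.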

Combining DataFrames (3)

  • pd.merge()
  • Inner Join
In [54]:
import pandas as pd

name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"]
age = [19, 21, 20, 19, 21, 17, 30, 36, 90]
straw_hat_dict = {
    "name": name,
    "age": age
}

name = ["Monkey D. Luffy", "Tony Tony Chopper", "Nico Robin", "Brook", "Trafalgar D. Water Law"]
devil_fruit = ["Gum-Gum Fruit", "Human-Human Fruit", "Hana-Hana Fruit", "Revive-Revive Fruit", "Op-Op Fruit"]
devil_fruit_dict = {
    "name": name,
    "devil_fruit": devil_fruit
}

left_df = pd.DataFrame(straw_hat_dict)
right_df = pd.DataFrame(devil_fruit_dict)
inner_joined = pd.merge(left_df, right_df)
inner_joined
Out[54]:
age name devil_fruit
0 19 Monkey D. Luffy Gum-Gum Fruit
1 17 Tony Tony Chopper Human-Human Fruit
2 30 Nico Robin Hana-Hana Fruit
3 90 Brook Revive-Revive Fruit
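
When `on` is omitted, `pd.merge()` joins on every column name the two frames share. As a minimal sketch with smaller hypothetical frames, naming the key explicitly and adding `validate` makes the intent easier to check:

```python
import pandas as pd

# Hypothetical, shortened versions of the two tables
crew = pd.DataFrame({
    "name": ["Monkey D. Luffy", "Nami", "Nico Robin"],
    "age": [19, 20, 30],
})
fruits = pd.DataFrame({
    "name": ["Nico Robin", "Monkey D. Luffy", "Trafalgar D. Water Law"],
    "devil_fruit": ["Hana-Hana Fruit", "Gum-Gum Fruit", "Op-Op Fruit"],
})

# Explicit key; validate raises MergeError if 'name' is duplicated on either side
inner = pd.merge(crew, fruits, on="name", how="inner", validate="one_to_one")
print(inner)
```

Only the names present in both frames survive the inner join.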

Combining DataFrames (4)

  • pd.merge()
  • Left Join
In [55]:
left_joined = pd.merge(left_df, right_df, how = "left")
left_joined
Out[55]:
age name devil_fruit
0 19 Monkey D. Luffy Gum-Gum Fruit
1 21 Roronoa Zoro NaN
2 20 Nami NaN
3 19 Usopp NaN
4 21 Vinsmoke Sanji NaN
5 17 Tony Tony Chopper Human-Human Fruit
6 30 Nico Robin Hana-Hana Fruit
7 36 Franky NaN
8 90 Brook Revive-Revive Fruit

Combining DataFrames (5)

  • pd.merge()
  • Right Join
In [56]:
right_joined = pd.merge(left_df, right_df, how = "right")
right_joined
Out[56]:
age name devil_fruit
0 19.0 Monkey D. Luffy Gum-Gum Fruit
1 17.0 Tony Tony Chopper Human-Human Fruit
2 30.0 Nico Robin Hana-Hana Fruit
3 90.0 Brook Revive-Revive Fruit
4 NaN Trafalgar D. Water Law Op-Op Fruit

Combining DataFrames (6)

  • pd.merge()
  • Full Join
In [57]:
full_joined = pd.merge(left_df, right_df, how = "outer")
full_joined
Out[57]:
age name devil_fruit
0 19.0 Monkey D. Luffy Gum-Gum Fruit
1 21.0 Roronoa Zoro NaN
2 20.0 Nami NaN
3 19.0 Usopp NaN
4 21.0 Vinsmoke Sanji NaN
5 17.0 Tony Tony Chopper Human-Human Fruit
6 30.0 Nico Robin Hana-Hana Fruit
7 36.0 Franky NaN
8 90.0 Brook Revive-Revive Fruit
9 NaN Trafalgar D. Water Law Op-Op Fruit
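
After an outer join it can be hard to tell which side a row came from. One option, sketched here with two small hypothetical frames, is `indicator = True`, which adds a `_merge` column holding `left_only`, `right_only`, or `both`:

```python
import pandas as pd

# Hypothetical, shortened versions of the two tables
crew = pd.DataFrame({"name": ["Monkey D. Luffy", "Roronoa Zoro"], "age": [19, 21]})
fruits = pd.DataFrame({
    "name": ["Monkey D. Luffy", "Trafalgar D. Water Law"],
    "devil_fruit": ["Gum-Gum Fruit", "Op-Op Fruit"],
})

# indicator=True records the origin of each row in a '_merge' column
merged = pd.merge(crew, fruits, on="name", how="outer", indicator=True)

# Map each name to its origin; row order may differ between pandas versions
origin = dict(zip(merged["name"], merged["_merge"].astype(str)))
print(origin)
```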

Plotting with pandas

In [58]:
import pandas as pd
import quandl

df = quandl.get("WIKI/AAPL", start_date="2016-01-01", end_date="2017-12-31")
df.head()
Out[58]:
Open High Low Close Volume Ex-Dividend Split Ratio Adj. Open Adj. High Adj. Low Adj. Close Adj. Volume
Date
2016-01-04 102.61 105.368 102.00 105.35 67649387.0 0.0 1.0 99.136516 101.801154 98.547165 101.783763 67649387.0
2016-01-05 105.75 105.850 102.41 102.71 55790992.0 0.0 1.0 102.170223 102.266838 98.943286 99.233131 55790992.0
2016-01-06 100.56 102.370 99.87 100.70 68457388.0 0.0 1.0 97.155911 98.904640 96.489269 97.291172 68457388.0
2016-01-07 98.68 100.130 96.43 96.45 81094428.0 0.0 1.0 95.339552 96.740467 93.165717 93.185040 81094428.0
2016-01-08 98.55 99.110 96.76 96.96 70798016.0 0.0 1.0 95.213952 95.754996 93.484546 93.677776 70798016.0

Plotting with pandas

  • plot.hist(bins = ) histogram
  • plot.line() line chart
  • plot.scatter(x = , y = ) scatter plot
  • plot.box() box plot
  • plot.density() density plot
In [59]:
import matplotlib.pyplot as plt
import seaborn as sns
In [60]:
df["Volume"].plot.hist(bins = 40)
plt.show()
In [61]:
df[["Open", "High", "Low", "Close"]].plot.line()
plt.show()
In [62]:
df.plot.scatter(x = "Volume", y = "Close")
plt.show()
In [63]:
df[["Open", "High", "Low", "Close"]].plot.box()
plt.show()
In [64]:
df["Close"].plot.density()
plt.show()
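
The cells above depend on data downloaded through `quandl`. As a rough offline sketch, assuming synthetic random-walk prices in a hypothetical `prices` frame (not real AAPL data), the same `.plot` accessors can draw onto shared matplotlib axes:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the price table; values are random, not market data
rng = np.random.default_rng(0)
dates = pd.date_range("2016-01-01", periods=100, freq="D")
prices = pd.DataFrame({
    "Close": 100 + rng.normal(0, 1, size=100).cumsum(),
    "Volume": rng.integers(1_000, 5_000, size=100),
}, index=dates)

# Passing ax= places each pandas plot in its own panel of one figure
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
prices["Close"].plot.line(ax=axes[0], title="Close")
prices["Volume"].plot.hist(bins=20, ax=axes[1], title="Volume")
fig.tight_layout()
fig.savefig("pandas_plot_sketch.png")  # hypothetical output file name
```

In a notebook, `plt.show()` can replace the `savefig` call to display the figure inline.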