pandas 專題:能源指標

郭耀仁

來源

CourseraIntroduction to Data Science in Python 課程第三週作業

問題一

  • 讀入 excel 試算表:https://storage.googleapis.com/py_ml_datasets/Energy%20Indicators.xls 並儲存成 DataFrame,命名為 energy
    • 擷取這四個變數並重新命名:['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']
    • Energy Supply 單位改為 gigajoules(1 petajoule = 1,000,000 gigajoules)
    • ... 的資料記錄為遺漏值 np.NaN
    • 重新命名國家名稱:
      • "Republic of Korea": "South Korea"
      • "United States of America": "United States"
      • "United Kingdom of Great Britain and Northern Ireland": "United Kingdom"
      • "China, Hong Kong Special Administrative Region": "Hong Kong"
      • 移除小括號或數字,像是 'Bolivia (Plurinational State of)': 'Bolivia'、'Switzerland17': 'Switzerland'

問題一(cont'd)

問題一(cont'd)

問題一(cont'd)

  • 利用國家名稱將上述三個 DataFrame (GDPEnergyScimEn)聯結
  • 只需要 2006-2015 年的 GDP 資料以及 ScimEn 中排名前 15 的國家
  • 資料框的索引值為國家名稱
  • 資料框的變數名稱為:['Rank', 'Documents', 'Citable documents', 'Citations', 'Self-citations', 'Citations per document', 'H index', 'Energy Supply', 'Energy Supply per Capita', '% Renewable', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015']
def answer_one():
    '''
    這個函數應該回傳一個外觀為 15x20 的資料框
    '''
    return "答案"

問題二

在聯結三個資料框的過程中總共失去多少個觀測值?

def answer_two():
    '''
    這個函數應該回傳一個數字
    '''
    return "答案"
In [1]:
%%HTML
<svg width="800" height="300">
  <circle cx="150" cy="180" r="80" fill-opacity="0.2" stroke="black" stroke-width="2" fill="blue" />
  <circle cx="200" cy="100" r="80" fill-opacity="0.2" stroke="black" stroke-width="2" fill="red" />
  <circle cx="100" cy="100" r="80" fill-opacity="0.2" stroke="black" stroke-width="2" fill="green" />
  <line x1="150" y1="125" x2="300" y2="150" stroke="black" stroke-width="2" fill="black" stroke-dasharray="5,3"/>
  <text  x="300" y="165" font-family="Verdana" font-size="35">除了這裡以外的所有觀測值!</text>
</svg>
除了這裡以外的所有觀測值!

以下問題都以 answer_one() 回傳的 15 個國家資料回答

In [2]:
# 現成的 answer_one()

def answer_one():
    import pandas as pd

    Top15 = pd.read_csv('https://storage.googleapis.com/py_ml_datasets/energy_indicator.csv', index_col = 0)
    return Top15
In [3]:
answer_one().head()
Out[3]:
Rank Documents Citable documents Citations Self-citations Citations per document H index Energy Supply Energy Supply per Capita % Renewable 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
Country
China 1 127050 126767 597237 411683 4.70 138 1.271910e+11 93.0 19.75491 3.992331e+12 4.559041e+12 4.997775e+12 5.459247e+12 6.039659e+12 6.612490e+12 7.124978e+12 7.672448e+12 8.230121e+12 8.797999e+12
United States 2 96661 94747 792274 265436 8.20 230 9.083800e+10 286.0 11.57098 1.479230e+13 1.505540e+13 1.501149e+13 1.459484e+13 1.496437e+13 1.520402e+13 1.554216e+13 1.577367e+13 1.615662e+13 1.654857e+13
Japan 3 30504 30287 223024 61554 7.31 134 1.898400e+10 149.0 10.23282 5.496542e+12 5.617036e+12 5.558527e+12 5.251308e+12 5.498718e+12 5.473738e+12 5.569102e+12 5.644659e+12 5.642884e+12 5.669563e+12
United Kingdom 4 20944 20357 206091 37874 9.84 139 7.920000e+09 124.0 10.60047 2.419631e+12 2.482203e+12 2.470614e+12 2.367048e+12 2.403504e+12 2.450911e+12 2.479809e+12 2.533370e+12 2.605643e+12 2.666333e+12
Russian Federation 5 18534 18301 34266 12422 1.85 57 3.070900e+10 214.0 17.28868 1.385793e+12 1.504071e+12 1.583004e+12 1.459199e+12 1.524917e+12 1.589943e+12 1.645876e+12 1.666934e+12 1.678709e+12 1.616149e+12

問題三

2006 至 2015 年各國的平均 GDP 為何?

def answer_three():
    '''
    這個函數應該回傳一個有 15 個國家的 Series 取名為 avgGDP
    並以平均 GDP 遞減排序
    '''
    Top15 = answer_one()
    return "答案"

問題四

接續問題三,平均 GDP 排名第六的國家在 2015 年的 GDP 與 2006 年的 GDP 相差多少?

def answer_four():
    '''
    這個函數應該回傳一個數字
    '''
    Top15 = answer_one()
    avgGDP = answer_three()
    return "答案"

問題五

平均 Energy Supply per Capita 為何?

def answer_five():
    '''
    這個函數應該回傳一個數字
    '''
    Top15 = answer_one()
    return "答案"

問題六

哪個國家的 % Renewable 最高?比例為何?

def answer_six():
    '''
    這個函數應該回傳一個 tuple 其中包含國家名稱與比例
    '''
    Top15 = answer_one()
    return "答案"

問題七

建立一個新變數計算 Self-Citations 除以 Total Citations 的比例,哪個國家最高?該比例為何?

def answer_seven():
    '''
    這個函數應該回傳一個 tuple 其中包含國家名稱與比率
    '''
    Top15 = answer_one()
    return "答案"

問題八

建立一個新變數利用 Energy SupplyEnergy Supply per capita 來估算國家人口數,並回傳人口數第三多的國家。

def answer_eight():
    '''
    這個函數應該回傳一個文字:國家名稱
    '''
    Top15 = answer_one()
    return "答案"

問題九

建立一個新變數估算每人引用期刊文章數 citable documents per capita,並計算該變數與 energy supply per capita 的相關係數(使用 .corr() 方法。)

def answer_nine():
    '''
    這個函數應該回傳一個數值
    '''
    Top15 = answer_one()
    return "答案"

問題十

建立一個新變數,假如國家的 % Renewable 大於等於中位數則為 1,未滿中位數則為 0,將數值為 1 的觀測值選出。

def answer_ten():
    '''
    這個函數應該回傳一個 Series 稱為 HighRenew,索引值為國家名稱,並依照 `% Renewable` 排序
    '''
    Top15 = answer_one()
    return "答案"

問題十一

運用以下的 dict 將國家依照洲別分組,接著建立新的 DataFrame 計算各洲的國家數、人口總數、人口平均數與人口標準差。

def answer_eleven():
    '''
    這個函數應該回傳一個 DataFrame
    索引值為 ['Asia', 'Australia', 'Europe', 'North America', 'South America']
    變數名稱為 ['size', 'sum', 'mean', 'std']
    '''
    Top15 = answer_one()
    return "答案"
In [4]:
ContinentDict  = {'China':'Asia', 
                  'United States':'North America', 
                  'Japan':'Asia', 
                  'United Kingdom':'Europe', 
                  'Russian Federation':'Europe', 
                  'Canada':'North America', 
                  'Germany':'Europe', 
                  'India':'Asia',
                  'France':'Europe', 
                  'South Korea':'Asia', 
                  'Italy':'Europe', 
                  'Spain':'Europe', 
                  'Iran':'Asia',
                  'Australia':'Australia', 
                  'Brazil':'South America'}

問題十二

% Renewable 均勻分為五組(.cut() 方法),同時利用這個組別與洲別建立一個 multi-index 的資料框,計算各組中有多少個國家,不需列出無國家的組別。

def answer_twelve():
    '''
    這個函數應該回傳一個 Series 包含兩組索引值:洲別與再生能源比率
    '''
    Top15 = answer_one()
    return "答案"

問題十三

將人口估算變數改為字串,並且利用逗號分隔千分位數。

e.g. 317615384.61538464 -> 317,615,384.61538464

def get_comma_format(x):
    '''
    將數字轉換為千分位格式的文字
    '''
    return '{:,}'.format(x)

def answer_thirteen():
    '''
    這個函數應該回傳一個 Series PopEst,索引值為國家名稱
    '''
    Top15 = answer_one()
    return "答案"

參考解答

In [3]:
def answer_one():
    import pandas as pd
    
    # get energy
    energy = pd.read_excel('https://storage.googleapis.com/py_ml_datasets/Energy%20Indicators.xls', header = None, skip_footer = 38, skiprows = 18, na_values = '...')
    energy = energy.iloc[:, range(2, 6)]
    energy.columns = ['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']
    energy['Energy Supply'] = energy['Energy Supply'] * 1000000
    sk_idx = energy[energy['Country'] == 'Republic of Korea'].index
    us_idx = energy[energy['Country'] == 'United States of America20'].index
    uk_idx = energy[energy['Country'] == 'United Kingdom of Great Britain and Northern Ireland19'].index
    hk_idx = energy[energy['Country'] == 'China, Hong Kong Special Administrative Region3'].index
    energy.at[sk_idx, 'Country'] = 'South Korea'
    energy.at[us_idx, 'Country'] = 'United States'
    energy.at[uk_idx, 'Country'] = 'United Kingdom'
    energy.at[hk_idx, 'Country'] = 'Hong Kong'
    energy['Country'] = energy['Country'].str.replace('[0-9]+$', '')
    energy['Country'] = energy['Country'].str.replace('\s\(.+\)$', '')
    energy = energy.set_index('Country') # energy done!
    
    # get GDP
    GDP = pd.read_csv('https://storage.googleapis.com/py_ml_datasets/world_bank.csv', skiprows = 4)
    sk_idx = GDP[GDP['Country Name'] == 'Korea, Rep.'].index
    ir_idx = GDP[GDP['Country Name'] == 'Iran, Islamic Rep.'].index
    hk_idx = GDP[GDP['Country Name'] == 'Hong Kong SAR, China'].index
    GDP.at[sk_idx, 'Country Name'] = 'South Korea'
    GDP.at[ir_idx, 'Country Name'] = 'Iran'
    GDP.at[hk_idx, 'Country Name'] = 'Hong Kong'
    GDP = GDP.set_index('Country Name') # GDP done!
    
    # get ScimEn
    ScimEn = pd.read_excel('https://storage.googleapis.com/py_ml_datasets/scimagojr-3.xlsx') # ScimEn done!
    
    # get merged_df
    ScimEn_top_15 = ScimEn.iloc[:15, :]
    ScimEn_top_15 = ScimEn_top_15.set_index('Country')
    GDP_2006_2015 = GDP.loc[:, '2006':'2015']
    merged_df = ScimEn_top_15.join(energy)
    merged_df = merged_df.join(GDP_2006_2015)
    return merged_df
In [6]:
answer_one()
Out[6]:
Rank Documents Citable documents Citations Self-citations Citations per document H index Energy Supply Energy Supply per Capita % Renewable 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
Country
China 1 127050 126767 597237 411683 4.70 138 1.271910e+11 93.0 19.754910 3.992331e+12 4.559041e+12 4.997775e+12 5.459247e+12 6.039659e+12 6.612490e+12 7.124978e+12 7.672448e+12 8.230121e+12 8.797999e+12
United States 2 96661 94747 792274 265436 8.20 230 9.083800e+10 286.0 11.570980 1.479230e+13 1.505540e+13 1.501149e+13 1.459484e+13 1.496437e+13 1.520402e+13 1.554216e+13 1.577367e+13 1.615662e+13 1.654857e+13
Japan 3 30504 30287 223024 61554 7.31 134 1.898400e+10 149.0 10.232820 5.496542e+12 5.617036e+12 5.558527e+12 5.251308e+12 5.498718e+12 5.473738e+12 5.569102e+12 5.644659e+12 5.642884e+12 5.669563e+12
United Kingdom 4 20944 20357 206091 37874 9.84 139 7.920000e+09 124.0 10.600470 2.419631e+12 2.482203e+12 2.470614e+12 2.367048e+12 2.403504e+12 2.450911e+12 2.479809e+12 2.533370e+12 2.605643e+12 2.666333e+12
Russian Federation 5 18534 18301 34266 12422 1.85 57 3.070900e+10 214.0 17.288680 1.385793e+12 1.504071e+12 1.583004e+12 1.459199e+12 1.524917e+12 1.589943e+12 1.645876e+12 1.666934e+12 1.678709e+12 1.616149e+12
Canada 6 17899 17620 215003 40930 12.01 149 1.043100e+10 296.0 61.945430 1.564469e+12 1.596740e+12 1.612713e+12 1.565145e+12 1.613406e+12 1.664087e+12 1.693133e+12 1.730688e+12 1.773486e+12 1.792609e+12
Germany 7 17027 16831 140566 27426 8.26 126 1.326100e+10 165.0 17.901530 3.332891e+12 3.441561e+12 3.478809e+12 3.283340e+12 3.417298e+12 3.542371e+12 3.556724e+12 3.567317e+12 3.624386e+12 3.685556e+12
India 8 15005 14841 128763 37209 8.58 115 3.319500e+10 26.0 14.969080 1.265894e+12 1.374865e+12 1.428361e+12 1.549483e+12 1.708459e+12 1.821872e+12 1.924235e+12 2.051982e+12 2.200617e+12 2.367206e+12
France 9 13153 12973 130632 28601 9.93 114 1.059700e+10 166.0 17.020280 2.607840e+12 2.669424e+12 2.674637e+12 2.595967e+12 2.646995e+12 2.702032e+12 2.706968e+12 2.722567e+12 2.729632e+12 2.761185e+12
South Korea 10 11983 11923 114675 22595 9.57 104 1.100700e+10 221.0 2.279353 9.410199e+11 9.924316e+11 1.020510e+12 1.027730e+12 1.094499e+12 1.134796e+12 1.160809e+12 1.194429e+12 1.234340e+12 1.266580e+12
Italy 11 10964 10794 111850 26661 10.20 106 6.530000e+09 109.0 33.667230 2.202170e+12 2.234627e+12 2.211154e+12 2.089938e+12 2.125185e+12 2.137439e+12 2.077184e+12 2.040871e+12 2.033868e+12 2.049316e+12
Spain 12 9428 9330 123336 23964 13.08 115 4.923000e+09 106.0 37.968590 1.414823e+12 1.468146e+12 1.484530e+12 1.431475e+12 1.431673e+12 1.417355e+12 1.380216e+12 1.357139e+12 1.375605e+12 1.419821e+12
Iran 13 8896 8819 57470 19125 6.46 72 9.172000e+09 119.0 5.707721 3.895523e+11 4.250646e+11 4.289909e+11 4.389208e+11 4.677902e+11 4.853309e+11 4.532569e+11 4.445926e+11 4.639027e+11 NaN
Australia 14 8831 8725 90765 15606 10.28 107 5.386000e+09 231.0 11.810810 1.021939e+12 1.060340e+12 1.099644e+12 1.119654e+12 1.142251e+12 1.169431e+12 1.211913e+12 1.241484e+12 1.272520e+12 1.301251e+12
Brazil 15 8668 8596 60702 14396 7.00 86 1.214900e+10 59.0 69.648030 1.845080e+12 1.957118e+12 2.056809e+12 2.054215e+12 2.208872e+12 2.295245e+12 2.339209e+12 2.409740e+12 2.412231e+12 2.319423e+12
In [4]:
def answer_two():
    import pandas as pd
    
    # get energy
    energy = pd.read_excel('https://storage.googleapis.com/py_ml_datasets/Energy%20Indicators.xls', header = None, skip_footer = 38, skiprows = 18, na_values = '...')
    energy = energy.iloc[:, range(2, 6)]
    energy.columns = ['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']
    energy['Energy Supply'] = energy['Energy Supply'] * 1000000
    sk_idx = energy[energy['Country'] == 'Republic of Korea'].index
    us_idx = energy[energy['Country'] == 'United States of America20'].index
    uk_idx = energy[energy['Country'] == 'United Kingdom of Great Britain and Northern Ireland19'].index
    hk_idx = energy[energy['Country'] == 'China, Hong Kong Special Administrative Region3'].index
    energy.at[sk_idx, 'Country'] = 'South Korea'
    energy.at[us_idx, 'Country'] = 'United States'
    energy.at[uk_idx, 'Country'] = 'United Kingdom'
    energy.at[hk_idx, 'Country'] = 'Hong Kong'
    energy['Country'] = energy['Country'].str.replace('[0-9]+$', '')
    energy['Country'] = energy['Country'].str.replace('\s\(.+\)$', '')
    energy = energy.set_index('Country') # energy done!
    
    # get GDP
    GDP = pd.read_csv('https://storage.googleapis.com/py_ml_datasets/world_bank.csv', skiprows = 4)
    sk_idx = GDP[GDP['Country Name'] == 'Korea, Rep.'].index
    ir_idx = GDP[GDP['Country Name'] == 'Iran, Islamic Rep.'].index
    hk_idx = GDP[GDP['Country Name'] == 'Hong Kong SAR, China'].index
    GDP.at[sk_idx, 'Country Name'] = 'South Korea'
    GDP.at[ir_idx, 'Country Name'] = 'Iran'
    GDP.at[hk_idx, 'Country Name'] = 'Hong Kong'
    GDP = GDP.set_index('Country Name') # GDP done!
    
    # get ScimEn
    ScimEn = pd.read_excel('https://storage.googleapis.com/py_ml_datasets/scimagojr-3.xlsx') # ScimEn done!
    ScimEn = ScimEn.set_index('Country')
    
    # get outer joined DataFrames
    GDP_2006_2015 = GDP.loc[:, '2006':'2015']
    outer_joined_df = ScimEn.join(energy, how = 'outer')
    outer_joined_df = outer_joined_df.join(GDP_2006_2015, how = 'outer')
    nrow_outer = outer_joined_df.shape[0]
    
    # get inner joined DataFrames
    inner_joined_df = answer_one()
    nrow_inner = inner_joined_df.shape[0]
    return nrow_outer - nrow_inner
In [5]:
answer_two()
Out[5]:
303
In [9]:
def answer_three():
    import pandas as pd
    
    Top15 = answer_one()
    gdp_cols = list(Top15.columns)[10:]
    Top15_gdp = Top15.loc[:, gdp_cols]
    Top15_gdp = Top15_gdp.reset_index()
    melted_df = pd.melt(Top15_gdp, id_vars = ['Country'], var_name = ['Year'], value_name = 'GDP')
    grouped = melted_df.groupby(['Country'])
    avgGDP = grouped.mean().GDP
    avgGDP = avgGDP.sort_values(ascending = False)
    return avgGDP
In [10]:
answer_three()
Out[10]:
Country
United States         1.536434e+13
China                 6.348609e+12
Japan                 5.542208e+12
Germany               3.493025e+12
France                2.681725e+12
United Kingdom        2.487907e+12
Brazil                2.189794e+12
Italy                 2.120175e+12
India                 1.769297e+12
Canada                1.660647e+12
Russian Federation    1.565459e+12
Spain                 1.418078e+12
Australia             1.164043e+12
South Korea           1.106715e+12
Iran                  4.441558e+11
Name: GDP, dtype: float64
In [11]:
def answer_four():
    import pandas as pd
    
    Top15 = answer_one()
    avgGDP = answer_three()
    rank_6th_country = avgGDP.index[5]
    rank_6th = Top15.loc[rank_6th_country, :]
    rank_6th_2006 = rank_6th['2006']
    rank_6th_2015 = rank_6th['2015']
    change = rank_6th_2015 - rank_6th_2006
    return change
In [12]:
answer_four()
Out[12]:
246702696075.3999
In [13]:
def answer_five():
    import pandas as pd
    
    Top15 = answer_one()
    ans = Top15['Energy Supply per Capita'].mean()
    return ans
In [14]:
answer_five()
Out[14]:
157.59999999999999
In [15]:
def answer_six():
    import pandas as pd
    
    Top15 = answer_one()
    max_country = Top15['% Renewable'].idxmax()
    max_percentage = Top15['% Renewable'].max()
    return max_country, max_percentage
In [16]:
answer_six()
Out[16]:
('Brazil', 69.648030000000006)
In [17]:
def answer_seven():
    import pandas as pd
    
    Top15 = answer_one()
    Top15['Self-citation-ratio'] = Top15['Self-citations'] / Top15['Citations']
    max_self_citation_country = Top15['Self-citation-ratio'].idxmax()
    max_self_citation_ratio = Top15['Self-citation-ratio'].max()
    return max_self_citation_country, max_self_citation_ratio
In [18]:
answer_seven()
Out[18]:
('China', 0.68931261793894216)
In [19]:
def answer_eight():
    import pandas as pd
    
    Top15 = answer_one()
    Top15['Pop'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita']
    pop_series = Top15['Pop'].sort_values(ascending = False)
    rank_3rd = pop_series.index[2]
    return rank_3rd
In [20]:
answer_eight()
Out[20]:
'United States'
In [21]:
def answer_nine():
    import pandas as pd
    
    Top15 = answer_one()
    Top15['Pop'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita']
    Top15['Citable docs per Capita'] = Top15['Citable documents'] / Top15['Pop']
    ans = Top15['Citable docs per Capita'].corr(Top15['Energy Supply per Capita'])
    return ans
In [22]:
answer_nine()
Out[22]:
0.79400104354429424
In [23]:
def plot9():
    import matplotlib as plt
    %matplotlib inline
    
    Top15 = answer_one()
    Top15['PopEst'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita']
    Top15['Citable docs per Capita'] = Top15['Citable documents'] / Top15['PopEst']
    Top15.plot(x='Citable docs per Capita', y='Energy Supply per Capita', kind='scatter', xlim=[0, 0.0006])
In [24]:
plot9()
In [25]:
def answer_ten():
    import numpy as np
    import pandas as pd
    
    Top15 = answer_one()
    renew_median = Top15['% Renewable'].median()
    Top15['HighRenewFlag'] = np.where(Top15['% Renewable'] >= renew_median, 1, 0)
    HighRenew = Top15[['HighRenewFlag', '% Renewable']][Top15['HighRenewFlag'] == 1]
    HighRenew = HighRenew.sort_values('% Renewable', ascending = False)['% Renewable']
    return HighRenew
In [26]:
answer_ten()
Out[26]:
Country
Brazil                69.64803
Canada                61.94543
Spain                 37.96859
Italy                 33.66723
China                 19.75491
Germany               17.90153
Russian Federation    17.28868
France                17.02028
Name: % Renewable, dtype: float64
In [27]:
def answer_eleven():
    import pandas as pd
    
    ContinentDict = {'China':'Asia', 
                     'United States':'North America', 
                     'Japan':'Asia', 
                     'United Kingdom':'Europe', 
                     'Russian Federation':'Europe', 
                     'Canada':'North America', 
                     'Germany':'Europe', 
                     'India':'Asia',
                     'France':'Europe', 
                     'South Korea':'Asia', 
                     'Italy':'Europe', 
                     'Spain':'Europe', 
                     'Iran':'Asia',
                     'Australia':'Australia', 
                     'Brazil':'South America'}
    Top15 = answer_one()
    Continent = pd.Series(Top15.index).map(ContinentDict).values
    Top15['Continent'] = Continent
    Top15['Pop'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita']
    continent_df = Top15[['Continent', 'Pop']].reset_index()
    grouped = continent_df.groupby('Continent')
    size = grouped.count()['Country'].reset_index(name = "size")
    summ = grouped.sum()['Pop'].reset_index(name = "sum")
    mean = grouped.mean()['Pop'].reset_index(name = "mean")
    std = grouped.std()['Pop'].reset_index(name = "std")
    ans_df = pd.merge(size, summ)
    ans_df = pd.merge(ans_df, mean)
    ans_df = pd.merge(ans_df, std)
    ans_df = ans_df.set_index('Continent')
    return ans_df
In [28]:
answer_eleven()
Out[28]:
size sum mean std
Continent
Asia 5 2.898666e+09 5.797333e+08 6.790979e+08
Australia 1 2.331602e+07 2.331602e+07 NaN
Europe 6 4.579297e+08 7.632161e+07 3.464767e+07
North America 2 3.528552e+08 1.764276e+08 1.996696e+08
South America 1 2.059153e+08 2.059153e+08 NaN
In [29]:
def answer_twelve():
    import pandas as pd
    
    ContinentDict = {'China':'Asia', 
                     'United States':'North America', 
                     'Japan':'Asia', 
                     'United Kingdom':'Europe', 
                     'Russian Federation':'Europe', 
                     'Canada':'North America', 
                     'Germany':'Europe', 
                     'India':'Asia',
                     'France':'Europe', 
                     'South Korea':'Asia', 
                     'Italy':'Europe', 
                     'Spain':'Europe', 
                     'Iran':'Asia',
                     'Australia':'Australia', 
                     'Brazil':'South America'}
    Top15 = answer_one()
    Continent = pd.Series(Top15.index).map(ContinentDict).values
    Top15['Continent'] = Continent
    bins = pd.cut(Top15['% Renewable'], bins = 5, labels = ['Lowest', 'Lower', 'Medium', 'Higher', 'Highest'])
    Top15['Renewable Level'] = bins
    renewable_df = Top15[['Continent', 'Renewable Level']].reset_index()
    grouped = renewable_df.groupby(['Continent', 'Renewable Level'])
    ans_ser = grouped['Country'].count()
    return ans_ser
In [30]:
answer_twelve()
Out[30]:
Continent      Renewable Level
Asia           Lowest             4
               Lower              1
Australia      Lowest             1
Europe         Lowest             1
               Lower              3
               Medium             2
North America  Lowest             1
               Highest            1
South America  Highest            1
Name: Country, dtype: int64
In [31]:
def get_comma_format(x):
    '''
    將數字轉換為千分位格式的文字
    '''
    return '{:,}'.format(x)

def answer_thirteen():
    import pandas as pd
    
    Top15 = answer_one()
    Top15['PopEst'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita']
    PopEst = Top15['PopEst'].apply(get_comma_format)
    return PopEst
In [32]:
answer_thirteen()
Out[32]:
Country
China                 1,367,645,161.2903225
United States          317,615,384.61538464
Japan                  127,409,395.97315437
United Kingdom         63,870,967.741935484
Russian Federation            143,500,000.0
Canada                  35,239,864.86486486
Germany                 80,369,696.96969697
India                 1,276,730,769.2307692
France                  63,837,349.39759036
South Korea            49,805,429.864253394
Italy                  59,908,256.880733944
Spain                    46,443,396.2264151
Iran                    77,075,630.25210084
Australia              23,316,017.316017315
Brazil                 205,915,254.23728815
Name: PopEst, dtype: object
In [33]:
def plot_optional():
    import matplotlib as plt
    %matplotlib inline
    Top15 = answer_one()
    ax = Top15.plot(x='Rank', y='% Renewable', kind='scatter', 
                    c=['#e41a1c','#377eb8','#e41a1c','#4daf4a','#4daf4a','#377eb8','#4daf4a','#e41a1c',
                       '#4daf4a','#e41a1c','#4daf4a','#4daf4a','#e41a1c','#dede00','#ff7f00'], 
                    xticks=range(1,16), s=6*Top15['2014']/10**10, alpha=.75, figsize=[16,6]);

    for i, txt in enumerate(Top15.index):
        ax.annotate(txt, [Top15['Rank'][i], Top15['% Renewable'][i]], ha='center')

    print("This is an example of a visualization that can be created to help understand the data. \
This is a bubble chart showing % Renewable vs. Rank. The size of the bubble corresponds to the countries' \
2014 GDP, and the color corresponds to the continent.")
In [34]:
plot_optional()
This is an example of a visualization that can be created to help understand the data. This is a bubble chart showing % Renewable vs. Rank. The size of the bubble corresponds to the countries' 2014 GDP, and the color corresponds to the continent.