xlsファイルやcsvファイルを読み込むと、読み込んだ数値データ型が何故だかobjectになっている時が多々ある。objectのままだと数値データを使ってグラフを作成できないので、データ型をobjectからintやfloatに変換する必要がある。
import pandas as pd
a=list(range(31,54))
a.insert(0, 3)
df = pd.read_excel("myexcel.xls",sep='\t',thousands=',',usecols=a,na_values=['n/a', '--'])
df.head()
Country | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | … | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Afghanistan | 182.842 | 198.736 | 200.069 | 223.737 | 235.731 | 267.177 | 277.498 | 334.621 | 362.857 | … | 482.629 | 493.073 | 506.215 | 517.858 | 533.394 | 552.063 | 574.127 | 599.933 | 629.88 | 664.452 |
1 | Afghanistan | NaN | 8.692 | 0.671 | 11.83 | 5.361 | 13.34 | 3.863 | 20.585 | 8.438 | … | 0.988 | 2.164 | 2.665 | 2.3 | 3 | 3.5 | 3.997 | 4.495 | 4.992 | 5.489 |
2 | Afghanistan | 178.756 | 220.013 | 246.21 | 304.926 | 345.817 | 427.495 | 517.509 | 607.227 | 711.759 | … | “1,226.57” | “1,318.48” | “1,377.54” | “1,418.13” | “1,488.86” | “1,595.05” | “1,733.54” | “1,902.11” | “2,096.95” | “2,322.65” |
3 | Afghanistan | 4.367 | 4.553 | 5.146 | 6.167 | 6.925 | 8.556 | 10.297 | 12.066 | 15.325 | … | 20.057 | 19.428 | 20.235 | 19.585 | 19.99 | 20.682 | 21.928 | 23.577 | 25.45 | 27.608 |
4 | Afghanistan | 18.835 | 20.852 | 21.557 | 24.858 | 26.983 | 31.404 | 33.252 | 40.403 | 44.323 | … | 64.276 | 66.384 | 69.449 | 72.648 | 76.158 | 80.47 | 85.426 | 91.113 | 97.643 | 105.158 |
5 rows × 24 columns
df.columns = list(map(str, df.columns))
df.set_index('Country', inplace=True)
df.dtypes
2002 object 2003 object 2004 object 2005 object 2006 object 2007 object 2008 object 2009 object 2010 object 2011 object 2012 object 2013 object 2014 object 2015 object 2016 object 2017 object 2018 object 2019 object 2020 object 2021 object 2022 object 2023 object 2024 object dtype: object
数値データが何故だかobjectになっているので、これを以下のようにしてfloat型に変えてやる必要がある。
df.apply(pd.to_numeric)
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) pandas/_libs/lib.pyx in pandas._libs.lib.maybe_convert_numeric() ValueError: Unable to parse string ""8,235.10"" During handling of the above exception, another exception occurred: ValueError Traceback (most recent call last) <ipython-input-5-6def46ca8df2> in <module> ----> 1 df.apply(pd.to_numeric) ~/.pyenv/versions/miniconda3-latest/envs/py368/lib/python3.6/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, result_type, args, **kwds) 6485 args=args, 6486 kwds=kwds) -> 6487 return op.get_result() 6488 6489 def applymap(self, func): ~/.pyenv/versions/miniconda3-latest/envs/py368/lib/python3.6/site-packages/pandas/core/apply.py in get_result(self) 149 return self.apply_raw() 150 --> 151 return self.apply_standard() 152 153 def apply_empty_result(self): ~/.pyenv/versions/miniconda3-latest/envs/py368/lib/python3.6/site-packages/pandas/core/apply.py in apply_standard(self) 255 256 # compute the result using the series generator --> 257 self.apply_series_generator() 258 259 # wrap results ~/.pyenv/versions/miniconda3-latest/envs/py368/lib/python3.6/site-packages/pandas/core/apply.py in apply_series_generator(self) 284 try: 285 for i, v in enumerate(series_gen): --> 286 results[i] = self.f(v) 287 keys.append(v.name) 288 except Exception as e: ~/.pyenv/versions/miniconda3-latest/envs/py368/lib/python3.6/site-packages/pandas/core/tools/numeric.py in to_numeric(arg, errors, downcast) 133 coerce_numeric = False if errors in ('ignore', 'raise') else True 134 values = lib.maybe_convert_numeric(values, set(), --> 135 coerce_numeric=coerce_numeric) 136 137 except Exception: pandas/_libs/lib.pyx in pandas._libs.lib.maybe_convert_numeric() ValueError: ('Unable to parse string ""8,235.10"" at position 6', 'occurred at index 2002')
恐らく、quotation mark””が邪魔をしているんだと思われるので、こいつを取り除いていやる。
for i, col in enumerate(df.columns):
df.iloc[:, i] = df.iloc[:, i].str.replace('"', '')
df.head()
2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | … | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Country | |||||||||||||||||||||
Afghanistan | 182.842 | 198.736 | 200.069 | 223.737 | 235.731 | 267.177 | 277.498 | 334.621 | 362.857 | 386.368 | … | 482.629 | 493.073 | 506.215 | 517.858 | 533.394 | 552.063 | 574.127 | 599.933 | 629.88 | 664.452 |
Afghanistan | NaN | 8.692 | 0.671 | 11.83 | 5.361 | 13.34 | 3.863 | 20.585 | 8.438 | 6.479 | … | 0.988 | 2.164 | 2.665 | 2.3 | 3 | 3.5 | 3.997 | 4.495 | 4.992 | 5.489 |
Afghanistan | 178.756 | 220.013 | 246.21 | 304.926 | 345.817 | 427.495 | 517.509 | 607.227 | 711.759 | 836.222 | … | 1,226.57 | 1,318.48 | 1,377.54 | 1,418.13 | 1,488.86 | 1,595.05 | 1,733.54 | 1,902.11 | 2,096.95 | 2,322.65 |
Afghanistan | 4.367 | 4.553 | 5.146 | 6.167 | 6.925 | 8.556 | 10.297 | 12.066 | 15.325 | 17.89 | … | 20.057 | 19.428 | 20.235 | 19.585 | 19.99 | 20.682 | 21.928 | 23.577 | 25.45 | 27.608 |
Afghanistan | 18.835 | 20.852 | 21.557 | 24.858 | 26.983 | 31.404 | 33.252 | 40.403 | 44.323 | 48.18 | … | 64.276 | 66.384 | 69.449 | 72.648 | 76.158 | 80.47 | 85.426 | 91.113 | 97.643 | 105.158 |
5 rows × 23 columns
df.apply(pd.to_numeric)
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) pandas/_libs/lib.pyx in pandas._libs.lib.maybe_convert_numeric() ValueError: Unable to parse string "8,235.10" During handling of the above exception, another exception occurred: ValueError Traceback (most recent call last) <ipython-input-7-6def46ca8df2> in <module> ----> 1 df.apply(pd.to_numeric) ~/.pyenv/versions/miniconda3-latest/envs/py368/lib/python3.6/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, result_type, args, **kwds) 6485 args=args, 6486 kwds=kwds) -> 6487 return op.get_result() 6488 6489 def applymap(self, func): ~/.pyenv/versions/miniconda3-latest/envs/py368/lib/python3.6/site-packages/pandas/core/apply.py in get_result(self) 149 return self.apply_raw() 150 --> 151 return self.apply_standard() 152 153 def apply_empty_result(self): ~/.pyenv/versions/miniconda3-latest/envs/py368/lib/python3.6/site-packages/pandas/core/apply.py in apply_standard(self) 255 256 # compute the result using the series generator --> 257 self.apply_series_generator() 258 259 # wrap results ~/.pyenv/versions/miniconda3-latest/envs/py368/lib/python3.6/site-packages/pandas/core/apply.py in apply_series_generator(self) 284 try: 285 for i, v in enumerate(series_gen): --> 286 results[i] = self.f(v) 287 keys.append(v.name) 288 except Exception as e: ~/.pyenv/versions/miniconda3-latest/envs/py368/lib/python3.6/site-packages/pandas/core/tools/numeric.py in to_numeric(arg, errors, downcast) 133 coerce_numeric = False if errors in ('ignore', 'raise') else True 134 values = lib.maybe_convert_numeric(values, set(), --> 135 coerce_numeric=coerce_numeric) 136 137 except Exception: pandas/_libs/lib.pyx in pandas._libs.lib.maybe_convert_numeric() ValueError: ('Unable to parse string "8,235.10" at position 6', 'occurred at index 2002')
たぶん、comma,が邪魔をしていると思われるのでこいつを取り除いてやる。
for i, col in enumerate(df.columns):
df.iloc[:, i] = df.iloc[:, i].str.replace(',', '')
df.head()
2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | … | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Country | |||||||||||||||||||||
Afghanistan | 182.842 | 198.736 | 200.069 | 223.737 | 235.731 | 267.177 | 277.498 | 334.621 | 362.857 | 386.368 | … | 482.629 | 493.073 | 506.215 | 517.858 | 533.394 | 552.063 | 574.127 | 599.933 | 629.88 | 664.452 |
Afghanistan | NaN | 8.692 | 0.671 | 11.83 | 5.361 | 13.34 | 3.863 | 20.585 | 8.438 | 6.479 | … | 0.988 | 2.164 | 2.665 | 2.3 | 3 | 3.5 | 3.997 | 4.495 | 4.992 | 5.489 |
Afghanistan | 178.756 | 220.013 | 246.21 | 304.926 | 345.817 | 427.495 | 517.509 | 607.227 | 711.759 | 836.222 | … | 1226.57 | 1318.48 | 1377.54 | 1418.13 | 1488.86 | 1595.05 | 1733.54 | 1902.11 | 2096.95 | 2322.65 |
Afghanistan | 4.367 | 4.553 | 5.146 | 6.167 | 6.925 | 8.556 | 10.297 | 12.066 | 15.325 | 17.89 | … | 20.057 | 19.428 | 20.235 | 19.585 | 19.99 | 20.682 | 21.928 | 23.577 | 25.45 | 27.608 |
Afghanistan | 18.835 | 20.852 | 21.557 | 24.858 | 26.983 | 31.404 | 33.252 | 40.403 | 44.323 | 48.18 | … | 64.276 | 66.384 | 69.449 | 72.648 | 76.158 | 80.47 | 85.426 | 91.113 | 97.643 | 105.158 |
5 rows × 23 columns
df = df.astype("float")
df.dtypes
2002 float64 2003 float64 2004 float64 2005 float64 2006 float64 2007 float64 2008 float64 2009 float64 2010 float64 2011 float64 2012 float64 2013 float64 2014 float64 2015 float64 2016 float64 2017 float64 2018 float64 2019 float64 2020 float64 2021 float64 2022 float64 2023 float64 2024 float64 dtype: object
df = df.loc['Japan']
df.head()
2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | … | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Country | |||||||||||||||||||||
Japan | 464134.700 | 471227.700 | 481616.900 | 489624.400 | 496577.00 | 504791.400 | 499271.400 | 472228.800 | 492023.400 | 491455.400 | … | 516932.500 | 520081.100 | 530111.900 | 534409.850 | 539630.510 | 542060.88 | 544974.610 | 547739.420 | 550621.780 | 553488.850 |
Japan | 0.118 | 1.528 | 2.205 | 1.663 | 1.42 | 1.654 | -1.094 | -5.416 | 4.192 | -0.115 | … | 1.223 | 0.609 | 1.929 | 0.811 | 0.977 | 0.45 | 0.538 | 0.507 | 0.526 | 0.521 |
Japan | 515986.200 | 515400.700 | 520965.500 | 524132.900 | 526879.70 | 531688.200 | 520715.700 | 489501.000 | 500354.000 | 491408.400 | … | 531319.800 | 535986.500 | 545122.000 | 549016.200 | 557821.980 | 567275.01 | 572865.670 | 578293.760 | 584636.480 | 591508.090 |
Japan | 4115.120 | 4445.660 | 4815.170 | 4755.410 | 4530.38 | 4515.260 | 5037.910 | 5231.380 | 5700.100 | 6157.460 | … | 4389.480 | 4926.670 | 4859.950 | 4971.930 | 5176.210 | 5495.42 | 5807.520 | 6133.860 | 6475.890 | 6848.810 |
Japan | 3567.750 | 3689.540 | 3872.400 | 4059.400 | 4241.64 | 4427.640 | 4464.400 | 4254.780 | 4484.790 | 4573.190 | … | 5137.290 | 5225.070 | 5427.100 | 5594.450 | 5749.550 | 5896.04 | 6050.970 | 6207.510 | 6369.490 | 6536.650 |
5 rows × 23 columns
df1 = df[:1]
df1
2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | … | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Country | |||||||||||||||||||||
Japan | 464134.7 | 471227.7 | 481616.9 | 489624.4 | 496577.0 | 504791.4 | 499271.4 | 472228.8 | 492023.4 | 491455.4 | … | 516932.5 | 520081.1 | 530111.9 | 534409.85 | 539630.51 | 542060.88 | 544974.61 | 547739.42 | 550621.78 | 553488.85 |
1 rows × 23 columns
df1.T.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7f98432dc390>

from matplotlib.pyplot import *
from matplotlib.font_manager import FontProperties
from matplotlib import rcParams
style.use('ggplot')
fig, ax = subplots()
df1.T.plot(style='.-',ms=20,figsize=(20,10),fontsize=18,ax=ax)
ax.legend(["GDP"],loc=4, prop={'size': 46});

スポンサーリンク
スポンサーリンク