python - Pandas: change data type of columns
I want to convert a table, represented as a list of lists, into a pandas DataFrame. As an extremely simplified example:

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a)

What is the best way to convert the columns to the appropriate types, in this case columns 2 and 3 into floats? Is there a way to specify the types while converting to a DataFrame? Or is it better to create the DataFrame first and then loop through the columns to change the type of each one? Ideally I would like to do this in a dynamic way, because there can be hundreds of columns and I don't want to specify exactly which columns are of which type. All I can guarantee is that each column contains values of the same type.
You can use pd.to_numeric (introduced in version 0.17) to convert a column or Series to a numeric type. The function can also be applied over multiple columns of a DataFrame using apply.

Importantly, the function takes an errors keyword argument that lets you force not-numeric values to be NaN, or simply ignore columns containing these values.

Example uses are shown below.
Individual column / Series
Here's an example using a Series of strings s which has the object dtype:

>>> s = pd.Series(['1', '2', '4.7', 'pandas', '10'])
>>> s
0         1
1         2
2       4.7
3    pandas
4        10
dtype: object
The function's default behaviour is to raise if it can't convert a value. In this case, it can't cope with the string 'pandas':

>>> pd.to_numeric(s)  # or pd.to_numeric(s, errors='raise')
ValueError: Unable to parse string
Rather than fail, we might want 'pandas' to be considered a missing/bad value. We can coerce invalid values to NaN as follows:

>>> pd.to_numeric(s, errors='coerce')
0     1.0
1     2.0
2     4.7
3     NaN
4    10.0
dtype: float64
The third option is to ignore the operation if an invalid value is encountered:

>>> pd.to_numeric(s, errors='ignore')  # the original Series is returned untouched
Multiple columns / entire DataFrames
We may want to apply this operation to multiple columns. Processing each column in turn is tedious, so we can use DataFrame.apply to have the function act on each column.

Borrowing the DataFrame from the question:

>>> a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
>>> df = pd.DataFrame(a, columns=['col1', 'col2', 'col3'])
>>> df
  col1 col2  col3
0    a  1.2   4.2
1    b   70  0.03
2    x    5     0
Then we can write:

df[['col2', 'col3']] = df[['col2', 'col3']].apply(pd.to_numeric)

and now 'col2' and 'col3' have dtype float64 as desired.
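Putting those steps together, a minimal runnable version of the conversion (assuming pandas is imported as pd):

```python
import pandas as pd

# Rebuild the example frame from the question: every column starts as object
a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a, columns=['col1', 'col2', 'col3'])

# Convert the two numeric columns in one go via DataFrame.apply
df[['col2', 'col3']] = df[['col2', 'col3']].apply(pd.to_numeric)

print(df.dtypes)
```

After this, 'col2' and 'col3' are float64 while 'col1' keeps its object dtype.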
However, we might not know which of our columns can be converted reliably to a numeric type. In that case we can write:

df.apply(pd.to_numeric, errors='ignore')

The function will then be applied to the whole DataFrame. Columns that can be converted to a numeric type will be converted, while columns that cannot (for example, because they contain non-digit strings or dates) will be left alone.
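Newer pandas releases deprecate errors='ignore', so the same "convert what you can, leave the rest" behaviour can also be written as an explicit per-column loop; a minimal sketch (the column names here are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'num': ['1.2', '70', '5'],    # digit strings: convertible
    'text': ['a', 'b', 'x'],      # non-digit strings: not convertible
})

# Try each column in turn; keep the original column if conversion fails
for col in df.columns:
    try:
        df[col] = pd.to_numeric(df[col])
    except (ValueError, TypeError):
        pass  # leave non-numeric columns untouched

print(df.dtypes)
```

Only 'num' ends up numeric; 'text' keeps its object dtype, mirroring what errors='ignore' does across a whole DataFrame.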
There are also pd.to_datetime and pd.to_timedelta for conversion to dates and timestamps.
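pd.to_datetime accepts the same errors keyword, so unparseable values can be coerced to NaT rather than raising; a small sketch (the example dates are made up):

```python
import pandas as pd

s = pd.Series(['2015-01-31', '2015-02-28', 'not a date'])

# errors='coerce' works here too: bad values become NaT (the datetime NaN)
dates = pd.to_datetime(s, errors='coerce')
print(dates)
```

The result is a datetime64 Series, with the invalid entry replaced by NaT.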
Soft conversions
Version 0.21.0 introduces the method infer_objects() for converting columns of a DataFrame that have an object datatype to a more specific type.
For example, let's create a DataFrame with two columns of object type, one holding actual integers and the other holding strings of integers:

>>> df = pd.DataFrame({'a': [7, 1, 5], 'b': ['3', '2', '1']}, dtype='object')
>>> df.dtypes
a    object
b    object
dtype: object
Then using infer_objects(), we can change the type of column 'a' to int64:

>>> df = df.infer_objects()
>>> df.dtypes
a     int64
b    object
dtype: object
Column 'b' has been left alone since its values were strings, not integers. If we wanted to try and force the conversion of both columns to an integer type, we could use df.astype(int) instead.
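To make the difference concrete, a minimal sketch of the forced conversion with astype on the same frame:

```python
import pandas as pd

# Same two object columns as above: real integers and digit strings
df = pd.DataFrame({'a': [7, 1, 5], 'b': ['3', '2', '1']}, dtype='object')

# astype forces the cast for every column; it raises if any value cannot be cast
converted = df.astype(int)
print(converted.dtypes)
```

Unlike infer_objects(), this converts 'b' as well, because its digit strings are castable to integers; a column containing something like 'pandas' would make astype raise instead.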