python - Pandas: change data type of columns
I want to convert a table, represented as a list of lists, into a pandas DataFrame. As an extremely simplified example:

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a)

What is the best way to convert the columns to the appropriate types, in this case columns 2 and 3 into floats? Is there a way to specify the types while converting to a DataFrame? Or is it better to create the DataFrame first and then loop through the columns to change the type of each one? Ideally I would like to do this in a dynamic way, because there can be hundreds of columns and I don't want to specify exactly which columns are of which type. All I can guarantee is that each column contains values of the same type.
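Concretely, running the snippet above leaves every column with the object dtype (every cell is a string), which is exactly what needs fixing:

```python
import pandas as pd

# Build the DataFrame exactly as in the question; columns get
# default integer labels 0, 1, 2.
a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a)

# All three columns come out as object dtype, since every value is a string.
print(df.dtypes)
```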
You can use pd.to_numeric (introduced in version 0.17) to convert a column or Series to a numeric type. The function can also be applied over multiple columns of a DataFrame using apply.

Importantly, the function also takes an errors keyword argument that lets you force non-numeric values to be NaN, or simply ignore columns containing these values.

Example uses are shown below.
Individual column / Series
here's example using series of strings s has object dtype:
>>> s = pd.series(['1', '2', '4.7', 'pandas', '10']) >>> s 0 1 1 2 2 4.7 3 pandas 4 10 dtype: object the function's default behaviour raise if can't convert value. in case, can't cope string 'pandas':
>>> pd.to_numeric(s) # or pd.to_numeric(s, errors='raise') valueerror: unable parse string rather fail, might want 'pandas' considered missing/bad value. can coerce invalid values nan follows:
>>> pd.to_numeric(s, errors='coerce') 0 1.0 1 2.0 2 4.7 3 nan 4 10.0 dtype: float64 the third option ignore operation if invalid value encountered:
>>> pd.to_numeric(s, errors='ignore') # original series returned untouched multiple columns / entire dataframes
We might want to apply this operation to multiple columns. Processing each column in turn is tedious, so we can use DataFrame.apply to have the function act on every column.

Borrowing the DataFrame from the question:

>>> a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
>>> df = pd.DataFrame(a, columns=['col1', 'col2', 'col3'])
>>> df
  col1 col2  col3
0    a  1.2   4.2
1    b   70  0.03
2    x    5     0

Then we can write:

df[['col2', 'col3']] = df[['col2', 'col3']].apply(pd.to_numeric)

and 'col2' and 'col3' will have dtype float64 as desired.
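Putting the steps above together, a minimal end-to-end sketch:

```python
import pandas as pd

# The DataFrame from the question, with named columns.
a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a, columns=['col1', 'col2', 'col3'])

# Convert only the numeric-looking columns in one step;
# apply calls pd.to_numeric once per selected column.
df[['col2', 'col3']] = df[['col2', 'col3']].apply(pd.to_numeric)

print(df.dtypes)  # col1 stays object; col2 and col3 become float64
```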
However, we might not know which of our columns can be converted reliably to a numeric type. In that case we can just write:

df.apply(pd.to_numeric, errors='ignore')

Then the function will be applied to the whole DataFrame. Columns that can be converted to a numeric type will be converted, while columns that cannot (e.g. they contain non-digit strings or dates) will be left alone.
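Note that errors='ignore' has since been deprecated in newer pandas releases. On versions where it is unavailable, the same "convert what you can, leave the rest alone" behaviour can be spelled out explicitly; this helper is a sketch, not part of the original answer (the function name is made up here):

```python
import pandas as pd

def to_numeric_where_possible(df):
    """Return a copy of df where each column is converted with
    pd.to_numeric if every value parses, and left untouched otherwise."""
    out = df.copy()
    for col in out.columns:
        try:
            out[col] = pd.to_numeric(out[col])  # raises on bad values
        except (ValueError, TypeError):
            pass  # leave non-numeric columns alone
    return out

df = pd.DataFrame({'name': ['a', 'b'], 'value': ['1.5', '2.5']})
converted = to_numeric_where_possible(df)
print(converted.dtypes)  # 'name' stays object, 'value' becomes float64
```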
There are also pd.to_datetime and pd.to_timedelta for conversion to dates and timedeltas.
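A quick illustration of those two helpers (the sample values are made up for this sketch):

```python
import pandas as pd

# Parse date strings into a datetime64[ns] Series.
dates = pd.to_datetime(pd.Series(['2015-01-01', '2016-06-30']))

# Parse duration strings into a timedelta64[ns] Series.
deltas = pd.to_timedelta(pd.Series(['1 days', '36 hours']))

print(dates.dtype)   # datetime64[ns]
print(deltas.dtype)  # timedelta64[ns]
```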
Soft conversions

Version 0.21.0 introduces the method infer_objects() for converting columns of a DataFrame that have an object datatype to a more specific type (soft conversions).

For example, let's create a DataFrame with two columns of object type, with one holding actual integers and the other holding strings of integers:

>>> df = pd.DataFrame({'a': [7, 1, 5], 'b': ['3', '2', '1']}, dtype='object')
>>> df.dtypes
a    object
b    object
dtype: object

Then using infer_objects(), we can change the type of column 'a' to int64:

>>> df = df.infer_objects()
>>> df.dtypes
a     int64
b    object
dtype: object

Column 'b' has been left alone since its values were strings, not integers. If we wanted to try and force the conversion of both columns to an integer type, we could use df.astype(int) instead.