python - Pandas: change data type of columns
I want to convert a table, represented as a list of lists, into a pandas DataFrame. As an extremely simplified example:

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a)

What is the best way to convert the columns to the appropriate types, in this case columns 2 and 3 into floats? Is there a way to specify the types while converting to a DataFrame? Or is it better to create the DataFrame first and then loop through the columns to change the type of each one? Ideally I would like to do this in a dynamic way, because there can be hundreds of columns and I don't want to specify exactly which columns are of which type. All I can guarantee is that each column contains values of the same type.
You can use pd.to_numeric (introduced in version 0.17) to convert a column or Series to a numeric type. The function can also be applied over multiple columns of a DataFrame using apply.

Importantly, the function takes an errors keyword argument that lets you force not-numeric values to be NaN, or simply ignore columns containing these values.

Example uses are shown below.
Individual column / Series
Here's an example using a Series of strings s which has the object dtype:

>>> s = pd.Series(['1', '2', '4.7', 'pandas', '10'])
>>> s
0         1
1         2
2       4.7
3    pandas
4        10
dtype: object
The function's default behaviour is to raise if it can't convert a value. In this case, it can't cope with the string 'pandas':

>>> pd.to_numeric(s)  # or pd.to_numeric(s, errors='raise')
ValueError: Unable to parse string
Rather than fail, we might want 'pandas' to be considered a missing/bad value. We can coerce invalid values to NaN as follows:

>>> pd.to_numeric(s, errors='coerce')
0     1.0
1     2.0
2     4.7
3     NaN
4    10.0
dtype: float64
The third option is to ignore the operation if an invalid value is encountered:

>>> pd.to_numeric(s, errors='ignore')  # the original Series is returned untouched
Multiple columns / entire DataFrames
We may want to apply this operation to multiple columns. Processing each column in turn is tedious, so we can use DataFrame.apply to have the function act on each column.

Borrowing the DataFrame from the question:

>>> a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
>>> df = pd.DataFrame(a, columns=['col1', 'col2', 'col3'])
>>> df
  col1 col2  col3
0    a  1.2   4.2
1    b   70  0.03
2    x    5     0
Then we can write:

df[['col2', 'col3']] = df[['col2', 'col3']].apply(pd.to_numeric)

and now 'col2' and 'col3' have dtype float64 as desired.
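Putting those steps together, a minimal runnable version of the conversion (assuming pandas is imported as pd):

```python
import pandas as pd

# Rebuild the example frame from the question: every column starts as object
a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a, columns=['col1', 'col2', 'col3'])

# Convert the two numeric columns in one go via DataFrame.apply
df[['col2', 'col3']] = df[['col2', 'col3']].apply(pd.to_numeric)

print(df.dtypes)
```

After this, 'col2' and 'col3' are float64 while 'col1' keeps its object dtype.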
However, we might not know which of our columns can be converted reliably to a numeric type. In that case we can write:

df.apply(pd.to_numeric, errors='ignore')

The function will then be applied to the whole DataFrame. Columns that can be converted to a numeric type will be converted, while columns that cannot (for example, because they contain non-digit strings or dates) will be left alone.
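Newer pandas releases deprecate errors='ignore', so the same "convert what you can, leave the rest" behaviour can also be written as an explicit per-column loop; a minimal sketch (the column names here are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'num': ['1.2', '70', '5'],    # digit strings: convertible
    'text': ['a', 'b', 'x'],      # non-digit strings: not convertible
})

# Try each column in turn; keep the original column if conversion fails
for col in df.columns:
    try:
        df[col] = pd.to_numeric(df[col])
    except (ValueError, TypeError):
        pass  # leave non-numeric columns untouched

print(df.dtypes)
```

Only 'num' ends up numeric; 'text' keeps its object dtype, mirroring what errors='ignore' does across a whole DataFrame.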
There are also pd.to_datetime and pd.to_timedelta for conversion to dates and timestamps.
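pd.to_datetime accepts the same errors keyword, so unparseable values can be coerced to NaT rather than raising; a small sketch (the example dates are made up):

```python
import pandas as pd

s = pd.Series(['2015-01-31', '2015-02-28', 'not a date'])

# errors='coerce' works here too: bad values become NaT (the datetime NaN)
dates = pd.to_datetime(s, errors='coerce')
print(dates)
```

The result is a datetime64 Series, with the invalid entry replaced by NaT.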
Soft conversions
Version 0.21.0 introduces the method infer_objects() for converting columns of a DataFrame that have an object datatype to a more specific type.
For example, let's create a DataFrame with two columns of object type, one holding actual integers and the other holding strings of integers:

>>> df = pd.DataFrame({'a': [7, 1, 5], 'b': ['3', '2', '1']}, dtype='object')
>>> df.dtypes
a    object
b    object
dtype: object
Then using infer_objects(), we can change the type of column 'a' to int64:

>>> df = df.infer_objects()
>>> df.dtypes
a     int64
b    object
dtype: object
Column 'b' has been left alone since its values were strings, not integers. If we wanted to try and force the conversion of both columns to an integer type, we could use df.astype(int) instead.
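To make the difference concrete, a minimal sketch of the forced conversion with astype on the same frame:

```python
import pandas as pd

# Same two object columns as above: real integers and digit strings
df = pd.DataFrame({'a': [7, 1, 5], 'b': ['3', '2', '1']}, dtype='object')

# astype forces the cast for every column; it raises if any value cannot be cast
converted = df.astype(int)
print(converted.dtypes)
```

Unlike infer_objects(), this converts 'b' as well, because its digit strings are castable to integers; a column containing something like 'pandas' would make astype raise instead.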