Use the to_numeric function to convert a string representation of a number to a numeric type. It is typically applied to the string columns of a DataFrame inside the DataFrame assign method.
Example Setup
Create a DataFrame with all string type columns.
>>> load_example_data("dataframe", "numeric_strings")
>>> df = DataFrame('numeric_strings')
>>> df
         hex_col decimal_col commas_col numbers_col
id_col
2       ABCDEFAB       0.7.7       ,088         999
0           19FF       00.77       08,8           1
1           abcd        0.77       0,88           1
3           2018        .077       088,           0
>>> df.dtypes
id_col         int
hex_col        str
decimal_col    str
commas_col     str
numbers_col    str
Example 1: Convert to Numeric Type
Except for id_col, every column in the DataFrame is a string type. Before these columns can be used in a numerical calculation, they must be converted to a numeric type.
>>> tdf = df.assign(drop_columns = True, numbers = df.numbers_col, numeric = to_numeric(df.numbers_col))
>>> tdf
  numbers numeric
0       1       1
1     999     999
2       0       0
3       1       1
>>> tdf.dtypes
numbers                str
numeric    decimal.Decimal
Example 2: Use Optional format_ Keyword when Converting
If the to_numeric function cannot parse a string because its format is not recognized, it returns None for that value.
This example converts decimal-like strings to numeric.
>>> df.assign(drop_columns = True, decimal = df.decimal_col, numeric_dec = to_numeric(df.decimal_col))
  decimal numeric_dec
0   0.7.7        None
1   00.77         .77
2    0.77         .77
3    .077        .077
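For a quick local sanity check without a database connection, the None-on-failure behavior can be imitated in plain Python. This is only a sketch: parse_or_none is a hypothetical helper built on the standard decimal module, not part of the library, and it is not claimed to match the parsing rules of to_numeric in every case.

```python
from decimal import Decimal, InvalidOperation

def parse_or_none(text):
    """Return text as a Decimal, or None if it is not a valid number.

    Hypothetical helper for illustration only; not part of the library.
    """
    try:
        return Decimal(text)
    except InvalidOperation:
        return None

# The same strings as decimal_col in the example above.
samples = ["0.7.7", "00.77", "0.77", ".077"]
results = [parse_or_none(s) for s in samples]

for text, value in zip(samples, results):
    print(text, "->", value)
```

As in the output above, only '0.7.7' fails to parse. Python prints a leading zero (0.77) where the example output shows .77.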
You can control which strings are recognizable by passing a format string into the optional format_ keyword.
This example converts strings that use a comma as the group separator to numeric.
>>> df.assign(drop_columns = True, commas = df.commas_col, numeric_commas = to_numeric(df.commas_col, format_ = '9G99'))
  commas numeric_commas
0   ,088           None
1   08,8           None
2   0,88             88
3   088,           None
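One way to read the output above is that '9G99' accepts exactly one digit, a group separator, then two digits, which would explain why only '0,88' converts. The sketch below imitates that reading in plain Python. The helpers format_to_regex and convert are hypothetical, the meanings assigned to '9' and 'G' are assumptions inferred from this one example, and the real function supports many more format elements.

```python
import re

def format_to_regex(fmt):
    """Translate a format made only of '9' and 'G' into a compiled regex.

    Hypothetical helper; the element meanings below are assumptions.
    """
    parts = []
    for ch in fmt:
        if ch == "9":
            parts.append(r"\d")   # assumed: exactly one digit
        elif ch == "G":
            parts.append(",")     # assumed: comma as the group separator
        else:
            raise ValueError(f"unsupported format element: {ch!r}")
    return re.compile("".join(parts))

def convert(text, fmt):
    """Return the integer value if text matches fmt exactly, else None."""
    if format_to_regex(fmt).fullmatch(text) is None:
        return None
    return int(text.replace(",", ""))

# The same strings as commas_col in the example above.
samples = [",088", "08,8", "0,88", "088,"]
results = [convert(s, "9G99") for s in samples]
print(results)
```

Under these assumptions the sketch reproduces the pattern shown above: three None values and 88 for '0,88'.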
This example converts hex strings to numeric.
>>> df.assign(drop_columns = True, hex = df.hex_col, numeric_hex = to_numeric(df.hex_col, format_ = 'XXXXXXXXXX'))
        hex numeric_hex
0  ABCDEFAB  2882400171
1      19FF        6655
2      abcd       43981
3      2018        8216
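The hex results above can be cross-checked locally with Python's built-in int, which parses base-16 strings in either case. This is an independent check in plain Python, not a call into the library.

```python
# The same strings as hex_col in the example above.
hex_strings = ["ABCDEFAB", "19FF", "abcd", "2018"]

# int(text, 16) interprets the string as a base-16 number.
values = [int(s, 16) for s in hex_strings]

for text, value in zip(hex_strings, values):
    print(text, "->", value)
```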
Example 3: Use String Literals as Arguments
The to_numeric function can take DataFrame columns or string literals as arguments.
This example converts literals to numeric.
>>> df.assign(drop_columns = True,
...           a = to_numeric('123,456', format_ = '999,999'),
...           b = to_numeric('1,333.555', format_ = '9,999D999'),
...           c = to_numeric('2,333,2', format_ = '9G999G9'),
...           d = to_numeric('3E20'),
...           e = to_numeric('$41.99', format_ = 'L99.99'),
...           f = to_numeric('$.12', format_ = 'L.99'),
...           g = to_numeric('dollar123,456.00', format_ = 'L999G999D99',
...                          nls = {'param': 'currency', 'value': 'dollar'})).head(1)
        a         b      c                      d      e    f       g
0  123456  1333.555  23332  300000000000000000000  41.99  .12  123456
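The value in column d comes from the exponent-notation literal '3E20'. As a local cross-check in plain Python (an illustration using the standard decimal module, not the library itself), the same literal expands to the integer shown above.

```python
from decimal import Decimal

# Decimal accepts exponent notation such as '3E20'.
value = Decimal("3E20")

# Converting to int writes the number out in full.
as_int = int(value)
print(as_int)
```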