Use the select() method to select columns from a DataFrame. The method takes a select expression as its argument and returns a new DataFrame containing only the selected columns. The expression can be a single column name, a list of column names, or a list of column name lists.
Selecting the same column more than once (for example, df.select(['col1', 'col1'])) is not supported.
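The three accepted expression forms and the duplicate-column restriction can be illustrated with a small sketch. The helper normalize_select_expr below is hypothetical and is not part of teradataml; it only shows, under that assumption, how each form could reduce to one flat list of column names and where a repeated column would be rejected.

```python
def normalize_select_expr(select_expr):
    """Hypothetical helper (not a teradataml API): flatten a select
    expression into a flat list of column names."""
    if isinstance(select_expr, str):
        # Form 1: a single column name.
        columns = [select_expr]
    elif isinstance(select_expr, list) and all(
        isinstance(item, str) for item in select_expr
    ):
        # Form 2: a list of column names.
        columns = list(select_expr)
    elif isinstance(select_expr, list) and all(
        isinstance(item, list) and all(isinstance(name, str) for name in item)
        for item in select_expr
    ):
        # Form 3: a list of column name lists.
        columns = [name for group in select_expr for name in group]
    else:
        raise TypeError("expected a column name, a list of names, or a list of name lists")

    # The same column may not be selected more than once.
    if len(set(columns)) != len(columns):
        raise ValueError("duplicate column names are not supported")
    return columns


print(normalize_select_expr("id"))
print(normalize_select_expr(["id", "masters", "gpa"]))
print(normalize_select_expr([["id", "masters", "gpa"]]))
```

In this sketch the second and third calls produce the same flat list, which matches Examples 2 and 3 returning the same columns.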
Examples Prerequisite
Assume the table "admissions_train" exists and its index column is id. A DataFrame "df" is created from this table using the command:
>>> df = DataFrame("admissions_train")
>>> df
   masters   gpa     stats programming admitted
id
5       no  3.44    novice      novice        0
7      yes  2.33    novice      novice        1
22     yes  3.46    novice    beginner        0
17      no  3.83  advanced    advanced        1
13      no  4.00  advanced      novice        1
19     yes  1.98  advanced    advanced        0
36      no  3.00  advanced      novice        0
15     yes  4.00  advanced    advanced        1
34     yes  3.85  advanced    beginner        0
40     yes  3.95    novice    beginner        0
Example 1: Expression is a single column name
>>> df.select("id")
Empty DataFrame
Columns: []
Index: [22, 34, 13, 19, 15, 38, 26, 5, 36, 17]
Example 2: Expression is a list of column names
>>> df.select(["id", "masters", "gpa"])
   masters   gpa
id
5       no  3.44
36      no  3.00
15     yes  4.00
17      no  3.83
13      no  4.00
40     yes  3.95
7      yes  2.33
22     yes  3.46
34     yes  3.85
19     yes  1.98
Example 3: Expression is a list of column name lists
>>> df.select([['id', 'masters', 'gpa']])
   masters   gpa
id
5       no  3.44
34     yes  3.85
13      no  4.00
40     yes  3.95
22     yes  3.46
19     yes  1.98
36      no  3.00
15     yes  4.00
7      yes  2.33
17      no  3.83