Example 1: Remove Stopwords from Text

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Last edition: 2024-12-11

In this example, you apply a function that removes a user-specified list of words (stopwords) from the text in the input table.

  1. Create the input table "web_text" from the list of strings in the "data" dictionary, then create a teradataml DataFrame from that table and inspect it as follows.
    import pandas as pd
    from teradataml import DataFrame, copy_to_sql

    # Assumes an active connection to Vantage (for example, created with create_context).
    data = {
        'reviews_col':['Good Food rates the citys lattes and smashed avos',
                 'The service was slow and tough to find during the meal.',
                 'most of the food was pretty good but there were some odd menu choices and some dishes lacked flavor',
                 'the food was very cold but there were some cold menu choices.',
                 'cold drinks were nice.',
                 'cold beverages were not good.']}
    df1 = pd.DataFrame(data)
    # index = True writes the pandas index as the "index_label" column.
    copy_to_sql(df = df1, table_name = 'web_text',
                if_exists = "replace", index = True)
    web_text = DataFrame.from_table("web_text")
    web_text

    Out:

                                                  reviews_col       index_label
    0       Good Food rates the citys lattes and smashed avos       0
    1       The service was slow and tough to find during ...       1
    2       most of the food was pretty good but there wer...       2
    3       the food was very cold but there were some col...       3
    4       cold drinks were nice.                                  4
    5       cold beverages were not good.                           5
  2. Construct a function "stopwords_removal_fun" to remove user-specified stopwords from the value of the column "reviews_col" for each row of the teradataml DataFrame passed to the function.
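    from numpy import asarray

    def stopwords_removal_fun(row, stopwords):
        # Trim leading and trailing whitespace from the review text.
        new_str = row['reviews_col'].strip()
        # str.replace matches substrings and is case-sensitive, so 'the' also
        # shortens 'there' to 're' and leaves 'The' untouched.
        for word in stopwords:
            new_str = new_str.replace(word, "")
        # Return the cleaned text together with the row's index_label.
        return asarray([new_str, row['index_label']])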
  3. Determine a list of stopwords.
    stopwords = ['the', 'was', 'and', 'of', 'but']
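  4. Invoke the teradataml.DataFrame.apply method with the function constructed in step 2 and the stopword list from step 3.
    The apply method invokes the APPLY table operator in the background and stores the operation results in the output variable. Here, testenv is assumed to refer to an existing user environment. The rows can be returned in any order.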
    output = web_text.apply(lambda row: stopwords_removal_fun(row, stopwords), env_name=testenv)
    output

    Out:

              reviews_col                                                              index_label
    most food pretty good re were some odd menu choices some dishes lacked flavor                2
    cold drinks were nice.                                                                       4
    cold beverages were not good.                                                                5
    food very cold re were some cold menu choices.                                               3
    The service slow tough to find during meal.                                                  1
    Good Food rates citys lattes smashed avos                                                    0
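
In the output above, "there" is reduced to "re" and the capitalized "The" is kept, because the function removes substrings with str.replace, which is also case-sensitive. The following is a minimal local sketch, not part of the Teradata example, that contrasts that behavior with a whole-word, case-insensitive alternative. The helper names remove_substrings and remove_whole_words are hypothetical and operate on plain strings, so the sketch runs without a Vantage connection.

```python
import re

def remove_substrings(text, stopwords):
    """Mirror the loop in stopwords_removal_fun: plain substring removal."""
    new_str = text.strip()
    for word in stopwords:
        new_str = new_str.replace(word, "")
    return new_str

def remove_whole_words(text, stopwords):
    """Hypothetical alternative: drop whole words only, ignoring case."""
    # \b anchors each stopword at word boundaries so 'the' does not match 'there'.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in stopwords) + r")\b"
    new_str = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse the extra spaces left behind by the removed words.
    return " ".join(new_str.split())

stopwords = ['the', 'was', 'and', 'of', 'but']
review = 'the food was very cold but there were some cold menu choices.'

print(repr(remove_substrings(review, stopwords)))
print(repr(remove_whole_words(review, stopwords)))
```

If whole-word matching is what you want, the body of remove_whole_words could replace the loop inside stopwords_removal_fun, assuming the re module is available in the user environment where the function runs.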