Teradata R Package Function Reference | 17.00 - 17.00 - Teradata SQL Translation - Teradata R Package

Teradata® R Package Function Reference

prodname
Teradata R Package
vrm_release
17.00
created_date
September 2020
category
Programming Reference
featnum
B700-4007-090K

An important feature that dbplyr provides is the ability to take expressions in R and map them to corresponding expressions in SQL. The dplyr verbs rely on this mapping to generate the correct query. See the verbs vignette for more information on using the verbs.

See Teradata® Database SQL Functions, Operators, Expressions, and Predicates manual for more information on the SQL functions and operators that Teradata supports.

This vignette is largely a reference for supported mappings. The tables below show the most commonly used mappings. The R expressions column shows default values of functions when mapped to the SQL equivalent. The SQL column shows examples of the translated expressions from the R expressions column.

SQL Translation

R expressions get converted to SQL whenever used with a verb on a remote tbl. The translated SQL column expressions are used in various clauses depending on the verb. Internally, sql_translate_env is being called to dictate how some expression such as a comparison operator or function should be translated into SQL.

Functions not explicitly in the map are translated to sql literally. This allows you to use functions defined on the SQL Engine, such as UDFs.

df <- tbl(con, 'landmass')
translate_sql(my_udf(col1, col2) + 1L)
#> <SQL> my_udf(`col1`, `col2`) + 1
df %>% mutate(udf = my_udf(row_names) + 1L) %>% show_query()
#> <SQL>
#> SELECT "row_names", "islands", my_udf("row_names") + 1 AS "udf"
#> FROM "landmass"

This also implies that mapped functions are case-sensitive when being translated. Identifiers in R are case sensitive. There is a difference between mean and MEAN. The latter is a defined function in base R, while the former is undefined. So mean is mapped as expected but MEAN is not.

translate_sql(mean(x), MEAN(y), con = dbplyr::simulate_teradata())
#> <SQL> AVG(`x`) OVER ()
#> <SQL> MEAN(`y`)

df %>% 
  mutate(x = mean(islands), y = MEAN(islands)) %>%
  show_query()
#> <SQL>
#> SELECT "row_names", "islands", AVG("islands") OVER () AS "x", MEAN("islands") AS "y"
#> FROM "landmass"

However, if a function is not mapped and since SQL is case insensitive, specifying the function with different cases in R are all equivalent:

translate_sql(avg(x), AVG(x), aVg(x), con = dbplyr::simulate_teradata())
#> <SQL> avg(`x`)
#> <SQL> AVG(`x`)
#> <SQL> aVg(`x`)
df %>% 
  group_by(row_names) %>% 
  summarize(x = avg(islands), y = AVG(islands), z = aVg(islands)) %>%
  show_query()
#> <SQL>
#> SELECT "row_names", avg("islands") AS "x", AVG("islands") AS "y", aVg("islands") AS "z"
#> FROM "landmass"
#> GROUP BY "row_names"

To suppress this behaviour, and force errors immediately when dplyr doesn’t know how to translate a function it encounters, set the dplyr.strict_sql option to TRUE. For more information, see the SQL translation vignette in dbplyr.

You can use translate_sql to see how an expression gets translated from R to SQL. Use a Teradata connection object or dbplyr::simulate_teradata() in the con parameter to get tdplyr specific mappings. Read the connection vignette for more details about the Teradata connection.

Math operators and Math functions

Math Operators
R operators R expressions SQL
+ x + y “x” + “y”
- x - y “x” - “y”
/ x / y “x” / “y”
* x * y “x” * “y”
%% x %% y “x” MOD “y”
%/% x %/% y FLOOR((“x”) / (“y”))
^ x ^ y POWER(“x”, “y”)
Math Functions
R operators R expressions SQL
abs abs(x) ABS(“x”)
acos acos(x) ACOS(“x”)
acosh acosh(x) acosh(“x”)
asin asin(x) ASIN(“x”)
asinh asinh(x) asinh(“x”)
atan atan(x) ATAN(“x”)
atan2 atan2(y, x) ATAN2(“x”, “y”)
atanh atanh(x) atanh(“x”)
ceil ceil(x) CEILING(“x”)
cos cos(x) COS(“x”)
cosh cosh(x) COSH(“x”)
cot cot(x) 1 / TAN(“x”)
coth coth(x) (EXP(2 * (“x”)) + 1) / (EXP(2 * (“x”)) - 1)
degrees degrees(x) degrees(“x”)
exp exp(x) EXP(“x”)
floor floor(x) FLOOR(“x”)
ln ln(x) LN(“x”)
log log(x, base = exp(1)) LOG(“x”) / LOG(“y”)
log10 log10(x) LOG(“x”)
radians radians(x) radians(“x”)
round round(x, digits = 0L) ROUND(“x”, 0)
sign sign(x) SIGN(“x”)
sin sin(x) SIN(“x”)
sinh sinh(x) SINH(“x”)
sqrt sqrt(x) SQRT(“x”)
tan tan(x) TAN(“x”)

Logical and Comparison operators

Logical Operators
R operators R expressions SQL
! ! x NOT(“x”)
& x & y “x” AND “y”
&& x && y “x” AND “y”
| x | y “x” OR “y”
|| x || y “x” OR “y”
xor xor(x, y) “x” OR “y” AND NOT (“x” AND “y”)
if if(cond, true, false = NULL) CASE WHEN (“cond”) THEN (“true”) END
if_else if_else(cond, true, false) CASE WHEN (“cond”) THEN (“true”) WHEN NOT(“cond”) THEN (“false”) END
Comparison Operators
R operators R expressions SQL
< x < y “x” < “y”
<= x <= y “x” <= “y”
> x > y “x” > “y”
>= x >= y “x” >= “y”
== x == y “x” = “y”
!= x != y “x” <> “y”
is.null is.null(x) ((“x”) IS NULL)
is.na is.na(x) ((“x”) IS NULL)

Conversion functions

Conversion functions
R operators R expressions SQL
as.integer as.integer(x) CAST(“x” AS INTEGER)
as.numeric as.numeric(x, precision = 5L, scale = 0L) CAST(“x” AS NUMERIC(5, 0))
as.double as.double(x) CAST(“x” AS FLOAT)
as.character as.character(x, n = NULL, charset = NULL) CAST((“x”) AS VARCHAR(1024))
as as(x, ‘y’) CAST(“x” AS y)
as.Date as.Date(x, format) CAST(“x” AS DATE FORMAT ‘YYYY-MM-DD’)

Notes

  • as.character uses the default value of Sys.getenv('TD_VARCHAR_COLUMN_SIZE') for n. The charset parameter takes a string representation of the character set to cast the expressions (e.g ‘LATIN’, ‘UNICODE’).

  • The format parameter of the function as.Date should be in the same format as that of the character column that is to be casted to Date type. For example, if the column has a character value “19851205” and to cast it to Date type, the input to format parameter should be ‘YYYYMMDD’.

  • See the Data Type Conversions chapter and Data Type Conversion Functions chapter from Teradata Vantage™ Data Types and Literals for more information.

Character functions

Character Functions
R operators R expressions SQL
tolower tolower(x) LOWER(“x”)
toupper toupper(x) UPPER(“x”)
nchar nchar(x) CHARACTER_LENGTH(“x”)
str_length str_length(x) LENGTH(“x”)
substr substr(x, start , stop = NULL) SUBSTRING(“x” FROM “start”)
substring substring(x, start, stop = NULL) SUBSTRING(“x” FROM “start”)
grep grep(pattern, x, ignore.case = FALSE) REGEXP_SUBSTR(“x”, “pattern”, 1, 1, ‘c’)
str_extract str_extract(string, pattern) REGEXP_SUBSTR(“string”, “pattern”, 1, 1, ‘c’)
sub sub(pattern, replacement, x, ignore.case = FALSE) REGEXP_REPLACE(“x”, “pattern”, “replacement”, 1, 1, ‘c’)
gsub gsub(pattern, replacement, x, ignore.case = FALSE) REGEXP_REPLACE(“x”, “pattern”, “replacement”, 1, 0, ‘c’)
str_replace str_replace(string, pattern, replacement) REGEXP_REPLACE(“string”, “pattern”, “replacement”, 1, 1, ‘c’)
paste paste(…, sep = ‘ ’) “c1” || ‘ ’ || “c2” || ‘ ’ || “c3”
paste0 paste0(…) “c1” || ’’ || “c2” || ’’ || “c3”
str_join str_join(…, sep = ‘ ’) “c1” || ‘ ’ || “c2” || ‘ ’ || “c3”
strsplit strsplit(x, split = ‘ ’, tokennum = 1L) STRTOK(“x”, ‘ ’, 1)
str_locate str_locate(string, pattern, return_opt = 0L) REGEXP_INSTR(“string”, “pattern”, 1, 1, 0, ‘c’)
str_trim str_trim(string, side = c(“both”, “left”, “right”)) RTRIM(LTRIM(string, ‘\n \t\r\v\f’), ‘\n \t\r\v\f’)
str_replace_all str_replace_all(string, pattern, replacement) OREPLACE(“string”, “pattern”, “replacement”)
str_detect str_detect(string, pattern) INSTR(“string”, “pattern”) > 0

Notes

  • Any casting done by these functions is done implicitly. To explicitly cast column expressions, see the Conversion Functions.

  • str_locate maps to REGEXP_INSTR. You can control whether the beginning or ending index is returned by setting return_opt to 0 or 1 respectively. When return_opt is 1, REGEXP_INSTR returns a value similar to str_locate, which is the position before the character following the occurrence of the match.

  • str_split maps to STRTOK. You can provide which token to retrieve after the split using the tokennum parameter.

  • grep maps to REGEXP_SUBSTR. This is like using grep(pattern, x, ignore.case, value = TRUE) in R except that the substring matching the pattern is returned instead of the full string. Consider this example (Note that sort removes NA):

head(landmass)
#> [1] "Africa"       "Antarctica"   "Asia"         "Australia"    "Axel Heiberg"
#> [6] "Baffin"

# base R grep
grep('^A', landmass, value = TRUE) %>% sort
#> [1] "Africa"       "Antarctica"   "Asia"         "Australia"    "Axel Heiberg"

# sql translated grep
df %>% transmute(x = grep('^A', row_names)) %>% pull(x) %>% sort
#> [1] "A" "A" "A" "A" "A"

# to specify the full string, specify it in the pattern
df %>% transmute(x = grep('^A.+', row_names)) %>% pull(x) %>% sort
#> [1] "Africa"       "Antarctica"   "Asia"         "Australia"    "Axel Heiberg"

# or use filter (Using arrange to order the output by the column 'row_names')
df %>% select(row_names) %>% filter(!is.na(grep('^A', row_names))) %>% arrange(row_names)
#> # Source:     lazy query [?? x 1]
#> # Database:   Teradata
#> # Ordered by: row_names
#>   row_names   
#>   <chr>       
#> 1 Africa      
#> 2 Antarctica  
#> 3 Asia        
#> 4 Australia   
#> 5 Axel Heiberg
  • When concatenating using paste, paste0, or str_join there may be extra spaces due to implicit casting of types and character set differences.

  • The str_replace_all parameter maps to the OREPLACE string operator in SQL. Using this parameter is similar to using the gsub() R function without the ignore.case parameter.

  • The str_detect parameter maps to the INSTR string operator in SQL. This parameter provides the pattern to be detected in the string.

  • See the String Operators and Functions chapter from Teradata Vantage™ SQL Functions, Expressions, and Predicates manual for more information.

Aggregate Functions

Aggregate functions
R operators R expressions SQL
mean mean(x, na.rm = FALSE) AVG(“x”)
max max(x, na.rm = FALSE) MAX(“x”)
min min(x, na.rm = FALSE) MIN(“x”)
sum sum(x, na.rm = FALSE) SUM(“x”)
var var(x, na.rm = FALSE) VAR_SAMP(“x”)
varp varp(x, na.rm = FALSE) VAR_POP(“x”)
sd sd(x, na.rm = FALSE) STDDEV_SAMP(“x”)
sdp sdp(x, na.rm = FALSE) STDDEV_POP(“x”)
n n(x) CAST(COUNT(“x”) AS BIGINT)
n_distinct n_distinct(x) CAST(COUNT(DISTINCT “x”) AS BIGINT)
median median(x) MEDIAN(“x”)

Notes

Window Functions

Note the following restrictions for window functions according to the Teradata® Database SQL Functions, Operators, Expressions, and Predicates manual (Ordered Analytical/Window Aggregate Functions):

  • The DISTINCT clause is not permitted in window aggregate functions.

  • Ordered analytical functions are not permitted in:

    • Subqueries
    • WHERE clauses
    • SELECT AND CONSUME statements
  • Ordered analytical functions are permitted in the following database query objects:

    • Views
    • Macros
    • Derived tables
    • INSERT … SELECT
  • The sort order that you specify in the window specification defines the sort order of the rows over which the function is applied; it does not define the ordering of the results. To order the results, use an ORDER BY phrase in the SELECT statement

  • Teradata does not allow ORDER BY in subqueries. This implies that the arrange verb should be used as the last verb. See the Arrange section in verbs for more information.

Window functions
R operators R expressions SQL
row_number row_number(x) ROW_NUMBER() OVER (ORDER BY “x”)
rank rank(x) RANK() OVER (ORDER BY “x”)
dense_rank dense_rank(x) DENSE_RANK() OVER (ORDER BY “x”)
percent_rank percent_rank(x) PERCENT_RANK() OVER (ORDER BY “x”)
cume_dist cume_dist(x) CUME_DIST() OVER (ORDER BY “x”)
first first(x, order_by = NULL, ignore_nulls = FALSE) first_value(“x”) OVER (ORDER BY “x”)
last last(x, order_by = NULL, ignore_nulls = FALSE) last_value(“x”) OVER (ORDER BY “x”)
lag lag(x, n = 1L, default = NA, order_by = NULL, ignore_nulls = FALSE) LAG(“x”, 1, NULL) OVER (ORDER BY “x”)
lead lead(x, n = 1L, default = NA, order_by = NULL, ignore_nulls = FALSE) LEAD(“x”, 1, NULL) OVER (ORDER BY “x”)
ntile ntile(100, order_by = NULL) (RANK() OVER (ORDER BY “x”) - 1) * 100.0 / COUNT(*) OVER ()
Window Aggregate functions
R operators R expressions SQL
mean mean(x, na.rm = FALSE) AVG(“x”) OVER ()
sum sum(x, na.rm = FALSE) SUM(“x”) OVER ()
min min(x, na.rm = FALSE) MIN(“x”) OVER ()
max max(x, na.rm = FALSE) MAX(“x”) OVER ()
n n(x) CAST(COUNT(“x”) OVER () AS BIGINT)
var var(x, na.rm = FALSE) VAR_SAMP(“x”) OVER ()
varp varp(x, na.rm = FALSE) VAR_POP(“x”) OVER ()
sd sd(x, na.rm = FALSE) STDDEV_SAMP(“x”) OVER ()
sdp sdp(x, na.rm = FALSE) STDDEV_POP(“x”) OVER ()
cumsum cumsum(x) SUM(“x”) OVER (ROWS UNBOUNDED PRECEDING)
cummin cummin(x) MIN(“x”) OVER (ROWS UNBOUNDED PRECEDING)
cummax cummax(x) MAX(“x”) OVER (ROWS UNBOUNDED PRECEDING)
cummean cummean(x) avg(“x”) OVER (ROWS UNBOUNDED PRECEDING)

Notes

  • See help(window_order) for information on specifying the ORDER BY clause for the OVER() clause. Alternatively, you can specify it using order_by(), or in the order_by parameter of the function (if it has one):
# equivalent ways of specifying the order by clause in the window specification
tbl %>% mutate(order_by(x, first(x)))

tbl %>% mutate(first(x, order_by = x))

tbl %>% window_order(x) %>% mutate(first(x))
  • You can specify the ROWS clause in OVER() using window_frame(). See help(window_frame) for details.
  • Use group_by to set the column expressions for the PARTITION BY clause in the window specification.
  • The n parameter for ntile must be numeric. Results may differ if n is passed as an integer or a double due to Teradata’s implicit casting.
  • It is a dplyr convention to use summarize for aggregate functions only. Using non-aggregate window functions with summarize may not return expected results. Use mutate (or transmute) instead. See the Window functions vignette in the dplyr package.

Bit functions

Bit functions
R operators R expressions SQL
bitwNot bitwNot(a) BITNOT(CAST(“a” AS INTEGER))
bitwAnd bitwAnd(a, b) BITAND(CAST(“a” AS INTEGER), CAST(“b” AS INTEGER))
bitwOr bitwOr(a, b) BITOR(CAST(“a” AS INTEGER), CAST(“b” AS INTEGER))
bitwXor bitwXor(a, b) BITXOR(CAST(“a” AS INTEGER), CAST(“b” AS INTEGER))
bitwShiftL bitwShiftL(a, n) SHIFTLEFT(CAST(“a” AS INTEGER), CAST(“n” AS INTEGER))
bitwShiftR bitwShiftR(a, n) SHIFTRIGHT(CAST(“a” AS INTEGER), CAST(“n” AS INTEGER))
bitnot bitnot(a) bitnot(“a”)
bitand bitand(a, b) bitand(“a”, “b”)
bitor bitor(a, b) bitor(“a”, “b”)
bitxor bitxor(a, b) bitxor(“a”, “b”)
shiftleft shiftleft(a, n) shiftleft(“a”, “n”)
shiftright shiftright(a, n) shiftright(“a”, “n”)

Notes

  • The n parameter is a non-negative integer vector of values up to 31 for the bit functions in base R. However, in the SQL Engine, the scope of the shift operation is bounded by the size of a. Specifying a shift that is outside the range of a results in an SQL error.
  • See the Bit/Byte Manipulation Functions chapter from Teradata® Database SQL Functions, Operators, Expressions, and Predicates manual for more information.