SQL Translation

An important feature that dbplyr provides is the ability to take expressions in R and map them to corresponding expressions in SQL. The dplyr verbs rely on this mapping to generate the correct query. See the verbs vignette for more information on using the verbs.

See Teradata® Database SQL Functions, Operators, Expressions, and Predicates manual for more information on the SQL functions and operators that Teradata supports.

This vignette is largely a reference for supported mappings. The tables below show the most commonly used mappings. The R expressions column shows default values of functions when mapped to the SQL equivalent. The SQL column shows examples of the translated expressions from the R expressions column.

SQL Translation

R expressions get converted to SQL whenever used with a verb on a remote tbl. The translated SQL column expressions are used in various clauses depending on the verb. Internally, sql_translate_env is being called to dictate how some expression such as a comparison operator or function should be translated into SQL.

Functions not explicitly in the map are translated to sql literally. This allows you to use functions defined on the SQL Engine, such as UDFs.

df <- tbl(con, 'landmass')
translate_sql(my_udf(col1, col2) + 1L)
#> <SQL> my_udf(`col1`, `col2`) + 1
df %>% mutate(udf = my_udf(row_names) + 1L) %>% show_query()
#> <SQL>
#> SELECT "row_names", "islands", my_udf("row_names") + 1 AS "udf"
#> FROM "landmass"

This also implies that mapped functions are case-sensitive when being translated. Identifiers in R are case sensitive. There is a difference between mean and MEAN. The latter is a defined function in base R, while the former is undefined. So mean is mapped as expected but MEAN is not.

translate_sql(mean(x), MEAN(y), con = dbplyr::simulate_teradata())
#> <SQL> AVG(`x`) OVER ()
#> <SQL> MEAN(`y`)

df %>% 
  mutate(x = mean(islands), y = MEAN(islands)) %>%
  show_query()
#> <SQL>
#> SELECT "row_names", "islands", AVG("islands") OVER () AS "x", MEAN("islands") AS "y"
#> FROM "landmass"

However, if a function is not mapped and since SQL is case insensitive, specifying the function with different cases in R are all equivalent:

translate_sql(avg(x), AVG(x), aVg(x), con = dbplyr::simulate_teradata())
#> <SQL> avg(`x`)
#> <SQL> AVG(`x`)
#> <SQL> aVg(`x`)
df %>% 
  group_by(row_names) %>% 
  summarize(x = avg(islands), y = AVG(islands), z = aVg(islands)) %>%
  show_query()
#> <SQL>
#> SELECT "row_names", avg("islands") AS "x", AVG("islands") AS "y", aVg("islands") AS "z"
#> FROM "landmass"
#> GROUP BY "row_names"

To suppress this behaviour, and force errors immediately when dplyr doesn’t know how to translate a function it encounters, set the dplyr.strict_sql option to TRUE. For more information, see the SQL translation vignette in dbplyr.

You can use translate_sql to see how an expression gets translated from R to SQL. Use a Teradata connection object or dbplyr::simulate_teradata() in the con parameter to get tdplyr specific mappings. Read the connection vignette for more details about the Teradata connection.

Math operators and Math functions

Math Operators
R operators	R expressions	SQL
+	x + y	“x” + “y”
-	x - y	“x” - “y”
/	x / y	“x” / “y”
*	x * y	“x” * “y”
%%	x %% y	“x” MOD “y”
%/%	x %/% y	FLOOR((“x”) / (“y”))
^	x ^ y	POWER(“x”, “y”)

Math Functions
R operators	R expressions	SQL
abs	abs(x)	ABS(“x”)
acos	acos(x)	ACOS(“x”)
acosh	acosh(x)	acosh(“x”)
asin	asin(x)	ASIN(“x”)
asinh	asinh(x)	asinh(“x”)
atan	atan(x)	ATAN(“x”)
atan2	atan2(y, x)	ATAN2(“x”, “y”)
atanh	atanh(x)	atanh(“x”)
ceil	ceil(x)	CEILING(“x”)
cos	cos(x)	COS(“x”)
cosh	cosh(x)	COSH(“x”)
cot	cot(x)	1 / TAN(“x”)
coth	coth(x)	(EXP(2 * (“x”)) + 1) / (EXP(2 * (“x”)) - 1)
degrees	degrees(x)	degrees(“x”)
exp	exp(x)	EXP(“x”)
floor	floor(x)	FLOOR(“x”)
ln	ln(x)	LN(“x”)
log	log(x, base = exp(1))	LOG(“x”) / LOG(“y”)
log10	log10(x)	LOG(“x”)
radians	radians(x)	radians(“x”)
round	round(x, digits = 0L)	ROUND(“x”, 0)
sign	sign(x)	SIGN(“x”)
sin	sin(x)	SIN(“x”)
sinh	sinh(x)	SINH(“x”)
sqrt	sqrt(x)	SQRT(“x”)
tan	tan(x)	TAN(“x”)

Logical and Comparison operators

Logical Operators
R operators	R expressions	SQL
!	! x	NOT(“x”)
&	x & y	“x” AND “y”
&&	x && y	“x” AND “y”
\|	x \| y	“x” OR “y”
\|\|	x \|\| y	“x” OR “y”
xor	xor(x, y)	“x” OR “y” AND NOT (“x” AND “y”)
if	`if`(cond, true, false = NULL)	CASE WHEN (“cond”) THEN (“true”) END
if_else	if_else(cond, true, false)	CASE WHEN (“cond”) THEN (“true”) WHEN NOT(“cond”) THEN (“false”) END

Comparison Operators
R operators	R expressions	SQL
<	x < y	“x” < “y”
<=	x <= y	“x” <= “y”
>	x > y	“x” > “y”
>=	x >= y	“x” >= “y”
==	x == y	“x” = “y”
!=	x != y	“x” <> “y”
is.null	is.null(x)	((“x”) IS NULL)
is.na	is.na(x)	((“x”) IS NULL)

Conversion functions

Conversion functions
R operators	R expressions	SQL
as.integer	as.integer(x)	CAST(“x” AS INTEGER)
as.numeric	as.numeric(x, precision = 5L, scale = 0L)	CAST(“x” AS NUMERIC(5, 0))
as.double	as.double(x)	CAST(“x” AS FLOAT)
as.character	as.character(x, n = NULL, charset = NULL)	CAST((“x”) AS VARCHAR(1024))
as	as(x, ‘y’)	CAST(“x” AS y)
as.Date	as.Date(x, format)	CAST(“x” AS DATE FORMAT ‘YYYY-MM-DD’)

Notes

as.character uses the default value of Sys.getenv('TD_VARCHAR_COLUMN_SIZE') for n. The charset parameter takes a string representation of the character set to cast the expressions (e.g ‘LATIN’, ‘UNICODE’).
The format parameter of the function as.Date should be in the same format as that of the character column that is to be casted to Date type. For example, if the column has a character value “19851205” and to cast it to Date type, the input to format parameter should be ‘YYYYMMDD’.
See the Data Type Conversions chapter and Data Type Conversion Functions chapter from Teradata Vantage™ Data Types and Literals for more information.

Character functions

Character Functions
R operators	R expressions	SQL
tolower	tolower(x)	LOWER(“x”)
toupper	toupper(x)	UPPER(“x”)
nchar	nchar(x)	CHARACTER_LENGTH(“x”)
str_length	str_length(x)	LENGTH(“x”)
substr	substr(x, start , stop = NULL)	SUBSTRING(“x” FROM “start”)
substring	substring(x, start, stop = NULL)	SUBSTRING(“x” FROM “start”)
grep	grep(pattern, x, ignore.case = FALSE)	REGEXP_SUBSTR(“x”, “pattern”, 1, 1, ‘c’)
str_extract	str_extract(string, pattern)	REGEXP_SUBSTR(“string”, “pattern”, 1, 1, ‘c’)
sub	sub(pattern, replacement, x, ignore.case = FALSE)	REGEXP_REPLACE(“x”, “pattern”, “replacement”, 1, 1, ‘c’)
gsub	gsub(pattern, replacement, x, ignore.case = FALSE)	REGEXP_REPLACE(“x”, “pattern”, “replacement”, 1, 0, ‘c’)
str_replace	str_replace(string, pattern, replacement)	REGEXP_REPLACE(“string”, “pattern”, “replacement”, 1, 1, ‘c’)
paste	paste(…, sep = ‘ ’)	“c1” \|\| ‘ ’ \|\| “c2” \|\| ‘ ’ \|\| “c3”
paste0	paste0(…)	“c1” \|\| ’’ \|\| “c2” \|\| ’’ \|\| “c3”
str_join	str_join(…, sep = ‘ ’)	“c1” \|\| ‘ ’ \|\| “c2” \|\| ‘ ’ \|\| “c3”
strsplit	strsplit(x, split = ‘ ’, tokennum = 1L)	STRTOK(“x”, ‘ ’, 1)
str_locate	str_locate(string, pattern, return_opt = 0L)	REGEXP_INSTR(“string”, “pattern”, 1, 1, 0, ‘c’)
str_trim	str_trim(string, side = c(“both”, “left”, “right”))	RTRIM(LTRIM(`string`, ‘\n \t\r\v\f’), ‘\n \t\r\v\f’)
str_replace_all	str_replace_all(string, pattern, replacement)	OREPLACE(“string”, “pattern”, “replacement”)
str_detect	str_detect(string, pattern)	INSTR(“string”, “pattern”) > 0

Notes

Any casting done by these functions is done implicitly. To explicitly cast column expressions, see the Conversion Functions.
str_locate maps to REGEXP_INSTR. You can control whether the beginning or ending index is returned by setting return_opt to 0 or 1 respectively. When return_opt is 1, REGEXP_INSTR returns a value similar to str_locate, which is the position before the character following the occurrence of the match.
str_split maps to STRTOK. You can provide which token to retrieve after the split using the tokennum parameter.
grep maps to REGEXP_SUBSTR. This is like using grep(pattern, x, ignore.case, value = TRUE) in R except that the substring matching the pattern is returned instead of the full string. Consider this example (Note that sort removes NA):

head(landmass)
#> [1] "Africa"       "Antarctica"   "Asia"         "Australia"    "Axel Heiberg"
#> [6] "Baffin"

# base R grep
grep('^A', landmass, value = TRUE) %>% sort
#> [1] "Africa"       "Antarctica"   "Asia"         "Australia"    "Axel Heiberg"

# sql translated grep
df %>% transmute(x = grep('^A', row_names)) %>% pull(x) %>% sort
#> [1] "A" "A" "A" "A" "A"

# to specify the full string, specify it in the pattern
df %>% transmute(x = grep('^A.+', row_names)) %>% pull(x) %>% sort
#> [1] "Africa"       "Antarctica"   "Asia"         "Australia"    "Axel Heiberg"

# or use filter (Using arrange to order the output by the column 'row_names')
df %>% select(row_names) %>% filter(!is.na(grep('^A', row_names))) %>% arrange(row_names)
#> # Source:     lazy query [?? x 1]
#> # Database:   Teradata
#> # Ordered by: row_names
#>   row_names   
#>   <chr>       
#> 1 Africa      
#> 2 Antarctica  
#> 3 Asia        
#> 4 Australia   
#> 5 Axel Heiberg

When concatenating using paste, paste0, or str_join there may be extra spaces due to implicit casting of types and character set differences.
The str_replace_all parameter maps to the OREPLACE string operator in SQL. Using this parameter is similar to using the gsub() R function without the ignore.case parameter.
The str_detect parameter maps to the INSTR string operator in SQL. This parameter provides the pattern to be detected in the string.
See the String Operators and Functions chapter from Teradata Vantage™ SQL Functions, Expressions, and Predicates manual for more information.

Aggregate Functions

Aggregate functions
R operators	R expressions	SQL
mean	mean(x, na.rm = FALSE)	AVG(“x”)
max	max(x, na.rm = FALSE)	MAX(“x”)
min	min(x, na.rm = FALSE)	MIN(“x”)
sum	sum(x, na.rm = FALSE)	SUM(“x”)
var	var(x, na.rm = FALSE)	VAR_SAMP(“x”)
varp	varp(x, na.rm = FALSE)	VAR_POP(“x”)
sd	sd(x, na.rm = FALSE)	STDDEV_SAMP(“x”)
sdp	sdp(x, na.rm = FALSE)	STDDEV_POP(“x”)
n	n(x)	CAST(COUNT(“x”) AS BIGINT)
n_distinct	n_distinct(x)	CAST(COUNT(DISTINCT “x”) AS BIGINT)
median	median(x)	MEDIAN(“x”)

Notes

The distinct() function can be used to specify whether the aggregate function should be called with the DISTINCT option (e.g. mutate(distinct_sum = sum(distinct(x))))
See the Aggregate Functions chapter from Teradata® Database SQL Functions, Operators, Expressions, and Predicates manual for more information.

Window Functions

Note the following restrictions for window functions according to the Teradata® Database SQL Functions, Operators, Expressions, and Predicates manual (Ordered Analytical/Window Aggregate Functions):

The DISTINCT clause is not permitted in window aggregate functions.
Ordered analytical functions are not permitted in:
- Subqueries
- WHERE clauses
- SELECT AND CONSUME statements
Ordered analytical functions are permitted in the following database query objects:
- Views
- Macros
- Derived tables
- INSERT … SELECT
The sort order that you specify in the window specification defines the sort order of the rows over which the function is applied; it does not define the ordering of the results. To order the results, use an ORDER BY phrase in the SELECT statement
Teradata does not allow ORDER BY in subqueries. This implies that the arrange verb should be used as the last verb. See the Arrange section in verbs for more information.

Window functions
R operators	R expressions	SQL
row_number	row_number(x)	ROW_NUMBER() OVER (ORDER BY “x”)
rank	rank(x)	RANK() OVER (ORDER BY “x”)
dense_rank	dense_rank(x)	DENSE_RANK() OVER (ORDER BY “x”)
percent_rank	percent_rank(x)	PERCENT_RANK() OVER (ORDER BY “x”)
cume_dist	cume_dist(x)	CUME_DIST() OVER (ORDER BY “x”)
first	first(x, order_by = NULL, ignore_nulls = FALSE)	first_value(“x”) OVER (ORDER BY “x”)
last	last(x, order_by = NULL, ignore_nulls = FALSE)	last_value(“x”) OVER (ORDER BY “x”)
lag	lag(x, n = 1L, default = NA, order_by = NULL, ignore_nulls = FALSE)	LAG(“x”, 1, NULL) OVER (ORDER BY “x”)
lead	lead(x, n = 1L, default = NA, order_by = NULL, ignore_nulls = FALSE)	LEAD(“x”, 1, NULL) OVER (ORDER BY “x”)
ntile	ntile(100, order_by = NULL)	(RANK() OVER (ORDER BY “x”) - 1) * 100.0 / COUNT(*) OVER ()

Window Aggregate functions
R operators	R expressions	SQL
mean	mean(x, na.rm = FALSE)	AVG(“x”) OVER ()
sum	sum(x, na.rm = FALSE)	SUM(“x”) OVER ()
min	min(x, na.rm = FALSE)	MIN(“x”) OVER ()
max	max(x, na.rm = FALSE)	MAX(“x”) OVER ()
n	n(x)	CAST(COUNT(“x”) OVER () AS BIGINT)
var	var(x, na.rm = FALSE)	VAR_SAMP(“x”) OVER ()
varp	varp(x, na.rm = FALSE)	VAR_POP(“x”) OVER ()
sd	sd(x, na.rm = FALSE)	STDDEV_SAMP(“x”) OVER ()
sdp	sdp(x, na.rm = FALSE)	STDDEV_POP(“x”) OVER ()
cumsum	cumsum(x)	SUM(“x”) OVER (ROWS UNBOUNDED PRECEDING)
cummin	cummin(x)	MIN(“x”) OVER (ROWS UNBOUNDED PRECEDING)
cummax	cummax(x)	MAX(“x”) OVER (ROWS UNBOUNDED PRECEDING)
cummean	cummean(x)	avg(“x”) OVER (ROWS UNBOUNDED PRECEDING)

Notes

See help(window_order) for information on specifying the ORDER BY clause for the OVER() clause. Alternatively, you can specify it using order_by(), or in the order_by parameter of the function (if it has one):

# equivalent ways of specifying the order by clause in the window specification
tbl %>% mutate(order_by(x, first(x)))

tbl %>% mutate(first(x, order_by = x))

tbl %>% window_order(x) %>% mutate(first(x))

You can specify the ROWS clause in OVER() using window_frame(). See help(window_frame) for details.
Use group_by to set the column expressions for the PARTITION BY clause in the window specification.
The n parameter for ntile must be numeric. Results may differ if n is passed as an integer or a double due to Teradata’s implicit casting.
It is a dplyr convention to use summarize for aggregate functions only. Using non-aggregate window functions with summarize may not return expected results. Use mutate (or transmute) instead. See the Window functions vignette in the dplyr package.

Bit functions

Bit functions
R operators	R expressions	SQL
bitwNot	bitwNot(a)	BITNOT(CAST(“a” AS INTEGER))
bitwAnd	bitwAnd(a, b)	BITAND(CAST(“a” AS INTEGER), CAST(“b” AS INTEGER))
bitwOr	bitwOr(a, b)	BITOR(CAST(“a” AS INTEGER), CAST(“b” AS INTEGER))
bitwXor	bitwXor(a, b)	BITXOR(CAST(“a” AS INTEGER), CAST(“b” AS INTEGER))
bitwShiftL	bitwShiftL(a, n)	SHIFTLEFT(CAST(“a” AS INTEGER), CAST(“n” AS INTEGER))
bitwShiftR	bitwShiftR(a, n)	SHIFTRIGHT(CAST(“a” AS INTEGER), CAST(“n” AS INTEGER))
bitnot	bitnot(a)	bitnot(“a”)
bitand	bitand(a, b)	bitand(“a”, “b”)
bitor	bitor(a, b)	bitor(“a”, “b”)
bitxor	bitxor(a, b)	bitxor(“a”, “b”)
shiftleft	shiftleft(a, n)	shiftleft(“a”, “n”)
shiftright	shiftright(a, n)	shiftright(“a”, “n”)

Notes

The n parameter is a non-negative integer vector of values up to 31 for the bit functions in base R. However, in the SQL Engine, the scope of the shift operation is bounded by the size of a. Specifying a shift that is outside the range of a results in an SQL error.
See the Bit/Byte Manipulation Functions chapter from Teradata® Database SQL Functions, Operators, Expressions, and Predicates manual for more information.

Teradata Package for R Function Reference | 17.00 - Teradata SQL Translation - Teradata Package for R - Look here for syntax, methods and examples for the functions included in the Teradata Package for R.

Teradata® Package for R Function Reference

SQL Translation

Math operators and Math functions

Logical and Comparison operators

Conversion functions

Notes

Character functions

Notes

Aggregate Functions

Notes

Window Functions

Notes

Bit functions

Notes