MAP_REDUCE 是由 Google 引入的编程模型,允许轻松开发可扩展的并行应用程序,以便在大型商品机群中处理大数据。MAP_MERGE 运算符是 MAP_REDUCE 运算符的专用运算符。
代码语法MAP_REDUCE(table/table variable name>, ) group by
in the TUDF> as
) |
procedure parameters>)
table/table variable name>.
我们以包含句子及其 ID 的表为例。如果要计算包含特定字符的句数和表中每个字符的出现次数,可以通过以下方式使用 MAP_REDUCE 运算符:
示例代码映射器函数create function mapper(in id int, in sentence varchar(5000))returns table (id int, c varchar, freq int) as begin using sqlscript_string as lib; declare tv table(result varchar); tv = lib:split_to_table(:sentence, ' '); return select :id as id, result as c, count(result) as freq from :tv group by result;end;
示例代码缩减器功能create function reducer(in c varchar, in vals table(id int, freq int))returns table (c varchar, stmt_freq int, total_freq int) as begin return select :c as c, count(distinct(id)) as stmt_freq, sum(freq) as total_freq from :vals;end; 示例代码do begin declare result table(c varchar, stmt_freq int, total_freq int); result = MAP_REDUCE(tab, mapper(tab.id, tab.sentence) group by c as X, reducer(X.c, X)); select * from :result order by c;end; 上述代码的工作方式如下:
table(a int, b varchar, c timestamp, d int) 且键值列为 b 和 c,则值表的布局为 table(a int, d int)。
如果使用只读过程作为减速器,则可以从 MAP_REDUCE 运算符获取多个表输出。要绑定 MAP_REDUCE 运算符的输出,只需将表变量应用为减速器规范的参数。例如,如果要将上述示例中的减速器更改为只读过程,请应用以下代码。
create procedure reducer_procedure(in c varchar, in values table(id int, freq int), out otab table (c varchar, stmt_freq int, total_freq int))
reads sql data as begin
otab = select :c as c, count(distinct(id)) as stmt_freq, sum(freq) as total_freq from :values;
end;
do begin
declare result table(c varchar, stmt_freq int, total_freq int);
MAP_REDUCE(tab, mapper(tab.id, tab.sentence) group by c as X,
reducer_procedure(X.c, X, result));
select * from :result order by c;
end;
可以传递额外的参数作为映射器或减速器的参数。 示例代码create function mapper(in id int, in sentence varchar(5000), in some_extra_arg1 int, in some_extra_arg2 table(...), ...)returns table (id int, c varchar, freq int) as begin ...end;create function reducer(in c varchar, in values table(id int, freq int), in some_extra_arg1 int, in some_extra_arg2 table(...), ...)returns table (c varchar, stmt_freq int, total_freq int) as begin ...end;do begin declare result table(c varchar, stmt_freq int, total_freq int); declare extra_arg1, extra_arg2 int; declare extra_arg3, extra_arg4 table(...); ... more extra args ... result = MAP_REDUCE(tab, mapper(tab.id, tab.sentence, :extra_arg1, :extra_arg3, ...) group by c as X, reducer(X.c, X, :extra_arg2, :extra_arg4, 1+1, ...)); select * from :result order by c;end; 备注输入表参数、输入列参数、额外参数等的顺序没有限制。还可以在映射器/缩减程序 TUDF 或过程中使用缺省参数值。
以下限制适用: