I would like to create a custom transformation function that processes each table in a stream independently, but I can’t figure out how to do it. It seems that the only options for custom functions are either to use reduce() to collapse each table in the stream into a single record, or to transform the entire stream.
I must be missing something, because built-in functions like limit() obviously do exactly what I want a custom function to do. I just can’t work out the syntax from the docs, or I’m completely overlooking it.
I have a stream of tables with a group key. I want to transform each table into one or more records while keeping the same group key. Is there a way to do this?
I guess what I’m looking for is a table.map() function that takes each table in the stream as input and returns a new table to replace it, with the same group key.
Let’s say I have an input stream like this:
mydata =

Table 1 (group key: Name=A)
Name  Value
A     1
A     2

Table 2 (group key: Name=B)
Name  Value
B     1
B     3
B     4
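For anyone who wants to reproduce this, here is roughly how such a stream could be built in Flux (a minimal sketch; array.from() and group() are standard functions, but the column names are just from my example):

import "array"

// Build the sample stream. group() puts Name in the group key,
// so the Name=A and Name=B rows land in separate tables.
mydata = array.from(rows: [
    {Name: "A", Value: 1},
    {Name: "A", Value: 2},
    {Name: "B", Value: 1},
    {Name: "B", Value: 3},
    {Name: "B", Value: 4}
])
    |> group(columns: ["Name"])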
I want to be able to write a custom function that maps each table to a new table based on some logic (similar to reduce(), but not restricted to producing a single record):
myfunc = (table) =>
    // some magic sauce to create a new table based on the existing table's data
    // and possibly merging it with other streams
In other words, table A would be transformed to gain a row, table B would lose a row, and both tables would be extended with new columns; a mocked-up result might look like this:
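(The Flag column and the changed rows are invented, purely to show the shape I’m after: A gains a row, B loses one, both gain a column.)

Table 1 (group key: Name=A)
Name  Value  Flag
A     1      false
A     2      false
A     9      true

Table 2 (group key: Name=B)
Name  Value  Flag
B     1      false
B     3      false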
My point about limit() in the initial question is that I assume this is roughly how limit() must work internally: it needs to know the table boundaries (i.e. the group key) so that it knows what to limit, and it can then transform each table in the stream independently.
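As far as I can tell from the docs, limit() already acts per table, and I can wrap it in a custom pipe-forward function, as in this minimal sketch (keepTwo is my own name); what I can’t express is replacing the limit() call with arbitrary per-table logic:

// A custom function can forward the stream to a built-in
// per-table transformation via the pipe-forward parameter.
keepTwo = (tables=<-) => tables |> limit(n: 2)

mydata
    |> keepTwo()
// limit(n: 2) applies to each table separately:
// table A keeps both rows, table B drops its third row.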
Sorry if I’m not making much sense. My original question was that I see documentation for custom per-table aggregation functions (using reduce()), while all other custom functions appear to operate on the entire stream. I was wondering whether there is a way to do what I want and I was just missing it in the docs.
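For reference, this is the documented per-table pattern I mean; a rough sketch against the sample data above (the Value column name comes from my example), which also shows why reduce() doesn’t fit, since each table collapses to exactly one record:

// Custom aggregation with reduce(): fn runs per record, but the
// accumulator is reset for each table, and the table's group key
// (Name) is preserved in the single output record.
sumPerName = (tables=<-) => tables
    |> reduce(
        fn: (r, accumulator) => ({sum: accumulator.sum + r.Value}),
        identity: {sum: 0}
    )

mydata
    |> sumPerName()
// One record per table: Name=A -> sum=3, Name=B -> sum=8.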