Stream processing is one of the key data processing modes, related to dataflow programming. It was dominant in the punch-card era, and is becoming prevalent again, in the era of huge data, ubiquitous sensors and distributed computing. Its characteristic is incremental, sequential processing with bounded buffering, which lets one handle possibly unbounded amount of data in limited space. Another characteristic is the ease of specifying it as a Xmas-lights diagram: if some further processing is needed, just plug in another segment.
Although the diagrams are easy to draw, they are difficult to implement with low latency and in low memory. This talk is about the key optimization: stream fusion, which is combining several simple processing steps into one complex step, reducing the amount of intermediary data and communication overhead. Specifically, we will talk about complete fusion: not just reduction but complete elimination. This is hard, especially for diagrams with "fat pipes" (flatmap) and "joins" (zip).
This talk introduces the ongoing work on strymonas, which is a high-performance code generation library (DSL) that converts a diagram-like specification into hand-written-like code -- with assured complete fusion. We describe the main ideas behind the complete fusion of diagrams with joins, and illustrate on the example of the software FM radio.