45 min

George Theodorakis | Scabbard: Single-Node Fault-Tolerant Stream Processing | #12 Disseminate: The Computer Science Research Podcast


Summary (VLDB abstract): Single-node multi-core stream processing engines (SPEs) can process hundreds of millions of tuples per second. Yet making them fault-tolerant with exactly-once semantics while retaining this performance is an open challenge: due to the limited I/O bandwidth of a single node, it becomes infeasible to persist all stream data and operator state during execution. Instead, single-node SPEs rely on upstream distributed systems, such as Apache Kafka, to recover stream data after failure, necessitating complex cluster-based deployments. This lack of built-in fault-tolerance features has hindered the adoption of single-node SPEs. We describe Scabbard, the first single-node SPE that supports exactly-once fault-tolerance semantics despite limited local I/O bandwidth. Scabbard achieves this by integrating persistence operations with the query workload. Within the operator graph, Scabbard determines when to persist streams based on the selectivity of operators: by persisting streams after operators that discard data, it can substantially reduce the required I/O bandwidth. As part of the operator graph, Scabbard supports parallel persistence operations and uses markers to decide when to discard persisted data. The persisted data volume is further reduced using workload-specific compression: Scabbard monitors stream statistics and dynamically generates computationally efficient compression operators. Our experiments show that Scabbard can execute stream queries that process over 200 million tuples per second while recovering from failures with sub-second latencies.
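The selectivity idea in the abstract can be illustrated with a small sketch. This is not Scabbard's code or its actual placement algorithm: the `Operator` class, the `best_persist_point` function, and the example pipeline and rates are all hypothetical, and the sketch ignores the cost of re-running upstream operators during recovery. It only shows why writing a stream to disk after data-discarding operators needs less I/O bandwidth than writing the raw input.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    """Hypothetical operator description (not a Scabbard type)."""
    name: str
    selectivity: float  # output data volume / input data volume


def best_persist_point(input_rate, operators):
    """Return (index, rate) for the cheapest place to persist the stream.

    index 0 means "persist the raw input"; index i means "persist the
    output of operators[i - 1]". rate is the data rate that would have
    to be written at that point, in the same unit as input_rate.
    """
    rate = input_rate
    best_index, best_rate = 0, rate
    for i, op in enumerate(operators, start=1):
        rate *= op.selectivity  # data rate leaving this operator
        if rate < best_rate:
            best_index, best_rate = i, rate
    return best_index, best_rate


# Assumed example: a filter keeps 25% of the data, a projection halves
# the tuple size, and a flat-map expands each tuple fourfold.
pipeline = [
    Operator("filter", 0.25),
    Operator("project", 0.5),
    Operator("flat_map", 4.0),
]
index, rate = best_persist_point(1000.0, pipeline)
print(f"persist after operator #{index} at rate {rate}")
```

In this toy pipeline the cheapest point is after the projection, where an eighth of the input volume remains; persisting after the flat-map would cost four times as much.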

Questions:
- Can you start off by explaining what stream processing is and its common use cases?
- How did you end up researching in this area?
- What is Scabbard?
- Can you explain the differences between single-node and distributed SPEs? What are the advantages of single-node SPEs?
- What are the pitfalls that have limited the adoption of single-node SPEs?
- What were your design goals when developing Scabbard?
- What is the key idea underpinning Scabbard?
- In the paper you state there are three main contributions in Scabbard. Can you talk us through each one?
- How did you implement Scabbard? Can you give an overview of the architecture?
- What was your approach to evaluating Scabbard? What were the questions you were trying to answer?
- What did you compare Scabbard against? What was the experimental setup?
- What were the key results?
- Are there any situations when Scabbard’s performance is sub-optimal? What are the limitations?
- Is Scabbard publicly available?
- As a software developer, how do I interact with Scabbard?
- What are the most interesting and perhaps unexpected lessons that you have learned while working on Scabbard?
- Progress in research is non-linear. From the conception of the idea for Scabbard to the publication, were there things you tried that failed?
- What do you have planned for future research with Scabbard?
- Can you tell the listeners about your other research?
- How do you approach idea generation and selecting projects?
- What do you think is the biggest challenge in your research area now?
- What’s the one key thing you want listeners to take away from your research?

Links: Paper, GitHub, George's homepage