14 episodes


Disseminate: The Computer Science Research Podcast
Jack Waudby

    • Education

Chats with authors of the latest Computer Science research papers. Hosted by Jack Waudby, the show features researchers discussing the problem(s) they tackled, the solutions they developed, and how their findings can be applied in practice. This podcast is for industry practitioners, researchers, and students, and aims to further narrow the gap between research and practice. Each series focuses on a different Computer Science conference.
Hosted on Acast. See acast.com/privacy for more information.

    Per Fuchs | Sortledton: a Universal, Transactional Graph Data Structure | #13

    Summary (VLDB abstract):
    Despite the wide adoption of graph processing across many different application domains, there is no underlying data structure that can serve a variety of graph workloads (analytics, traversals, and pattern matching) on dynamic graphs with transactional updates. In this episode, Per talks about Sortledton, a universal graph data structure that addresses this open problem by carefully optimizing for the most relevant data access patterns used by graph computation kernels. It can support millions of transactional updates per second while providing competitive performance (within 1.22x on average) on the most common graph workloads, relative to CSR, the best-known baseline for static graphs. With this, we improve the ingestion throughput over state-of-the-art dynamic graph data structures, while supporting a wider range of graph computations under transactional guarantees, with a much simpler design and a significantly smaller memory footprint (2.1x that of CSR).
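
    To make those access patterns concrete: below is a minimal Python sketch, invented for illustration (Sortledton itself is a C++ system with blocked adjacency storage and transactional versioning), of keeping each neighborhood sorted so analytics can scan sequentially and pattern matching can intersect two neighborhoods in one merge pass.

    ```python
    import bisect

    class ToyGraph:
        """Toy dynamic graph that keeps every neighborhood sorted, so
        scans are sequential and intersections are merge-based."""

        def __init__(self):
            self.adj = {}  # vertex -> sorted list of neighbors

        def insert_edge(self, u, v):
            for a, b in ((u, v), (v, u)):
                ns = self.adj.setdefault(a, [])
                i = bisect.bisect_left(ns, b)
                if i == len(ns) or ns[i] != b:  # skip duplicate edges
                    ns.insert(i, b)

        def common_neighbors(self, u, v):
            """Merge-style intersection of two sorted neighborhoods:
            the hot loop of triangle counting and pattern matching."""
            a, b = self.adj.get(u, []), self.adj.get(v, [])
            i = j = 0
            out = []
            while i < len(a) and j < len(b):
                if a[i] == b[j]:
                    out.append(a[i]); i += 1; j += 1
                elif a[i] < b[j]:
                    i += 1
                else:
                    j += 1
            return out
    ```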

    Links: Paper
    Per's LinkedIn
    Graph Framework Evaluation
    Implementation

    • 41 min
    George Theodorakis | Scabbard: Single-Node Fault-Tolerant Stream Processing | #12

    Summary (VLDB abstract): Single-node multi-core stream processing engines (SPEs) can process hundreds of millions of tuples per second. Yet making them fault-tolerant with exactly-once semantics while retaining this performance is an open challenge: due to the limited I/O bandwidth of a single node, it becomes infeasible to persist all stream data and operator state during execution. Instead, single-node SPEs rely on upstream distributed systems, such as Apache Kafka, to recover stream data after failure, necessitating complex cluster-based deployments. This lack of built-in fault-tolerance features has hindered the adoption of single-node SPEs. We describe Scabbard, the first single-node SPE that supports exactly-once fault-tolerance semantics despite limited local I/O bandwidth. Scabbard achieves this by integrating persistence operations with the query workload. Within the operator graph, Scabbard determines when to persist streams based on the selectivity of operators: by persisting streams after operators that discard data, it can substantially reduce the required I/O bandwidth. As part of the operator graph, Scabbard supports parallel persistence operations and uses markers to decide when to discard persisted data. The persisted data volume is further reduced using workload-specific compression: Scabbard monitors stream statistics and dynamically generates computationally efficient compression operators. Our experiments show that Scabbard can execute stream queries that process over 200 million tuples per second while recovering from failures with sub-second latencies.
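
    To picture the selectivity idea: here is a minimal Python sketch with invented operator names and selectivities (Scabbard itself is a C++ system, and the real placement decision also weighs recovery requirements). It walks an operator pipeline and picks the persistence point that minimizes persisted volume.

    ```python
    # Hypothetical pipeline: each operator is (name, selectivity), where
    # selectivity is the fraction of input tuples that survive it.
    pipeline = [
        ("source",     1.00),
        ("filter",     0.05),  # discards 95% of tuples
        ("project",    1.00),
        ("window_agg", 0.10),
    ]

    def best_persist_point(ops):
        """Return the operator index after which persisting the stream
        writes the least data, plus the resulting relative volume."""
        volume, best, best_volume = 1.0, 0, 1.0
        for i, (_, sel) in enumerate(ops):
            volume *= sel  # data volume flowing out of operator i
            if volume < best_volume:
                best, best_volume = i, volume
        return best, best_volume

    idx, vol = best_persist_point(pipeline)
    print(f"persist after {pipeline[idx][0]}: {vol:.2%} of input volume")
    # persist after window_agg: 0.50% of input volume
    ```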

    Questions: Can you start off by explaining what stream processing is and its common use cases?
    How did you end up researching in this area?
    What is Scabbard?
    Can you explain the differences between single-node and distributed SPEs? What are the advantages of single-node SPEs?
    What are the pitfalls that have limited the adoption of single-node SPEs?
    What were your design goals when developing Scabbard?
    What is the key idea underpinning Scabbard?
    In the paper you state there are three main contributions in Scabbard; can you talk us through each one?
    How did you implement Scabbard? Can you give an overview of the architecture?
    What was your approach to evaluating Scabbard? What were the questions you were trying to answer?
    What did you compare Scabbard against? What was the experimental setup?
    What were the key results?
    Are there any situations when Scabbard's performance is sub-optimal? What are the limitations?
    Is Scabbard publicly available?
    As a software developer, how do I interact with Scabbard?
    What are the most interesting and perhaps unexpected lessons that you have learned while working on Scabbard?
    Progress in research is non-linear; from the conception of the idea for Scabbard to publication, were there things you tried that failed?
    What do you have planned for future research with Scabbard?
    Can you tell the listeners about your other research?
    How do you approach idea generation and selecting projects?
    What do you think is the biggest challenge in your research area now?
    What's the one key thing you want listeners to take away from your research?
    Links: Paper
    GitHub
    George's homepage

    • 45 min
    Kevin Gaffney | SQLite: Past, Present, and Future | #11

    Summary: In this episode Kevin Gaffney tells us about SQLite, the most widely deployed database engine in existence. SQLite is found in nearly every smartphone, computer, web browser, television, and automobile. Several factors are likely responsible for its ubiquity, including its in-process design, standalone codebase, extensive test suite, and cross-platform file format. While it supports complex analytical queries, SQLite is primarily designed for fast online transaction processing (OLTP), employing row-oriented execution and a B-tree storage format. However, fueled by the rise of edge computing and data science, there is a growing need for efficient in-process online analytical processing (OLAP). DuckDB, a database engine nicknamed “the SQLite for analytics”, has recently emerged to meet this demand. While DuckDB has shown strong performance on OLAP benchmarks, it is unclear how SQLite compares... Listen to the podcast to find out more about Kevin's work on identifying key bottlenecks in OLAP workloads and the optimizations he has helped develop.
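
    For listeners who have not used an in-process engine before, the snippet below shows the standard pattern via Python's built-in sqlite3 module (the schema and values are just an example): there is no server process, the engine runs inside the application, and a transaction is an ordinary ACID unit of work.

    ```python
    import sqlite3

    # In-process: no server to start, just open a file (or an in-memory DB).
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INT)")
    con.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])

    # A small OLTP-style transaction: commits on success, rolls back on error.
    with con:
        con.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
        con.execute("UPDATE accounts SET balance = balance + 10 WHERE id = 2")

    print(con.execute("SELECT * FROM accounts").fetchall())  # [(1, 90), (2, 60)]
    ```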

    Questions: How did you end up researching databases?
    Can you describe what SQLite is?
    Can you give the listener an overview of SQLite's architecture?
    How does SQLite provide ACID guarantees?
    How has hardware and workload changed across SQLite's life? What challenges do these changes pose for SQLite?
    In your paper you subject SQLite to an extensive performance evaluation; what were the questions you were trying to answer?
    What was the experimental setup? What benchmarks did you use?
    How realistic are these workloads? How closely do they map to user studies?
    What were the key results in your OLTP experiments?
    You mentioned that delete performance was poor in the user study; did you observe why in the OLTP experiment?
    Can you talk us through your OLAP experiment?
    What were the key analytical data processing bottlenecks you found in SQLite?
    What were your optimizations? How did they perform?
    What are the reasons for SQLite using dynamic programming?
    Are your optimizations available in SQLite today?
    What were the findings in your blob I/O experiment?
    Progress in research is non-linear; from the conception of the idea for your paper to publication, were there things you tried that failed?
    What do you have planned for future research? How do you think SQLite will evolve over the coming years?
    Can you tell the listeners about your other research?
    What do you think is the biggest challenge in your research area now?
    What's the one key thing you want listeners to take away from your research?
    Links: SQLite: Past, Present, and Future
    Database Isolation By Scheduling
    Kevin's LinkedIn
    SQLite Homepage

    • 48 min
    Matthias Jasny | P4DB - The Case for In-Network OLTP | #10

    Summary: In this episode Matthias Jasny from TU Darmstadt talks about P4DB, a database that uses a programmable switch to accelerate OLTP workloads. The main idea of P4DB is that it implements a transaction processing engine on top of a P4-programmable switch. The switch can thus act as an accelerator in the network, especially when it is used to store and process hot (contended) tuples on the switch. P4DB provides significant benefits compared to traditional DBMS architectures and can achieve a speedup of up to 8x.
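
    A rough way to picture the hot/cold split is the routing sketch below. This is a plain-Python analogy with invented names and capacities, not P4 code; a real switch pipeline is programmed very differently.

    ```python
    # Toy analogy: pin the most contended tuple keys on the switch and
    # route operations on them there; everything else goes to the servers.

    def pick_hot_set(access_counts, capacity):
        """Choose the `capacity` most-accessed keys to place on the switch."""
        ranked = sorted(access_counts, key=access_counts.get, reverse=True)
        return set(ranked[:capacity])

    def route(key, hot_set):
        return "switch" if key in hot_set else "server"

    counts = {"item_42": 90_000, "item_7": 55_000, "item_999": 12}
    hot = pick_hot_set(counts, capacity=2)
    print(route("item_42", hot), route("item_999", hot))  # switch server
    ```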

    Questions: 0:55: Can you set the scene for your research and describe the motivation behind P4DB? 
    1:42: Can you describe to listeners who may not be familiar with them, what exactly is a programmable switch? 
    3:55: What are the characteristics of OLTP workloads that make them a good fit for programmable switches?
    5:33: Can you elaborate on the key idea of P4DB?
    6:46: How do you go about mapping the execution of transactions to the architecture of a programmable switch?
    10:13: Can you walk us through the lifecycle of a switch transaction?
    11:04: How does P4DB determine the optimal tuple placement on the switch?
    12:16: Is this allocation static or is it dynamic, can the tuple order be changed at runtime?
    12:55: What happens if a transaction needs to access tuples in a different order than that laid out on the switch?
    14:11: Obviously you can't fit all data on the switch, only the hot data. How does P4DB execute transactions that access some hot and some cold data that's not on the switch?
    16:04: How did you evaluate P4DB? What are the results?  
    18:28: What was the magnitude of the speed up in the scenarios in which P4DB showed performance gains?
    19:29: Are there any situations in which P4DB performs non-optimally and what are the workload characteristics of these situations?
    20:36: How many tuples can you get on a switch? 
    21:23: Where do you see your results being useful? Who will find them the most relevant? 
    21:57: Across your time working on P4DB, what are the most interesting, perhaps unexpected, lessons that you learned?
    22:39: That leads me into my next question, what were the things you tried while working on P4DB that failed? Can you give any words of advice to people who might work with programmable switches in the future? 
    23:24: What do you have planned for future research? 
    24:24: Is P4DB publicly available?
    24:53: What attracted you to this research area?
    25:42: What’s the one key thing you want listeners to take away from your research and your work on P4DB?
    Links: Paper
    Presentation
    Website
    Email
    Google Scholar
    P4DB

    • 27 min
    Tobias Ziegler | ScaleStore: A Fast and Cost-Efficient Storage Engine using DRAM, NVMe, and RDMA | #9

    Summary: In this episode Tobias talks about his work on ScaleStore, a distributed storage engine that exploits DRAM caching, NVMe storage, and RDMA networking to achieve high performance, cost-efficiency, and scalability. 
    Using low latency RDMA messages, ScaleStore implements a transparent memory abstraction that provides access to the aggregated DRAM memory and NVMe storage of all nodes. In contrast to existing distributed RDMA designs such as NAM-DB or FaRM, ScaleStore stores cold data on NVMe SSDs (flash), lowering the overall hardware cost significantly. 
    At the heart of ScaleStore is a distributed caching strategy that dynamically decides which data to keep in memory (and which on SSDs) based on the workload. Tobias also talks about how the caching protocol provides strong consistency in the presence of concurrent data modifications.
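
    The single-node essence of that caching decision can be sketched as a tiny two-tier cache that faults pages in from an SSD tier and evicts the coldest page when DRAM fills. This is an invented illustration only; ScaleStore's actual protocol is distributed and adds coherence and consistency machinery on top.

    ```python
    from collections import OrderedDict

    class TinyTwoTierCache:
        """Keep hot pages in DRAM (LRU order), spill cold pages to SSD."""

        def __init__(self, dram_capacity):
            self.capacity = dram_capacity
            self.dram = OrderedDict()  # page_id -> bytes, LRU -> MRU
            self.ssd = {}              # the cold tier

        def access(self, page_id):
            if page_id in self.dram:
                self.dram.move_to_end(page_id)     # now most recently used
            else:
                data = self.ssd.pop(page_id, b"")  # fault in from SSD
                self.dram[page_id] = data
                if len(self.dram) > self.capacity:
                    victim, vdata = self.dram.popitem(last=False)  # coldest
                    self.ssd[victim] = vdata       # evict to SSD
            return self.dram[page_id]
    ```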

    Questions: 0:56: What is ScaleStore? 
    2:43: Can you elaborate on how ScaleStore solves the problems you just mentioned? And talk more about its caching protocol?
    3:59: How does ScaleStore handle these concurrent updates, where two people want to update the same page?
    5:16: Cool, so how does anticipatory chaining work and did you consider any other ways of dealing with concurrent updates to hot pages?
    7:13: So over time pages get cached, the workload may change, and the DRAM buffers fill up. How does ScaleStore handle cache eviction? 
    8:57: As a user, how do I interact with ScaleStore?
    10:19: How did you evaluate ScaleStore? What did you compare it against? What were the key results? 
    12:31: You said that ScaleStore is pretty unique in that there is no other system quite like it, but are there any situations in which it performs poorly or is maybe the wrong choice?
    14:09: Where do you see this research having the biggest impact? Who will find ScaleStore useful, who are the results most relevant for? 
    15:23: What are the most interesting or maybe unexpected lessons that you have learned while building ScaleStore?
    16:55: Progress in research is sort of non-linear, so from the conception of the idea to the end, were there things you tried that failed? What were the dead ends you ran into that others could benefit from knowing about so they don't make the same mistakes?
    18:19: What do you have planned for future research?
    20:01: What attracted you to this research area? What do you think is the biggest challenge in this area now? 
    20:21: If the network is no longer the bottleneck, what is the new bottleneck?
    22:15: The last word now: what’s the one key thing you want listeners to take away from your research?

    Links: SIGMOD Paper
    SIGMOD Presentation
    Website
    Email  
    Twitter
    Google Scholar

    • 23 min
    Chuzhe Tang | Ad Hoc Transactions in Web Applications: The Good, the Bad, and the Ugly | #8

    Summary: Many transactions in web applications are constructed ad hoc in the application code. For example, developers might explicitly use locking primitives or validation procedures to coordinate critical code fragments. In this episode, Chuzhe tells us about these ad hoc transactions: database operations coordinated by application code.
    Until Chuzhe's work, little was known about them. In this episode he chats about the first comprehensive study of ad hoc transactions. By studying 91 ad hoc transactions among 8 popular open-source web applications, he and his co-authors found that (i) every studied application uses ad hoc transactions (up to 16 per application), 71 of which play critical roles; (ii) compared with database transactions, concurrency control of ad hoc transactions is much more flexible; (iii) ad hoc transactions are error-prone: 53 of them have correctness issues, and 33 of those were confirmed by developers; and (iv) ad hoc transactions have the potential to improve performance in contentious workloads by utilizing application semantics such as access patterns.
    During the interview he discusses the implications of ad hoc transactions to the database research community.
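
    To make the pattern concrete, here is an invented example (not code from any of the studied applications; the db handle and queries are hypothetical) of an ad hoc transaction: two dependent database operations coordinated by an application-level lock instead of a database BEGIN/COMMIT.

    ```python
    import threading

    checkout_lock = threading.Lock()  # application-level coordination

    def checkout(db, user_id, item_id):
        # Ad hoc "transaction": the critical section lives in the app, not the DB.
        with checkout_lock:
            stock = db.query("SELECT stock FROM items WHERE id = ?", item_id)
            if stock > 0:
                db.execute("UPDATE items SET stock = stock - 1 WHERE id = ?",
                           item_id)
                db.execute("INSERT INTO orders (user_id, item_id) VALUES (?, ?)",
                           user_id, item_id)
        # Correct only if every code path touching these rows takes the same
        # lock, and only within a single process: exactly the kind of
        # fragility the study's correctness findings point at.
    ```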

    Questions: 0:58: What is concurrency control and why is it important for web applications?
    3:00: How do applications today use concurrency control? Do they use classical database transactions? Or do they use other approaches?
    4:09: How are these ad hoc transactions used in practice? What was the primary focus of this paper?
    5:13: You mentioned you studied various open-source applications to investigate ad hoc transactions; which applications did you look at?
    6:16: So what did you find when studying these different web applications? What do these ad hoc transactions look like in the wild? Can you elaborate on how they differ?
    8:59: When you compared ad hoc transactions with classic transactions, were you comparing potentially incorrect ad hoc transactions against correct ones? If so, do the performance gains just come from accepting that the result might be incorrect at some point?
    10:25: We've spoken about how ad hoc transactions can be incorrect. Can we talk about the root cause of this? What were the common mistakes people were making with ad hoc transactions?
    12:16: What was the performance gain of ad hoc transactions?
    15:47: Are there other studies of transactions in the wild? If so, how do their findings compare to yours?
    18:38: What does all this mean in practice? Why don’t people just use database transactions? What puts people off using them and thinking I’ll just roll my own?
    21:10: Where do you see your findings having the biggest impact?
    24:42: What do you have planned for future research?
    26:46: What was the most interesting or perhaps unexpected lesson you learnt whilst working on ad hoc transactions?
    29:13: What attracted you to database concurrency control research?
    30:53: What is the one key thing the listener should take away from your research?

    Links: Presentation
    Paper
    Chuzhe's Website
    Feral Concurrency Control
    What are we doing with our lives? Nobody cares about our concurrency control research

    • 32 min
