Publications


  • RedisGraph GraphBLAS Enabled Graph Database
    P. Cailliau, T. Davis, V. Gadepally, J. Kepner, R. Lipman, J. Lovitz, K. Ouaknine
    IPDPSW ’19: IEEE International Parallel and Distributed Processing Symposium Workshops
    May 2019, Rio de Janeiro, Brazil. DOI: 10.1109/IPDPSW.2019.00054
    PDF

    @article{cailliau2019redisgraph,
    title={RedisGraph GraphBLAS Enabled Graph Database},
    author={Cailliau, Pieter and Davis, Tim and Gadepally, Vijay and Kepner, Jeremy and Lipman, Roi and Lovitz, Jeffrey and Ouaknine, Keren},
    journal={arXiv preprint arXiv:1905.01294},
    year={2019}
    }

    RedisGraph is a Redis module developed by Redis Labs to add graph database functionality to the Redis database. RedisGraph represents connected data as adjacency matrices. By representing the data as sparse matrices and employing the power of GraphBLAS (a highly optimized library for sparse matrix operations), RedisGraph delivers a fast and efficient way to store, manage and process graphs. Initial benchmarks indicate that RedisGraph is significantly faster than comparable graph databases.
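
    To make the linear-algebra formulation concrete, here is a minimal Python sketch of one breadth-first step expressed as a sparse matrix-vector product, using scipy.sparse as a stand-in for GraphBLAS and a hypothetical 4-node graph (RedisGraph itself calls the SuiteSparse:GraphBLAS C library):

    import numpy as np
    from scipy.sparse import csr_matrix

    # Adjacency matrix of a toy directed graph: A[i, j] = 1 iff edge i -> j.
    src = np.array([0, 0, 1, 2])
    dst = np.array([1, 2, 3, 3])
    A = csr_matrix((np.ones(4), (src, dst)), shape=(4, 4))

    # One BFS step from node 0: multiplying the frontier vector by the
    # adjacency matrix yields exactly the nodes reachable in one hop.
    frontier = np.zeros(4)
    frontier[0] = 1.0
    next_frontier = A.T @ frontier       # = frontier^T A, the one-hop neighbors
    print(np.nonzero(next_frontier)[0])  # -> [1 2]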


  • A Performance Study of AsterixDB
    Keren Ouaknine, Michael Carey
    Big Data ’17: IEEE International Conference on Big Data.
    December 2017, Boston, Massachusetts. pages 2812-2820.
    PDF

    @inproceedings{ouaknine2017performance,
    title={A performance study of AsterixDB},
    author={Ouaknine, Keren and Carey, Michael},
    booktitle={Big Data (Big Data), 2017 IEEE International Conference on},
    pages={2812--2820},
    year={2017},
    organization={IEEE}
    }

    Apache AsterixDB is a relatively new Big Data management platform providing ingestion, storage, management, indexing, querying, and analyses of vast quantities of semi-structured information on scalable computer clusters. This paper compares the execution and performance of an early release of Apache AsterixDB with two popular platforms, Apache Hadoop and HPCC Systems, over the 17 PigMix benchmark query scenarios. We discuss the results and also how they have influenced the AsterixDB effort.
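
    For a flavor of how AsterixDB queries are issued, the sketch below posts a SQL++ statement to a local instance over its HTTP query service; the dataverse and dataset names are hypothetical placeholders, and the default port 19002 is assumed:

    import requests

    # SQL++ statement against the TinySocial tutorial dataverse (a hypothetical
    # placeholder for whatever dataset is actually loaded).
    statement = "SELECT VALUE u FROM TinySocial.GleambookUsers u WHERE u.id < 10;"

    resp = requests.post(
        "http://localhost:19002/query/service",  # HTTP query endpoint
        data={"statement": statement},
    )
    resp.raise_for_status()
    print(resp.json()["results"])  # query results are returned as JSON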


  • Optimization of RocksDB for Redis on Flash
    Keren Ouaknine, Oran Agra, Zvika Guz
    ICCDA ’17: Proceedings of the ACM International Conference on Compute and Data Analysis.
    June 2017, Lakeland, Florida. pages 155-161.
    PDF

    @inproceedings{ouaknine2017optimization,
    title={Optimization of RocksDB for Redis on Flash},
    author={Ouaknine, Keren and Agra, Oran and Guz, Zvika},
    booktitle={Proceedings of the International Conference on Compute and Data Analysis},
    pages={155--161},
    year={2017},
    organization={ACM}
    }

    RocksDB is a popular key-value store, optimized for fast storage. With Solid-State Drives (SSDs) becoming prevalent, RocksDB gained widespread adoption and is now common in production settings. Specifically, various software stacks embed RocksDB as a storage engine to optimize access to block storage. Unfortunately, tuning RocksDB is a complex task, involving many parameters with different degrees of dependency. As we show in this paper, a highly tuned configuration can improve performance by an order of magnitude over the baseline configuration. In this paper, we describe our experience optimizing RocksDB for Redis on Flash (RoF), a commercial implementation of the Redis in-memory key-value store that uses SSDs as a RAM extension to dramatically increase the effective per-node capacity. RoF stores hot values in RAM and uses RocksDB to store and manage cold data on SSD drives. We describe our methodology for tuning RocksDB parameters and present our experiments and findings (including both positive and negative tuning results) on two clouds: EC2 and GCE. Overall, we show how tuning RocksDB improved the database replication time for RoF by more than 11x. We hope that our experience will help others adopt, configure, and tune RocksDB to realize its full performance potential.
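
    As an illustration of the kind of knobs involved, the sketch below opens a RocksDB instance with a handful of the write-path and compaction parameters such tuning explores. The third-party python-rocksdb bindings are assumed for illustration (RoF embeds RocksDB natively in C++), and the values shown are illustrative, not the paper's tuned configuration:

    import rocksdb

    opts = rocksdb.Options()
    opts.create_if_missing = True
    opts.write_buffer_size = 64 * 1024 * 1024   # larger memtable: fewer flushes
    opts.max_write_buffer_number = 4            # more in-flight memtables
    opts.target_file_size_base = 64 * 1024 * 1024
    opts.max_background_compactions = 4         # keep compaction off the write path
    opts.compression = rocksdb.CompressionType.no_compression  # trade space for CPU

    db = rocksdb.DB("example.db", opts)
    db.put(b"hot-key", b"value")
    assert db.get(b"hot-key") == b"value"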


  • The PigMix Benchmark on Pig, MapReduce, and HPCC Systems
    Keren Ouaknine, Michael Carey and Scott Kirkpatrick
    Big Data Congress ’15: International Congress on Big Data IEEE.
    June 2015, Manhattan, New York. pages 643-648.
    PDF

    @inproceedings{ouaknine2015pigmix,
    title={The PigMix Benchmark on Pig, MapReduce, and HPCC Systems},
    author={Ouaknine, Keren and Carey, Michael and Kirkpatrick, Scott},
    booktitle={Big Data (BigData congress), 2015 IEEE International Congress on},
    pages={643--648},
    year={2015},
    organization={IEEE}
    }

    Soon after Google published MapReduce, its paradigm for processing large amounts of data, the open-source world followed with the Hadoop ecosystem. Later on, LexisNexis, the company behind the world's largest database of legal documents, open-sourced its Big Data processing platform, called the High-Performance Computing Cluster (HPCC). This paper makes three contributions. First, we describe our additions and improvements to the PigMix benchmark, the set of queries originally written for Apache Pig, and the porting of PigMix to HPCC. Second, we compare the performance of queries written in Pig, Java MapReduce, and ECL (HPCC's Enterprise Control Language). Last, we draw conclusions and issue recommendations for future system benchmarks and large-scale data-processing platforms.
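
    For a sense of what these scenarios compute, the toy sketch below spells out the group-and-aggregate pattern at the heart of many PigMix queries as an explicit map/shuffle/reduce in plain Python; the records are made up, whereas the real benchmark runs over a generated page-views dataset:

    from collections import defaultdict

    # Hypothetical (user, time_spent) records standing in for page views.
    records = [
        ("alice", 3), ("bob", 1), ("alice", 2), ("carol", 5), ("bob", 4),
    ]

    # Map + shuffle: group the values by key.
    groups = defaultdict(list)
    for user, time_spent in records:
        groups[user].append(time_spent)

    # Reduce: aggregate each group, as Pig's GROUP BY ... / SUM(...) would.
    totals = {user: sum(ts) for user, ts in groups.items()}
    print(totals)  # {'alice': 5, 'bob': 5, 'carol': 5}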


  • Reducing Performance Evaluation Sensitivity and Variability by Input Shaking
    Dan Tsafrir, Keren Ouaknine, Dror G. Feitelson
    MASCOTS ’07: IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems
    October 2007, Istanbul, Turkey. pages 231-237.
    PDF

    @inproceedings{tsafrir2007reducing,
    title={Reducing performance evaluation sensitivity and variability by input shaking},
    author={Tsafrir, Dan and Ouaknine, Keren and Feitelson, Dror G},
    booktitle={Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, 2007. MASCOTS'07. 15th International Symposium on},
    pages={231--237},
    year={2007},
    organization={IEEE}
    }

    The performance evaluation of a computer system typically relies on a single workload, real or synthetic, under the implicit assumption that the results it yields are representative. This paper shows that seemingly insignificant details of the input workload can have a dramatic effect on the results, to the point of reversing the conclusions of a study. To counter this sensitivity, we propose input shaking: generating many slightly perturbed variants of the original workload, repeating the evaluation on each of them, and aggregating the outcomes. Shaking reduces both the sensitivity and the variability of the evaluation, exposes fragile results, and increases confidence in the conclusions drawn. We demonstrate the technique in the context of simulations of job scheduling on parallel supercomputers.
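
    A minimal Python sketch of the idea follows, with a made-up metric standing in for a full scheduling simulation; evaluate() and the jitter model are hypothetical, not the paper's:

    import random
    import statistics

    def shake(workload, magnitude=0.01, rng=random):
        # Perturb each job attribute by up to +/- 1% to create a workload variant.
        return [x * (1 + rng.uniform(-magnitude, magnitude)) for x in workload]

    def evaluate(workload):
        # Stand-in for a full simulation run; returns one performance metric.
        return sum(workload) / len(workload)

    workload = [random.expovariate(1.0) for _ in range(1000)]  # synthetic input
    results = [evaluate(shake(workload)) for _ in range(100)]  # shaken reruns
    print(f"mean={statistics.mean(results):.4f}, "
          f"stdev={statistics.stdev(results):.4f}")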