Applying Efficient MapReduce Testing to Create Robust Big Data Systems
In the era of big data, MapReduce, a programming model for large-scale data processing, has become widespread across industries such as healthcare, banking, and retail. Applications of MapReduce in these industries include machine-learning data analysis, data synchronization, and asynchronous task execution. Because MapReduce jobs process enormous volumes of data, errors in their execution can corrupt an organization's entire data system. It is therefore crucial for computer scientists and IT professionals to adopt effective testing techniques that catch these errors before they reach production. In this talk, we will highlight three major categories of bugs that plague MapReduce jobs. To avoid these bugs and produce robust data systems, we will discuss four testing strategies: counter-based testing, QA environment testing, hermetic environment testing, and A/B testing. The presentation will conclude with an in-depth analysis of A/B testing and its strength in detecting all three types of MapReduce bugs.