Resolving Lock Contention in Apache Solr: A Performance-Analysis Detective Story
This case study is an instructive example of how performance analysis is a multi-faceted process that often leads in surprising directions. Apache Solr Near Real Time (NRT) Search allows Solr users to search documents indexed just seconds ago. It’s a critical feature in many real-time analytics applications. As Solr indexes more and more documents in near real time, end-user expectations for performance get higher and higher. Recently, however, the Cloudera Search team found that Solr NRT indexing throughput often hit a bottleneck even when plenty of CPU, disk, and network resources were available. Latency was average, in the hundreds of milliseconds. Considering that Solr NRT indexing is mainly a machine-to-machine operation, with no human waiting for indexing to complete, that latency range was actually fairly good. Furthermore, some customers reported other issues under heavy Solr NRT indexing workloads, such as connection resets,...