Posts

Showing posts from August, 2016

Resolving Lock Contention in Apache Solr: A Performance-Analysis Detective Story

This case study is an instructive example of how performance analysis is a multifaceted process that often leads in surprising directions. Apache Solr Near Real Time (NRT) Search allows Solr users to search documents indexed just seconds ago. It’s a critical feature in many real-time analytics applications. As Solr indexes more and more documents in near real time, end-user expectations for performance keep rising. However, the Cloudera Search team recently found that Solr NRT indexing throughput often hit a bottleneck even when plenty of CPU, disk, and network resources were available. Latency was average, in the hundreds of milliseconds range. Because Solr NRT indexing is mostly a machine-to-machine operation, with no human waiting for indexing to complete, that latency range was actually fairly good. Furthermore, some customers reported other issues under heavy Solr NRT indexing workloads, such as connection resets,...

How-to: Ingest Email into Apache Hadoop in Real Time for Analysis

Source: Cloudera blog

Apache Hadoop is a proven platform for long-term storage and archiving of structured and unstructured data. Related ecosystem tools, such as Apache Flume and Apache Sqoop, allow users to easily ingest structured and semi-structured data without writing custom code. Unstructured data, however, is a more challenging subset that typically lends itself to batch-ingestion methods. Although such methods suit many use cases, with the advent of technologies like Apache Spark, Apache Kafka, and Apache Impala (Incubating), Hadoop is increasingly a real-time platform as well. In particular, compliance-related use cases centered on electronic communication, such as archiving, supervision, and e-discovery, are extremely important in financial services and related industries, where being “out of compliance” can result in hefty fines. For example, financial institutions are under r...

Big Data Trendz