“Queries are a significant portion of our customers’ big data workloads, so we know the performance of these workloads is critical. IT and applications teams can now get visibility into their Hive and Impala queries in one place, compare the runs of their queries and take advantage of the recommendations Query Spotlight provides,” says Ash Munshi, CEO, Pepperdata. “We’re confident Query Spotlight can increase the performance of their Impala queries while helping them decrease overall costs.”
Are your Apache Impala queries running slow and not achieving peak performance? Given Impala’s complexity, troubleshooting can be very difficult. Optimizing query performance is near impossible without the right tools. Good news: Pepperdata Query Spotlight now supports Apache Impala.
Query Spotlight makes it easy for operators and developers to understand the detailed performance characteristics of their Hive queries and workloads, together with the infrastructure-wide issues that affect them. With the addition of Impala support, this important category of query workloads can now be tuned, debugged, and optimized for better performance and reduced costs.
The Apache Impala Advantage
What is Apache Impala in big data? Why is it a popular big data processing platform?
Apache Impala is an open-source MPP (Massively Parallel Processing) SQL query engine built to process large volumes of data. Impala delivers high performance and low latency compared with other popular SQL engines for Hadoop.
The role of Apache Impala in big data processes is to improve performance by eliminating the need to migrate large data sets to designated processing systems or to transform data formats before analysis. Essential Apache Impala features include:
- Storage support for the Hadoop Distributed File System (HDFS)
- Storage support for Apache HBase
- Support for various Hadoop file formats (e.g., text, Avro, LZO, RCFile, SequenceFile, and Parquet)
- Kerberos Authentication
- Apache Sentry’s detailed and role-based authorization
- The same metadata, ODBC driver, and SQL syntax as Apache Hive
Apache Impala’s rapid growth and expansion in just two years stem from the fact that Amazon Web Services and MapR both now support it.
Apache Impala uses standard Hadoop components, including HBase, HDFS, YARN, Sentry, and the Metastore. This allows Impala users to combine familiar SQL support with the flexibility and scalability of Apache Hadoop. With Impala, you can query data stored in HDFS, HBase, and Amazon S3 at low latency using traditional SQL knowledge, even without Java knowledge.
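To make the "traditional SQL knowledge" point concrete, here is a minimal sketch of running an ordinary aggregate query through Python's standard DB-API interface. The `web_logs` table, its columns, and the helper names are hypothetical, and an in-memory SQLite database stands in for a cluster so the example runs anywhere. Against a real Impala deployment, you would presumably swap in a DB-API connection from a client such as impyla (`impala.dbapi.connect`), with a host and port of your own.

```python
import sqlite3

# Plain SQL; nothing here is specific to one engine.
# The web_logs table and its columns are hypothetical.
TOP_STATUS_SQL = """
SELECT status, COUNT(*) AS hits
FROM web_logs
GROUP BY status
ORDER BY hits DESC, status
"""


def top_statuses(conn):
    """Run the aggregate on any DB-API 2.0 connection and return the rows."""
    cur = conn.cursor()
    try:
        cur.execute(TOP_STATUS_SQL)
        return [tuple(row) for row in cur.fetchall()]
    finally:
        cur.close()


def demo_connection():
    """Build an in-memory SQLite stand-in holding a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE web_logs (url TEXT, status INTEGER)")
    conn.executemany(
        "INSERT INTO web_logs VALUES (?, ?)",
        [("/a", 200), ("/b", 200), ("/c", 404)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    print(top_statuses(demo_connection()))
```

Because `top_statuses` depends only on the DB-API cursor methods, the query logic stays the same whichever engine supplies the connection; only the connection setup changes.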
Query Spotlight: Visibility into Apache Impala
Query Spotlight for Apache Impala gives developers and operators the big picture of their platform performance and helps them cut their operating costs. It provides detailed statistics, query plans, a breakdown of every query's duration, and more. Query Spotlight also provides visibility into Impala databases and tables. The recommendation engine offers both system-level and query-level recommendations, including recommendations for joins. The tool also generates more effective tuning configurations for Impala.
In addition to visualizing detailed query information on resource utilization and database views, Query Spotlight enables Impala users to create and receive alerts about Apache Impala queries, remediate issues, and optimize query performance. Query Spotlight enables developers to:
- See Impala-specific SQL query planning and execution information.
- Rapidly ascertain query plan problems.
- Analyze Impala query performance.
- Apply optimal Impala tuning configurations.
- Identify bottlenecks that contribute to slow queries.
- Speed up time to resolution.
Operators can quickly narrow down problematic queries in a multi-user environment and use query performance insights to optimize cluster resources and improve productivity. To summarize, Impala support in Query Spotlight brings the following benefits:
- Visibility into all your Hive and Impala queries in one place, in a consistent format
- Recommendations to improve the performance of your queries
- Comparison of query runs with chargeback reports
More than one third of IT spend goes to troubleshooting, performance, and availability. On top of that, 80% of organizations are exceeding their big data budgets. Inefficient queries are a big part of this, causing missed SLAs and slow databases. Query Spotlight for Apache Impala changes all this for the better.
Interested in how Pepperdata could optimize your big data infrastructure? Sign up for a free trial now!