The top Java anomaly detection tools you should know
Application failures can happen for a wide range of reasons, and
there are tools that address each possible source of errors, such as
log management tools, error trackers, performance monitoring solutions
and so on. We’ve researched this quite a bit, covering the different
methods of logging in production, the most common ways to solve Java
application errors, and how application monitoring tools can assist in
detecting errors.
And here comes the BUT…
The data that these tools collect is often made up of lots of noise.
How can we know what’s important and what’s not? That’s where anomaly
detection tools fit in. In the following post we’ll go over some of the
tools that focus on detecting and predicting when anomalies might
happen. Let’s check them out.
1. X-Pack
X-Pack
is an extension to the ELK Stack that offers anomaly detection. It uses
algorithms that help users understand the behavior of their logs,
detecting when they’re not acting as usual. The package relies on logs
as its data source, letting the users understand how specific metrics
might impact the product and how users experience it.
Key features:
Detecting anomalies within Elasticsearch log data and metrics
Identifying security issues by monitoring network activity and user behavior
Identifying log events that usually lead to an anomaly
How it works:
X-Pack uses Elasticsearch log data and models a baseline of its
behavior. By analyzing the logs from the application, servers and
services, X-Pack can detect trends and periodicity of use, and analyze
the data to try to predict when an issue might occur.
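To make the idea of a modeled baseline more concrete, here is a minimal Java sketch. It is not X-Pack's algorithm or API; it simply assumes a history of per-minute error counts, computes their mean and standard deviation, and flags values that drift too far from that baseline. The class name `BaselineDetector`, the sample numbers and the threshold of 3 standard deviations are all our own assumptions.

```java
// Hypothetical helper for illustration only; not part of X-Pack or Elasticsearch.
final class BaselineDetector {
    private final double mean;
    private final double stdDev;
    private final double maxZ; // how many standard deviations we tolerate

    BaselineDetector(double[] history, double maxZ) {
        if (history.length < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double sum = 0;
        for (double v : history) sum += v;
        double m = sum / history.length;
        double squares = 0;
        for (double v : history) squares += (v - m) * (v - m);
        this.mean = m;
        this.stdDev = Math.sqrt(squares / (history.length - 1)); // sample std dev
        this.maxZ = maxZ;
    }

    // Distance from the baseline mean, measured in standard deviations.
    double zScore(double value) {
        if (stdDev == 0) return value == mean ? 0 : Double.POSITIVE_INFINITY;
        return Math.abs(value - mean) / stdDev;
    }

    boolean isAnomaly(double value) {
        return zScore(value) > maxZ;
    }

    public static void main(String[] args) {
        double[] errorsPerMinute = {10, 12, 11, 9, 10, 11, 12, 10}; // made-up data
        BaselineDetector detector = new BaselineDetector(errorsPerMinute, 3.0);
        System.out.println("11 errors/min anomalous? " + detector.isAnomaly(11));
        System.out.println("40 errors/min anomalous? " + detector.isAnomaly(40));
    }
}
```

A fixed mean and standard deviation ignore the trends and periodicity mentioned above, so treat this as the simplest possible starting point rather than a production detector.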
The anomaly detection feature is enabled by default when installing
X-Pack, and it relies on existing ELK cluster privileges and built-in
roles, making it easier to control which users are authorized to view
and manage the jobs, data feeds, and results.
X-Pack anomaly detection timeline
Secret sauce: X-Pack anomaly detection is auto-enabled, aggregates data
directly from Elasticsearch, and is made for those who use ELK and want
an anomaly detection solution as part of the Elastic suite of tools.
Bottom line: The “unfair” advantage X-Pack has is its integration with
the Elastic suite of tools. With that said, if you’re using ELK, you
probably already know that you’re not limited to Elastic’s own tools;
there’s a wide ecosystem to choose from. Also, if you’re not using ELK,
this tool is not the one for you.
2. Loom Systems
Loom Systems
offers an analytics platform for anomaly detection in logs and metrics,
which also covers anomaly detection within operational analytics.
Key features:
Automated log parsing and analysis from different applications
Recommended resolutions – Based on the company’s solution database
Business operation anomaly detection
How it works:
On the technical side, Loom collects log data, parses it to break
log lines down into separate fields, and applies anomaly detection
algorithms according to each field’s data type. Alongside log events,
Loom’s algorithms can handle other textual sources or streams of events,
and create anomaly baselines for them.
The baselines and thresholds set by Loom are dynamic, which means
that they change and adapt according to the user’s behavior and
application updates. Each anomaly is accompanied by an explanation of
what happened, along with recommended resolutions.
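A dynamic baseline can be sketched in a few lines of Java. This is not Loom's implementation, which isn't public; it's a generic exponentially weighted moving average that keeps adjusting its idea of "normal" as new values arrive. The class name `AdaptiveBaseline` and the parameters `alpha`, `tolerance` and `warmup` are hypothetical names we picked for the example.

```java
// Illustrative only: a generic adaptive baseline, not Loom's algorithm.
final class AdaptiveBaseline {
    private final double alpha;     // smoothing factor in (0, 1]; higher adapts faster
    private final double tolerance; // allowed multiple of the average deviation
    private final int warmup;       // samples to observe before flagging anything
    private double mean;
    private double deviation;       // smoothed absolute deviation from the mean
    private int seen;

    AdaptiveBaseline(double alpha, double tolerance, int warmup) {
        if (alpha <= 0 || alpha > 1) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
        this.tolerance = tolerance;
        this.warmup = warmup;
    }

    // Returns true if the value falls outside the current band, then adapts the band.
    boolean observe(double value) {
        if (seen == 0) {
            mean = value;
            seen = 1;
            return false;
        }
        double error = Math.abs(value - mean);
        boolean anomaly = seen >= warmup && error > tolerance * deviation;
        mean += alpha * (value - mean);
        deviation += alpha * (error - deviation);
        seen++;
        return anomaly;
    }

    double mean() {
        return mean;
    }

    public static void main(String[] args) {
        AdaptiveBaseline latency = new AdaptiveBaseline(0.2, 4.0, 5);
        double[] samples = {100, 102, 98, 101, 99, 100, 150, 150, 150}; // made-up data
        for (double s : samples) {
            System.out.println(s + " ms -> anomaly? " + latency.observe(s));
        }
    }
}
```

Because the band moves with the data, a sustained shift (say, after an application update) is flagged at first and then absorbed into the new baseline. Note that a perfectly constant stream leaves the deviation at zero, so any change after warm-up is flagged.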
Loom anomaly detection and insights dashboard
Secret sauce: Along with detecting anomalies, Loom offers its knowledge
base that shares solutions across the company, helping other developers
and teams understand why an anomaly occurred and how it was handled.
Bottom line: Loom uses application logs and metrics to try to
understand how applications normally behave, and offers recommended
resolutions and action items.
3. OverOps
OverOps
tells you when, where and why code breaks in production. It is the only
tool that gives you the complete source code and variable state across
the entire call stack for every error, and lets you proactively detect
when new errors are introduced into the application.
Key features:
Full visibility into code and variable state to automatically reproduce any error
Proactive detection of all new and critical errors by code release
Native Java agent that doesn’t rely on log files
Working with any StatsD compliant tool for custom anomaly detection visualization
No code or configuration changes; installs in 5 minutes as SaaS, Hybrid, or On-Premises
A badass dashboard with a dark theme
How it works:
OverOps
is a native monitoring agent that operates between the JVM and the
processor, extracting information from the application itself. It
doesn’t require any code changes, and it doesn’t rely on the information
that was logged, but instead on the information coming directly from
the application. OverOps helps companies like Fox, Comcast and
TripAdvisor turn manual, reactive processes of sifting through logs
into proactive, automated ones.
OverOps uses REST APIs to offer advanced visualization and anomaly
detection abilities to its users, and when application errors occur
across microservices and deployments, it correlates the variable state
of the application with internal JVM metrics (such as CPU utilization,
GC and others). OverOps integrates with any StatsD compliant tool to
offer custom visualization of anomalies, along with any other view
you’re interested in monitoring. OverOps also offers machine learning
algorithms in Java, Python and Go by bundling Kapacitor and Yahoo EGADS.
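If you haven't worked with StatsD before, a "StatsD compliant tool" is anything that accepts small text lines such as `name:value|c` (a counter) or `name:value|g` (a gauge), usually over UDP. The sketch below shows that wire format in plain Java. It isn't OverOps code, and the metric names, host and port (8125 is the conventional StatsD default) are assumptions for the sake of the example.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Illustrative StatsD emitter; class and metric names are hypothetical.
final class StatsdSketch {
    // Counter line in the StatsD text format: <name>:<delta>|c
    static String counter(String name, long delta) {
        return name + ":" + delta + "|c";
    }

    // Gauge line in the StatsD text format: <name>:<value>|g
    static String gauge(String name, long value) {
        return name + ":" + value + "|g";
    }

    // Fire-and-forget UDP send; StatsD does not acknowledge packets.
    static void send(String host, int port, String line) throws IOException {
        byte[] payload = line.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            InetAddress address = InetAddress.getByName(host);
            socket.send(new DatagramPacket(payload, payload.length, address, port));
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumes a StatsD-compatible daemon is listening on localhost:8125.
        send("localhost", 8125, counter("myapp.errors.new", 1));
        send("localhost", 8125, gauge("myapp.errors.open", 42));
    }
}
```

Any dashboard or anomaly detection tool that reads StatsD metrics can then chart or alert on these values.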
OverOps also integrates with any anomaly detection tool by adding a
link to every error in the logs. Clicking on that link shows you a
detailed view of the real root cause of the issue: the complete source
code and variable state at the moment of error, across the entire call
stack.
Secret sauce: OverOps knows log files suck. That’s why it has zero
reliance on log files, and the data comes directly from the JVM itself.
Since OverOps is the only tool to give you the complete source, state
and stack for each error, it offers a 360-degree view of anomalies and
issues within your application.
Bottom line: Detecting anomalies is important, but it’s not going to
help if you don’t have the real root cause and the variables that led
to it.
Events in the OverOps dashboard include the full stack trace and variable state at the time an exception occurs
Watch a live demo of OverOps.
4. Coralogix
Coralogix
clusters and identifies similarities in log data. The tool focuses on
common flows, detecting the log messages that are connected to them, and
alerting when an action didn’t cause the expected outcome.
Key features:
Loggregation – Bundle and summarize logs that have the same pattern
Flow anomaly – Identification of connected actions, and detection of anomalies within them
Version-based anomalies – Flagging anomalies that only occurred after a new version of the user’s product was deployed
How it works:
Coralogix operates under the assumption that most logs are similar,
and that the only thing differentiating them from one another is the
variables within them. That’s why Coralogix auto-clusters the data to
identify patterns, and connects the dots between the data. If an action
calls for a certain response and doesn’t get it, that’s when an anomaly
is detected.
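The "similar logs, different variables" assumption is easy to demonstrate. The following Java sketch is our own simplification, not Coralogix's clustering: it assumes variables are double-quoted strings, hex values or numbers, masks them with placeholders, and counts how many log lines collapse into each resulting template. The class name `LogTemplates` and the placeholder tokens are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative only: naive template extraction, not Coralogix's algorithm.
final class LogTemplates {
    // Assumption: variables show up as quoted strings, hex values or numbers.
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");
    private static final Pattern HEX = Pattern.compile("\\b0x[0-9a-fA-F]+\\b");
    private static final Pattern NUMBER = Pattern.compile("\\b\\d+(\\.\\d+)?\\b");

    // Replaces the variable parts of a log line with placeholders.
    static String template(String line) {
        String t = QUOTED.matcher(line).replaceAll("<str>");
        t = HEX.matcher(t).replaceAll("<hex>");
        return NUMBER.matcher(t).replaceAll("<num>");
    }

    // Groups lines by template and counts occurrences, keeping first-seen order.
    static Map<String, Integer> cluster(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            counts.merge(template(line), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = java.util.Arrays.asList(
                "Order 1001 shipped in 35 ms",
                "Order 1002 shipped in 48 ms",
                "Payment failed for user \"bob\"");
        cluster(lines).forEach((tpl, n) -> System.out.println(n + " x " + tpl));
    }
}
```

Once lines are reduced to templates, it becomes possible to watch for a template that normally follows another one and suddenly stops appearing, which is the kind of flow anomaly described above.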
Coralogix flow anomaly dashboard
Secret sauce: Coralogix has the ability to aggregate logs into their
original templates and analyze that data to understand anomalies.
Bottom line: Coralogix bundles logs with similar patterns, focusing
on the different fields within each message. By doing so, the company
can detect anomalies within certain actions and flows, and focus on the
bigger anomaly picture and not on single incidents that might occur in
the application.
5. Anodot
Anodot
offers an anomaly detection system with the relevant analytics for the
users. Their focus is on detecting anomalies in databases of any type,
along with identifying anomalies in business related data.
Key features:
Behavioral correlation and grouping of similar logs
Business data anomaly detection – Covering marketing campaigns, clicks and performance indicators
Alerts handling – Reducing noise by grouping similar anomalies into one alert
How it works:
Anodot uses its algorithms to isolate issues and correlate them
across a number of parameters. On the practical side, the company
determines the normal range of the application or the action, and gives
it a score that it is expected to maintain.
When an event changes that score, the system assesses the importance
of the anomaly based on the status of the data, and how long it acted
this way. Anodot always alerts the user of the anomaly, whether it’s
good or bad, so that they can handle it as they see fit.
Anodot anomaly detection and analytics
Secret sauce: Anodot can auto-select the most relevant algorithm
needed for the data pattern, which changes and adapts as the patterns
change.
Bottom line: Anodot focuses on logs, metrics and business indicators,
which can address not only the development team, but other members of
the company as well.
Speaking of anomaly detection…
Numenta
offers an open source project that takes a broader look at the world of
anomaly detection. Its technology can detect anomalies in servers and
applications, along with human behavior, geospatial tracking data (GPS
tracking), and prediction and classification of natural language.
Basically, any dataset that has a baseline or trends.
The most interesting thing about Numenta is the Numenta Anomaly Benchmark
(NAB). It’s a benchmark that allows evaluation of algorithms for
anomaly detection in streaming, real-time applications. It allows you to
test your current algorithms, see benchmarks from the community and get
a deeper understanding as to how to detect anomalies.
The library is open sourced, and comprised of over 50 labeled
real-world and artificial time series data files plus a scoring
mechanism designed for real-time applications. If you’re already using
an anomaly detection algorithm, Numenta can help you evaluate it. Also,
if you’re looking for an open source tool, this might be the answer for
you.
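To give a feel for what evaluating a detector against labeled data involves, here is a deliberately simplified Java sketch. It is not NAB's scoring function, which also accounts for things like how early a detection fires; it only counts how many labeled anomaly windows a detector hit, how many it missed, and how many detections fell outside any window. All type and method names here are hypothetical.

```java
import java.util.List;

// Illustrative only: a much simpler evaluation than NAB's actual scoring.
final class DetectorEvaluation {
    // A labeled anomaly window, inclusive on both ends (timestamps are arbitrary units).
    static final class Window {
        final long start;
        final long end;

        Window(long start, long end) {
            this.start = start;
            this.end = end;
        }

        boolean contains(long timestamp) {
            return timestamp >= start && timestamp <= end;
        }
    }

    static final class Result {
        final int detectedWindows;
        final int missedWindows;
        final int falsePositives;

        Result(int detectedWindows, int missedWindows, int falsePositives) {
            this.detectedWindows = detectedWindows;
            this.missedWindows = missedWindows;
            this.falsePositives = falsePositives;
        }
    }

    static Result evaluate(List<Window> labels, long[] detections) {
        int detected = 0;
        for (Window w : labels) {
            for (long t : detections) {
                if (w.contains(t)) { // one hit is enough to count the window
                    detected++;
                    break;
                }
            }
        }
        int falsePositives = 0;
        for (long t : detections) {
            boolean insideAnyWindow = false;
            for (Window w : labels) {
                if (w.contains(t)) {
                    insideAnyWindow = true;
                    break;
                }
            }
            if (!insideAnyWindow) falsePositives++;
        }
        return new Result(detected, labels.size() - detected, falsePositives);
    }
}
```

Running your own algorithm's detections through a harness like this, with the labeled windows coming from a benchmark dataset, is the basic loop that a benchmark automates and standardizes.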
Final thoughts
Anomaly detection helps gain better insights out of production
applications. Each tool has its own way to identify anomalies. The most
important thing we should remember is that it’s not only about the
dashboard; it’s about the data. That’s why we urge you to explore each
one, and base your final decision on the tool that gives you the
best value according to the problem that you’re trying to solve.