This post deals with anomaly detection. There are three broad approaches to anomaly detection
Consider the following transactional data set
Note that transactions pass through the states created -> paid -> cancelled, in that order.
The analyst usually needs to answer questions like
The key issue with this kind of data is that Druid does not support row-level updates. In a typical data warehouse, the data would be updated using transaction_id as a primary key. This is, however, not possible in…
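To make the difference concrete, here is a minimal sketch in plain Python (not Druid code). It contrasts a table that is updated in place by transaction_id with a store that only ever appends rows, which is how the data looks when row-level updates are unavailable. The sample events, the `amount` field, and the helper names are hypothetical and chosen only for illustration.

```python
# Hypothetical status events; transaction t1 moves from created to paid.
events = [
    {"transaction_id": "t1", "status": "created", "amount": 100},
    {"transaction_id": "t1", "status": "paid", "amount": 100},
    {"transaction_id": "t2", "status": "created", "amount": 50},
]


def upsert_table(rows):
    """Warehouse-style table: the latest event overwrites the row
    that has the same transaction_id (the primary key)."""
    table = {}
    for row in rows:
        table[row["transaction_id"]] = row
    return table


def append_only(rows):
    """Store without row-level updates: every event stays as its own row."""
    return list(rows)


def count_status(rows, status):
    """Count the rows currently carrying the given status."""
    return sum(1 for row in rows if row["status"] == status)


warehouse = upsert_table(events)
log = append_only(events)

# Upserted table: only t2 is still in the created state.
created_in_warehouse = count_status(warehouse.values(), "created")
# Append-only rows: the stale created row for t1 is still counted.
created_in_log = count_status(log, "created")
```

A naive count over the append-only rows overstates the number of transactions in the created state, because the earlier row for t1 is never overwritten.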
One of the key requirements in machine learning is to monitor the inputs and predictions of the pipeline in real time. This need arises because sudden large disturbances in the inputs and/or outputs lead to disturbances in the downstream applications and activities that consume the predictions.
In this post we will look at a simple ML pipeline posting inputs and predictions to a Kafka topic. Data from this topic will be continuously ingested into a Druid cluster. …
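As a rough sketch of the producing side, the snippet below builds one flat JSON record per prediction and hands it to a `send` callable. Everything named here is an assumption made for illustration: the topic name `ml-monitoring`, the field names, the `input_` prefix, and the `model_version` tag are not taken from the post. The `send` callable stands in for a real Kafka client so that the example runs without a broker.

```python
import json
import time


def build_message(features, prediction, model_version="v1", now=None):
    """Build one flat record holding the model inputs and its prediction.

    Field names are hypothetical. Inputs are prefixed with "input_" so
    they cannot collide with the other fields.
    """
    ts = time.time() if now is None else now
    message = {
        "timestamp": int(ts * 1000),  # epoch milliseconds
        "model_version": model_version,
        "prediction": prediction,
    }
    for name, value in features.items():
        message["input_" + name] = value
    return message


def publish(send, topic, features, prediction, **kwargs):
    """Serialize the record as UTF-8 JSON and pass it to send(topic, payload)."""
    payload = json.dumps(build_message(features, prediction, **kwargs)).encode("utf-8")
    send(topic, payload)
    return payload


# With the kafka-python package and a running broker (both assumed),
# the wiring could look like this:
#   from kafka import KafkaProducer
#   producer = KafkaProducer(bootstrap_servers="localhost:9092")
#   publish(lambda t, v: producer.send(t, value=v),
#           "ml-monitoring", {"x": 1.5}, 0.9)

# Stand-in sender that just collects messages in memory.
sent = []
publish(lambda t, v: sent.append((t, v)), "ml-monitoring", {"x": 1.5}, 0.9, now=1.0)
```

Keeping each record flat, with an explicit timestamp field, is one simple choice that tends to keep the downstream ingestion configuration short; nested records would work too, at the cost of extra flattening configuration.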