Modern analytics solutions moved beyond pageviews a while ago. Funnels that show progress are now mainstream, and retention cohorts are easy to create. Reading this data, however, has become harder than ever: so much context is required to understand what the numbers mean that the data easily loses its meaning.
To base decisions on the data alone, the context of that data needs to be well-defined. Take the following example:
"This feature in our app does not work well; only 50 users clicked on it."
Obviously, context is missing here. The most obvious gap is the timespan: if the reporting interval is very short, the conclusion is void.
Now imagine you could have whatever context you require: how many users have seen the button? How hard was it to get there? How many users that clicked the button actually reached your goal? Is this button used multiple times in a user's journey? How big is it?
With answers to all those questions, it probably makes sense to reconsider the original conclusion. It's also quite possible that all this context only blurs the original answer, which was nice and simple. We understand that as well, and that's why we're building a data pipeline (and, of course, some nice visualisation) that first surfaces all the context you can think of and, more importantly, then hides most of it.
Contextual analytics with less context?
Wait, didn't you just say you need all the context you can get? Correct, but it's impossible to go through all that context with just your brain. You need AI assistance!
Using a pipeline of ETL, cleaning, labelling, aggregation and mapping models, followed by statistical analysis and frequently also heuristic approaches, we process analytics data from almost any source (I never said the ETL was easy), as well as context retrieved from the product's UI and external sources. This leaves us with a vast amount of data, complete with all the context you could ever need.
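To make the pipeline idea concrete, here is a minimal sketch of chained stages in Python. The event schema, stage names, and labelling rule are hypothetical illustrations, not the actual pipeline; the real models run against far larger data in BigQuery.

```python
from functools import reduce

def clean(events):
    # Drop events missing required fields (hypothetical schema).
    return [e for e in events if e.get("user_id") and e.get("event_type")]

def label(events):
    # Attach a coarse label derived from the event type.
    for e in events:
        e["label"] = "interaction" if e["event_type"] == "click" else "exposure"
    return events

def aggregate(events):
    # Count events per (label, feature) pair.
    counts = {}
    for e in events:
        key = (e["label"], e.get("feature", "unknown"))
        counts[key] = counts.get(key, 0) + 1
    return counts

def run_pipeline(events, steps=(clean, label, aggregate)):
    # Feed the output of each stage into the next.
    return reduce(lambda data, step: step(data), steps, events)

events = [
    {"user_id": 1, "event_type": "click", "feature": "export"},
    {"user_id": 2, "event_type": "view", "feature": "export"},
    {"user_id": None, "event_type": "click"},  # dropped by clean()
]
result = run_pipeline(events)
# result == {("interaction", "export"): 1, ("exposure", "export"): 1}
```

Keeping each stage a plain function makes it easy to test stages in isolation and to reorder or swap them as the pipeline evolves.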
The source data set size regularly surpasses the 1TB mark, so while some of the models would be easy to run if you could fit the data in memory, extra effort is required to run them on bigger data sets. We employ a combination of analysis in BigQuery and Python to get our models through this data. The resulting data is much smaller, but not small enough for a (normal) human being to process. This is where something we call "signals" comes in: scoring the context based on predicted relevance, and highlighting what is exceptional on both the positive and the negative side of the spectrum.
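One simple way to think about such signal scoring is a z-score against peer features: a metric is only surfaced when it deviates strongly from comparable features. The metric names, values, and threshold below are made up for illustration; the real scoring uses predicted relevance, not just standard deviations.

```python
import statistics

def signal_scores(context, threshold=2.0):
    """Keep only metrics where the feature's value is exceptional
    relative to its peers, measured as a simple z-score."""
    signals = {}
    for metric, (value, peer_values) in context.items():
        mean = statistics.mean(peer_values)
        stdev = statistics.pstdev(peer_values)
        if stdev == 0:
            continue  # no variation among peers: nothing exceptional
        z = (value - mean) / stdev
        if abs(z) >= threshold:
            signals[metric] = round(z, 2)
    return signals

# Hypothetical context for one feature: (feature's value, peer values).
context = {
    "ctr": (0.45, [0.02, 0.03, 0.05, 0.04, 0.03]),        # way above peers
    "steps_to_reach": (9, [5, 5, 6, 4, 5]),                # harder to reach
    "impressions": (110, [100, 120, 95, 105, 115]),        # unremarkable
}
strong = signal_scores(context)
# "ctr" and "steps_to_reach" surface as strong signals;
# "impressions" is filtered out as non-distinguishing.
```

Filtering at the signal level is what lets the end result stay small enough for a human, even though the full context is computed for every feature.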
Better informed conclusions
By looking at the entire behavioural data set, and all the context it provides, we can revisit our previous example, keep only the context with a strong signal, and conclude:
"This feature has a lot of potential: there are only 50 clicks, but the CTR is higher than that of 90% of the other buttons on the same screen. It's hard to reach (it takes 80% more steps than the average feature), and users that do click have a high (top 10 percentile) chance of becoming successful after just one interaction, something less than 80% of other CTAs achieve."
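The percentile claims in a conclusion like this boil down to ranking a feature's metric against its peers. A minimal sketch, with made-up CTR numbers:

```python
def percentile_rank(value, peers):
    """Fraction of peer values strictly below `value` (0.0 to 1.0)."""
    if not peers:
        return 0.0
    return sum(v < value for v in peers) / len(peers)

# Hypothetical CTRs of the other buttons on the same screen.
button_ctrs = [0.02, 0.03, 0.05, 0.04, 0.03, 0.01, 0.02, 0.04, 0.03, 0.02]

rank = percentile_rank(0.45, button_ctrs)
# rank == 1.0: this button's CTR beats every peer on the screen,
# which is what "higher than 90% of the other buttons" style claims
# are computed from.
```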
The numbers in the revisited example may seem daunting or hard to digest, but smart visualisation makes the decision obvious by removing as much non-distinguishing data as possible.
The data points we did not have at the beginning of this blog add a lot of context and allow for a better conclusion, and that data is already available; it's just neither processed nor surfaced properly in most cases. We're continuously hunting for new context to add, and optimising the pipeline to deal with the added complexity without overloading the end user of the analytics data.
What’s in it for me?
If you think these are interesting problems to work on, and you enjoy working with large amounts of behavioural data and making it predictive, Objectiv is the place for you, and you're in luck: we currently have a job opening for a data scientist!