Photo by Charles Deluvio on Unsplash

In 2008, YouTube launched its first recommendation system, ranking videos based purely on popularity. Thirteen years, 2.29 billion users, and over 80 billion dollars in revenue later, YouTube’s algorithm is one of the largest-scale and most sophisticated industrial recommendation systems in existence.

And I can tell that monolithic system to work for me.

With a single click (and some hours of continuous watching), I can easily flood my entire feed with dog videos, taking over real estate that used to be occupied by videos on entirely different topics. Even before I tell it anything, YouTube magically displays tons of videos that match my other interests, ranging from lo-fi music to inspiring TED talks about procrastination (huh!).

We don’t really need to know how to code to talk to machines.

Rise of the planet of The Algorithm

If we look at the top 10 social media platforms by the number of users, half of them have a recommendation system.

And we are communicating with the newsfeed through those systems every day.

We click on a video, we linger on a Reel, we subscribe to a channel, we save a post for later. All of these non-verbal signals are the equivalent of us saying out loud: “Hey, I want to watch/read/see more things like that.” The only difference: we don’t have to say it.

In other words, our passive consumption has become our active communication with machines. If anything, these machines are the most extreme fanatics of the saying “Actions speak louder than words”. Heck, they don’t even need us to say anything to know what we want. Yet, we still spend hours on end on TikTok and Instagram every day.

YouTube’s recommendation system. Image source: Google

The Algorithm (my way of bundling these recommendation systems) has not only impacted the way we entertain ourselves.

It affected the way we shop.

Ever clicked on a product on your favorite eCommerce platform only to see it popping up like mushrooms in the same app’s feed, your Facebook feed, your Instagram stories, and, weirdly but unsurprisingly, the game you’re playing? Then one day you realize that you have already bought it without ever really planning to.

It affected the way we find jobs.

Try viewing and saving a few job posts on LinkedIn and you’ll find similar jobs landing in your inbox over the following weeks. Or you can simply connect with a recruiter hiring for the position you are aiming for, and LinkedIn will suggest hundreds of other recruiters who are likely to be hiring for the same role. No more asking around for a job referral or scrolling through endless job postings. This is why I rarely connect with recruiters while I’m (happily) employed. The suggestions of recruiters from awesome companies, and the posts (mostly recruitment) that my already-connected recruiters interacted with, are just distracting.

It affected the way we find soul mates.

I remember when Tinder, three years ago, was no more than a simple left-right swiping app. Now, it recommends popular profiles (presumably ones swiped right by many people) to entice you to upgrade so that you can view and swipe on them too. Profiles that match our interests and are close by are also recommended to increase the chance of a match. As Tinder’s algorithm finds more ways to hit us with dopamine (from matches, from knowing we’ve been super-liked by someone…), we tend to turn to our phones rather than hangouts with friends for real dating experiences.

The Algorithm and self-service in data analytics

I wonder how long until we see a more prominent impact of The Algorithm on the way people self-serve in BI tools.

Although self-service is still a buzzword in the data industry and is largely a matter of perception, we cannot deny the similarities between the way we “self-serve” information on social media and the way we “self-serve” reports and dashboards.

(Please note that I’m speaking from the perspective of a non-technical data/insight consumer, not a report creator.)

First, the information needs to be curated in order to be consumed. This is an inherent challenge due to the ever-expanding pool of information plus our dynamic interests.

For a video (out of billions!) to be “rightly” recommended to us, YouTube’s algorithm must first build a profile based on our watching behaviors, then build collections of topics that match that persona. Similarly, data consumers are presented with hundreds of tables pulled from different data sources. Unless somebody (yes, you, analysts) tells us which table to look up, or better yet cherry-picks the fields we need, it would take us data consumers days to find a simple business metric. Even if we manage to go through the tables ourselves and find the metrics we need, more traps lie ahead. As Laurie Voss (Senior Data Analyst from Netlify) has put it nicely, data consumers are likely to use the wrong statistical method to calculate something (e.g. median vs average), and that equals catastrophic self-service.

Second, what we consume on social media and what we consume in analytics can be measured in the same ways. On social media we have watch time, likes, dislikes, and shares, and BI tools have their equivalents. We can even record how many times a dashboard is viewed, or which metrics are used the most when people explore datasets. What if we could use such information to reach the “unknown unknowns”?

Say, when a person repeatedly views a report about the company’s revenue by country, we can (build an algorithm to) suggest similar reports related to the revenue measure, such as revenue by city (drilling down) or revenue by category (drilling through). Our report can even go the extra mile and automatically alert us if year-over-year revenue has dropped consistently over the past few months.
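To make the idea concrete, here is a minimal sketch of both behaviors. Everything in it is an assumption for illustration: the `Report` type, the `suggest_reports` and `consistent_yoy_drop` helpers, the view threshold, and the sample numbers are all hypothetical, and no BI tool is claimed to work this way.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    name: str
    measure: str    # e.g. "revenue"
    dimension: str  # e.g. "country"


def suggest_reports(view_log, catalog, min_views=3):
    """Suggest unviewed reports that share a measure with frequently viewed ones."""
    views = Counter(view_log)
    by_name = {r.name: r for r in catalog}
    # Measures of reports the user keeps coming back to.
    hot_measures = {
        by_name[name].measure
        for name, count in views.items()
        if count >= min_views and name in by_name
    }
    return [
        r.name for r in catalog
        if r.measure in hot_measures and r.name not in views
    ]


def consistent_yoy_drop(this_year, last_year, months=3):
    """True if each of the last `months` months is below the same month last year."""
    pairs = list(zip(this_year, last_year))[-months:]
    return len(pairs) == months and all(cur < prev for cur, prev in pairs)


# Hypothetical catalog and view history.
catalog = [
    Report("Revenue by country", "revenue", "country"),
    Report("Revenue by city", "revenue", "city"),
    Report("Revenue by category", "revenue", "category"),
    Report("Signups by country", "signups", "country"),
]
view_log = ["Revenue by country"] * 4 + ["Signups by country"]

print(suggest_reports(view_log, catalog))
print(consistent_yoy_drop([100, 110, 90, 85, 80], [95, 105, 100, 95, 90]))
```

The viewing history plays the same role as watch history on YouTube: nobody asked for “Revenue by city”, but four visits to a revenue report are treated as the request.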

Although the concepts of drill-down and drill-through have long been put into practice by big players like Holistics, Power BI, or Tableau, they still require a data analyst to predefine certain logic and ways of structuring the dashboards/reports. If The Algorithm can take care of the setup for analysts, it would certainly give them more space to work on challenging exploratory analyses, while giving business users insights they didn’t know to ask for.

In my previous example, I provide my preference for the company’s revenue as an input, and the algorithm gives me what it thinks I will care about. That’s how YouTube’s recommendation system works.

In fact, we’ve been witnessing some early signals of this trend in the data industry.

Metabase has this futuristic feature called “Automatic Exploration” that, as the name suggests, auto-generates reports based on the metric(s) that we’re exploring.

Image source: Metabase


Technically, Metabase combines our metric(s) with different dimensions (e.g. locations, categories) or uses different analytical calculations (e.g. distribution, top N, sum, average…). Although this largely removes the effort needed to find the dimensions to create those reports, and tinker with visualization settings, it’s not “smart” enough. Some reports just do not make any sense. And even if they do, it still requires some knowledge to pinpoint what’s going on in that report.
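The combinatorial part of that idea is easy to sketch. The snippet below is not Metabase’s implementation; it is a hypothetical `auto_explore` helper over made-up rows that simply pairs one metric with every dimension and every aggregation it is given.

```python
from itertools import product
from statistics import mean

# Made-up sample data for illustration only.
rows = [
    {"country": "VN", "category": "Books", "revenue": 120.0},
    {"country": "VN", "category": "Toys", "revenue": 80.0},
    {"country": "SG", "category": "Books", "revenue": 200.0},
]

AGGREGATIONS = {"sum": sum, "average": mean, "max": max}


def auto_explore(rows, metric, dimensions, aggregations=AGGREGATIONS):
    """Build one small report per (dimension, aggregation) pair for a metric."""
    reports = {}
    for dim, (agg_name, agg) in product(dimensions, aggregations.items()):
        # Group the metric values by the chosen dimension.
        groups = {}
        for row in rows:
            groups.setdefault(row[dim], []).append(row[metric])
        title = f"{agg_name} of {metric} by {dim}"
        reports[title] = {key: agg(values) for key, values in groups.items()}
    return reports


reports = auto_explore(rows, "revenue", ["country", "category"])
for title, result in reports.items():
    print(title, result)
```

Two dimensions and three aggregations already yield six reports, which also shows why some auto-generated reports make little sense: the loop has no notion of which combinations answer a real question.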

The advanced version of Metabase’s “Automatic Exploration” is Amazon QuickSight’s machine learning feature called “ML Insights”. This feature goes one step further: it detects anomalies across the entire dataset, then generates user-friendly narratives. I remember talking to one of my customers (I work in the BI industry), and they literally couldn’t stop raving about how magically helpful such a feature was for them. That was the moment I realized how much effort we can remove from a user’s “self-service” experience.
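The detect-then-narrate pattern can be illustrated with something far simpler than what a production feature would use. The sketch below swaps in a plain z-score check as a stand-in; it makes no claim about how QuickSight detects anomalies, and the `find_anomalies` helper, the threshold, and the monthly figures are hypothetical.

```python
from statistics import mean, pstdev


def find_anomalies(series, threshold=2.0):
    """Return one plain-language sentence per point far from the series average."""
    values = [value for _, value in series]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # a flat series has nothing to flag
    findings = []
    for label, value in series:
        z = (value - mu) / sigma
        if abs(z) >= threshold:
            direction = "above" if z > 0 else "below"
            findings.append(
                f"{label}: revenue of {value} is {abs(z):.1f} standard "
                f"deviations {direction} the period average of {mu:.1f}."
            )
    return findings


# Made-up monthly revenue with one obvious spike.
monthly_revenue = [
    ("Jan", 100), ("Feb", 102), ("Mar", 98), ("Apr", 101),
    ("May", 99), ("Jun", 160), ("Jul", 100),
]

narratives = find_anomalies(monthly_revenue)
for sentence in narratives:
    print(sentence)
```

The user never asked “was June unusual?”; the sentence arrives on its own, which is the part customers tend to find magical.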

ML Insights narratives. Image source: AWS QuickSight

Although this effortless version of getting insights out of a BI tool goes against our traditional definition of “self-service”, it still is self-service. Yes, we don’t seem to take any deliberate physical actions, but our need is clearly conveyed and precisely interpreted via passive, subtle actions. And as long as we can reach our goal of extracting the needed data on our own, that’s self-service.

In short: We are not necessarily taking better actions with tech. But tech is getting better at interpreting our actions. The future of AI-powered self-service doesn’t seem far-fetched.

Shall we fire the analysts?

I don’t think so.

Even YouTube still relies on certified experts to quality-check its “borderline” content. This human assessment is then used to train The Algorithm to model those decisions and scale them across the entire video base.

To determine borderline content, evaluators assess factors that include, but aren’t limited to, whether the content is: inaccurate, misleading or deceptive; insensitive or intolerant; and harmful or with the potential to cause harm.

Even Domo, a leading pioneer in using AI and ML to power its BI product, still admits the challenge of letting machine learning systems run on their own:

It is against the law, for example, to make hiring decisions based on race, gender, age, or sexual orientation, but there are proxies for these attributes in big datasets.

As with YouTube, we still need humans to determine the ethical data practices for machines to abide by.

The Algorithm can be leveraged to handle the sheer volume of data, but not (yet) the nature of the business questions. An analyst will know when to use, say, median (how much time does it take most of our customers to complete the onboarding) vs average (what’s the average duration for completing our onboarding process). A business question is inherently complex and thus not formulaic enough for machines to easily handle.
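A small worked example shows why the choice matters. The onboarding durations below are made up: six customers finish in under ten minutes and one takes two hours.

```python
from statistics import mean, median

# Hypothetical onboarding durations in minutes, with one extreme outlier.
durations = [5, 6, 6, 7, 8, 9, 120]

average_duration = mean(durations)   # pulled upward by the 120-minute outlier
typical_duration = median(durations)  # the middle customer's experience

print(f"average: {average_duration} minutes")
print(f"median:  {typical_duration} minutes")
```

Here the average comes out at 23 minutes while the median is 7. Both numbers are correct; they just answer different questions, and picking the right one depends on what the business is actually asking.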

Also, data analysts still play an important role in data governance. Maintaining a single source of truth and training business users to ask the right questions are prerequisites to any sustainable and scalable self-service experience.

For example, your user multiplies Quantity sold by Price and calls it Revenue in a report. The same user, weeks later, multiplies the number of customers by the Price and calls it Revenue, too. Now, the two Revenue fields share the same name, but their values can differ considerably. What if 100 customers placed orders but only 80 had a successful purchase and 20 returned the orders?

Imagine these two Revenue fields are used interchangeably across the reports without anyone actually noticing.
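Plugging in numbers makes the gap visible. The sketch below uses the 100/80/20 split from the example and adds two assumptions of its own: every order contains exactly one unit, and the price is a flat 10.

```python
# Assumptions for illustration: one unit per order, flat price of 10.
price = 10.0
customers_who_ordered = 100
orders_returned = 20

# Units actually sold once returns are excluded.
quantity_sold = customers_who_ordered - orders_returned

# Two fields that both ended up labeled "Revenue".
revenue_from_quantity = quantity_sold * price           # Quantity sold x Price
revenue_from_customers = customers_who_ordered * price  # Customers x Price

gap = revenue_from_customers - revenue_from_quantity
print(revenue_from_quantity, revenue_from_customers, gap)
```

Under these assumptions the second definition overstates revenue by 200, or 25% of the first figure, and nothing in the field name warns the reader.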
