We recently talked to Alex Li, the founder of augmented analytics platform Kubit and former CTO of Smule, about the best practices of mobile analytics. Below is the transcript of the podcast, edited for length and clarity.
Alex Li is the founder of Kubit, an analytics platform that turns any data warehouse into an analytics powerhouse by making deep data discovery easy for everyone. Prior to founding Kubit, Alex served as CTO of Smule, the developer of Magic Piano and Sing Karaoke, with 50 million monthly active users. Over the course of his seven-year leadership, Alex solved the complex challenges posed by Smule's exponential user growth and the data explosion that came with it. Before his role at Smule, Alex was the VP of Engineering at Booyah and the principal architect at Jasper Wireless. He was also a founding member of eBay Classifieds. Alex has particular expertise in mobile app and platform development and relishes the opportunity to solve operations and data problems.
Can you give the audience a brief introduction about yourself, specifically how you’ve made your way to the world of startups and founded Kubit?
I've been in the software business for almost 20 years, and entrepreneurship has always been on my mind. I worked for four startups in the Bay Area, some of which were huge successes, while others were big failures. When I finished my seven-year term at Smule, where I grew the engineering team from 10 to 100 people and scaled the business to 50 million monthly active users, I realized that it was time for me to rethink my next step.
I learned a lot while building this music business by using data to make product development and overall business strategy decisions. Specifically, I realized that people were constantly performing repetitive tasks or reinventing the wheel, even in a field as specific as mobile. That made me think I might be able to build a platform that makes it easier for every mobile developer to adopt the industry's best practices and spares them the pain and effort of relearning all the lessons I had already learned. That's why I started the augmented analytics company Kubit two years ago, and I was lucky enough to get financing from venture capitalists.
Can you further explain what augmented analytics is and the benefits of using it compared to traditional analytics?
When we talk about traditional analytics, most people think about reporting tools or BI (business intelligence) tools. Representative vendors include Tableau and Looker, which provide good data visualization. However, not everyone can use them to digest the data or build polished reports. Without training in these tools, you must either file a ticket or send an email to professional analysts, then wait a few hours or days before you get any results. Most likely, the first round of results isn't exactly what you're looking for, so you need to refine your questions and send the request back. This type of traditional analytics works to some extent; however, it cannot keep up with the fast iteration of modern consumer-oriented businesses, especially mobile businesses, which usually release updates or new features once or twice a week.
Augmented analytics, a term coined by Gartner, can help where traditional analytics fails. It helps your business through self-service analytics, automation, and collaboration.
Self-service means that it turns the entire data warehouse into a useful tool for everyone. Instead of asking professional analysts for help, you can write your own query or generate a fancy report without knowing SQL or any technical scripting language. The UI is very easy to understand and uses a natural business language to construct your reports, making data exploration much easier.
Automation means that the system uses AI or machine learning to take repetitive tasks off users' hands, making the whole process more thorough and efficient. For example, Kubit can run hundreds of queries whenever an anomaly is found in your KPIs.
Collaboration is the last piece. Traditionally, whenever people want to discuss insights or findings in the data, they take a screenshot or send a link to a chart through an email thread or a Slack channel. The problem with this kind of collaboration is that no one knows how you got to that point. Others have to guess or manually reproduce what you did, which causes confusion and a lot of wasted effort. That's why in Kubit, we're bringing the entire conversation inside the analytical tool itself, so every piece of information is at your fingertips. You always know where a result comes from and what data or queries are behind it. In layman's terms, we call it Slack for analytics.
You come from a unique background: CTO and VP of Engineering at both large companies and fast-growing startups, such as Smule, Booyah, Jasper Wireless, and eBay. What is your perspective on the evolution of these analytics tools? How should larger companies and startups think about using augmented analytics?
Nowadays, data scientist is a very hot job title, and companies can't hire enough of them for their teams. Traditional analytics falls short today because everyone wants to get insights from the data. Big companies like eBay typically have a very siloed analytics team, which is a group of professional analysts. Outside of that group, nobody has any clue where the data is or how to analyze it. The only way to get insights from the data is to send a request to the data team. This kind of process never scales, because data analysis is very exploratory: when you see one data set, you are going to have another 20 questions after that.
In contrast, augmented analytics, and specifically what Kubit was built for, allows everyone in the company to fully understand what the data means and to run hundreds of queries on their own without involving another human being in the middle. This significantly reduces both the overhead and the problems caused by miscommunication. We make the whole process even more efficient through automation and collaboration, so that people are fully aware of how others think and how they draw their conclusions.
Just one question to follow up on that. What areas should corporations pay attention to so that they can fully leverage the automation to make better decisions?
First of all, humans need to develop a fluid and efficient process before automating it. Once we know the process, we build logic in the backend that tries to simulate the next decision point or the next query that a human would want to run. However, for any kind of AI algorithm to be efficient, it needs to know the context of the data. All these AI engines, including ours, must be adapted to the situation of your business, in particular your data. We need to put the algorithm or engine on top of your data set and monitor how the engine works, in particular how human users consume the output it generates. The complete process creates a feedback loop that helps us tune the engine to adapt to your environment.
Let me give you an example. A lot of companies look at the gender of their users to see whether different users have different preferences. That makes sense in most companies, but if the majority of a company's user base is gender-specific, as with Pinterest, splitting by gender is not necessarily a decisive factor. That's the kind of thing these AI engines need to be smart about.
What are the best practices suggested for companies to improve or maintain their data quality?
In most companies, data quality is the last thing people pay attention to. However, there is a 70% chance that those companies are chasing their tails. They may spend two weeks analyzing the data and looking at it from different angles, only to realize eventually that there are data quality issues. This causes a significant amount of waste in the investigation process. In my opinion, data quality should always be the number one priority for any company dealing with data analytics.
My suggestions for good data quality practice include two parts – maintaining a data dictionary and constantly monitoring data.
A data dictionary is the definition of all the data and events in your system. It's not something you just write down once, hand to everyone, and consider finished. Every single property must be properly documented and shared with the entire company. Mobile businesses release once or twice a week, so hundreds of thousands of lines of code are being changed every day. There is always an update to the data dictionary entries, which must be maintained on a continuous basis to keep up with changes and allow you to correlate issues with changes in the data dictionary.
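A minimal sketch of what a data dictionary entry and a check against it could look like in code. Everything here is an assumption made for illustration: the `EventSpec` structure, the `song_played` event, its properties, and the version string are hypothetical and do not describe Kubit's or Smule's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventSpec:
    """One data dictionary entry: an event and its documented properties."""
    name: str
    description: str
    properties: dict[str, type]  # property name -> expected Python type
    since_version: str           # app release that introduced or last changed the entry


# Hypothetical dictionary; the event and property names are made up.
DATA_DICTIONARY = {
    "song_played": EventSpec(
        name="song_played",
        description="User started playback of a song.",
        properties={"user_id": str, "song_id": str, "duration_sec": int},
        since_version="5.2.0",
    ),
}


def validate_event(name: str, payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the event matches its entry."""
    spec = DATA_DICTIONARY.get(name)
    if spec is None:
        return [f"undocumented event: {name}"]
    problems = []
    # Every documented property must be present with the documented type.
    for prop, expected in spec.properties.items():
        if prop not in payload:
            problems.append(f"missing property: {prop}")
        elif not isinstance(payload[prop], expected):
            problems.append(f"wrong type for {prop}: expected {expected.__name__}")
    # Anything sent but not documented is also a data quality problem.
    for prop in payload:
        if prop not in spec.properties:
            problems.append(f"undocumented property: {prop}")
    return problems
```

Keeping the release version on each entry is one way to correlate a data issue with the change that introduced it; running a check like `validate_event` in tests or in the ingestion pipeline keeps the dictionary and the app from drifting apart.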
The second part is monitoring. You need a sophisticated monitoring system that doesn't just blindly check the numbers but also understands the nature of the business. For example, mobile business is very seasonal: Monday is usually a low point, while Saturday and Sunday are usually very high. With such a system in place, you can proactively call out anomalies before anyone wastes time troubleshooting them.
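One simple way to make a check aware of weekly seasonality is to compare each day only against other days that fall on the same weekday. The sketch below is an illustration under that assumption, not a description of how Kubit's monitoring works; the function name, the z-score threshold of 3, and the sample data are all hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean, stdev


def weekday_anomalies(daily_values: dict[date, float], threshold: float = 3.0) -> list[date]:
    """Flag days whose value deviates strongly from other days with the same weekday."""
    by_weekday = defaultdict(list)
    for day, value in daily_values.items():
        by_weekday[day.weekday()].append((day, value))

    anomalies = []
    for entries in by_weekday.values():
        for day, value in entries:
            # The baseline excludes the day under test so an outlier cannot mask itself.
            others = [v for d, v in entries if d != day]
            if len(others) < 3:
                continue  # too little history for this weekday
            mu, sigma = mean(others), stdev(others)
            if sigma == 0:
                # Perfectly flat baseline: any different value is an anomaly.
                if value != mu:
                    anomalies.append(day)
            elif abs(value - mu) / sigma > threshold:
                anomalies.append(day)
    return sorted(anomalies)


# Made-up history: six weeks starting Monday 2024-01-01, weekdays at 100, weekends at 200.
start = date(2024, 1, 1)
history = {}
for i in range(42):
    day = start + timedelta(days=i)
    history[day] = 200.0 if day.weekday() >= 5 else 100.0

# A Monday at 200 would look normal against the overall range,
# but it is unusual for a Monday.
history[date(2024, 1, 15)] = 200.0
print(weekday_anomalies(history))
```

A naive check against the overall daily range would miss the Monday spike, because 200 is a normal weekend value; grouping by weekday is what lets the check catch it. Real systems would also account for holidays, releases, and longer trends.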
Do you think it makes sense for corporations to build their own data analytics platform? Or should they just buy it?
I've been dealing with this issue a lot over the last 10 years of working in mobile companies, and I've tried both. My experience is very specific to growth-stage startups, but I think it also applies to large-scale companies.
In my experience, it is always better to own your data, which means that you first need to own the definition of the data, and ideally you should also keep it in your own data warehouse instead of sending it out. This is a common pitfall of many modern analytics platforms: they require you to install an SDK in your mobile app, and the SDK then collects all the events and sends them to a third party.
The consequences of losing control of your data are twofold. First, you're never going to be able to see that data again. If you want to build more features later, you're going to have to beg people to send your data back to you. Worse still, you now have two sources of truth. If the two sources don't match, which one would you trust? You don't know what's happening on the third-party side, and there's no way you can get into their data warehouse and troubleshoot things.
In short, I would recommend that companies define a data dictionary and load the data into their own data warehouse. They can then purchase reporting tools, query automation, data collaboration platforms, and so on to make sense of the data. These tools or platforms should not be built in-house because they are a completely different kind of software. Companies that supply these products can iterate much faster than a team hired inside your consumer business to build enterprise software.
That's helpful advice, and it's also very workable. Based on your extensive experience in data analytics, what are the most common pitfalls companies should be aware of?
I would say that one of the major pitfalls is that people think analytics means having engineers take requests, write SQL, and return the results. This never works because the analytics data set differs from your day-to-day operational or transactional data set. For analytics to be efficient and to really give you the insight you want, you need to implement your app in a very different way. The first thing anyone who wants to start doing data analytics has to face is building a parallel data infrastructure or data pipeline that collects the events and data needed for analytics and KPIs. It also requires designing your data warehouse or data model.
The other common pitfall is to think that hiring a data scientist can solve all the problems. This is not going to work either, because what a data scientist or analyst can do is look at the data and make sense of it, but someone has to prepare the data, clean it, and maintain its quality. As such, you've got to work on both sides. You need a data scientist or analyst to define your needs. You also need an engineering team to build the data collection infrastructure. This engineering team will include both the backend and the frontend: the backend engineers process the data, while the frontend engineers use it. This is a very sophisticated engineering process that requires a lot of investment and commitment.
The third pitfall is thinking that analytics is all about visualization and therefore relying only on traditional BI tools. These tools require professionals to prepare reports and write queries. Typically, one person builds the analysis and everyone else consumes it. As a result, nobody else can verify or fully understand the data. This is where Kubit's value proposition makes sense. You need a third-party service that makes your data easy to consume and turns everyone into a professional analyst in their own right.
At present, most people are still working from home, and we even see that some tech firms are allowing their employees to do that indefinitely. In your opinion, how can a remote data team work effectively and collaborate well?
When you work from home, even if you have Zoom, Slack, and email, you can never replace face-to-face and whiteboard sessions. It's hard to really explain to others what's in your head, especially when it comes to data analytics issues. To me, this is an opportunity to emphasize the self-service and collaboration aspects of augmented analytics.
You want people to be able to explore on their own as much as possible, instead of having to jump on a call, spend two hours explaining what they need, and then get back Excel files that are missing something they want. If everyone can use the query builder, the path builder, and the funnel builder to build their own queries on an analytical platform, it can significantly improve work efficiency and make it possible to operate remotely.
On the collaborative side, it's really hard to follow any analytics findings in Slack or email threads. This is because email or Slack is typically mixed with all kinds of messages and different branches of discussion are happening at the same time, making it extremely difficult to map everything together and validate the data presented there. When working remotely, all these issues and barriers are amplified. That's where I think our workspace collaboration is going to help. In Kubit, you never leave the tool - the data and your communication stay together. You can quickly understand and see the mind map or the thought process in another person's head. As such, you can do your own investigation before fully understanding the problem, and provide your insights.
All in all, I think that this whole situation simply makes augmented analytics, in particular self-service and collaboration, much more important factors in any company's data practice.
For people who are interested in augmented analytics, how do they connect with you or Kubit after this podcast?
Our website has a lot of information, including blogs and videos that demo our products. Anyone can also schedule a 15-minute demo directly with us by filling in a simple form with their email address.
We also offer free trials on our website, which is Kubit.ai. You can go there and sign up; we don't need your credit card. After that, we will guide you through the process of loading your data, and you will get the full feature set without any restrictions or limitations. You can use it for 30 days free of charge.
You are also always more than welcome to send me an email at email@example.com. Or you can send it to firstname.lastname@example.org to reach a wider audience. We are always here to help you out.