Are you “V” or “A”? What type of Data Scientist are you?

Are you more of an ESTJ or rather an INFP? If you have ever taken the MBTI personality test, you probably know what I am referring to. If you have not come across the Myers-Briggs test, I strongly suggest you take the free test here; you will learn a lot about yourself (and also better understand the main topic of this blog).

It has been a long time since the first people found jobs as data analysts (in the 1980s). Structured Query Language (or SQL, as we know it now) gained heavily in popularity with the advent of the ANSI standard. For most of the 1990–2010 period, you did not have to think about what sort of analyst you were. The only data around (anyway) was structured data in databases or formatted data files. Processing data for an e-shop was just about the same task as analyzing utility data on water or gas consumption. That also led to the golden era of the data analyst: if you mastered SQL (and some statistics on top of that), you could easily swim between jobs or even industries.

Admittedly, I joined the labour force in this period as well, so when choosing the career path of a data analyst, I did not rack my brain too hard over what kind of analyst I would like to be. After all, being an analyst felt so generalist-ish. Almost like living in Ford’s times and being perplexed by the question of whether you can drive a car, a truck or a bus. Hey, man, what’s the difference?!

But after 2010, things took a slightly different twist. As I regularly attend several expert conferences on analytics, I started to notice a strange feeling building somewhere in the back of my mind. I watched fellow speakers present their analytics use cases, and they felt somewhat different from what my team was mainly working on. I still understood the principles and could replicate the projects if asked to, but I realized that we deemed them marginal and rarely pursued them in our company.

A few years down the road, I was getting certain: analytics is not a single continent where you can move freely from one area to another. It is rather an archipelago of torn-apart islands, and tectonic forces push its pieces further and further apart. You can still see there is a party on the other island, but you would need to get really wet to get there and take part in it. I started to investigate how many pieces the analytics continent has disintegrated into. Then, on one of my business flights, an article by Kai-Fu Lee in the November 2018 Europe edition of Fortune nudged me. Wow, these are exactly the clusters I have been thinking of! The naming was a bit different and some thoughts came in another order, but it strongly converged with what I believed in.

Thus, I decided to wrap my older thoughts around the ideas seeded by Kai-Fu Lee and came up with VIBA, (most likely) the first attempt at a persona model for data scientists, helping you find where you stand on the scale of analytics. Its usefulness is dwarfed by that of MBTI (or similar) profiling, but it gives you a basic frame of where you are … and, most importantly, where you should go. It consists of 4 different tribes of analytics, each living on its own island, moving slightly further apart from the others with every quarter. V-I-B-A stands for:

Visual, Voice & Words analytics. Primarily attempts to detect patterns in images, video, voice and text. Builds its models on unstructured data that needs to be annotated to help the engine learn the patterns. Tries to make sense of perception-based data and to compete with (or even replace) the human senses. Mostly works with different topologies of neural networks, tapping into tools like Keras, TensorFlow and NLP libraries. If you work in this area, you mostly encounter objects, sound samples or text excerpts. Projects usually have a longer lead-in phase for exploring data and extensive training. Requirements for analytical progress are framed by Innovation teams or R&D departments.

Internet and Social media related. Strives to describe user flow or conversion in online apps, in e-commerce or through interactions on social media. The analytics is based mainly on streams of user-generated data from clicks & purchases, often monitoring a rather short time window of data. Relies on Time series analysis, some forms of Machine learning or Graph database analytics. Supports the Online and Marketing departments of larger companies or start-ups in these fields. To survive on this island, you need to be familiar with Google Analytics and/or the APIs of the major social media networks. Crunching and storing of data often happens in an external cloud, and analytical results are needed on a (near) real-time basis. Projects have a short time span; their results often morph into permanent monitoring of the discovered patterns.

Behavioral Analytics. Serves requests coming from Traditional sales channels, CRM departments or Product management teams. Delivers predictive models of the crucial business behaviour of clients/users. Alternatively, clusters the portfolio of users or estimates their lifetime value. Inputs come in the form of numerical or categorical features about clients, often still calculated from underlying relational databases. Features are rather aggregated, human-suggested information about behaviour. Analytical effort on this island requires at least 6 months of history to arrive at stable results. To operate in this area, you still need to master SQL and some (maybe proprietary) analytical packages for regressions, decision forests or classification algorithms. Analysis rarely happens in the cloud; most of the data is stored on premises. Projects span days, weeks at most, leading to regular, ongoing scoring of the user base via the developed models.

Automating & Autonomous. This type of analytics aims to generate rule engines and sophisticated models that drive decisions in a certain process, at the request of Operations or R&D teams. Takes (streamed) low-level sensory data coming from experiments or real-life readings. Tries to make sense of it through multiple layers of its own features. Uses Deep Learning methods to teach the machine to decide within a stated error-rate allowance. Data are either real-time or historical logs of some process or sequence of motions. Uses open-source neural-network-based packages like TensorFlow, Keras or Caffe. Working in this area, you most likely come across human-driven processes being handed over to machines. Analytics happens on or close to the mechanical hardware, or centrally in the cloud for a large span of parallel process iterations.

Do you already know what YOUR PRIMARY letter in analytics is? To support you in this effort, I created a free VIBA test. I suggest you take it right now to confirm (or surprise) yourself.

As mentioned before, the V-I-B-A profiles are distinct, though you might be on the edge of more than one “letter”. What is important to understand is that each VIBA letter has its own reason to exist. Similarly to MBTI, there is no good or bad profile, just one more suited to some jobs and another more suited to others. You might be pleased with where VIBA places you, or less so. As with a personality test, you might even choose to move from one VIBA island to another. Or stick harder to the tribe you belong to now. In both cases it is essential for you to understand what it takes to be a good “V”, “I”, “B” or “A”. For that matter, I am preparing (soon to come) a separate blog post explaining what each letter should do to improve their mastery and what skills one needs to acquire if (s)he wants to hop to another VIBA island.

I am more than happy to receive any feedback or learnings that you might arrive at while working with VIBA. If you want to share VIBA with your friends to compare whether you are part of the same tribe, feel free to pass on this blog post. If you intend to use the methodology in your professional (or academic) work, please pay tribute by properly citing the authorship of this article.

Published on 22 November 2018.