FILIP’s 2023 READING SUGGESTIONS

As you probably know, I am a heavy, heavy reader. So much so that for the last 14 years in a row my New Year’s resolution has been to read at least 10,000 pages of books in the given calendar year. So far, I have failed only once in those 14 years. So please keep your fingers crossed for my 2023 reading endeavors.

 

To really read 10K pages in a year, one needs several things, but a good reading list is indispensable. With the average book running 300-400 pages, one needs (you can do the math) about 30 different books per year. In most years I have published my book suggestions afterwards; you can look back here or there. This year I have turned the tables and am sharing my reading queue in advance. So feel free to get inspired:

 

Alex J. Gutman, Jordan Goldmeier: BECOMING A DATA HEAD

https://www.amazon.de/-/en/Alex-J-Gutman/dp/1119741742

The decision to put this book on my reading list stemmed from curiosity. The reviews suggest it is a good entry-level book for executives trying to become data-driven or AI-ready. As I am an SVP Data & Analytics (and a seasoned Data Scientist) myself, it is hardly a fit for my career phase. But I have seen so many books claim (and fail) to introduce you to the Data Science thicket that I was tempted to see how this one would do. Yet another promise falling flat?

No, quite the contrary! This book really walks its talk. Namely, it walks you smoothly through the different stages of data analytics and Data Science. Even the basic concepts are explained in a no-nonsense style that requires no previous knowledge, yet neither insults your intelligence nor bores you if you are reading things already obvious to you. You can also decide how “far into the woods” you want to go and stop reading whenever you feel you have reached the level of understanding that is enough for you. Or maybe you will look even deeper to understand the principles behind what you have just read.

I strongly recommend this book to anybody trying to change career into data jobs. I also find it a great present for any manager or executive you want to enlighten about data.

 

Daniel Vaughan: ANALYTICAL SKILLS for AI & DATA SCIENCE

Topic | Analytics, Data-driven, Decision making

Let me start the review by saying this book is really special for at least 2 reasons: 1] Throughout 20 years of my analytical endeavors I have seen people enter the DS arena either from business acumen, building up their tech skills, OR from a solid technical university major, trying to close the business understanding gap. I have read many books that try to serve the technical ingredients of DS to a business audience, but I have long been looking for a book that upskills the already technically savvy Data Scientist (or Analyst) with an extra toolbox on how to be really useful (in business) with their supreme skills. Daniel Vaughan manages to deliver one. Therefore, this book is a really good read primarily for those who are friends with algorithms but frustrated about not making an impact with DS.

2] I stumbled on this book during one of my regular searches for titles still missing from my library. I read the reviews and got intrigued (for the reasons stated in paragraph 1] above). But it was a bet. A good one. How did I realize that it is a natural extension of my reading line? Do you know that feeling when you come to a party, get introduced to a new person, and after talking the whole evening (because you can’t stop talking to that person) feel they could have been (or will be) your best friend because you have so much in common? Well, that is the feeling I got when reading through Analytical Skills for AI & DS. Most of the arguments had me nodding (in agreement), and whenever the author suggests further reading, I found 8 out of 10 of those suggestions already on my private bookshelf.

Long story short: this is a great book for teaching/cementing skills in how to apply analytical skills (or even heavy-duty Data Science, if you dare) to business problems. If you are a technical person, this might erase “understand the business” from your up-skilling to-do list. If you are already business savvy, it gives you a clear idea of what to ask from your more technical colleagues. Here and there the author drops some heavier math notation or a snippet of Python code, but these are more of an illustration, and skipping them does not take your learning even an inch lower. It is indeed amazing how smoothly Daniel Vaughan walks you through important concepts and principles, no matter what your starting literacy in Business, Econometrics, Data Analytics or DS algorithms is. Certainly worth the bucks.

Link | https://www.amazon.de/-/en/Daniel-Vaughan/dp/1492060941

 

Peter Pru: ECOMMERCE EMPIRE

Topic | E-commerce, Marketing strategies

This book got onto my reading list via an e-commerce community suggestion as one of the must-reads.
After reading it in full, I honestly remain unimpressed. To be fair, the book really tries to encourage people to start/boost their e-commerce venture and does so in an easily explained and structured manner. It also has some interesting twists on #upsell strategies that are not the basic, first-thought type. It also has a good chapter on competition research and initial product #assortment focus.

But the overly “American” style of pushing for things, overselling the author’s own story as general truth and setting #revenue (not CMx or profit) milestones as measures of success leave a bitter feeling. Also, the last chapter’s sudden pivot towards self-care and mental balance (as a lever for doing e-commerce) is a mismatch with the rest of the book’s content.

All in all, this makes it a controversial choice. I was probably not the target group of this book (and it will not become my book-present liebling either). But if you still want to start a small e-shop, it´s probably worth a few hours of reading.

Link | https://www.amazon.de/-/en/Peter-Pru/dp/1736230905

 

 

Justin Grimmer, Margaret E. Roberts, Brandon M. Stewart: TEXT AS DATA

Topic | Data, NLP, Analytics

As the title of the book rightly suggests, text has long been perceived as a special “animal”, living on the edge of data analytics and much more obscure than analyzing relational data with SQL or with predictive analytics. Text analytics was also handled by dedicated (Python) packages and often by staff specializing only in NLP. If you were not one of them, you would probably just reach for the (simplified) predefined functions in NLTK (or a similar code library).
Those times are over. Text is mainstream. If you were not convinced before the ChatGPT burst, there is now no way to deny it. But text analytics still finds much of its audience (and many practitioners) stuck in the pre-text era, with only a rough idea of how to address data stored in troves of text.

Therefore, this book comes as a kind of gift. If you admit to being one of those with only a general (read: limited) understanding of insight extraction from text and of how to set up text analytics in your team, if you have not been treating text as seriously as ML or Reinforcement Learning, this book helps you close that gap. It’s well written and always illustrated with telling examples. If you missed buying a ticket for the departing text analytics “train”, this is your fast track to get on it.

Link | https://www.amazon.de/-/en/Justin-Grimmer/dp/0691207550

 

Edosa Odaro: MAKING DATA WORK 

Topic | Transformation, Data Engineering

Some books are an easy read; you literally swing through them. Others take more sweat to finish. Edosa Odaro has created a compelling data transformation backbone in his Making Data Work. He really tries to explain how to approach moving from legacy data stacks to a more contemporary mode of data operations and analytics. But …

The language chosen is very eloquent, but hard to read. It is as if the book were meant to impress the reader with the author’s command of language rather than to make you finish it and take away the lessons learned. Each chapter is narrated like a story and should be fun to read. In theory. However, Edosa also starts the discussion from the obscure Lehman Brothers collapse, which leaves you puzzled for the first 2-3 chapters as to whether the title of the book has not (actually) been mistakenly swapped with another book’s. The argument builds up slowly in the first parts of the book, and you might feel several times like dropping it without making it to the last page.

And there is another aspect to overcome. The author spent significant time in Finance and his views (on data innovation) match that environment well. However, they also indirectly reveal how much financial institutions missed the (data) train in the last decade. Many “issues” debated thoroughly over several pages have often been smoothly resolved by other industries, and thus the drama in the stories might seem a bit of a “storm in a teacup”. I kept thinking how hard (and awkward) the story might be for people without the same financial background (which I fortunately did have), who would often be confused about why this or that is such a big deal.

But Making Data Work has some true thought gems (like a well-reasoned proof that silo thinking is actually desirable in Data & Analytics). They are just well hidden in the haystack of the author’s memories and his quest for conspicuous linguistic figure-skating. Finishing this book is not an easy task. As much as I think it was worth the time, I also have to admit this is not a book for everyone. So read at your own risk and be ready for not seeing the light at the end of the tunnel during the first 30-40 pages.

Link | https://www.amazon.com/Making-Data-Work-Edosa-Odaro/dp/1032224436

 

Steve Krug: DON’T MAKE ME THINK (Revisited)

 

Topic |  Design, Product analytics, UX

Websites and apps have become our window to everyday activities, social interaction, shopping and most of our work (certainly so during COVID). In the 1990s and 2000s, institutions and businesses were trying to impress us with physical real estate. But how do digital institutions treat us now?

This book is for everyone who wants to grasp the basics (yes, it starts from the ground up) of how to design a digital interface for web or app. Even though this might sound like a UX designer’s guideline (which I would happily use if it were one), it is served in down-to-earth language and does not require any design domain knowledge from you (but it leaves you with some after you read through it).

It is not a long read and I strongly encourage anybody interacting with websites and apps (or having a say in their design) to at least skim through it. A no-regret move!

Link | https://www.amazon.de/-/en/Steve-Krug/dp/0321965515

 

Kate Strachnyi: ColorWise

Topic | Visualization & Business Intelligence

 

As somebody shaping (literally) thousands of visualizations year after year, I welcome books describing the rules and the good (and bad) practices for creating visualizations. I have a few in my library (and suggested them in my previous reading lists), but they mostly talk about what kind of graph to choose and how to shape the composition. Many of them take the use of color for granted (or touch on the issue only in passing).

ColorWise is a book that puts “color choice” and “color coding” in graphs and visualizations fully in the spotlight. It explains the background of colors in a very non-academic way and surely takes you beyond your previous knowledge of color usage. It also gives clear guidance on how to create your graph color schemes if you are anchored to some (must-have) brand colors. What is more, it goes deeper into the psychology of different color schemes and warns you about the cultural and color-deficiency pitfalls of your graphs. If you are already a pro, you will often nod your head with “Exactly!” on your lips … and you will still learn a few new aspects to think about. If you are a “regular” color user, your color coding skills will get a significant boost. I strongly recommend it to anybody who needs to produce dashboards or presentations regularly in their work.

Link | https://www.amazon.com/ColorWise-Storytellers-Guide-Intentional-Color/dp/1492097845

 

Tony Fadell: BUILD

Topic | Career (in Tech) , Product building, Management

The shortest version of a review for this book would be “Masterpiece!”. It is really transformational, like Gary Hamel’s or Jim Collins’ books. But if you need a few more reasons to buy/read it, here is why:

Many admire Tony Fadell for what he achieved. He built the iPod for Apple and basically saved Apple from falling. And then he humbly built the iPhone on top. And if that were not enough for you, he then built (from scratch) the brand new company Nest, which started the whole SmartHome category of technology, and sold it to Google for a few billion. So certainly an inspiring enough person. But if you are not a tech geek, you have probably not heard his name before or cared too much. Neither had I. And I regret it.

His book BUILD is an interesting mixture of advice and guidance for people who want to have their life (and career) a bit more in their own hands. He narrates the story from adolescence through the early years on the job up to the CEO part of your life. And yes, maybe you will never (want to) be a CEO, but the story is still good guidance. He tells you how to think about your first job(s), what to look for, what to avoid. Are you transitioning from expert to your first managerial role? There is a great chapter (or two) for it. Do you dream of launching a start-up? There is a solid story on how to make it more likely to happen. Did you start a company? Well, here is the chapter on how to organizationally survive growth from 5 to 1000 employees. Your company has built the first version of its product and does not know what versions 2 and 3 (and beyond) should be about? Fadell clearly explains how to look for that pathway. It might sound fluffy, but whoever you are in business, I am quite sure you can benefit from some chapter of this book. Yes, occasionally you have to pardon Tony’s American optics, but the smell of it is more like a fragrance you know but would not wear yourself, not a sensory disgust.

I especially admire the chapter on how data plays a different role in the individual phases of building a product. It gives you clear guidance on where data is the horse and where it is (still needed, but rather) the cart. Having gone through 3 layers of management (Team Lead to SVP) myself, I can confirm that his views on how to perceive your role(s) are very accurate, and I was amazed at how he can compress the essence into (often just) a few pages of text. All in all, this book is a Masterpiece (uh, I told you that already, right? 🙄). And I strongly suggest you read it. The earlier the better. Because some of the lessons he gives I had to learn the hard way, and I only wish he had written the book earlier. Have a great read!

Link | https://www.amazon.de/Build-Unorthodox-Guide-Making-Things/dp/1787634108

 

Ludmila Filipova: THE PARCHMENT MAZE

https://www.amazon.de/-/en/Ludmila-Filipova/dp/1483969444

<review to come soon>

 

 

 

 

Juan Enriquez: RIGHT/WRONG | HOW TECHNOLOGY TRANSFORMS OUR ETHICS

https://www.amazon.de/-/en/Juan-Enriquez/dp/0262044420

<review to come soon>

 

 

 

 

Robert M. Sapolsky: BEHAVE | The bestselling exploration of why humans behave as they do

https://www.amazon.de/-/en/Robert-M-Sapolsky/dp/009957506X

<review to come soon>

 

 

 

 

Patrick Gilbert: JOIN OR DIE | Digital Advertising in the Age of Automation

https://www.amazon.de/-/en/dp/1632217686

<review to come soon>

 

 

 

 

Neil Hoyne: CONVERTED | The Data-Driven Way to Win Customers’ Hearts

Topic | Data-driven, Business, Marketing

Many progressive companies try to be (or declare themselves to be) data-driven. As much as this is music to the data team’s ears, in all honesty it is often more of an aspirational “badge” than reality.

Because, as with many other phenomena, being data-driven is more about what you do than what you declare. Neil Hoyne’s book is, in this regard, a nice mirror to look into. He takes the process of running a company (from client acquisition to cashing in the profit) and, topic by topic, challenges you on whether you really do it based on data. In full disclosure, I sometimes don’t agree with the arguments he uses to illustrate that, but his down-to-earth, no-bullshit zoom-out on whether business processes use data or not is admirable and appreciated.

This book is a great gift for mid-level and top managers whom you want to inspire to take their business steering to a (sustainably) higher level. It’s a short read that any leader can squeeze in. It is also a great read for data professionals who want to (finally) win the trust of their business leaders for plugging data into crucial decision making. I would not be surprised if this became one of the “must-reads” for managers in the years to come. Hence my suggestion that you take the lead 😉

Link | https://www.amazon.de/-/en/dp/0593420659

 

 

Chuck Hemann, Ken Burbary: DIGITAL MARKETING ANALYTICS

https://www.amazon.de/gp/product/0789759608

<review to come soon>

 

 

 

 

 

Once I finish the books, I will write a short review of each, so that you can be doubly sure whether it is worth your reading time. For all the suggestions not yet reviewed, don’t be shy to take a bit of the reader’s risk together with me 🙂

Disclaimer: Please note that the Amazon links carry no referral ID and I am not receiving any kind of commission or kick-back for whatever you choose to purchase. I am attaching the links here just to give you a place to research the books further.

Did ChatGPT pass Data Science technical interview?

On the last day of November 2022, a bit in the shadow of the Cyber Week craze, the OpenAI team released the new ChatGPT for free testing. It is meant to be a chatbot built on the strong GPT-3.5 natural language model, capable not only of casual conversation but also of answering real (even tough) expert questions and of writing creative texts, poetry or even whole stories.

As the features (and performance) of the model are a pretty awesome step up from what we have seen so far, its launch immediately set off a snowball of testing it in a plethora of domains. The craze seems to be so intense that it is believed to be the first digital tool/service to reach 1 million new users within 5 days of its official release. (To be fair, I think it is only the first recorded one; I am quite sure that in countries like India or China gaining 1 million users fast is not unheard of for something really catchy 😊)

But back to the core story. The ChatGPT use case that wreaked the most havoc on LinkedIn and many blogs and news portals is the fact that it can produce real snippets of code based on a very simple specification of what the code should do. You can really go like “Show me code to predict survival rate on Titanic” and in a snap it returns the code to fetch the data, create a predictive model and run it, all in gleaming, well-commented Python. Or so it looks.

In an effort to form my own opinion, I ran my own coding inquiries (and collected others’ attempts) to investigate the real quality of the code. I made a short summary of this early investigation in this LinkedIn post. TL;DR = it was not flawless code; if you try to run it, you will still often stumble upon errors, BUT … for somebody who has no clue how to attack the problem, it might be more than an inspiration.

 

A few days later, my dear friend (and former colleague) Nikhil Kumar Jha came up with the idea of asking ChatGPT one of the technical interview questions he remembered from the time I was hiring him into my team. He sent me the question and the answer in a message. And I have to say, the answer was pretty solid. That got my mind spinning. So, we quickly agreed to take the whole battery of tests that I use for the technical interview of Data Scientists and put the ChatGPT “candidate” through the whole interview hassle. The rest of this blog tries to summarize how the robot did and what the implications are. But before we get there, what do you think: has ChatGPT passed the technical round to be hired?

Technical interview to pass

Before jumping to the (obviously most) juicy answer to the question at the end of the previous paragraph, let me give you a bit of context about my interview as such. The market for Data Scientists and Machine Learning engineers is full of “aspirational Data Scientists” (= a euphemism for pretenders). They rely on the fact that it is difficult to technically screen a candidate in detail. Also, the creativity of hiring managers in designing their very own interview questions is relatively low, so if you keep going to interview after interview, over several tens of rounds you can get lucky and brute-force some of them (simply by piggybacking on the answers from failed past interviews).

To fight this, I have several sets of uniquely designed questions that I rotate through (and secret follow-up questions ready for those answering the basic questions surprisingly fast). In general, the technical round needs to separate the average from the great, and the great from the genius. Thus, it is pretty challenging in its entirety. A candidate can earn 0-100 points, and the highest score I have ever seen was 96 points. (And that only happened once; a single-digit number of candidates got over 90 points out of more than 300 people subjected to it.) The average lady or gentleman ends up in the 40-50 point range; the weak ones do not even make it past the 35-point mark. I don’t have a hard cut-off point, but as a rule of thumb, I don’t hire candidates below 70 points. (And I hope to reach the 85+ mark with candidates who are to be given an offer.) So now is the time for the big revelation…

Did the ChatGPT get hired?

Let me unbox the most interesting piece first and then support it with a bit of detail. So, dear real human candidates, ChatGPT did not get hired. BUT it scored 61 points. Therefore, if OpenAI keeps improving it version by version, it might get over the minimal threshold (soon). Even in the tested November 2022 version, it would beat the majority of candidates applying for a Senior Data Scientist position. Yes, you read that right, it would beat them!

That is pretty eye-opening and just confirms what I have been trying to suggest for the last 2-3 years: junior coding (and Data Science) positions are really endangered. The level of coding skills needed for entry positions is, indeed, already within the realm of Generative AI (such as ChatGPT). So, if you plan to enter a Data Science or Software Engineering career, you had better aim for higher sophistication. The lower-level chairs might not be for humans any more (in the years to come).

What did the robot get right, and what stood the test?

Besides the (somewhat shallow) concern about passing the interview as such, the more interesting question for me was: on what kinds of questions can it provide correct answers, and on which can it not? In general, the bot did fine on broader technical questions (e.g. asking about different methods, picking among alternative algorithms, or data transformation questions).

It also did more than fine on actual coding questions, certainly to the point that I would be willing to turn a blind eye on technical proficiency. Because in real-life interviews, too, it is not about being nitty-gritty with syntax, as long as the candidate provides the right methods and sound coding patterns and gears them together. The bot was also good at answering straightforward expert questions on “How to” and “Why so” for particular areas of Data Science or Engineering.

Where does the robot still fall short?

One of the surprising shortcomings came, for example, when it was prompted on how to solve the missing data problem in a data set. It provided the usual ways of identifying missing values (like “n/a”, NULL, …), but it failed to answer what should be done about them, i.e. how to replace the missing values. It also failed to answer some detailed questions (like the difference between a clustered and a non-clustered index in SQL); funnily enough, it returned the same definition for both, even though it was prompted explicitly for their difference.

The second interesting failure came when I tried to steer the discussion towards the most recent breakthroughs in Data Science. ChatGPT was just beating around the bush, not really revealing anything sensible (or citing trends from a decade ago). I later realized that these GPT models still take months to train and validate, so the training data of GPT is seemingly limited to the 2021 state of affairs. (You can try to ask it why Her Majesty the Queen died this year or what the 2022 Nobel Prize in Physics was awarded for 😉 ).

To calm the enthusiasts, ChatGPT also (deservedly and soothingly) failed on more complicated questions that need abstract thinking. In one of my interview questions, you need to collect the hints given in the text to frame a certain understanding and then use it to pivot to another level of aggregation within that domain. Hence, to succeed, you need to grasp the essence of the question and then re-use the answer for a second line of thought. Here the robot obviously only got to level 1 and failed to answer the second part of the question. But to be honest, that is exactly what most weak human candidates do when failing this question. Thus, in a sense, it is indeed on par with less skilled humans again.
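For readers wondering what an answer to the “what shall be done about it” part could look like, here is a minimal sketch in plain Python. It is only an illustration of two common replacement strategies (median for numeric columns, most frequent value for categorical ones), not the answer key of my interview; the helper names `is_missing` and `impute_column` and the list of missing-value markers are assumptions made up for this example.

```python
import math
from statistics import median, mode

# Hypothetical set of textual markers treated as "missing" in this sketch.
MISSING_MARKERS = {"", "n/a", "na", "null", "nan"}


def is_missing(value):
    """Return True if the value looks like a missing-data placeholder."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip().lower() in MISSING_MARKERS


def impute_column(values, strategy="median"):
    """Replace missing entries with the median or the most frequent value."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        # Nothing to learn a replacement from; return the column unchanged.
        return list(values)
    if strategy == "median":
        fill = median(present)
    elif strategy == "mode":
        fill = mode(present)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if is_missing(v) else v for v in values]


# Example with made-up data: a numeric and a categorical column.
ages = impute_column([1, None, 3, "n/a", 8], strategy="median")
cities = impute_column(["a", "NULL", "b", "a"], strategy="mode")
```

Which strategy is appropriate depends on why the data is missing; dropping rows, adding a “missing” flag column or model-based imputation are equally legitimate options that this sketch does not cover.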

How good was ChatGPT in the coding, really?

I was specifically interested in the coding questions, which form the core of technical screening for a Data Science role. The tasks that a candidate has to go through in our interview are a mix of “show me how you would do …” and “specific challenge/exercise to complete”. The interview also tests both the usual numerical Data Science tasks and more NLP-ish exercises.

The bot did really great on the “show me how you would do …” questions. It produced code that (based on the scoring descriptors) often scores close to full points. However, it struggled quite a bit on the specific tasks. In other words, it can do “theoretical principles”, but it fails to cater for specific cases. But again, where it was failing, the solutions ChatGPT produced were the usual wrong solutions that weak candidates come up with. Interestingly, it was never gibberish or pointless nonsense. It was code that really ran and did something (and was even well commented), just failing to do the task. Why am I saying so? The scary part is that in all aspects the answers ChatGPT provided, even the wrong ones, looked like humanly wrong answers. If there were a Turing test for passing the interview, it would not make me suspect that a non-human was going through this interview. Yes, maybe sometimes just a weaker candidate (as happens so often in real life as well), but a perfectly credible human interview effort.

Conclusions of this experiment

As already mentioned, the first concern is that ChatGPT can already do as well as an average candidate in an interview for Senior Data Scientist (and thus would be able to pass many Junior Data Scientist interviews fully). Thus, if you are in the data analysis industry (or even plan to enter it), this experiment suggests that you had better climb to the upper rungs of the sophistication ladder, as low-level coding will soon be flooded by GPT-like tools. You ignore this omen at your own peril.

For me personally, there is also a second conclusion from this experiment, namely which areas of our interview set need to be rebuilt. The performance of ChatGPT in coding exercises (in the November 2022 version) was well correlated with the performance of human (even if less skilled) candidates. Therefore, areas in which the robot could ace the interview question cleanly signal that they are probably well described somewhere “out in the wild internet” (as it had to be trained on something similar). I am not worried that a candidate would be able to GPT it (yes, we might replace “google it” with “GPT it” soon) live in the interview. But the mere fact that GPT had enough training material to learn the answer flawlessly signals that one can study that type of question well in advance. And that is enough of a concern to revisit the tasks.

Hence, I went back to redrafting the interview test battery. And, of course, I will use the “ChatGPT candidate” as a guinea pig for the new version when it is completed, so that our interview test can stand its ground even in the era of Generative AI getting mighty. Stay tuned, I might share more on the development here.

 

Older articles on AI topic:

AI tries to capture YET another human sense

Want to learn AI? Break shopping-window in Finland

REMOTE LEARNING now on AI steroids

5+1 interesting AI videos

 

 

A/A test? No typo, it really exists!

Everybody knows the A/B test, as it has become an essential tool for exploring new user preferences or patterns and also a way to innovate systematically through a chain of (managed) experiments. But an A/A test, seriously? Yes, this is not a typo (after all, on a QWERTY keyboard A and B are quite far away from each other 😉 ) and there are good reasons for this test to exist (and to be used). But let’s get to it step by step …

 

Oops, it happened again …

Imagine you had a glitch in your system that led to a sudden (maybe even undesired) A/B test of the customer experience. Some people received the service as expected, while others were cut off from this element of the experience. You do a post-mortem and see that this positively influenced their shopping behavior. However, as this was a complicated glitch, you can’t easily replicate the test to find out whether switching off that experience element completely would actually be a better long-term strategy. I mean, it was still a valid A/B test, but as the assignment to groups was not controlled, you are not sure whether the uptick in B-group shopping behavior can be attributed to the change in experience or just to a skewed (not random) assignment of the user sample by the glitch itself. How would you tell?

 

 

Wow, really?

Your existing A/B testing platform had fallen behind the curve and thus you decided to shop for an alternative solution. After implementing the new tool, all of a sudden some of your experiments start to show significantly larger gaps between the test and control groups. Some of the gains are almost too good to be true. But you have already cancelled your previous software subscription, so you cannot replicate the same experiment on the previous platform any more. How would you find out whether your campaigns have suddenly started to work better or whether the new tool is just “wired differently” to run the tests?

 

There is a way

As you can read above, both of the depicted scenarios stem from the real life of online marketers. I bet you might have experienced some variation of them firsthand. But luckily, there is a solution for both; you don’t have to throw away the data and start all over again. And, yes, the solution is indeed our mysterious A/A test. So how does it really work?

 

A and A (again), seriously?

The original idea of the A/B test is quite simple: if mutually comparable groups of users are subjected to different treatments (and the treatment is the only substantial difference between them), and one of the groups behaves significantly differently as a result, then there is a high probability (yes, don’t forget it is still a probabilistic finding) that the change in experience and the change in behavior are linked to each other. The major vulnerability of this experiment, unfortunately, is the assumption that user samples A and B are really comparable. So what happens if we already have the result of an A/B test but have no proof/info on how correctly the selection into groups happened?

Well, luckily the logic of this experimenting also works the other way around. If we have two groups undergoing identical treatment and ending up with the same behavior, there is a high probability (yes, probabilistic here again) that the groups had a similar user distribution as well.

Therefore, if we have two groups and want to find out whether they are somewhat similar in user mix, we can subject them to the same treatment and check whether their results show no statistically significant difference at a high confidence level. And that exactly is the essence of our A/A test. Here the A+A signifies not that the same group was used twice, but that different groups (the A- and B-groups of the original experiment) are subjected to the same treatment in a second experiment. This way we can try to “learn something about the similarity” of the groups ex post, after the A/B test has already been completed.
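To make the “significantly differently” part tangible, here is a minimal sketch of one common way to evaluate an A/B test on conversion rates: a pooled two-proportion z-test. It is only one of several possible tests and not necessarily what your testing platform does; the function name `two_proportion_z_test` and all the numbers are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # No variation at all (e.g. nobody converted), so no evidence of a difference.
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up example: 100 of 1000 users convert in group A, 150 of 1000 in group B.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value only says that such a gap would be unlikely if both groups behaved the same; it says nothing about whether the groups were comparable to begin with, which is exactly the vulnerability described above.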
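The A/A check described above can be sketched, under assumptions, as a simple permutation test: both groups received the same treatment, so we ask how often a random reshuffling of users between the groups produces a gap in conversion rate at least as large as the one observed. The function name `aa_permutation_test`, the 0/1 outcome encoding and the sample data are all made up for illustration.

```python
import random


def aa_permutation_test(outcomes_a, outcomes_b, n_permutations=2000, seed=42):
    """Return a permutation p-value for the gap in mean outcome between groups.

    outcomes_a / outcomes_b are lists of 0/1 flags (e.g. purchased or not)
    collected while BOTH groups were exposed to the same treatment.
    """
    rng = random.Random(seed)
    n_a = len(outcomes_a)
    n_b = len(outcomes_b)
    observed = abs(sum(outcomes_a) / n_a - sum(outcomes_b) / n_b)
    pooled = list(outcomes_a) + list(outcomes_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign users to two groups of the original sizes.
        rng.shuffle(pooled)
        gap = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        # Small tolerance guards against floating-point noise in the comparison.
        if gap >= observed - 1e-12:
            at_least_as_extreme += 1
    # The +1 correction keeps the estimated p-value strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up example: both groups convert at 10% under the identical treatment.
group_a = [1] * 20 + [0] * 180
group_b = [1] * 20 + [0] * 180
p_value = aa_permutation_test(group_a, group_b)
```

A large p-value here means the A/A run found no evidence that the groups behave differently; a small one is a warning sign that the original group assignment was skewed. Keep in mind that the absence of a significant difference is supporting evidence, not proof, of similarity.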

 

Using an A/A test is thus an easy way to double-check (or compensate for the lack of) the initial user assignment. Please note that while the A/B test uses sample similarity to point out a difference in behavior, its A/A cousin uses the (absence of a) difference in behavior to point out sample similarity. That also means that A/A does not:

  • say anything (additional) about the strength of the behavioral difference from the original A/B test, nor does it serve as any proof of it (if the difference in final behavior between the A/B groups was statistically insignificant, it remains so even after a successful A/A test).
  • prove the similarity of the original A and B groups in general. It signals similarity only for the behavior(s) relevant to the original A/B experiment.
  • generate any new insights about the users (which is why opponents of A/A often criticize it as a waste of testing capacity).

 

The main added value of A/A testing is that it can be run (almost) no matter what the original experiment was, and it is easy to set up. After all, you just need to wire both groups to the same branch of the process. Therefore, the A/A test is a quick remedy for “unusual” set-ups or hiccups in proper A/B testing.

Its simplicity, of course, comes with some controversy. Some practitioners argue against A/A tests on three grounds: they are not the most robust way to prove the similarity of the A/B groups (heavy multinomial distribution comparisons are), they cannot always be fully rerun (e.g., if the original B-group condition altered the long-term perception of the service), and they carry the opportunity (or real) cost of not running other experiments instead.
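For comparison, here is a rough idea of what such a direct comparison of the user mix could look like. This is only a sketch under my own assumptions: it uses a chi-square test of homogeneity (one common way to compare two multinomial distributions, not necessarily the one the critics have in mind), and the segment counts are made up for illustration. It needs the user attributes of both groups, which is exactly the information you often lack when inheriting somebody else’s A/B test.

```python
import math


def chi2_survival(x, df):
    """P(X >= x) for a chi-square variable with integer df (closed-form series)."""
    y = x / 2.0
    if df % 2 == 0:
        terms = sum(y ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-y) * terms
    k = (df - 1) // 2
    terms = sum(y ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, k + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * terms


def mix_homogeneity_test(counts_a, counts_b):
    """Chi-square test of homogeneity for two groups over the same segments.

    counts_a / counts_b: users per segment (e.g. per device type) in each group.
    Returns (statistic, p_value). Every segment must be non-empty overall.
    """
    if len(counts_a) != len(counts_b) or len(counts_a) < 2:
        raise ValueError("need the same (two or more) segments for both groups")
    n_a, n_b = sum(counts_a), sum(counts_b)
    total = n_a + n_b
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        segment = a + b
        if segment == 0:
            raise ValueError("drop segments with no users before testing")
        expected_a = n_a * segment / total
        expected_b = n_b * segment / total
        stat += (a - expected_a) ** 2 / expected_a
        stat += (b - expected_b) ** 2 / expected_b
    df = len(counts_a) - 1
    return stat, chi2_survival(stat, df)


# Hypothetical split of users by segment (say: mobile, desktop, tablet).
stat, p_value = mix_homogeneity_test([600, 300, 100], [400, 300, 300])
print(p_value < 0.05)  # True: the two groups have a clearly different mix
```

A small p-value here says the groups were mixed differently from the start, which is the very thing the A/A test can only hint at indirectly.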

I am far from promoting A/A tests as a silver bullet; avoid them if any of the counter-arguments above holds true for your own situation. However, the A/A test and its proper set-up should be part of your toolbox, because it may turn out to be the cheapest (and quickest) way to heal an improperly set-up experiment. That is especially so if you are to assess the results of A/B tests previously conducted by somebody else.

SURPRISING CLUBHOUSE AUDIENCE INSIGHTS (you have not seen yet)

Do you love Clubhouse? But would you also appreciate a bit more data on how this social network ticks? Well, then here you are. Read on for some insights you most likely have not seen so far.

The MEASURING CLUBHOUSE project is a get-together of data analysts and scientists, grouped under THE MIGHTY DATA CLUB, who happen to enjoy Clubhouse but also see a lack of serious data about this social network. Hence, we decided to run several deep-dive studies on demographics, user behavior, Clubhouse rooms and their dynamics. All in all, the team has already generated a deck of more than 120 slides. Though the outputs serve primarily to help the local League of Club Owners, there are quite a few slides that might be eye-opening for the general public. Let us walk you through a selection of them.

[ If you are interested in learning more about the project or you would like to use/cite some of the findings, don’t hesitate to contact us at info@mocnedata.sk or directly on Clubhouse: @FilipVitek ]

HOW DID CLUBHOUSE GO VIRAL (beyond USA)

 

EARLY ADOPTERS CAME FROM 4 MAJOR AREAS

 

DEMOGRAPHICS – HOW TO GET TO THE HOLY GRAIL OF MARKETING

 

DEEPER LOOK REVEALS CLUBHOUSE’S CONCENTRATION IN METROPOLITAN AREAS

 

THERE IS ACTUALLY QUITE A LOT TO KNOW, IF YOU TRY (HARDER) …

 

 ROOM DYNAMICS STUDY: CLEAR PATTERN OBSERVED …

 

DOES MODERATOR ACTIVITY REALLY PAY OFF ? 

 

SPEAKING OF CH ROOMS: HOW FAR IN ADVANCE DO WE PLAN THEM?

 

… AND HOW DO WE NAME/DESCRIBE THEM?

CLUBHOUSE DATA – FIRST INSIGHTS

This is a showcase of the MEASURING CLUBHOUSE IN CEE project, which yields the first real data about user preferences and behavior on Clubhouse. If you are interested in learning more about the project, do not hesitate to contact the authors at info@mocnedata.sk or directly on Clubhouse:

Author’s profile on Clubhouse = https://www.joinclubhouse.com/@filipvitek  | TheMightyData Club profile = https://www.joinclubhouse.com/the-mighty-data-club

Let’s get to some sample graphs of the data extracted about the Clubhouse userbase:

HOW CLUBHOUSE WENT VIRAL

DISTRIBUTION OF USER INTEREST (as declared in BIO)

 

(SLOVAK) CLUBHOUSE USERBASE DEMOGRAPHICS

 

CLUBHOUSE IS CLOSEST TO LINKEDIN

 

IN ROOM DYNAMICS : MEASURING THE TOTAL UNIQUE VISITORS

MEASURING CLUB REACH

CLUB REACH vs FOUNDER’S REACH dynamics

 

IF YOU WANT TO BE GLOBAL PLAYER …

 

UNCLEAR ABOUT CLUB AUDIENCE STRATEGY

 

 

 

 

 

WHICH SERVICES WILL NOT SURVIVE CLUBHOUSE ENTRY?

If you are reading this blog post, I probably don’t have to introduce you to the belief that Clubhouse, the new audio social network and cultural phenomenon, will be big. Whether or not you are a Clubhouse user yourself, you are still going to be affected by (what some might call the third wave of) audio fascination. How exactly that is possible is what I would like to walk you through in the next lines.

Many attribute the sudden spike in the Clubhouse (CBH) userbase to the COVID pandemic. To be fair, lockdowns (and the like) certainly do play into the hands of a virtual, remote chat app. But to largely dispel this simplifying argument, let me point out that Clubhouse emerged in March 2020 and it took three waves of COVID (until January 2021) for CBH to gain traction. What is more, in Q2 2020 Facebook released the CatchUp app (mimicking CBH functionality in many respects) and quietly swept it under the table after a few months of, ehm, no real interest. Thus, those betting that Clubhouse will be dead once civilization walks out of the pandemic should, I would suggest, rethink the fundamentals.

What exactly is CBH’s added value?

I don’t want to bother you with a CBH newbie course on which functions this audio social network does (or does not) offer, as there are comprehensive sources along this line. However, to debate what will happen to neighboring services, we should understand what extras CBH brings to the table:

  • Democratizes audio podcast creation. You may argue that there has already been a plethora of tools to kick-start your career as an audio podcast host. But none of them has been as simple as the bare unmute button in your CBH app. Yes, with Clubhouse the threshold for generating mass audio content is down to owning a smartphone.
  • Summon the legend. You can certainly bring a prominent guest into a podcast studio, but only if they are close enough that the travel time is not ridiculous compared to the actual interview air time. With CBH you can have as little as a minute or ten of a VIP without her/him taking a single step out of their apartment. What is more, the VIP’s participation can be summoned on the spot with a single + sign, and the other side has the option to politely turn the invite down. That cuts speaker arrangement to literally seconds.
  • Enables an instant room in a crowded place. One of the less evident but striking features of CBH is the fact that you can call a town hall meeting on any topic at any point in time. Sure, before COVID you could have taken an amplifier and started shouting on the square. The difference is that CBH emulates the same thing, but with the “square” being crowded with people 24/7. Let me illustrate this with my very own experience: I took part in a room hosting a panel debate. As the panel came to its end, I opened an instant room titled “How did you like this debate” and within 5 minutes had 2/3 of the original debate headcount in my room exchanging thoughts on how the panelists actually did. (Just to put that into context, my CBH following back then was 1/100 of that of most of the panel debate speakers.)
  • Immediate, short-lived content. Some might consider this a detriment and others the genius of CBH, but all the audio content generated here is expected to die when the room is closed. Yes, there is the heatedly debated recording option, but the original DNA of Clubhouse seems to be rather instant gratification. With podcasts you can pause or come back later. If you see your star talking on CBH, you had better rush to hear what (s)he has to share. Or you might regret it. And that is the magnetic dimension of CBH content.
  • Authority is earned, not given. As literally anybody can be a CBH host, over time CBH authority will be earned. You can be a villager with an internet signal barely strong enough to run CBH and challenge the best professional journalist in interview skills. Sure, only if you can. But what if you can for real?

There have been millions of articles and blog posts celebrating (or downplaying) the essence of CBH, and I do not intend to queue for your attention on this. However, what I have personally been lacking is a bit more strategic, forward-looking thinking on who will benefit, suffer (or downright die) from CBH’s entry onto the scene. As much as there surely might be some wild implications (that I do not dare to speculate about), I think a fair analysis of the impact on services neighboring CBH would not hurt. Let me, therefore, put them into the spotlight one by one and assess how much of an opportunity or threat CBH might represent.

CBH vs. PODCASTS

If I were asked to describe CBH without using the words audio and social media, I would probably wrap my explanation around podcasts. That is because, of all the things in the world, audio podcasts are (both in form and in benefit for the end user) probably as close to CBH as you can get. One might think of CBH as actually “just an extension of podcasts” (which I would deem a mistake). So what exactly is CBH likely to bring upon its nearest sibling, the podcast?

As similar as they are, the main difference between CBH and podcasts is that podcasts are on-demand. Meaning, you can choose the time when you listen to a podcast, and you can rewind it or re-listen if you did not catch some notion fully. That is superior to CBH, in which all is gone in a second and you can only listen while it happens. On the other hand, CBH tops podcasts on interactivity.

EXPECTED CBH IMPACT: Podcasts will be hit negatively in the short term (as people devote more time to CBH, leading to a cut in podcast consumption). But in the mid-term they will mostly recover, leading to a split and specialization of hosts towards either more interactive (CBH) or more structured and condensed (podcast) gigs. And there surely will be a lot of hybrid formats in between (like recording CBH rooms for podcast release, or podcast hosts experimenting with more interactive rooms).

CBH vs. MEETUP

If you are looking for an exemplary victim of CBH’s rise, then the Meetup platform is the one. The simple reason for this bold statement is that CBH is something like a superset of the functions of Meetup, whose mission is mainly to bring an upcoming event to the attention of a sizable (and like-minded) crowd, and to offer this as a service mainly to B2B customers. Well, CBH can do all that Meetup was doing, and it can actually host the very event itself. And it’s all for free (at least for now). Meetup’s outlook had already turned gloomy with the pandemic cutting off the option of meetups taking place physically. In response, Meetup took the only logical step and moved towards virtual events. And ... BANG! That was where the CBH trap snapped shut. I would be shocked if Meetup ever recovers from this.

EXPECTED CBH IMPACT: Killed by CBH

 

CBH vs. FACEBOOK

When assessing the face-off with the giants of social media, it is a bit like shooting at a moving target. Facebook already proved in early February 2021 that it does not intend to sit and wait for CBH to eat into its kingdom (and the profits attached to it). So commenting on the clash of these two forces is a bit more like chess match commentary. However, Zuckerberg indicated that CBH is deemed to be a serious foe and will face the full competitive treatment from Facebook. From my point of view, CBH can only claim the “bored, so scrolling through the FB feed” part of Facebook screen time. Networking, direct messaging, sharing videos and pictures: all of that is missing in CBH. On the other hand, until CBH gets Android resolved, Facebook can effectively counter-strike with its own audio-chat platform, carving out enough breathing room for itself in the audio social media space.

EXPECTED CBH IMPACT: Only a minor decrease in Facebook activity.

 

CBH vs. Radio

Assessing the hopes of traditional radio looks a bit more difficult at first sight. But if you strip away the layers of historical sentiment, the picture resembles the CBH vs. Meetup stand-off, in the sense that CBH can offer everything that FM radio has to offer. Not surprisingly, there are already radio formats airing 24/7 on CBH. Therefore, the fight with CBH boils down to two factors: A] the coverage of internet signal vs. radio signal; and B] the IT literacy of the radio audience. Thus, yes, in some geographic areas radio will maintain its monopoly, and it is also true that senior listeners are less likely to appear on the CBH platform. But you can sense that this will hold true only for a certain time, because on both dimensions development is running against traditional radio.

EXPECTED CBH IMPACT: Gradual but severe decline.

 

CBH vs. TRADITIONAL CONFERENCING

Conference (tourism) is yet another industry with an unenviable fate. Completely trashed by the 2020 pandemic and on its knees, it now faces a serious threat of replacement by apps like CBH. Without further ado, let me comment that traditional (and virtual) conferencing is facing an “Embrace Change or Bleed Out” dilemma. Those who see an opportunity in CBH (after all, you save all the room rent and equipment rental, and people even eat their own food) might revive on/with CBH. But for the majority of the others, this is probably a pretty bleak game over.

EXPECTED CBH IMPACT: Devastating damage.

 

CBH vs. YOUTUBE (and other video streaming)

The most complex (and thus most difficult to evaluate) is the relationship of CBH to video streaming services. On shallow consideration, CBH should not dare to think of hitting YouTube. After all, YouTube videos have all the audio content plus video (and its monetization) on top of it. Also, many YouTubers have vast influencer power, which they won’t easily let go. I have to credit Miroslav Petrek for an interesting twist in thinking about the YouTube vs. CBH match. He pointed out that much of YouTubers’ content is actually mere audio with some background theme. Thus, yes, the sexy faces or figures of YouTube hosts might not have a chance to shine in a CBH room, but does it really matter that much? And does the fact that I can talk in person to my star (whom I would otherwise only passively listen to) outweigh this shortcoming?

EXPECTED CBH IMPACT: Some damage will be done, as YouTube’s (and its cousins’) monopoly on influencer content that includes sound will split into audio-only and audio+video. And there will also be a rise of Clubhouse influencers, somewhat diminishing YouTube’s role in this area.

 

CBH vs. LinkedIn

LinkedIn, as a predominantly professional social network, stands a bit aside from the main battlefield of CBH’s entry. Also, the use cases and added value of networks like LinkedIn (or the German-speaking Xing) have little overlap with CBH features. Therefore, we can assume CBH will have no means to challenge LinkedIn’s positioning as the “place to look for talents” (or jobs, which is just the other side of the same coin). Though, to be fair, recruiters and headhunters are well represented among early adopters of CBH. However, whatever (speaker) reputation you manage to build on CBH, you might need to cement it into a resume readable/usable in the recruitment process. So in countries where the CBH userbase surpasses LinkedIn account penetration, the rise of CBH might even (slightly) facilitate the growth of LinkedIn. As a side note, the HR services industry’s interest in CBH might well be fueled by the chance to both scout and showcase potential candidates for a hiring organization. So be careful when answering questions on CBH, as some of them might come from your potential boss.

EXPECTED CBH IMPACT: Positive synergy with CBH growth.

 

CBH vs. TikTok

Despite President Trump’s effort to rein it in, TikTok was (deemed to be) the next big thing on the social media scene. With the unique appeal of its short video content, it was gaining traction mainly with the younger cohorts of the internet population. So how does the rise of CBH impact this? Well, video still trumps audio, but as we already debated in the YouTube discussion above, this might be a myopic way of looking at it. Also, TikTok is more prone to one-way communication, and hence you need to ask: would you, as a teenager, try to win the heart of your dream girl by hooking her on funny videos, or rather choose the option to talk her into dating in a private CBH chamber? The attention of soon-to-be adults is flickering. And CBH (although officially forbidden for the under-aged, or exactly because of it) is a breath of fresh air blowing away from TikTok.

EXPECTED CBH IMPACT: Some damage, diverting the attention of TikTok’s key target group.

 

CBH vs. Telecoms

This comparison pair might surprise you, because CBH does not actually claim any foothold in the telecommunication arena. Well, but take two steps back and think about it again. Isn’t a CBH room just a group conference call? Don’t get me wrong, I am not trying to suggest that CBH might be a threat to telcos. Quite the contrary, as telecoms are on a path away from voice calls anyway, so they don’t mind another platform morphing yet another (minor) portion of voice traffic into internet packets. It is also important to note that, given the choice between streaming video content and audio content, the latter is certainly less of a drag on bandwidth. Ultimately, massive adoption of CBH would help to relieve some of the drain on download streams and leave customers happier with the capacity at hand. CBH thus might indirectly help to make the cell network more stable and satisfying.

EXPECTED CBH IMPACT: Minor positive impact

 

CBH vs. Zoom

Working for TeamViewer, a provider of remote work and teleconferencing software, I can tell you that a sizable share of video calls either go without video already or could move from video to audio-only without anyone minding. After all, this was how things had been done before the internet boom. In COVID times the video is often a warmly welcomed extra, but a CBH-like service is still an acceptable fallback option. Therefore, CBH will probably not endanger (or slow down) the growth of Zoom, TeamViewer or MS Teams. But it can cut into the low-end tiers of the paid customer base, which all of a sudden have yet another free service option to consider. What is more, most videoconferencing pricing strategies step up with the number of participants allowed (for a given price tier). 5000 participants for free sounds like a “hole in the ship” for some of those business models. However, all this can only happen if CBH spreads to all relevant mobile phone platforms. In its current state (lacking screen sharing or a presentation mode), CBH is not too much of a business threat to video conferencing.

EXPECTED CBH IMPACT: Minor negative impact

 

Summing up THE LANDSCAPE

We managed to debate quite a few stand-offs involving Clubhouse and neighboring services. To put it all into a single context, the following infographic should give you a transparent overview of where the blood and where the champagne will be running on the CBH staircase:

 

 

The author is, together with Gabriel Toth, a founding member of the “DATA on SLOVAK CLUBHOUSE COMMUNITY” project and is an active Clubhouse speaker under the nick @FilipVitek.