What does (or should) good Feature engineering look like?

We are living in an era where data analytics has upgraded from a mere mainstream small-talk topic to a key business driver. Almost everybody tries to get better at advanced analytics, AI or other data-based activities. Some do so out of sheer conviction of its importance, some are pushed by groupthink or competitors’ moves. Let’s admit it, for most organizations and teams this is quite a leap. So they, …

Leaning on methods

… lean on whatever literature or manuals are around. I feel a strange mixture of sympathy and disdain for them, because almost all self-learning courses and books on analytics primarily focus on the methods of Data Science. You can read about which algorithms are suited for which tasks, how to program them or how to use open source packages that calculate them (as black boxes) for you. If you have dwelt in analytics for a bit longer, you probably smile already and know where I am heading with this point. A large part of the novices (and late- or returning-comers) fail to understand that the HOW (which AI method) is becoming more and more of a commodity. Thus, the more you try to specialize in those categories, the less promising your future in analytics might be. Don’t get me wrong, I am far from suggesting that a new Data Scientist should not (or need not) understand the methods behind the models, quite the contrary. My point is that trying to get better at writing logistic regression or training a decision tree is like trying to get better at digging potatoes in the era of tractors. Machines have got so good at applying the analytical constructs that it becomes Don Quixote-ish to fight them. Algorithms are becoming commodities, often ones freely available in the community. So where should you channel your skill improvement in analytics?

Where to go?

Of all the areas needed for successful analytics (see here), Feature engineering seems to be the most promising bet if you want to make a significant impact. Since it is the first step of the analytical funnel, whatever defects (or data loss) you introduce there are carried over into the whole analytical process. Kaggle competitions keep proving that an informational disadvantage is almost never compensated by a “smarter” algorithm further down the road of model training. What surprises me, though, is that in the ever-mushrooming flood of books on analytics you find very little on Feature engineering. Ironically, it is difficult to stumble even upon a definition of what feature engineering should and should not do, let alone best practices in this area.

That is exactly why I decided to sum up briefly in this blog what my experience in Machine Learning suggests Good Feature engineering should include:

1] Extending the set of features. Whenever you start to build an advanced analytical model, you begin with just a group of raw parameters (= inputs). Features are to the model what food is to humans: your life is much livelier if you have both enough of it and enough variety. To achieve that, feature engineering needs to make sure you “chew” the raw inputs and derive aggregated features from them as well. For example, you might have the history of customer purchases, but it is also important to calculate the lowest, the usual and the highest amount ever purchased. You would also find it useful to know whether the customer buys more in a certain part of the year, or how much time has passed since his/her last purchase. These are all aggregates. As you can probably smell from the examples, these are often standardized (like MIN, MAX, AVG, …) and foreseeable steps to take, so good Feature engineering should include automated generation of the aggregates. However, besides the obvious aggregates, one also needs to create new pieces of information that are not directly represented in the raw inputs. When you are trying to predict a particular person’s interest in buying ice cream, it is certainly interesting to know for how long this person has already been buying ice cream products and what his/her total consumption is. But if you need to predict consumption in a certain short time frame, the ratio of ice creams per week will be more telling than the total consumption metrics. That is where transformations, cross-pollination of features and completely new dimensions come in. Therefore, good feature engineering should extend the set of features with BOTH aggregated features and newly derived features, as in the sketch below.
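
To make the idea concrete, here is a minimal Python sketch of automated aggregate and derived-feature generation. It assumes a hypothetical purchase-history table with columns customer_id, purchase_date and amount; the column names and the per-week ratio are illustrative, not a prescribed standard.

```python
import pandas as pd

def build_features(purchases: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    grouped = purchases.groupby("customer_id")

    # Standard, foreseeable aggregates (MIN, MAX, AVG, ...)
    features = grouped["amount"].agg(
        amount_min="min",
        amount_max="max",
        amount_avg="mean",
        purchase_count="count",
    )

    # Recency: time since the last purchase
    features["days_since_last_purchase"] = (
        as_of - grouped["purchase_date"].max()
    ).dt.days

    # Derived feature: purchase intensity per week instead of a raw total
    tenure_weeks = ((as_of - grouped["purchase_date"].min()).dt.days / 7).clip(lower=1)
    features["purchases_per_week"] = features["purchase_count"] / tenure_weeks

    return features.reset_index()
```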

2] Sorting out the hopeless cases. Plenty is certainly a good start, because from quantity you can breed quality. Being generous in step one, however, also means you will end up with a lot of hopeless predictors that obviously (by their very design) can have little impact on the prediction. There are many reasons why some features should be weeded out, but let me elaborate on one very common pitfall. Imagine you have 5 different sofas, from a spartan one to really fancy ones. Your model is to decide which of the sofa versions will be most appealing to the customer. If you have a parameter that takes only two values (think male and female) and these are evenly distributed in the sample, it is difficult for this parameter to classify customers into 5 groups with just 2 values (yes, I said difficult, not impossible). The other way around, if you have the colour scheme of the sofas coded into 10,000 colour hues and only about 5,000 customers in the sample, some colour options will not have even a single interested customer, so their predictive relevance gets very questionable as well. Feature engineering should give you a plethora of inputs but should also spare you the pointless cases, as in the sketch below.
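
A minimal sketch of such weeding-out, assuming the candidate features sit in a pandas DataFrame; the thresholds are illustrative only, not a universal rule.

```python
import pandas as pd

def drop_hopeless_features(X: pd.DataFrame, max_levels_ratio: float = 0.5) -> pd.DataFrame:
    n_rows = len(X)
    keep = []
    for col in X.columns:
        n_levels = X[col].nunique(dropna=True)
        if n_levels < 2:
            # A constant column cannot discriminate anything at all
            continue
        if n_levels > max_levels_ratio * n_rows:
            # e.g. 10,000 colour hues for 5,000 customers: many levels will
            # never even be observed, so their predictive relevance is moot
            continue
        keep.append(col)
    return X[keep]
```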

3] Prune for more efficient model training and operation. Some methods downright struggle under the burden of too many input parameters (think Logistic regression). Others can swallow the load but take ages to recalculate or retrain the model. So it is strongly encouraged to give models more breathing space. In order to do so, one has to care about two dimensions. First, you should not allow clutter from too many distinct values in one variable. For instance, modelling a person’s health based on age is probably a sensible assumption, but nothing serious will come out of it if you meticulously count the age in the number of days the person had lived when the accident happened, rather than in years (or even decades). So good Feature engineering should do (automatic) binning of the values. Secondly, the model should not struggle with maintaining parameters that bring little additional value to the precision of the prediction, and it is the role of Feature engineering to prune the input set. A good model is not the most precise one, but the one with the fewest parameters needed to achieve acceptable precision, because only then do you achieve the quality-through-quantity maxim mentioned before. Both steps are sketched below.
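
Here is a minimal Python sketch of both steps, assuming scikit-learn is available; the decade-wide bins, the median importance threshold and the random forest used for scoring are illustrative choices, not the only way to do it.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

def bin_age_days(age_in_days: pd.Series) -> pd.Series:
    # Collapse a day-level age into decade-wide bins
    age_years = age_in_days / 365.25
    return pd.cut(age_years, bins=range(0, 110, 10), right=False)

def prune_features(X: pd.DataFrame, y: pd.Series) -> pd.DataFrame:
    # Keep only features whose importance exceeds the median importance;
    # the goal is the fewest parameters for acceptable precision
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=200, random_state=0),
        threshold="median",
    )
    selector.fit(X, y)
    return X.loc[:, selector.get_support()]
```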

4] Signal if I missed anything relevant. Last but not least, a good feature engineering approach should be your buddy, your guide through the process. It should be able to point you to areas of the information space that are under-served or completely missing from your input set. If you are modelling the probability of buying a certain product, you should have features on how pricey the product is included. If you don’t, the feature engineering system should remind you of that. You might scratch your head over how the system would know. Well, there are 2 common approaches to achieve this. You can either define a permanent dictionary of all possible features and create additional layers/tags (like pricing issues) on top of the raw list; upon reading the set you are just about to consider, the system would detect that there is no representative from the pricing category (see the sketch below). Or, if you do not have the resources to maintain such a dictionary, you can use historical knowledge as the challenger: your system reviews similar models done by your organization and collects the areas of the features used in those models. Then it can verify whether your next model also covers all the historically covered areas. Though this might sound like a wacky idea, in larger teams or in organizations with “predictive model factories” having such a tool is close to a must.
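
A minimal sketch of the dictionary-based check, assuming a hypothetical feature dictionary that maps feature names to topic tags; both the dictionary entries and the feature names are made up for illustration.

```python
# Hypothetical dictionary: feature name -> topic tag
FEATURE_DICTIONARY = {
    "product_price": "pricing",
    "discount_depth": "pricing",
    "days_since_last_purchase": "recency",
    "purchases_per_week": "intensity",
}

def missing_topics(candidate_features):
    """Return topic areas with no representative in the candidate feature set."""
    all_topics = set(FEATURE_DICTIONARY.values())
    covered = {FEATURE_DICTIONARY[f] for f in candidate_features if f in FEATURE_DICTIONARY}
    return all_topics - covered

# Example: a feature set with no pricing representative
print(missing_topics(["days_since_last_purchase", "purchases_per_week"]))
# -> {'pricing'}
```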

We have gone through the requirements that Good feature engineering should fulfil. Now that you know them, you can return to the drawing board of your analytical project and think about how to achieve them. If you happen to work in Python, you can get a bit more inspiration on this topic from my recent presentation at the AI in Action Meetup in Berlin.

 

Bread for 8 Facebook likes. You can already buy food with your personal data.

Data have been labelled a commodity in corporate speak for quite some time already. Monetization (i.e. selling) of data is not just a trendy concept but has turned into the essence of business for a horde of small and medium companies (let me name Market locator as one of the shiny examples). However, so far, the data trade has been mainly a B2B business. An ordinary person could not pay for his/her regular food supplies with his/her own data. That is over for good. A grocery store has opened in Germany where you pay for your purchase solely with personal data.

[Image: DATEN_markt_chleba]

Milk for 10 photos, bread for 8 likes

No, this is not an April Fools’ joke or a hidden prank. In the German city of Hamburg, Datenmarkt (= data market) has opened its first store, where you pay solely with personal data from Facebook. Toast bread will cost you 8 Facebook likes, or you can take a pack of filled dumplings for the full disclosure of 5 different Facebook messages. And as you would expect from a grocery store, Datenmarkt recently ran a special weekly promotion on most of the fruit, with the price cut down to just 5 Facebook photos per kilo.

[Image: DATEN_markt_frucht]

Payment is organized into a process similar to the one you know from credit card payments via a POS terminal. But instead of entering a PIN into the POS terminal to authorize the transaction, you log into your Facebook account and indicate which of the likes you have given, the Facebook photos you have posted or the messages from FB Messenger you decided to pay with. The cashier receipt also shows the data you decided to sacrifice. The provided data are at the full disposal of the Datenmarkt investors to sell (along with your metadata) to any interested third party. By paying for the goods with your social data, you give Datenmarkt full consent for this trading.

[Image: DATEN_markt_ucet_II]

Insights we give away

What might look like an elegant and definitive solution to homeless people starving actually has much higher ambitions. The authors of this project also want to raise awareness that in our age any data point has its value, reminding us that the sheer volume of data we freely give away to titans like Google, Facebook or YouTube actually has real monetary value attached to it.

In more mature markets, the question “what is the value of a person’s digital track?” resonates more intensively with every quarter of the year. To put it differently, if we hand over data to “attention brokers” (as Columbia University professor Tim Wu, who studies the matter, labels them), what do we actually get in return for those data? We might come to the point where outstanding debt forces us not only to smash our piggy bank but to sacrifice our Facebook assets as well. Most governments have set a monetary equivalent to human life (often calculated as the lifetime tax contribution), but the value of our digital life has yet to be set. Bizarre? Well, not any more in our times.

Food yes, but not for services

The opening of the Hamburg Datenmarkt sadly confirms the hypotheses researched by behavioural economists for more than 5 years. In the monetization of their data, people simply tend to stick to the same principles as with sweets (or other tiny sins): immediate gratification wins over the promise of some future value. Most citizens do comprehend that their personal data might have some value, yet when given the option of using that data to get cheaper or better services (e.g. insurance, or the next haircut for free), up to 80% of them turn the offer down. However, when the same people were offered something tangible immediately, they were willing to trade their personal data for things as trivial as one pizza.

[Image: DATEN_markt_gulky]

Now your thoughts go: “Wait, but the above-mentioned digital giants are not offering anything tangible; all of the Google, Facebook and YouTube business comes in the form of services, doesn’t it?” And you may be right, but the point here is that all of them offer immediate gratification (immediate likes on Facebook, a quick search fix on Google or immediate fun with YouTube videos). Quite a few people would see turning their digital data (deemed to be as material as the air we breathe in and out) into something consumable as a “great deal”. If this trend proves true, it is likely that services exchanging yogurt or a concert ticket for our personal data will be mushrooming around. If you have a similar business idea in your head, you had better act on it fast. The commission for the themightydata.com portal for seeding this idea into your head remains voluntary 😊.

Hold your (data) hats …

Immediately after the roll-out of personal data payments for existing goods, there will be someone willing to give you goods for a personal data loan (or data mortgage). In other words, you will hand over your author rights in advance, even before the content is created. For those in financial difficulties, this still might seem a more reasonable thing to give away than the fridge or the roof over their head. But here we are already crossing the bridge into digital slavery of a kind. Keep in mind that the most vulnerable would be the young, still having a long (and thus precious) digital life ahead of them, but short on funds (e.g. to buy a new motorbike) at this life stage. The whole trend will probably be fuelled by some industries (like utilities) that long for data to be their saviour from the recent misery of tiny margins.

Soon we may expect that besides the Bitcoin, Ethereum or Blackcoin fever, a person would be able to mine (from their own Facebook account) currencies like FB-photoCoin or FB-LikeCoin. And our age will gain yet another bizarre level on top. Just imagine the headline: “Celebrity donated 4 full albums of her Facebook photos to charity.” But nobody can eat those, right? Or can they now?
