I recently downloaded a great overview of AI and machine learning.
I gave my details as Alphonse Ahmad, an 18-year-old data science student from Brussels.
This business now has Alphonse’s email address and location on its CRM records.
I’ve also subscribed to the newsletter and marketing communications.
To the marketing folk, Alphonse is the real deal and his data has been captured at a deterministic level.
What I’ve done is not big or clever, but nor is it uncommon.
Many people game email signups in return for something, be it an AI and machine learning whitepaper or a coupon.
However, I found it vaguely amusing that I can fudge data to download an article about how vital data is to realising the undoubted value that AI and machine learning bring to the world.
My sense is that, whilst we wait for the great and the good to set out as best they can the methodology around HOW they make use of two of the industry’s greatest buzzwords, we need to take a step back to better understand WHAT they are training these algorithms against.
Be it probabilistic, deterministic, bid stream or offline data, we rely on a truth set against which we predict our view of the world.
Truth sets require scale or critical mass, but crucially they also require a depth of accuracy to ensure that the assumptions our machines and algorithms make are accurate.
Take the recent UK General Election polls as one case in point: the YouGov poll gave a more accurate view of the final result because its truth set accounted for a more youthful audience, whereas other polls relied on panels skewed towards an older demographic, based on previous years’ electoral turnout.
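To make the polling point concrete, here is a minimal sketch of how the same survey answers can produce different headline numbers depending on which turnout mix the truth set assumes. Every figure, age band and name below is invented for illustration; none of it is real polling data, and it is not a description of any pollster's actual method.

```python
# Illustrative only: all figures are made up, not real polling data.

def weighted_estimate(support_by_group, share_by_group):
    """Overall support as a weighted average of per-group support."""
    total_share = sum(share_by_group.values())
    weighted_sum = sum(
        support_by_group[group] * share_by_group[group]
        for group in support_by_group
    )
    return weighted_sum / total_share

# Hypothetical support for one party within each age group.
support = {"18-34": 0.60, "35-54": 0.45, "55+": 0.30}

# Two hypothetical assumptions about who actually turns out to vote.
older_skewed_turnout = {"18-34": 0.15, "35-54": 0.35, "55+": 0.50}
younger_turnout = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

old_view = weighted_estimate(support, older_skewed_turnout)
young_view = weighted_estimate(support, younger_turnout)

print(f"Older-skewed truth set: {old_view:.1%}")
print(f"Younger truth set:      {young_view:.1%}")
```

The survey responses are identical in both runs; only the assumed make-up of the electorate changes, and that alone moves the estimate by several points.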
Another example of the need for quality truth sets is IBM’s Watson, the supposed “poster child” of AI and machine learning tech.
Tech Review notes the following around Watson’s accomplishments in the health sector: “If Watson has not yet accomplished a great deal along those lines one big reason is that it needs certain types of data to be trained. And in many cases such data is in very short supply.”
A recent Fortune article goes into more detail highlighting IBM’s $2.6 billion buy of Truven Health – a company that has access to over 200 million lives (in the healthcare market such records are referred to as “lives”).
I think it’s fair to say that advertising is beginning to make strides in measuring real-world actions as a way of tracking success (e.g. footfall attribution). The sad truth, though, is that success is still often measured in clickthrough rates or views. Our need for quality truth sets may not be as sophisticated as in the case of Watson, but it is required nonetheless.
To that end, I hope a number of interesting trends will start to emerge, and indeed I encourage the industry to embrace the rationale behind them.
Question the quality of a truth set
A better understanding of how firms establish a truth set will become as important as pushing for greater clarity around the tech and process within machine learning and AI black boxes.
Question the scale and modernity of a truth set
We currently leverage high quality econometric and statistical models that still have great relevance when it comes to assessing the relevancy and accuracy of data.
However, is it not time to consider businesses with vast subscriber bases as having validity in their own right when it comes to measuring the accuracy of data within new channels such as mobile?
Probe the fundamental quality of deterministic data
There is no doubting that vetted deterministic data is of the highest quality; however, not all deterministic data is equal.
For example, is age and gender data derived from, say, a phone contract that has been subject to credit checks of a higher quality than that from a user login asking for age and gender?
Probabilistic is not all bad
By the same token, the quality of probabilistic data can be a minefield to assess. However, with a better understanding of the truth set behind it, is there a trade-off to be made on price, accuracy and privacy compliance between probabilistic data built on a transparent truth set and unverified deterministic data?
Buyers and sellers should be willing to test more and explore what strategies work for them.
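One simple way to run such a test is to score each data source against a verified truth set and compare the results. The sketch below assumes a small set of users whose age band has been independently verified; the user IDs, age bands and source names are all hypothetical, and a real test would need a far larger and more representative sample.

```python
# Illustrative only: the records and source names are hypothetical.

def accuracy(source, truth_set):
    """Share of truth-set users the source covers and labels correctly."""
    overlap = [user_id for user_id in truth_set if user_id in source]
    if not overlap:
        return 0.0
    correct = sum(
        1 for user_id in overlap if source[user_id] == truth_set[user_id]
    )
    return correct / len(overlap)

# Verified age bands, e.g. from a credit-checked contract (assumed).
truth_set = {"u1": "18-24", "u2": "35-44", "u3": "25-34", "u4": "55+"}

# Unverified deterministic data: whatever users typed into a signup form.
self_declared = {"u1": "18-24", "u2": "18-24", "u3": "55+", "u4": "55+"}

# Probabilistic data: age bands inferred by a model.
modelled = {"u1": "18-24", "u2": "35-44", "u3": "25-34", "u4": "45-54"}

self_declared_score = accuracy(self_declared, truth_set)
modelled_score = accuracy(modelled, truth_set)

print(f"Self-declared: {self_declared_score:.0%}")
print(f"Modelled:      {modelled_score:.0%}")
```

In this made-up sample the modelled data outscores the form data, which is the kind of outcome the label "deterministic" alone would never reveal. The result is only as trustworthy as the truth set it is scored against.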
The heightened value of data in an AI and machine learning world
As the Watson Health example shows, any great tech needs exceptional data behind it to process.
Owners of high-quality data should be savvy to this and, if they are not already doing so, should explore data monetisation strategies.
This article was first published here