This year 160,000 people enrolled in a Stanford online artificial intelligence course (HT: Chris Blattman). While those who completed the course received a certificate including their rank in the class, they didn't get any credit from Stanford. Why did so many people enroll?
My own prior is that tertiary education performs three roles: consumption, signaling, and training. Education is consumption because it is fun to learn new things, particularly if those things are presented well. A signaling aspect of college is necessary to explain why employers are willing to pay a large premium for a humanities major over a mere high school graduate. There is training in college as well. After four years of term papers I was able to write down my thoughts without embarrassing myself.
So which apply: Is the Stanford certificate worth something on the labor market, is it fun to learn AI basics from a couple of top researchers, or is there some productive use for the course material?
Scientists: can't find enough subjects for your upcoming experiment? Why not try the opt-out method? Instead of waiting for people to volunteer, automatically enroll everyone in your experiment, and give them the option to volunteer not to be in the experiment. When treating a cardiac arrest victim, paramedics in Seattle will flip a coin to determine which type of care to provide. The only way to opt out, apparently, is to wear a bright red bracelet with the words "no study" printed on it. (ht: David McKenzie)
"Data crunching by Canadian Tire, for instance, recently enabled the retailer's credit card business to create psychological profiles of its cardholders that were built upon alarmingly precise correlations. Their findings: Cardholders who purchased carbon-monoxide detectors, premium birdseed, and felt pads for the bottoms of their chair legs rarely missed a payment. On the other hand, those who bought cheap motor oil and visited a Montreal pool bar called "Sharx" were a higher risk. "If you show us what you buy, we can tell you who you are, maybe even better than you know yourself," a former Canadian Tire exec said.
This smells like a kitchen-sink regression to me. If you do a simple regression of some dependent variable (risk level of credit card user) on a ton of independent variables which have no true relationship with risk (chair pad material, preferred bird seed), some will come out as statistically significant due to random chance.
As a fun little exercise, suppose that we have 1000 observations of credit card users and 500 independent variables relating to purchases which actually have nothing to do with risk. Suppose that risk varies uniformly from zero to one, and that the purchase variables also vary uniformly from zero to one. I wrote a little MATLAB program to simulate this situation. I ran the experiment 100 times, and the median number of statistically significant (95% confidence) independent variables was 25 (surprising, right?). Since which particular variables come out significant is random, they might be weird things like "number of pepperoni pizzas ordered from Pizza Hut" or "visits to paid public toilets". However, this sort of data would be useless for prediction.
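The median of 25 is roughly what a back-of-the-envelope calculation predicts. The sketch below uses my own notation, not anything from the simulation: m is the number of irrelevant regressors, alpha is the significance level, and V is the number of regressors that come out significant by chance. It assumes each test falsely rejects with probability close to alpha, which should be approximately true with this many observations, and the last line additionally assumes the tests are roughly independent.

```latex
\begin{align*}
  \mathbb{E}[V] &= m\,\alpha = 500 \times 0.05 = 25, \\
  \Pr(V \ge 1)  &= 1 - (1 - \alpha)^{m} = 1 - 0.95^{500} \approx 1 .
\end{align*}
```

In other words, with this many candidate variables, finding a couple dozen "significant" ones is the expected outcome when nothing is going on, not evidence of a real relationship.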
Maybe the credit card companies have some more sophisticated tricks up their sleeves, but I am having a hard time relating my love of Montreal pool bars to my credit card payment behavior.
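% Regress pure noise on pure noise and count the "significant" regressors.
% Requires regress from the Statistics and Machine Learning Toolbox.
trys = 100;            % number of simulated experiments
obs = 1000;            % observations per experiment
regs = 500;            % columns of random data: column 1 is "risk", the rest are regressors
avg = zeros(trys,1);   % number of significant regressors found in each experiment
for k = 1:trys
    dat = rand(obs,regs);   % every variable is i.i.d. uniform on [0,1]
    % the second output of regress holds the 95% confidence intervals of the coefficients
    [~,ci] = regress(dat(:,1),[ones(obs,1),dat(:,2:end)]);
    % a regressor is significant if its interval excludes zero (row 1 is the intercept)
    found = find((ci(2:end,1)<0 & ci(2:end,2)<0)|(ci(2:end,1)>0 & ci(2:end,2)>0));
    avg(k) = size(found,1);
end
median(avg)   % median number of spuriously significant regressors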