RUINING THE WEB FOR EVERYONE
I first did this with Match.com, but I was frustrated by the lack of accuracy in its search. Thankfully I found Plenty of Fish and its easy-to-use search function, which has RESTful URLs and also allows searching by alcohol, tobacco and drug use. I think I’m in data scraping heaven.
First, some ground rules. I searched PoF for each age and for each gender. For example, women 26 to 26 within 10 miles of 10023. Plenty of Fish spits back a ‘number of results’ on the search, but only for searches that return fewer than 700 results. Once you hit 700, all it says is “700+”, whether it’s 721 or 8045. So we have to keep it under 700 for every single-age search. And since 28 is the most popular age, especially for men, in some cases this meant I had to add extra attributes to the search. For example, in LA and NYC I had to restrict the search to individuals who listed a height, as this got me down to ~600 men in the 28yo category.
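Here is a minimal Ruby sketch of those ground rules, not the project's actual code. The parameter names (`min_age`, `height_listed` and so on), the gender values, and the `fetch` callable are all hypothetical stand-ins, since the real search URL format isn't shown here. The idea is just: search one age at a time, treat “700+” as “unknown”, and retry with a narrowing filter when that happens.

```ruby
# Hypothetical sketch: parameter names and the count text format are
# assumptions, not Plenty of Fish's actual search interface.

# Turn the result-count text into an integer, or nil when the site
# only reports a capped value such as "700+".
def parse_count(text)
  cleaned = text.strip
  return nil if cleaned.end_with?("+") # real count is unknown
  Integer(cleaned.delete(","), 10)
end

# Build one single-age search: min and max age are the same value.
def search_params(age:, gender:, zip:, miles: 10, extra: {})
  base = { min_age: age, max_age: age, gender: gender, zip: zip, miles: miles }
  base.merge(extra)
end

# Try the plain search first, then each fallback filter in turn, and
# return the first uncapped count along with the filters that produced it.
# `fetch` is any callable that takes a params hash and returns count text.
def count_for(fetch, age:, gender:, zip:, fallbacks: [{ height_listed: true }])
  attempts = [{}] + fallbacks
  attempts.each do |extra|
    params = search_params(age: age, gender: gender, zip: zip, extra: extra)
    count = parse_count(fetch.call(params))
    return { count: count, filters: extra } if count
  end
  nil # still capped after every fallback
end
```

Returning the filters alongside the count matters: a number obtained with the height restriction only covers people who listed a height, so it should be flagged rather than mixed silently with unrestricted counts.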
For each city, I then repeated the searches to count the users who claimed social or heavy drinking, smoking, and drug use, each one independently. After that it was a simple matter of graphing the data and pulling out the interesting bits.
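Turning those counts into something graphable is just division. This is a rough sketch with a helper name of my own choosing (`habit_shares`), not code from the project. It assumes each habit count came from the same search as the total, with exactly one habit filter added.

```ruby
# Share of users in one age/gender bucket who claim each habit.
# `total` is the bucket's overall count; `habit_counts` maps a habit name
# to the count from the same search with that one habit filter added.
def habit_shares(total, habit_counts)
  raise ArgumentError, "total must be positive" unless total.positive?
  habit_counts.transform_values { |n| (n.to_f / total).round(3) }
end
```

If the total for a bucket had to be narrowed (say, to people who listed a height), the habit searches for that bucket need the same restriction, otherwise the numerator and denominator describe different groups.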
Take a look at the project, Plenty Of Stats. The Ruby scraping code is completely open source, and the data is licensed under Creative Commons.
Nothing too surprising: there’s a huge gender disparity in favor of women, 18-year-olds claim to drink and smoke at twice the national rate, and not a lot of people are willing to admit to social drug use, although studies show it’s much higher.
What I did find interesting was that San Francisco has a ~2.25:1 male-to-female ratio on Plenty Of Fish. I guess I just assumed that such a wired city would have more gender equality in the online user space. But I’m guessing the effect of majority-male tech firms has made for a massive gender gap in the Bay Area. The best odds by far are in NYC, although it’s still not nearly equal.