Friday, October 05, 2012

Tags on top 100 UK sites

I took the top 100 UK websites from Alexa and searched each homepage for web analytics tags. For nine sites I could not find any JS files or any of the tags listed below, or the site returned an error.

I found that 23% use DoubleClick, 44% use Google Analytics, 9% use Nielsen, 5% use Omniture and 9% use comScore. TagMan is used by only one site. Five sites carry three of these tags each.
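The check described above can be sketched roughly as follows, assuming the homepage HTML has already been downloaded. The URL fragments in `TAG_SIGNATURES` are my own assumptions about how each vendor's tag usually shows up in page source, not the exact list used for the numbers above, and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical URL fragments used to recognise each vendor's tag in homepage HTML.
# These are illustrative assumptions, not an authoritative or complete list.
TAG_SIGNATURES = {
    "Google Analytics": ["google-analytics.com"],
    "DoubleClick": ["doubleclick.net"],
    "Nielsen": ["imrworldwide.com"],
    "Omniture": ["omtrdc.net", "2o7.net", "s_code.js"],
    "comScore": ["scorecardresearch.com"],
    "TagMan": ["tagman"],
}


def detect_tags(html: str) -> set:
    """Return the vendors whose signature appears anywhere in the page source."""
    lowered = html.lower()
    return {
        vendor
        for vendor, patterns in TAG_SIGNATURES.items()
        if any(p in lowered for p in patterns)
    }


def tag_usage(pages: dict) -> dict:
    """Percentage of all sites in `pages` (site -> HTML) that use each vendor."""
    counts = Counter()
    for html in pages.values():
        counts.update(detect_tags(html))
    total = len(pages)
    return {vendor: 100.0 * counts[vendor] / total for vendor in TAG_SIGNATURES}


def sites_with_at_least(pages: dict, n: int) -> list:
    """Sites whose homepage carries n or more of the known tags."""
    return sorted(site for site, html in pages.items() if len(detect_tags(html)) >= n)
```

Plain substring matching is crude (a tag loaded by another script will be missed), which is one reason some sites end up with no recognisable tags at all.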

Thursday, October 04, 2012

Scrape random movies from IMDb

I have created a scraper for IMDb. As it downloads each movie, it builds a graph of everything the page refers to (actors/directors, keywords, languages, etc.): basically features that could be fed into a recommender or a similar system.
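A breadth-first crawl that builds such a graph might look roughly like this. It is a minimal sketch, not the scraper itself: `LINK_RE` is my own approximation of IMDb's relative URL shapes, and `fetch` is a hypothetical callable (node path to HTML, or `None`) injected so the example runs offline; in practice it would wrap an HTTP download.

```python
import re
from collections import deque

# Approximate relative-link shapes for the pages we follow (an assumption,
# not IMDb's documented URL scheme).
LINK_RE = re.compile(
    r'href="/(title/tt\d+|name/nm\d+|company/co\d+'
    r'|(?:keyword|genre|country|language)/[^/"?]+)/?'
)


def extract_links(html: str) -> list:
    """Unique feature links on a page, sorted so the crawl order is deterministic."""
    return sorted(set(LINK_RE.findall(html)))


def crawl(start: str, fetch, max_pages: int = 100) -> dict:
    """Breadth-first crawl from `start`, returning node -> set of linked nodes."""
    graph = {}
    queue = deque([start])
    seen = {start}
    while queue and len(graph) < max_pages:
        node = queue.popleft()
        html = fetch(node)
        if html is None:  # page unavailable: keep it as a leaf node
            continue
        links = extract_links(html)
        graph[node] = set(links)
        for nb in links:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return graph
```

The `seen` set stops the crawler from downloading the same actor or keyword twice, and `max_pages` bounds how far it wanders from the starting movie.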

Output from my scraper looks like this:

genre/Family name/nm0784124/ name/nm1293791/ name/nm0265620/ name/nm0754781/ keyword/bear country/jp language/ja genre/Romance genre/Mystery name/nm0130215/ name/nm0280541/ name/nm0302384/ name/nm0309129/ name/nm0130191/ name/nm0560478/ name/nm0001607/ name/nm0908001/ name/nm0912604/ name/nm0133597/ name/nm0489010/ name/nm0005166/ name/nm0593411/ name/nm0929869/ keyword/soap keyword/tragedy keyword/betrayal keyword/shipper country/us language/en genre/Drama name/nm0430267/ name/nm3136900/ name/nm0018495/ name/nm0068168/ name/nm0231191/ name/nm0263099/ name/nm0341647/ name/nm0367731/ name/nm0792129/ name/nm0909848/ country/gb language/en company/co0248652/
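To use output in this format as recommender features, one option is to parse each `type/value` token into a set and compare movies by set overlap. The sketch below assumes the six prefixes seen in the sample are the only feature types; Jaccard similarity is my own illustrative choice, not something the scraper does.

```python
import re
from collections import defaultdict

# Feature prefixes seen in the sample output (assumed to be the complete set).
FEATURE_TYPES = ("genre", "name", "keyword", "country", "language", "company")
_TYPES = "|".join(FEATURE_TYPES)
# A value runs until the next "<type>/" prefix (so multi-word keywords survive);
# a trailing slash, as on name/nm.../ IDs, is dropped.
FEATURE_RE = re.compile(r"\b(%s)/(.+?)/?(?=\s+(?:%s)/|\s*$)" % (_TYPES, _TYPES))


def parse_features(line: str) -> dict:
    """Group the tokens of one output line by feature type."""
    features = defaultdict(list)
    for kind, value in FEATURE_RE.findall(line):
        features[kind].append(value)
    return dict(features)


def feature_set(line: str) -> set:
    """Flatten parsed features back into 'type/value' strings."""
    return {f"{kind}/{v}" for kind, values in parse_features(line).items() for v in values}


def jaccard(a: set, b: set) -> float:
    """Share of features two movies have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

For example, two movies that share a genre and a language but differ in country score 0.5.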