Friday, October 05, 2012

Tags on top 100 UK sites

I took the top 100 UK websites from Alexa and searched each homepage for web analytics tags. For nine sites I could not find any JS files or named tags, or the site returned an error.

I found that 44% use Google Analytics, 23% use DoubleClick, 9% use Nielsen, 9% use Comscore and 5% use Omniture. Tagman is used by only one site. Five sites have three tags each:

http://cnet.com
http://dailymail.co.uk
http://independent.co.uk
http://orange.co.uk
http://telegraph.co.uk
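
A scan like this could be sketched roughly as below. This is only an illustration, not the code used for the numbers above: the substrings in `TAG_SIGNATURES` are my assumptions about what each vendor's tag typically contains, and `detect_tags` and `tag_share` are hypothetical helper names. Fetching the homepages is left out, so the functions work on HTML that has already been downloaded.

```python
from collections import Counter

# Assumed substrings that suggest a vendor's tag is present on a page.
# These are illustrative guesses, not a verified or complete list.
TAG_SIGNATURES = {
    "Google Analytics": ("google-analytics.com",),
    "DoubleClick": ("doubleclick.net",),
    "Nielsen": ("imrworldwide.com",),
    "Omniture": ("omniture", "s_code.js"),
    "Comscore": ("scorecardresearch.com",),
    "Tagman": ("tagman",),
}


def detect_tags(html):
    """Return the set of vendor names whose signature occurs in the HTML."""
    lowered = html.lower()
    return {
        name
        for name, needles in TAG_SIGNATURES.items()
        if any(needle in lowered for needle in needles)
    }


def tag_share(pages):
    """Map each vendor to the percentage of pages that carry its tag.

    `pages` is a dict of site name -> homepage HTML.
    """
    if not pages:
        return {}
    counts = Counter()
    for html in pages.values():
        counts.update(detect_tags(html))
    return {name: 100.0 * counts[name] / len(pages) for name in TAG_SIGNATURES}
```

Plain substring matching is crude: it will miss tags loaded indirectly through a tag manager and can match a vendor name that only appears in page text, so the percentages it produces are an approximation.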

Thursday, October 04, 2012

Scrape random movies from IMDB

I have created a scraper for IMDB. It builds a graph as it downloads each title's referenced actors and directors, keywords, languages and so on. These are basically features which could be fed into a recommender or similar system.

See http://pastebin.com/ezkW0Ru1

The output looks like this:


http://www.imdb.com/title/tt1192995:genre/Animation genre/Family name/nm0784124/ name/nm1293791/ name/nm0265620/ name/nm0754781/ keyword/bear
http://www.imdb.com/title/tt1030901:genre/News country/jp language/ja
http://www.imdb.com/title/tt1016481:genre/Drama genre/Romance genre/Mystery name/nm0130215/ name/nm0280541/ name/nm0302384/ name/nm0309129/ name/nm0130191/ name/nm0560478/ name/nm0001607/ name/nm0908001/ name/nm0912604/ name/nm0133597/ name/nm0489010/ name/nm0005166/ name/nm0593411/ name/nm0929869/ keyword/soap keyword/tragedy keyword/betrayal keyword/shipper country/us language/en
http://www.imdb.com/title/tt1294723:genre/Short genre/Drama name/nm0430267/ name/nm3136900/ name/nm0018495/ name/nm0068168/ name/nm0231191/ name/nm0263099/ name/nm0341647/ name/nm0367731/ name/nm0792129/ name/nm0909848/ country/gb language/en company/co0248652/
http://www.imdb.com/title/tt1335935:genre/Documentary
http://www.imdb.com/title/tt1209362:
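
To use lines like these in a recommender, they first have to be loaded back into a graph. The sketch below is one way to do that and is not part of the scraper itself: `parse_line` and `build_graph` are hypothetical names, and it assumes every line has the form `title-url:feature feature ...` with no colon inside any feature.

```python
from collections import defaultdict


def parse_line(line):
    """Split one output line into (title_url, [features]).

    The URL itself contains a colon ("http://"), so split on the LAST
    colon; this assumes features never contain one.
    """
    title, _, rest = line.strip().rpartition(":")
    return title, rest.split()


def build_graph(lines):
    """Build an undirected bipartite graph: title <-> feature."""
    graph = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        title, features = parse_line(line)
        graph[title]  # touch the key so titles without features still appear
        for feature in features:
            graph[title].add(feature)
            graph[feature].add(title)
    return graph
```

With the graph in this shape, `graph["genre/Drama"]` gives every scraped title in that genre, and two titles can be compared by the overlap of their feature sets.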