Saturday, November 06, 2010

Pop influencers

I did another graph similar to the jazz one, this time starting with Lady Gaga. Again I keep only the influencers (indegree >= 1), which gives this network.
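A minimal sketch of the "keep only influencers" step. It assumes the scraped data is a list of (artist, influencer) pairs, so a node's indegree is the number of times it is cited as an influence. The function name `keep_influencers` and the example labels are hypothetical, and keeping only the edges between retained nodes is one reading of the filter.

```python
from collections import Counter


def keep_influencers(edges, min_indegree=1):
    """Keep edges whose endpoints are both cited as an influence.

    edges: list of (artist, influencer) pairs, i.e. artist -> influencer,
    so a node's indegree is how often it appears as an influencer.
    """
    indegree = Counter(influencer for _, influencer in edges)
    keep = {node for node, deg in indegree.items() if deg >= min_indegree}
    # Induced subgraph: drop any edge touching a node with indegree 0
    return [(a, b) for a, b in edges if a in keep and b in keep]


# Hypothetical labels: the starting artist is never cited, so it drops out
edges = [("start", "x"), ("start", "y"), ("x", "z")]
print(keep_influencers(edges))  # [('x', 'z')]
```

Note that the starting artist itself disappears, since nobody in the scraped set cites them.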

Jazz network

I have written a Python script which scrapes 'influenced by' relationships. My starting point was one of the greatest jazz musicians of all time: John Coltrane. I then put the results into a DL file and visualised it, with node size set by indegree (i.e. how often a musician is cited as an influence). Louis Armstrong and Coleman Hawkins are key influencers. The drawback of this approach is that it mostly goes back in time, so if you started with a younger jazz musician you should capture more of jazz's golden years in the 1960s.
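Here is a minimal sketch of the export step only (not the original scraping script), assuming UCINET's DL "edgelist1" format with embedded labels and edges stored as (artist, influencer) pairs. `to_dl` is a hypothetical helper; spaces in names are replaced with underscores so each label stays a single token.

```python
def to_dl(edges):
    """Return a UCINET DL edgelist1 file body for (artist, influencer) pairs."""
    def label(name):
        # Keep each label a single token by replacing spaces
        return name.replace(" ", "_")

    nodes = {n for edge in edges for n in edge}
    rows = ["dl n=%d format=edgelist1" % len(nodes), "labels embedded:", "data:"]
    rows += ["%s %s" % (label(a), label(b)) for a, b in edges]
    return "\n".join(rows) + "\n"


# Placeholder names; write the result to a .dl file to load it for visualising
print(to_dl([("Musician A", "Musician B"), ("Musician A", "Musician C")]))
```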


Tuesday, November 02, 2010


Here is a Spotify playlist of Lang Lang, Richter, Gould and some others playing Beethoven's Appassionata. Gould plays it the slowest, whereas Serebriakov plays it the fastest.

Sunday, October 31, 2010

Relevant ads

I am getting Spanish ads from the US Dept of Agriculture. They are spending their money wisely.

Friday, October 08, 2010

Mahler's Symphonies

I have assembled Lebrecht's list of great Mahler symphony recordings in one Spotify list:
Mahler Lebrecht

Thursday, September 23, 2010

Rock Paper Scissors bot

I read this post and wrote some Python code which uses two of the hints given in its diagram. In my simulation the bot wins 57% of the games that are not drawn.

I simulate the human's throws using the percentages given (29.6% paper, 35.4% rock, 35% scissors), so the bot never throws scissors unless it is responding to the human throwing the same thing twice in a row.

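#simulate a human player
#and answer with best strategy

import random


def humanthrow():
    # one throw using the human percentages: 29.6% P, 35.4% R, 35% S
    rand = random.random()
    if rand < .296:
        return 'P'
    elif rand < .296 + .354:
        return 'R'
    return 'S'


def humandraw(hist):
    d = humanthrow()
    if len(hist) > 1 and hist[-1] == hist[-2]:
        # after two identical throws the human switches, so redraw
        # with a fresh random number until d differs
        while d == hist[-1]:
            d = humanthrow()
    return d


def winner(p1, p2):
    # 0 = draw, 1 = p1 wins, 2 = p2 wins
    if p1 == p2: win = 0
    elif p1 == 'R' and p2 == 'S': win = 1
    elif p1 == 'S' and p2 == 'R': win = 2
    elif p1 == 'P' and p2 == 'R': win = 1
    elif p1 == 'R' and p2 == 'P': win = 2
    elif p1 == 'S' and p2 == 'P': win = 1
    elif p1 == 'P' and p2 == 'S': win = 2
    return win


games = 100000  # number of simulated games (any value works)
wins = 0
draws = 0
hist = []       # the human player's throws so far

for g in range(games):
    # bot's default: humans favour R and S, so play P or R
    a = 'P' if random.random() < .5 else 'R'
    if len(hist) > 1 and hist[-1] == hist[-2]:
        if hist[-1] == 'P':    # will draw S or R
            a = 'R' if random.random() < .5 else 'P'
        elif hist[-1] == 'R':  # will draw P or S
            a = 'R' if random.random() < .5 else 'S'
        elif hist[-1] == 'S':  # will draw P or R
            a = 'S' if random.random() < .5 else 'P'
    d = humandraw(hist)    # the human throws without seeing a
    win = winner(a, d)     # bot is p1, so win == 1 is a bot win
    hist.append(d)
    #print(a, d, win)
    if win == 1: wins += 1
    elif win == 0: draws += 1

print(games, wins, draws)
print(float(wins) / (games - draws))  # share of non-drawn games the bot wins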

Monday, September 13, 2010

Google's new data structure

Google Caffeine — the remodeled search infrastructure rolled out  across Google's worldwide data center network earlier this year — is not based on MapReduce, the distributed number-crunching platform that famously underpinned the company's previous indexing system. As the likes of Yahoo!, Facebook, and Microsoft work to duplicate MapReduce through the open source Hadoop project, Google is moving on.

Saturday, August 14, 2010

Cab fares in London

I have done some calculations based on the TfL examples given on their website: the median price per mile during the day (excluding the £2.20 minimum) is £3.44. TfL give you a fare range, so I have used the mid fare.

You can make your own estimates using this table or the Excel file in the link below.

Tariff 1: Monday to Friday 06:00 - 20:00
Tariff 2: Monday to Friday 20:00 - 22:00, Saturday and Sunday 06:00 - 22:00
Tariff 3: every night 22:00 - 06:00, public holidays
Mid price per mile = (mid fare - £2.20) / distance

Distance | Approx. journey time | Tariff 1 range | Tariff 1 mid | Tariff 1 mid per mile | Tariff 2 range | Tariff 2 mid | Tariff 2 mid per mile | Tariff 3 range | Tariff 3 mid | Tariff 3 mid per mile
1 mile | 5 - 12 mins | £4.60 - £8.40 | £6.50 | £4.30 | £5.00 - £8.60 | £6.80 | £4.60 | £5.20 - £8.60 | £6.90 | £4.70
2 miles | 8 - 15 mins | £7.20 - £11.20 | £9.20 | £3.50 | £7.20 - £11.20 | £9.20 | £3.50 | £8.00 - £12.40 | £10.20 | £4.00
4 miles | 15 - 30 mins | £11 - £19 | £15.00 | £3.20 | £13 - £19 | £16.00 | £3.45 | £15 - £23 | £19.00 | £4.20
6 miles | 20 - 40 mins | £17 - £28 | £22.50 | £3.38 | £19 - £28 | £23.50 | £3.55 | £24 - £34 | £29.00 | £4.47
Heathrow to Central London (taken as 19 miles) | 30 - 60 mins | £43 - £75 | | | £43 - £75 | | | £43 - £75 | |
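The per-mile figures and the £3.44 median can be reproduced with a few lines. This is a sketch assuming a flat model of fare = £2.20 + rate × miles, which is only an approximation of the real meter; the names `per_mile`, `DAY_RATE` and `estimate_fare` are made up for the example. Using the unrounded 6-mile rate (3.383...) gives a median of about 3.4417, which still rounds to £3.44.

```python
from statistics import median

FLAG_FALL = 2.20  # the £2.20 minimum, excluded from the per-mile rate


def per_mile(mid_fare, miles):
    """Price per mile once the £2.20 minimum is taken off."""
    return (mid_fare - FLAG_FALL) / miles


# Tariff 1 (weekday daytime) mid fares from the table: miles -> £
TARIFF1_MID = {1: 6.50, 2: 9.20, 4: 15.00, 6: 22.50}

DAY_RATE = median(per_mile(fare, miles) for miles, fare in TARIFF1_MID.items())


def estimate_fare(miles, rate=DAY_RATE):
    """Rough daytime fare: minimum charge plus a flat per-mile rate."""
    return FLAG_FALL + rate * miles


print(round(DAY_RATE, 2))  # 3.44
print(round(estimate_fare(5), 2))
```

So a 5-mile daytime trip comes out at roughly £19.40 under this model.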




Tuesday, July 27, 2010

Email broadcast checklist

Here is a checklist of what to ask before you send out an email campaign:
  • Are your target counts right?
  • Do you have text and HTML versions?
  • Are your pictures uploaded?
  • Do you have mirror and unsubscribe links?
  • Do you have the right ratio of copy to pictures?
  • Do you avoid spam keywords?
  • Is tracking set up?
  • Have you got an auto-reply for the reply-to address?
  • Is personalisation set up?
  • Have you taken steps not to be blacklisted?
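Two items on the list (text and HTML versions, unsubscribe links) can be handled when the message is built. Here is a sketch using Python's standard `email.message.EmailMessage`; the helper name `build_campaign_email` and the example.com addresses are assumptions, and adding a `List-Unsubscribe` header (RFC 2369) is one common choice alongside the visible link in the body.

```python
from email.message import EmailMessage


def build_campaign_email(sender, recipient, subject,
                         text_body, html_body, unsubscribe_url):
    """Build a multipart/alternative message with text and HTML versions."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # Header many mail clients use to offer a one-click unsubscribe
    msg["List-Unsubscribe"] = "<%s>" % unsubscribe_url
    msg.set_content(text_body)                      # plain-text version
    msg.add_alternative(html_body, subtype="html")  # HTML version
    return msg


msg = build_campaign_email(
    "news@example.com", "reader@example.com", "Our news",
    "Hello! Unsubscribe: https://example.com/unsubscribe",
    "<p>Hello! <a href='https://example.com/unsubscribe'>Unsubscribe</a></p>",
    "https://example.com/unsubscribe",
)
print(msg.get_content_type())
```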

Tuesday, June 29, 2010

Lines exe

The exe in the linked zip file prints the first or last rows of a very large file. Extract all the files into the same folder as your large file, then in DOS run
lines [your file]
This will print 10 rows and also count the number of rows.
You can change how many rows you see, and whether they come from the top or the bottom, like this:
lines -l 20 -t B [your file]
where 20 is the number of rows and B means bottom.

If you want the output directed to a new file, run
lines [your file] > [new file]
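The exe's source is not shown, so here is a rough Python equivalent of the same behaviour (not the exe itself). It reads the file once, keeping either the first n rows or a rolling buffer of the last n, while counting every row, so memory stays small even for a very large file. `head_or_tail` and `show` are hypothetical names; `n` and `where` mirror the `-l` and `-t` options.

```python
from collections import deque
from itertools import islice


def head_or_tail(rows, n=10, where="T"):
    """Return (first or last n rows, total row count) in a single pass."""
    rows = iter(rows)
    if where.upper() == "B":
        tail = deque(maxlen=n)  # keeps only the last n rows seen
        count = 0
        for row in rows:
            tail.append(row)
            count += 1
        return list(tail), count
    head = list(islice(rows, n))
    count = len(head) + sum(1 for _ in rows)  # count the rest without storing
    return head, count


def show(path, n=10, where="T"):
    """Print the selected rows, then the total number of rows."""
    with open(path, encoding="utf-8", errors="replace") as f:
        selected, count = head_or_tail(f, n, where)
    for row in selected:
        print(row, end="")
    print("%d rows" % count)
```

For example, `show("big.csv", 20, "B")` would correspond to `lines -l 20 -t B big.csv`.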

Friday, June 25, 2010

Government organisations by staff cost (salary)

I have had a look at the workforce cost data release. I calculated a cost per FTE and kept only organisations with more than 100 FTE. The costs seem rather high; I guess they might include expenses and pension contributions. Nevertheless, ODA and NDA get the most. The highest-ranking proper department is International Development.

Organisation name | Main/parent/sponsoring department (where applicable) | Cost per FTE
Nuclear Decommissioning Authority | DECC | £94,634.83
Partnerships for Schools | Department of Education | £86,021.51
Government Actuary's Department | Government Actuary's Department | £79,230.77
Personal Accounts Delivery Authority | Department for Work and Pensions | £72,138.96
Office of Rail Regulation | DfT | £64,484.19
Department for International Development | Department for International Development | £64,410.68
NS&I | H M Treasury | £63,750.32
SEEDA | BIS | £63,739.65
Cabinet Office | Cabinet Office | £63,322.03
The Pensions Regulator | DWP | £60,993.98
Office of Government Commerce | H M Treasury | £60,389.45
Audit Commission | Communities and Local Government | £59,457.83
Pension Protection Fund | DWP | £59,306.57
Young People's Learning Agency | DFE | £59,174.88
Tenant Services Authority | Communities and Local Government | £57,858.23
Dept. of Health (core) | Dept. of Health | £57,550.96
Treasury Solicitors Department | Treasury Solicitors Department | £57,225.06
Advantage West Midlands | BIS | £57,012.42
SOCA (including CEOP) | Home Office | £56,919.73
CLG - Central Department | Communities and Local Government | £56,716.60
Export Credits Guarantee Department | Export Credits Guarantee Department | £56,398.10
Core DECC | Department of Energy and Climate Change | £56,393.67
Department for Culture, Media and Sport | Department for Culture, Media and Sport | £56,235.54
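The ranking above boils down to dividing total staff cost by FTE, filtering on size and sorting. A minimal sketch, assuming each row of the release can be reduced to (organisation, department, total cost, FTE); the column layout and the figures in the example are hypothetical, not taken from the data.

```python
def cost_per_fte(rows, min_fte=100):
    """Rank organisations by staff cost per FTE.

    rows: iterable of (organisation, department, total_cost, fte).
    Organisations with min_fte FTE or fewer are dropped.
    """
    ranked = [(org, dept, cost / fte)
              for org, dept, cost, fte in rows if fte > min_fte]
    return sorted(ranked, key=lambda r: r[2], reverse=True)


# Hypothetical figures, for illustration only
rows = [
    ("Org A", "Dept X", 5_000_000, 50),    # under 100 FTE, dropped
    ("Org B", "Dept Y", 12_000_000, 200),  # £60,000 per FTE
    ("Org C", "Dept Z", 7_500_000, 150),   # £50,000 per FTE
]
for org, dept, cost in cost_per_fte(rows):
    print("{} | {} | £{:,.2f}".format(org, dept, cost))
```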

Top 10 government websites

Here are the top 10 government websites by number of unique users in March 2010. The first six are all service driven.

DoH and Justice actually attract more visitors than No10. The OPSI page is a bit of a surprise but it seems to act as host.

Unique users in March 2010, in rank order: 11,433,948; 6,975,509; 4,360,065; 1,715,043; 1,501,135; 1,447,337; 1,038,601; 1,003,197; 879,726; 800,676

Data is here

Saturday, June 12, 2010

Structured v unstructured data

Is the data on the web unstructured? To some extent it is, especially
what individuals say (buzz). The rest should be fairly structured,
because you can exploit the HTML/PHP/XML syntax. With the advent of the
semantic web it is becoming more structured still. The web is becoming
more like a neural net.
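As a small illustration of exploiting HTML syntax, here is a sketch using Python's standard-library `html.parser` to pull (link, text) pairs out of a page fragment. The `LinkExtractor` class and the sample HTML are made up for the example, and real pages are usually messier.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only keep text inside a link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


parser = LinkExtractor()
parser.feed('<p>See <a href="http://example.com/a">Page A</a> and <a href="/b">B</a>.</p>')
print(parser.links)
```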

Crisis and employment

The UK job market has not been hit as hard by the economic crisis as
some expected. On the other hand, the upturn in the US has mainly been
jobless. I think what has happened is that companies have streamlined
to increase productivity. In the UK people have been put on shorter
hours rather than being fired; in the US firing is easier. This crisis
will exacerbate the underlying trend away from unskilled jobs. We need an
educational revolution in the West, not a scaling back as suggested by
the Lib Tory coalition, to save our society from further splitting.

Monday, May 24, 2010

Giving away email addresses

Many people know that posting their email address online attracts spam, so they try to get around it by writing it as "me at gmail dot com". But the Python code below shows how easily the address can be recovered from such a string (even if people use brackets etc.), so be cautious.

def find_emails(text):
    """Return email addresses in text, even when written as 'me at gmail dot com'."""
    text2 = text
    for dot in (' dot ', ' (dot) ', ' [dot] ', ' {dot} '):
        text2 = text2.replace(dot, '.')
    for at in (' at ', ' (at) ', ' [at] ', ' {at} '):
        text2 = text2.replace(at, '@')
    #now put into list
    emails = []
    for w in text2.split():
        w = w.strip('.,;:!?()[]{}<>"\'')  # drop surrounding punctuation
        if w.find('@') > 0 and w.find('.') > 0:
            if w.rfind('@') != w.find('@'):  # in case of 2 at signs: skip
                continue
            emails.append(w)
    return emails

Friday, May 21, 2010

Google Social Graph in Python

I have started using the Google Social Graph API. I am running it in Python, and it's pretty straightforward. I use the edges-out option (edo=1) and then collect all nodes_referenced, repeating this up to the fourth degree. This way you can check what information about you or your friends is out there and can be linked up.
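The "up to the fourth degree" crawl is a breadth-first search with a depth limit. A minimal sketch; the real API request is not reproduced here. `get_referenced` is a hypothetical stand-in for an edo=1 lookup that returns the nodes_referenced URLs for a node, so any function or dictionary lookup can be plugged in.

```python
from collections import deque


def crawl(start, get_referenced, max_degree=4):
    """Breadth-first crawl of outgoing edges, up to max_degree hops.

    get_referenced(node) -> iterable of referenced nodes (hypothetical
    stand-in for the nodes_referenced of an edo=1 lookup).
    Returns {node: degree of separation from start}.
    """
    degree = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if degree[node] == max_degree:
            continue  # do not expand beyond the degree limit
        for nxt in get_referenced(node):
            if nxt not in degree:
                degree[nxt] = degree[node] + 1
                queue.append(nxt)
    return degree


# Toy graph in place of real API calls
graph = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": ["f"]}
print(crawl("a", lambda node: graph.get(node, [])))  # {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}
```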

Tuesday, May 11, 2010

Twitter Follower Hack Has Twitter Leaping to Fix, Tweeps Panic-Tweeting [Updated] - via Fast Company

"Want to force anyone on Twitter to follow you (yes, even the wondrous
Mr. Fry)? There's a hack for that, and it's damned simple. The thing
is, it looks like it's kinda, sorta, maybe broken Twitter...and
everyone has zero followers.

Gizmodo just had a piece demonstrating the hack, which they speculate
may be a layover code from Twitter's earlier days that's still in
action. It couldn't be simpler: Visit, log in to your
profile, click on "Find people" and in the search box type "accept
xxxxxx" replacing the x's with the username of your desired target. It
may throw up an error message, but it seems to work very reliably.

The act has been quickly christened Twape on the Intertubes, meaning
"Twitter rape" and it's potentially incredibly important. So
incredibly important, in fact, that the community is freaking out
(while occasionally throwing out good jokes,) and not only because of
Twape itself, but because it looks at first glance like it may have
broken the entire Twitter system. But really this is probably a sign
that Twitter may already be on the case--if you visit
right now you'll see your follower counts are at zero...though your
feed will still connect to all the right Tweeps that you were
originally following.

This, folks, is a living act to demonstrate exactly how powerful
Twitter has become. For users who make the most of Twitter's global
reach to promote their wares, connect with customers or engage in
dialog with clients and friends, the list of people following you is a
jealously monitored and important thing (even while research
repeatedly shows that it's not how many people who follow you, but who
is following you that counts.) Any notion that Twitter may somehow
lose track of this data is frightening. Fingers crossed, hey, Tweeps?

Update: Twitter's confirmed that it's on the case, and that user's
follower counts will be returning to normal. A quick scan of the feeds
shows that this is only true for a limited number of Tweeps for the
mo, but we have confidence in the system."

via Fast Company

I am six users

I browse the Internet at work using two browsers, on the iPhone, on the BlackBerry, and at home using two browsers. This means web analytics packages would count me six times if I visited the same page from all of them. Are the counts over-inflated?
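To make the arithmetic concrete, here is a sketch assuming the analytics package identifies visitors by a per-browser cookie and cannot link devices to one person. The log below is hypothetical.

```python
# Hypothetical log of one person's page views: (person, cookie id, page).
# Each browser/device keeps its own cookie, which the package is assumed
# to count as a separate "unique visitor".
visits = [
    ("me", "work-browser-1", "/page"),
    ("me", "work-browser-2", "/page"),
    ("me", "iphone", "/page"),
    ("me", "blackberry", "/page"),
    ("me", "home-browser-1", "/page"),
    ("me", "home-browser-2", "/page"),
]

unique_cookies = len({cookie for _, cookie, _ in visits})
unique_people = len({person for person, _, _ in visits})
print(unique_cookies, unique_people)  # 6 1
```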

Sunday, April 25, 2010

View slideshows

Replace 'test' in the URL and you will see a slideshow. This example skips videos and shows only CC (Creative Commons) content, so if you like something you can pause and copy the picture.

Wednesday, April 14, 2010

Tory and Labour manifesto speeches

Here are word clouds of the two manifesto speeches by Cameron and Brown (I could not locate Clegg's). As you can see, Cameron talks a lot about government, people, power and together. Brown talks about Britain, people and the future. The word "fair" doesn't actually come through that much in Brown's speech.
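Word clouds size each word by how often it appears once common filler words are removed. A minimal sketch of that counting step; the stop-word list is a tiny one assumed for illustration (word-cloud tools use much longer lists), and the sample sentence is made up, not taken from either speech.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption)
STOP_WORDS = {"the", "and", "a", "an", "of", "to", "in", "is", "that",
              "we", "it", "for", "our", "this", "will", "be"}


def word_frequencies(text, top=10):
    """Most common non-stop-words: the counts a word cloud sizes words by."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top)


speech = "We will give power to the people. People, not the government, know best."
print(word_frequencies(speech, top=3))
```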

Monday, April 05, 2010

(BN) Soros Plans Oxford University Economics Institute on Markets, Times Says

Bloomberg News, sent from my iPhone.

Soros Plans Economics Institute at Oxford University, Times Says

April 5 (Bloomberg) -- George Soros, the billionaire U.S. investor, is helping to establish an economics institute at Britain's Oxford University, the London-based Times reported.

The institute, to be headed by Professor David Hendry, a fellow of Nuffield College, will be part of the university's James Martin 21st Century School; Soros and the school itself are each contributing $5 million, the newspaper said.

The initiative is part of a campaign by Soros to push the economics profession away from the idea that markets should be left to themselves, a premise that, he believes, helped to cause the global financial and economic crisis, the Times said.

Soros hopes to set up further institutes at universities in Germany, France, China, Italy and the U.S., the newspaper said, adding that Oxford was picked for the first because the financier sees Britain as more open to fresh economic thinking than the U.S.

Soros's funding is being channeled through the New York-based Institute for New Economic Thinking, or Inet, established last year; Robert Johnson, a former managing director at Soros Fund Management LLC who now heads Inet, said the economic crisis and economists' failure to predict it show that a broader approach to economics is necessary, taking in history, psychology, natural science and even literature, the Times reported.


Saturday, March 27, 2010

Google public data

Google has launched its own public data tool. It's still quite basic and mainly has EU and US statistics, but it's a nice addition. Here it shows that the UK, which always used to have lower unemployment than Germany, now has higher unemployment.

This week

Diane Abbott, who appears on BBC1's This Week, is getting increasingly annoying. She interrupts guests when she doesn't agree and has an awful temper. Her dress sense, which was never great, also seems to be getting worse. I did actually like her once, but she seems to be a really difficult person.

Monday, March 22, 2010

Movies of the Decade

You can browse the movies of each decade on IMDb by replacing the number in the following URL: ,1949&title_type=feature&num_votes=10000,&sort=user_rating,desc

40s Casablanca
50s 12 Angry Men
60s Good, Bad, Ugly
70s Godfather

SAS search macro

I have written a search macro in SAS which performs a text search on several variables. It supports AND or OR logic, although only one at a time. Let me know what you think. It can be downloaded from Google Docs.

%search(dset, str, vars, or=N, case=N, word=Y, print=Y, out=)

    dset     search data set
    str      characters to search for
    vars     variables to search in
    or=N     N=and, Y=or
    case=N   Y=case sensitive
    word=Y   Y=look for whole words, N=look for words containing the string
    print=Y  Y=prints
    out=     search output data set

%search(one,dirk nachbar,firstname lastname) searches for dirk AND nachbar (not case sensitive) in variables firstname and lastname in table one.
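For readers without SAS, here is a rough Python analogue of the same options, to show what AND/OR, case sensitivity and whole-word matching mean in practice. It is a sketch under assumptions, not a port of the macro: it treats each observation as a dict, joins the searched variables into one string, and uses regular-expression word boundaries for `word=Y`. The function name `search` and its `or_` parameter (`or` is a Python keyword) are hypothetical.

```python
import re


def search(rows, text, variables, or_=False, case=False, word=True):
    """Filter rows (dicts) on space-separated terms in the given variables.

    or_=False: every term must match (AND); or_=True: any term (OR).
    case=False: case-insensitive; word=True: whole words only.
    """
    flags = 0 if case else re.IGNORECASE
    patterns = []
    for term in text.split():
        pattern = re.escape(term)
        if word:
            pattern = r"\b" + pattern + r"\b"  # whole words only
        patterns.append(re.compile(pattern, flags))
    combine = any if or_ else all
    hits = []
    for row in rows:
        haystack = " ".join(str(row.get(v, "")) for v in variables.split())
        if combine(p.search(haystack) for p in patterns):
            hits.append(row)
    return hits


people = [
    {"firstname": "Dirk", "lastname": "Nachbar"},
    {"firstname": "Dirk", "lastname": "Smith"},
    {"firstname": "Dirkson", "lastname": "Nachbar"},
]
print(search(people, "dirk nachbar", "firstname lastname"))
```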

Friday, January 22, 2010

Class and income

There is a lot of talk about Labour making the election about class – probably to counter the Tories' Eton image. I think they are wrong in many ways. Firstly, they equate class and income, which is nonsense. You can be friends with royalty and not be rich and vice versa. In fact a lot of entrepreneurs didn't go to private school or university – and they and their children live better than many. Secondly, Labour's record on income inequality is so poor that they shouldn't even mention this. If they couldn't fix it in 13 years, how are they going to do it in the next 4 years when less growth is there to be distributed?

Monday, January 18, 2010

Information regarding Ocado deliveries from 18th January - bad comms

Ocado are usually good at customer communications, but this one really confused me. The subject says it's about changes from 18th January, but these only get mentioned in paragraph 4, where they say that I won't get a free newspaper anymore, on some rubbish green pretext which they lay out in paragraphs 1 to 3. Come on Ocado, you can do better than that.


Dear Mr Nachbar,

As an efficient and environmentally friendly way of getting groceries from field to kitchen table, it's fair to say we're way out in front. Perhaps it's no big surprise: after all, it's something we've been working on since our very first delivery back in 2002.

But "being green" isn't just the latest marketing buzzword to us. It's at the very heart of what makes us Ocado, and no-one else. Like painstakingly paring down the weight of our van to enable us to deliver to more people using the same amount of fuel. Or completely overhauling our ordering system to reduce our food waste (at just 0.3% of total sales, we've got good reason to believe that we're industry leaders - and we're not finished yet). Or even becoming the first UK supermarket to actively collect and remake its own grocery bags.

These are some of the ways we've become more efficient, but there's still more to do. As founder members of the 10:10 campaign, we've pledged to reduce our carbon footprint by 10% this year. Making our grocery bags even more environmentally friendly will help us get there. So, too, will our trials of an innovative low-emission electric delivery van.

On a separate note, we've also been talking to our friends at The Times, and we've made a joint decision to stop including a free newspaper with Ocado deliveries from 18th January 2010. We're looking at continuing our partnership in other ways, however; and, to begin with, we're offering you a free seven-day trial of the electronic version of The Times.

To sign up for your free seven-day trial, simply follow this link to The Times' website and enter the code TIMES_TRIAL2.

Here's to the start of another green year at Ocado!


Jon Rudoe
On behalf of the Ocado team

The only way to shop for groceries.

Voted favourite online supermarket in Which? magazine 2009 reader survey

Online Retailer of the Year 2005, 2007 & 2009, The Grocer Gold Awards.

Green Retailer of the Year 2009, The Grocer Gold Awards.
These prestigious awards recognise our efforts to deliver a more convenient, more sustainable alternative to supermarket shopping.

Large Retailer of the Year 2008, Online Green Awards.
We won this award in recognition of our revolutionary green approach to selling groceries

© 2010 Ocado Ltd. All rights reserved.

To help stop Ocado emails being seen as spam, please add '' to your address book.
If you would prefer not to receive any more information from Ocado please unsubscribe from our emails at any time.
The registered company address of Ocado Limited is Titan Court, 3 Bishops Square, Hatfield Business Park, Hatfield, Herts, AL10 9NE.
Registered in England. Company registration number: 3875000

Friday, January 15, 2010

JavaScript hack enables Flash on iPhone

A rather clever programmer has managed to get the iPhone to run interactive apps created using Adobe's Flash platform. And because it works inside the Safari browser, it isn't subject to the dictatorial rules of Apple's App Store.

The software is called Gordon and it doesn't actually allow Flash to run natively on the iPhone. Instead, Gordon is a JavaScript runtime, written by Tobias Schneider, which enables the browser to play and display Flash files. A runtime is a collection of software that allows the running of code inside it. A helpful analogy is a software emulator for a games console. These enable you to play the original code of, say, Super Mario World, but inside an application that runs on your PC. It's also how games such as Sonic The Hedgehog are able to run on your iPhone.


Monday, January 11, 2010

Juicy Leak: Orange France's Boss Basically Confirms the iSlate [Updated]


Apple Tablet

Oh boy, adding to the spiraling whirl of Apple Tablet rumors it looks like the number two exec at cell-phone network Orange France has basically confirmed the iSlate is real, it's coming soon, and it'll have a global launch. 

The news is coming via an interview given by Stéphane Richard on Europe 1 this morning. Richard is a senior exec at France Telecom/Orange, and as part of a longer discussion he was thrown this sudden question: "According to the weekly Le Point, in a short while your partner Apple is going to launch a tablet..." This could easily have been dismissed by Richard as idle speculation masquerading as a leading question. But Richard simply responded "oui..."

And thus came the next question, again speculative, but definitely mining for the right kind of info: "...with a Webcam..." Which got another "Oui" from Richard. And then came the interesting bit:

Interviewer: "And Orange users will be able to benefit from this too?"

Richard: "Of course!"

Basically, given three opportunities to either deflect the questions by saying "well, nobody knows..." or flatly deny the speculation, or even to squirm out of a direct answer like a politician, Richard chose not to. Or he forgot not to. Or he was allowed to leak some info by Apple. He was even enthusiastic about the matter--though note that, despite some of the mis-translated excitement about this elsewhere online, Richard didn't agree it was due in a "couple of days", instead that "quelques jours" (which aligns more with "someday soon") supports more the January announcement/March launch rumors we've been hearing recently.

Why should we pay attention to Richard's words? Because Orange is a senior player in the global cell-phone game--it controls vast grids in Europe and Africa, and with all its subsidiaries lumped together it's actually the world's fifth biggest operator. It's also a key iPhone distributor, and one that was chosen by Apple for early iPhone love before the multicarrier model really took off in Euroland.

Admittedly these are just off-the-cuff words. But if you add them to similar-feeling confidently assertive statements by The New York Times's Bill Keller back in October (noting that Keller even called it the "Apple slate") and the slew of detailed leaks that seem to be popping up at the moment, it really is going to get gadget fans' blood pumping. Particularly exciting is the fact that this concerns an imminent arrival of the gizmo in France--suggesting an international launch at the same time, which is different to the strategy Apple employed for the iPhone.


Update: Here's the video feed for your interest, in French of course. 

It's clear from the video that Richard is a little off-balanced by these questions, and you could perhaps assume he's speaking as speculatively as the interviewer himself. But he definitely mentions video calling, and the necessary infrastructure changes to support the increased data load, and that rules out that he's talking about iPhones.





Friday, January 08, 2010

Failed bomber

The failed bomb attempt on Christmas Day and the failure of the security services are quite interesting from a data point of view. Apparently the agencies had different databases which were not linked up properly. This is much like a badly run marketing or sales department. What the agencies need is a single suspect view. However, the stakes of an error are far higher than in any marketing department.
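A single view starts with linking records from separate databases on a shared match key. The sketch below assumes each source holds records with a name and a date of birth, and uses a crude normalised key (lower case, punctuation dropped, word order ignored). Real record linkage would need fuzzy matching on spelling variants and transliterations; the function names and records here are hypothetical.

```python
def normalise(name):
    """Crude match key: lower case, punctuation dropped, word order ignored."""
    cleaned = "".join(c if c.isalpha() or c.isspace() else " " for c in name.lower())
    return " ".join(sorted(cleaned.split()))


def single_view(*sources):
    """Group records from several databases into one entry per person.

    sources: (source_name, records) pairs; each record is a dict with
    'name' and 'dob' keys. Records sharing (normalised name, dob) are merged.
    """
    view = {}
    for source_name, records in sources:
        for rec in records:
            key = (normalise(rec["name"]), rec["dob"])
            view.setdefault(key, []).append((source_name, rec))
    return view


# Hypothetical records held by two agencies
agency_a = ("agency_a", [{"name": "Smith, John", "dob": "1980-01-01", "note": "visa"}])
agency_b = ("agency_b", [{"name": "John Smith", "dob": "1980-01-01", "note": "warning"}])
view = single_view(agency_a, agency_b)
print(len(view))  # 1
```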
