Monday, September 26, 2011

Wine region analysis

I have analysed some wine regions I am interested in, using wine-searcher.com data. I used all vintages of the top 500 wines (sometimes fewer) on their site. Apart from price, I also recorded Decanter awards. My analysis shows that Argentina gives the most rating points per pound (ratings are an average of wine critics and the public; I have only counted points above 70). Piemont and Mosel have the worst stats: they are expensive or win few awards. South Australia dominates the awards.


Region        Wines  Avg price  Median price  Awards  Commended   Net  Avg points per pound  Awards per wine
Alsace         2964         30            18      60          3    57                 0.935            0.019
Argentina      2458         15            10     273         78   195                 1.597            0.079
Greece          733         14            11     104         17    87                 1.460            0.119
Mosel          3099         58            18      36          8    28                 1.066            0.009
Piemont        5684         66            39     123         28    95                 0.595            0.017
S Australia    4475         43            23     724        172   552                 0.948            0.123
Stellenbosch   2425         17            11     353        100   253                 1.460            0.104
Tuscany        5547         52            30     324         92   232                 0.772            0.042
Total         27385                             1997        498  1499
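
The derived columns can be reproduced from the raw counts. The sketch below assumes that Net is Awards minus Commended and that Awards per wine is Net divided by Wines, which matches the figures in the table; the names `regions` and `awards_per_wine` are only for illustration.

```python
# (wines, awards, commended) per region, copied from the table
regions = {
    "Alsace": (2964, 60, 3),
    "Argentina": (2458, 273, 78),
    "Greece": (733, 104, 17),
    "Mosel": (3099, 36, 8),
    "Piemont": (5684, 123, 28),
    "S Australia": (4475, 724, 172),
    "Stellenbosch": (2425, 353, 100),
    "Tuscany": (5547, 324, 92),
}


def awards_per_wine(wines, awards, commended):
    """Assumed definition: (awards - commended) / wines, rounded to 3 places."""
    return round((awards - commended) / wines, 3)


# name -> (net awards, awards per wine)
summary = {
    name: (a - c, awards_per_wine(w, a, c))
    for name, (w, a, c) in regions.items()
}
total_wines = sum(w for w, _, _ in regions.values())

for name, (net, per_wine) in summary.items():
    print(name, net, per_wine)
print("Total wines:", total_wines)
```

The average points per pound cannot be recomputed this way because the per-wine ratings are not in the table.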

Thursday, August 25, 2011

Classical music documentaries

Here are some documentaries and famous performances.


Oistrakh plays Shostakovich
http://www.youtube.com/watch?v=KIp9hcMwY6o

Oistrakh - Artist of People
http://www.youtube.com/watch?v=b28ItErX4FM

Karajan
http://www.youtube.com/watch?v=lhMO12vFA3Q

Andsnes
http://www.youtube.com/watch?v=DufZ_AeR5Lc

Glenn Gould A Portrait
http://www.youtube.com/watch?v=sWU_mC_dnxw

Glenn Gould - Life and Times
http://www.youtube.com/watch?v=Jiuw44HHb4g

Richter
http://www.youtube.com/watch?v=tYzXdh8nLn4

Bach
http://www.youtube.com/watch?v=5ZX7AJHz7HM

Mahler 3
http://www.youtube.com/watch?v=bArhdP88dGE

Bernstein plays Mahler 2
http://www.youtube.com/watch?v=k9_ONIz8XKA

Karajan plays Bruckner 8
http://www.youtube.com/watch?v=2ZNNcvVd2EI

Tchaikovsky - Who killed
http://www.youtube.com/watch?v=iq5uG6MfL54

Tchaikovsky - Discovering
http://www.youtube.com/watch?v=yA5enI0J82I

Rachmaninoff
http://www.youtube.com/watch?v=vi3MU9JnL7E

Mutter
http://www.youtube.com/watch?v=iMw9AY5-K9E

Mozart
http://www.youtube.com/watch?v=sHZu9kWuB-g

Beethoven
http://www.youtube.com/watch?v=Hx_4UY83C1I

Stravinsky
http://www.youtube.com/watch?v=G85YXinRvBY

Batiashvili
http://www.youtube.com/watch?v=jZu_OJjdGDg

Jansons plays Mahler 2
http://www.youtube.com/watch?v=sHsFIv8VA7w

Rite of Spring
http://www.youtube.com/watch?v=-7QgPgG4c-g

Shostakovich 5
http://www.youtube.com/watch?v=qHCIJ_oLoHw

Rostropovich plays Dvorak
http://www.youtube.com/watch?v=xxYbF-Yzdf0

Salonen
http://www.youtube.com/watch?v=2NtaBb3fvbk

Gould plays Bach
http://www.youtube.com/watch?v=8-KyL2gMxV8

Shostakovich - Close Up
http://www.youtube.com/watch?v=DRJdd7VMyUU

Shostakovich (private)
http://www.youtube.com/watch?v=gvHeCpB0qxg

Shostakovich - Against Stalin
http://www.youtube.com/watch?v=HCUxv7YHEgU


Friday, June 17, 2011

Bar/line chart improvement

Bar/line charts are quite useful when we want to show the development of two variables on different scales or units of measurement. By default Excel gives you the left chart, which is kind of OK, but I have 'developed' a better version on the right, which limits each axis to the min and max of its series and draws a line at the midpoint of the range. (The median or average won't work here because they might not fall on the same line for both series.)

Also note how the coloured font on the axis labels removes the need for a legend box.
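
The axis settings are easy to work out by hand, but here is a minimal sketch of the arithmetic. The function name `axis_spec` and the two series are made up for illustration; the resulting numbers are what you would type into the axis options of each scale.

```python
def axis_spec(values):
    """Return the axis limits and the midpoint of the range for one series."""
    lo, hi = min(values), max(values)
    return {"min": lo, "max": hi, "mid": (lo + hi) / 2}


bars = [120, 150, 90, 180]     # hypothetical bar series (left axis)
line = [2.1, 2.4, 1.8, 2.0]    # hypothetical line series (right axis)

for name, series in (("bars", bars), ("line", line)):
    print(name, axis_spec(series))
```

Because each axis runs exactly from its own min to its own max, the midpoint of either range sits at half the chart height, so one horizontal line serves both scales. The means of the two series would generally land at different heights.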


Sunday, April 17, 2011

How to find R help online

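In R you can find the help page of a function by typing help(func). If you want to look something up quickly in the browser, put the library and function into the following URL and off you go. The port number (21798 here) belongs to R's local help server and may differ on your machine.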

http://127.0.0.1:21798/library/[lib]/html/[func].html
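
If you do this often, the URL can be filled in with a tiny helper. This is only a sketch: `r_help_url` is a made-up name, and the default port is simply the one from my session above.

```python
def r_help_url(lib, func, port=21798):
    """Build the local help URL for a function in a given library.

    The default port is an assumption taken from the URL above.
    """
    return "http://127.0.0.1:{}/library/{}/html/{}.html".format(port, lib, func)


print(r_help_url("stats", "lm"))
```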


Friday, April 15, 2011

Does Amazon filter Kindle items well?

As you might suspect, my answer is no. When searching for new Kindle books, I hardly ever find good results or recommendations. The problem is that a lot of virtually zero-priced items sit at the top of the list, but they are hardly worth the megabytes they carry. I am tired of looking at lists of cheap self-help books.

Amazon seems to use the same recommendation idea for Kindle books but actually needs to adjust it to make it relevant. How does it help me if every second recommendation is Dracula just because it's free?

Amazon needs to put quality at the top of the list.

Better Choices Better Deals

BIS has published a paper which outlines how customers can benefit from using their data to optimise their shopping. They quote the many loyalty cards and tools which are already out there. They are creating the mydata initiative where customers can access their own data and find the best deals based on their usage.

I am quite sceptical about this scheme.

  1. Data is collected for a reason by specialised companies which exploit the data (not the customer); it is their asset.
  2. Data formats from different providers/retailers are vastly different and will never be brought under one roof; if they are, the data will lose its richness. (A good example is the number of households quoted by Boots, Tesco and Nectar: they all use different definitions of what an active household is.)
  3. There is a cost involved and it is not clear who will carry it.

Nevertheless, I like the idea that customers have more rights to access their own data.

http://www.bis.gov.uk/assets/biscore/consumer-issues/docs/b/11-749-better-choices-better-deals-consumers-powering-growth.pdf

Wednesday, April 06, 2011

R and Python

Here is an R and Python syntax table. I have also included Numpy commands to make it more comparable. Where a cell is empty I could not find an equivalent.


Task             Python                               Python Numpy        R
sequence         x=[I for I in range(1,11)]                               x <- 1:10
scalar           x=1                                  x=array(1)          x <- 1
vector/list      x=[1,2]                              x=array((1,2))      x <- c(1,2)
constant vector  x=100*[1]                            x=ones(100)         x <- rep(1,100)
append           x.append(1)                                              x <- c(x,1)
matrix           x=[[1,2],[3,4]]                      mat([[1,2],[3,4]])  x <- matrix(c(1,2,3,4),ncol=2,byrow=TRUE)
column stack                                          hstack((x,y))       cbind(x,y)
row stack                                             vstack((x,y))       rbind(x,y)
for              for I in range(1,11):                                    for (I in 1:10) {
                     print I                                                  print(I)
                                                                          }
while            I=1                                                      I <- 1
                 while (I<10):                                            while (I<10) {
                     I+=1                                                     I <- I+1
                                                                          }
if               if I==10:                                                if (I==10) {
                     print 'Yes'                                              print('Yes')
                 else:                                                    } else {
                     print 'No'                                               print('No')
                                                                          }
length           len(x)                               len(x)              length(x), nrow(x)
columns          len(x[0])                            x.shape[1]          ncol(x)
dimension                                             x.shape             dim(x)
summary                                                                   summary(x)
read csv         import csv                                               mydata <- read.csv("file", header=TRUE)
                 reader=csv.reader(open('file','r'))
                 mydata=[]
                 for line in reader:
                     mydata.append(line)
write csv        import csv                                               write.csv(data, file="file", row.names = FALSE)
                 writer=csv.writer(open('file','w'))
                 for d in data:
                     writer.writerow(d)
sum              sum(x)                               sum(x)              sum(x)
select element   x[1][1]                              x[1,1]              x[2,2]
last element     x[-1]                                x[-1]               x[length(x)]
select column                                         x[:,1]              x[,2], x$Name
correlation                                           corrcoef(x,y)       cor(x,y)
mean                                                  mean(x)             mean(x)
function         def func(x):                                             func <- function(x) {
                     print x                                                  print(x)
                     return x                                                 return(x)
                                                                          }
dot product                                           dot(x,b)            sum(x*b)
transpose                                             transpose(x)        t(x)
matrix product                                        b*x                 b %*% x
random           random.random()                      random.rand(1)      runif(1,0,1)
sort             x.sort()                                                 sort(x)
help             help(command)                        help(command)       help(command), ??command

Wednesday, March 23, 2011

Who we reward

Why do we reward people like Silvio Berlusconi or Charlie Sheen with our attention? They are bad at what they are supposed to do. They are addicts. You don't have to be moral about it; they are simply ugly.

Why don't we celebrate (more) people who create something beautiful, practical or useful to society?

Tuesday, March 22, 2011

Maths of self-publishing

Joe Konrath has decided to self-publish rather than accept a 500k advance. He explains that he would get a 70% royalty rather than 14.9%. He also explains that pricing his books lower will get him higher e-book sales. In the table below I have calculated that, with these figures, he only needs to sell 43% of what the publisher expects to sell to make the same money.

http://jakonrath.blogspot.com/2011/03/ebooks-and-self-publishing-dialog.html



              Price  Royalty %    Sales      Royalty
w/ publisher  $9.99      14.9%  335,906  $500,000.00
self pub      $4.99      70.0%  143,143  $500,000.00
%               50%       470%      43%
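
The arithmetic behind the table is just the advance divided by the royalty earned per copy. Here is a minimal sketch; `copies_needed` is a name I made up, and the prices and royalty rates are the ones quoted above.

```python
def copies_needed(target, price, royalty_rate):
    """Copies that must be sold to earn `target` at a given price and royalty rate."""
    return target / (price * royalty_rate)


advance = 500000
publisher = copies_needed(advance, 9.99, 0.149)  # traditional deal
self_pub = copies_needed(advance, 4.99, 0.70)    # self-published e-book
ratio = self_pub / publisher

print(round(publisher), round(self_pub), "{:.0%}".format(ratio))
```

Note that the 43% is the same as the price ratio divided by the royalty ratio (50% / 470%), so it does not depend on the size of the advance.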

Thursday, January 13, 2011

Groupon

Everyone is talking about Groupon, so I wanted to check how fast they are growing. The chart below shows global visitors: Groupon has overtaken Yelp but is closely followed by Living Social. It looks as though Living Social could actually overtake Groupon. I am not sure whether the dip is due to January being incomplete.

How to hack Google charts

I ran a comparison on Google Trends, and this is the URL of the chart PNG. As you can see, it contains lots of colours and labels.

http://chart.apis.google.com/chart?cht=lc&chd=e:2Y2Y2Y2Y2Y2Y2Y2Y,B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6,nBnB,RRRR,________________________________AYBDBUBMBNBbByCJCzCcDlDrETFOF1GyGjHSHoJXKmLrLnMAMmMqOZQiUkTcSzUKVZbKaqe3nek4hK,AYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAY______________________________________________________________________________,AVATARAKAcATASBeDzBIBSBBAkAdAiAyA2A2A.BTA6A4A1BGAwA8A2BHB7BkCiCnCcDDDiE5FZITIpHqGHHkIQHnJwJ1I5PDMWL8O3PNSyYecW,,MUMgN7OSO2QFPjQYQwQeQHRXRrSYTRTtUxVvVkV8T8UmU0UsVGSxRWSITyXFWwXgXMXmV8XKYvY7XsYCYnYQbSbSa6axa1Z3Z4dXYiUKWsXSYS,&chds=0.0,2100000.0&chs=580x188&chco=ffffff00,ffffff00,ffffff00,ffffff00,4684eeff,4684eeff,dc3912ff,4684eeff,ff9900ff,4684eeff&chls=1.0,1.0,0.0%7C1.0,1.0,0.0%7C1.0,1.0,0.0%7C1.0,1.0,0.0%7C1.75,1.0,0.0%7C1.5,3.0,3.0%7C1.75,1.0,0.0%7C1.5,3.0,3.0%7C1.75,1.0,0.0%7C1.5,3.0,3.0&chxt=x&chxr=0,0.0,100.0&chxl=0:%7C%7CJan+2009%7C%7C%7CApr+2009%7C%7C%7CJul+2009%7C%7C%7COct+2009%7C%7C%7CJan+2010%7C%7C%7CApr+2010%7C%7C%7CJul+2010%7C%7C%7COct+2010%7C%7C%7C&chxs=0,443322ff,9.0,0.0&chm=v,443322ff,1,-1,1%7Ct+Daily+Unique+Visitors,676767ff,0,0,10,1%7Ct+Google+Trends,676767ff,0,6,10,1%7Ct+1.4+M,676767ff,2,0,10,1%7Ct+700+K,676767ff,3,0,10,1&chg=12.0,33.33,1.0,1.0,4.0

For instance, I could change the 'Google Trends' label in the top right to my name. I tried changing the tick at 700 to 600, but that changes just the label, not the data. You could also try to increase the dimensions from 580x188 to something bigger.

http://chart.apis.google.com/chart?cht=lc&chd=e:2Y2Y2Y2Y2Y2Y2Y2Y,B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6B6,nBnB,RRRR,________________________________AYBDBUBMBNBbByCJCzCcDlDrETFOF1GyGjHSHoJXKmLrLnMAMmMqOZQiUkTcSzUKVZbKaqe3nek4hK,AYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAY______________________________________________________________________________,AVATARAKAcATASBeDzBIBSBBAkAdAiAyA2A2A.BTA6A4A1BGAwA8A2BHB7BkCiCnCcDDDiE5FZITIpHqGHHkIQHnJwJ1I5PDMWL8O3PNSyYecW,,MUMgN7OSO2QFPjQYQwQeQHRXRrSYTRTtUxVvVkV8T8UmU0UsVGSxRWSITyXFWwXgXMXmV8XKYvY7XsYCYnYQbSbSa6axa1Z3Z4dXYiUKWsXSYS,&chds=0.0,2100000.0&chs=580x188&chco=ffffff00,ffffff00,ffffff00,ffffff00,4684eeff,4684eeff,dc3912ff,4684eeff,ff9900ff,4684eeff&chls=1.0,1.0,0.0|1.0,1.0,0.0|1.0,1.0,0.0|1.0,1.0,0.0|1.75,1.0,0.0|1.5,3.0,3.0|1.75,1.0,0.0|1.5,3.0,3.0|1.75,1.0,0.0|1.5,3.0,3.0&chxt=x&chxr=0,0.0,100.0&chxl=0:||Jan+2009|||Apr+2009|||Jul+2009|||Oct+2009|||Jan+2010|||Apr+2010|||Jul+2010|||Oct+2010|||&chxs=0,443322ff,9.0,0.0&chm=v,443322ff,1,-1,1|t+Daily+Unique+Visitors,676767ff,0,0,10,1|t+Dirk+nachbar,676767ff,0,6,10,1|t+1.4+M,676767ff,2,0,10,1|t+700+K,676767ff,3,0,10,1&chg=12.0,33.33,1.0,1.0,4.0
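
The same edits can be scripted instead of done by hand. The sketch below uses only the standard library and plain string substitution; `relabel` and `resize` are made-up helper names, and the URL is a shortened stand-in for the long one above, with a hypothetical data string.

```python
import re
from urllib.parse import quote_plus


def relabel(url, old_label, new_label):
    """Swap one text marker in the chm parameter (spaces are encoded as '+')."""
    return url.replace("t+" + quote_plus(old_label), "t+" + quote_plus(new_label))


def resize(url, width, height):
    """Replace the chs=WIDTHxHEIGHT parameter."""
    return re.sub(r"chs=\d+x\d+", "chs={}x{}".format(width, height), url)


# Shortened stand-in for the chart URL (the data string is hypothetical)
url = ("http://chart.apis.google.com/chart?cht=lc&chd=e:AYBDBU"
       "&chs=580x188&chm=t+Google+Trends,676767ff,0,6,10,1")

hacked = resize(relabel(url, "Google Trends", "Dirk nachbar"), 800, 300)
print(hacked)
```

As with the manual edit, this only touches labels and dimensions; the plotted data stays whatever is encoded in the chd parameter.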

Wednesday, January 12, 2011

Kaggle social network challenge - test/train code

For those who participated in the Kaggle social network challenge, here is the Python code to split the full downloaded graph into test and training sets.

#create random sorted train set and test set with equal amounts of true and false edges

import random

samp=9000

#import complete file
f1=open('complete4.txt','r')
f2=open('simplesplit_test.txt','w')
f3=open('simplesplit_validate.txt','w')
f4=open('simplesplit_train.txt','w')


prim=[]
prim_set=set()
sec_set=set()
prim_connections={}
prim_2plus=0
sec_connections={}
sec_2plus=0
for line in f1:
    a=line.split(',')[0]
    b=line.split(',')[1].strip()
    prim.append([a,b,random.random()]) #need rand for later
    if a in prim_set: #if seen before
        prim_connections[a]+=1
    else:
        prim_connections[a]=1       
    if b in sec_set: #if seen before
        sec_connections[b]+=1
    else:
        sec_connections[b]=1       
    prim_set.add(a)
    sec_set.add(b)
   
print len(prim),len(prim_connections),len(sec_connections)

#universe of those with 2+ connections
prim_universe=set()
for p in prim_connections.keys():
    if prim_connections[p]>1:
        prim_2plus+=1
        prim_universe.add(p)

#universe of those with 2+ connections
sec_universe=set()
for p in sec_connections.keys():
    if sec_connections[p]>1:
        sec_2plus+=1
        sec_universe.add(p)
       
print prim_2plus,sec_2plus

#choose 2 disjoint sets of samp/2 nodes each
sample=random.sample(prim_universe,samp)
sample1=set(random.sample(sample,samp/2))
sample2=set([i for i in sample if i not in sample1])

print len(sample),len(sample1),len(sample2)

#sort by random
prim2=sorted(prim,key=lambda rand:rand[2])

del prim

prim3=[]
sample1_done=set()
for i in prim2:
    if i[0] in sample1:
        if i[0] not in sample1_done and (sec_connections[i[1]]>1 or i[1] in prim_connections): #not done and inbound has other edge
            sec_connections[i[1]]-=1
            f2.write(i[0]+','+i[1]+'\n') #test
            f3.write(i[0]+','+i[1]+',1\n') #validate
            sample1_done.add(i[0]) #is done
            print len(sample1_done)
        else:
            f4.write(i[0]+','+i[1]+'\n') #train       
    else:
        f4.write(i[0]+','+i[1]+'\n') #train
        if i[0] in sample2: #create a subset of prim to speed up non pairs check
            prim3.append([i[0],i[1]])

del prim2

print len(prim3)

#for sample2 choose non connections
count=0
for i in sample2:
    if count<len(sample1_done): #assumed bound: as many false edges as true edges
        done=0
        prim4=[j[1] for j in prim3 if i==j[0]] #a subset
        while done==0:
            rand=random.sample(sec_universe,1)[0] #sample returns a list, take its only element
            if rand not in prim4 and rand!=i:
                done=1
                count+=1
        print count
        f2.write(i+','+rand+'\n') #test
        f3.write(i+','+rand+',0\n') #validate
    else:
        break

f1.close()
f2.close()
f3.close()
f4.close()

Tuesday, January 11, 2011

Android lock

I have tried to determine the number of possible combinations on a 3x3 Android pattern lock. I come up with 10,305 combinations, which is slightly more than the 10,000 combinations you would get with a 4-digit lock. Let me know if you find any errors.

#count how many patterns there are on 3x3 lock
#every point visited once
#can move straight and diagonal
#path length 1 to 9

#length 1 is trivial: 9 possibilities

done=dict()
for c in range(1,4):
    for r in range(1,4):
        poss=len(done)
        done[poss]=[[c,r]]

poss=len(done)
print poss

def posspath(curpath): #returns all possible paths from a curpath
    outpath=[]
    #can go 8 different ways
    for add in ([0,1],[1,0],[1,1],[0,-1],[-1,0],[1,-1],[-1,1],[-1,-1]):
        c=curpath[-1][0]
        r=curpath[-1][1]
        #if within bounds and not visited
        if 1<=c+add[0]<=3 and 1<=r+add[1]<=3 and [c+add[0],r+add[1]] not in curpath:
            outpath+=[curpath+[[c+add[0],r+add[1]]]]
    return outpath
   
for path in range(2,10):
    for c in range(1,4):
        for r in range(1,4):
            curpath=[[c,r]]
            nextpath=posspath(curpath)
            i=0
            while i<len(nextpath):
                if len(nextpath[i])==path:
                    if nextpath[i] not in done.values():
                        poss=len(done)
                        done[poss]=nextpath[i]
                else:
                    #explore possible and remove original
                    nextpath2=posspath(nextpath[i])
                    nextpath.remove(nextpath[i])
                    nextpath+=nextpath2
                    i-=1
                i+=1
           
poss=len(done)
print poss
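
As a cross-check, here is a shorter recursive version of the same count, written for Python 3. It is a sketch under the same rules as the code above (every point visited at most once, moves only to one of the eight neighbouring points, path lengths 1 to 9); `count_paths` and `MOVES` are names I made up. If the code above is right, the total should again come to 10,305.

```python
from itertools import product

# The eight straight and diagonal steps
MOVES = [(dc, dr) for dc, dr in product((-1, 0, 1), repeat=2) if (dc, dr) != (0, 0)]


def count_paths(size=3):
    """Return {path length: number of paths} on a size x size grid."""
    counts = {}

    def extend(path, visited):
        # Every prefix of a walk is itself a valid pattern, so count it
        counts[len(path)] = counts.get(len(path), 0) + 1
        c, r = path[-1]
        for dc, dr in MOVES:
            nxt = (c + dc, r + dr)
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                extend(path, visited)
                path.pop()
                visited.remove(nxt)

    for start in product(range(size), repeat=2):
        extend([start], {start})
    return counts


counts = count_paths()
print(counts)
print(sum(counts.values()))
```

The per-length breakdown is handy for sanity checks: there are 9 patterns of length 1 and 40 of length 2 (the grid has 20 neighbouring pairs, each usable in both directions).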