Computational Investing
Chris Harris
11 Nov 2014


Introduction


In this class, taught by Tucker Balch of Georgia Tech, the first project is to develop an investment strategy that can survive backtesting. During the period studied, Jan 2008 to Dec 2009, the market underwent a major downdraft in response to the subprime mortgage crisis. However, as the results below show, a simple strategy could have kept an investment portfolio in the black throughout the ‘Great Recession’ without a significant drawdown: buy any stock in the S&P 500 index whose price dips below $5 per share, a critical margin requirement level, and hold it for just 5 trading days. If a stock repeats this price pattern, it is purchased again each time. For simplicity, no portfolio scaling takes place: 100 shares are purchased for each event, regardless of the amount of investment capital. The main goal of this study is to compare one strategy’s success relative to another, so portfolio size should not factor into the comparison. Through this exercise and the examples presented throughout the class, I was introduced to the hedge fund manager’s way of thinking. Program development proceeds according to the following scheme:

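  • event.py identifies days when a stock within the S&P 500 index drops below $5 per share and writes the matching buy and sell orders for a 5 trading day holding period
  • marketsim.py executes those orders against historical prices to produce a daily portfolio value
  • analyze.py computes return and risk statistics from the daily values, given the initial portfolio capital, and compares the portfolio with a reference index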
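
The scheme above rests on one event rule: a price at or above $5 on one day followed by a price below $5 on the next, with a sale a fixed number of trading days later. A minimal sketch of that rule on a made-up price list follows. The function name find_events and the sample prices are hypothetical, chosen only for illustration; the listings in the Python Code sections remain the actual implementation.

```python
def find_events(prices, threshold=5.0, hold=5):
    """Return (buy_index, sell_index) pairs, one per downward crossing."""
    events = []
    # stop early enough that the sell day still lies inside the price list
    for i in range(1, len(prices) - hold):
        # event: previous price at or above the threshold, current price below it
        if prices[i - 1] >= threshold and prices[i] < threshold:
            events.append((i, i + hold))
    return events

# hypothetical prices, not market data
prices = [5.4, 5.1, 4.9, 4.7, 5.2, 5.3, 4.8, 4.6, 4.9,
          5.1, 5.0, 4.95, 5.2, 5.3, 5.1, 5.0, 5.2]
events = find_events(prices)
print(events)
```

A stock that crosses the threshold more than once produces one pair per crossing, so holding periods may overlap, in line with the repeated purchases described above.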


Python Code


event.py:

import os
import shutil
import math
import matplotlib as mpl
import matplotlib.pyplot as plt

# obtain market data
def acquire(sym,interval):

    if not os.path.isfile("../data/%s.csv" %sym):
        shutil.copy("../QSTK-0.2.8/QSTK/QSData/Yahoo/%s.csv" %sym, "../data")
    data   = open("../data/%s.csv" %sym,'r')
    stream = data.read()
    data.close()

    field  = 7
    entry  = ""
    val    = []
    matrix = []

    for ch in stream:
        entry = entry + ch
        if ch==',' or ch=='\n':
            if len(entry)>1:
                val.append(entry[:-1])
            entry = ""
        if len(val)>=field:
            if interval[1] >= val[0] >= interval[0]:
                matrix.append(val)
            val = []

    matrix.sort()
    return matrix

def plot(label,mean,std,delta):

    X  = []
    Y  = []
    xl = [-delta-1,delta+1]

    mpl.rc('text', color='#C8A078')
    mpl.rc('figure', facecolor='black', edgecolor='black')
    mpl.rc('axes', edgecolor='#C6BDBA', labelcolor='#C8A078', facecolor='black', linewidth=2)
    mpl.rc('xtick', color='#C8A078')
    mpl.rc('ytick', color='#C8A078')
    plt.tick_params(bottom=False, top=False, left=False, right=False)

    plt.figure(1)
    for i in range(0,2*delta+1):
        x = []
        y = []
        x.append(float(i-delta))
        x.append(x[0])
        y.append(mean[i] - std[i])
        y.append(mean[i] + std[i])
        if x[0] > 0.0:
            plt.plot(x, y, color='#F5A078', linewidth=1)
        X.append(x[0])
        Y.append(mean[i])
    plt.plot(X, Y, color='#004040', linewidth=2)
    plt.xlim(xl)
    plt.title("%s Mean Return Relative to S&P500 Index: 2008 to 2009" %label, fontsize=25)
    plt.xticks(fontsize=20)
    plt.yticks(fontsize=20)
    plt.xlabel("Trade days", fontsize=25)
    plt.ylabel("Cumulative Return", fontsize=25)

    plt.figure(2)
    plt.plot(X, Y, color='#004040', linewidth=2)
    plt.xlim(xl)
    plt.title("%s Mean Return Relative to S&P500 Index: 2008 to 2009" %label, fontsize=25)
    plt.xticks(fontsize=20)
    plt.yticks(fontsize=20)
    plt.xlabel("Trade days", fontsize=25)
    plt.ylabel("Cumulative Return", fontsize=25)
    plt.show()

def process(spfile):

    # read symbol file
    data   = open(spfile,'r')
    stream = data.read()
    data.close()

    # process symbol list
    entry  = ""
    symbol = []
    for ch in stream:
        entry = entry + ch
        if ch==',' or ch=='\n':
            if len(entry)>1:
                symbol.append(entry[:-1])
            entry = ""
    return symbol

def event(spfile,interval,ofile):

    hold  = 5
    delta = 20
    shr   = 100
    ref   = "SPY"
    act   = ["Sell","Buy"]

    # obtain reference data
    rmatrix = acquire(ref,interval)
    period  = len(rmatrix)
    print("\nperiod:\t%s to %s\n\t%i overall trade days\n\nomit:" %(interval[0],interval[1],period),end="")

    # check for event
    ev     = 0
    stream = ""
    array  = []
    sdt    = [0.0 for p in range(2*delta+1)]
    symbol = process(spfile)

    for sym in symbol:
        matrix = acquire(sym,interval)
        # skip symbols without a full price history for the period
        if len(matrix) < period:
            print("\t%s" %sym)
            continue

        for i in range(delta,period-delta):
            # event: price in column 6 falls from >= $5 to < $5
            if float(matrix[i-1][6]) >= 5.0 and float(matrix[i][6]) < 5.0:
                # buy on the event day
                date    = matrix[i][0]
                stream  = stream + "%s,%s,%s,%s,%s,%s,\n" %(date[0:4],date[5:7],date[8:10],sym,act[1],shr)
                # sell 'hold' trading days later
                date    = matrix[i+hold][0]
                stream  = stream + "%s,%s,%s,%s,%s,%s,\n" %(date[0:4],date[5:7],date[8:10],sym,act[0],shr)

                l = 0
                array.append([0.0 for q in range(2*delta+1)])
                for j in range(i-delta,i+delta+1):
                    dt = float(matrix[j][6])/float(matrix[i][6])-float(rmatrix[j][6])/float(rmatrix[i][6])
                    array[ev][l] = dt
                    sdt[l] = sdt[l] + dt
                    l = l + 1
                ev = ev + 1

    # evaluate event data: mean and population standard deviation per day offset
    n    = float(max(ev,1))   # avoid division by zero when no events are found
    std  = [0.0 for r in range(2*delta+1)]
    mean = [0.0 for s in range(2*delta+1)]
    for i in range(0,2*delta+1):
        sd2 = 0.0
        mean[i] = sdt[i]/n
        for j in range(0,ev):
            d = array[j][i] - mean[i]
            sd2 = sd2 + d**2.0
        std[i] = math.sqrt(sd2/n)

    # write data file
    output = open(ofile,'w')
    output.write(stream)
    output.close()

    # display results
    print("\nEvent count = %i\n" %ev)
    plot("Event",mean,std,delta)

def main():

    ofile    = "orders.csv"
    spfile   = "sp5002012.txt"
    interval = ["2008-01-03","2009-12-28"]

    event(spfile,interval,ofile)

main()
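
The event-study part of event.py can be summarized as follows: for each event, every day in the window is expressed as the stock's price ratio to the event day minus the same ratio for the reference, and the mean and population standard deviation are then taken across events at each day offset. The sketch below restates that calculation with the standard library's statistics module. The name relative_window, the sample prices, and the window centers are all hypothetical.

```python
from statistics import mean, pstdev

def relative_window(stock, ref, i, delta):
    """Stock return minus reference return, both normalized to day i."""
    return [stock[j] / stock[i] - ref[j] / ref[i]
            for j in range(i - delta, i + delta + 1)]

# hypothetical prices; window centers 2 and 3 are arbitrary
stock = [6.0, 5.5, 4.0, 5.0, 4.0, 6.0]
ref   = [100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
windows = [relative_window(stock, ref, i, delta=1) for i in (2, 3)]

# aggregate across windows at each day offset (-1, 0, +1)
mean_by_day = [mean(col) for col in zip(*windows)]
std_by_day  = [pstdev(col) for col in zip(*windows)]
print(mean_by_day, std_by_day)
```

By construction the relative return is zero at offset 0, which is why the mean curve passes through zero on the event day.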


Python Code


marketsim.py:

import sys
import os
import shutil

# obtain market data
def acquire(sym,interval):

    if not os.path.isfile("../data/%s.csv" %sym):
        shutil.copy("../QSTK-0.2.8/QSTK/QSData/Yahoo/%s.csv" %sym, "../data")
    data   = open("../data/%s.csv" %sym,'r')
    stream = data.read()
    data.close()

    field  = 7
    entry  = ""
    val    = []
    matrix = []

    for ch in stream:
        entry = entry + ch
        if ch==',' or ch=='\n':
            if len(entry)>1:
                val.append(entry[:-1])
            entry = ""
        if len(val)>=field:
            if interval[1] >= val[0] >= interval[0]:
                matrix.append(val)
            val = []

    matrix.sort()
    return matrix

def gather(orders):

    # read order file
    data = open(orders,'r')
    stream = data.read()
    data.close()

    # input order data
    field = 6
    entry = ""
    val   = []
    fmt   = []
    odata = []

    for ch in stream:
        entry = entry + ch
        if ch==',' or ch=='\n':
            if len(entry)>1:
                val.append(entry[:-1])
            entry = ""
        if len(val)>=field:
            for i in range(1,3):
                if len(val[i]) < 2:
                    val[i] = '0' + val[i]
            fmt.append("%s-%s-%s" %(val[0],val[1],val[2]))
            fmt.append(val[3])
            fmt.append(val[4])
            fmt.append(val[5])
            odata.append(fmt)
            val = []
            fmt = []

    odata.sort()
    return odata

def execute(orders,initial,interval,values):

    # retrieve market data
    symbol = []
    matrix = []
    odata  = gather(orders)
    print()
    for i in range(0,len(odata)):
        print("%s\t%s\t%s\t%s" %(odata[i][0],odata[i][1],odata[i][2],odata[i][3]))
        if odata[i][1] in symbol:
            continue
        symbol.append(odata[i][1])
        matrix.append(acquire(odata[i][1],interval))

    # process daily value
    stream = ""
    cash   = initial
    shr    = [0.0 for i in range(len(symbol))]
    for i in range(0,len(matrix[0])):
        date = matrix[0][i][0]
        for j in range(0,len(odata)):
            if date==odata[j][0]:
                for k,item in enumerate(symbol):
                    if odata[j][1]==symbol[k]:
                        delta=float(odata[j][3])
                        equity = delta*float(matrix[k][i][6])
                        if odata[j][2]=="Buy":
                            cash   = cash - equity
                            shr[k] = shr[k] + delta
                        if odata[j][2]=="Sell":
                            cash   = cash + equity
                            shr[k] = shr[k] - delta
        val = cash
        for j,item in enumerate(symbol):
            val = val + shr[j]*float(matrix[j][i][6])
        stream  = stream + "%s,%s,%s,%.8g\n" %(date[0:4],date[5:7],date[8:10],val)

    # write data file
    output = open(values,'w')
    output.write(stream)
    output.close()
    print("\ninitial = %.8g\norders  = %s\nvalues  = %s\n" %(initial,orders,values))

def main():

    initial = float(sys.argv[1])
    orders = sys.argv[2]
    values = sys.argv[3]
    interval = ["2008-01-03","2009-12-28"]

    execute(orders,initial,interval,values)

main()
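
The accounting in execute reduces to this: a Buy moves cash into shares at that day's price, a Sell does the reverse, and the daily value is cash plus shares marked at the day's price. A minimal single-symbol sketch follows. The name daily_values and the prices and orders are made up for illustration.

```python
def daily_values(prices, orders, initial):
    """prices: list of (date, price); orders: list of (date, action, shares)."""
    cash, shares, values = initial, 0.0, []
    for date, price in prices:
        for odate, action, qty in orders:
            if odate != date:
                continue
            sign = 1.0 if action == "Buy" else -1.0
            cash   -= sign * qty * price   # buying spends cash, selling returns it
            shares += sign * qty
        # mark the open position at the day's price
        values.append((date, cash + shares * price))
    return values

# hypothetical prices and orders, not market data
prices = [("2008-02-06", 4.0), ("2008-02-07", 4.5), ("2008-02-08", 5.0)]
orders = [("2008-02-06", "Buy", 100), ("2008-02-08", "Sell", 100)]
values = daily_values(prices, orders, initial=10000.0)
print(values)
```

The portfolio value does not change on the day of a trade itself, since cash and equity move by equal and opposite amounts; gains and losses appear only as the price moves while shares are held.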


Python Code


analyze.py:

import sys
import os
import shutil
import math
import matplotlib as mpl
import matplotlib.pyplot as plt

# obtain market data
def acquire(sym,interval):

    if not os.path.isfile("../data/%s.csv" %sym):
        shutil.copy("../QSTK-0.2.8/QSTK/QSData/Yahoo/%s.csv" %sym, "../data")
    data   = open("../data/%s.csv" %sym,'r')
    stream = data.read()
    data.close()

    field  = 7
    entry  = ""
    val    = []
    matrix = []

    for ch in stream:
        entry = entry + ch
        if ch==',' or ch=='\n':
            if len(entry)>1:
                val.append(entry[:-1])
            entry = ""
        if len(val)>=field:
            if interval[1] >= val[0] >= interval[0]:
                matrix.append(val)
            val = []

    matrix.sort()
    return matrix

def finance(matrix):

    tsh = 252.0
    r   = [0.0 for m in range(4)]
    dr  = [0.0 for n in range(len(matrix))]

    # total return
    r[0] = float(matrix[len(matrix)-1][1])/float(matrix[0][1]) - 1.0

    # daily return
# --------------------------------------------------------------------------------------------
#
#   leave first element 'dr[0]' = 0
#
# --------------------------------------------------------------------------------------------
    for i in range(1,len(matrix)):
        dr[i] = float(matrix[i][1])/float(matrix[i-1][1]) - 1.0

    # mean daily return
    sdr = 0.0
    for i, item in enumerate(matrix):
        sdr = sdr + dr[i]
    r[1] = sdr/float(len(matrix))

    # standard deviation
    sd2 = 0.0
    for i, item in enumerate(matrix):
        d = r[1] - dr[i]
        sd2 = sd2 + d**2.0
    r[2] = math.sqrt(sd2/float(len(matrix)))

    # sharpe ratio
    r[3] = math.sqrt(tsh)*r[1]/r[2]
    return r,dr

def linearregression(matrix):

    s0  = 0.
    sx  = 0.
    sx2 = 0.
    sy  = 0.
    sxy = 0.
    X   = [0.0 for l in range(len(matrix))]
    Y   = [0.0 for m in range(len(matrix))]
    p   = [0.0 for n in range(3)]

    for i, item in enumerate(matrix):
        X[i] = float(i)
        Y[i] = float(matrix[i][1])
        s0   = s0 + 1.
        sx   = sx + X[i]
        sx2  = sx2 + X[i]*X[i]
        sy   = sy + Y[i]
        sxy  = sxy + X[i]*Y[i]

    meanx = sx/s0
    meany = sy/s0
    p[0]  = (s0*sxy - sx*sy)/(s0*sx2 - sx*sx)
    p[1]  = meany - p[0]*meanx
    ssy   = 0.
    ssr   = 0.

    for i, item in enumerate(matrix):
        ssy = ssy + (Y[i] - meany)*(Y[i] - meany)
        ssr = ssr + (Y[i] - p[1] - p[0]*X[i])*(Y[i] - p[1] - p[0]*X[i])
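    # correlation coefficient: sqrt(R^2), carrying the sign of the slope
    p[2] = math.copysign(math.sqrt(max(0.0, 1.0 - ssr/ssy)), p[0])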
    return p

def plot(label,symbol,matrix,array,p):

    X  = []
    Y  = []
    xl = [-10,len(array)+10]

    mpl.rc('text', color='#C8A078')
    mpl.rc('figure', facecolor='black', edgecolor='black')
    mpl.rc('axes', edgecolor='#C6BDBA', labelcolor='#C8A078', facecolor='black', linewidth=2)
    mpl.rc('xtick', color='#C8A078')
    mpl.rc('ytick', color='#C8A078')
    plt.tick_params(bottom=False, top=False, left=False, right=False)

    plt.figure(1)
    for i in range(0,len(array)):
        X.append(float(i))
        Y.append(p[0]*X[i] + p[1])
    plt.plot(X, Y, color='#F5A078', linewidth=2)
    Y  = []
    for i in range(0,len(array)):
        Y.append(float(array[i][1]))
    plt.plot(X, Y, color='#004040', linewidth=2, label="Port")
    # reference index, rescaled to start at the portfolio's initial value
    Y  = []
    for i in range(0,len(array)):
        Y.append(float(matrix[i][1])/float(matrix[0][1])*float(array[0][1]))
    plt.plot(X, Y, color='#640064', linewidth=2, label=symbol)

    plt.xlim(xl)
    plt.legend(loc=(0.8,0.56), frameon=False, fontsize=20)
    fmt = "equity = %.4gt + %.4g\n    corr = %.4g\n" %(p[0],p[1],p[2])
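    # place the label in axes coordinates so it stays visible for any portfolio size
    plt.annotate(fmt, xy=(0.05,0.75), xycoords='axes fraction', fontsize=20)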
    plt.title("%s Performance: %s to %s" %(label,array[0][0],array[len(array)-1][0]), fontsize=25)
    plt.xticks(fontsize=20)
    plt.yticks(fontsize=20)
    plt.xlabel("Trade day", fontsize=25)
    plt.ylabel("Value, dollar", fontsize=25)
    plt.show()

def valdata(values):

    # read value file
    data = open(values,'r')
    stream = data.read()
    data.close()

    # input value data
    field = 4
    entry = ""
    val   = []
    fmt   = []
    vdata = []

    for ch in stream:
        entry = entry + ch
        if ch==',' or ch=='\n':
            if len(entry)>1:
                val.append(entry[:-1])
            entry = ""
        if len(val)>=field:
            for i in range(1,3):
                if len(val[i]) < 2:
                    val[i] = '0' + val[i]
            fmt.append("%s-%s-%s" %(val[0],val[1],val[2]))
            fmt.append(val[3])
            vdata.append(fmt)
            val = []
            fmt = []
    return vdata

def refdata(vdata,symbol):

    # retrieve market data
    interval = [vdata[0][0],vdata[len(vdata)-1][0]]
    matrix = acquire(symbol,interval)
    period = len(matrix)
    print()
    val   = []
    rdata = []
    for i in range(0,period):
        val.append(matrix[i][0])
        val.append(matrix[i][6])
        rdata.append(val)
        val = []
        if i==0 or i==period-1:
            print("%s\t%s" %(vdata[i][0],vdata[i][1]))
    return rdata

def analyze(values,symbol):

    vdata = valdata(values)
    rdata = refdata(vdata,symbol)

    # output results
    r,dr = finance(vdata)
    p    = linearregression(vdata)
    f    = ("\nPortfolio:\n\ntotal return       = %.4g\nmean daily return  = %.4g\nstandard deviation = %.4g\nSharpe ratio       = %.4g"
    %(r[0],r[1],r[2],r[3]))
    print(f)

    # same statistics for the reference symbol
    r,dr = finance(rdata)
    f    = ("\n%s:\n\ntotal return       = %.4g\nmean daily return  = %.4g\nstandard deviation = %.4g\nSharpe ratio       = %.4g\n"
    %(symbol,r[0],r[1],r[2],r[3]))
    print(f)
    plot("Portfolio",symbol,rdata,vdata,p)

def main():

    values = sys.argv[1]
    symbol = sys.argv[2]

    analyze(values,symbol)

main()
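
linearregression fits equity = slope*t + intercept by ordinary least squares using running sums. As a cross-check of those closed-form expressions, the sketch below applies the same formulas to a made-up, exactly linear series, for which the slope, intercept and correlation are known in advance. The name fit_line is hypothetical, and taking the sign of the correlation from the slope is an assumption of this sketch.

```python
import math

def fit_line(y):
    """Least-squares fit of y against t = 0, 1, 2, ...; returns slope, intercept, corr."""
    n   = float(len(y))
    t   = [float(i) for i in range(len(y))]
    st  = sum(t)
    sy  = sum(y)
    st2 = sum(a * a for a in t)
    sty = sum(a * b for a, b in zip(t, y))
    slope     = (n * sty - st * sy) / (n * st2 - st * st)
    intercept = sy / n - slope * st / n
    # total and residual sums of squares give R^2 = 1 - ssr/ssy
    ssy = sum((b - sy / n) ** 2 for b in y)
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(t, y))
    corr = math.copysign(math.sqrt(max(0.0, 1.0 - ssr / ssy)), slope)
    return slope, intercept, corr

# hypothetical portfolio values rising by exactly 10 per trade day
slope, intercept, corr = fit_line([10000.0, 10010.0, 10020.0, 10030.0])
print(slope, intercept, corr)
```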


Results


$ python event.py

period: 2008-01-03 to 2009-12-28
        501 overall trade days

omit:   BWA
        CFN
        DPS
        KMI
        LO
        LYB
        MJN
        MPC
        PEG
        PM
        PSX
        QEP
        SNI
        TRIP
        V
        WPX
        XYL

Event count = 193

$ python marketsim.py 10000 orders.csv values.csv

2008-02-06      LSI     Buy     100
2008-02-13      LSI     Sell    100
2008-02-19      LSI     Buy     100
2008-02-26      LSI     Sell    100
2008-02-28      THC     Buy     100
2008-03-06      THC     Buy     100
2008-03-06      THC     Sell    100
2008-03-07      LSI     Buy     100
2008-03-13      THC     Sell    100
2008-03-14      LSI     Buy     100
                ...
2009-10-28      LSI     Buy     100
2009-10-30      RF      Buy     100
2009-11-03      LSI     Buy     100
2009-11-03      RF      Sell    100
2009-11-04      AMD     Sell    100
2009-11-04      LSI     Sell    100
2009-11-06      RF      Sell    100
2009-11-10      LSI     Sell    100
2009-11-27      THC     Buy     100
2009-12-04      THC     Sell    100

initial = 10000
orders  = orders.csv
values  = values.csv

$ python analyze.py values.csv \$SPX

2008-01-03      10000
2009-12-28      13590

Portfolio:

total return       = 0.359
mean daily return  = 0.0006709
standard deviation = 0.01091
Sharpe ratio       = 0.9762

$SPX:

total return       = -0.2207
mean daily return  = -0.0002553
standard deviation = 0.022
Sharpe ratio       = -0.1842
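
The Sharpe ratios reported above follow from the mean daily return and the standard deviation, annualized with 252 trading days as in finance, with no risk-free rate subtracted. A quick check from the rounded figures printed above; the helper name sharpe is hypothetical:

```python
import math

def sharpe(mean_daily, std_daily, trading_days=252.0):
    """Annualized Sharpe ratio with a zero risk-free rate."""
    return math.sqrt(trading_days) * mean_daily / std_daily

# inputs are the rounded values from the output above
portfolio = sharpe(0.0006709, 0.01091)
spx       = sharpe(-0.0002553, 0.022)
print("%.3f %.3f" % (portfolio, spx))
```

Because the inputs are already rounded to four significant figures, the recomputed ratios agree with the reported 0.9762 and -0.1842 only to about three decimal places.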



Figure 1: Event Mean Return with Standard Deviation Range Bars


figure1



Figure 2: Event Mean Return


figure2



Figure 3: Portfolio Performance


figure3