Scraping stock prices using Alpha Vantage and Google Finance

Stock price scraping can be a nightmare if the APIs you’re trying to use are not up to date. Few months ago I was looking for free sources to obtain one-min-level data. Apart of having troubles with the Yahoo Finance API (apparently non up-to-date by then) and having to tweak some code samples in GitHub to scrape Google Finance, I found the new and shining provider of real-time stock prices, Alpha Vantage.

So I decided to develop an script to download data once a week for backtesting. The Alpha Vantage API is as straightforward to use as seen below. It returns a Pandas dataframe that later we save in a gzip file. You only have to request a free API key here.

"""
Retrieve intraday stock data from Alpha Vantage API.
"""

#Alpha Vantage API to download 15 days of minute data (only if required)
from alpha_vantage.timeseries import TimeSeries
apikey='YOUR_ALPHA_VANTAGE_API_KEY'

# Get pandas object with the intraday data and another with the call's metadata
ts = TimeSeries(key=apikey, output_format='pandas')

For Google Finance, I had seen several code samples in GitHub and other blogs. I literally copied the code below from those, only changing the URL of the requests to make it work.

"""
Retrieve intraday stock data from Google Finance.
"""

import csv
import datetime
import re

import pandas as pd
import requests

def get_google_finance_intraday(ticker, period=60, days=1, exchange='USD', debug=False):
    """
    Retrieve intraday stock data from Google Finance.
    Parameters
    ----------
    ticker : str
        Company ticker symbol.
    period : int
        Interval between stock values in seconds.
    days : int
        Number of days of data to retrieve.
    Returns
    -------
    df : pandas.DataFrame
        DataFrame containing the opening price, high price, low price,
        closing price, and volume. The index contains the times associated with
        the retrieved price values.
    """
    
    uri = 'https://finance.google.com/finance/getprices' \
          '?&p={days}d&f=d,o,h,l,c,v&q={ticker}&i={period}?x={exchange}'.format(ticker=ticker, period=period,  days=days, exchange=exchange)
    
    if(debug): 
        print (uri)
        
    page = requests.get(uri)
    reader = csv.reader(page.content.splitlines())
    columns = ['Open', 'High', 'Low', 'Close', 'Volume']
    rows = []
    times = []
    
    for row in reader:
        
        if re.match('^[a\d]', row[0]):
            
            if row[0].startswith('a'):
                start = datetime.datetime.fromtimestamp(int(row[0][1:]))
                times.append(start)
                
            else:
                times.append(start+datetime.timedelta(seconds=period*int(row[0])))
                
            rows.append(map(float, row[1:]))
            
    if len(rows):
        return pd.DataFrame(rows, index=pd.DatetimeIndex(times, name='Date'),
                            columns=columns)
    else:
        return pd.DataFrame(rows, index=pd.DatetimeIndex(times, name='Date'))

Having both functions that download the data in a dataframe, the only missing bit is the code that saves that dataframe in a CSV or a GZIP file. Though, I decided to complicate a bit the idea. The four functions below read a CSV of stock symbols and scrape+save each one from the source specified as parameter (Google Finance or Alpha Vantage).

download_list_of_prices receives a CSV file and a pointer to the last symbol downloaded (I normally use this for debugging) and tries to download each following symbol (listed in the CSV) an X number of times (that we define as input parameter); sometimes the Alpha Vantage service is not available and a good couple of tries in some symbols is required. That ‘trying’ is performed by the function try_download.

The function download_single_price_from saves each of the dataframes for each symbol iterated, not without creating a folder for that symbol if needed (using the function check_or_create_path).

"""
Download price for a given symbol using either Google Finance or Alpha Vantage
"""

def download_single_price_from(symbol,period=60,days=20,exchange='USD',site="google",debug=True, \
                               path="default" ,name="default"): #real max days at 1min level is 15...
    df = pd.DataFrame({'A' : []})
    site_option = ""
    
    # Download index price
    if site=="google_finance":
        df=get_google_finance_intraday(symbol,period,days,exchange,debug) 
    elif site=="alpha_vantage":        
        df, meta_data = ts.get_intraday(symbol, interval='1min', outputsize='full')
        
    # Save index prices
    output_file=check_or_create_path(path)+"/"+name+"_"+str(datetime.date.today())+".csv.gz"
    df.to_csv(output_file, sep=';', compression='gzip')


"""
Trying to download the symbols '$MAX_TRIES' times. 
After, we assume that the symbol is not available in the given API/Provider.
"""

def try_download(symbol,period,days,exchange,site,debug,path, name, tries_count):
    try: download_single_price_from(symbol=symbol,period=period,days=days,exchange=exchange,\
                                    site=site,debug=debug,path=path,name=symbol)
    except: # catch *all* exceptions
        e = sys.exc_info()[0]
        print( "&lt;p&gt;Error: %s&lt;/p&gt;" % e )
        
        # Recursive function that tries again from pointer when crashing (awaits 5 seconds to retry)
        if (tries_count&lt;MAX_TRIES):
            time.sleep(5)
            print ("Trying again... try #"+str(tries_count+1)+" of a maximum of "+str(MAX_TRIES))
            try_download(symbol,period,days,exchange,site,debug,path, name, tries_count=(tries_count+1))

        else: 
            print ("Maximum number of attemps for symbol: "+symbol+" from "+site+ \
                   ". Check in path '"+path+"' required.")
                   
"""
Download price for all the symbols of a given list
"""

import os
import sys
import time

def download_list_of_prices(root_path, list_file, symbols_subpath, \
                            period=60,days=20,exchange='USD',site="google",\
                            debug=True, from_symbol=''):
    # 1 Initialize pointer
    download_symbol=False if from_symbol != '' else True
    
    # 2 Load list of stocks
    symbols=pd.read_csv(root_path+"/"+list_file, sep=';', parse_dates=True, infer_datetime_format=True)
    
    # 3 Check output root paths (and create them if needed)
    symbols_full_path=\
        check_or_create_path(str(check_or_create_path(root_path+"/"+site+"/"+symbols_subpath))+"/symbols")
       
    counter=0
    # 4 Go through list of prices
    for symbol in symbols['symbol'].tolist():
        counter=counter+1
        
        # Looking for start symbol to start downloading
        # An start symbol should be provided for instance in case of a failed download 
        # This is only to avoid starting from scratch in those cases.
        if (download_symbol or symbol==from_symbol):
            download_symbol=True
            
            # Print current symbol to be downloaded as feedback.
            # It also prints the symbol index in the whole list of symbols to download. 
            print ""+str(counter)+"/"+str(len(symbols['symbol'].tolist()))+" = "+symbol
            
            # Trying to download the symbols 10 times.
            try_download(symbol=symbol,period=period,days=days,exchange=exchange,\
                         site=site,debug=debug,path=symbols_full_path+"/"+symbol+"/",\
                         name=symbol, tries_count=0)

    print("Done!")


"""
Check if the path does exist. If it doesnt, create a folder for the given symbol
""" 

def check_or_create_path(path):
    directory = os.path.dirname(path)
    try: 
        os.stat(directory)
    except: 
        os.mkdir(directory)
    return path

Finally, we only need to run those functions. I’d suggest to use try_download, as seen below, to download a single symbol but trying a number of times. Then download_list_of_prices if we have a CSV ready with several symbols that we want to trigger.

if __name__ == "__main__":

    # Download SPX index
    for site in ["google_finance","alpha_vantage"]:
        #download_single_price_from("SPX", 60, 20, "USD", site ,True, RAW_DATA_PATH+site+"/"+"S&P500/index","S&P500")
        try_download("SPX", 60, 20, "USD", site ,True, RAW_DATA_PATH+site+"/"+"S&P500/index","S&P500", 0)

    # Download all SPX stocks
    from_symbol="" #"LUK" #DAL" 
    for site in ["google_finance","alpha_vantage"]:
        download_list_of_prices(root_path=PATH+"/raw",list_file="SPX_list.csv",symbols_subpath="S&P500", \
                                period=60,days=20,exchange='USD',site=site,debug=False,from_symbol=from_symbol)

    # Download SPY ETF
    for site in ["google_finance","alpha_vantage"]:
        try_download("SPY",60,20,"USD",site,True,RAW_DATA_PATH+site+"/"+"S&P500/spy_eft","SPY",0)

    # Download EURUSD
    for site in ["google_finance","alpha_vantage"]:
        try_download("EURUSD",60,20,"USD",site,True,RAW_DATA_PATH+site+"/"+"EURUSD","EURUSD",0)

    # Download Bitcoin-USD
    for site in ["google_finance","alpha_vantage"]:
        try_download("BTCUSD",60,20,"USD",site,True,RAW_DATA_PATH+site+"/"+"BITCOIN","BTCUSD",0)

    # Download NASDAQ
    for site in ["google_finance","alpha_vantage"]:
        try_download("IXIC",60,20,"USD",site,True,RAW_DATA_PATH+site+"/"+"NASDAQ/index","NASDAQ",0)
        # NASDAQ EFTs to track volumes
        try_download("QQQ",60,20,"USD",site,True,RAW_DATA_PATH+site+"/"+"NASDAQ/qqq_eft","QQQ",0)

See the full script here:

In the next post of this series we’ll see how to model a dataset of technical indicators by using prices and volumes.

About Us

Stay connected

Trending Slider

Surviving the Promptpocalypse: A Guide to Better LLM Conversations

A summary of how AI has progressed in the last 5 years and current challenges (by ChatGPT)

4 reasons for Agile in Analytics

Struggling with Hive… What can I do?

Working in a Big Data Project using the terminal

Scraping stock prices using Alpha Vantage and Google Finance

Latest

Surviving the Promptpocalypse: A Guide to Better LLM Conversations

A summary of how AI has progressed in the last 5 years and current challenges (by ChatGPT)

About Us

Stay connected

Scraping stock prices using Alpha Vantage and Google Finance

cetrulin

Related posts

A summary of how AI has progressed in the last 5 years and current challenges (by ChatGPT)

Flattening complex XML structures into Hive tables using Spark DFs

4 reasons for Agile in Analytics

Struggling with Hive… What can I do?

Surviving the Promptpocalypse: A Guide to Better LLM Conversations

Working in a Big Data Project using the terminal

Surviving the Promptpocalypse: A Guide to Better LLM Conversations

A summary of how AI has progressed in the last 5 years and current challenges (by ChatGPT)

4 reasons for Agile in Analytics

Struggling with Hive… What can I do?

Working in a Big Data Project using the terminal

Scraping stock prices using Alpha Vantage and Google Finance

Surviving the Promptpocalypse: A Guide to Better LLM Conversations

A summary of how AI has progressed in the last 5 years and current challenges (by ChatGPT)