Stock price scraping can be a nightmare if the APIs you’re trying to use are not up to date. A few months ago I was looking for free sources of one-minute data. After running into trouble with the Yahoo Finance API (apparently out of date at the time) and having to tweak some code samples from GitHub to scrape Google Finance, I found a new and shiny provider of real-time stock prices: Alpha Vantage.

So I decided to develop a script to download data once a week for backtesting. The Alpha Vantage API is straightforward to use, as seen below. It returns a Pandas dataframe, which we later save to a gzip file. You only have to request a free API key here.
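As a rough illustration of this step, the sketch below requests one-minute bars from Alpha Vantage's `TIME_SERIES_INTRADAY` endpoint with `datatype=csv`, loads them into a Pandas dataframe, and writes a gzip-compressed CSV. The function names, the `YOUR_API_KEY` placeholder, and the output filename are assumptions made for illustration; this is not the original script.

```python
from urllib.parse import urlencode

import pandas as pd

BASE_URL = "https://www.alphavantage.co/query"


def build_intraday_url(symbol, api_key, interval="1min", outputsize="full"):
    """Build the query URL for intraday bars returned as CSV."""
    params = {
        "function": "TIME_SERIES_INTRADAY",
        "symbol": symbol,
        "interval": interval,
        "outputsize": outputsize,
        "datatype": "csv",
        "apikey": api_key,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def download_alpha_vantage(symbol, api_key):
    """Fetch intraday bars into a dataframe (needs network access).

    Assumes the CSV response has a `timestamp` column; an error or
    rate-limit response will not, and read_csv will raise.
    """
    url = build_intraday_url(symbol, api_key)
    return pd.read_csv(url, index_col="timestamp", parse_dates=True)


def save_gzip(df, path):
    """Write the dataframe as a gzip-compressed CSV."""
    df.to_csv(path, compression="gzip")
```

With a valid key, something like `save_gzip(download_alpha_vantage("MSFT", "YOUR_API_KEY"), "MSFT.csv.gz")` would fetch and store one symbol.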

For Google Finance, I had seen several code samples on GitHub and in other blogs. I literally copied the code below from those, only changing the URL of the requests to make it work.

With both functions returning the downloaded data as a dataframe, the only missing bit is the code that saves that dataframe to a CSV or GZIP file. However, I decided to complicate the idea a bit. The four functions below read a CSV of stock symbols, then scrape and save each one from the source specified as a parameter (Google Finance or Alpha Vantage).
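To make the idea concrete, here is a minimal sketch of the two supporting pieces: reading the symbols from a CSV and choosing a download function by source name. It assumes a CSV with one symbol per row in the first column and no header. The names `read_symbols` and `pick_downloader` are hypothetical and are not among the four functions of the original script.

```python
import csv


def read_symbols(csv_path):
    """Return the symbols in the first column of the CSV, skipping blank rows."""
    with open(csv_path, newline="") as handle:
        return [row[0].strip() for row in csv.reader(handle) if row and row[0].strip()]


def pick_downloader(source, downloaders):
    """Look up the download function registered under a source name."""
    try:
        return downloaders[source]
    except KeyError:
        raise ValueError(f"unknown source: {source!r}") from None
```

Keeping the sources in a dictionary (for example `{"alpha": ..., "google": ...}`) means a new provider can be added without touching the loop that walks the symbols.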

download_list_of_prices receives a CSV file and a pointer to the last symbol downloaded (I normally use this for debugging). It then tries to download each subsequent symbol listed in the CSV up to a given number of times, which we define as an input parameter. Sometimes the Alpha Vantage service is not available, and some symbols need a couple of tries. The retrying itself is performed by the function try_download.
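The retry-and-resume behaviour described above could look roughly like the sketch below. The helper names `retry_download` and `symbols_after`, their parameters, and the choice to return `None` after the last failed attempt are all assumptions; the original try_download may behave differently.

```python
import time


def retry_download(download, symbol, attempts=3, wait_seconds=0.0):
    """Call download(symbol) up to `attempts` times; return None if all fail."""
    for attempt in range(1, attempts + 1):
        try:
            return download(symbol)
        except Exception as error:  # broad on purpose: any failure triggers a retry
            print(f"{symbol}: attempt {attempt}/{attempts} failed ({error})")
            if attempt < attempts:
                time.sleep(wait_seconds)
    return None


def symbols_after(symbols, last_downloaded=None):
    """Return the symbols that follow `last_downloaded`, or all of them if unset."""
    if last_downloaded is None or last_downloaded not in symbols:
        return list(symbols)
    return list(symbols[symbols.index(last_downloaded) + 1:])
```

A small pause between attempts (`wait_seconds`) is often enough when a free service is temporarily unavailable, though the right value depends on the provider's limits.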

The function download_single_price_from saves the dataframe for each symbol in the iteration, first creating a folder for that symbol if needed (using the function check_or_create_path).
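A folder check of this kind can be sketched with `pathlib`, as below. The names `ensure_symbol_dir` and `output_path`, and the file naming scheme, are hypothetical stand-ins and not the original check_or_create_path.

```python
from pathlib import Path


def ensure_symbol_dir(base_dir, symbol):
    """Create base_dir/symbol if it does not exist yet and return it."""
    folder = Path(base_dir) / symbol
    folder.mkdir(parents=True, exist_ok=True)  # no error if already present
    return folder


def output_path(base_dir, symbol, source, date_label):
    """Build a per-symbol file path, creating the symbol folder on the way."""
    return ensure_symbol_dir(base_dir, symbol) / f"{symbol}_{source}_{date_label}.csv.gz"
```

Passing `exist_ok=True` makes the call safe to repeat on every weekly run, so no separate existence check is needed.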

Finally, we only need to run those functions. I’d suggest using try_download, as seen below, to download a single symbol with a number of retries, and then download_list_of_prices once we have a CSV ready with the symbols we want to download.

See the full script here:

In the next post of this series we’ll see how to build a dataset of technical indicators from prices and volumes.