How To Scrape Daily Stock Gainers From Yahoo Finance using Python

In this tutorial, I will show you how to crawl the top stocks from the Yahoo Finance website.

I will use the following two Python packages to crawl the website:

  • lxml
  • requests

Let us first import the packages.

In [1]:
import lxml.html
import requests

For the daily top gainers, we can crawl the following Yahoo Finance URL:
finance.yahoo.com/gainers

In [2]:
url = 'https://finance.yahoo.com/gainers'
In [3]:
headers = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; rv:91.0) Gecko/20100101 Firefox/91.0'
}

Let us create a Python function to crawl the above URL. It will do the following:

  • Open the URL using the Python requests package
  • Parse the HTML page text into an lxml element tree using the lxml.html.fromstring function
  • Query the resulting tree for the section that contains the daily stock gainers
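The parse and query steps can be tried offline before touching the live site. The following is a minimal sketch, not the tutorial's final function: SAMPLE_HTML is a made-up stand-in for the downloaded page, links_in_table is a hypothetical helper name, and the fin-scr-res-table id is the one found with the browser inspector in the next step.

```python
import lxml.html

# Made-up stand-in for the downloaded page; the real markup is much larger
# and may change over time.
SAMPLE_HTML = """
<html><body>
  <div id="fin-scr-res-table">
    <a href="/quote/TWLO?p=TWLO">TWLO</a>
    <a href="/quote/CRK?p=CRK">CRK</a>
  </div>
  <a href="/news">News</a>
</body></html>
"""

def links_in_table(html_text, table_id="fin-scr-res-table"):
    """Return the href of every link inside the element with the given id."""
    # Parse the HTML text into an lxml element tree
    root = lxml.html.fromstring(html_text)
    # $table_id is an XPath variable, filled in from the keyword argument
    anchors = root.xpath('//*[@id=$table_id]//a', table_id=table_id)
    return [a.get("href") for a in anchors]

sample_links = links_in_table(SAMPLE_HTML)
print(sample_links)
```

Only the two links inside the div are returned; the "News" link outside it is ignored, which is the point of scoping the query to the table's id.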

To find the stock list, open the above Yahoo URL in Google Chrome, right-click the "Matching stocks" text at the top, and click Inspect. You should see a div with id="fin-scr-res-table". If you hover over it, you will see that the entire table is selected; this is the div whose contents we want to extract.
In the code below, we collect all the links inside that div and, for each link, extract the ticker symbol from its href.
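The href handling is plain string splitting, so it can be checked on its own. Here is a small sketch, assuming hrefs shaped like /quote/TWLO?p=TWLO; the sample hrefs and the symbol_from_href name are illustrative, not taken from the live page.

```python
def symbol_from_href(href):
    """Take the last path segment of a link and drop any query string."""
    last_segment = href.split("/")[-1]   # e.g. "TWLO?p=TWLO"
    return last_segment.split("?")[0]    # e.g. "TWLO"

# Hypothetical hrefs, shaped like the quote links the tutorial's code expects.
for href in ["/quote/TWLO?p=TWLO", "/quote/PBR-A?p=PBR-A", "/quote/F"]:
    print(symbol_from_href(href))
```

Note that an href ending in a slash, such as /quote/TWLO/, would yield an empty string with this approach, so the splitting may need adjusting if the link format differs.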

[Screenshot: the Yahoo Finance page inspected in Google Chrome (images/google-chrome-yfinance.png)]
In [6]:
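def printStocks(url):
    # Download the page, sending the browser-like headers defined above
    ytext = requests.get(url, headers=headers).text
    # Parse the HTML text into an lxml element tree
    yroot = lxml.html.fromstring(ytext)
    # Each link inside the results table points to a stock's quote page
    for x in yroot.xpath('//*[@id="fin-scr-res-table"]//a'):
        # Keep the last path segment and drop any query string
        print(x.attrib['href'].split("/")[-1].split("?")[0])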
In [7]:
printStocks(url)
HKD
GOCOF
TMNA
IBRX
EVEX
AUR
TWLO
FRSH
CANO
ENVX
EQRX
DOOO
FNMAT
CRK
FMCKM
EE
MNTK
MMYT
CTRA
CRSP
CNX
THNPY
NKLA
APA
BHC

Fortunately, we can use the above function for other Yahoo Finance pages too. Let us say we want to extract all the undervalued growth stocks from the following URL:

In [8]:
url = "https://finance.yahoo.com/screener/predefined/undervalued_growth_stocks"
In [9]:
printStocks(url)
F
PFE
X
FCX
PBR
AUY
CLF
M
GGB
COP
UMC
KMI
DVN
CVX
CVE
APA
TSM
HPE
MRNA
PBR-A
DXC
ABBV
SU
HAL
QCOM
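Since the same extraction works on more than one page, it may be handy to return the symbols instead of printing them, and to loop over several pages at once. The sketch below is one possible arrangement under stated assumptions: extract_symbols, collect_symbols, and page are hypothetical names, the download step is passed in as a fetch callable, and the demo uses made-up HTML so that it runs without network access.

```python
import lxml.html

def extract_symbols(html_text, table_id="fin-scr-res-table"):
    """Return the unique ticker symbols linked inside the results table."""
    root = lxml.html.fromstring(html_text)
    symbols = []
    for link in root.xpath('//*[@id=$table_id]//a', table_id=table_id):
        symbol = link.get("href", "").split("/")[-1].split("?")[0]
        if symbol and symbol not in symbols:  # skip blanks and repeats
            symbols.append(symbol)
    return symbols

def collect_symbols(pages, fetch):
    """Map each page name to its symbols; fetch(url) must return HTML text."""
    return {name: extract_symbols(fetch(url)) for name, url in pages.items()}

# Offline demo: made-up HTML stands in for the two pages used in this tutorial.
def page(*symbols):
    links = "".join(f'<a href="/quote/{s}?p={s}">{s}</a>' for s in symbols)
    return f'<html><body><div id="fin-scr-res-table">{links}</div></body></html>'

PAGES = {
    "gainers": "https://finance.yahoo.com/gainers",
    "undervalued": "https://finance.yahoo.com/screener/predefined/undervalued_growth_stocks",
}
FAKE_HTML = {
    PAGES["gainers"]: page("TWLO", "CRK", "TWLO"),
    PAGES["undervalued"]: page("F", "PBR-A"),
}

result = collect_symbols(PAGES, fetch=FAKE_HTML.get)
print(result)
```

To run this against the live site, fetch could be a small function that calls requests.get(url, headers=headers).text, as printStocks does. Whether the live pages still use the same div id is something to verify in the browser inspector first.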