Introduction
Some years ago, I published a VBA code that proved to be very popular in blog users: the code and the accompanying workbook could be used for downloading files from the internet. One common request that I get since I published that code, is how to retrieve the hyperlinks from a web page, to download the files afterward.
I decided to help all these blog users by creating a workbook that does exactly what they requested. So, the workbook that you will find in the downloads section can be used for grabbing all the hyperlinks from a given URL. The procedure is quite straightforward: just enter the URL on a specific cell and hit the Get Links button. After a few seconds (depending on your internet speed) you will have all the external hyperlinks from the particular URL. Moreover, apart from the hyperlinks, you will also get the displayed text for each hyperlink. In this way, you can match the hyperlink with its position on the web page and filter out those links that are not relevant.
NOTE: the code ignores internal hyperlinks and anchor tags. In other words, the hyperlinks that are retrieved should all start with “http”.
How to get external hyperlinks from a web page – demonstration video
The short video below demonstrates the usage of the spreadsheet using as an example the URL of this blog. In this particular case, 43 hyperlinks were retrieved.
Downloads
The file can be opened with Excel 2007 or newer. Please enable macros before using it. The VBA code was protected using the Unviewable+.
Read also
Excel & VBA: Download Internet Files Automatically
Get Image Size From A URL
From the tool’s page:
Question: In which Office versions
actively supported by Microsoft can the file converted by the
Unviewable+ application compatible be opened with:
Reply: Office 2007 (32 bit), 2010, 2013, 2016 (32/64 bit)
The workbook was developed in Office 2013 (32 bit) and is working fine on my computer. In general, Office 2007 is known to have several issues.
… I think Unviewable+ is incompatible with Office2K7. Will try later on a machine with Office2k13.
Audeser, yes, I am doing something similar in this workbook (I use the getElementsByTagName method).
Regarding log-in code, I once wrote this article a few years ago:
https://myengineeringworld.net/2015/01/website-log-in-automation-vba-macro.html
I also confirm that web scrapping, as well as doing other “Internet stuff” from Excel can be a big trouble.
I’d run on MsOffice2k3 and MsOffice 2k7-SP3 with no luck. As you try to activate the macros, Excel warns that VBA project is corrupted and rips entirely apart, leaving the file useless.
Can you please check it is not happening with other Office versions?
I’ll go a little further on this as I can not run the file posted here.
Some time ago I developed (for my purposes of webscrapping) an HTML parser. I must confess it’s a daunting job well over 5.000 lines of code… but that nightmare came because I coded with no reference to any library (so it’s no iEFrame.dll dependant). I’m happy with it as it served my purpose to parse HTML, and in the end comes handy to also parse C++, JS, JSON,… whatever that has parser rules. But I must confess that dealing with it it’s horrendous (as I should inspect the target site before webscrapping nothing to get only the interesting… and leaving ads and other things outside of it… or it will go on anything it finds until the last linked item of the Net is downloaded). The worst thing is that as is recalls in some APIs to download files, Windows Defender marks its as Infected, and from time to time it hiddenly deletes the file }:|>
I suspect (from looking on what is not cyphered inside the “code”) that that is not your case, and that you have arrived to a shorter code using these two:
• Microsoft HTML Object Library (mshtml.tlb): This library is required to access all HTML controls which can be present on your HTML page.
• Microsoft Internet Controls (ieframe.dll): This reference is required to do operations on Internet Explorer because to open an HTML page we need to access Internet Explorer.
Regardless any one is interested, I think that this post could enlight a little more on this topic: https://audeser.wordpress.com/2018/10/30/vba-interact-with-html/ , as it explains how to go through a login system, it barely shows how to interact with HTML items, and output links to an Excel Sheet.
The initial code is not mine (anyway it’s referred on the post), but has been heavily modified to get to a functional webBrowser on Excel… to bypass Gmail two steps login and lets download from there on.
The needed code is something like this. Throw a “Microsoft Internet Controls” object inside an UserForm (or use oBrowser object as suggested in the commented code), with a textbox called txtURL, and a CommandButton. Paste this is the CommandButton1_Click event:
On Error GoTo Err
‘Dim oBrowser As InternetExplorer
Dim HTMLDoc As HTMLDocument ‘Reference to Microsoft HTML Object Library
Dim oHTML_Element As IHTMLElement
Dim oHTML_TagCol As IHTMLElementCollection
Dim lgItem As Long
‘Set oBrowser = Me.WebBrowser1 ‘New InternetExplorer
‘With oBrowser
With Me.WebBrowser1
.Silent = False
.Navigate Me.txtURL.Text
.Visible = True ‘False
‘Call Browser_Complete
Set HTMLDoc = .Document
Set oHTML_TagCol = HTMLDoc.getElementsByTagName(“a”)
‘N# items: oHTML_TagCol.length
For Each oHTML_Element In oHTML_TagCol
lgItem = lgItem + 1
ActiveSheet.Cells(lgItem, 1).Value = oHTML_Element.href
ActiveSheet.Cells(lgItem, 2).Value = oHTML_Element.innerText
Next
‘.Visible = True
End With
ExitProc:
Exit Sub
Err:
MsgBox (“Error Occured”)