PDF Search Through VBA

May 5, 2014

Introduction 


This post was prompted by an email question that I received from a blog reader last weekend. Jason wrote: “I am trying to perform a PDF search from Excel. Is that even possible?” In this post, I will try to answer his question. In general, there are two possible solutions to this problem (there may be others that I am not aware of), each with its own advantages and disadvantages.

The FindText method

Syntax: object.FindText(text to find, case sensitive, whole words only, beginning)

Description: The FindText method looks for the first instance of the specified text. If the text is found, the method scrolls the document until the text is visible, highlights it, and returns true; otherwise, it returns false.

Here are the 4 arguments of this method:
Text to find: The text that is to be found.
Case sensitive: If true, the PDF search is case-sensitive. If false, it is case-insensitive.
Whole words only: If true, the PDF search matches only whole words. If false, it matches partial words.
Beginning: If true, the PDF search begins on the first page of the document. If false, it begins on the current page.

Pros: Useful when searching for a text phrase (more than one word) in the PDF document.
Cons: In some cases, it doesn’t work (it doesn’t highlight the text). It is an easy and fast method, but, unfortunately, it is not 100% reliable.
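
Since FindText can miss, one option is to retry with looser settings before giving up. The function below is only a sketch of that idea: FindTextWithFallback is a hypothetical helper that is not part of the macros in this post, and it assumes that AVDoc is an AcroExch.AVDoc object with a PDF already open. It uses nothing beyond the FindText method described above, and whether the second attempt actually helps with a particular PDF has not been tested.

```vba
'Hypothetical helper (not part of the original macros).
'Assumes AVDoc is an AcroExch.AVDoc object with a PDF already open.
Function FindTextWithFallback(AVDoc As Object, TextToFind As String) As Boolean

    'First attempt: case-sensitive, whole words only, starting from the first page.
    If AVDoc.FindText(TextToFind, True, True, True) Then
        FindTextWithFallback = True
        Exit Function
    End If

    'Second attempt: case-insensitive, partial words allowed, starting from the first page.
    FindTextWithFallback = AVDoc.FindText(TextToFind, False, False, True)

End Function
```

A macro could call it as If FindTextWithFallback(AVDoc, TextToFind) = False Then ... in place of a direct FindText call.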

 

The “JSO approach”

Unlike FindText, the JSO approach doesn’t rely on a single “native method.” Instead, it uses two loops, one inside the other: the outer loop goes through the pages of the PDF document, and the inner loop goes through the words of each page, comparing each word with the text we are searching for. If the two match, the word is highlighted and the search stops; otherwise, the loop moves on to the next word. The approach takes its name from the JavaScript Object (JSO) that does all the hard work.

Pros: Useful when searching for a SINGLE WORD in the PDF document (not a phrase). It’s quite a reliable method.
Cons: If you search for more than one word (two words, for example), it doesn’t find anything. In large PDFs, it can be considerably slower.
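
The single-word limitation follows from the word-by-word comparison: a phrase never equals any one word. A possible workaround, shown here only as a sketch, is to compare several consecutive words against the words of the phrase. PhraseMatchesAt is a hypothetical helper that is not part of the macros in this post. It assumes that the words of one page have already been collected, in reading order, into a String array (for example, by calling getPageNthWord in a loop), and that the phrase separates its words with single spaces.

```vba
'Hypothetical helper (not part of the original macros).
'Returns True if the words of Phrase match PageWords starting at StartIndex.
'The comparison is case-insensitive, like the StrComp call in SearchWordInPDF.
Function PhraseMatchesAt(PageWords() As String, ByVal StartIndex As Long, _
                         ByVal Phrase As String) As Boolean

    Dim PhraseWords() As String
    Dim k           As Long

    'Split the phrase into words (assumes single spaces between the words).
    PhraseWords = Split(Trim$(Phrase), " ")

    'An empty phrase never matches.
    If UBound(PhraseWords) < 0 Then Exit Function

    'Reject a start position that leaves too few words for a full match.
    If StartIndex < LBound(PageWords) Then Exit Function
    If StartIndex + UBound(PhraseWords) > UBound(PageWords) Then Exit Function

    'Compare word by word and stop at the first difference.
    For k = 0 To UBound(PhraseWords)
        If StrComp(PageWords(StartIndex + k), PhraseWords(k), vbTextCompare) <> 0 Then
            Exit Function
        End If
    Next k

    PhraseMatchesAt = True

End Function
```

Two caveats apply. If the words come back from the PDF with their punctuation stripped, a phrase that contains punctuation would need the same treatment before the comparison. Also, selectPageNthWord takes a single word position, so a macro built on this helper would probably highlight only the first word of the match.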

Unfortunately, there is no clean solution to this problem, only a compromise: you either go with the unreliable FindText method or with the slower JSO approach (and the latter only if you are searching for a single word). Note that the VBA code below works ONLY with Adobe Professional. If you try to use it with Adobe Reader, you will get an error.

 


VBA code for PDF search


The FindTextInPDF macro uses the FindText method to find a text phrase inside a PDF document.

Option Explicit
 
Sub FindTextInPDF()
 
    '----------------------------------------------------------------------------------------
    'This macro can be used to find a specific TEXT (more than one word) in a PDF document.
    'The macro opens the PDF, finds the specified text (the first instance), scrolls so
    'that it is visible and highlights it.
    'The macro uses the FindText method (see the code below for more info).
 
    'Note that in some cases it doesn't work (doesn't highlight the text), so in those
    'cases prefer the SearchWordInPDF macro if you have only ONE WORD to find!
 
    'The code uses late binding, so no reference to an external library is required.
    'However, the code works ONLY with Adobe Professional, so don't try to use it with
    'Adobe Reader because you will get an "ActiveX component can't create object" error.
 
    'Written by:    Christos Samaras
    'Date:          04/05/2014
    'e-mail:        [email protected]
    'site:          http://www.myengineeringworld.net
    '----------------------------------------------------------------------------------------
 
    'Declaring the necessary variables.
    Dim TextToFind  As String
    Dim PDFPath     As String
    Dim App         As Object
    Dim AVDoc       As Object
 
    'Specify the text you want to search for.
    'TextToFind = "Christos Samaras"
    'Using a range:
    TextToFind = ThisWorkbook.Sheets("PDF Search").Range("C5").Value
 
    'Specify the path of the sample PDF file.
    'Full path example:
    'PDFPath = "C:\Users\Christos\Desktop\How Software Companies Die.pdf"
    'Using workbook path:
    'PDFPath = ThisWorkbook.Path & "\" & "How Software Companies Die.pdf"
    'Using a range:
    PDFPath = ThisWorkbook.Sheets("PDF Search").Range("C7").Value
 
    'Check if the file exists.
    If Dir(PDFPath) = "" Then
        MsgBox "Cannot find the PDF file!" & vbCrLf & "Check the PDF path and retry.", _
                vbCritical, "File Path Error"
        Exit Sub
    End If
 
    'Check if the input file is a PDF file.
    If LCase(Right(PDFPath, 3)) <> "pdf" Then
        MsgBox "The input file is not a PDF file!", vbCritical, "File Type Error"
        Exit Sub
    End If
 
    On Error Resume Next
 
    'Initialize Acrobat by creating the App object.
    Set App = CreateObject("AcroExch.App")
 
    'Check if the object was created. In case of error release the object and exit.
    If Err.Number <> 0 Then
        MsgBox "Could not create the Adobe Application object!", vbCritical, "Object Error"
        Set App = Nothing
        Exit Sub
    End If
 
    'Create the AVDoc object.
    Set AVDoc = CreateObject("AcroExch.AVDoc")
 
    'Check if the object was created. In case of error release the objects and exit.
    If Err.Number <> 0 Then
        MsgBox "Could not create the AVDoc object!", vbCritical, "Object Error"
        Set AVDoc = Nothing
        Set App = Nothing
        Exit Sub
    End If
 
    On Error GoTo 0
 
    'Open the PDF file.
    If AVDoc.Open(PDFPath, "") = True Then
 
        'Open successful, bring the PDF document to the front.
        AVDoc.BringToFront
 
        'Use the FindText method in order to find and highlight the desired text.
        'The FindText method returns true if the text was found or false if it was not.
        'Here are the 4 arguments of the FindText method:
        'Text to find:          The text that is to be found (in this example, the TextToFind variable).
        'Case sensitive:        If true, the search is case-sensitive. If false, it is case-insensitive (in this example, True).
        'Whole words only:      If true, the search matches only whole words. If false, it matches partial words (in this example, True).
        'Search from 1st page:  If true, the search begins on the first page of the document. If false, it begins on the current page (in this example, False).
        If AVDoc.FindText(TextToFind, True, True, False) = False Then
 
            'Text was not found, close the PDF file without saving the changes.
            AVDoc.Close True
 
            'Close the Acrobat application.
            App.Exit
 
            'Release the objects.
            Set AVDoc = Nothing
            Set App = Nothing
 
            'Inform the user.
            MsgBox "The text '" & TextToFind & "' could not be found in the PDF file!", vbInformation, "Search Error"
 
        End If
 
    Else
 
        'Unable to open the PDF file, close the Acrobat application.
        App.Exit
 
        'Release the objects.
        Set AVDoc = Nothing
        Set App = Nothing
 
        'Inform the user.
        MsgBox "Could not open the PDF file!", vbCritical, "File error"
 
    End If
 
End Sub 

And here is the code for the second macro, SearchWordInPDF, which uses the JSO approach.

Option Explicit
 
Sub SearchWordInPDF()
 
    '----------------------------------------------------------------------------------------
    'This macro can be used to find a specific WORD in a PDF document (one word ONLY: if
    'you search for two words, for example, it doesn't find anything).
    'The macro opens the PDF, finds the first appearance of the specified word, scrolls
    'so that it is visible and highlights it.
 
    'The code uses late binding, so no reference to an external library is required.
    'However, the code works ONLY with Adobe Professional, so don't try to use it with
    'Adobe Reader because you will get an "ActiveX component can't create object" error.
 
    'Written by:    Christos Samaras
    'Date:          04/05/2014
    'e-mail:        [email protected]
    'site:          http://www.myengineeringworld.net
    '--------------------------------------------------------------------------------------
 
    'Declaring the necessary variables.
    Dim WordToFind  As String
    Dim PDFPath     As String
    Dim App         As Object
    Dim AVDoc       As Object
    Dim PDDoc       As Object
    Dim JSO         As Object
    Dim i           As Long
    Dim j           As Long
    Dim Word        As Variant
    Dim Result      As Integer
 
    'Specify the word you want to search for.
    'WordToFind = "Engineering"
    'Using a range:
    WordToFind = ThisWorkbook.Sheets("PDF Search").Range("C12").Value
 
    'Specify the path of the sample PDF file.
    'Full path example:
    'PDFPath = "C:\Users\Christos\Desktop\How Software Companies Die.pdf"
    'Using workbook path:
    'PDFPath = ThisWorkbook.Path & "\" & "How Software Companies Die.pdf"
    'Using a range:
    PDFPath = ThisWorkbook.Sheets("PDF Search").Range("C14").Value
 
    'Check if the file exists.
    If Dir(PDFPath) = "" Then
        MsgBox "Cannot find the PDF file!" & vbCrLf & "Check the PDF path and retry.", _
                vbCritical, "File Path Error"
        Exit Sub
    End If
 
    'Check if the input file is a PDF file.
    If LCase(Right(PDFPath, 3)) <> "pdf" Then
        MsgBox "The input file is not a PDF file!", vbCritical, "File Type Error"
        Exit Sub
    End If
 
    On Error Resume Next
 
    'Initialize Acrobat by creating the App object.
    Set App = CreateObject("AcroExch.App")
 
    'Check if the object was created. In case of error release the objects and exit.
    If Err.Number <> 0 Then
        MsgBox "Could not create the Adobe Application object!", vbCritical, "Object Error"
        Set App = Nothing
        Exit Sub
    End If
 
    'Create the AVDoc object.
    Set AVDoc = CreateObject("AcroExch.AVDoc")
 
    'Check if the object was created. In case of error release the objects and exit.
    If Err.Number <> 0 Then
        MsgBox "Could not create the AVDoc object!", vbCritical, "Object Error"
        Set AVDoc = Nothing
        Set App = Nothing
        Exit Sub
    End If
 
    On Error GoTo 0
 
    'Open the PDF file.
    If AVDoc.Open(PDFPath, "") = True Then
 
        'Open successful, bring the PDF document to the front.
        AVDoc.BringToFront
 
        'Set the PDDoc object.
        Set PDDoc = AVDoc.GetPDDoc
 
        'Set the JS Object - JavaScript Object.
        Set JSO = PDDoc.GetJSObject
 
        'Search for the word.
        If Not JSO Is Nothing Then
 
            'Loop through all the pages of the PDF.
            For i = 0 To JSO.numPages - 1
 
                'Loop through all the words of each page.
                For j = 0 To JSO.getPageNumWords(i) - 1
 
                    'Get a single word.
                    Word = JSO.getPageNthWord(i, j)
 
                    'If the word is string...
                    If VarType(Word) = vbString Then
 
                        'Compare the word with the text to be found.
                        Result = StrComp(Word, WordToFind, vbTextCompare)
 
                        'If both strings are the same.
                        If Result = 0 Then
                            'Select the word and exit.
                            Call JSO.selectPageNthWord(i, j)
                            Exit Sub
                        End If
 
                    End If
 
                Next j
 
            Next i
 
            'Word was not found, close the PDF file without saving the changes.
            AVDoc.Close True
 
            'Close the Acrobat application.
            App.Exit
 
            'Release the objects.
            Set JSO = Nothing
            Set PDDoc = Nothing
            Set AVDoc = Nothing
            Set App = Nothing
 
            'Inform the user.
            MsgBox "The word '" & WordToFind & "' could not be found in the PDF file!", vbInformation, "Search Error"
 
        End If
 
    Else
 
        'Unable to open the PDF file, close the Acrobat application.
        App.Exit
 
        'Release the objects.
        Set AVDoc = Nothing
        Set App = Nothing
 
        'Inform the user.
        MsgBox "Could not open the PDF file!", vbCritical, "File error"
 
    End If
 
End Sub 

Both macros were tested using a PDF file that was created based on this article.

 


Downloads


Download

The zip file contains an Excel file and a sample PDF file. The Excel file can be opened with Excel 2007 or newer. Please enable macros before using it.

Page last modified: March 21, 2021

Christos Samaras

Hi, I am Christos, a Mechanical Engineer by profession (Ph.D.) and a Software Developer by obsession (10+ years of experience)! I founded this site back in 2011 intending to provide solutions to various engineering and programming problems.

Christos E. Samaras

  • Hi, Linda,

    What kind of error do you get?
    Did you download the sample file and didn’t work?
    Also, do you have Adobe Professional installed on your computer?

    Best Regards,
    Christos

  • I am getting an error. I am trying to run this as a macro in Excel, I get invalid outside procedure. How do I run this?
