Quite often when I am doing online marketing I receive reports in PDF format that contain links, and I need to extract those links from the PDF so I can submit them to various indexing services.
I used to pay an outsourcer $10 each time, but now I do it myself and it takes less than 60 seconds.
Here is an example of a PDF I might receive, containing links to all my press releases. As you can see, the links are not in any easily copyable format.
Here are the steps I use to extract a list of all the links in the PDF using free online tools.
STEP 1: Convert the PDF to HTML
a. Go to: http://www.pdfonline.com/pdf-to-word-converter/
b. Upload your PDF file (the conversion process will start automatically)
c. After the file is converted, click the “Download” button in the header.
d. Select the option “Download HTML file” and save to your computer.
STEP 2: Extract URLs from the HTML file
a. Go to: http://eel.surf7.net.my/
b. Find the HTML file you downloaded to your computer in step 1.
c. Open the HTML file in your web browser.
d. Select “view source” so you can copy and paste the source code.
e. Copy all the source code and paste it into the form at surf7.
f. Follow steps 1-5 below.
Here is a screenshot of the form on surf7 where you paste your code.
1. Paste your HTML code here
2. Select “New Line” so that each URL is placed on a new line
3. Select URL as the type of address to extract
4. Click Extract button
5. All the URLs from the HTML file will be output here.
You can then take that list of URLs and do with it whatever you wish.
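If you do this regularly, the extraction in step 2 can also be done without the surf7 form. Here is a minimal Python sketch that pulls URLs out of HTML source with a regular expression (the pattern is a simplification and may miss unusual URLs, but it handles typical href values; the sample HTML here is just an illustration, not from a real report):

```python
import re

def extract_urls(html):
    """Return every http(s) URL found in an HTML string."""
    # Match http or https followed by any run of characters that can't
    # end a URL inside HTML: whitespace, quotes, or angle brackets.
    # Real-world URLs can be messier; this covers typical href values.
    pattern = re.compile(r'https?://[^\s"\'<>]+')
    return pattern.findall(html)

sample = '<a href="http://example.com/press-release-1">Release 1</a>'
print("\n".join(extract_urls(sample)))
# prints: http://example.com/press-release-1
```

Paste your saved HTML into a file, read it with `open(...).read()`, and feed it to `extract_urls` to get the same one-URL-per-line list the online tool produces.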