Tuesday, 19 September 2017

How to Extract Email Addresses, Phone Numbers, and Links From Text

You've got an email or letter with phone numbers, email addresses, or website links throughout the text—and you'd like to get a list of each of those items on their own. Copy and paste is tedious and time consuming—and, anyway, aren't computers designed to do tasks like that for us?

They are. All you need is a bit of Regex—or regular expression—code and a text editor, and you can pull out the data you want from your text to paste into another app. Or, if you want to automatically extract text from any of several hundreds of apps to send to another app, you can do that too, with Formatter by Zapier. Here's how to use Regex in popular text editors or Formatter in popular web apps.



How to Extract Text with Regex

Regex script in a regex tester, extracting phone numbers
Regex scripts look like long strings of random text, but they can be the most powerful way to find any text you want

You're likely familiar with the search tool built into most apps on your computer. Press Control+F or Command+F, type in the word you want to find, and the app will highlight every time that word shows up in your text. For example, if you're looking for the number "47" in the sentence "I bought 47 apples," your program's Find tool would highlight the number 47 in that sentence.

What if you instead wanted to find any number in your text? Perhaps your sentence now says "I bought 47 apples and 23 eggs" and you'd like a list of the numbers. Regex (short for REGular EXpressions) is what you'll use. Regex lets you tell the computer what type of text you're looking for, using its own syntax. Say we want to find any number. We'd do a regex search for [0-9], which matches any single numeral (a digit between 0 and 9); add a plus sign, as in [0-9]+, to match a whole run of digits such as 47. Want to find any digit or the letter "a"? [0-9]|a would do the trick, as regex uses the pipe | character to mean or.
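To see those patterns at work outside a text editor, here is a minimal sketch using Python's built-in re module. The variable names are only illustrative; the sentence is the example from above.

```python
import re

sentence = "I bought 47 apples and 23 eggs"

# [0-9] matches one digit at a time, so each digit is a separate result.
single_digits = re.findall(r"[0-9]", sentence)

# Adding + ("one or more") groups neighboring digits into whole numbers.
whole_numbers = re.findall(r"[0-9]+", sentence)

# The pipe means "or": match a single digit or the letter "a".
digit_or_a = re.findall(r"[0-9]|a", sentence)

print(single_digits)  # ['4', '7', '2', '3']
print(whole_numbers)  # ['47', '23']
print(digit_or_a)     # ['4', '7', 'a', 'a', '2', '3']
```

The letter "a" shows up twice in the last result because it appears in both "apples" and "and".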

So, if you're looking for email addresses, you could just search for @ with the normal Find tool to highlight every email address—along with anything that includes an @ symbol, though few things other than email addresses do. A detailed regex script, though, could do better. It could find all the characters around the "@" symbol and select the full email address. And then, with the tools in popular text editor apps, you could copy each email address out of your text.

Regex is geeky—but it can actually be easy to use, with regex tools in popular apps along with pre-made regex scripts. First, let's check some quick regex scripts to extract links, emails, and phone numbers, then learn how to use regex in popular text editing programs Sublime Text, Notepad++, and BBEdit:

Regex Scripts to Extract Data

Before you can extract text in your apps, you'll need some regex scripts to use. Here are three scripts we've tested extensively to extract website links, emails, and phone numbers from large blocks of text. Each works with as wide a range of results as possible—and all work in each of the text editors mentioned here. Although they might look like intimidating gobbledygook, all you have to do to use them is copy and paste into the text editor's search commands.

(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
(?:(?:\+?([1-9]|[0-9][0-9]|[0-9][0-9][0-9])\s*(?:[.-]\s*)?)?(?:\(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*\)|([0-9][1-9]|[0-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))\s*(?:[.-]\s*)?)?([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})\s*(?:[.-]\s*)?([0-9]{4})(?:\s*(?:#|x\.?|ext\.?|extension)\s*(\d+))?

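The first script matches links, the second matches email addresses, and the third matches phone numbers. Just use those regex scripts in your text editor's Find tool, and they should find all the links, emails, and phone numbers in your text. Then use the steps below for your editor to copy each of those into their own list.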
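If you'd rather run a pattern from a short script than from a text editor, the idea is the same. The sketch below uses Python's re module with deliberately simplified stand-in patterns, not the full scripts above, so it will miss many of the variations those scripts catch. The helper name extract_all and the sample text are assumptions made up for this example.

```python
import re

# Simplified stand-in patterns (assumptions for this sketch, far less
# thorough than the full scripts listed above).
EMAIL_PATTERN = re.compile(r"[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}", re.IGNORECASE)
LINK_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

sample_text = (
    "Write to ana@example.com or Bob.Lee@example.org. "
    "Docs live at https://example.com/docs and http://example.org/faq for now."
)


def extract_all(pattern, text):
    """Return every match of pattern in text, in order, without duplicates."""
    seen = []
    for match in pattern.finditer(text):
        if match.group(0) not in seen:
            seen.append(match.group(0))
    return seen


emails = extract_all(EMAIL_PATTERN, sample_text)
links = extract_all(LINK_PATTERN, sample_text)

# Print each result on its own line, ready to paste into another app.
print("\n".join(emails))
print("\n".join(links))
```

Swapping in one of the longer scripts should only require changing the pattern string, though Python's regex flavor may not accept every construct a given text editor does.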

Now, let's learn how to use regex in Sublime Text, Notepad++, BBEdit, and Google Sheets:

How to Use Regex in Sublime Text (Windows, Mac, Linux)

Use regex in Sublime Text to find text, then use the Find All tool to copy every result at once
A regex find function in Sublime Text

Cross-platform text editor Sublime Text is one of the easiest ways to extract text with regex through its built-in Find all tool.

In the text document that you want to extract specific text from, press Control+F or Command+F to open the search bar. Click the .* icon on the left side of the search bar to enable regex mode, then type or paste in your regex script. Now, click Find All, and Sublime Text will highlight and select every instance of your text it finds.

And here are our extracted website links

Want to extract that text and get it in its own list? Just press Control+C or Command+C to copy the selected results, then make a new document and paste them for a list of each of the things you extracted.

Sublime Text Price: Free to evaluate; $70 per user license

How to Use Regex in Notepad++ (Windows)

Use regex to find and replace text in Notepad++
In Notepad++, you'll use the Replace function to put each result on its own line

Free Windows text editor Notepad++ has a regex option in its Find tool as well—but it doesn't let you copy text the same way Sublime Text does. Instead, we'll use it to put each result on its own line, bookmark those lines, and then copy those bookmarked lines by themselves.

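Here's how it works. Just type or paste your text into Notepad++, and press Control+F to open the search tool. Click the Replace tab, then type or paste your regex script in the Find what: box. Under that, type the following in the Replace with: box to put each result on its own line. The $0 stands for the entire matched text, so it works even with scripts that have no capture groups, where \1 would return nothing or only part of the match:

\n$0\n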

Now, select the Regular expression option under Search Mode in the bottom left corner, then click the Replace All button. That should put every one of your regex search results on its own line.

Bookmark all regex results in Notepad++

To copy just your regex results, you'll need to do two more things. First, click the Mark tab in the Find window, check Bookmark line in the options, and click Mark All. That will put a bullet beside each of the lines with your regex results.

Finally, click the Search menu, and select Bookmark -> Copy Bookmarked Lines. Open a new document and paste the text, and you'll have a list of just the text you wanted to find via regex.
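The same replace-then-filter idea can be sketched in Python, which may help show why the steps work. This is only an analogy to the Notepad++ workflow, not how Notepad++ itself is implemented; the email pattern is a simplified stand-in and the sample text is made up.

```python
import re

# Simplified stand-in email pattern (an assumption for this sketch).
EMAIL = r"[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}"

text = "Ask ana@example.com first, then cc bob@example.org on the reply."

# Step 1, like Replace All: wrap each whole match (\g<0>) in line breaks
# so that every result ends up on a line of its own.
split_text = re.sub(EMAIL, r"\n\g<0>\n", text, flags=re.IGNORECASE)

# Step 2, like Mark All plus Copy Bookmarked Lines: keep only the lines
# that contain a match and drop the leftover text around them.
marked_lines = [
    line
    for line in split_text.splitlines()
    if re.search(EMAIL, line, flags=re.IGNORECASE)
]

print("\n".join(marked_lines))
```

In Python, \g<0> plays the role of the whole-match reference in the Replace with: box.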

Notepad++ Price: Free open-source download

How to Use Regex in BBEdit (Mac)

BBEdit lets you extract the text regex finds in one click
Use the Extract option in BBEdit to copy your regex results to a new document

Perhaps the easiest way to extract text with regex is using the Mac text editor BBEdit. Just enter your text in BBEdit, press Command+F to open the Find window, and enter your regex script in the Find box. Check the Grep option at the bottom of the window to run the search as regex (BBEdit borrows the name from grep, the command-line search tool, which is yet another way you could extract text via regex).

Now, click the Extract button on the right, and BBEdit will make a new text file and add each of your extracted items to the document. It's the quickest way to extract text with regex.

BBEdit Price: Free evaluation; $49.99 license per user

How to Use Regex in Google Sheets (Web)

Use regex in Google Sheets to extract the first result from your spreadsheet

If you only need one regex result, Google Sheets' =REGEXEXTRACT function lets you use regex inside your spreadsheet to find the first matching result. Just enter =REGEXEXTRACT(, then type in the text you want to search through or select the correct cell, add a comma, then enter your regex script in quotes, and add a closing parenthesis on the end. Google Sheets will then extract the first match from your text. If your regex script includes capture groups (sections in parentheses), such as the script pictured, which checks for every part of a phone number, Google Sheets will split the result out into one cell per section.
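The difference between "the first match" and "one cell per section" is easier to see in code. Here is a rough Python analogy, not the spreadsheet function itself: the phone pattern is a simplified US-style stand-in and the cell text is hypothetical.

```python
import re

# Hypothetical cell contents and a simplified phone pattern (assumptions).
cell_text = "Call (415) 555-0123 or (212) 555-0188 after lunch."
PHONE_WITH_GROUPS = r"\((\d{3})\)\s*(\d{3})-(\d{4})"

# re.search stops at the first match, like extracting a single result.
first = re.search(PHONE_WITH_GROUPS, cell_text)

if first:
    whole_match = first.group(0)  # the full text of the first match
    sections = first.groups()     # one item per parenthesized section
else:
    whole_match, sections = None, ()

print(whole_match)  # (415) 555-0123
print(sections)     # ('415', '555', '0123')
```

The second phone number never appears in the output, which is why this approach only suits cases where one result per cell is enough.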

Google Docs regex

The same thing works in Google Docs, though there's no easy way to copy out all the results. Press Control+F or Command+F to open the search dialog, tap the three-dot icon to open the full search dialog, add your regex script in the Find dialog, and check the Match using regular expressions box. That will let you find each item that matches your regex query, though you'll have to copy each result manually to extract them from your document.

Google Sheets and Docs Price: Free for personal use; from $5/month G Suite Basic plan for business use

Extract Email Addresses, Phone Numbers, and Links Online

Extract phone numbers, email addresses, and links with tools like ConvertCSV
ConvertCSV can extract your text online

Want something simpler? There are a number of simple web apps that can extract the text you need with a few clicks. The most versatile of the apps we tested is ConvertCSV.com. It can extract email addresses, links, and phone numbers—though it doesn't recognize as many variations as the regex scripts above. And, it can convert your spreadsheet files to different formats if you need, too.

Here are some of the best free simple tools to extract text online:

Extract Email Addresses, Phone Numbers, and Links Automatically with Zapier

Extract text with Zapier formatter
Zapier Formatter can automatically extract emails, links, and numbers anytime something new is added to your apps.

Regex works great when you have a long document with emails and links and numbers, and you need to extract them all. But, far more often, you'll need to extract text from one thing and use it directly in another app.

For example, say someone emails you with a link—and you'd like to automatically add that link to Pocket so you can read it later. Or perhaps you save your contact info in Evernote notes, and you want to pull out the email address and send an automated email to your new contacts.

Zapier's Formatter tool can help. Zapier is an app automation tool that connects over 750 apps so whenever something happens in one app, Zapier can start a chain reaction, copying your text into other apps to add contacts, start projects, send emails, and more. And Zapier's Formatter can, among other things, extract text so you get exactly what you want from your apps.

Zapier gmail

Here's how it works. First, you'll make a new Zap and select the app you want to trigger—or start—the workflow. We'll choose Gmail here, to watch for links in new emails.

Zapier extract URL with formatter

Then, add a Formatter step and choose the Text action. Select the Extract URL transform to find the link in the email, and click the + button beside the Input field and select the Body Plain field to have Zapier find a link in the email text. Test that step, and Zapier will find that first link from the email body text.
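Conceptually, this step takes a block of text and hands back the first link in it. The Python sketch below shows that idea only; it is not Zapier's implementation, and the function name extract_first_url, the URL pattern, and the sample email body are all assumptions made up for illustration.

```python
import re
from typing import Optional

# Simplified stand-in URL pattern (an assumption for this sketch).
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_first_url(body: str) -> Optional[str]:
    """Return the first link found in body, or None if there is none."""
    match = URL_PATTERN.search(body)
    return match.group(0) if match else None


# Hypothetical plain-text email body with two links.
email_body = (
    "Hi,\n"
    "Saw this and thought of you: https://example.com/article?id=7\n"
    "Also http://example.org later.\n"
    "Thanks"
)

print(extract_first_url(email_body))        # https://example.com/article?id=7
print(extract_first_url("No links here."))  # None
```

Returning None when nothing matches gives a later step a clear signal that there is no link to save.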

Add extracted link to Pocket with Zapier

Finally, add an action app to your Zap. We'll choose Pocket here. Select Save for Later, then click the + button beside the URL field and select the link from the Formatter step.

Voilà! Now anytime you get an email with a link, Zapier will add it to your Pocket reading list automatically.

Here's a pre-made Zap to try it out with Gmail and Google Contacts—Zapier's Formatter can search your email text for a phone number in the email or perhaps its signature, and then add that to the new contact along with their name and email:


Now, go make your own automations. Zapier Formatter's extract tools are a powerful way to find just what you need from your text and then use it in other apps. Whether you need to copy contact info, financial data, website links, or more, Formatter can help, automatically and instantly.

Or, if you want to extract text in bulk on a one-time basis, regex is your best new friend.



source https://zapier.com/blog/extract-links-email-phone-regex/
