| RE | Description |
|---|---|
| . | Any character except a newline |
| \d | Digit (0 - 9) |
| \D | Not a Digit |
| \w | Word character (a-z, A-Z, 0-9, _) |
| \W | Not a word character |
| \s | Whitespace (space, tab, newline) |
| \S | Not whitespace |
| \b | Word boundary; e.g. \bha matches 'ha' only at the start of a word (start of string or after a non-word character) |
| \B | Not a word boundary |
| ^ | Beginning of a string |
| $ | End of a string |
| [] | Matches any one character listed in the brackets; most metacharacters need no escaping inside. [1-7] == [1234567] != [-17] (which matches -, 1, or 7) |
| [^ ] | Matches any one character not listed in the brackets; [^a-c] == any character except a, b, c |
| \| | Either or (alternation) |
| () | Group |
| Quantifiers | |
| * | 0 or more |
| + | 1 or more |
| ? | 0 or 1 |
| {3} | Exact number |
| {3,4} | Range of repetitions (minimum,maximum), written without spaces |
| Meta Characters | |
| . [ { ( ) \ ^ $ \| ? * + | Must be escaped with a backslash to match literally |
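
As a rough illustration of the table above, the sketch below tries a few of these tokens with Python's `re` module. The sample string and all variable names are made up for this example and are not part of the original notes.

```python
import re

# Hypothetical sample text, chosen only for this illustration.
sample = "Ha HaHa cost: 3.50 (approx) on 2024-01-15"

# \d+ : one or more digits in a row
digits = re.findall(r"\d+", sample)

# \bHa : 'Ha' only where a word starts
word_start_ha = re.findall(r"\bHa", sample)

# \d{4} : exactly four digits, here bounded by word boundaries
years = re.findall(r"\b\d{4}\b", sample)

# An unescaped dot matches any character; \. matches only a literal dot
loose = re.findall(r"3.50", "3.50 3x50")
strict = re.findall(r"3\.50", "3.50 3x50")

# re.escape adds the backslashes needed to match metacharacters literally
escaped = re.escape("(approx)")
literal = re.findall(escaped, sample)

print(digits, word_start_ha, years, loose, strict, literal)
```

Comparing `loose` with `strict` shows why the metacharacters in the last row need a backslash when they are meant literally.
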

| Text | Regular Expression |
|---|---|
| abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ 1234567890 | |
| Ha HaHa | \bHa (using word boundary) |
| abhishekpathak.com | \w+\.com |
| 321-555-4321 123.555.1234 | \d{3}[-.]\d{3}[-.]\d{4} |
| Mr. Schafer Mr Smith Ms David Mrs. Robinson Mr. T | M(r\|s\|rs)\.?\s[A-Z]\w* |
| cat mat pat bat | [^b]at |
| CoreyMSchafer@gmail.com corey.schafer@university.edu corey-321-schafer@my-work.net | [a-zA-Z0-9.-]+@[a-zA-Z-]+\.(com\|edu\|net) |
| Most common forms of email address | [a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+ |
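
The rows above can be checked with a short script. This is a minimal sketch: the dictionary labels are invented for the example, and the phone, title, and email patterns are written with `\d` and escaped dots (`\.`), which is assumed to be the intended form. `finditer` with `group(0)` is used so the full match is collected even when the pattern contains a group.

```python
import re

# (sample text, pattern) pairs based on the rows above; the labels are
# hypothetical names used only in this sketch.
examples = {
    "word_start": ("Ha HaHa", r"\bHa"),
    "domain": ("abhishekpathak.com", r"\w+\.com"),
    "phone": ("321-555-4321 123.555.1234", r"\d{3}[-.]\d{3}[-.]\d{4}"),
    "title": (
        "Mr. Schafer Mr Smith Ms David Mrs. Robinson Mr. T",
        r"M(r|s|rs)\.?\s[A-Z]\w*",
    ),
    "not_b": ("cat mat pat bat", r"[^b]at"),
    "email": (
        "CoreyMSchafer@gmail.com corey.schafer@university.edu "
        "corey-321-schafer@my-work.net",
        r"[a-zA-Z0-9.-]+@[a-zA-Z-]+\.(com|edu|net)",
    ),
}

results = {}
for label, (text, pattern) in examples.items():
    # group(0) is the whole match, regardless of any groups in the pattern
    results[label] = [m.group(0) for m in re.finditer(pattern, text)]
    print(label, results[label])
```

Note that `re.findall` would return only the group contents (for example `r`, `s`, `rs`) for the patterns that contain parentheses, which is why the sketch reads `group(0)` from each match instead.
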
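```python
import re

# Raw strings keep backslashes literal:
#   print("\tTab")   prints a tab character followed by Tab
#   print(r"\tTab")  prints \tTab

search_in_text = """
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
1234567890
"""

# Replace the placeholder with the pattern you want to try.
pattern = re.compile(r'regular_expression_here')

# finditer yields one match object per non-overlapping match.
searches = pattern.finditer(search_in_text)
for search in searches:
    print(search)
```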
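
Each item the loop prints is a match object. The sketch below shows one way to read the position and text out of it; the sample text is the phone-number row from the table, and the parenthesised groups and variable names are additions made for this illustration.

```python
import re

text = "321-555-4321 123.555.1234"

# Parentheses create numbered groups; group(1) is the first three digits.
pattern = re.compile(r"(\d{3})[-.](\d{3})[-.](\d{4})")

found = []
for match in pattern.finditer(text):
    # span() gives the (start, end) indexes; group(0) is the whole match
    found.append((match.span(), match.group(0), match.group(1)))
    print(match.span(), match.group(0), "first group:", match.group(1))
```

The span can be used to slice the original string, so `text[0:12]` gives back the first number.
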