= regular expressions: a guided tour
:title: regular expressions: a guided tour
:author: der.hans
:copyright: 2007-2017 der.hans --- CC BY-SA 4.0 unported
:date: 2018Mar10 @ SCaLE
:max-width: 60em
:website: https://www.LuftHans.com/talks/
:source-highlighter: pygments
:data-uri:
:imagesdir: resources

////
Abstract

Example driven introduction to regular expressions. The talk uses plain English to explain regular expression concepts, syntax and language. Common tools such as grep, sed and awk will provide conduits for demonstrating regular expressions.

Many of our favorite system administration tools use regular expressions for text matching. While using plain English to describe regular expressions the example driven introduction will explain regular expression concepts, syntax and language. Examples will include common tools such as grep, sed and awk.
////

== Upcoming Hans

LibrePlanet at MIT in Boston

"Device and Personal Privacy Technology Roundup"

Sunday, 2018Mar25 @ 11:50

https://libreplanet.org/2018/speakers/#hans

== What R RegEx?

Short: Regular Expressions (RegEx) are sequences of characters that define a matching pattern using a specialized language.

Medium: RegEx define patterns that describe sets of strings. They are used by many common *NIX tools such as grep and sed. They can also be used in many programming languages such as Perl, PHP and Python. Database query languages also likely support RegEx.

Created in the 1950s. Popularized by *NIX :)

== What R ! RegEx?

* Regular Expressions are not globs
* Both use similar characters for pattern matching
* Globs are evaluated by the shell *before* the command is run
* Regular Expressions are evaluated by the command
* Use quotes to protect RegEx from accidental globbification
* Globbing is *mostly* for filename matching
* RegEx is *mostly* for everything else
* RegEx is not explicitly limited to one line, but most tools do single-line matching by default

== RegEx Examples

----
sed -i -re 's/ *$//' script.sh
----

----
ip addr list | grep -E '(([12]{,1}[[:digit:]]{1,2})\.){3}([12]{,1}[[:digit:]]{1,2})'
----

----
ip addr list | grep -E '([[:lower:][:digit:]]{2}:){5}[[:lower:][:digit:]]{2}'
----

== Meet Star

      /|
     / |
    /__|______
  |          |
  |   Star   |
  |  __  __  |
  | |  ||  | |
  | |__||__| |
  |  __  __()|    --------------
  | |  ||  | |   < Hi, I'm Star >
  | |  ||  | |    --------------
  | |__||__| |   /
  |          |  /
  |__________| *

== Star is a Helpful Neighbor

`*` == zero or more of the *previous character*

Star is a modifier acting on whatever comes before it

x* == zero or more x

y* == zero or more y

----
sed -i -re 's/ *$//' script.sh
----

== Star grep Examples

----
$ # the same as "grep x file.txt"
$ grep -E 'xy*' file.txt
----

----
$ # find at least one x, still the same as "grep x file.txt"
$ grep -E 'xx*' file.txt
----

----
$ # use grep to cat the file
$ grep -E 'x*' file.txt
----

----
$ # sloppily also look for British spelling
$ grep -E 'colou*r' file.txt
----

== Star sed Examples

----
$ # search for the first zero or more r, then replace
$ echo fred | sed -re 's/r*/x/'
xfred
----

----
$ # search for all zero or more r, then replace
$ echo fred | sed -re 's/r*/x/g'
xfxexdx
----

----
$ # search for all zero or more r, then replace
$ echo anke | sed -re 's/r*/x/g'
xaxnxkxex
----

== Meet Plus

      /|
     / |
    /__|______
  |   Plus   |
  |  __  __  |
  | |  ||  | |
  | |  ||  | |
  | |__||__| |    -----------
  |  __  __()|   < hi I'm Plus>
  | |  ||  | |    -----------
  | |  ||  | |   /
  | |__||__| |  /
  |__________| +

We've moved into a fancy neighborhood now!

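== Star's Empty Matches

The Star sed examples above sprinkle `x` between the letters because `r*` is also happy with zero `r`. The following is a minimal sketch for trying that at home, assuming GNU sed (for `-r` and for the way empty matches are handled); the variable names `first`, `all` and `strict` are made up for this slide.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed, -r turns on extended RegEx as in the slides.

# r* can match zero r, so the empty spot before "f" is the first match
first=$( echo fred | sed -re 's/r*/x/' )
echo "$first"     # xfred

# with g, every spot matches: the real r or the empty spot next to a letter
all=$( echo fred | sed -re 's/r*/x/g' )
echo "$all"       # xfxexdx

# asking for one literal r before the starred r rules out the empty matches
strict=$( echo fred | sed -re 's/rr*/x/g' )
echo "$strict"    # fxed
```

Writing `rr*` is the same trick as the `xx*` grep example: one required character followed by zero or more of the same.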
== Plus is a Neighbor That Counts ( to at least one )

`+` == one or more of the *previous character*

Plus is a modifier acting on whatever comes before it

x+ == one or more x

y+ == one or more y

----
sed -i -re 's/ +$//' script.sh
----

== Plus grep Examples

----
$ # search for x followed by at least one y
$ grep -E 'xy+' file.txt
----

----
$ # find at least one x, still the same as "grep x file.txt"
$ grep -E 'x+' file.txt
----

----
$ # sloppily look for only British spelling
$ grep -E 'colou+r' file.txt
----

== Plus sed Examples

----
$ # search and replace the first one or more r
$ echo fred | sed -re 's/r+/x/'
fxed
----

----
$ # search and replace all one or more r
$ echo fred | sed -re 's/r+/x/g'
fxed
----

----
$ # search and replace all one or more r
$ echo anke | sed -re 's/r+/x/g'
anke
----

== RegEx Variants

There are multiple RegEx languages

Extended RegEx - man 7 regex

Basic RegEx - man 7 regex

Perl Compatible Regex ( PCRE ) - man perlre

Fred's House of RegEx ( FHRegEx: pronounced fregex )

== RegEx Variant Usage

For command line and *NIX tools use extended where possible

If extended is not available, check the man page :)

For programming languages use PCRE or native matching

== Symbols thus far

* Star works for basic, extended and PCRE
* Plus works for extended and PCRE, but not basic

== PCRE Star Catchup

----
# the same as "grep x file.txt"
$ grep -P 'xy*' file.txt
----

----
# find at least one x, still the same as "grep x file.txt"
$ grep -P 'xx*' file.txt
----

----
# use grep to cat the file
$ grep -P 'x*' file.txt
----

----
# sloppily also look for British spelling
$ grep -P 'colou*r' file.txt
----

Same as before

== PCRE Plus Catchup

----
# search for x followed by at least one y
$ grep -P 'xy+' file.txt
----

----
# find at least one x, still the same as "grep x file.txt"
$ grep -P 'x+' file.txt
----

----
# sloppily look for British spelling
$ grep -P 'colou+r' file.txt
----

Same as before

== Meet Dot

      /|
     / |
    /__|______
  |   Dot    |
  |  __  __  |
  | |  ||  | |
  | |  ||  | |
  | |__||__| |    ----------
  |  __  __()|   < hi I'm Dot>
  | |  ||  | |    ----------
  | |  ||  | |   /
  | |__||__| |  /
  |__________| .

== Still Single After All These Years

`.` == any single character

// the highly technical term for a period
Dot is a wild card

Dot matches any single character except line breaks

Plus and Star act on whatever comes before them, Dot matches in place

x.+ == x followed by one or more characters

y.+ == y followed by one or more characters

Works the same in extended, PCRE and basic RegEx

== Dot grep Examples

----
$ # find at least one x, still the same as "grep x file.txt"
$ grep -E 'x.*' file.txt
----

----
$ # search for x followed by at least one other character
$ grep -E 'x.+' file.txt
----

----
$ # find Fred-based names ( Freddy, Fredericka, etc. ), but not Fred
$ grep -E 'Fred.+' names.txt
----

== Dot sed Examples

----
$ # replace r and all chars after it with x
$ echo fred | sed -re 's/r.+/x/'
fx
----

----
$ # replace r and all chars before it with x
$ echo fred | sed -re 's/.+r/x/'
xed
----

----
$ # replace f followed by any 2 characters with x
$ echo fred | sed -re 's/f../x/'
xd
----

== Dot Notes

Repeated Dot ( .., .* or .+ ) doesn't require matches to be the same character

Plus and Star are greedy and will match everything they can

Plus or Star combined with Dot matches everything

----
$ # show all lines in the file
$ grep '.*' file.txt
----

----
$ # show all lines in the file that have at least one character
$ # Plus needs extended RegEx, so use -E
$ grep -E '.+' file.txt
----

== Dot vs. ! Dot

Unless escaped, a period is a dot

----
grep -i '2018.*.jpg' /var/mail/account
# also matches 2018_fred_jpg.png
----

Use '\.' to match a period

----
grep -i '2018.*\.jpg' /var/mail/account
# require a period
----

== Meet Single Character Quote

      /|
     / |
    /__|______
  |   SCQ    |
  |  __  __  |
  | |  ||  | |
  | |  ||  | |
  | |__||__| |    ----------
  |  __  __()|   < hi I'm SCQ>
  | |  ||  | |    ----------
  | |  ||  | |   /
  | |__||__| |  /
  |__________| \

== Help Others Shine Through

Backslash quotes whatever comes right after it

`\` == quote the next character, which won't be interpreted as a special character

`\.` == period, not dot

----
# find files that end in '.jpg'
$ find ~/Images/ | grep '\.jpg$'
----

----
$ # find lines that have a plus symbol in them
$ grep -E '\+' math.txt
----

== Collection Discount

      /|
     / |
    /__|______
  |Collection|
  |  __  __  |
  | |  ||  | |
  | |  ||  | |
  | |__||__| |    -------------------
  |  __  __()|   < hi I'm Collection >
  | |  ||  | |    -------------------
  | |  ||  | |   /
  | |__||__| |  /
  |__________| [ ]

== Collections Are Square

Surround a collection with square brackets, aka a bracket expression

`[aeiou]` == any lower case English full vowel

----
$ echo abcdefhij | sed -re 's/[aeiou]/./g'
.bcd.fh.j
----

----
$ echo abcdefhij | sed -re 's/[a1b2c3]/./g'
...defhij
----

== Home on the Range

      /|
     / |
    /__|______
  |  Range   |
  |  __  __  |
  | |  ||  | |
  | |  ||  | |
  | |__||__| |    ------------------
  |  __  __()|   < hi I make a range>
  | |  ||  | |    ------------------
  | |  ||  | |   /
  | |__||__| |  /
  |__________| -

== Ranges in Collections

A range can be specified inside a collection

----
$ echo abcdefhij | sed -re 's/[a-e]/./g'
.....fhij
----

----
$ echo 1234567890 | sed -re 's/[1-9]/./g'
.........0
----

== Build Some Character Classes

      /|
     / |
    /__|______
  |Char Class|
  |  __  __  |
  | |  ||  | |
  | |  ||  | |
  | |__||__| |    -----------------------
  |  __  __()|   < hi I'm Character Class>
  | |  ||  | |    -----------------------
  | |  ||  | |   /
  | |__||__| |  /
  |__________| [: :]

== Nethack Builds Character

.Not these character classes
----
$ echo abcdefhij | sed -re 's/[[:ranger:][:mage:][:thief:]]/./g'
sed: -e expression #1, char 18: Invalid character class name
----

== Character Builds Collection

Character classes can be used inside collections

----
$ echo abcdefhij | sed -re 's/[[:alpha:]]/./g'
.........
----

----
$ echo CiHyFr82oap3 | sed -re 's/[[:lower:]]/./g'
C.H.F.82...3
----

----
$ echo CiHyFr82oap3 | sed -re 's/[[:digit:]]/./g'
CiHyFr..oap.
----

----
$ echo CiHyFr82oap3 | sed -re 's/[[:alnum:]]/./g'
............
----

== Earlier Examples

----
$ ip addr list | grep -E '[12]{,1}[[:digit:]]{1,2}\.'
    inet 127.0.0.1/8 scope host lo
    inet 10.0.136.18/21 brd 10.0.143.255 scope global dynamic wlan0
$
----

----
$ ip addr list | grep -E '([12]{,1}[[:digit:]]{1,2}\.[12]{,1}[[:digit:]]{1,2}\.[12]{,1}[[:digit:]]{1,2}\.[12]{,1}[[:digit:]]{1,2})'
----

----
$ ip addr list | grep -E '(([12]{,1}[[:digit:]]{1,2})\.){3}([12]{,1}[[:digit:]]{1,2})'
----

== Cast of Characters

* Some character classes

`[:alpha:]` == localized alphabet

`[:digit:]` == 0-9

`[:alnum:]` == localized alphabet and 0-9

`[:blank:]` == space, tab

`[:punct:]` == any printable character which is not a blank or an alnum

`[:cntrl:]` == control character

----
$ man 7 regex
----

== Character is One

* The character class is only one part of the collection

----
$ echo CiHyFr82oap3 | sed -re 's/[CiH[:digit:]]/./g'
...yFr..oap.
----

----
$ echo CiHyFr82oap3 | sed -re 's/[C[:lower:][:digit:]]/./g'
..H.F.......
----

== Not Your Parent's RegEx

`^` at the beginning of a collection means 'not'

----
[^a]
----

----
$ # find Fred-based names ( Freddy, Fredericka, etc. ), but not Fred
$ grep -E 'Fred[^ ]+' names.txt
----

----
$ # find Fred-based names ( Freddy, Fredericka, etc. ), but not Fred
$ grep -E 'Fred[^[:blank:]]+' names.txt
----

== Or

Branching allows matching this or the other

`|` == branch

----
$ echo fred | grep -E 'fred|anke'
fred
$ echo anke | grep -E 'fred|anke'
anke
----

== Group Discount

A group can compartmentalize matches for future reference, aka atom

----
$ echo fred | sed -re 's/(.*)/\1 \1 \1/'
fred fred fred
----

----
$ echo fred anke | sed -re 's/(.*) (.*)/\2 \1/'
anke fred
----

== Anchors Aweigh

* A pattern can be matched to one end or the other
* Caret and Dollar

`^` == beginning of line when outside a collection at the beginning of the RegEx

`$` == end of line when outside a collection at the end of the RegEx

`^$` == empty line

`^[[:blank:]]*$` == empty line or line with only blank characters ( spaces or tabs )

== Rooting out Root

----
$ grep -E root /etc/passwd
root:x:0:0:root:/root:/bin/bash
----

----
$ grep -Ec bin /etc/passwd
46
----

----
$ grep -E '^bin' /etc/passwd
bin:x:2:2:bin:/bin:/usr/sbin/nologin
----

== Max and Minitz

Use curly braces and a comma to match a minimum or maximum number of times

----
$ echo ddd | sed -re 's/d{1,2}/q/'
qd
----

----
$ echo ddd | sed -re 's/d{1}/q/'
qdd
----

----
$ echo ddd | sed -re 's/d{1,}/q/'
q
----

----
$ # less sloppily also look for British spelling
$ grep -E 'colou{,1}r' file.txt
----

== ASCII GNU

----
                    ,-----._
  .            .  ,'        `-.__,------._
 //          __\\'                        `-.
((    _____-'___))                           |
 `:='/     (alf_/                            |
 `.=|      |='                               |
    |)   O |                                  \
    |      |                               /\  \
    |     /                          .    /  \  \
    |    .-..__            ___   .--' \  |\   \  |
   |o o  |     ``--.___.  /   `-'      \  \\   \ |
    `--''        '  .' / /             |  | |   | \
                 |  | / /              |  | |   mmm
                 |  ||  |              | /| |
                 ( .' \ \              || | |
                 | |   \ \            // / /
                 | |    \ \          || |_|
                /  |    |_/         /_|
               /__/
----

== Contacting Hans

Thank you!

* https://mastodon.rocks/@lufthans
** Mastodon
* https://gnusocial.de/lufthans
** GNU Social
* https://identi.ca/lufthans
** identi.ca
* https://plus.google.com/106398898073454924098
** G+
* LuftHans on Freenode, usually in #LOPSA
** IRC

== Resources

* https://en.wikipedia.org/wiki/Regular_expression#History - RegEx creation
* https://regexr.com/
* http://www.regular-expressions.info/refcapture.html

== Credits

* Gopher With Antlers @2018 Brian Cluff
* http://ascii.co.uk/art/doors - ASCII doors
* ASCII gnu from cowsay

////
sed -re "s/$( date +%Y -d '1 years ago' )/$( date +%Y )/"

echo | openssl s_client -connect ${REMHOST}:${REMPORT} 2>&1 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p'

sed -i -re "s/[^[:blank:].\/]*\.$exampleProfile/$randomString.$newProfile/g" -e "s/$exampleProfile/$newProfile/" $ffdir/$randomString.$newProfile/prefs.js

num=$( sed -rne 's/^\[Profile([[:digit:]]*)\]$/\1/p' $ffinifile | sort -n | tail -1 )

mac_addy=$( virsh dumpxml $domain | grep 'mac address' | sed -re "s/^[^:]*'//" -e "s/'.*$//" )

ip_addy=$( grep "$mac_addy" /var/log/syslog | grep DHCPACK | tail -1 | sed -re 's/.*DHCPACK\(.*\)[[:blank:]]*//' -e 's/ .*//' )

arp -an | grep "$mac_addy" | sed -re 's/.*\((.*)\).*/\1/'

Antwort=$( find "$p1" -type f | grep "${file##*/}" | while read Akte

ls -la ${filename} | grep ^-r--r--r--

cat /etc/passwd | grep -E '^root'
////
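
== Encore: The Whole Neighborhood

Anchors, groups, counts and character classes from the earlier slides can share one pattern. The following is a rough sketch, assuming a grep and sed that understand extended RegEx (`-E` and `-r`); the helper name `looks_like_ipv4` and the variable `swapped` are invented for this slide and are not part of any tool.

```shell
#!/bin/sh
# Sketch only: assumes a grep and sed that understand extended RegEx (-E / -r).

# looks_like_ipv4 is a hypothetical helper name, not from any tool.
# ^ and $ anchor the pattern so the whole line has to match.
# {0,1} is the same count as {,1} in the slides, spelled out in full.
looks_like_ipv4() {
    printf '%s\n' "$1" |
        grep -Eq '^([12]{0,1}[[:digit:]]{1,2}\.){3}[12]{0,1}[[:digit:]]{1,2}$'
}

for candidate in 10.0.136.18 127.0.0.1 fred 10.0.136
do
    if looks_like_ipv4 "$candidate"
    then
        echo "$candidate looks like an address"
    else
        echo "$candidate does not"
    fi
done

# groups plus anchors: swap exactly two space-separated words
swapped=$( echo 'fred anke' | sed -re 's/^([^ ]+) ([^ ]+)$/\2 \1/' )
echo "$swapped"    # anke fred
```

The pattern is still sloppy in the spirit of the talk: it checks the shape of a dotted quad, not the value of each number, so 299.1.1.1 also gets through.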