Fetching part of a string with a regex, starting from a given word until the end of the string (which includes newlines)

import re

stri = "Hello guys.\nHow are you doing?\nI hope you have sweet dreams tonight."
regex = re.compile("guys[.\n]*$")

print(regex.findall(stri))

I know . in regex can be any character except a newline, [xy] means either x or y, * after a character means any number of that character and $ denotes the end of the string. So why does "guys[.\n]*$" not give me "guys.\nHow are you doing?\nI hope you have sweet dreams tonight."?


ANSWERS:


You put the period inside a character class, which is the one place where it matches only a literal . character and nothing else. [...] matches any one of the literal characters listed in the class.
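To make this concrete, here is a small sketch that reuses the string from the question. The variable names (literal_dot, class_match, anchored_result) are only illustrative. It shows how far [.\n]* actually gets and why adding $ leaves no match at all.

```python
import re

stri = "Hello guys.\nHow are you doing?\nI hope you have sweet dreams tonight."

# Inside [...] the dot is a literal period, so [.\n]* only consumes
# a run of periods and newlines and stops at the first other character.
literal_dot = re.compile("guys[.\n]*")
class_match = literal_dot.search(stri).group(0)
print(repr(class_match))  # 'guys.\n'

# With $ appended, the run of periods/newlines would have to reach the end
# of the string, but it is stopped by the "H" of "How", so nothing matches.
anchored_result = re.findall("guys[.\n]*$", stri)
print(anchored_result)  # []
```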

You want to use the re.DOTALL flag instead, which makes . match newlines as well:

regex = re.compile("guys.*$", re.DOTALL)

Alternatively, keep the . outside of the character class and put it in a non-capturing group together with the \n newline:

regex = re.compile("guys(?:.|\n)*$")

Demo:

>>> import re
>>> stri = "Hello guys.\nHow are you doing?\nI hope you have sweet dreams tonight."
>>> regex = re.compile("guys.*$", re.DOTALL)
>>> print(regex.findall(stri))
['guys.\nHow are you doing?\nI hope you have sweet dreams tonight.']
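One detail about the (?:.|\n) alternative is worth a quick check: the group is non-capturing for a reason. As a sketch (the variable names are illustrative), compare what findall returns when the group captures. With a capturing group, findall returns the group's contents instead of the whole match, and a repeated group only keeps its last iteration.

```python
import re

stri = "Hello guys.\nHow are you doing?\nI hope you have sweet dreams tonight."

# Non-capturing group: findall returns the whole match.
non_capturing = re.findall("guys(?:.|\n)*$", stri)
print(non_capturing)

# Capturing group: findall returns only group 1, which holds the last
# character matched by the repeated group (the final period).
capturing = re.findall("guys(.|\n)*$", stri)
print(capturing)  # ['.']
```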

Martijn's answer gives a good explanation of the behavior you are seeing. As an alternative to the re.DOTALL or (?:.|\n) options, you can use something like the following:

regex = re.compile(r"guys[\s\S]*$")

Since \s means "any whitespace character" and \S means "anything but whitespace", putting them together in a character class matches any character, including newlines.
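A quick way to convince yourself is to compare this pattern against the re.DOTALL version on the string from the question. This is only a sketch; the names tail_any, tail_dotall and covers_all are made up for the example.

```python
import re

stri = "Hello guys.\nHow are you doing?\nI hope you have sweet dreams tonight."

# [\s\S] matches whitespace or non-whitespace, i.e. any single character.
tail_any = re.findall(r"guys[\s\S]*$", stri)
tail_dotall = re.findall(r"guys.*$", stri, re.DOTALL)
print(tail_any == tail_dotall)  # True

# Spot check: each of these sample characters is matched by [\s\S].
covers_all = all(re.match(r"[\s\S]", ch) for ch in "a \n\t.")
print(covers_all)  # True
```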


Use re.MULTILINE and you should match over lines...

>>> regex = re.compile("guys.*",re.DOTALL|re.MULTILINE)
>>> regex.findall(stri)
['guys.\nHow are you doing?\nI hope you have sweet dreams tonight.']

EDIT: As Martijn points out, I was wrong about re.MULTILINE; re.DOTALL alone is enough:

>>> regex = re.compile("guys.*",re.DOTALL)
>>> regex.findall(stri)
['guys.\nHow are you doing?\nI hope you have sweet dreams tonight.']
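For anyone else mixing up the two flags, here is a short sketch of what each one changes, using the string from the question (the variable names are illustrative). re.MULTILINE only affects where ^ and $ can match; re.DOTALL only affects what . can match.

```python
import re

stri = "Hello guys.\nHow are you doing?\nI hope you have sweet dreams tonight."

# No flags: . stops at the newline and $ only matches at the very end.
no_flags = re.findall("guys.*$", stri)
print(no_flags)  # []

# MULTILINE: $ may match before each newline, but . still stops there.
multiline_only = re.findall("guys.*$", stri, re.MULTILINE)
print(multiline_only)  # ['guys.']

# DOTALL: . also matches newlines, so .* runs to the end of the string.
dotall_only = re.findall("guys.*$", stri, re.DOTALL)
print(dotall_only)

# MULTILINE also lets ^ match at the start of every line.
line_starts = re.findall(r"^\w+", stri, re.MULTILINE)
print(line_starts)  # ['Hello', 'How', 'I']
```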

