AWK
Using GNU awk or mawk:
$ awk '$1~"^"word{printf("--\n%s",$0)}' word='are' RS='--\n' infile
--
are you happy
--
are(you hungry
too
This sets the variable word to the word to match at the start of each record, and RS (the record separator) to '--' followed by a newline (\n). Then, for any record whose first field starts with the search word ($1~"^"word), it prints a formatted record: a leading '--' and a newline, followed by the exact record found. Note that word is used as a regular expression anchored at the start of the field, so are also matches a field such as area.
GREP
Using grep (GNU, for the -z option):
grep -Pzo -- '--\nare(?:[^\n]*\n)+?(?=--|\Z)' infile
grep -Pzo -- '(?s)--\nare.*?(?=\n--|\Z)\n' infile
grep -Pzo -- '(?s)--\nare(?:(?!\n--).)*\n' infile
Description(s)
For the following descriptions, the PCRE option (?x) is used to add (many) explanatory comments (and spaces) inline with the actual (working) regular expression. If the comments (everything from a # up to the next newline) and most of the spaces are removed, the resulting string is still the same regular expression. This makes it possible to describe the regex in detail inside working code, which makes the code much easier to maintain.
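To see the effect of (?x) in practice, the sketch below runs a compact pattern and a spaced-out (?x) version of it against the same data and compares the results. The sample data and the names sample, compact and spaced are assumptions for illustration. The -o option is used here so that only the matched text is printed, and tr removes the NUL byte that -z writes after each match. The spaced pattern carries no # comments because it is kept on a single line, where a comment would swallow the rest of the pattern.

```shell
# Hypothetical sample data: sections separated by lines holding only "--".
sample=$(mktemp)
printf '%s\n' '--' 'are you happy' '--' 'what is this' \
              '--' 'are(you hungry' 'too' '--' 'you are not' > "$sample"

# Compact pattern (requires GNU grep built with PCRE support).
compact=$(grep -Pzo -- '--\nare(?:[^\n]*\n)+?(?=--|\Z)' "$sample" | tr -d '\000')

# Same pattern under (?x): the added spaces are ignored by PCRE.
spaced=$(grep -Pzo -- '(?x) --\n are (?: [^\n]* \n )+? (?= -- | \Z )' "$sample" | tr -d '\000')

printf '%s\n' "$compact"
[ "$compact" = "$spaced" ] && echo 'same matches'
rm -f "$sample"
```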
Option 1 regex (?x)--\nare(?:[^\n]*\n)+?(?=--|\Z)
(?x)        # match the remainder of the pattern with the following
            # effective flags: x
            #   x modifier: extended. Spaces and text after a #
            #   in the pattern are ignored
--          # matches the characters -- literally (case sensitive)
\n          # matches a line-feed (newline) character (ASCII 10)
are         # matches the characters are literally (case sensitive)
(?:         # Non-Capturing Group (?:[^\n]*\n)+?
  [^\n]     # matches non-newline characters
  *         # Quantifier: matches between zero and unlimited times, as
            # many times as possible, giving back as needed (greedy)
  \n        # matches a line-feed (newline) character (ASCII 10)
)           # Close the Non-Capturing Group
+?          # Quantifier: matches between one and unlimited times, as
            # few times as possible, expanding as needed (lazy)
            # The group is non-capturing, so no iteration is stored.
(?=         # Positive Lookahead (?=--|\Z)
            # Assert that the Regex below matches
            # 1st Alternative --
  --        # matches the characters -- literally (case sensitive)
  |         # 2nd Alternative \Z
  \Z        # \Z asserts position at the end of the string, or before
            # the line terminator right at the end of the
            # string (if any)
)           # Closing the lookahead.
Option 2 regex (?sx)--\nare.*?(?=\n--|\Z)\n
(?sx)       # match the remainder of the pattern with the following
            # effective flags: sx
            #   s modifier: single line. Dot matches newline characters
            #   x modifier: extended. Spaces and text after a # in
            #   the pattern are ignored
--          # matches the characters -- literally (case sensitive)
\n          # matches a line-feed (newline) character (ASCII 10)
are         # matches the characters are literally (case sensitive)
.*?         # matches any character
            # Quantifier: matches between zero and unlimited times,
            # as few times as possible, expanding as needed (lazy).
(?=         # Positive Lookahead (?=\n--|\Z)
            # Assert that the Regex below matches
            # 1st Alternative \n--
  \n        # matches a line-feed (newline) character (ASCII 10)
  --        # matches the characters -- literally.
  |         # 2nd Alternative \Z
  \Z        # \Z asserts position at the end of the string, or
            # before the line terminator right at
            # the end of the string (if any)
)           # Close the lookahead parenthesis.
\n          # matches a line-feed (newline) character (ASCII 10)
Option 3 regex (?xs)--\nare(?:(?!\n--).)*\n
(?xs)       # match the remainder of the pattern with the following
            # effective flags: xs
            #   modifier x: extended. Spaces and text after a # in
            #   the pattern are ignored
            #   modifier s: single line. Dot matches newline characters
--          # matches the characters -- literally (case sensitive)
\n          # matches a line-feed (newline) character (ASCII 10)
are         # matches the characters are literally (case sensitive)
(?:         # Non-capturing group (?:(?!\n--).)
  (?!       # Negative Lookahead (?!\n--)
            # Assert that the Regex below does not match
    \n      # matches a line-feed (newline) character (ASCII 10)
    --      # matches the characters -- literally
  )         # Close Negative lookahead
  .         # matches any character
)           # Close the Non-Capturing group.
*           # Quantifier: matches between zero and unlimited times, as many
            # times as possible, giving back as needed (greedy)
\n          # matches a line-feed (newline) character (ASCII 10)
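The three options can also be compared side by side. This is a minimal sketch under stated assumptions: the sample data is hypothetical (its last section matches, so the \Z branch is exercised), the names sample, r1, r2 and r3 are illustrative only, -o is used so that only the matched text is printed, and tr drops the NUL byte that -z writes after each match.

```shell
# Hypothetical sample data; the final section also starts with "are".
sample=$(mktemp)
printf '%s\n' '--' 'are you happy' '--' 'what is this' \
              '--' 'are(you hungry' 'too' '--' 'are we done' > "$sample"

# Option 1: lazily repeated whole lines, stopped by a lookahead.
r1=$(grep -Pzo -- '--\nare(?:[^\n]*\n)+?(?=--|\Z)' "$sample" | tr -d '\000')
# Option 2: lazy dot (with (?s)) up to the next "\n--" or the end.
r2=$(grep -Pzo -- '(?s)--\nare.*?(?=\n--|\Z)\n' "$sample" | tr -d '\000')
# Option 3: any character that does not begin a "\n--" sequence.
r3=$(grep -Pzo -- '(?s)--\nare(?:(?!\n--).)*\n' "$sample" | tr -d '\000')

printf '%s\n' "$r1"
[ "$r1" = "$r2" ] && [ "$r2" = "$r3" ] && echo 'all three options agree'
rm -f "$sample"
```

Which option to prefer is mostly a matter of readability; on input like this they are expected to select the same sections.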
SED
$ sed -nEe 'bend
:start ;N;/^--\nare/!b
:loop ;/^--$/!{p;n;bloop}
:end ;/^--$/bstart' infile