by Sara Jakša
amount:
- Total\s*:\s*\$([\d\.\,]+)
- Total amount\s*:\s*\$([\d\.\,]+)
date:
- (\d+(?:st|nd|rd|th)? [A-Z][a-z][a-z], \d\d\d\d)
- (\d\d\/\d\d\/\d\d\d\d)
- ([A-Z][a-z]+?-\d\d-\d\d\d\d)
invoice_number:
- INV(\S+[\d]+)
- Invoice ID:[\s\S]*?(\d+)
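A minimal sketch of how a template like this could be applied: for each field, the patterns are tried in order and the first match wins. The patterns are adapted from the template above (the ordinal suffix in the first date pattern is written as an optional group, which is an assumption about the intent). The `extract_fields` helper and the invoice text are hypothetical and only serve as illustration.

```python
import re

# Patterns adapted from the template above, one list of alternatives per field.
FIELDS = {
    "amount": [
        r"Total\s*:\s*\$([\d\.\,]+)",
        r"Total amount\s*:\s*\$([\d\.\,]+)",
    ],
    "date": [
        r"(\d+(?:st|nd|rd|th)? [A-Z][a-z][a-z], \d\d\d\d)",
        r"(\d\d\/\d\d\/\d\d\d\d)",
        r"([A-Z][a-z]+?-\d\d-\d\d\d\d)",
    ],
    "invoice_number": [
        r"INV(\S+[\d]+)",
        r"Invoice ID:[\s\S]*?(\d+)",
    ],
}


def extract_fields(text):
    """Return the first captured group of the first matching pattern per field."""
    result = {}
    for name, patterns in FIELDS.items():
        for pattern in patterns:
            match = re.search(pattern, text)
            if match:
                result[name] = match.group(1)
                break
    return result


# Made-up invoice text, not taken from a real document.
invoice = "Invoice ID: 4711\nDate: 05/09/2022\nTotal amount: $1,234.50\n"
fields = extract_fields(invoice)
print(fields)
```

Trying several patterns per field keeps each individual regex simple, at the cost of having to order them from most to least specific.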
A sequence of characters that specifies a search pattern in text.
Text from: https://en.wikipedia.org/wiki/Bratislava
\d{4}
The mixture of:
\d{4}
This regex searches for four consecutive characters that are all digits.
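A short sketch of this pattern in Python, for example to pull years out of a sentence. The sample strings are made up. It also shows one caveat: without word boundaries, the pattern happily matches the first four digits of a longer number.

```python
import re

# Made-up sample sentence containing two four-digit years.
sentence = "The first edition was in 2019 and this one is in 2022."
years = re.findall(r"\d{4}", sentence)

# Without boundaries, four digits inside a longer number also match.
inside_longer_number = re.findall(r"\d{4}", "id 123456")

# \b requires a word boundary on both sides, so a six-digit number is skipped.
with_boundaries = re.findall(r"\b\d{4}\b", "id 123456")

print(years, inside_longer_number, with_boundaries)
```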
Fetching data for REGION CUSTOMER (CUSTOMER_ID) - SERVICE failed with status ERROR due to
Fetching data for (\w+) ([\w\s]+?) \((\d+?)\) - ([\w\.]+?) failed with status (.+?) due to
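As a sketch, the pattern above can be compiled once and used to split a log line into its parts. The log line below is hypothetical and only follows the shape of the template; the variable names are chosen for illustration.

```python
import re

# The pattern from above, split over two string literals for readability.
LOG_PATTERN = re.compile(
    r"Fetching data for (\w+) ([\w\s]+?) \((\d+?)\) - ([\w\.]+?) "
    r"failed with status (.+?) due to"
)

# Hypothetical log line in the REGION CUSTOMER (CUSTOMER_ID) - SERVICE shape.
line = (
    "Fetching data for EU Acme Corp (12345) - billing.api "
    "failed with status 500 due to a timeout"
)

match = LOG_PATTERN.search(line)
region, customer, customer_id, service, status = match.groups()
print(region, customer, customer_id, service, status)
```

The lazy quantifiers (`+?`) make each group stop as soon as the following literal text can match, which is what keeps the customer name from swallowing the rest of the line.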
My name is Sara Jakša
But sometimes this is not a valid name.
re.findall(r"[^\w\s]+", "Sara Jakša")
"Sara Jakša".match(/[^\w\s]+/g)
re.findall(r"[^\w\s]+", "Sara Jakša", flags=re.ASCII)
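The difference between these calls comes down to what `\w` covers. A small sketch of the two Python variants, assuming the name is stored with a precomposed "š" (U+0161): by default `\w` in Python 3 matches Unicode letters, while `re.ASCII` restricts it to `[a-zA-Z0-9_]`, which is also how `\w` behaves in a JavaScript regex without extra flags.

```python
import re

# "š" written as an explicit escape so the precomposed form is guaranteed.
name = "Sara Jak\u0161a"

# Default (Unicode) mode: "š" counts as a word character, nothing is flagged.
unicode_mode = re.findall(r"[^\w\s]+", name)

# ASCII mode: "š" is no longer a word character, so it is reported as invalid.
ascii_mode = re.findall(r"[^\w\s]+", name, flags=re.ASCII)

print(unicode_mode, ascii_mode)
```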
48
00:04:03,494 --> 00:04:04,745
(リオン)最高だね
49
00:04:04,828 --> 00:04:05,829
あっ
50
00:04:06,413 --> 00:04:08,916
(リオン)もう 最高の気分だよ!
51
00:04:08,999 --> 00:04:11,251
俺は 確かに傲慢だが—
52
00:04:11,335 --> 00:04:14,588
お前らは そんな俺にも勝てないわけだ
53
00:04:14,672 --> 00:04:19,718
格下に見ていた相手に負ける気分は
どうですか? 王子様!
54
00:04:19,802 --> 00:04:22,096
き… 貴様!
55
00:04:22,846 --> 00:04:25,849
(リオン)何が
“王族になど生まれたくなかった”だ
56
00:04:26,266 --> 00:04:30,896
お前 変態ババアに売られて
殺されそうになったこと あるのか?
57
00:04:30,980 --> 00:04:31,981
(ユリウス)何!?
58
00:04:32,398 --> 00:04:35,234
女子にペコペコ 頭下げた上に—
59
00:04:35,567 --> 00:04:38,737
お茶会を台なしにされた経験は?
60
00:04:39,113 --> 00:04:43,575
話しかけただけで
突き飛ばされた俺たちの気持ちが—
61
00:04:43,659 --> 00:04:45,744
分かるのかよ!
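import re
data = re.sub(r"^\d+$", "\n", data, flags=re.M)  # cue numbers
data = re.sub(r"\d\d:\d\d:\d\d,\d\d\d --> \d\d:\d\d:\d\d,\d\d\d", "", data)  # timestamps
data = re.sub(r"[(（][^)）]+?[)）]", "", data)  # speaker names in ASCII or full-width parentheses
data = re.sub(r"\\u\d+", "", data)  # literal \u escape residue
data = re.sub(r"\n+", "\n", data)  # collapse repeated newlines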
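The same cleanup can be sketched as a self-contained function and tried on the first two cues of the sample above. The function name `clean_subtitles` is hypothetical, and the sketch assumes the speaker names are wrapped in ASCII or full-width parentheses.

```python
import re


def clean_subtitles(srt_text):
    """Strip cue numbers, timestamps and speaker names from SRT text."""
    # Cue numbers: lines that consist only of digits.
    text = re.sub(r"^\d+$", "", srt_text, flags=re.M)
    # Timestamp lines such as 00:04:03,494 --> 00:04:04,745.
    text = re.sub(
        r"^\d\d:\d\d:\d\d,\d\d\d --> \d\d:\d\d:\d\d,\d\d\d$", "", text, flags=re.M
    )
    # Speaker names in ASCII or full-width parentheses.
    text = re.sub(r"[(（][^)）]+?[)）]", "", text)
    # Collapse the blank lines left behind by the previous steps.
    text = re.sub(r"\n+", "\n", text)
    return text.strip()


sample = (
    "48\n00:04:03,494 --> 00:04:04,745\n(リオン)最高だね\n\n"
    "49\n00:04:04,828 --> 00:04:05,829\nあっ\n"
)
cleaned = clean_subtitles(sample)
print(cleaned)
```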
最高だね
あっ
もう 最高の気分だよ!
俺は 確かに傲慢だが—
お前らは そんな俺にも勝てないわけだ
格下に見ていた相手に負ける気分は
どうですか? 王子様!
き… 貴様!
何が
“王族になど生まれたくなかった”だ
お前 変態ババアに売られて
殺されそうになったこと あるのか?
何!?
女子にペコペコ 頭下げた上に—
お茶会を台なしにされた経験は?
話しかけただけで
突き飛ばされた俺たちの気持ちが—
分かるのかよ!
import collections
kanji = re.sub(r"[^\u4E00-\u9FFF]", "", data)
len(set(kanji))
# 420
collections.Counter(kanji).most_common(5)
# [('俺', 37), ('殿', 28), ('下', 27), ('気', 24), ('分', 23)]
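A runnable sketch of the same counting on two short lines taken from the cleaned subtitles above. The range `\u4E00-\u9FFF` covers the CJK Unified Ideographs block, so hiragana, katakana, punctuation and whitespace are all removed; rarer kanji from the extension blocks would be dropped as well.

```python
import collections
import re

# Two lines from the cleaned subtitle sample.
text = "最高だね\nもう 最高の気分だよ!"

# Keep only characters in the CJK Unified Ideographs block.
kanji = re.sub(r"[^\u4E00-\u9FFF]", "", text)

unique_kanji = len(set(kanji))
top_two = collections.Counter(kanji).most_common(2)

print(kanji, unique_kanji, top_two)
```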
import regex
regex.findall(r"\p{Emoji}", "plenty of 🐟 in the 🌊")
# ['🐟', '🌊']
csv()
  .fromString(csvString)
  .on('json', (jsonObject) => {
    // something is done
  })
  .on('done', () => {
    // something is done
    resolve(result)
  });
csv\(\)[\s\n]*?
\.fromString\((.+)\)[\s\n]*?
\.on\(\s*?'json', \(.*?\) => \{([\s\S\n]+?)\}\)[\s\S\n]+
\.on\('done', \(.*?\) => \{([\s\S\n]+?)\}\)
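A sketch of how this pattern could be run from Python, assuming the four lines above are meant to be joined into a single regex. The JavaScript snippet is stored as a plain string, and the three groups capture the argument of `fromString` and the bodies of the two callbacks. The variable names are chosen for illustration.

```python
import re

# The JavaScript snippet from above, kept as plain text.
JS_SOURCE = """\
csv()
  .fromString(csvString)
  .on('json', (jsonObject) => {
    // something is done
  })
  .on('done', () => {
    // something is done
    resolve(result)
  });
"""

# The four pattern lines joined into one regex (no VERBOSE flag, because
# the pattern contains literal spaces that must be matched).
PATTERN = re.compile(
    r"csv\(\)[\s\n]*?"
    r"\.fromString\((.+)\)[\s\n]*?"
    r"\.on\(\s*?'json', \(.*?\) => \{([\s\S\n]+?)\}\)[\s\S\n]+"
    r"\.on\('done', \(.*?\) => \{([\s\S\n]+?)\}\)"
)

match = PATTERN.search(JS_SOURCE)
source_arg, json_body, done_body = match.groups()
print(source_arg)
print(json_body.strip())
print(done_body.strip())
```

Note that `[\s\S]` is used instead of `.` so that the callback bodies can span several lines without needing the `re.DOTALL` flag.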
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Contact: sarajaksa@sarajaksa.eu or approach me during this conference
The presentation is available at https://sarajaksa.eu/content/presentations/2022/pycon-slovakia-regex/