[Swan-dev] new log file check - ISCNTRL (and the sadness that is python's RE)
Andrew Cagney
andrew.cagney at gmail.com
Fri Oct 11 17:36:21 UTC 2019
I've added checks for control characters in both pluto.log and
console.txt; they'll show up as ISCNTRL.
(Pluto, when debug-logging, was spewing an ECDSA structure that
contained raw bytes - that structure should have contained printable
ASCII.)
The bad news is that kvmresults.py gets slower again. Superficially,
it's because I simply added a new regex call to a growing list - the
code looks like:
    if re.search(r"\(null\)", buffer):
        add PRINTF_NULL
    if re.search(r"[^ -~\n]", buffer):
        add ISCNTRL
and as everyone knows the correct way to write this is more like:
    if re.search(r"\(null\)|[^ -~\n]", buffer):
which lets the RE generate a really fast DFA, just like grep. I've
even got a patch to do this (I should revisit it; it also plays with
the "re" alternative, "regex").
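
For anyone wanting to play along, here is a minimal sketch of the two
approaches side by side. It is not the kvmresults.py code: the function
names, the set of issue names, and the use of named groups to tell the
alternatives apart are all assumptions made for illustration.

```python
import re

# Hypothetical stand-ins for the checks; kvmresults.py may differ.
PRINTF_NULL = re.compile(r"\(null\)")
ISCNTRL = re.compile(r"[^ -~\n]")

# One pattern with a named group per alternative, so a match can be
# mapped back to the issue that produced it.
COMBINED = re.compile(r"(?P<PRINTF_NULL>\(null\))|(?P<ISCNTRL>[^ -~\n])")


def check_separate(buffer):
    """Run one search per issue over the whole buffer."""
    issues = set()
    if PRINTF_NULL.search(buffer):
        issues.add("PRINTF_NULL")
    if ISCNTRL.search(buffer):
        issues.add("ISCNTRL")
    return issues


def check_combined(buffer):
    """Walk the buffer once, recording which alternative matched."""
    issues = set()
    for match in COMBINED.finditer(buffer):
        issues.add(match.lastgroup)
        if len(issues) == 2:  # both issues seen, no need to keep going
            break
    return issues
```

Both functions should report the same issues for the same buffer; the
only question is which one gets there faster.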
But here's where things get sad - the combined pattern is slower than
the two separate calls above. It turns out that Python's RE has an
optimization where it will use a low-level string search to match
things like "(null)" and that's faster than the high-level DFA :-(
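
A rough way to check this on your own machine is a small timeit
harness like the one below. It is only a sketch: the helper name
time_checks, the sample buffer, and the repeat count are made up for
illustration, and the numbers will vary with the Python version and
the contents of the log being scanned.

```python
import re
import timeit

# Two separate patterns versus one alternation, as discussed above.
SEPARATE = [re.compile(r"\(null\)"), re.compile(r"[^ -~\n]")]
COMBINED = re.compile(r"\(null\)|[^ -~\n]")


def time_checks(buffer, number=10):
    """Return (separate_seconds, combined_seconds) for `number` runs."""
    separate = timeit.timeit(
        lambda: [pattern.search(buffer) for pattern in SEPARATE],
        number=number,
    )
    combined = timeit.timeit(
        lambda: COMBINED.search(buffer),
        number=number,
    )
    return separate, combined


if __name__ == "__main__":
    # A clean buffer is the worst case: every search scans to the end.
    clean = "pluto log line with nothing unusual in it\n" * 5000
    separate, combined = time_checks(clean)
    print(f"separate: {separate:.4f}s  combined: {combined:.4f}s")
```

A buffer with no matches forces every search to scan to the end, which
is where any difference between the literal search and the alternation
should be most visible.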