[Swan-dev] new log file check - ISCNTRL (and the sadness that is python's RE)

Andrew Cagney andrew.cagney at gmail.com
Fri Oct 11 17:36:21 UTC 2019


I've added checks for control characters in both pluto.log and
console.txt; they'll show up as ISCNTRL.
(When debug-logging, Pluto was spewing an ECDSA structure that
contained raw bytes; that structure should have contained printable
ASCII.)
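A minimal sketch of what such a check could look like, assuming the log has already been read and decoded into a Python string. `find_iscntrl` and `ISCNTRL_RE` are hypothetical names for illustration, not the kvmresults.py code. The character class `[^ -~\n]` means "anything other than printable ASCII or newline", so besides true control characters it also flags TAB, CR and any non-ASCII character.

```python
import re

# Anything outside printable ASCII (space through '~') except newline.
# This is broader than C's iscntrl(): it also flags TAB, CR and any
# non-ASCII character.
ISCNTRL_RE = re.compile(r"[^ -~\n]")


def find_iscntrl(text):
    """Return (line, column, char) for every offending character.

    Hypothetical helper for illustration; not the kvmresults.py code.
    """
    hits = []
    # split("\n") rather than splitlines(): splitlines() would treat
    # \r, \x0b, \x0c and friends as line breaks and hide them.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for m in ISCNTRL_RE.finditer(line):
            hits.append((lineno, m.start(), m.group()))
    return hits


if __name__ == "__main__":
    # Made-up log excerpt with two raw bytes in the middle line.
    log = "ok line\nECDSA sig: \x01\x02raw\nanother ok line\n"
    for lineno, col, ch in find_iscntrl(log):
        print(f"line {lineno} col {col}: ISCNTRL {ch!r}")
```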

The bad news is that kvmresults.py gets slower again.  Superficially,
it's because I simply added a new regex call to a growing list - the
code looks like:
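   import re
   flags = set()  # hypothetical stand-in for kvmresults.py's result set
   if re.search(r"\(null\)", buffer):
       flags.add("PRINTF_NULL")
   if re.search(r"[^ -~\n]", buffer):
       flags.add("ISCNTRL")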
and, as everyone knows, the correct way to write this is more like:
   if re.search(r"\(null\)|[^ -~\n]", buffer):
       ...  # then work out which alternative matched
which, in theory, lets the regex engine make a single fast pass over
the buffer, the way grep's DFA does.  I've even got a patch to do this
(I should revisit it; it also experiments with "regex", the
third-party alternative to "re").
But here's where things get sad: the combined pattern is slower than
the two separate calls.  It turns out that Python's "re" has an
optimization where it uses a low-level string search to find a literal
like "(null)", and that's faster than its general-purpose matcher :-(
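A rough way to reproduce the comparison, offered only as a sketch: the buffer contents are made up, `flags_separate` and `flags_combined` are hypothetical names, and using named groups with `Match.lastgroup` is one way (not necessarily the patch's way) to recover which alternative of the combined pattern fired. Relative timings depend on the Python version and the data, so no numbers are claimed here.

```python
import re
import timeit

# Two separate checks.  re.escape() turns "(null)" into a pure literal,
# which CPython's re can locate with a fast low-level scan.
NULL_RE = re.compile(re.escape("(null)"))
CNTRL_RE = re.compile(r"[^ -~\n]")

# One combined alternation.  There is no common literal prefix for the
# engine to scan for, so it tries the alternation at each position.
COMBINED_RE = re.compile(r"(?P<null>\(null\))|(?P<cntrl>[^ -~\n])")


def flags_separate(buffer):
    flags = set()
    if NULL_RE.search(buffer):
        flags.add("PRINTF_NULL")
    if CNTRL_RE.search(buffer):
        flags.add("ISCNTRL")
    return flags


def flags_combined(buffer):
    # A single search only reports the first hit, so walk the matches and
    # use the named group to tell which alternative matched.
    flags = set()
    for m in COMBINED_RE.finditer(buffer):
        flags.add("PRINTF_NULL" if m.lastgroup == "null" else "ISCNTRL")
        if len(flags) == 2:
            break
    return flags


if __name__ == "__main__":
    # Made-up clean log: nothing to flag, so both versions scan it all.
    buffer = "pluto[1234]: | some perfectly ordinary debug line\n" * 20000
    assert flags_separate(buffer) == flags_combined(buffer) == set()
    for fn in (flags_separate, flags_combined):
        t = timeit.timeit(lambda: fn(buffer), number=5)
        print(f"{fn.__name__}: {t:.3f}s")
```

Note the `re.escape()`: left unescaped, `(null)` is a capture group that matches plain `null`. If only the literal matters, a plain `"(null)" in buffer` substring test is another option worth timing.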

