8. Limitations
- Adventures with joblib:
  - Parallel has a verbose argument that is underdocumented (“[I]f non zero, progress messages are printed. Above 50, the output is sent to stdout. The frequency of the messages increases with the verbosity level. If it more than 10, all iterations are reported.”). I even looked at the code and can’t figure it out. Anyway, some experimentation suggests 9 is a reasonable value; the frequency of messages decays over time at a decent rate.
  - sqlite3.Row objects interact badly with joblib. The symptoms are very strange: functions run under 2 or more jobs that touch such an object simply stop, with no obvious exception or other error (though segfaults do appear in the system logs), and the job starts all its iterations but then hangs. For now, because it seems more elegant not to pass those tuples through joblib, I haven’t removed row_factory = sqlite3.Row from db_glue.py, but it’s something to consider for the future. We don’t currently use named columns very much. (sqlite3.Row “tries to mimic a tuple in most of its features”, but I guess it’s not close enough.)
  - Same deal with PyICU. Calling tokenizers.ICU.tokenize() hangs.
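
One possible way to keep sqlite3.Row objects away from joblib without giving up row_factory is to copy each row into a plain tuple before handing it to a worker. The sketch below is only an illustration under assumptions: fetch_plain_rows, token_count, count_in_parallel and the docs table are hypothetical names, not part of db_glue.py, and whether plain tuples fully avoid the hang described above has not been verified here. The joblib call uses verbose=9, the value suggested by the experimentation above, and is imported lazily so the rest of the block runs without joblib installed.

```python
import sqlite3


def fetch_plain_rows(conn, query, params=()):
    """Run a query and return its rows as plain tuples (hypothetical helper)."""
    cursor = conn.execute(query, params)
    # tuple(row) copies the column values out of each sqlite3.Row, so the
    # result is an ordinary built-in tuple rather than a Row object.
    return [tuple(row) for row in cursor]


def token_count(row):
    """Hypothetical per-row worker: count whitespace-separated words."""
    _id, text = row
    return len(text.split())


def count_in_parallel(rows, n_jobs=2):
    """Pass plain tuples through joblib (assumes joblib is installed)."""
    from joblib import Parallel, delayed  # lazy import, only needed here

    # verbose=9 is the setting the text above found reasonable.
    return Parallel(n_jobs=n_jobs, verbose=9)(
        delayed(token_count)(row) for row in rows
    )


# Small in-memory demonstration; table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # same setting the text mentions
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO docs (id, text) VALUES (?, ?)",
    [(1, "hello world"), (2, "one two three")],
)
rows = fetch_plain_rows(conn, "SELECT id, text FROM docs ORDER BY id")
```

With joblib available, count_in_parallel(rows) would run token_count over the plain tuples in two jobs. If named columns become important later, dict(row) is an alternative to tuple(row) that keeps the column names.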