Mail::SpamAssassin::Pyzor::Digest::Pieces(3pm) | User Contributed Perl Documentation | Mail::SpamAssassin::Pyzor::Digest::Pieces(3pm) |
NAME
Mail::SpamAssassin::Pyzor::Digest::Pieces - Pyzor backend logic module
DESCRIPTION
This module houses backend logic for Mail::SpamAssassin::Pyzor::Digest.
It reimplements logic found in pyzor's digest.py module (<https://github.com/SpamExperts/pyzor/blob/master/pyzor/digest.py>).
FUNCTIONS
$strings_ar = digest_payloads( $EMAIL_MIME )
This imitates the corresponding object method in digest.py. It returns a reference to an array of strings. Each string can be either a byte string or a character string (e.g., UTF-8 decoded).
NB: RFC 2822 stipulates that message bodies should use CRLF line breaks, not plain LF (nor plain CR). This function therefore converts any plain CRs in a quoted-printable message body into CRLF. Python does not do this, so the output of this implementation of digest_payloads() diverges from that of the Python original. The divergence ultimately makes no difference, since line-ending whitespace gets trimmed regardless, but it must be factored in when comparing the output of this implementation with the Python output.
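Why the extra CR-to-CRLF step is harmless can be illustrated with a small standalone sketch. It does not call this module: cr_to_crlf() and trimmed_lines() are hypothetical helpers written only for this example, and the trimming shown is a simplified stand-in for what happens later in the digest pipeline.

```perl
use strict;
use warnings;

# Hypothetical helper: turn every bare CR (one not followed by LF) into CRLF.
sub cr_to_crlf {
    my ($body) = @_;
    $body =~ s/\r(?!\n)/\r\n/g;
    return $body;
}

# Hypothetical helper: split on CRLF, CR, or LF, then strip trailing
# whitespace from each line (a simplified stand-in for later trimming).
sub trimmed_lines {
    my ($body) = @_;
    return map { s/\s+\z//r } split /\r\n|\r|\n/, $body;
}

my $raw       = "one \rtwo\r\nthree";   # contains one bare CR
my $converted = cr_to_crlf($raw);       # bare CR becomes CRLF

my @before = trimmed_lines($raw);
my @after  = trimmed_lines($converted);

# The two payloads differ byte for byte, but the trimmed lines agree.
print join('|', @before), "\n";
print join('|', @after),  "\n";
```

The raw and converted strings differ, which is the divergence to expect when comparing payloads directly, while the trimmed lines are identical.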
normalize( $STRING )
This imitates the corresponding object method in digest.py. It modifies $STRING in-place.
As with the original implementation, if $STRING contains (decoded) Unicode characters, those characters are treated as characters rather than as raw bytes. So:
    $str = "123\xc2\xa0";   # [ c2 a0 ] == \u00a0, non-breaking space
    normalize($str);
The above will leave $str alone, but this:
    utf8::decode($str);
    normalize($str);
... will trim off the trailing non-breaking space, i.e., what were the last two bytes of the undecoded $str.
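The byte-versus-character distinction can be checked with core Perl alone. The sketch below does not call normalize(); it only shows how the same data looks before and after utf8::decode(), and it assumes Perl's default regex rules (no "use feature 'unicode_strings'" and no /u modifier).

```perl
use strict;
use warnings;

my $bytes = "123\xc2\xa0";   # five octets; the last two encode U+00A0
my $chars = $bytes;
utf8::decode($chars);        # four characters; the last one is U+00A0

printf "octets: %d, characters: %d\n", length($bytes), length($chars);

# Under the default rules assumed here, the undecoded octets c2 a0 are
# not whitespace, whereas the decoded U+00A0 character is.
my $bytes_end_in_space = ($bytes =~ /\s\z/) ? 1 : 0;
my $chars_end_in_space = ($chars =~ /\s\z/) ? 1 : 0;

print "octet string ends in whitespace: $bytes_end_in_space\n";
print "character string ends in whitespace: $chars_end_in_space\n";
```

This is why the undecoded string has nothing to trim, while the decoded one loses its final character.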
$yn = should_handle_line( $STRING )
This imitates the corresponding object method in digest.py. It returns a boolean.
$sr = assemble_lines( \@LINES )
This assembles a string buffer out of @LINES. The string is the buffer of octets that will be hashed to produce the message digest.
Each member of @LINES is expected to be an octet string, not a character string.
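Since character strings are not accepted here, lines that have been decoded need to be encoded back into octets first. A minimal sketch of that preparation step, using the core Encode module; the sample lines and the choice of UTF-8 as the target encoding are assumptions made for this example, and the assemble_lines() call is left commented out so that the snippet runs without this module installed.

```perl
use strict;
use warnings;
use Encode ();

# Example character strings (assumed input, not taken from a real message).
my @char_lines = ("caf\x{e9} au lait", "snowman \x{2603}");

# Encode each character string into a UTF-8 octet string.
my @octet_lines = map { Encode::encode('UTF-8', $_) } @char_lines;

for my $i (0 .. $#char_lines) {
    printf "line %d: %d characters -> %d octets\n",
        $i, length($char_lines[$i]), length($octet_lines[$i]);
}

# The octet strings could then be handed over, for example:
# my $sr = Mail::SpamAssassin::Pyzor::Digest::Pieces::assemble_lines(\@octet_lines);
```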
($main, $sub, $encoding, $checkval) = parse_content_type( $CONTENT_TYPE )
@lines = splitlines( $TEXT )
Imitates Python's "str.splitlines()". (cf. "pydoc str")
In list context, returns a plain list of lines. In scalar context, returns the number of lines that would be returned in list context.
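The behaviour described here can be approximated with a short standalone sketch. splitlines_sketch() is a hypothetical stand-in, not this module's implementation: it splits on the line boundaries listed in Python's documentation for str.splitlines() and is meant for decoded character strings.

```perl
use strict;
use warnings;

# Hypothetical stand-in for illustration only.
sub splitlines_sketch {
    my ($text) = @_;

    # Line boundaries recognised by Python's str.splitlines(). CRLF is
    # listed first so that it counts as a single break. The -1 limit keeps
    # trailing empty fields so that they can be handled explicitly below.
    my @lines = split /\r\n|[\n\r\x0b\x0c\x1c\x1d\x1e\x85\x{2028}\x{2029}]/,
        $text, -1;

    # A final line break does not start an extra, empty line.
    pop @lines if @lines && $lines[-1] eq '';

    # An array yields its elements in list context and its size in
    # scalar context.
    return @lines;
}

my @lines = splitlines_sketch("a\r\nb\nc\r");   # list context
my $count = splitlines_sketch("a\r\nb\nc\r");   # scalar context

print join(',', @lines), "\n";
print "$count\n";
```

Returning the array directly is what gives the two context-dependent results without any explicit wantarray check.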
2024-06-10 | perl v5.38.2 |