Decoding quoted-printable messages in Swift

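I have a quoted-printable string such as "The cost would be =C2=A31,000". How do I convert this to "The cost would be £1,000"?

At the moment I am just converting the text manually, and this does not cover all cases. I am sure there is a single line of code that would help with this.

Here is my code: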

func decodeUTF8(message: String) -> String
{
    var newMessage = message.stringByReplacingOccurrencesOfString("=2E", withString:".", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=A2", withString:"•", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=C2=A3", withString:"£", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=A3", withString:"£", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=9C", withString:"“", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=A6", withString:"…", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=9D", withString:"”", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=92", withString:"'", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=3D", withString:"=", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=20", withString:"", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=99", withString:"'", options: NSStringCompareOptions.LiteralSearch, range: nil)

    return newMessage
}

Thanks


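An easy way is to use the (NS)String method
stringByRemovingPercentEncoding for this purpose.
This was observed in the thread on
decoding quoted-printables,
so the first solution is mainly a translation of the answers in
that thread to Swift.

The idea is to replace the quoted-printable "=NN" encoding with the
percent encoding "%NN" and then use the existing method to remove
the percent encoding.

Continuation lines (soft line breaks) are handled separately.
Also, percent characters in the input string must be encoded first,
otherwise they would be treated as the leading character of a percent
encoding.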

func decodeQuotedPrintable(message : String) -> String? {
    return message
        .stringByReplacingOccurrencesOfString("=\r\n", withString:"")
        .stringByReplacingOccurrencesOfString("=\n", withString:"")
        .stringByReplacingOccurrencesOfString("%", withString:"%25")
        .stringByReplacingOccurrencesOfString("=", withString:"%")
        .stringByRemovingPercentEncoding
}

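The function returns an optional string, which is nil for invalid input.
Invalid input can be:

  • A "=" character that is not followed by two hexadecimal digits,
    e.g. "=XX".
  • A "=NN" sequence that does not decode to a valid UTF-8 sequence,
    e.g. "=E2=64".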
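For readers on a current toolchain, the same percent-encoding trick can be sketched in modern Swift syntax. The function name `decodeQuotedPrintableViaPercent` is made up for this illustration, and the sketch assumes the message is UTF-8, just like the code above.

```swift
import Foundation

// Sketch of the percent-encoding trick in current Swift syntax.
// `decodeQuotedPrintableViaPercent` is an illustrative name only.
func decodeQuotedPrintableViaPercent(_ message: String) -> String? {
    return message
        .replacingOccurrences(of: "=\r\n", with: "")  // soft line break (CRLF)
        .replacingOccurrences(of: "=\n", with: "")    // soft line break (LF)
        .replacingOccurrences(of: "%", with: "%25")   // protect literal "%"
        .replacingOccurrences(of: "=", with: "%")     // "=NN" becomes "%NN"
        .removingPercentEncoding                      // nil for invalid input
}

let valid = decodeQuotedPrintableViaPercent("=C2=A31,000")
let badHex = decodeQuotedPrintableViaPercent("=XX")      // not hex digits
let badUTF8 = decodeQuotedPrintableViaPercent("=E2=64")  // not valid UTF-8
```

Here `valid` should hold the decoded text, while the two invalid inputs from the list above should both come back as nil.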

Examples:

if let decoded = decodeQuotedPrintable("=C2=A31,000") {
    print(decoded) // £1,000
}

if let decoded = decodeQuotedPrintable("=E2=80=9CHello =E2=80=A6 world!=E2=80=9D") {
    print(decoded) // “Hello … world!”
}

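Update 1: The above code assumes that the message uses the UTF-8
encoding for quoting non-ASCII characters, as in most of your examples: C2 A3 is the UTF-8 encoding of "£", and E2 80 A6 is the UTF-8 encoding of "…".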
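A quick way to check which bytes a character occupies in UTF-8 is to print them in hex. The helper name `hexBytes` is an assumption for this sketch and is not part of the answer.

```swift
// Return the UTF-8 bytes of a string as two-digit uppercase hex strings.
// `hexBytes` is a hypothetical helper used only for illustration.
func hexBytes(_ s: String) -> [String] {
    return s.utf8.map { byte in
        let hex = String(byte, radix: 16, uppercase: true)
        return hex.count < 2 ? "0" + hex : hex
    }
}

print(hexBytes("£"))  // ["C2", "A3"]
print(hexBytes("…"))  // ["E2", "80", "A6"]
```

These are exactly the byte values that appear after the "=" signs in the quoted-printable examples.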

If the input is "Rub=E9n", then the message is using the
Windows-1252 encoding.
To decode that correctly, you have to replace

.stringByRemovingPercentEncoding

by

.stringByReplacingPercentEscapesUsingEncoding(NSWindowsCP1252StringEncoding)

There are also ways to detect the encoding from a "Content-Type"
header field, compare e.g. https://stackoverflow.com/a/32051684/1187415.
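As a rough illustration of that idea (not necessarily what the linked answer does), the charset parameter can be pulled out of the header value and looked up in a small table. The function name and the hard-coded table below are assumptions made for this sketch; a real mail parser would need to cover many more charset names.

```swift
import Foundation

// Hypothetical helper: pick a String.Encoding from a Content-Type header value.
// The lookup table is deliberately tiny and exists only for illustration.
func encoding(fromContentType header: String) -> String.Encoding? {
    let knownCharsets: [String: String.Encoding] = [
        "utf-8": .utf8,
        "us-ascii": .ascii,
        "iso-8859-1": .isoLatin1,
        "windows-1252": .windowsCP1252,
    ]
    // Parameters follow the media type, e.g. "text/plain; charset=UTF-8".
    for part in header.split(separator: ";").dropFirst() {
        let pair = part.split(separator: "=", maxSplits: 1)
        guard pair.count == 2 else { continue }
        let name = pair[0].trimmingCharacters(in: .whitespaces).lowercased()
        guard name == "charset" else { continue }
        // Strip surrounding whitespace and optional double quotes.
        let value = pair[1]
            .trimmingCharacters(in: .whitespaces)
            .trimmingCharacters(in: CharacterSet(charactersIn: "\""))
            .lowercased()
        return knownCharsets[value]
    }
    return nil  // no charset parameter found
}
```

The result could then be passed as the encoding when decoding the "=NN" sequences, falling back to UTF-8 when it is nil.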

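Update 2: The stringByReplacingPercentEscapesUsingEncoding
method is marked as deprecated, so the above code will always generate
a compiler warning. Unfortunately, it seems that Apple has not provided
an alternative method.

So here is a new, completely self-contained decoding method
that does not cause any compiler warning. This time I have written it
as an extension method of String. Explanatory comments are in the
code.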

extension String {

    /// Returns a new string made by removing in the `String` all "soft line
    /// breaks" and replacing all quoted-printable escape sequences with the
    /// matching characters as determined by a given encoding.
    /// - parameter encoding:     A string encoding. The default is UTF-8.
    /// - returns:                The decoded string, or `nil` for invalid input.

    func decodeQuotedPrintable(encoding enc : NSStringEncoding = NSUTF8StringEncoding) -> String? {

        // Handle soft line breaks, then replace quoted-printable escape sequences.
        return self
            .stringByReplacingOccurrencesOfString("=\r\n", withString:"")
            .stringByReplacingOccurrencesOfString("=\n", withString:"")
            .decodeQuotedPrintableSequences(enc)
    }

    /// Helper function doing the real work.
    /// Decode all "=HH" sequences with respect to the given encoding.

    private func decodeQuotedPrintableSequences(enc : NSStringEncoding) -> String? {

        var result = ""
        var position = startIndex

        // Find the next "=" and copy characters preceding it to the result:
        while let range = rangeOfString("=", range: position ..< endIndex) {
            result.appendContentsOf(self[position ..< range.startIndex])
            position = range.startIndex

            // Decode one or more successive "=HH" sequences to a byte array:
            let bytes = NSMutableData()
            repeat {
                let hexCode = self[position.advancedBy(1) ..< position.advancedBy(3, limit: endIndex)]
                if hexCode.characters.count < 2 {
                    return nil // Incomplete hex code
                }
                guard var byte = UInt8(hexCode, radix: 16) else {
                    return nil // Invalid hex code
                }
                bytes.appendBytes(&byte, length: 1)
                position = position.advancedBy(3)
            } while position != endIndex && self[position] == "="

            // Convert the byte array to a string, and append it to the result:
            guard let dec = String(data: bytes, encoding: enc) else {
                return nil // Decoded bytes not valid in the given encoding
            }
            result.appendContentsOf(dec)
        }

        // Copy remaining characters to the result:
        result.appendContentsOf(self[position ..< endIndex])

        return result
    }
}

Usage example:

if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
    print(decoded) // £1,000
}

if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
    print(decoded) // “Hello … world!”
}

if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: NSWindowsCP1252StringEncoding) {
    print(decoded) // Rubén
}

Update for Swift 4 (and later):

import Foundation

extension String {

    /// Returns a new string made by removing in the `String` all "soft line
    /// breaks" and replacing all quoted-printable escape sequences with the
    /// matching characters as determined by a given encoding.
    /// - parameter encoding:     A string encoding. The default is UTF-8.
    /// - returns:                The decoded string, or `nil` for invalid input.

    func decodeQuotedPrintable(encoding enc : String.Encoding = .utf8) -> String? {

        // Handle soft line breaks, then replace quoted-printable escape sequences.
        return self
            .replacingOccurrences(of: "=\r\n", with: "")
            .replacingOccurrences(of: "=\n", with: "")
            .decodeQuotedPrintableSequences(encoding: enc)
    }

    /// Helper function doing the real work.
    /// Decode all "=HH" sequences with respect to the given encoding.

    private func decodeQuotedPrintableSequences(encoding enc : String.Encoding) -> String? {

        var result = ""
        var position = startIndex

        // Find the next "=" and copy characters preceding it to the result:
        while let range = range(of:"=", range: position..<endIndex) {
            result.append(contentsOf: self[position ..< range.lowerBound])
            position = range.lowerBound

            // Decode one or more successive "=HH" sequences to a byte array:
            var bytes = Data()
            repeat {
                let hexCode = self[position...].dropFirst().prefix(2)
                if hexCode.count < 2 {
                    return nil // Incomplete hex code
                }
                guard let byte = UInt8(hexCode, radix: 16) else {
                    return nil // Invalid hex code
                }
                bytes.append(byte)
                position = index(position, offsetBy: 3)
            } while position != endIndex && self[position] == "="

            // Convert the byte array to a string, and append it to the result:
            guard let dec = String(data: bytes, encoding: enc) else {
                return nil // Decoded bytes not valid in the given encoding
            }
            result.append(contentsOf: dec)
        }

        // Copy remaining characters to the result:
        result.append(contentsOf: self[position ..< endIndex])

        return result
    }
}

Usage example:

if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
    print(decoded) // £1,000
}

if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
    print(decoded) // “Hello … world!”
}

if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: .windowsCP1252) {
    print(decoded) // Rubén
}


Unfortunately, I'm a bit late with my answer. It might still be helpful for others, though.

var string = "The cost would be =C2=A31,000"

var finalString: String? = nil

// Turn every "=HH" into "%HH", then let Foundation remove the percent encoding.
if let regEx = try? NSRegularExpression(pattern: "={1}?([a-f0-9]{2}?)", options: NSRegularExpressionOptions.CaseInsensitive)
{
    // NSRange counts UTF-16 code units, so use the UTF-16 length of the string.
    let intermediatePercentEscapedString = regEx.stringByReplacingMatchesInString(string, options: NSMatchingOptions.WithTransparentBounds, range: NSMakeRange(0, string.utf16.count), withTemplate: "%$1")
    print(intermediatePercentEscapedString)
    finalString = intermediatePercentEscapedString.stringByRemovingPercentEncoding
    print(finalString)
}

You can also take a look at this working solution: https://github.com/dunkelstern/QuotedPrintable

let result = QuotedPrintable.decode(string: quoted)

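In order to give an applicable solution, more information is required. So I will make some assumptions.

In an HTML or mail message, for example, you can apply one or more encodings to some kind of source data. For instance, you could encode a binary file, such as a PNG file, with base64 and then zip it. The order is important.

In your example, as you say, the source data is a string that has been encoded via UTF-8.

In an HTTP message, your Content-Type is thus text/plain; charset=UTF-8. In your example there also seems to be an additional encoding applied,
a "Content-Transfer-Encoding": possibly the Content-Transfer-Encoding is quoted-printable or base64 (not sure about that, though).

In order to revert it, you need to apply the corresponding decodings in reverse order.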
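To make the "reverse order" point concrete, here is a small round trip. It assumes, purely for illustration, that the transfer encoding is base64 and the charset is UTF-8; the data in the question uses quoted-printable instead, but the ordering principle is the same.

```swift
import Foundation

// Encoding order: text -> bytes (charset UTF-8) -> base64 (transfer encoding).
let original = "The cost would be £1,000"
let transferEncoded = Data(original.utf8).base64EncodedString()

// Decoding runs in reverse: undo the transfer encoding first, then the charset.
let bytes = Data(base64Encoded: transferEncoded)
let text = bytes.flatMap { String(data: $0, encoding: .utf8) }

print(text == original)  // true
```

Interpreting the base64 text itself as UTF-8 would give the wrong result; the transfer encoding has to be removed before the charset is applied.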

Hint:

You can see the headers (Content-Type and Content-Transfer-Encoding) of a mail message when viewing the raw source of the mail.


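This encoding is called 'quoted-printable'. What you need to do is convert the string to NSData using ASCII encoding, then iterate over the data, replacing every 3-character sequence like '=A3' with the single byte 0xA3, and finally convert the resulting data to a string using NSUTF8StringEncoding.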
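A minimal sketch of that byte-oriented approach in current Swift might look as follows. The names `hexValue` and `decodeQuotedPrintableBytes` are invented for this example, soft line breaks are not handled, and the result is assumed to be UTF-8.

```swift
import Foundation

// Map one ASCII hex digit to its numeric value, or nil if it is not a hex digit.
func hexValue(_ byte: UInt8) -> UInt8? {
    switch byte {
    case UInt8(ascii: "0")...UInt8(ascii: "9"): return byte - UInt8(ascii: "0")
    case UInt8(ascii: "A")...UInt8(ascii: "F"): return byte - UInt8(ascii: "A") + 10
    case UInt8(ascii: "a")...UInt8(ascii: "f"): return byte - UInt8(ascii: "a") + 10
    default: return nil
    }
}

// Illustrative sketch only: replace each "=HH" with the byte 0xHH,
// then interpret all collected bytes as UTF-8.
func decodeQuotedPrintableBytes(_ input: String) -> String? {
    // Quoted-printable text is plain ASCII, so conversion fails for other input.
    guard let source = input.data(using: .ascii) else { return nil }
    let bytes = [UInt8](source)

    var decoded = Data()
    var i = 0
    while i < bytes.count {
        if bytes[i] == UInt8(ascii: "=") {
            // Two hex digits must follow the "=" sign.
            guard i + 2 < bytes.count,
                  let high = hexValue(bytes[i + 1]),
                  let low = hexValue(bytes[i + 2]) else { return nil }
            decoded.append((high << 4) | low)
            i += 3
        } else {
            decoded.append(bytes[i])  // ordinary ASCII byte, copied as is
            i += 1
        }
    }
    return String(data: decoded, encoding: .utf8)
}

if let decoded = decodeQuotedPrintableBytes("The cost would be =C2=A31,000") {
    print(decoded) // The cost would be £1,000
}
```

Working on bytes rather than characters means a multi-byte sequence such as "=C2=A3" is only turned into a character once all of its bytes have been collected.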