Decoding quoted-printable messages in Swift (macOS)

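I have a quoted-printable string such as "The cost would be =C2=A31,000". How do I convert this to "The cost would be £1,000"?

At the moment I'm just converting the text manually, which doesn't cover all cases. I'm sure there is a single line of code that would help with this.

Here is my code: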

func decodeUTF8(message: String) -> String
{
    var newMessage = message.stringByReplacingOccurrencesOfString("=2E", withString: ".", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=A2", withString: "•", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=C2=A3", withString: "£", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=A3", withString: "£", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=9C", withString: "\"", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=A6", withString: "…", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=9D", withString: "\"", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=92", withString: "'", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=3D", withString: "=", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=20", withString: " ", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=99", withString: "'", options: NSStringCompareOptions.LiteralSearch, range: nil)

    return newMessage
}

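Thanks

Related discussion

A simple way is to use the (NS)String method
stringByRemovingPercentEncoding for this purpose.
This was observed in the thread on decoding quoted-printables,
so the first solution is mainly a translation of the answers in
that thread to Swift.

The idea is to replace the quoted-printable "=NN" encoding with the
percent encoding "%NN" and then use the existing method to remove
the percent encoding.

Continuation lines (soft line breaks) are handled separately.
Also, percent characters in the input string must be encoded first,
otherwise they would be treated as the leading character of a percent
encoding.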

func decodeQuotedPrintable(message : String) -> String? {
    return message
        .stringByReplacingOccurrencesOfString("=\r\n", withString: "")  // soft line break (CRLF)
        .stringByReplacingOccurrencesOfString("=\n", withString: "")    // soft line break (LF)
        .stringByReplacingOccurrencesOfString("%", withString: "%25")   // protect literal "%"
        .stringByReplacingOccurrencesOfString("=", withString: "%")     // "=NN" becomes "%NN"
        .stringByRemovingPercentEncoding
}

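The function returns an optional string which is nil for invalid input.
Invalid input can be:

A "=" character which is not followed by two hexadecimal digits,
e.g. "=XX".
A "=NN" sequence which does not decode to a valid UTF-8 sequence,
e.g. "=E2=64".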
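
For readers on a current toolchain, the same percent-encoding trick can be sketched with the Swift 3+ Foundation names (replacingOccurrences(of:with:) and removingPercentEncoding). This is only a minimal sketch that assumes UTF-8 input; the function name decodeQuotedPrintableViaPercentEncoding is made up for illustration and does not appear in the answer.

```swift
import Foundation

/// Hypothetical helper (the name is illustrative): decodes UTF-8
/// quoted-printable text by rewriting "=NN" as "%NN" and then removing
/// the percent encoding.
func decodeQuotedPrintableViaPercentEncoding(_ message: String) -> String? {
    return message
        .replacingOccurrences(of: "=\r\n", with: "")  // soft line break (CRLF)
        .replacingOccurrences(of: "=\n", with: "")    // soft line break (LF)
        .replacingOccurrences(of: "%", with: "%25")   // protect literal percent signs
        .replacingOccurrences(of: "=", with: "%")     // "=NN" becomes "%NN"
        .removingPercentEncoding                      // nil if the result is not valid
}

print(decodeQuotedPrintableViaPercentEncoding("=C2=A31,000") ?? "nil")   // £1,000
print(decodeQuotedPrintableViaPercentEncoding("100% =3D all") ?? "nil")  // 100% = all
print(decodeQuotedPrintableViaPercentEncoding("=XX") ?? "nil")
print(decodeQuotedPrintableViaPercentEncoding("=E2=64") ?? "nil")
```

The last two calls correspond to the two kinds of invalid input listed above, so they should fall through to the "nil" branch.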

Example:

if let decoded = decodeQuotedPrintable("=C2=A31,000") {
    print(decoded) // £1,000
}

if let decoded = decodeQuotedPrintable("=E2=80=9CHello =E2=80=A6 world!=E2=80=9D") {
    print(decoded) // “Hello … world!”
}

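Update 1: The above code assumes that the message uses the UTF-8
encoding for quoting non-ASCII characters, as in most of your examples: C2 A3 is the UTF-8 encoding of "£", and E2 80 A6 is the UTF-8 encoding of "…".

If the input is "Rub=E9n", then the message is using the
Windows-1252 encoding.
To decode that correctly, you have to replace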

.stringByRemovingPercentEncoding

by

.stringByReplacingPercentEscapesUsingEncoding(NSWindowsCP1252StringEncoding)

There are also ways to detect the encoding from the "Content-Type"
header field; compare, for example, https://stackoverflow.com/a/32051684/1187415.
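
As a rough illustration of that idea, the charset parameter of a Content-Type value could be mapped to a String.Encoding along these lines. Everything here is an assumption made for the sketch: the function name stringEncoding(fromContentType:) is invented, and the lookup table only covers a handful of charset names rather than the full IANA registry.

```swift
import Foundation

/// Hypothetical helper: extracts the charset parameter from a Content-Type
/// header value and maps a few common names to a String.Encoding.
/// The table is deliberately small; it is an assumption, not a full registry.
func stringEncoding(fromContentType header: String) -> String.Encoding? {
    let knownCharsets: [String: String.Encoding] = [
        "utf-8": .utf8,
        "us-ascii": .ascii,
        "iso-8859-1": .isoLatin1,
        "windows-1252": .windowsCP1252
    ]
    // Parameters follow the media type and are separated by semicolons.
    for parameter in header.split(separator: ";").dropFirst() {
        let parts = parameter.split(separator: "=", maxSplits: 1)
        guard parts.count == 2 else { continue }
        let name = parts[0].trimmingCharacters(in: .whitespaces).lowercased()
        guard name == "charset" else { continue }
        // The value may be quoted, e.g. charset="UTF-8".
        let value = parts[1]
            .trimmingCharacters(in: .whitespaces)
            .trimmingCharacters(in: CharacterSet(charactersIn: "\""))
            .lowercased()
        return knownCharsets[value]
    }
    return nil  // no charset parameter, or a charset not in the table
}

if let encoding = stringEncoding(fromContentType: "text/plain; charset=Windows-1252") {
    print(encoding == String.Encoding.windowsCP1252)  // true
}
```

The detected encoding would then take the place of the hard-coded one when the quoted-printable bytes are converted to a string.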

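Update 2: The stringByReplacingPercentEscapesUsingEncoding
method is marked as deprecated, so the above code will always generate
a compiler warning. Unfortunately, it seems that Apple has not
provided an alternative method.

So here is a new, completely self-contained decoding method which
does not cause any compiler warning. This time I have written it
as an extension method of String. Explanatory comments are in the
code.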

import Foundation

extension String {

    /// Returns a new string made by removing in the `String` all "soft line
    /// breaks" and replacing all quoted-printable escape sequences with the
    /// matching characters as determined by a given encoding.
    /// - parameter encoding: A string encoding. The default is UTF-8.
    /// - returns: The decoded string, or `nil` for invalid input.

    func decodeQuotedPrintable(encoding enc : NSStringEncoding = NSUTF8StringEncoding) -> String? {

        // Handle soft line breaks, then replace quoted-printable escape sequences.
        return self
            .stringByReplacingOccurrencesOfString("=\r\n", withString: "")
            .stringByReplacingOccurrencesOfString("=\n", withString: "")
            .decodeQuotedPrintableSequences(enc)
    }

    /// Helper function doing the real work.
    /// Decode all "=HH" sequences with respect to the given encoding.

    private func decodeQuotedPrintableSequences(enc : NSStringEncoding) -> String? {

        var result = ""
        var position = startIndex

        // Find the next "=" and copy characters preceding it to the result:
        while let range = rangeOfString("=", range: position ..< endIndex) {
            result.appendContentsOf(self[position ..< range.startIndex])
            position = range.startIndex

            // Decode one or more successive "=HH" sequences to a byte array:
            let bytes = NSMutableData()
            repeat {
                let hexCode = self[position.advancedBy(1) ..< position.advancedBy(3, limit: endIndex)]
                if hexCode.characters.count < 2 {
                    return nil // Incomplete hex code
                }
                guard var byte = UInt8(hexCode, radix: 16) else {
                    return nil // Invalid hex code
                }
                bytes.appendBytes(&byte, length: 1)
                position = position.advancedBy(3)
            } while position != endIndex && self[position] == "="

            // Convert the byte array to a string, and append it to the result:
            guard let dec = String(data: bytes, encoding: enc) else {
                return nil // Decoded bytes not valid in the given encoding
            }
            result.appendContentsOf(dec)
        }

        // Copy remaining characters to the result:
        result.appendContentsOf(self[position ..< endIndex])

        return result
    }
}

Usage example:

if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
    print(decoded) // £1,000
}

if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
    print(decoded) // “Hello … world!”
}

if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: NSWindowsCP1252StringEncoding) {
    print(decoded) // Rubén
}

Update for Swift 4 (and later):

import Foundation

extension String {

    /// Returns a new string made by removing in the `String` all "soft line
    /// breaks" and replacing all quoted-printable escape sequences with the
    /// matching characters as determined by a given encoding.
    /// - parameter encoding: A string encoding. The default is UTF-8.
    /// - returns: The decoded string, or `nil` for invalid input.

    func decodeQuotedPrintable(encoding enc : String.Encoding = .utf8) -> String? {

        // Handle soft line breaks, then replace quoted-printable escape sequences.
        return self
            .replacingOccurrences(of: "=\r\n", with: "")
            .replacingOccurrences(of: "=\n", with: "")
            .decodeQuotedPrintableSequences(encoding: enc)
    }

    /// Helper function doing the real work.
    /// Decode all "=HH" sequences with respect to the given encoding.

    private func decodeQuotedPrintableSequences(encoding enc : String.Encoding) -> String? {

        var result = ""
        var position = startIndex

        // Find the next "=" and copy characters preceding it to the result:
        while let range = range(of: "=", range: position ..< endIndex) {
            result.append(contentsOf: self[position ..< range.lowerBound])
            position = range.lowerBound

            // Decode one or more successive "=HH" sequences to a byte array:
            var bytes = Data()
            repeat {
                let hexCode = self[position...].dropFirst().prefix(2)
                if hexCode.count < 2 {
                    return nil // Incomplete hex code
                }
                guard let byte = UInt8(hexCode, radix: 16) else {
                    return nil // Invalid hex code
                }
                bytes.append(byte)
                position = index(position, offsetBy: 3)
            } while position != endIndex && self[position] == "="

            // Convert the byte array to a string, and append it to the result:
            guard let dec = String(data: bytes, encoding: enc) else {
                return nil // Decoded bytes not valid in the given encoding
            }
            result.append(contentsOf: dec)
        }

        // Copy remaining characters to the result:
        result.append(contentsOf: self[position ..< endIndex])

        return result
    }
}

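Usage example:

if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
    print(decoded) // £1,000
}

if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
    print(decoded) // “Hello … world!”
}

if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: .windowsCP1252) {
    print(decoded) // Rubén
}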

Related discussion

Unfortunately, I'm a bit late with my answer. It might still be helpful for others, though.

let string = "The cost would be =C2=A31,000"

var finalString: String? = nil

// Rewrite every "=HH" as "%HH", then remove the percent encoding.
if let regEx = try? NSRegularExpression(pattern: "={1}?([a-f0-9]{2}?)", options: NSRegularExpressionOptions.CaseInsensitive)
{
    // NSRange counts UTF-16 code units, so use utf16.count rather than characters.count.
    let intermediatePercentEscapedString = regEx.stringByReplacingMatchesInString(string, options: NSMatchingOptions.WithTransparentBounds, range: NSMakeRange(0, string.utf16.count), withTemplate: "%$1")
    print(intermediatePercentEscapedString)
    finalString = intermediatePercentEscapedString.stringByRemovingPercentEncoding
    print(finalString)
}

You can also take a look at this working solution: https://github.com/dunkelstern/QuotedPrintable

let result = QuotedPrintable.decode(string: quoted)

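To give an applicable solution, more information is required, so I will make some assumptions.

In an HTML or mail message, for example, you can apply one or more encodings to some kind of source data. For instance, you could encode a binary file, such as a PNG file, with base64 and then zip it. The order is important.

In your example, as you say, the source data is a string that has been encoded with UTF-8.

In an HTTP message, your Content-Type is thus text/plain; charset=UTF-8. In your example there seems to be an additional encoding applied as well, a "Content-Transfer-Encoding": the Content-Transfer-Encoding is possibly quoted-printable or base64 (I'm not sure about that, though).

To revert it, you need to apply the corresponding decodings in reverse order.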
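
To make the "reverse order" point concrete, here is a small sketch that assumes the body was first encoded as UTF-8 and then base64-encoded for transfer. The function name decodeBody and the sample string are illustrative only and are not taken from the question.

```swift
import Foundation

/// Hypothetical helper: undoes the encodings in the reverse of the order
/// in which the sender applied them.
func decodeBody(_ body: String, charset: String.Encoding) -> String? {
    // Step 1: undo the transfer encoding (applied last by the sender).
    guard let bytes = Data(base64Encoded: body) else { return nil }
    // Step 2: undo the character encoding (applied first by the sender).
    return String(data: bytes, encoding: charset)
}

// "wqMxLDAwMA==" is the base64 form of the UTF-8 bytes C2 A3 31 2C 30 30 30.
print(decodeBody("wqMxLDAwMA==", charset: .utf8) ?? "nil")  // £1,000
```

For a quoted-printable body, step 1 would be a quoted-printable decoder instead of the base64 one; step 2 stays the same.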

Hint:

You can see the headers of a mail message (Content-Type and Content-Transfer-Encoding) when viewing the raw source of the mail.

Related discussion

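This encoding is called "quoted-printable". What you need to do is convert the string to NSData using the ASCII encoding, then iterate over the data, replacing every 3-symbol sequence such as "=A3" with the single byte 0xA3, and finally convert the resulting data to a string using NSUTF8StringEncoding.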
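
A minimal sketch of that byte-level approach in current Swift might look like the following. The function name decodeQuotedPrintableBytes is hypothetical, Data and String.Encoding stand in for NSData and NSUTF8StringEncoding, and soft line breaks ("=" followed by a line ending) are deliberately not handled, so they would have to be removed beforehand.

```swift
import Foundation

/// Hypothetical helper illustrating the byte-level approach:
/// string -> ASCII bytes -> replace "=HH" by one byte -> string.
func decodeQuotedPrintableBytes(_ input: String,
                                encoding: String.Encoding = .utf8) -> String? {
    // Quoted-printable text is pure ASCII, so this fails for anything else.
    guard let source = input.data(using: .ascii) else { return nil }
    let bytes = [UInt8](source)
    var output = Data()
    var i = 0
    while i < bytes.count {
        if bytes[i] == UInt8(ascii: "=") {
            // Expect exactly two hexadecimal digits after "=".
            guard i + 2 < bytes.count,
                  let high = Character(UnicodeScalar(bytes[i + 1])).hexDigitValue,
                  let low = Character(UnicodeScalar(bytes[i + 2])).hexDigitValue
            else { return nil }
            output.append(UInt8(high << 4 | low))
            i += 3
        } else {
            // Any other byte is copied unchanged.
            output.append(bytes[i])
            i += 1
        }
    }
    // Interpret the collected bytes in the requested encoding.
    return String(data: output, encoding: encoding)
}

print(decodeQuotedPrintableBytes("The cost would be =C2=A31,000") ?? "nil")
// The cost would be £1,000
```

Working on bytes rather than characters makes it straightforward to support other charsets by passing a different encoding for the final conversion.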

Related discussion