Decoding quoted-printable messages in Swift
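I have a quoted-printable string such as "The cost would be =C2=A31,000". How do I convert it to "The cost would be £1,000"?
At the moment I'm converting the text manually, which doesn't cover all cases. I'm sure there is a single line of code that would do this.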
Here is my code:
```swift
import Foundation

// Replaces a fixed list of known escape sequences by hand;
// any sequence not listed here is left undecoded.
func decodeUTF8(message: String) -> String {
    var newMessage = message.stringByReplacingOccurrencesOfString("=2E", withString: ".", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=A2", withString: "•", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=C2=A3", withString: "£", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=A3", withString: "£", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=9C", withString: "“", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=A6", withString: "…", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=9D", withString: "”", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=92", withString: "'", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=3D", withString: "=", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=20", withString: " ", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=99", withString: "'", options: NSStringCompareOptions.LiteralSearch, range: nil)
    return newMessage
}
```
Thanks!
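An easy way is to use the `(NS)String` method `stringByRemovingPercentEncoding` for this purpose. This was observed in the answers to "decoding quoted-printables", so the first solution is mainly a translation of the answers in that thread to Swift.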
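The idea is to replace the quoted-printable "=NN" encoding by the percent encoding "%NN" and then use the existing method to remove the percent encoding. Continuation lines (soft line breaks) are handled separately. Also, percent characters in the input string must be encoded first, otherwise they would be treated as the leading character of a percent encoding.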
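For comparison, here is a minimal sketch of the same idea in current Swift (5.x) syntax. It is a straightforward port for illustration, assuming Foundation's `replacingOccurrences(of:with:)` and `removingPercentEncoding`, and not the answer's own code:

```swift
import Foundation

/// Decodes quoted-printable text by rewriting "=HH" as "%HH" and letting
/// Foundation's percent decoder turn the escapes into UTF-8 characters.
/// Returns nil if the percent-decoded bytes are not valid UTF-8.
func decodeQuotedPrintable(_ message: String) -> String? {
    return message
        // Remove soft line breaks ("=" at the end of a line).
        .replacingOccurrences(of: "=\r\n", with: "")
        .replacingOccurrences(of: "=\n", with: "")
        // Escape literal percent signs so they survive the percent decoding.
        .replacingOccurrences(of: "%", with: "%25")
        // Turn every quoted-printable escape into a percent escape.
        .replacingOccurrences(of: "=", with: "%")
        .removingPercentEncoding
}

if let decoded = decodeQuotedPrintable("The cost would be =C2=A31,000") {
    print(decoded) // The cost would be £1,000
}
```

The order of the calls matters: literal `%` signs must be escaped before `=` is turned into `%`, otherwise an input like "50%=3D" would be misread.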
```swift
import Foundation

func decodeQuotedPrintable(message : String) -> String? {
    return message
        .stringByReplacingOccurrencesOfString("=\r\n", withString: "")
        .stringByReplacingOccurrencesOfString("=\n", withString: "")
        .stringByReplacingOccurrencesOfString("%", withString: "%25")
        .stringByReplacingOccurrencesOfString("=", withString: "%")
        .stringByRemovingPercentEncoding
}
```
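The function returns an optional string, which is `nil` for invalid input. Invalid input can be:
- A "=" character that is not followed by two hexadecimal digits, e.g. "=XX".
- A "=NN" sequence that does not decode to a valid UTF-8 sequence, e.g. "=E2=64".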
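If you need to know which of these two cases occurred, one option is a throwing variant. The following is a hypothetical sketch (the names `QuotedPrintableError` and `decodeQuotedPrintableStrict` are made up for illustration) that decodes byte by byte using only the standard library and reports the two cases above as separate errors:

```swift
/// Hypothetical error type naming the two kinds of invalid input.
enum QuotedPrintableError: Error, Equatable {
    case invalidEscape(String) // "=" not followed by two hexadecimal digits
    case invalidUTF8           // the decoded bytes are not valid UTF-8
}

/// Strict sketch that throws instead of returning nil.
func decodeQuotedPrintableStrict(_ message: String) throws -> String {
    let input = Array(message.utf8)
    var bytes: [UInt8] = []
    var i = 0
    while i < input.count {
        if input[i] != UInt8(ascii: "=") {
            bytes.append(input[i]) // ordinary byte, copied unchanged
            i += 1
        } else if input[i...].starts(with: [0x3D, 0x0D, 0x0A]) {
            i += 3 // soft line break "=\r\n"
        } else if input[i...].starts(with: [0x3D, 0x0A]) {
            i += 2 // soft line break "=\n"
        } else if i + 2 < input.count,
                  let high = Character(Unicode.Scalar(input[i + 1])).hexDigitValue,
                  let low = Character(Unicode.Scalar(input[i + 2])).hexDigitValue {
            bytes.append(UInt8(high * 16 + low)) // "=HH" becomes the byte 0xHH
            i += 3
        } else {
            let end = min(i + 3, input.count)
            throw QuotedPrintableError.invalidEscape(String(decoding: input[i..<end], as: UTF8.self))
        }
    }
    // String(decoding:as:) repairs invalid UTF-8 with U+FFFD, so a changed
    // byte sequence means the decoded bytes were not valid UTF-8.
    let result = String(decoding: bytes, as: UTF8.self)
    guard Array(result.utf8) == bytes else { throw QuotedPrintableError.invalidUTF8 }
    return result
}

do {
    print(try decodeQuotedPrintableStrict("=E2=64"))
} catch {
    print(error) // invalidUTF8
}
```

One design note: `UInt8(_:radix:)` accepts a leading `+` or `-` sign, so a check based on it would accept "=+F"; `Character.hexDigitValue` does not.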
Examples:
```swift
if let decoded = decodeQuotedPrintable("=C2=A31,000") {
    print(decoded) // £1,000
}

if let decoded = decodeQuotedPrintable("=E2=80=9CHello =E2=80=A6 world!=E2=80=9D") {
    print(decoded) // “Hello … world!”
}
```
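Update 1: The above code assumes that the message uses the UTF-8 encoding for quoting non-ASCII characters, as in most of the examples (for instance, "=C2=A3" is the UTF-8 encoding of "£"). If the input is Windows-1252 encoded instead, as in "Rub=E9n", the escapes must be interpreted with that encoding.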
To decode it correctly, you have to replace
```swift
.stringByRemovingPercentEncoding
```
by
```swift
.stringByReplacingPercentEscapesUsingEncoding(NSWindowsCP1252StringEncoding)
```
There are also ways to detect the encoding from the "Content-Type" header field; compare, e.g., https://stackoverflow.com/a/32051684/1187415.
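As a rough illustration of that idea, here is a hypothetical helper (`stringEncoding(fromContentType:)` is not an Apple API and not the method of the linked answer) that reads the `charset` parameter of a Content-Type value and maps a few common charset names to `String.Encoding`. It assumes Foundation; a real implementation would need a much larger table and proper header parsing:

```swift
import Foundation

/// Hypothetical helper: returns the `String.Encoding` named by the `charset`
/// parameter of a Content-Type header value, or nil if the parameter is
/// missing or not in the small table below.
func stringEncoding(fromContentType header: String) -> String.Encoding? {
    // Parameters follow the media type, separated by ";".
    for parameter in header.split(separator: ";").dropFirst() {
        let parts = parameter.split(separator: "=", maxSplits: 1)
        guard parts.count == 2,
              parts[0].trimmingCharacters(in: .whitespaces).lowercased() == "charset" else {
            continue
        }
        // Charset names are case-insensitive and may be quoted, e.g. charset="UTF-8".
        let name = parts[1]
            .trimmingCharacters(in: .whitespaces)
            .trimmingCharacters(in: CharacterSet(charactersIn: "\""))
            .lowercased()
        switch name {
        case "utf-8": return .utf8
        case "us-ascii": return .ascii
        case "iso-8859-1": return .isoLatin1
        case "windows-1252": return .windowsCP1252
        default: return nil
        }
    }
    return nil
}

if let encoding = stringEncoding(fromContentType: "text/plain; charset=\"windows-1252\"") {
    print(encoding == .windowsCP1252) // true
}
```

The returned value could then be passed as the `encoding` argument of the `String` extension shown in this answer.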
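Update 2: The `stringByReplacingPercentEscapesUsingEncoding` method is marked as deprecated, so the above code will always generate a compiler warning. Unfortunately, Apple does not seem to provide an alternative method.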
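So here is a new, completely self-contained decoding method that does not cause any compiler warning. This time I wrote it as an extension method for `String`, with explanatory comments in the code.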
```swift
import Foundation

extension String {

    /// Returns a new string made by removing in the `String` all "soft line
    /// breaks" and replacing all quoted-printable escape sequences with the
    /// matching characters as determined by a given encoding.
    /// - parameter encoding:     A string encoding. The default is UTF-8.
    /// - returns:                The decoded string, or `nil` for invalid input.

    func decodeQuotedPrintable(encoding enc : NSStringEncoding = NSUTF8StringEncoding) -> String? {

        // Handle soft line breaks, then replace quoted-printable escape sequences.
        return self
            .stringByReplacingOccurrencesOfString("=\r\n", withString: "")
            .stringByReplacingOccurrencesOfString("=\n", withString: "")
            .decodeQuotedPrintableSequences(enc)
    }

    /// Helper function doing the real work.
    /// Decode all "=HH" sequences with respect to the given encoding.

    private func decodeQuotedPrintableSequences(enc : NSStringEncoding) -> String? {

        var result = ""
        var position = startIndex

        // Find the next "=" and copy characters preceding it to the result:
        while let range = rangeOfString("=", range: position ..< endIndex) {
            result.appendContentsOf(self[position ..< range.startIndex])
            position = range.startIndex

            // Decode one or more successive "=HH" sequences to a byte array:
            let bytes = NSMutableData()
            repeat {
                let hexCode = self[position.advancedBy(1) ..< position.advancedBy(3, limit: endIndex)]
                if hexCode.characters.count < 2 {
                    return nil // Incomplete hex code
                }
                guard var byte = UInt8(hexCode, radix: 16) else {
                    return nil // Invalid hex code
                }
                bytes.appendBytes(&byte, length: 1)
                position = position.advancedBy(3)
            } while position != endIndex && self[position] == "="

            // Convert the byte array to a string, and append it to the result:
            guard let dec = String(data: bytes, encoding: enc) else {
                return nil // Decoded bytes not valid in the given encoding
            }
            result.appendContentsOf(dec)
        }

        // Copy remaining characters to the result:
        result.appendContentsOf(self[position ..< endIndex])

        return result
    }
}
```
Usage example:
```swift
if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
    print(decoded) // £1,000
}

if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
    print(decoded) // “Hello … world!”
}

if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: NSWindowsCP1252StringEncoding) {
    print(decoded) // Rubén
}
```
Update for Swift 4 (and later):
```swift
import Foundation

extension String {

    /// Returns a new string made by removing in the `String` all "soft line
    /// breaks" and replacing all quoted-printable escape sequences with the
    /// matching characters as determined by a given encoding.
    /// - parameter encoding:     A string encoding. The default is UTF-8.
    /// - returns:                The decoded string, or `nil` for invalid input.

    func decodeQuotedPrintable(encoding enc : String.Encoding = .utf8) -> String? {

        // Handle soft line breaks, then replace quoted-printable escape sequences.
        return self
            .replacingOccurrences(of: "=\r\n", with: "")
            .replacingOccurrences(of: "=\n", with: "")
            .decodeQuotedPrintableSequences(encoding: enc)
    }

    /// Helper function doing the real work.
    /// Decode all "=HH" sequences with respect to the given encoding.

    private func decodeQuotedPrintableSequences(encoding enc : String.Encoding) -> String? {

        var result = ""
        var position = startIndex

        // Find the next "=" and copy characters preceding it to the result:
        while let range = range(of: "=", range: position..<endIndex) {
            result.append(contentsOf: self[position ..< range.lowerBound])
            position = range.lowerBound

            // Decode one or more successive "=HH" sequences to a byte array:
            var bytes = Data()
            repeat {
                let hexCode = self[position...].dropFirst().prefix(2)
                if hexCode.count < 2 {
                    return nil // Incomplete hex code
                }
                guard let byte = UInt8(hexCode, radix: 16) else {
                    return nil // Invalid hex code
                }
                bytes.append(byte)
                position = index(position, offsetBy: 3)
            } while position != endIndex && self[position] == "="

            // Convert the byte array to a string, and append it to the result:
            guard let dec = String(data: bytes, encoding: enc) else {
                return nil // Decoded bytes not valid in the given encoding
            }
            result.append(contentsOf: dec)
        }

        // Copy remaining characters to the result:
        result.append(contentsOf: self[position ..< endIndex])

        return result
    }
}
```
Usage example:
```swift
if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
    print(decoded) // £1,000
}

if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
    print(decoded) // “Hello … world!”
}

if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: .windowsCP1252) {
    print(decoded) // Rubén
}
```
Sorry, my answer comes a bit late, but it may still help others.
```swift
import Foundation

let string = "The cost would be =C2=A31,000"
var finalString: String? = nil
if let regEx = try? NSRegularExpression(pattern: "=([a-f0-9]{2})", options: NSRegularExpressionOptions.CaseInsensitive) {
    // Turn every "=HH" into the percent escape "%HH".
    // NSRange counts UTF-16 code units, not Characters, hence utf16.count.
    let intermediatePercentEscapedString = regEx.stringByReplacingMatchesInString(string, options: NSMatchingOptions.WithTransparentBounds, range: NSMakeRange(0, string.utf16.count), withTemplate: "%$1")
    print(intermediatePercentEscapedString)
    finalString = intermediatePercentEscapedString.stringByRemovingPercentEncoding
    print(finalString)
}
```
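Here is a sketch of the same regex idea in current Swift syntax, assuming Foundation's `NSRegularExpression` (the function name is hypothetical). It also escapes literal `%` signs first, since otherwise they would be misread as the start of a percent escape. Soft line breaks are still not handled:

```swift
import Foundation

/// Hypothetical regex-based decoder: escapes literal "%" first, rewrites every
/// "=HH" as "%HH", then percent-decodes the result as UTF-8.
/// Returns nil if the percent-decoded bytes are not valid UTF-8.
func decodeQuotedPrintableWithRegex(_ string: String) -> String? {
    // Without this step, a literal "%" would be read as the start of an escape.
    let escaped = string.replacingOccurrences(of: "%", with: "%25")
    guard let regex = try? NSRegularExpression(pattern: "=([0-9A-F]{2})", options: .caseInsensitive) else {
        return nil
    }
    // NSRange is measured in UTF-16 code units; NSRange(_:in:) converts correctly.
    let range = NSRange(escaped.startIndex..., in: escaped)
    let percentEncoded = regex.stringByReplacingMatches(in: escaped, options: [], range: range, withTemplate: "%$1")
    return percentEncoded.removingPercentEncoding
}

print(decodeQuotedPrintableWithRegex("The cost would be =C2=A31,000") ?? "invalid")
// The cost would be £1,000
```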
You can also have a look at this working solution: https://github.com/dunkelstern/QuotedPrintable
```swift
let result = QuotedPrintable.decode(string: quoted)
```
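More information is needed to give a solution that applies, so I will make some assumptions.
In HTML or mail messages, for example, one or more encodings can be applied to some kind of source data. For instance, a binary file can be encoded with base64.
In your example, the source data is a string that was encoded as UTF-8 and then as quoted-printable.
In a MIME message such as an email, this is declared by the header `Content-Transfer-Encoding: quoted-printable`.
To get the original back, you need to apply the corresponding decodings in reverse order.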
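For example, if some text was first encoded as UTF-8 and then as base64, decoding undoes base64 first and UTF-8 second. A minimal sketch, assuming Foundation's `Data(base64Encoded:)` and `String(data:encoding:)` (the function name is hypothetical):

```swift
import Foundation

/// Hypothetical helper that reverses "text -> UTF-8 bytes -> base64":
/// the last encoding applied (base64) is undone first, then the charset.
func decodeBase64Text(_ base64: String, encoding: String.Encoding = .utf8) -> String? {
    // Step 1: undo the transfer encoding (base64 -> raw bytes).
    guard let data = Data(base64Encoded: base64) else { return nil }
    // Step 2: undo the character encoding (raw bytes -> text).
    return String(data: data, encoding: encoding)
}

print(decodeBase64Text("wqMxLDAwMA==") ?? "invalid") // £1,000
```

Quoted-printable fits the same pattern: decode the "=HH" escapes to bytes, then interpret those bytes with the message's charset.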
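Hint:
When you view the raw source of a mail, you can look at its headers.
This encoding is called "quoted-printable". What you need to do is convert the string to NSData using the ASCII encoding, then iterate over the data and replace every three-character sequence such as "=A3" with the byte 0xA3, and finally convert the resulting data to a string using NSUTF8StringEncoding.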
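A minimal Swift sketch of the approach just described, assuming Foundation (`decodeQuotedPrintableBytes` and `hexDigit` are illustrative names). It does not remove soft line breaks, and malformed "=" sequences are copied through unchanged instead of causing a failure:

```swift
import Foundation

/// Numeric value of an ASCII hex digit byte ("0"-"9", "A"-"F", "a"-"f"), else nil.
func hexDigit(_ byte: UInt8) -> UInt8? {
    return Character(Unicode.Scalar(byte)).hexDigitValue.map { UInt8($0) }
}

/// Reads the quoted-printable text as ASCII bytes, replaces each "=HH" with
/// the single byte 0xHH, then decodes the collected bytes with the given
/// character encoding (UTF-8 by default).
func decodeQuotedPrintableBytes(_ string: String, encoding: String.Encoding = .utf8) -> String? {
    // Quoted-printable text is pure ASCII; reject anything else.
    guard let ascii = string.data(using: .ascii) else { return nil }
    let input = [UInt8](ascii)
    var output = Data()
    var i = 0
    while i < input.count {
        if input[i] == UInt8(ascii: "="), i + 2 < input.count,
           let high = hexDigit(input[i + 1]), let low = hexDigit(input[i + 2]) {
            output.append(high << 4 | low) // e.g. "=A3" becomes the byte 0xA3
            i += 3
        } else {
            output.append(input[i]) // ordinary byte, copied unchanged
            i += 1
        }
    }
    return String(data: output, encoding: encoding)
}

print(decodeQuotedPrintableBytes("The cost would be =C2=A31,000") ?? "invalid")
// The cost would be £1,000
print(decodeQuotedPrintableBytes("Rub=E9n", encoding: .windowsCP1252) ?? "invalid")
// Rubén
```

Because the charset is applied only in the last step, the same byte loop works for UTF-8 and for single-byte encodings such as Windows-1252.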