How can I read and parse CSV files in C++?

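I need to load and use CSV file data in C++. At this point it can really just be a comma-delimited parser (i.e. no need to worry about escaping newlines and commas). The main requirement is a line-by-line parser that returns a vector for the next line each time the method is called.

I found this article, which looks quite promising: http://www.boost.org/doc/libs/1诳0/libs/spirit/example/basics/list_parser.cpp

I have never used Boost's Spirit, but I am willing to try it. But only if there isn't a more straightforward solution I am overlooking.

If you don't care about escaping commas and newlines, AND you can't embed commas and newlines in quotes (if you can't escape, then...), then it is only about three lines of code (OK, 14 -> but it is only 15 to read the whole file).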

std::vector<std::string> getNextLineAndSplitIntoTokens(std::istream& str)
{
std::vector<std::string> result;
std::string line;
std::getline(str,line);

std::stringstream lineStream(line);
std::string cell;

while(std::getline(lineStream,cell, ','))
{
result.push_back(cell);
}
// This checks for a trailing comma with no data after it.
if (!lineStream && cell.empty())
{
// If there was a trailing comma then add an empty element.
result.push_back("");
}
return result;
}
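As a quick check of the trailing-comma handling above, here is a minimal self-contained sketch. `splitLine` is a hypothetical helper name (it is not part of the answer) that applies the same logic to a single string instead of a stream, so the behaviour is easy to test in isolation.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one line on ',' with no quoting or escaping.
std::vector<std::string> splitLine(const std::string& line)
{
    std::vector<std::string> result;
    std::stringstream lineStream(line);
    std::string cell;

    while (std::getline(lineStream, cell, ','))
    {
        result.push_back(cell);
    }
    // The loop ends when getline fails. If that failed read left `cell`
    // empty, the line ended with a comma (or was empty), so the final
    // empty field is added explicitly.
    if (!lineStream && cell.empty())
    {
        result.push_back("");
    }
    return result;
}
```

With this sketch, `a,b,` should yield three fields (the last one empty), while `a,b` yields two.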

I would just create a class representing a row, then stream into that object:

#include <iterator>
#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
#include <string>

class CSVRow
{
public:
std::string const& operator[](std::size_t index) const
{
return m_data[index];
}
std::size_t size() const
{
return m_data.size();
}
void readNextRow(std::istream& str)
{
std::string line;
std::getline(str, line);

std::stringstream lineStream(line);
std::string cell;

m_data.clear();
while(std::getline(lineStream, cell, ','))
{
m_data.push_back(cell);
}
// This checks for a trailing comma with no data after it.
if (!lineStream && cell.empty())
{
// If there was a trailing comma then add an empty element.
m_data.push_back("");
}
}
private:
std::vector<std::string> m_data;
};

std::istream& operator>>(std::istream& str, CSVRow& data)
{
data.readNextRow(str);
return str;
}
int main()
{
std::ifstream file("plop.csv");

CSVRow row;
while(file >> row)
{
std::cout << "4th Element(" << row[3] << ")\n";
}
}

But with a little work we could technically create an iterator:

class CSVIterator
{
public:
typedef std::input_iterator_tag iterator_category;
typedef CSVRow value_type;
typedef std::size_t difference_type;
typedef CSVRow* pointer;
typedef CSVRow& reference;

CSVIterator(std::istream& str) :m_str(str.good()?&str:NULL) { ++(*this); }
CSVIterator() :m_str(NULL) {}

// Pre Increment
CSVIterator& operator++() {if (m_str) { if (!((*m_str) >> m_row)){m_str = NULL;}}return *this;}
// Post increment
CSVIterator operator++(int) {CSVIterator tmp(*this);++(*this);return tmp;}
CSVRow const& operator*() const {return m_row;}
CSVRow const* operator->() const {return &m_row;}

bool operator==(CSVIterator const& rhs) {return ((this == &rhs) || ((this->m_str == NULL) && (rhs.m_str == NULL)));}
bool operator!=(CSVIterator const& rhs) {return !((*this) == rhs);}
private:
std::istream* m_str;
CSVRow m_row;
};

int main()
{
std::ifstream file("plop.csv");

for(CSVIterator loop(file); loop != CSVIterator(); ++loop)
{
std::cout << "4th Element(" << (*loop)[3] << ")\n";
}
}

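Related discussion

This is exactly what I was looking for! Now, for some extra credit.. how do I turn this into a class with a constructor and two methods: firstline() and nextline()? std::istream has no default constructor... so how do I use it? Thanks for the help!!
first() next(). What is this, Java! Just kidding.
Or you could use some Boost library to parse the CSV... see below.
This code saved me a few hours. I don't usually use C++, but I needed it to write a quick parser. This is great boilerplate, and the code even compiles.
@康拉德利: I always try to write code that compiles. :-) Glad it helped. But as Stefanb suggested, you could also look at Boost. Boost has a whole bunch of things that make C++ easier, including parser code.
One of the worst things is overloading the == and != operators. Just wrong.
@达特海德: A statement so broad that its broadness makes it silly. If you want to clarify why it is bad, then explain why that badness applies in this context.
So you don't think it is bad? You think it is silly to think it is bad?
@达特海德: I think it is silly to make broad generalizations. The code above works correctly, so I cannot see anything wrong with it. But if you have any specific comment on the above, I will consider it in this context. I can see how you could come to that conclusion by mindlessly following a set of generalized rules from one language and applying them to another.
+1. Thanks for providing the iterator example. CSV handling done in so few lines of code, and with good design. You could take it and handle quotes and other CSV dialects.
Also, if you run into odd link problems with the code above because another library somewhere defines istream::operator>> (like Eigen), add an inline before the operator declaration to fix it.
@塞巴斯蒂安·库克: I think you are confused about the problem you solved by adding inline. Namespace clashes should be resolved by putting things into an explicit namespace. inline helps the linker because you put the definition in a header file and included it multiple times.
The parsing part is missing; you still end up with a string. This is just an over-engineered line splitter.
Why can't I use getline(stringstream(line),cell,',') instead of stringstream lineStream(line); getline(lineStream,cell,',');?
@基里尔·伊库姆: That is a good question and deserves its own answer. It is best asked on the main site rather than in comments.
@Loki Thanks, done: stackoverflow.com/q/26979013/390066
This is the simplest, clearest example of how to create an iterator class.
The iterator does not read the last line. What should I change to read the last line? Thanks.
@托尼茨: Works fine for me. I have a chat room. Post your code and I will see if there is a problem: chat.stackoverflow.com/rooms/113764/csv
Thanks! Change the implementation of operator++() to the following code to fix the "last line not imported" bug: CSVIterator& operator++() {if (m_str) { if (!((*m_str) >> m_row)){;m_str = NULL;}}return *this;}. If you are interested, the explanation is in Loki's chat room. Thanks for this great code!
@斯特凡沃尔: That line was put into the code above many months ago.
:D Oh, I have been using the code for months and found this bug while testing, so I came back to this page. So the code above is safe! Great work @lokiastari! :)
I found that a line like: a,b, only adds two values to m_data. I think it should add three (the third being an empty string). I fixed this by adding m_data.push_back(""); after the while loop. Since I access cells by index, the extra value does not bother me.
@matrixmanatyrservice: Without any testing, I would try: if (!lineStream){m_data.push_back("");}, so that it only pushes the empty value when there is nothing after the comma. If you test it and it works, I will update the answer above.
@Lokiastari That looks much better than my hack. I tested it and it worked.
@Loki Astari Could you explain the if (!lineStream) part? Won't the case without a trailing comma also pass this condition? Also, where does m_data come from in m_data.push_back("")? Shouldn't it be result.push_back("")?
@尼古拉斯 Sorry. Typo (made while modifying the code according to the comments). Yes, if (!lineStream) needed more (fixed). And m_data should have been result (fixed).
@Lokiastari Maybe just if (cell.empty())? It seems !lineStream is also true for normal lines. Also, it seems the std::vector result also gets a "" appended at the end of the vector. I think this is because there is just one more empty cell between the final newline in the CSV file and the end-of-file condition.
So this setup works for strings? What if I want to read integers or floats?
@Drizo: You have a string; you can convert it to any type by deserializing it into an object of the appropriate type. There are several ways to deserialize an object, so that is a question you should ask separately.
@darthvarder When you have the member alias typedef std::input_iterator_tag iterator_category;, it is incorrect not to overload operator!= for the class. See InputIterator.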
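To illustrate the comment about reading integers or floats: every cell comes back as a `std::string`, so the conversion can happen after splitting. The following is only a sketch of one possible approach, using `std::stoi` and `std::stod`; `cellToInt` and `cellToDouble` are hypothetical helper names, not part of the answer.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a whole cell to int.
// Throws std::invalid_argument if the cell is not a number or has trailing text.
int cellToInt(const std::string& cell)
{
    std::size_t used = 0;
    const int value = std::stoi(cell, &used);
    if (used != cell.size())
    {
        throw std::invalid_argument("trailing characters in cell: " + cell);
    }
    return value;
}

// Hypothetical helper: same idea for floating-point cells.
double cellToDouble(const std::string& cell)
{
    std::size_t used = 0;
    const double value = std::stod(cell, &used);
    if (used != cell.size())
    {
        throw std::invalid_argument("trailing characters in cell: " + cell);
    }
    return value;
}
```

Checking `used` against the cell length is a design choice: it rejects cells such as `4x` that `std::stoi` alone would accept as 4.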

A solution using the Boost Tokenizer:

std::vector<std::string> vec;
using namespace boost;
tokenizer<escaped_list_separator<char> > tk(
line, escaped_list_separator<char>('\\', ',', '\"'));
for (tokenizer<escaped_list_separator<char> >::iterator i(tk.begin());
i!=tk.end();++i)
{
vec.push_back(*i);
}

My version uses nothing but the standard C++11 library. It copes well with Excel CSV quoting:

spam eggs,"foo,bar","""fizz buzz"""
1.23,4.567,-8.00E+09

The code is written as a finite-state machine that consumes one character at a time. I find it easier to reason about.

#include <istream>
#include <string>
#include <vector>

enum class CSVState {
UnquotedField,
QuotedField,
QuotedQuote
};

std::vector<std::string> readCSVRow(const std::string &row) {
CSVState state = CSVState::UnquotedField;
std::vector<std::string> fields {""};
size_t i = 0; // index of the current field
for (char c : row) {
switch (state) {
case CSVState::UnquotedField:
switch (c) {
case ',': // end of field
fields.push_back(""); i++;
break;
case '"': state = CSVState::QuotedField;
break;
default: fields[i].push_back(c);
break; }
break;
case CSVState::QuotedField:
switch (c) {
case '"': state = CSVState::QuotedQuote;
break;
default: fields[i].push_back(c);
break; }
break;
case CSVState::QuotedQuote:
switch (c) {
case ',': // , after closing quote
fields.push_back(""); i++;
state = CSVState::UnquotedField;
break;
case '"': //"" ->"
fields[i].push_back('"');
state = CSVState::QuotedField;
break;
default: // end of quote
state = CSVState::UnquotedField;
break; }
break;
}
}
return fields;
}

/// Read CSV file, Excel dialect. Accept "quoted fields ""with quotes"""
std::vector<std::vector<std::string>> readCSV(std::istream &in) {
std::vector<std::vector<std::string>> table;
std::string row;
while (!in.eof()) {
std::getline(in, row);
if (in.bad() || in.fail()) {
break;
}
auto fields = readCSVRow(row);
table.push_back(fields);
}
return table;
}

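The C++ String Toolkit Library (StrTk) has a token grid class that lets you load data from text files, strings or char buffers, and parse/process them in a row-column fashion.

You can specify the row delimiters and column delimiters, or just use the defaults.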

void foo()
{
std::string data = "1,2,3,4,5\n"
"0,2,4,6,8\n"
"1,3,5,7,9\n";

strtk::token_grid grid(data,data.size(),",");

for(std::size_t i = 0; i < grid.row_count(); ++i)
{
strtk::token_grid::row_type r = grid.row(i);
for(std::size_t j = 0; j < r.size(); ++j)
{
std::cout << r.get<int>(j) <<"\t";
}
std::cout << std::endl;
}
std::cout << std::endl;
}

More examples can be found here

You can use the Boost Tokenizer with escaped_list_separator.

escaped_list_separator parses a superset of the csv. Boost::tokenizer

This only uses the Boost tokenizer header files; no linking to Boost libraries is required.

Here is an example (see "Parse CSV File With Boost Tokenizer In C++" for details):

#include <iostream> // cout, endl
#include <fstream> // fstream
#include <vector>
#include <string>
#include <algorithm> // copy
#include <iterator> // ostream_iterator
#include <boost/tokenizer.hpp>

int main()
{
using namespace std;
using namespace boost;
string data("data.csv");

ifstream in(data.c_str());
if (!in.is_open()) return 1;

typedef tokenizer< escaped_list_separator<char> > Tokenizer;
vector< string > vec;
string line;

while (getline(in,line))
{
Tokenizer tok(line);
vec.assign(tok.begin(),tok.end());

// vector now contains strings from one row, output to cout here
copy(vec.begin(), vec.end(), ostream_iterator<string>(cout,"|"));

cout << "\n----------------------" << endl;
}
}

Using Spirit to parse CSV is not overkill. Spirit is well suited to micro-parsing tasks. For instance, with Spirit 2.1 it is as easy as:

bool r = phrase_parse(first, last,

// Begin grammar
(
double_ % ','
)
,
// End grammar

space, v);

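The vector v gets filled with the values. There is a series of tutorials touching on this in the new Spirit 2.1 docs that have just been released with Boost 1.41.

The tutorial progresses from simple to complex. The CSV parsers appear somewhere in the middle and touch on various techniques for using Spirit. The generated code is as tight as hand-written code. Check out the generated assembler!

Related discussion

Actually it is overkill; the compile-time hit is enormous and makes using Spirit for simple "micro-parsing tasks" unreasonable.
Also, I would like to point out that the code above does not parse CSV; it just parses a range of the vector's type delimited by commas. It does not handle quotes, varying column types, etc. In short, 19 votes for something that does not actually answer the question seems a bit suspicious to me.
@沙丁纳 Nonsense. The compile-time hit for small parsers is not that big, and it is also irrelevant because you put the code into its own compilation unit and compile it once. Then you only need to link it, and that is as efficient as it gets. As for your other comment, there are as many dialects of CSV as there are processors for it. This one certainly is not a very useful dialect, but it can be extended to handle quoted values.
@Konrad: Simply adding the "include" to an empty file with only a main and nothing else takes 9.7 seconds with MSVC 2012 on a Core i7 running at 2.GHz. It is needless bloat. The accepted answer compiles in under 2 seconds on the same machine; I would hate to imagine how long the "proper" Boost.Spirit example would take.
@Gerdiner For me it takes much less (about 4 seconds), but as I said this is entirely irrelevant, since you only need to compile that TU once. The time you save implementing the parser easily offsets the compile cost here. As for "proper" Boost.Spirit grammars: a big grammar takes minutes to compile. But once again: that cost is easily offset by the ease of writing the parser, and it is not a recurring cost, since you do not need to recompile the parser every time you recompile client code.
I have to agree with you: the overhead of using Spirit for something as simple as CSV processing is too much.
Personally I think it all depends on the application you are writing. In my case, I would probably use Boost since we already compile it, and even if Spirit were not performance-oriented, I want it for loading initial data for tests. What is it, only 1-3 lines of code? Count me in; I don't have to write a parser, even if that is easy.
Actually, Spirit is very efficient. Yes, compilation takes a long time, but execution speed far exceeds that of hand-written parsers, even non-naive implementations. For instance, the boost::qi::int_ parser is currently the most efficient of all existing libraries (including hand-written code).
@Konradrudolph Wow, is that so? I'm impressed! I ended up using it in my tests for CSV parsing and it worked well!
Agree with @konradrudolph: the people here who are concerned about performance are concerned about compile performance, which is the opposite of Spirit's intent and of what coders usually care about. This is an issue with C++ templates. It would be better to include a more complete CSV parsing example in this post, rather than just a link to a tutorial...
The example shows a comma-separated list of doubles, not CSV.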

If you DO care about parsing CSV correctly, this will do it... relatively slowly, as it works one character at a time.

void ParseCSV(const string& csvSource, vector<vector<string> >& lines)
{
bool inQuote(false);
bool newLine(false);
string field;
lines.clear();
vector<string> line;

string::const_iterator aChar = csvSource.begin();
while (aChar != csvSource.end())
{
switch (*aChar)
{
case '"':
newLine = false;
inQuote = !inQuote;
break;

case ',':
newLine = false;
if (inQuote == true)
{
field += *aChar;
}
else
{
line.push_back(field);
field.clear();
}
break;

case '\n':
case '\r':
if (inQuote == true)
{
field += *aChar;
}
else
{
if (newLine == false)
{
line.push_back(field);
lines.push_back(line);
field.clear();
line.clear();
newLine = true;
}
}
break;

default:
newLine = false;
field.push_back(*aChar);
break;
}

aChar++;
}

if (field.size())
line.push_back(field);

if (line.size())
lines.push_back(line);
}

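When using the Boost Tokenizer escaped_list_separator for CSV files, one should be aware of the following:

It requires an escape character (default backslash - \)

It requires a splitter/separator character (default comma - ,)

It requires a quote character (default quote - ")

The CSV format specified by the wiki states that data fields can contain separators in quotes (supported):

1997,Ford,E350,"Super, luxurious truck"

The CSV format specified by the wiki states that a single quote should be handled with double quotes (escaped_list_separator will strip away all quote characters):

1997,Ford,E350,"Super""luxurious"" truck"

The CSV format does not specify that any backslash characters should be stripped away (escaped_list_separator will strip away all escape characters).

A possible workaround to fix the default behavior of the Boost escaped_list_separator:

First replace all backslash characters (\) with two backslash characters (\\) so they are not stripped away.

Then replace all double quotes ("") with a single backslash character and a quote (\").

This workaround has the side effect that empty data fields represented by a double quote are transformed into a single-quote token. When iterating through the tokens, one must check whether the token is a single quote and treat it as an empty string.

Not pretty, but it works, as long as there are no newlines within the quotes.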
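The two replacement steps above could be sketched as a small preprocessing function that is applied to each line before it is handed to the tokenizer. `replaceAll` and `prepareForEscapedList` are hypothetical names used only for illustration; the sketch uses the standard library alone and does not show the Boost call itself.

```cpp
#include <string>

// Replaces every occurrence of `from` in `text` with `to`, scanning left to right.
std::string replaceAll(std::string text, const std::string& from, const std::string& to)
{
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos)
    {
        text.replace(pos, from.size(), to);
        pos += to.size(); // continue after the inserted text
    }
    return text;
}

// Hypothetical preprocessing step for one CSV line.
std::string prepareForEscapedList(const std::string& line)
{
    // Step 1: double every backslash so it is not stripped as an escape.
    std::string out = replaceAll(line, "\\", "\\\\");
    // Step 2: turn each doubled quote into backslash + quote.
    return replaceAll(out, "\"\"", "\\\"");
}
```

Note that an empty quoted field (two adjacent quotes) is also rewritten by step 2, which is the side effect described above.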

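Since all the CSV questions seem to get redirected here, I thought I would post my answer here. This answer does not directly address the asker's question. I wanted to be able to read in a stream that is known to be in CSV format, where the type of each field is also already known. Of course, the method below could be used to treat every field as a string type.

As an example of how I wanted to be able to use a CSV input stream, consider the following input (taken from Wikipedia's page on CSV):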

const char input[] =
"Year,Make,Model,Description,Price\n"
"1997,Ford,E350,\"ac, abs, moon\",3000.00\n"
"1999,Chevy,\"Venture \"\"Extended Edition\"\"\",\"\",4900.00\n"
"1999,Chevy,\"Venture \"\"Extended Edition, Very Large\"\"\",\"\",5000.00\n"
"1996,Jeep,Grand Cherokee,\"MUST SELL!\n\
air, moon roof, loaded\",4799.00\n"
;

Then, I wanted to be able to read in the data like this:

std::istringstream ss(input);
std::string title[5];
int year;
std::string make, model, desc;
float price;
csv_istream(ss)
>> title[0] >> title[1] >> title[2] >> title[3] >> title[4];
while (csv_istream(ss)
>> year >> make >> model >> desc >> price) {
//...do something with the record...
}

This was the solution I ended up with.

struct csv_istream {
std::istream &is_;
csv_istream (std::istream &is) : is_(is) {}
void scan_ws () const {
while (is_.good()) {
int c = is_.peek();
if (c != ' ' && c != '\t') break;
is_.get();
}
}
void scan (std::string *s = 0) const {
std::string ws;
int c = is_.get();
if (is_.good()) {
do {
if (c == ',' || c == '\n') break;
if (s) {
ws += c;
if (c != ' ' && c != '\t') {
*s += ws;
ws.clear();
}
}
c = is_.get();
} while (is_.good());
if (is_.eof()) is_.clear();
}
}
template <typename T, bool> struct set_value {
void operator () (std::string in, T &v) const {
std::istringstream(in) >> v;
}
};
template <typename T> struct set_value<T, true> {
template <bool SIGNED> void convert (std::string in, T &v) const {
if (SIGNED) v = ::strtoll(in.c_str(), 0, 0);
else v = ::strtoull(in.c_str(), 0, 0);
}
void operator () (std::string in, T &v) const {
convert<is_signed_int<T>::val>(in, v);
}
};
template <typename T> const csv_istream & operator >> (T &v) const {
std::string tmp;
scan(&tmp);
set_value<T, is_int<T>::val>()(tmp, v);
return *this;
}
const csv_istream & operator >> (std::string &v) const {
v.clear();
scan_ws();
if (is_.peek() != '"') scan(&v);
else {
std::string tmp;
is_.get();
std::getline(is_, tmp, '"');
while (is_.peek() == '"') {
v += tmp;
v += is_.get();
std::getline(is_, tmp, '"');
}
v += tmp;
scan();
}
return *this;
}
template <typename T>
const csv_istream & operator >> (T &(*manip)(T &)) const {
is_ >> manip;
return *this;
}
operator bool () const { return !is_.fail(); }
};

The following helpers can be simplified with the new integral traits templates in C++11:

template <typename T> struct is_signed_int { enum { val = false }; };
template <> struct is_signed_int<short> { enum { val = true}; };
template <> struct is_signed_int<int> { enum { val = true}; };
template <> struct is_signed_int<long> { enum { val = true}; };
template <> struct is_signed_int<long long> { enum { val = true}; };

template <typename T> struct is_unsigned_int { enum { val = false }; };
template <> struct is_unsigned_int<unsigned short> { enum { val = true}; };
template <> struct is_unsigned_int<unsigned int> { enum { val = true}; };
template <> struct is_unsigned_int<unsigned long> { enum { val = true}; };
template <> struct is_unsigned_int<unsigned long long> { enum { val = true}; };

template <typename T> struct is_int {
enum { val = (is_signed_int<T>::val || is_unsigned_int<T>::val) };
};
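As a rough illustration of that simplification, the hand-written traits could be replaced with `std::is_integral` and `std::is_signed` from `<type_traits>`. This is a sketch under assumptions, not the answer's code: `parse_field` and `parse_field_impl` are hypothetical names, and the conversion keeps the `strtoll`/`strtoull` calls with base 0 used above. Unlike the hand-written traits, `std::is_integral` is also true for `bool` and character types.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <type_traits>

// Non-integral types (e.g. double): fall back to stream extraction.
template <typename T>
void parse_field_impl(const std::string& in, T& v, std::false_type)
{
    std::istringstream(in) >> v;
}

// Integral types: use strtoll/strtoull so that base prefixes such as 0x work.
template <typename T>
void parse_field_impl(const std::string& in, T& v, std::true_type)
{
    if (std::is_signed<T>::value)
        v = static_cast<T>(std::strtoll(in.c_str(), nullptr, 0));
    else
        v = static_cast<T>(std::strtoull(in.c_str(), nullptr, 0));
}

// Hypothetical entry point: picks the overload via tag dispatch.
template <typename T>
void parse_field(const std::string& in, T& v)
{
    parse_field_impl(in, v, std::is_integral<T>());
}
```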

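You might want to look at my FOSS project CSVfix (updated link), which is a CSV stream editor written in C++. The CSV parser is no prize, but it does the job, and the whole package may do what you need without you writing any code.

See alib/src/a_csv.cpp for the CSV parser, and csvlib/src/csved_ioman.cpp (IOManager::ReadCSV) for a usage example.

Yet another solution, similar to Loki Astari's answer, in C++11. The rows here are std::tuples of a given type. The code scans one line, then scans up to each delimiter, and then converts and dumps the value directly into the tuple (with a bit of template code).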

for (auto row : csv<std::string, int, float>(file, ',')) {
std::cout <<"first col:" << std::get<0>(row) << std::endl;
}

Advantages:

quite clean and simple to use, only C++11.
automatic type conversion into std::tuple via operator>>.

What is missing:

escaping and quoting
no error handling in case of malformed CSV.

The main code:

#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <tuple>
#include <type_traits>

namespace csvtools {
/// Read the last element of the tuple without calling recursively
template <std::size_t idx, class... fields>
typename std::enable_if<idx >= std::tuple_size<std::tuple<fields...>>::value - 1>::type
read_tuple(std::istream &in, std::tuple<fields...> &out, const char delimiter) {
std::string cell;
std::getline(in, cell, delimiter);
std::stringstream cell_stream(cell);
cell_stream >> std::get<idx>(out);
}

/// Read the @p idx-th element of the tuple and then calls itself with @p idx + 1 to
/// read the next element of the tuple. Automatically falls in the previous case when
/// reaches the last element of the tuple thanks to enable_if
template <std::size_t idx, class... fields>
typename std::enable_if<idx < std::tuple_size<std::tuple<fields...>>::value - 1>::type
read_tuple(std::istream &in, std::tuple<fields...> &out, const char delimiter) {
std::string cell;
std::getline(in, cell, delimiter);
std::stringstream cell_stream(cell);
cell_stream >> std::get<idx>(out);
read_tuple<idx + 1, fields...>(in, out, delimiter);
}
}

/// Iterable csv wrapper around a stream. @p fields the list of types that form up a row.
template <class... fields>
class csv {
std::istream &_in;
const char _delim;
public:
typedef std::tuple<fields...> value_type;
class iterator;

/// Construct from a stream.
inline csv(std::istream &in, const char delim) : _in(in), _delim(delim) {}

/// Status of the underlying stream
/// @{
inline bool good() const {
return _in.good();
}
inline const std::istream &underlying_stream() const {
return _in;
}
/// @}

inline iterator begin();
inline iterator end();
private:

/// Reads a line into a stringstream, and then reads the line into a tuple, that is returned
inline value_type read_row() {
std::string line;
std::getline(_in, line);
std::stringstream line_stream(line);
std::tuple<fields...> retval;
csvtools::read_tuple<0, fields...>(line_stream, retval, _delim);
return retval;
}
};

/// Iterator; just calls recursively @ref csv::read_row and stores the result.
template <class... fields>
class csv<fields...>::iterator {
csv::value_type _row;
csv *_parent;
public:
typedef std::input_iterator_tag iterator_category;
typedef csv::value_type value_type;
typedef std::size_t difference_type;
typedef csv::value_type * pointer;
typedef csv::value_type & reference;

/// Construct an empty/end iterator
inline iterator() : _parent(nullptr) {}
/// Construct an iterator at the beginning of the @p parent csv object.
inline iterator(csv &parent) : _parent(parent.good() ? &parent : nullptr) {
++(*this);
}

/// Read one row, if possible. Set to end if parent is not good anymore.
inline iterator &operator++() {
if (_parent != nullptr) {
_row = _parent->read_row();
if (!_parent->good()) {
_parent = nullptr;
}
}
return *this;
}

inline iterator operator++(int) {
iterator copy = *this;
++(*this);
return copy;
}

inline csv::value_type const &operator*() const {
return _row;
}

inline csv::value_type const *operator->() const {
return &_row;
}

bool operator==(iterator const &other) {
return (this == &other) or (_parent == nullptr and other._parent == nullptr);
}
bool operator!=(iterator const &other) {
return not (*this == other);
}
};

template <class... fields>
typename csv<fields...>::iterator csv<fields...>::begin() {
return iterator(*this);
}

template <class... fields>
typename csv<fields...>::iterator csv<fields...>::end() {
return iterator();
}

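I put a tiny working example on GitHub; I have been using it to parse some numerical data and it served its purpose.

I wrote a header-only, C++11 CSV parser. It is well tested, fast, supports the entire CSV spec (quoted fields, delimiter/terminator in quotes, quote escaping, etc.), and can be configured to account for CSVs that do not adhere to the spec.

Configuration is done through a fluent interface: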

// constructor accepts any input stream
CsvParser parser = CsvParser(std::cin)
.delimiter(';') // delimited by ; instead of ,
.quote('\'') // quoted fields use ' instead of "
.terminator('\0'); // terminated by \0 instead of by \r\n, \n, or \r

Parsing is just a range-based for loop:

#include <fstream>
#include <iostream>
#include "../parser.hpp"

using namespace aria::csv;

int main() {
std::ifstream f("some_file.csv");
CsvParser parser(f);

for (auto& row : parser) {
for (auto& field : row) {
std::cout << field <<" |";
}
std::cout << std::endl;
}
}

Another CSV I/O library can be found here:

http://code.google.com/p/fast-cpp-csv-parser/

#include"csv.h"

int main(){
io::CSVReader<3> in("ram.csv");
in.read_header(io::ignore_extra_column,"vendor","size","speed");
std::string vendor; int size; double speed;
while(in.read_row(vendor, size, speed)){
// do stuff with the data
}
}

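Here is another implementation of a Unicode CSV parser (works with wchar_t). I wrote part of it, while Jonathan Leffler wrote the rest.

Note: This parser aims to replicate Excel's behavior as closely as possible, specifically when importing broken or malformed CSV files.

This is the original question: Parsing CSV file with multiline fields and escaped double quotes

This is the code as an SSCCE (Short, Self-Contained, Correct Example).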

#include <stdbool.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

extern const wchar_t *nextCsvField(const wchar_t *p, wchar_t sep, bool *newline);

// Returns a pointer to the start of the next field,
// or zero if this is the last field in the CSV
// p is the start position of the field
// sep is the separator used, i.e. comma or semicolon
// newline says whether the field ends with a newline or with a comma
const wchar_t *nextCsvField(const wchar_t *p, wchar_t sep, bool *newline)
{
// Parse quoted sequences
if ('"' == p[0]) {
p++;
while (1) {
// Find next double-quote
p = wcschr(p, L'"');
// If we don't find it or it's the last symbol
// then this is the last field
if (!p || !p[1])
return 0;
// Check for"", it is an escaped double-quote
if (p[1] != '"')
break;
// Skip the escaped double-quote
p += 2;
}
}

// Find next newline or comma.
wchar_t newline_or_sep[4] = L"\n\r ";
newline_or_sep[2] = sep;
p = wcspbrk(p, newline_or_sep);

// If no newline or separator, this is the last field.
if (!p)
return 0;

// Check if we had newline.
*newline = (p[0] == '\r' || p[0] == '\n');

// Handle "\r\n", otherwise just increment
if (p[0] == '\r' && p[1] == '\n')
p += 2;
else
p++;

return p;
}

static wchar_t *csvFieldData(const wchar_t *fld_s, const wchar_t *fld_e, wchar_t *buffer, size_t buflen)
{
wchar_t *dst = buffer;
wchar_t *end = buffer + buflen - 1;
const wchar_t *src = fld_s;

if (*src == L'"')
{
const wchar_t *p = src + 1;
while (p < fld_e && dst < end)
{
if (p[0] == L'"' && p+1 < fld_e && p[1] == L'"')
{
*dst++ = p[0];
p += 2;
}
else if (p[0] == L'"')
{
p++;
break;
}
else
*dst++ = *p++;
}
src = p;
}
while (src < fld_e && dst < end)
*dst++ = *src++;
if (dst >= end)
return 0;
*dst = L'\0';
return(buffer);
}

static void dissect(const wchar_t *line)
{
const wchar_t *start = line;
const wchar_t *next;
bool eol;
wprintf(L"Input %3zd: [%.*ls]\n", wcslen(line), wcslen(line)-1, line);
while ((next = nextCsvField(start, L',', &eol)) != 0)
{
wchar_t buffer[1024];
wprintf(L"Raw Field: [%.*ls] (eol = %d)\n", (next - start - eol), start, eol);
if (csvFieldData(start, next-1, buffer, sizeof(buffer)/sizeof(buffer[0])) != 0)
wprintf(L"Field %3zd: [%ls]\n", wcslen(buffer), buffer);
start = next;
}
}

static const wchar_t multiline[] =
L"First field of first row,\"This field is multiline\n"
"\n"
"but that's OK because it's enclosed in double quotes, and this\n"
"is an escaped \"\" double quote\" but this one \"\" is not\n"
" \"This is second field of second row, but it is not multiline\n"
" because it doesn't start\n"
" with an immediate double quote\"\n"
;

int main(void)
{
wchar_t line[1024];

while (fgetws(line, sizeof(line)/sizeof(line[0]), stdin))
dissect(line);
dissect(multiline);

return 0;
}

Sorry, but this all seems like a great deal of elaborate syntax to hide a few lines of code.

Why not this:

/**

Read line from a CSV file

@param[in] fp file pointer to open file
@param[in] vls reference to vector of strings to hold next line

*/
void readCSV( FILE *fp, std::vector<std::string>& vls )
{
vls.clear();
if( ! fp )
return;
char buf[10000];
if( ! fgets( buf, sizeof(buf), fp ) )
return;
std::string s = buf;
int p,q;
q = -1;
// loop over columns
while( 1 ) {
p = q;
q = s.find_first_of(",\n",p+1);
if( q == -1 )
break;
vls.push_back( s.substr(p+1,q-p-1) );
}
}

int _tmain(int argc, _TCHAR* argv[])
{
std::vector<std::string> vls;
FILE * fp = fopen( argv[1],"r" );
if( ! fp )
return 1;
readCSV( fp, vls );
readCSV( fp, vls );
readCSV( fp, vls );
std::cout << "row 3, col 4 is " << vls[3].c_str() << "\n";

return 0;
}

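You can open and read a .csv file using the fopen and fscanf functions, but the important thing is parsing the data. The simplest way to parse the data is to use a delimiter. In the case of .csv, the delimiter is ','.

Suppose your data1.csv file is as follows: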

A,45,76,01
B,77,67,02
C,63,76,03
D,65,44,04

You can tokenize the data and store it in char arrays, and later use functions such as atoi() for the appropriate conversions.

FILE *fp;
char str1[10], str2[10], str3[10], str4[10];

fp = fopen("G:\\data1.csv","r");
if(NULL == fp)
{
printf("\nError in opening file.");
return 0;
}
// Four conversions for four buffers; widths match the 10-byte arrays.
while(4 == fscanf(fp," %9[^,],%9[^,],%9[^,],%9s", str1, str2, str3, str4))
{
printf("\n%s %s %s %s", str1, str2, str3, str4);
}
fclose(fp);

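In [^,], the ^ inverts the logic: it means match any string that does not contain a comma. The final , then matches the comma that terminated the previous string.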
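To see the scanset at work on a single record, here is a minimal sketch that uses `std::sscanf` on one line of the sample data. `Record` and `parseRecord` are hypothetical names, and the field width in `%9[^,]` is an assumption chosen to fit 10-byte buffers.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical record type for one line such as "A,45,76,01".
struct Record
{
    char name[10];
    int a;
    int b;
    int c;
};

// Returns true only when all four comma-separated fields were read.
bool parseRecord(const char* line, Record& out)
{
    char f2[10], f3[10], f4[10];
    // %9[^,] reads up to 9 characters that are not a comma; the literal ','
    // that follows consumes the separator. The last field has no trailing
    // comma, so %9s reads it up to the next whitespace.
    const int n = std::sscanf(line, " %9[^,],%9[^,],%9[^,],%9s",
                              out.name, f2, f3, f4);
    if (n != 4)
    {
        return false;
    }
    out.a = std::atoi(f2);
    out.b = std::atoi(f3);
    out.c = std::atoi(f4);
    return true;
}
```

One limitation worth knowing: a scanset must match at least one character, so an empty field (two commas in a row) makes the conversion fail.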

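You have to feel proud when you use something as beautiful as boost::spirit.

Here is my attempt at a parser that (almost) complies with the CSV specification at this link (I did not need line breaks within fields; also, the spaces around the commas are discarded).

After you overcome the shocking experience of waiting 10 seconds for this code to compile, you can sit back and enjoy.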

// csvparser.cpp
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>

#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;
namespace bascii = boost::spirit::ascii;

template <typename Iterator>
struct csv_parser : qi::grammar<Iterator, std::vector<std::string>(),
bascii::space_type>
{
qi::rule<Iterator, char() > COMMA;
qi::rule<Iterator, char() > DDQUOTE;
qi::rule<Iterator, std::string(), bascii::space_type > non_escaped;
qi::rule<Iterator, std::string(), bascii::space_type > escaped;
qi::rule<Iterator, std::string(), bascii::space_type > field;
qi::rule<Iterator, std::vector<std::string>(), bascii::space_type > start;

csv_parser() : csv_parser::base_type(start)
{
using namespace qi;
using qi::lit;
using qi::lexeme;
using bascii::char_;

start = field % ',';
field = escaped | non_escaped;
escaped = lexeme['"' >> *( char_ -(char_('"') | ',') | COMMA | DDQUOTE) >> '"'];
non_escaped = lexeme[ *( char_ -(char_('"') | ',') ) ];
DDQUOTE = lit("\"\"") [_val = '"'];
COMMA = lit(",") [_val = ','];
}

};

int main()
{
std::cout << "Enter CSV lines [empty] to quit\n";

using bascii::space;
typedef std::string::const_iterator iterator_type;
typedef csv_parser<iterator_type> csv_parser;

csv_parser grammar;
std::string str;
int fid;
while (getline(std::cin, str))
{
fid = 0;

if (str.empty())
break;

std::vector<std::string> csv;
std::string::const_iterator it_beg = str.begin();
std::string::const_iterator it_end = str.end();
bool r = phrase_parse(it_beg, it_end, grammar, space, csv);

if (r && it_beg == it_end)
{
std::cout << "Parsing succeeded\n";
for (auto& field: csv)
{
std::cout << "field " << ++fid << ": " << field << std::endl;
}
}
else
{
std::cout << "Parsing failed\n";
}
}

return 0;
}

Compile:

make csvparser

Test (example stolen from Wikipedia):

./csvparser
Enter CSV lines [empty] to quit

1999,Chevy,"Venture""Extended Edition, Very Large""",,5000.00
Parsing succeeded
field 1: 1999
field 2: Chevy
field 3: Venture"Extended Edition, Very Large"
field 4:
field 5: 5000.00

1999,Chevy,"Venture""Extended Edition, Very Large""",,5000.00"
Parsing failed

This solution detects these 4 cases

The complete class is at

https://github.com/pedro-vicente/csv-parser

1,field 2,field 3,
1,field 2,"field 3 quoted, with separator",
1,field 2,"field 3
with newline",
1,field 2,"field 3
with newline and separator,",

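It reads the file character by character, and reads one row at a time into a vector (of strings), so it is suitable for very large files.

Usage is

Iterate until an empty row is returned (end of file). A row is a vector where each entry is a CSV column.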

read_csv_t csv;
csv.open("../test.csv");
std::vector<std::string> row;
while (true)
{
row = csv.read_row();
if (row.size() == 0)
{
break;
}
}

The class declaration

class read_csv_t
{
public:
read_csv_t();
int open(const std::string &file_name);
std::vector<std::string> read_row();
private:
std::ifstream m_ifs;
};

The implementation

std::vector<std::string> read_csv_t::read_row()
{
bool quote_mode = false;
std::vector<std::string> row;
std::string column;
char c;
while (m_ifs.get(c))
{
switch (c)
{
/////////////////////////////////////////////////////////////////////////////////////////////////////
//separator ',' detected.
//in quote mode add character to column
//push column if not in quote mode
/////////////////////////////////////////////////////////////////////////////////////////////////////

case ',':
if (quote_mode == true)
{
column += c;
}
else
{
row.push_back(column);
column.clear();
}
break;

/////////////////////////////////////////////////////////////////////////////////////////////////////
//quote '"' detected.
//toggle quote mode
/////////////////////////////////////////////////////////////////////////////////////////////////////

case '"':
quote_mode = !quote_mode;
break;

/////////////////////////////////////////////////////////////////////////////////////////////////////
//line end detected
//in quote mode add character to column
//return row if not in quote mode
/////////////////////////////////////////////////////////////////////////////////////////////////////

case '\n':
case '\r':
if (quote_mode == true)
{
column += c;
}
else
{
return row;
}
break;

/////////////////////////////////////////////////////////////////////////////////////////////////////
//default, add character to column
/////////////////////////////////////////////////////////////////////////////////////////////////////

default:
column += c;
break;
}
}

//return empty vector if end of file detected
m_ifs.close();
std::vector<std::string> v;
return v;
}

Here is code for reading a matrix; note that you also have a csvwrite function in MATLAB

void loadFromCSV( const std::string& filename )
{
std::ifstream file( filename.c_str() );
std::vector< std::vector<std::string> > matrix;
std::vector<std::string> row;
std::string line;
std::string cell;

while( file )
{
std::getline(file,line);
std::stringstream lineStream(line);
row.clear();

while( std::getline( lineStream, cell, ',' ) )
row.push_back( cell );

if( !row.empty() )
matrix.push_back( row );
}

for( int i=0; i<int(matrix.size()); i++ )
{
for( int j=0; j<int(matrix[i].size()); j++ )
std::cout << matrix[i][j] << " ";

std::cout << std::endl;
}
}

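The first thing you need to do is make sure the file exists. To accomplish this, you just need to try to open the file stream at the path. After you have opened the file stream, use stream.fail() to see whether it worked as expected or not.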

bool fileExists(string fileName)
{

ifstream test;

test.open(fileName.c_str());

if (test.fail())
{
test.close();
return false;
}
else
{
test.close();
return true;
}
}

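You must also verify that the provided file is the correct type of file. To accomplish this, you need to look through the provided file path until you find the file extension. Once you have the file extension, make sure it is a .csv file.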

bool verifyExtension(string filename)
{
int period = 0;

for (unsigned int i = 0; i < filename.length(); i++)
{
if (filename[i] == '.')
period = i;
}

string extension;

for (unsigned int i = period; i < filename.length(); i++)
extension += filename[i];

if (extension ==".csv")
return true;
else
return false;
}
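The same check can be written more compactly with `std::string::rfind`. This is only a sketch, under the assumption that just the exact, case-sensitive suffix `.csv` should count; `hasCsvExtension` is a hypothetical name.

```cpp
#include <string>

// Hypothetical helper: true when the text from the last '.' onward is ".csv".
bool hasCsvExtension(const std::string& filename)
{
    const std::string::size_type dot = filename.rfind('.');
    if (dot == std::string::npos)
    {
        return false; // no extension at all
    }
    return filename.compare(dot, std::string::npos, ".csv") == 0;
}
```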

This function returns the file extension, which is used later in an error message.

string getExtension(string filename)
{
int period = 0;

for (unsigned int i = 0; i < filename.length(); i++)
{
if (filename[i] == '.')
period = i;
}

string extension;

if (period != 0)
{
for (unsigned int i = period; i < filename.length(); i++)
extension += filename[i];
}
else
extension ="NO FILE";

return extension;
}

This function actually calls the error checks created above and then parses through the file.

void parseFile(string fileName)
{
if (fileExists(fileName) && verifyExtension(fileName))
{
ifstream fs;
fs.open(fileName.c_str());
string fileCommand;

while (fs.good())
{
string temp;

getline(fs, fileCommand, '\n');

for (unsigned int i = 0; i < fileCommand.length(); i++)
{
if (fileCommand[i] != ',')
temp += fileCommand[i];
else
temp += " "; // replace each comma with a space
}

if (temp !="\0")
{
// Place your code here to run the file.
}
}
fs.close();
}
else if (!fileExists(fileName))
{
cout <<"Error: The provided file does not exist:" << fileName << endl;

if (!verifyExtension(fileName))
{
if (getExtension(fileName) !="NO FILE")
cout <<"\tCheck the file extension." << endl;
else
cout <<"\tThere is no file in the provided path." << endl;
}
}
else if (!verifyExtension(fileName))
{
if (getExtension(fileName) !="NO FILE")
cout <<"Incorrect file extension provided:" << getExtension(fileName) << endl;
else
cout <<"There is no file in the following path:" << fileName << endl;
}
}

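You may also want to look at what the Qt library offers.

It has regular expression support, and the QString class has convenient methods such as split(), which returns a QStringList: the list of strings obtained by splitting the original string at the given delimiter. That should be sufficient for a CSV file.

To get a column with a given header name, I use the following: c++ inheritance Qt problem qstring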


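If you don't want to deal with including Boost in your project (it is considerably large if all you are going to use it for is CSV parsing...)

I have had luck with the CSV parsing here: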

http://www.zedwood.com/article/112/cpp-csv-parser

It handles quoted fields, but does not handle inline \n characters (which is probably fine for most uses).
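
To show what "handles quoted fields" means in practice, here is a minimal sketch of a per-line splitter in standard C++. It is not the code from the linked article. The function name splitQuotedCsvLine is hypothetical, and the sketch assumes that a field never spans more than one line and that a doubled quote inside a quoted field stands for one literal quote.

```cpp
#include <string>
#include <vector>

// Splits one CSV line into fields. Quoted fields may contain commas, and a
// doubled quote ("") inside a quoted field stands for one literal quote.
// Assumption: a field never spans more than one line.
std::vector<std::string> splitQuotedCsvLine(const std::string& line)
{
    std::vector<std::string> fields;
    std::string field;
    bool inQuotes = false;

    for (std::string::size_type i = 0; i < line.size(); ++i)
    {
        const char c = line[i];
        if (inQuotes)
        {
            if (c == '"')
            {
                if (i + 1 < line.size() && line[i + 1] == '"')
                {
                    field += '"';      // escaped quote: keep one, skip the other
                    ++i;
                }
                else
                {
                    inQuotes = false;  // closing quote
                }
            }
            else
            {
                field += c;            // commas are ordinary characters here
            }
        }
        else if (c == '"')
        {
            inQuotes = true;           // opening quote
        }
        else if (c == ',')
        {
            fields.push_back(field);   // field separator outside quotes
            field.clear();
        }
        else
        {
            field += c;
        }
    }
    fields.push_back(field);           // last field (possibly empty)
    return fields;
}
```

Called on the line a,"b,c","say ""hi""", this yields the three fields a, b,c and say "hi". A trailing comma produces an extra empty field at the end.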


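This is an old thread, but it is still at the top of the search results, so I am adding my solution, which uses std::stringstream and a simple string replace method by Yves Baumes that I found here.

The following example reads a file line by line, ignores comment lines starting with //, and parses the other lines into a combination of strings, ints and doubles. The stringstream does the parsing, but it expects fields to be separated by whitespace, so I first use StringReplace to turn the commas into spaces. It handles tabs, but does not deal with quoted strings.

Bad or missing input is simply ignored, which may or may not be good, depending on your circumstances.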


#include <string>
#include <sstream>
#include <fstream>

void StringReplace(std::string& str, const std::string& oldStr, const std::string& newStr)
// code by Yves Baumes
// http://stackoverflow.com/questions/1494399/how-do-i-search-find-and-replace-in-a-standard-string
{
size_t pos = 0;
while((pos = str.find(oldStr, pos)) != std::string::npos)
{
str.replace(pos, oldStr.length(), newStr);
pos += newStr.length();
}
}

void LoadCSV(std::string &filename) {
std::ifstream stream(filename);
std::string in_line;
std::string Field;
std::string Chan;
int ChanType;
double Scale;
int Import;
while (std::getline(stream, in_line)) {
StringReplace(in_line, ",", " "); // commas become spaces so that operator>> can split the fields
std::stringstream line(in_line);
line >> Field >> Chan >> ChanType >> Scale >> Import;
if (Field.substr(0,2)!="//") {
// do your stuff
// this is CBuilder code for demonstration, sorry
ShowMessage((String)Field.c_str() + "\n" + Chan.c_str() + "\n" + IntToStr(ChanType) + "\n" + FloatToStr(Scale) + "\n" + IntToStr(Import));
}
}
}
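
The answer notes that bad or missing input is silently ignored. If that is not what you want, the stream state after extraction tells you whether every field was read. The following is a minimal sketch of that check. ChannelRow and parseChannelRow are hypothetical names, and the five fields simply mirror the variables used in LoadCSV above.

```cpp
#include <algorithm>
#include <sstream>
#include <string>

// Hypothetical record type mirroring the five fields read in LoadCSV above.
struct ChannelRow
{
    std::string field;
    std::string chan;
    int chanType = 0;
    double scale = 0.0;
    int importFlag = 0;
};

// Parses one comma-separated line into `row`.
// Returns false (and leaves `row` untouched) when a field is missing or
// cannot be converted to the expected type.
bool parseChannelRow(std::string line, ChannelRow& row)
{
    std::replace(line.begin(), line.end(), ',', ' ');   // commas -> spaces
    std::istringstream in(line);
    ChannelRow parsed;
    if (!(in >> parsed.field >> parsed.chan >> parsed.chanType
             >> parsed.scale >> parsed.importFlag))
        return false;                                   // extraction failed somewhere
    row = parsed;
    return true;
}
```

The caller can then decide whether to skip the line, log it, or abort. Like the original, this sketch cannot represent fields that contain spaces.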

Here is a ready-to-use function for the case where all you need is to load a data file of doubles (no integers, no text).

#include <sstream>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>
#include <algorithm> // std::remove, std::replace

using namespace std;

/**
* Parse a CSV data file and fill the 2d STL vector"data".
* Limits: only "pure datas" of doubles, not encapsulated by " and without \n inside.
* Further no formatting in the data (e.g. scientific notation)
* It however handles both dots and commas as decimal separators and removes thousand separator.
*
* returnCodes[0]: file access 0-> ok 1-> not able to read; 2-> decimal separator equal to comma separator
* returnCodes[1]: number of records
* returnCodes[2]: number of fields. -1 If rows have different field size
*
*/
vector<int>
readCsvData (vector <vector <double>>& data, const string& filename, const string& delimiter, const string& decseparator){

int vv[3] = { 0,0,0 };
vector<int> returnCodes(&vv[0], &vv[0]+3);

string rowstring, stringtoken;
double doubletoken;
int rowcount=0;
int fieldcount=0;
data.clear();

ifstream iFile(filename, ios_base::in);
if (!iFile.is_open()){
returnCodes[0] = 1;
return returnCodes;
}
while (getline(iFile, rowstring)) {
if (rowstring=="") continue; // empty line
rowcount ++; //let's start with 1
if(delimiter == decseparator){
returnCodes[0] = 2;
return returnCodes;
}
if(decseparator !="."){
// remove dots (used as thousand separators)
string::iterator end_pos = remove(rowstring.begin(), rowstring.end(), '.');
rowstring.erase(end_pos, rowstring.end());
// replace decimal separator with dots.
replace(rowstring.begin(), rowstring.end(),decseparator.c_str()[0], '.');
} else {
// remove commas (used as thousand separators)
string::iterator end_pos = remove(rowstring.begin(), rowstring.end(), ',');
rowstring.erase(end_pos, rowstring.end());
}
// tokenize..
vector<double> tokens;
// Skip delimiters at beginning.
string::size_type lastPos = rowstring.find_first_not_of(delimiter, 0);
// Find first"non-delimiter".
string::size_type pos = rowstring.find_first_of(delimiter, lastPos);
while (string::npos != pos || string::npos != lastPos){
// Found a token, convert it to double add it to the vector.
stringtoken = rowstring.substr(lastPos, pos - lastPos);
if (stringtoken =="") {
tokens.push_back(0.0);
} else {
istringstream totalSString(stringtoken);
totalSString >> doubletoken;
tokens.push_back(doubletoken);
}
// Skip delimiters. Note the"not_of"
lastPos = rowstring.find_first_not_of(delimiter, pos);
// Find next"non-delimiter"
pos = rowstring.find_first_of(delimiter, lastPos);
}
if(rowcount == 1){
fieldcount = tokens.size();
returnCodes[2] = tokens.size();
} else {
if ( tokens.size() != fieldcount){
returnCodes[2] = -1;
}
}
data.push_back(tokens);
}
iFile.close();
returnCodes[1] = rowcount;
return returnCodes;
}
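
The decimal-separator handling in readCsvData can be looked at in isolation. Here is a minimal sketch of the same idea applied to a single token instead of a whole row. The name normalizeNumber is hypothetical, and the sketch assumes the character that is not the decimal separator (dot or comma) is used only as a thousands separator.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: rewrites a number such as "1.234,56" (decimalSep = ',')
// or "1,234.56" (decimalSep = '.') into the plain form "1234.56".
std::string normalizeNumber(std::string text, char decimalSep)
{
    // Whichever of '.' and ',' is not the decimal separator is treated as a
    // thousands separator and dropped.
    const char thousandsSep = (decimalSep == '.') ? ',' : '.';
    text.erase(std::remove(text.begin(), text.end(), thousandsSep), text.end());
    // The decimal separator itself is mapped to '.', which stream extraction
    // and std::stod expect in the default "C" locale.
    std::replace(text.begin(), text.end(), decimalSep, '.');
    return text;
}
```

Both normalizeNumber("1.234,56", ',') and normalizeNumber("1,234.56", '.') give "1234.56", which can then be converted with an istringstream as in the function above.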

Another quick and easy way is to use Boost.Fusion I/O:


#include <iostream>
#include <sstream>

#include <boost/fusion/adapted/boost_tuple.hpp>
#include <boost/fusion/sequence/io.hpp>

namespace fusion = boost::fusion;

struct CsvString
{
std::string value;

// Stop reading a string once a CSV delimeter is encountered.
friend std::istream& operator>>(std::istream& s, CsvString& v) {
v.value.clear();
for(;;) {
auto c = s.peek();
if(std::istream::traits_type::eof() == c || ',' == c || '\n' == c)
break;
v.value.push_back(c);
s.get();
}
return s;
}

friend std::ostream& operator<<(std::ostream& s, CsvString const& v) {
return s << v.value;
}
};

int main() {
std::stringstream input("abc,123,true,3.14\n"
"def,456,false,2.718\n");

typedef boost::tuple<CsvString, int, bool, double> CsvRow;

using fusion::operator<<;
std::cout << std::boolalpha;

using fusion::operator>>;
input >> std::boolalpha;
input >> fusion::tuple_open("") >> fusion::tuple_close("\n") >> fusion::tuple_delimiter(',');

for(CsvRow row; input >> row;)
std::cout << row << '\n';
}

Output:

(abc 123 true 3.14)
(def 456 false 2.718)

I wrote a nice way of parsing CSV files and thought I should add it as an answer:

#include <algorithm> // std::remove
#include <string>
#include <vector>
#include <fstream>
#include <iostream>
#include <stdlib.h>
#include <stdio.h>

struct CSVDict
{
std::vector< std::string > inputImages;
std::vector< double > inputLabels;
};

/**
\brief Splits the string

\param str String to split
\param delim Delimiter on the basis of which splitting is to be done

\return results Output in the form of vector of strings
*/
std::vector<std::string> stringSplit( const std::string &str, const std::string &delim )
{
std::vector<std::string> results;

for (size_t i = 0; i < str.length(); i++)
{
std::string tempString ="";
while ((i < str.length()) && (str[i] != *delim.c_str())) // check the bound before reading str[i]
{
tempString += str[i];
i++;
}
results.push_back(tempString);
}

return results;
}

/**
\brief Parse the supplied CSV File and obtain Row and Column information.

Assumptions:
1. Header information is in first row
2. Delimiters are only used to differentiate cell members

\param csvFileName The full path of the file to parse
\param inputColumns The string of input columns which contain the data to be used for further processing
\param inputLabels The string of input labels based on which further processing is to be done
\param delim The delimiters used in inputColumns and inputLabels

\return Vector of Vector of strings: Collection of rows and columns
*/
std::vector< CSVDict > parseCSVFile( const std::string &csvFileName, const std::string &inputColumns, const std::string &inputLabels, const std::string &delim )
{
std::vector< CSVDict > return_CSVDict;
std::vector< std::string > inputColumnsVec = stringSplit(inputColumns, delim), inputLabelsVec = stringSplit(inputLabels, delim);
std::vector< std::vector< std::string > > returnVector;
std::ifstream inFile(csvFileName.c_str());
int row = 0;
std::vector< size_t > inputColumnIndeces, inputLabelIndeces;
for (std::string line; std::getline(inFile, line, '\n');)
{
CSVDict tempDict;
std::vector< std::string > rowVec;
line.erase(std::remove(line.begin(), line.end(), '"'), line.end());
rowVec = stringSplit(line, delim);

// for the first row, record the indeces of the inputColumns and inputLabels
if (row == 0)
{
for (size_t i = 0; i < rowVec.size(); i++)
{
for (size_t j = 0; j < inputColumnsVec.size(); j++)
{
if (rowVec[i] == inputColumnsVec[j])
{
inputColumnIndeces.push_back(i);
}
}
for (size_t j = 0; j < inputLabelsVec.size(); j++)
{
if (rowVec[i] == inputLabelsVec[j])
{
inputLabelIndeces.push_back(i);
}
}
}
}
else
{
for (size_t i = 0; i < inputColumnIndeces.size(); i++)
{
tempDict.inputImages.push_back(rowVec[inputColumnIndeces[i]]);
}
for (size_t i = 0; i < inputLabelIndeces.size(); i++)
{
tempDict.inputLabels.push_back(std::atof(rowVec[inputLabelIndeces[i]].c_str()));
}
return_CSVDict.push_back(tempDict);
}
row++;
}

return return_CSVDict;
}
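
The core idea of parseCSVFile is to use the header row to find column positions by name. Here is a minimal sketch of just that step, using a hash map from header name to column index. The names splitSimple, indexHeader and readColumn are hypothetical, and the splitter has no quoting support.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Splits a line on a single-character delimiter (no quoting support).
std::vector<std::string> splitSimple(const std::string& line, char delim)
{
    std::vector<std::string> cells;
    std::stringstream in(line);
    std::string cell;
    while (std::getline(in, cell, delim))
        cells.push_back(cell);
    return cells;
}

// Maps each header name to its zero-based column index.
std::unordered_map<std::string, std::size_t> indexHeader(const std::string& headerLine, char delim)
{
    std::unordered_map<std::string, std::size_t> index;
    const std::vector<std::string> names = splitSimple(headerLine, delim);
    for (std::size_t i = 0; i < names.size(); ++i)
        index[names[i]] = i;
    return index;
}

// Returns every value of the column called `name`. The first line of `in`
// is taken as the header; rows too short to contain the column are skipped.
std::vector<std::string> readColumn(std::istream& in, const std::string& name, char delim = ',')
{
    std::vector<std::string> column;
    std::string line;
    if (!std::getline(in, line))
        return column;                        // empty input
    const auto index = indexHeader(line, delim);
    const auto it = index.find(name);
    if (it == index.end())
        return column;                        // unknown column name
    while (std::getline(in, line))
    {
        const std::vector<std::string> cells = splitSimple(line, delim);
        if (it->second < cells.size())
            column.push_back(cells[it->second]);
    }
    return column;
}
```

Because readColumn takes a std::istream, it works with a std::ifstream as well as with a std::istringstream for testing.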

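It is possible to use std::regex.

Depending on the size of your file and the memory available to you, you can read it either line by line or entirely into a std::string.

To read the file, one can use: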


std::ifstream t("file.txt");
std::string sin((std::istreambuf_iterator<char>(t)),
std::istreambuf_iterator<char>());

Then you can match with the following, which can be customized to your needs.


std::regex word_regex("[^,\\s]+"); // one token = a run of characters that are neither commas nor whitespace
auto what =
std::sregex_iterator(sin.begin(), sin.end(), word_regex);
auto wend = std::sregex_iterator();

std::vector<std::string> v;
for (; what != wend; ++what) {
std::smatch match = *what;
v.push_back(match.str());
}
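
The loop above collects the matches themselves, so empty cells (two commas in a row) disappear. An alternative is to match the separator and collect what lies between the matches. The following is a minimal sketch using std::sregex_token_iterator. The function name splitLineWithRegex is hypothetical, and quoted fields are not supported.

```cpp
#include <regex>
#include <string>
#include <vector>

// Splits one line on commas, keeping empty cells between adjacent commas.
// No quoting support.
std::vector<std::string> splitLineWithRegex(const std::string& line)
{
    static const std::regex separator(",");
    std::vector<std::string> cells;
    // The submatch index -1 selects the text between separator matches
    // instead of the matches themselves.
    std::sregex_token_iterator it(line.begin(), line.end(), separator, -1);
    const std::sregex_token_iterator end;
    for (; it != end; ++it)
        cells.push_back(it->str());
    return cells;
}
```

For the input a,b,,c this gives four cells, the third one being empty.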

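Since I am not used to Boost right now, I will suggest a simpler solution. Suppose your .csv file has 100 lines with 10 numbers in each line, separated by ','. You could load this data into an array with the following code: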


#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
using namespace std;

int main()
{
int A[100][10];
ifstream ifs;
ifs.open("name_of_file.csv");
string s1;
char c;
for(int k=0; k<100; k++)
{
getline(ifs,s1);
stringstream stream(s1);
int j=0;
while(1)
{
stream >>A[k][j];
stream >> c;
j++;
if(!stream) {break;}
}
}

}
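
The fixed-size array above only works when the file has exactly the assumed shape. Here is a minimal sketch of the same reading loop with std::vector, so that the number of rows and columns does not have to be known in advance. The function name readIntTable is hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads comma-separated integers from `in`, one row per line.
// Rows may have different lengths; empty lines are skipped, and parsing of a
// row stops at the first cell that is not an integer.
std::vector<std::vector<int>> readIntTable(std::istream& in)
{
    std::vector<std::vector<int>> table;
    std::string line;
    while (std::getline(in, line))
    {
        if (line.empty())
            continue;
        std::istringstream cells(line);
        std::vector<int> row;
        int value;
        char comma;
        while (cells >> value)
        {
            row.push_back(value);
            if (!(cells >> comma))   // no separator left on this line
                break;
        }
        table.push_back(row);
    }
    return table;
}
```

Pass it a std::ifstream opened on the CSV file. The result can be indexed as table[k][j], much like the array A above.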

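I needed an easy-to-use C++ library for parsing CSV files but couldn't find one, so I ended up building one. Rapidcsv is a C++11 header-only library that gives direct access to parsed columns (or rows) as vectors, in the data type of your choice. For example: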


#include <iostream>
#include <vector>
#include <rapidcsv.h>

int main()
{
rapidcsv::Document doc("../tests/msft.csv");

std::vector<float> close = doc.GetColumn<float>("Close");
std::cout << "Read " << close.size() << " values." << std::endl;
}


You can use this library: https://github.com/vadamsky/csvworker

Code example:


#include <iostream>
#include"csvworker.h"

using namespace std;

int main()
{
//
CsvWorker csv;
csv.loadFromFile("example.csv");
cout << csv.getRowsNumber() <<" " << csv.getColumnsNumber() << endl;

csv.getFieldRef(0, 2) ="0";
csv.getFieldRef(1, 1) ="0";
csv.getFieldRef(1, 3) ="0";
csv.getFieldRef(2, 0) ="0";
csv.getFieldRef(2, 4) ="0";
csv.getFieldRef(3, 1) ="0";
csv.getFieldRef(3, 3) ="0";
csv.getFieldRef(4, 2) ="0";

for(unsigned int i=0;i<csv.getRowsNumber();++i)
{
//cout << csv.getRow(i) << endl;
for(unsigned int j=0;j<csv.getColumnsNumber();++j)
{
cout << csv.getField(i, j) <<".";
}
cout << endl;
}

csv.saveToFile("test.csv");

//
CsvWorker csv2(4,4);

csv2.getFieldRef(0, 0) ="a";
csv2.getFieldRef(0, 1) ="b";
csv2.getFieldRef(0, 2) ="r";
csv2.getFieldRef(0, 3) ="a";
csv2.getFieldRef(1, 0) ="c";
csv2.getFieldRef(1, 1) ="a";
csv2.getFieldRef(1, 2) ="d";
csv2.getFieldRef(2, 0) ="a";
csv2.getFieldRef(2, 1) ="b";
csv2.getFieldRef(2, 2) ="r";
csv2.getFieldRef(2, 3) ="a";

csv2.saveToFile("test2.csv");

return 0;
}

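For what it is worth, here is my implementation. It deals with wstring input, but could easily be adjusted to string. It does not handle newlines inside fields (my application does not either, but adding support for it is not too difficult), and it does not comply with the "\r\n" end of line required by the RFC (assuming you use std::getline), but it does handle whitespace trimming and double quotes correctly (hopefully).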

#include <string>
#include <utility>
#include <vector>

using namespace std;

// trim whitespaces around field or double-quotes, remove double-quotes and replace escaped double-quotes (double double-quotes)
wstring trimquote(const wstring& str, const wstring& whitespace, const wchar_t quotChar)
{
wstring ws;
wstring::size_type strBegin = str.find_first_not_of(whitespace);
if (strBegin == wstring::npos)
return L"";

wstring::size_type strEnd = str.find_last_not_of(whitespace);
wstring::size_type strRange = strEnd - strBegin + 1;

if((str[strBegin] == quotChar) && (str[strEnd] == quotChar))
{
ws = str.substr(strBegin+1, strRange-2);
strBegin = 0;
while((strEnd = ws.find(quotChar, strBegin)) != wstring::npos)
{
ws.erase(strEnd, 1);
strBegin = strEnd+1;
}

}
else
ws = str.substr(strBegin, strRange);
return ws;
}

// Positions are stored as wstring::size_type: with plain unsigned, the
// comparisons against wstring::npos fail where size_t is wider than unsigned.
pair<wstring::size_type, wstring::size_type> nextCSVQuotePair(const wstring& line, const wchar_t quotChar, wstring::size_type ofs = 0)
{
    pair<wstring::size_type, wstring::size_type> r;
    r.first = line.find(quotChar, ofs);
    r.second = wstring::npos;
    if(r.first != wstring::npos)
    {
        r.second = r.first;
        while(((r.second = line.find(quotChar, r.second+1)) != wstring::npos)
            && (line[r.second+1] == quotChar)) // WARNING: assumes null-terminated string such that line[r.second+1] always exist
            r.second++;
    }
    return r;
}

unsigned parseLine(vector<wstring>& fields, const wstring& line)
{
    wstring::size_type ofs, ofs0, np;
    const wchar_t delim = L',';
    const wstring whitespace = L" \t\xa0\x3000\x2000\x2001\x2002\x2003\x2004\x2005\x2006\x2007\x2008\x2009\x200a\x202f\x205f";
    const wchar_t quotChar = L'"';
    pair<wstring::size_type, wstring::size_type> quot;

    fields.clear();

    ofs = ofs0 = 0;
    quot = nextCSVQuotePair(line, quotChar);
    while((np = line.find(delim, ofs)) != wstring::npos)
    {
        if((np > quot.first) && (np < quot.second))
        { // skip delimiter inside quoted field
            if(quot.second == wstring::npos)
                break; // unterminated quote: keep the rest of the line as one field
            ofs = quot.second+1;
            quot = nextCSVQuotePair(line, quotChar, ofs);
            continue;
        }
        fields.push_back( trimquote(line.substr(ofs0, np-ofs0), whitespace, quotChar) );
        ofs = ofs0 = np+1;
        // The remembered quote pair belongs to a field already emitted: look for the next one.
        if((quot.first != wstring::npos) && (quot.first < ofs))
            quot = nextCSVQuotePair(line, quotChar, ofs);
    }
    fields.push_back( trimquote(line.substr(ofs0), whitespace, quotChar) );

    return static_cast<unsigned>(fields.size());
}

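I have a quicker solution, originally written for this question:

How to pull specific part of different strings?

But it was apparently closed. I am not going to throw this away, though: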


#include <iostream>
#include <string>
#include <regex>

std::string text = "\"4,\"\"3\"\",\"\"Mon May 11 03:17:40 UTC 2009\"\",\"\"kindle2\"\",\"\"tpryan\"\",\"\"TEXT HERE\"\"\";;;;";

int main()
{
std::regex r("(\".*\")(\".*\")(\".*\")(\".*\")(\".*\")(\".*\")(\".*\")(\".*\")(\".*\")(\".*\")");
std::smatch m;
std::regex_search(text, m, r);
std::cout<<"FOUND:"<<m[9]<<std::endl;

return 0;
}

Just pick the match you want out of the smatch collection by index. Regex is bliss.


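If you are using Visual Studio / MFC, the following solution may make your life easier. It supports both Unicode and MBCS, has comments, has no dependencies other than CString, and works well enough for me. It does not support line breaks embedded within a quoted string, but I do not care as long as it does not crash in that case, which it does not.

The overall strategy is to handle quoted and empty strings as special cases and to use Tokenize for the rest. For quoted strings, the strategy is to find the real closing quote, keeping track of whether pairs of consecutive quotes were encountered. If they were, use Replace to convert the pairs to singles. No doubt there are more efficient methods, but performance was not critical enough in my case to justify further optimization.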


class CParseCSV {
public:
// Construction
CParseCSV(const CString& sLine);

// Attributes
bool GetString(CString& sDest);

protected:
CString m_sLine; // line to extract tokens from
int m_nLen; // line length in characters
int m_iPos; // index of current position
};

CParseCSV::CParseCSV(const CString& sLine) : m_sLine(sLine)
{
m_nLen = m_sLine.GetLength();
m_iPos = 0;
}

bool CParseCSV::GetString(CString& sDest)
{
if (m_iPos < 0 || m_iPos > m_nLen) // if position out of range
return false;
if (m_iPos == m_nLen) { // if at end of string
sDest.Empty(); // return empty token
m_iPos = -1; // really done now
return true;
}
if (m_sLine[m_iPos] == '"') { // if current char is double quote
m_iPos++; // advance to next char
int iTokenStart = m_iPos;
bool bHasEmbeddedQuotes = false;
while (m_iPos < m_nLen) { // while more chars to parse
if (m_sLine[m_iPos] == '"') { // if current char is double quote
// if next char exists and is also double quote
if (m_iPos < m_nLen - 1 && m_sLine[m_iPos + 1] == '"') {
// found pair of consecutive double quotes
bHasEmbeddedQuotes = true; // request conversion
m_iPos++; // skip first quote in pair
} else // next char doesn't exist or is normal
break; // found closing quote; exit loop
}
m_iPos++; // advance to next char
}
sDest = m_sLine.Mid(iTokenStart, m_iPos - iTokenStart);
if (bHasEmbeddedQuotes) // if string contains embedded quote pairs
sDest.Replace(_T("\"\""), _T("\"")); // convert pairs to singles
m_iPos += 2; // skip closing quote and trailing delimiter if any
} else if (m_sLine[m_iPos] == ',') { // else if char is comma
sDest.Empty(); // return empty token
m_iPos++; // advance to next char
} else { // else get next comma-delimited token
sDest = m_sLine.Tokenize(_T(","), m_iPos);
}
return true;
}

// calling code should look something like this:

CStdioFile fIn(pszPath, CFile::modeRead);
CString sLine, sToken;
while (fIn.ReadString(sLine)) { // for each line of input file
if (!sLine.IsEmpty()) { // ignore blank lines
CParseCSV csv(sLine);
while (csv.GetString(sToken)) {
// do something with sToken here
}
}
}