How to copy from CSV file to PostgreSQL table with headers in CSV file?

I want to copy a CSV file into a Postgres table. There are about 100 columns in this table, so I don't want to rewrite them if I don't have to.

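I am using the \copy table from 'table.csv' delimiter ',' csv; command, but without the table already created I get ERROR: relation "table" does not exist. If I add a blank table I get no error, but nothing happens. I tried this command two or three times and there was no output or message, but the table was not updated when I checked it through PGAdmin.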

Is there a way to import a table including its headers, like I am trying to do?

This worked. The first line of the file had the column names in it.

COPY wheat FROM 'wheat_crop_data.csv' DELIMITER ';' CSV HEADER;
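Note that HEADER only tells COPY to skip the first line on import; COPY never creates the table, which is why the question's \copy fails with "relation does not exist". With about 100 columns, one way to avoid typing the CREATE TABLE by hand is to generate it from the header line. The sketch below is an assumption-laden helper, not part of PostgreSQL: the names create_table_sql, copy_command and quote_ident are hypothetical, every column is declared as text (you can ALTER the types afterwards), and identifiers are always double-quoted, which makes them case-sensitive.

```python
import csv
import io


def quote_ident(name: str) -> str:
    """Double-quote an identifier unconditionally, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_table_sql(table: str, csv_file, delimiter: str = ",") -> str:
    """Build CREATE TABLE with one text column per header field of csv_file."""
    # csv.reader copes with quoted header fields such as "crop, type"
    header = next(csv.reader(csv_file, delimiter=delimiter))
    cols = ",\n    ".join(f"{quote_ident(h.strip())} text" for h in header)
    return f"CREATE TABLE {quote_ident(table)} (\n    {cols}\n);"


def copy_command(table: str, path: str, delimiter: str = ",") -> str:
    """Build a psql \\copy line; HEADER true skips the header row on import."""
    path_lit = path.replace("'", "''")
    delim_lit = delimiter.replace("'", "''")
    return (f"\\copy {quote_ident(table)} FROM '{path_lit}' "
            f"WITH (FORMAT csv, HEADER true, DELIMITER '{delim_lit}')")
```

For example, open wheat_crop_data.csv with newline="", pass the file object to create_table_sql("wheat", f, delimiter=";"), run the printed statement in psql, then run the output of copy_command("wheat", "wheat_crop_data.csv", delimiter=";"). Using \copy rather than COPY keeps the file on the client side.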

With the Python library pandas, you can easily create the column names from the csv file and have the data types inferred.

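from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('postgresql://user:pass@localhost/db_name')
# read_csv takes the column names from the header row and infers a dtype per column
df = pd.read_csv('/path/to/csv_file')
# creates table pandas_db; the DataFrame index is written as a column too unless index=False
df.to_sql('pandas_db', engine)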

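You can set the if_exists parameter to 'replace' or 'append' to replace or append to an existing table (the default, 'fail', raises an error if the table exists), e.g. df.to_sql('pandas_db', engine, if_exists='replace'). This also works for other input file types; see the pandas documentation.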
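To show roughly what "infer data types" means without pandas, here is a minimal standard-library sketch. It is not pandas' algorithm: it assumes a deliberately small set of target types (bigint, then double precision, then text), treats empty fields as NULL, and the names infer_pg_types, _fits_bigint and _is_float are hypothetical.

```python
import csv
import io
import re

_INT = re.compile(r"[+-]?[0-9]+")  # ASCII digits only
_BIGINT_MIN, _BIGINT_MAX = -(2**63), 2**63 - 1


def _fits_bigint(s: str) -> bool:
    return bool(_INT.fullmatch(s)) and _BIGINT_MIN <= int(s) <= _BIGINT_MAX


def _is_float(s: str) -> bool:
    if "_" in s:  # Python's float() accepts "1_000"; don't assume PostgreSQL does
        return False
    try:
        float(s)
    except ValueError:
        return False
    return True


def infer_pg_types(csv_file, delimiter: str = ",") -> dict:
    """Guess bigint, double precision or text for each CSV column."""
    reader = csv.reader(csv_file, delimiter=delimiter)
    header = next(reader)
    types = ["bigint"] * len(header)
    seen = [False] * len(header)  # did the column have any non-empty value?
    for row in reader:
        for i, raw in enumerate(row[: len(header)]):
            if raw == "":
                continue  # an empty unquoted CSV field loads as NULL, so it fits any type
            seen[i] = True
            value = raw.strip()  # PostgreSQL's numeric input ignores surrounding spaces
            if types[i] == "bigint" and not _fits_bigint(value):
                types[i] = "double precision" if _is_float(value) else "text"
            elif types[i] == "double precision" and not _is_float(value):
                types[i] = "text"
    # columns that were empty everywhere fall back to text
    return {h: (t if s else "text") for h, t, s in zip(header, types, seen)}
```

Types only ever widen (bigint to double precision to text), so one pass over the file is enough; the resulting mapping could replace the all-text columns of a generated CREATE TABLE.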

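An alternative from the terminal, without needing special permissions

The NOTES section of the PostgreSQL COPY documentation says: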

The path will be interpreted relative to the working directory of the server process (normally the cluster's data directory), not the client's working directory.

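So, taken literally, using psql or any other client, even against a local server, you run into problems... And if you are writing a COPY command for other users, e.g. in a GitHub README, your readers will run into problems...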

The only way to express a relative path with client permissions is to use STDIN:

When STDIN or STDOUT is specified, data is transmitted via the connection between the client and the server.

As noted here:

psql -h remotehost -d remote_mydb -U myuser -c \
"copy mytable (column1, column2) from STDIN with delimiter as ',' csv header" \
< ./relative_path/file.csv
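With 100 columns, writing out (column1, column2, ...) is exactly the chore the question wants to avoid. A minimal sketch, assuming the CSV header names match the table's column names exactly (they are double-quoted below, so case matters): read the header on the client, build the COPY ... FROM STDIN statement, and feed the file to psql's standard input just like the shell redirect above. The helper names are hypothetical and psql must be on PATH for load_with_psql.

```python
import csv
import subprocess


def quote_ident(name: str) -> str:
    """Double-quote an identifier, doubling embedded quotes (case is preserved)."""
    return '"' + name.replace('"', '""') + '"'


def copy_from_stdin_sql(table: str, header: list, delimiter: str = ",") -> str:
    """COPY statement that reads CSV over the client connection and skips the header."""
    cols = ", ".join(quote_ident(c) for c in header)
    delim_lit = delimiter.replace("'", "''")
    return (f"COPY {quote_ident(table)} ({cols}) FROM STDIN "
            f"WITH (FORMAT csv, HEADER true, DELIMITER '{delim_lit}')")


def load_with_psql(csv_path: str, table: str, host: str, db: str, user: str,
                   delimiter: str = ",") -> None:
    """Like `psql -c "COPY ... FROM STDIN" < csv_path`, with columns taken from the header."""
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f, delimiter=delimiter))
    sql = copy_from_stdin_sql(table, header, delimiter)
    with open(csv_path, "rb") as f:
        # psql passes its standard input to COPY ... FROM STDIN, so the path stays client-side
        subprocess.run(["psql", "-h", host, "-d", db, "-U", user, "-c", sql],
                       stdin=f, check=True)
```

Because the path is opened by the client process, a relative path such as ./relative_path/file.csv resolves against the caller's working directory, not the server's data directory.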

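I have been using this function for a while without problems. You only need to provide the number of columns in the csv file; it takes the header names from the first row and creates the table for you: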

CREATE OR REPLACE FUNCTION data.load_csv_file
(
target_table text, -- name of the table that will be created
csv_file_path text,
col_count INTEGER
)

RETURNS void

AS $$

DECLARE
iter INTEGER; -- dummy integer to iterate columns with
col text; -- to keep column names in each iteration
col_first text; -- first column name, e.g., top left corner on a csv file or spreadsheet

BEGIN
SET schema 'data';

CREATE TABLE temp_table ();

-- add just enough number of columns
FOR iter IN 1..col_count
loop
EXECUTE format ('alter table temp_table add column col_%s text;', iter);
END loop;

-- copy the data from csv file
EXECUTE format ('copy temp_table from %L with delimiter '','' quote ''"'' csv ', csv_file_path);

iter := 1;
col_first := (SELECT col_1
FROM temp_table
LIMIT 1);

-- update the column names based on the first row which has the column names
FOR col IN EXECUTE format ('select unnest(string_to_array(trim(temp_table::text, ''()''), '','')) from temp_table where col_1 = %L', col_first)
loop
EXECUTE format ('alter table temp_table rename column col_%s to %s', iter, col);
iter := iter + 1;
END loop;

-- delete the columns row // using quote_ident or %I does not work here!?
EXECUTE format ('delete from temp_table where %s = %L', col_first, col_first);

-- change the temp table name to the name given as parameter, if not blank
IF LENGTH (target_table) > 0 THEN
EXECUTE format ('alter table temp_table rename to %I', target_table);
END IF;
END;

$$ LANGUAGE plpgsql;
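The function's own comment says quote_ident or %I "does not work" in the DELETE. One plausible explanation (an interpretation, not something the answer states) is case folding: the columns are renamed with unquoted %s, so a header such as Name is stored as name, while %I of the original value produces "Name", which names a different column. The sketch below approximates PostgreSQL's quote-only-when-needed rule and its ASCII case folding to make that visible; RESERVED_SUBSET is a hypothetical, deliberately incomplete keyword list, and quote_ident and stored_name are illustrative names, not PostgreSQL's implementation.

```python
import re

# Hypothetical, deliberately incomplete subset of PostgreSQL reserved keywords
RESERVED_SUBSET = {"all", "from", "group", "order", "select", "table", "user", "where"}

_SAFE = re.compile(r"[a-z_][a-z0-9_]*")


def quote_ident(name: str) -> str:
    """Approximate quote_ident()/format('%I'): quote only when the name needs it."""
    if _SAFE.fullmatch(name) and name not in RESERVED_SUBSET:
        return name
    return '"' + name.replace('"', '""') + '"'


def stored_name(token: str) -> str:
    """Name PostgreSQL stores for an identifier token (ASCII case folding only)."""
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        return token[1:-1].replace('""', '"')  # quoted: kept verbatim
    return "".join(c.lower() if "A" <= c <= "Z" else c for c in token)  # unquoted: folded
```

Under this model, stored_name("Name") is name (what the %s rename creates) while stored_name(quote_ident("Name")) is Name (what %I would look up), so using %I consistently in both the rename and the DELETE would avoid the mismatch for mixed-case headers.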

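You can use d6tstack, which creates the table for you and is faster than pd.to_sql() because it uses the native database import commands. It supports Postgres as well as MySQL and MS SQL.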

import pandas as pd
import d6tstack.utils
df = pd.read_csv('table.csv')
uri_psql = 'postgresql+psycopg2://usr:pwd@localhost/db'
d6tstack.utils.pd_to_psql(df, uri_psql, 'table')
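The speed-up comes from loading through the database's bulk path (COPY for Postgres) instead of row-by-row INSERTs. As a rough illustration of that general technique, and not d6tstack's actual code, rows can be serialized into an in-memory CSV file that COPY ... FROM STDIN then reads. The function name rows_to_csv_buffer is hypothetical.

```python
import csv
import io


def rows_to_csv_buffer(header, rows):
    """Serialize rows into an in-memory CSV file suitable for COPY ... FROM STDIN."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        # csv.writer emits None as an empty unquoted field, which COPY's CSV
        # format reads as NULL (in multi-column rows, "" is written the same way)
        writer.writerow(row)
    buf.seek(0)  # rewind so the consumer reads from the start
    return buf
```

With psycopg2, for example, the buffer could be passed to cursor.copy_expert('COPY "table" FROM STDIN WITH (FORMAT csv, HEADER true)', buf), followed by a commit.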

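It can also import multiple CSVs, handle data schema changes, and/or preprocess with pandas (e.g. for dates) before writing to the db, where apply_fun below is your own preprocessing function; see the examples notebook for more:

import glob, d6tstack.combine_csv
d6tstack.combine_csv.CombinerCSV(glob.glob('*.csv'), apply_after_read=apply_fun).to_psql_combine(uri_psql, 'table')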
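"Schema changes" here means files whose headers differ, for example a column added in later exports. A minimal standard-library sketch of the underlying idea, not CombinerCSV's implementation: take the union of all header names in first-seen order and fill columns a file lacks with empty fields (NULL when loaded with COPY's CSV format). The name combine_csvs is hypothetical.

```python
import csv
import io


def combine_csvs(files, delimiter=","):
    """Merge CSV files with differing headers into one header plus aligned rows."""
    columns = []  # union of all header names, in first-seen order
    records = []
    for f in files:
        reader = csv.DictReader(f, delimiter=delimiter)
        for name in reader.fieldnames or []:
            if name not in columns:
                columns.append(name)
        records.extend(reader)
    # missing columns (absent key, or None for short rows) become empty fields
    rows = [[rec.get(c) or "" for c in columns] for rec in records]
    return columns, rows
```

The combined header and rows could then be written out as a single CSV and loaded with one COPY, after creating the table from the combined header.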