The talk shows an example that contains 7 tables: one person-table, 3 tables for details for a person (street, im and phone), which contains contact-data for the person, and 3 additional tables to map each person to multiple streets, im and phone.
Assumed, there are 2 entries for street, im and phone for a person, this results in 1 insert into the person-table, respectively 2 inserts into street, im and phone. In addition it needs 2 inserts for every joining-table.
With the new feature 'writeable CTE', the effort is reduced to single, (but complex) SQL-command, which contains - among others - all inserts. Of course, this procedure is complex.
In the following posting i want to analyse, if this single, complex command is faster than many simple traditional commands.
For this purpose i wrote 2 plpgsql-functions, which generate proper SQL-commands, one for the old style with many inserts
and one for the new syntax, both functions with exactly the same amount of data for the inserts.
Here are the two functions:
create or replace function fill_tables(IN c int, OUT sql text) returns text as $$ declare c_im int; c_phone int; c_street int; count int; begin c_im:= 1 + strpos(md5(c::text),'a'); c_phone := 1 + strpos(md5(c::text),'b'); c_street := 1 + strpos(md5(c::text),'c'); sql := E'INSERT INTO person (first_name, last_name, birthdate) VALUES (\'' || md5(c::text) || E'\', \'' || md5((c+1)::text) || E'\', \'' || ('1970-01-01'::date + ('0' || regexp_replace(substring(md5((c)::text),0,8),'[^0-9]','','g'))::int * '1 minute'::interval)::date || E'\');\n'; for count in 0 .. c_im loop sql := sql || E'INSERT INTO im (provider, handle) VALUES (\'' || md5((c+count)::text) || E'\',\'' || md5((c+count)::text) || E'\');\n'; sql := sql || E'INSERT INTO person_im(person_id, im_id) VALUES (' || E'currval(\'person_id_seq\'),currval(\'im_id_seq\'));\n'; end loop; for count in 0 .. c_phone loop sql := sql || E'INSERT INTO phone (country_code, phone_number, extension) VALUES (\'' || md5((c+count)::text) || E'\',\'' || md5((c+count)::text) || E'\',\'' || md5((c+count)::text) || E'\');\n'; sql := sql || E'INSERT INTO person_phone(person_id, phone_id) VALUES (' || E'currval(\'person_id_seq\'),currval(\'phone_id_seq\'));\n'; end loop; for count in 0 .. c_street loop sql = sql || E'INSERT INTO street (street1,city,state,country,post_code) VALUES (\'' || md5((c+count)::text) || E'\',\'' || md5((c+count)::text) || E'\',\'' || md5((c+count)::text) || E'\',\'' || md5((c+count)::text) || E'\',\'' || md5((c+count)::text) || E'\');\n'; sql := sql || E'INSERT INTO person_street(person_id, street_id) VALUES (' || E'currval(\'person_id_seq\'),currval(\'street_id_seq\'));\n'; end loop; end; $$ language plpgsql volatile;
And:
create or replace function fill_tables_new(IN c int, OUT sql text) returns text as $$ declare c_im int; c_phone int; c_street int; count int; begin c_im:= 1 + strpos(md5(c::text),'a'); c_phone := 1 + strpos(md5(c::text),'b'); c_street := 1 + strpos(md5(c::text),'c'); sql := E'WITH t_person AS (\n ' || E'INSERT INTO person (first_name, last_name, birthdate) VALUES (\n \'' || md5(c::text) || E'\', \'' || md5((c+1)::text) || E'\', \'' || ('1970-01-01'::date + ('0' || regexp_replace(substring(md5((c)::text),0,10),'[^0-9]','','g'))::int * '1 minute'::interval)::date || E'\') RETURNING id),\n sql := sql || E't_im AS (INSERT INTO im (provider, handle) VALUES '; for count in 0 .. c_im loop sql := sql || E'\n (\'' || md5((c+count)::text) || E'\',\'' || md5((c+count)::text) || E'\'),'; end loop; sql := trim(sql,','); sql := sql || E' RETURNING id),\n'; sql := sql || E't_person_im AS (INSERT INTO person_im SELECT * FROM t_person CROSS JOIN t_im),\n'; sql := sql || E't_phone AS (INSERT INTO phone (country_code, phone_number, extension) VALUES '; for count in 0 .. c_phone loop sql := sql || E'\n (\'' || md5((c+count)::text) || E'\',\'' || md5((c+count)::text) || E'\',\'' || md5((c+count)::text) || E'\'),'; end loop; sql := trim(sql,','); sql := sql || E' RETURNING id),\n'; sql := sql || E't_person_phone AS (INSERT INTO person_phone SELECT * FROM t_person CROSS JOIN t_phone),\n'; sql := sql || E't_street AS (INSERT INTO street (street1, city, state, country, post_code) VALUES '; for count in 0 .. c_street loop sql := sql || E'\n (\'' || md5((c+count)::text) || E'\',\'' || md5((c+count)::text) || E'\',\'' || md5((c+count)::text) || E'\',\'' || md5((c+count)::text) || E'\',\'' || md5((c+count)::text) || E'\'),'; end loop; sql := trim(sql,','); sql := sql || E' RETURNING id),\n'; sql := sql || E't_person_street AS (INSERT INTO person_street SELECT * FROM t_person CROSS JOIN t_street)\n'; sql := sql || 'VALUES (true);'; end; $$ language plpgsql volatile;
And here the output of this functions (for example, for one person):
INSERT INTO person (first_name, last_name, birthdate) VALUES ('c4ca4238a0b923820dcc509a6f75849b', 'c81e728d9d4c2f636f067f89cc14862c', '1970-01-04'); INSERT INTO im (provider, handle) VALUES ('c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b'); INSERT INTO person_im(person_id, im_id) VALUES (currval('person_id_seq'),currval('im_id_seq')); INSERT INTO im (provider, handle) VALUES ('c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c'); INSERT INTO person_im(person_id, im_id) VALUES (currval('person_id_seq'),currval('im_id_seq')); INSERT INTO im (provider, handle) VALUES ('eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3'); INSERT INTO person_im(person_id, im_id) VALUES (currval('person_id_seq'),currval('im_id_seq')); INSERT INTO im (provider, handle) VALUES ('a87ff679a2f3e71d9181a67b7542122c','a87ff679a2f3e71d9181a67b7542122c'); INSERT INTO person_im(person_id, im_id) VALUES (currval('person_id_seq'),currval('im_id_seq')); INSERT INTO im (provider, handle) VALUES ('e4da3b7fbbce2345d7772b0674a318d5','e4da3b7fbbce2345d7772b0674a318d5'); INSERT INTO person_im(person_id, im_id) VALUES (currval('person_id_seq'),currval('im_id_seq')); INSERT INTO im (provider, handle) VALUES ('1679091c5a880faf6fb5e6087eb1b2dc','1679091c5a880faf6fb5e6087eb1b2dc'); INSERT INTO person_im(person_id, im_id) VALUES (currval('person_id_seq'),currval('im_id_seq')); INSERT INTO phone (country_code, phone_number, extension) VALUES ('c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b'); INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq')); INSERT INTO phone (country_code, phone_number, extension) VALUES ('c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c'); INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq')); INSERT INTO phone (country_code, phone_number, extension) VALUES ('eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3'); INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq')); INSERT INTO phone (country_code, phone_number, extension) VALUES ('a87ff679a2f3e71d9181a67b7542122c','a87ff679a2f3e71d9181a67b7542122c','a87ff679a2f3e71d9181a67b7542122c'); INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq')); INSERT INTO phone (country_code, phone_number, extension) VALUES ('e4da3b7fbbce2345d7772b0674a318d5','e4da3b7fbbce2345d7772b0674a318d5','e4da3b7fbbce2345d7772b0674a318d5'); INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq')); INSERT INTO phone (country_code, phone_number, extension) VALUES ('1679091c5a880faf6fb5e6087eb1b2dc','1679091c5a880faf6fb5e6087eb1b2dc','1679091c5a880faf6fb5e6087eb1b2dc'); INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq')); INSERT INTO phone (country_code, phone_number, extension) VALUES ('8f14e45fceea167a5a36dedd4bea2543','8f14e45fceea167a5a36dedd4bea2543','8f14e45fceea167a5a36dedd4bea2543'); INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq')); INSERT INTO phone (country_code, phone_number, extension) VALUES ('c9f0f895fb98ab9159f51fd0297e236d','c9f0f895fb98ab9159f51fd0297e236d','c9f0f895fb98ab9159f51fd0297e236d'); INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq')); INSERT INTO phone (country_code, phone_number, extension) VALUES ('45c48cce2e2d7fbdea1afc51c7c6ad26','45c48cce2e2d7fbdea1afc51c7c6ad26','45c48cce2e2d7fbdea1afc51c7c6ad26'); INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq')); INSERT INTO phone (country_code, phone_number, extension) VALUES ('d3d9446802a44259755d38e6d163e820','d3d9446802a44259755d38e6d163e820','d3d9446802a44259755d38e6d163e820'); INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq')); INSERT INTO phone (country_code, phone_number, extension) VALUES ('6512bd43d9caa6e02c990b0a82652dca','6512bd43d9caa6e02c990b0a82652dca','6512bd43d9caa6e02c990b0a82652dca'); INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq')); INSERT INTO phone (country_code, phone_number, extension) VALUES ('c20ad4d76fe97759aa27a0c99bff6710','c20ad4d76fe97759aa27a0c99bff6710','c20ad4d76fe97759aa27a0c99bff6710'); INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq')); INSERT INTO phone (country_code, phone_number, extension) VALUES ('c51ce410c124a10e0db5e4b97fc2af39','c51ce410c124a10e0db5e4b97fc2af39','c51ce410c124a10e0db5e4b97fc2af39'); INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq')); INSERT INTO street (street1,city,state,country,post_code) VALUES ('c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b'); INSERT INTO person_street(person_id, street_id) VALUES (currval('person_id_seq'),currval('street_id_seq')); INSERT INTO street (street1,city,state,country,post_code) VALUES ('c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c'); INSERT INTO person_street(person_id, street_id) VALUES (currval('person_id_seq'),currval('street_id_seq')); INSERT INTO street (street1,city,state,country,post_code) VALUES ('eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3'); INSERT INTO person_street(person_id, street_id) VALUES (currval('person_id_seq'),currval('street_id_seq'));
And, with writeable CTE:
WITH t_person AS ( INSERT INTO person (first_name, last_name, birthdate) VALUES ( 'c4ca4238a0b923820dcc509a6f75849b', 'c81e728d9d4c2f636f067f89cc14862c', '1970-01-31') RETURNING id), t_im AS (INSERT INTO im (provider, handle) VALUES ('c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b'), ('c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c'), ('eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3'), ('a87ff679a2f3e71d9181a67b7542122c','a87ff679a2f3e71d9181a67b7542122c'), ('e4da3b7fbbce2345d7772b0674a318d5','e4da3b7fbbce2345d7772b0674a318d5'), ('1679091c5a880faf6fb5e6087eb1b2dc','1679091c5a880faf6fb5e6087eb1b2dc') RETURNING id), t_person_im AS (INSERT INTO person_im SELECT * FROM t_person CROSS JOIN t_im), t_phone AS (INSERT INTO phone (country_code, phone_number, extension) VALUES ('c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b'), ('c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c'), ('eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3'), ('a87ff679a2f3e71d9181a67b7542122c','a87ff679a2f3e71d9181a67b7542122c','a87ff679a2f3e71d9181a67b7542122c'), ('e4da3b7fbbce2345d7772b0674a318d5','e4da3b7fbbce2345d7772b0674a318d5','e4da3b7fbbce2345d7772b0674a318d5'), ('1679091c5a880faf6fb5e6087eb1b2dc','1679091c5a880faf6fb5e6087eb1b2dc','1679091c5a880faf6fb5e6087eb1b2dc'), ('8f14e45fceea167a5a36dedd4bea2543','8f14e45fceea167a5a36dedd4bea2543','8f14e45fceea167a5a36dedd4bea2543'), ('c9f0f895fb98ab9159f51fd0297e236d','c9f0f895fb98ab9159f51fd0297e236d','c9f0f895fb98ab9159f51fd0297e236d'), ('45c48cce2e2d7fbdea1afc51c7c6ad26','45c48cce2e2d7fbdea1afc51c7c6ad26','45c48cce2e2d7fbdea1afc51c7c6ad26'), ('d3d9446802a44259755d38e6d163e820','d3d9446802a44259755d38e6d163e820','d3d9446802a44259755d38e6d163e820'), ('6512bd43d9caa6e02c990b0a82652dca','6512bd43d9caa6e02c990b0a82652dca','6512bd43d9caa6e02c990b0a82652dca'), ('c20ad4d76fe97759aa27a0c99bff6710','c20ad4d76fe97759aa27a0c99bff6710','c20ad4d76fe97759aa27a0c99bff6710'), ('c51ce410c124a10e0db5e4b97fc2af39','c51ce410c124a10e0db5e4b97fc2af39','c51ce410c124a10e0db5e4b97fc2af39') RETURNING id), t_person_phone AS (INSERT INTO person_phone SELECT * FROM t_person CROSS JOIN t_phone), t_street AS (INSERT INTO street (street1, city, state, country, post_code) VALUES ('c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b'), ('c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c'), ('eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3') RETURNING id), t_person_street AS (INSERT INTO person_street SELECT * FROM t_person CROSS JOIN t_street) VALUES (true);
First i created 2 sql-files with the output of my functions, each contains 10.000 rows for persons, in total for all tables 727104 rows.kretschmer@tux:~/cte$ psql -X -t test -c "select fill_tables(s) from generate_series(1,10000) s;" > data_old.sql kretschmer@tux:~/cte$ psql -X -t test -c "select fill_tables_new(s) from generate_series(1,10000) s;" > data_new.sql kretschmer@tux:~/cte$ ls -l data* -rw-r--r-- 1 kretschmer kretschmer 51113563 Nov 13 18:43 data_new.sql -rw-r--r-- 1 kretschmer kretschmer 102391561 Nov 13 18:43 data_old.sqlThe file with the new syntax is just half the size. Okay, lets rock'n'roll, my 8.5devel is listening on port 5434, i'm using the default configuration on a quadcore with 4 GByte RAM, the operating system is Ubuntu. I'm executing the files with the following command(s):time /usr/local/pgsql/bin/psql -X -p 5434 -h localhost test -f data_old.sql real 9m2.466s user 0m10.789s sys 0m11.357s time /usr/local/pgsql/bin/psql -X -p 5434 -h localhost test -f data_new.sql real 0m34.294s user 0m1.620s sys 0m0.528sVery impressive, isn't it? (i know, i could optimize both versions with prepared statements etc.) I hope, the upcoming 8.5 will contain this feature. This feature has a small disadvantage: it's not portable, since no other RDBMS knows about writable CTE! Many thanks to all the people who developed and are developing PostgreSQL!
4 Kommentare:
Hello Andreas!
I see on problem in your short perfomance test.
You are running a completly different amount of transactions in your both testcases, and there is evidence that more transactions causing less perfomance (we both know that, do we? :))
I have currently a little lack of time (test time at my university) so i would be very interested in a testcase where both cases are executed in a single transaction.
I think using the -1 switch for psql is the approriate method for that.
Best regards and looking forward to meet you at FOSDEM!
Björn
Okay, right, with begin and commit around the sql-files:
old-style:
real 1m41.368s
user 0m9.965s
sys 0m12.037s
new-style:
real 0m30.037s
user 0m1.600s
sys 0m0.404s
Regards, and thanks for the hint, Andreas
Welcome to planet PostgreSQL!
Andreas
Между прочим, лучший способ защитить человека от навязчивых мобилок - приобрести Подавители сотовой связи
Kommentar veröffentlichen