about me and my world

Sonntag, 15. November 2009

writeable CTE - a short performance test

Inspired by the talk from David Fetter and Mirko Tiikkaja performed on the PGDAY 2009 in Paris about writeable CTE i want to analyse a special aspect: minimizing I/O-operations with query clustering.

The talk shows an example that contains 7 tables: one person-table, 3 tables for details for a person (street, im and phone), which contains contact-data for the person, and 3 additional tables to map each person to multiple streets, im and phone.

Assumed, there are 2 entries for street, im and phone for a person, this results in 1 insert into the person-table, respectively 2 inserts into street, im and phone. In addition it needs 2 inserts for every joining-table.


With the new feature 'writeable CTE', the effort is reduced to single, (but complex) SQL-command, which contains - among others - all inserts. Of course, this procedure is complex.

In the following posting i want to analyse, if this single, complex command is faster than many simple traditional commands.

For this purpose i wrote 2 plpgsql-functions, which generate proper SQL-commands, one for the old style with many inserts
and one for the new syntax, both functions with exactly the same amount of data for the inserts.


Here are the two functions:


create or replace function fill_tables(IN c int, OUT sql text) returns text as $$
declare
  c_im int;
  c_phone int;
  c_street int;
  count int;
begin
  c_im:= 1 + strpos(md5(c::text),'a');
  c_phone := 1 + strpos(md5(c::text),'b');
  c_street := 1 + strpos(md5(c::text),'c');

  sql := E'INSERT INTO person (first_name, last_name, birthdate) VALUES (\'' ||
         md5(c::text) || E'\', \'' ||
         md5((c+1)::text) || E'\', \'' ||
         ('1970-01-01'::date + ('0' || regexp_replace(substring(md5((c)::text),0,8),'[^0-9]','','g'))::int * '1 minute'::interval)::date || E'\');\n';

  for count in 0 .. c_im loop
    sql := sql ||
           E'INSERT INTO im (provider, handle) VALUES (\'' ||
           md5((c+count)::text) || E'\',\'' || md5((c+count)::text) || E'\');\n';
    sql := sql ||
           E'INSERT INTO person_im(person_id, im_id) VALUES (' ||
           E'currval(\'person_id_seq\'),currval(\'im_id_seq\'));\n';
  end loop;

  for count in 0 .. c_phone loop
    sql := sql ||
           E'INSERT INTO phone (country_code, phone_number, extension) VALUES (\'' ||
           md5((c+count)::text) || E'\',\'' || md5((c+count)::text) || E'\',\'' || md5((c+count)::text) || E'\');\n';

    sql := sql ||
           E'INSERT INTO person_phone(person_id, phone_id) VALUES (' ||
           E'currval(\'person_id_seq\'),currval(\'phone_id_seq\'));\n';
  end loop;

  for count in 0 .. c_street loop
    sql = sql ||
          E'INSERT INTO street (street1,city,state,country,post_code) VALUES (\'' ||
          md5((c+count)::text) || E'\',\'' ||
          md5((c+count)::text) || E'\',\'' ||
          md5((c+count)::text) || E'\',\'' ||
          md5((c+count)::text) || E'\',\'' ||
          md5((c+count)::text) || E'\');\n';
    sql := sql ||
          E'INSERT INTO person_street(person_id, street_id) VALUES (' ||
          E'currval(\'person_id_seq\'),currval(\'street_id_seq\'));\n';
  end loop;
end;
$$ language plpgsql volatile;

And:


create or replace function fill_tables_new(IN c int, OUT sql text) returns text as $$
declare
  c_im int;
  c_phone int;
  c_street int;
  count int;
begin
  c_im:= 1 + strpos(md5(c::text),'a');
  c_phone := 1 + strpos(md5(c::text),'b');
  c_street := 1 + strpos(md5(c::text),'c');

  sql := E'WITH t_person AS (\n  ' ||
         E'INSERT INTO person (first_name, last_name, birthdate) VALUES (\n  \'' ||
         md5(c::text) || E'\', \'' ||
         md5((c+1)::text) || E'\', \'' ||
         ('1970-01-01'::date + ('0' || regexp_replace(substring(md5((c)::text),0,10),'[^0-9]','','g'))::int * '1 minute'::interval)::date || E'\') RETURNING id),\n

  sql := sql ||
         E't_im AS (INSERT INTO im (provider, handle) VALUES ';
  for count in 0 .. c_im loop
    sql := sql || E'\n  (\'' || md5((c+count)::text) || E'\',\'' || md5((c+count)::text) || E'\'),';
  end loop;
  sql := trim(sql,',');
  sql := sql || E' RETURNING id),\n';
  sql := sql || E't_person_im AS (INSERT INTO person_im SELECT * FROM t_person CROSS JOIN t_im),\n';

  sql := sql ||
         E't_phone AS (INSERT INTO phone (country_code, phone_number, extension) VALUES ';
  for count in 0 .. c_phone loop
    sql := sql || E'\n  (\'' || md5((c+count)::text) || E'\',\'' || md5((c+count)::text) || E'\',\'' || md5((c+count)::text) || E'\'),';
  end loop;
  sql := trim(sql,',');
  sql := sql || E' RETURNING id),\n';
  sql := sql || E't_person_phone AS (INSERT INTO person_phone SELECT * FROM t_person CROSS JOIN t_phone),\n';

  sql := sql ||
        E't_street AS (INSERT INTO street (street1, city, state, country, post_code) VALUES ';
  for count in 0 .. c_street loop
    sql := sql || E'\n  (\'' || md5((c+count)::text) || E'\',\'' ||
                                md5((c+count)::text) || E'\',\'' ||
                                md5((c+count)::text) || E'\',\'' ||
                                md5((c+count)::text) || E'\',\'' ||
                                md5((c+count)::text) || E'\'),';
  end loop;
  sql := trim(sql,',');
  sql := sql || E' RETURNING id),\n';
  sql := sql || E't_person_street AS (INSERT INTO person_street SELECT * FROM t_person CROSS JOIN t_street)\n';

  sql := sql || 'VALUES (true);';


end;
$$ language plpgsql volatile;


And here the output of this functions (for example, for one person):


INSERT INTO person (first_name, last_name, birthdate) VALUES ('c4ca4238a0b923820dcc509a6f75849b', 'c81e728d9d4c2f636f067f89cc14862c', '1970-01-04');
 INSERT INTO im (provider, handle) VALUES ('c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b');
 INSERT INTO person_im(person_id, im_id) VALUES (currval('person_id_seq'),currval('im_id_seq'));
 INSERT INTO im (provider, handle) VALUES ('c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c');
 INSERT INTO person_im(person_id, im_id) VALUES (currval('person_id_seq'),currval('im_id_seq'));
 INSERT INTO im (provider, handle) VALUES ('eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3');
 INSERT INTO person_im(person_id, im_id) VALUES (currval('person_id_seq'),currval('im_id_seq'));
 INSERT INTO im (provider, handle) VALUES ('a87ff679a2f3e71d9181a67b7542122c','a87ff679a2f3e71d9181a67b7542122c');
 INSERT INTO person_im(person_id, im_id) VALUES (currval('person_id_seq'),currval('im_id_seq'));
 INSERT INTO im (provider, handle) VALUES ('e4da3b7fbbce2345d7772b0674a318d5','e4da3b7fbbce2345d7772b0674a318d5');
 INSERT INTO person_im(person_id, im_id) VALUES (currval('person_id_seq'),currval('im_id_seq'));
 INSERT INTO im (provider, handle) VALUES ('1679091c5a880faf6fb5e6087eb1b2dc','1679091c5a880faf6fb5e6087eb1b2dc');
 INSERT INTO person_im(person_id, im_id) VALUES (currval('person_id_seq'),currval('im_id_seq'));
 INSERT INTO phone (country_code, phone_number, extension) VALUES ('c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b');
 INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq'));
 INSERT INTO phone (country_code, phone_number, extension) VALUES ('c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c');
 INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq'));
 INSERT INTO phone (country_code, phone_number, extension) VALUES ('eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3');
 INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq'));
 INSERT INTO phone (country_code, phone_number, extension) VALUES ('a87ff679a2f3e71d9181a67b7542122c','a87ff679a2f3e71d9181a67b7542122c','a87ff679a2f3e71d9181a67b7542122c');
 INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq'));
 INSERT INTO phone (country_code, phone_number, extension) VALUES ('e4da3b7fbbce2345d7772b0674a318d5','e4da3b7fbbce2345d7772b0674a318d5','e4da3b7fbbce2345d7772b0674a318d5');
 INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq'));
 INSERT INTO phone (country_code, phone_number, extension) VALUES ('1679091c5a880faf6fb5e6087eb1b2dc','1679091c5a880faf6fb5e6087eb1b2dc','1679091c5a880faf6fb5e6087eb1b2dc');
 INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq'));
 INSERT INTO phone (country_code, phone_number, extension) VALUES ('8f14e45fceea167a5a36dedd4bea2543','8f14e45fceea167a5a36dedd4bea2543','8f14e45fceea167a5a36dedd4bea2543');
 INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq'));
 INSERT INTO phone (country_code, phone_number, extension) VALUES ('c9f0f895fb98ab9159f51fd0297e236d','c9f0f895fb98ab9159f51fd0297e236d','c9f0f895fb98ab9159f51fd0297e236d');
 INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq'));
 INSERT INTO phone (country_code, phone_number, extension) VALUES ('45c48cce2e2d7fbdea1afc51c7c6ad26','45c48cce2e2d7fbdea1afc51c7c6ad26','45c48cce2e2d7fbdea1afc51c7c6ad26');
 INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq'));
 INSERT INTO phone (country_code, phone_number, extension) VALUES ('d3d9446802a44259755d38e6d163e820','d3d9446802a44259755d38e6d163e820','d3d9446802a44259755d38e6d163e820');
 INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq'));
 INSERT INTO phone (country_code, phone_number, extension) VALUES ('6512bd43d9caa6e02c990b0a82652dca','6512bd43d9caa6e02c990b0a82652dca','6512bd43d9caa6e02c990b0a82652dca');
 INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq'));
 INSERT INTO phone (country_code, phone_number, extension) VALUES ('c20ad4d76fe97759aa27a0c99bff6710','c20ad4d76fe97759aa27a0c99bff6710','c20ad4d76fe97759aa27a0c99bff6710');
 INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq'));
 INSERT INTO phone (country_code, phone_number, extension) VALUES ('c51ce410c124a10e0db5e4b97fc2af39','c51ce410c124a10e0db5e4b97fc2af39','c51ce410c124a10e0db5e4b97fc2af39');
 INSERT INTO person_phone(person_id, phone_id) VALUES (currval('person_id_seq'),currval('phone_id_seq'));
 INSERT INTO street (street1,city,state,country,post_code) VALUES ('c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b');
 INSERT INTO person_street(person_id, street_id) VALUES (currval('person_id_seq'),currval('street_id_seq'));
 INSERT INTO street (street1,city,state,country,post_code) VALUES ('c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c');
 INSERT INTO person_street(person_id, street_id) VALUES (currval('person_id_seq'),currval('street_id_seq'));
 INSERT INTO street (street1,city,state,country,post_code) VALUES ('eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3');
 INSERT INTO person_street(person_id, street_id) VALUES (currval('person_id_seq'),currval('street_id_seq'));



And, with writeable CTE:


WITH t_person AS (
   INSERT INTO person (first_name, last_name, birthdate) VALUES (
   'c4ca4238a0b923820dcc509a6f75849b', 'c81e728d9d4c2f636f067f89cc14862c', '1970-01-31') RETURNING id),
 t_im AS (INSERT INTO im (provider, handle) VALUES
   ('c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b'),
   ('c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c'),
   ('eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3'),
   ('a87ff679a2f3e71d9181a67b7542122c','a87ff679a2f3e71d9181a67b7542122c'),
   ('e4da3b7fbbce2345d7772b0674a318d5','e4da3b7fbbce2345d7772b0674a318d5'),
   ('1679091c5a880faf6fb5e6087eb1b2dc','1679091c5a880faf6fb5e6087eb1b2dc') RETURNING id),
 t_person_im AS (INSERT INTO person_im SELECT * FROM t_person CROSS JOIN t_im),
 t_phone AS (INSERT INTO phone (country_code, phone_number, extension) VALUES
   ('c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b'),
   ('c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c'),
   ('eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3'),
   ('a87ff679a2f3e71d9181a67b7542122c','a87ff679a2f3e71d9181a67b7542122c','a87ff679a2f3e71d9181a67b7542122c'),
   ('e4da3b7fbbce2345d7772b0674a318d5','e4da3b7fbbce2345d7772b0674a318d5','e4da3b7fbbce2345d7772b0674a318d5'),
   ('1679091c5a880faf6fb5e6087eb1b2dc','1679091c5a880faf6fb5e6087eb1b2dc','1679091c5a880faf6fb5e6087eb1b2dc'),
   ('8f14e45fceea167a5a36dedd4bea2543','8f14e45fceea167a5a36dedd4bea2543','8f14e45fceea167a5a36dedd4bea2543'),
   ('c9f0f895fb98ab9159f51fd0297e236d','c9f0f895fb98ab9159f51fd0297e236d','c9f0f895fb98ab9159f51fd0297e236d'),
   ('45c48cce2e2d7fbdea1afc51c7c6ad26','45c48cce2e2d7fbdea1afc51c7c6ad26','45c48cce2e2d7fbdea1afc51c7c6ad26'),
   ('d3d9446802a44259755d38e6d163e820','d3d9446802a44259755d38e6d163e820','d3d9446802a44259755d38e6d163e820'),
   ('6512bd43d9caa6e02c990b0a82652dca','6512bd43d9caa6e02c990b0a82652dca','6512bd43d9caa6e02c990b0a82652dca'),
   ('c20ad4d76fe97759aa27a0c99bff6710','c20ad4d76fe97759aa27a0c99bff6710','c20ad4d76fe97759aa27a0c99bff6710'),
   ('c51ce410c124a10e0db5e4b97fc2af39','c51ce410c124a10e0db5e4b97fc2af39','c51ce410c124a10e0db5e4b97fc2af39') RETURNING id),
 t_person_phone AS (INSERT INTO person_phone SELECT * FROM t_person CROSS JOIN t_phone),
 t_street AS (INSERT INTO street (street1, city, state, country, post_code) VALUES
   ('c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b','c4ca4238a0b923820dcc509a6f75849b'),
   ('c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c','c81e728d9d4c2f636f067f89cc14862c'),
   ('eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3','eccbc87e4b5ce2fe28308fd9f2a7baf3') RETURNING id),
 t_person_street AS (INSERT INTO person_street SELECT * FROM t_person CROSS JOIN t_street)
 VALUES (true);

First i created 2 sql-files with the output of my functions, each contains 10.000 rows for persons, in total for all tables 727104 rows.




kretschmer@tux:~/cte$ psql -X -t test -c "select fill_tables(s) from generate_series(1,10000) s;" > data_old.sql
kretschmer@tux:~/cte$ psql -X -t test -c "select fill_tables_new(s) from generate_series(1,10000) s;" > data_new.sql
kretschmer@tux:~/cte$ ls -l data*
-rw-r--r-- 1 kretschmer kretschmer  51113563 Nov 13 18:43 data_new.sql
-rw-r--r-- 1 kretschmer kretschmer 102391561 Nov 13 18:43 data_old.sql
The file with the new syntax is just half the size. Okay, lets rock'n'roll, my 8.5devel is listening on port 5434, i'm using the default configuration on a quadcore with 4 GByte RAM, the operating system is Ubuntu. I'm executing the files with the following command(s):
time /usr/local/pgsql/bin/psql -X -p 5434 -h localhost test -f data_old.sql

real    9m2.466s
user    0m10.789s
sys     0m11.357s
time /usr/local/pgsql/bin/psql -X -p 5434 -h localhost test -f data_new.sql

real    0m34.294s
user    0m1.620s
sys     0m0.528s
Very impressive, isn't it? (i know, i could optimize both versions with prepared statements etc.) I hope, the upcoming 8.5 will contain this feature. This feature has a small disadvantage: it's not portable, since no other RDBMS knows about writable CTE! Many thanks to all the people who developed and are developing PostgreSQL!

Kommentare:

bhaeuser hat gesagt…

Hello Andreas!

I see on problem in your short perfomance test.

You are running a completly different amount of transactions in your both testcases, and there is evidence that more transactions causing less perfomance (we both know that, do we? :))

I have currently a little lack of time (test time at my university) so i would be very interested in a testcase where both cases are executed in a single transaction.
I think using the -1 switch for psql is the approriate method for that.

Best regards and looking forward to meet you at FOSDEM!

Björn

akretschmer hat gesagt…

Okay, right, with begin and commit around the sql-files:

old-style:

real 1m41.368s
user 0m9.965s
sys 0m12.037s


new-style:


real 0m30.037s
user 0m1.600s
sys 0m0.404s


Regards, and thanks for the hint, Andreas

Anonym hat gesagt…

Welcome to planet PostgreSQL!

Andreas

Anonym hat gesagt…

Между прочим, лучший способ защитить человека от навязчивых мобилок - приобрести Подавители сотовой связи

Labels