
NZLoad Working Examples

nzload is the utility used by Netezza to load data into tables. Here are some working examples.

nzload options:

-u      Username to access the database
-pw     Password for the username supplied to get into the database
-db     Name of the database into which you want to load the data
-t      Table name into which the data is to be loaded
-df     The data file containing the data to be loaded
-cf     The file which contains the formatting and options for nzload. Useful if you are going to repeat the command often
-delim  The delimiter for the data in the file being loaded

nzload alone is powerful enough to load most required data into Netezza. For example, to load into database prod, as user fred with password barney, into table wilma from a tab-delimited file called loadme.data, you would use the following:

nzload -u fred -pw barney -db prod -t wilma -delim '\t' -df loadme.data

If your file is delimited by a | symbol, you could then use the following to load into table pipedtable from file datafile.dat (note the quotes, which stop the shell from treating | as a pipe):

nzload -u admin -pw password -db thedatabase -t pipedtable -delim '|' -df datafile.dat
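The shell-quoting issue with `|` is easy to trip over in scripts. A minimal sketch (credentials and file names are the placeholders from the examples above, not real ones) that builds the nzload argument list programmatically, so the delimiter is passed as a literal argument and is never interpreted by the shell:

```python
# Build an nzload argument vector instead of a shell command string.
# Passing the list to subprocess.run() with shell=False means the "|"
# delimiter reaches nzload as-is and is never parsed as a shell pipe.
# Credentials and file names here are placeholders from the text above.

def nzload_args(user, password, db, table, datafile, delim):
    """Return the argv list for an nzload invocation."""
    return [
        "nzload",
        "-u", user,
        "-pw", password,
        "-db", db,
        "-t", table,
        "-delim", delim,   # a literal "|" is safe here; no shell involved
        "-df", datafile,
    ]

args = nzload_args("admin", "password", "thedatabase",
                   "pipedtable", "datafile.dat", "|")
# subprocess.run(args)  # would invoke nzload; needs a Netezza host
```

Running the list form from Python (or exec-ing it from any language) sidesteps quoting entirely; only when pasting the command into a shell do the quotes matter.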

Using Named Pipes

If you are loading a lot of data and do not have space to keep the data on the system, you can feed it through a named pipe and point the load at that. The load will not exit until the end-of-file indicator is given. First, create the pipe file.

mkfifo pipefile

Now you can take the data from a supplied file and place it on the pipe. This command blocks until a reader (nzload) opens the pipe, so run it in the background or in a separate session.

cat /export/home/nz/dbase/loadme.data > pipefile &

Now you specify the pipe file with the -df option to load the data in.

nzload -u admin -pw password -db thedatabase -delim '|' -df pipefile
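The same producer/consumer flow can be sketched without Netezza at all, since the writer and the reader just share a FIFO. Here a background thread plays the part of `cat file > pipefile` and the main thread plays nzload reading from `-df pipefile`; the paths and data are stand-ins, not the real load files:

```python
import os
import tempfile
import threading

# Create a named pipe (the mkfifo step above), write to it from a
# background thread (the "cat > pipefile" step), and read it back
# (what nzload does when given the pipe via -df). The data is never
# stored on disk: it flows straight through the FIFO.
pipefile = os.path.join(tempfile.mkdtemp(), "pipefile")
os.mkfifo(pipefile)

data = "1\tone\n2\ttwo\n"  # stand-in for tab-delimited load data

def writer():
    # Opening a FIFO for writing blocks until a reader opens it,
    # which is why this side runs in its own thread -- just as the
    # cat command needs a trailing & or its own shell session.
    with open(pipefile, "w") as f:
        f.write(data)

t = threading.Thread(target=writer)
t.start()

with open(pipefile) as f:   # the nzload side of the pipe
    loaded = f.read()       # reads until the writer closes (EOF)

t.join()
os.remove(pipefile)
```

The blocking-until-both-ends-open behaviour is exactly why the cat command above must not run in the foreground of the same session that will run nzload.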

Using a Control File

You can put the load definition in a control file instead. For example, save the following as loadme.bat:

Datafile loadme.dat
{
Database mydatabase
TableName datatable
}

Now you can run that with the following:

nzload -cf loadme.bat

Load session of table DATATABLE completed successfully

When you use the nzload command, note that you cannot specify both the -cf and -df options in the same command. You can load from a specified data file, or load from a control file, but not both in one command. This is because the control file itself contains the data file definition, so you cannot specify the file from outside the control file.

The following control file options define two data sets to load. Note that the options can vary for each data set.

Datafile /home/operation/data/customer.dat
{
Database dev
TableName customer
Delimiter |
Logfile operation.log
Badfile customer.bad
}
Datafile /home/imports/data/inventory.dat
{
Database dev
TableName inventory
Delimiter #
Logfile importload.log
Badfile inventory.bad
}

If you save these control file contents as a text file (named import_def.txt in this example) you can specify it using the nzload command as follows:

nzload -cf /home/nz/sample/import_def.txt
Load session of table CUSTOMER completed successfully
Load session of table INVENTORY completed successfully
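Since a control file is just text with one Datafile block per data set, generating one from a table-to-file mapping is easy to script. A sketch using the paths and option values from the example above:

```python
# Render nzload control-file blocks from a list of per-dataset options.
# The block layout mirrors the Datafile { ... } sections shown above.

def control_file(datasets):
    parts = []
    for ds in datasets:
        body = "\n".join(f"{key} {value}" for key, value in ds["options"])
        parts.append(f"Datafile {ds['path']}\n{{\n{body}\n}}")
    return "\n".join(parts) + "\n"

text = control_file([
    {"path": "/home/operation/data/customer.dat",
     "options": [("Database", "dev"), ("TableName", "customer"),
                 ("Delimiter", "|"), ("Logfile", "operation.log"),
                 ("Badfile", "customer.bad")]},
    {"path": "/home/imports/data/inventory.dat",
     "options": [("Database", "dev"), ("TableName", "inventory"),
                 ("Delimiter", "#"), ("Logfile", "importload.log"),
                 ("Badfile", "inventory.bad")]},
])
# write text to import_def.txt, then run: nzload -cf import_def.txt
```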

View deleted data in Netezza with show_deleted_records.


So, Netezza does not delete data until the table has been groomed; rows are just marked as deleted on the system. So, if they are deleted, how are you supposed to see them? We know the row exists, and it is still there where we have not groomed the table, so here's an example of how to view deleted data.

LABDB(ADMIN)=> \d bod
          Table "BOD"
 Attribute |  Type  | Modifier | Default Value
-----------+--------+----------+---------------
 ID        | BIGINT |          |
Distributed on hash: "ID"

LABDB(ADMIN)=> insert into bod values (1);
INSERT 0 1
LABDB(ADMIN)=> insert into bod values (2);
INSERT 0 1
LABDB(ADMIN)=> delete from bod where id = 1;
DELETE 1
LABDB(ADMIN)=> insert into bod values (3);
INSERT 0 1
LABDB(ADMIN)=> select createxid, deletexid, rowid, * from bod;
 CREATEXID | DELETEXID |   ROWID   | ID
-----------+-----------+-----------+----
    353286 |         0 | 515101001 |  2
    353290 |         0 | 515101002 |  3
(2 rows)
LABDB(ADMIN)=> set show_deleted_records=true;
SET VARIABLE
LABDB(ADMIN)=> select createxid, deletexid, rowid, * from bod;
 CREATEXID | DELETEXID |   ROWID   | ID
-----------+-----------+-----------+----
    353286 |         0 | 515101001 |  2
    353290 |         0 | 515101002 |  3
    353284 |    353288 | 515101000 |  1
(3 rows)

But updates do the same, don't they? An updated row is logically updated but physically deleted and recreated. (The table has been recreated here with an extra JUNK column.)

LABDB(ADMIN)=> select createxid, deletexid, rowid, * from bod;
 CREATEXID | DELETEXID | ROWID | ID | JUNK
-----------+-----------+-------+----+------
(0 rows)
LABDB(ADMIN)=> insert into bod values (1, null);
INSERT 0 1
LABDB(ADMIN)=> insert into bod values (2, null);
INSERT 0 1
LABDB(ADMIN)=> update bod set junk = 'TWO' where id = 2;
UPDATE 1
LABDB(ADMIN)=> select createxid, deletexid, rowid, * from bod order by createxid;
 CREATEXID | DELETEXID |   ROWID   | ID | JUNK
-----------+-----------+-----------+----+------
    353362 |         0 | 515102002 |  1 |
    353364 |    353366 | 515102003 |  2 |
    353366 |         0 | 515102003 |  2 | TWO
(3 rows)

This shows that the row has NOT been updated in place, but actually inserted anew with the old record marked as deleted.

There is a gotcha here: show_deleted_records does not work in exactly the way that you would expect.

LABDB(ADMIN)=> select createxid, deletexid, rowid, * from bod order by createxid, rowid;
 CREATEXID | DELETEXID |   ROWID   | ID | JUNK
-----------+-----------+-----------+----+------
    353284 |    353288 | 515101000 |  1 |
    353286 |         0 | 515101001 |  2 |
    353290 |         0 | 515101002 |  3 |
(3 rows)

This is fine. Let's now update the table.

LABDB(ADMIN)=> update bod set junk = 'ONE' where id = 1;
ERROR: 056408 : Concurrent update or delete of same row

Well, it would be silly if you could actually update a deleted row in a table.

LABDB(ADMIN)=> update bod set junk = 'TWO' where id = 2;
UPDATE 1
LABDB(ADMIN)=> select createxid, deletexid, rowid, * from bod order by createxid, rowid;
 CREATEXID | DELETEXID |   ROWID   | ID | JUNK
-----------+-----------+-----------+----+------
    353284 |    353288 | 515101000 |  1 |
    353286 |    353312 | 515101001 |  2 |
    353290 |         0 | 515101002 |  3 |
    353310 |         1 | 515101000 |  1 | ONE
    353312 |         0 | 515101001 |  2 | TWO
(5 rows)

So you can see two interesting things. Firstly, the deleted row was actually updated, despite the error. And we can see that id 2 was marked deleted at XID 353312, with a new version created at the same XID. Let's just try that again.

LABDB(ADMIN)=> set show_deleted_records=false;
SET VARIABLE
LABDB(ADMIN)=> insert into bod values (4, null);
INSERT 0 1
LABDB(ADMIN)=> update bod set junk = 'FOUR' where id = 4;
UPDATE 1
LABDB(ADMIN)=> select * from bod where id = 4;
 ID | JUNK
----+------
  4 | FOUR
(1 row)
LABDB(ADMIN)=> delete from bod where id = 4;
DELETE 1
LABDB(ADMIN)=> select * from bod where id = 4;
 ID | JUNK
----+------
(0 rows)
LABDB(ADMIN)=> set show_deleted_records=true;
SET VARIABLE
LABDB(ADMIN)=> select createxid, deletexid, rowid, * from bod where id = 4;
 CREATEXID | DELETEXID |   ROWID   | ID | JUNK
-----------+-----------+-----------+----+------
    353330 |    353332 | 515102000 |  4 |
    353332 |    353336 | 515102000 |  4 | FOUR
(2 rows)

That's what we expect to happen. Let's repeat without the setting.

LABDB(ADMIN)=> set show_deleted_records=false;
SET VARIABLE
LABDB(ADMIN)=> insert into bod values (5, null);
INSERT 0 1
LABDB(ADMIN)=> delete from bod where id = 5;
DELETE 1
LABDB(ADMIN)=> update bod set junk = 'FIVE' where id = 5;
UPDATE 0
LABDB(ADMIN)=> set show_deleted_records=true;
SET VARIABLE
LABDB(ADMIN)=> select createxid, deletexid, rowid, * from bod where id = 5;
 CREATEXID | DELETEXID |   ROWID   | ID | JUNK
-----------+-----------+-----------+----+------
    353342 |    353344 | 515102001 |  5 |
(1 row)

Which shows that if you set show_deleted_records to true, you can not only view the deleted data, but you can update it too. This should never be set on a Netezza system unless requested by support.
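The createxid/deletexid mechanics in the transcripts above can be mimicked in plain Python to make the behaviour concrete. This is only a toy model of what the transcripts show, not Netezza code: every row keeps a create and a delete transaction id, a delete just stamps deletexid, and an update is physically a delete plus an insert at the same transaction id:

```python
# Toy model of Netezza's versioned rows: deletes and updates never
# remove data, they only stamp deletexid or append a new version.

class Table:
    def __init__(self):
        self.rows = []      # each row: dict with createxid, deletexid, value
        self.xid = 100      # toy transaction counter
        self.show_deleted_records = False

    def next_xid(self):
        self.xid += 2
        return self.xid

    def insert(self, value):
        self.rows.append({"createxid": self.next_xid(),
                          "deletexid": 0, "value": value})

    def delete(self, value):
        xid = self.next_xid()
        for row in self.rows:
            if row["value"] == value and row["deletexid"] == 0:
                row["deletexid"] = xid   # mark as deleted, don't remove

    def update(self, old, new):
        # logically an update, physically delete + insert at one xid
        xid = self.next_xid()
        for row in self.rows:
            if row["value"] == old and row["deletexid"] == 0:
                row["deletexid"] = xid
                self.rows.append({"createxid": xid,
                                  "deletexid": 0, "value": new})
                return

    def select(self):
        if self.show_deleted_records:
            return [r["value"] for r in self.rows]   # every version
        return [r["value"] for r in self.rows if r["deletexid"] == 0]

t = Table()
t.insert(1)
t.insert(2)
t.delete(1)
t.update(2, "TWO")

visible = t.select()            # deleted rows hidden, as normal
t.show_deleted_records = True
everything = t.select()         # all versions, like the transcripts
```

A groom in this model would simply discard every row whose deletexid is non-zero, which is why the old versions are only visible until the table is groomed.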

Netezza, Schemas and Cross Database Access.


Netezza does not have schemas per se, unlike Oracle. Here a database cannot contain more than one object of the same name, whether tables, views or materialized views. An object does have an owner, which behaves in a similar manner to a schema, but the owner name is not normally required when enabling and managing cross database access. Here's an example. I have a couple of databases here, and some tables. For the second user I'll create the whole thing from scratch and show the minimum permissions required.

SYSTEM(ADMIN)=> \c gc gary gary
You are now connected to database gc as user gary.
GC(GARY)=> \d t1
            Table "T1"
 Attribute |     Type     | Modifier | Default Value
-----------+--------------+----------+---------------
 ID        | BIGINT       |          |
 MF        | CHARACTER(1) |          |
Distributed on hash: "MF"
Organized on: "ID"

SYSTEM(ADMIN)=> create database gc2;
CREATE DATABASE
SYSTEM(ADMIN)=> create user gary2 with password 'gary2';
CREATE USER

Currently we cannot use this database as the new user.

LABDB(ADMIN)=> \c gc2 gary2 gary2
FATAL 1: database connection refused
Previous connection kept
SYSTEM(ADMIN)=> grant list on gc2 to gary2;
GRANT

Now we can connect.

LABDB(ADMIN)=> \c gc2 gary2 gary2
You are now connected to database gc2 as user gary2.
GC2(GARY2)=> select count(*) from gc.gary.t1;
ERROR: Permission denied on "T1".

The following allows access.

GC2(ADMIN)=> \c gc admin password
You are now connected to database gc as user admin.
GC(ADMIN)=> grant select on t1 to gary2;
GRANT

Which is the same as:

GC(ADMIN)=> grant select on gary.t1 to gary2;
GRANT

Now we can access the table across databases.

GC2(GARY2)=> select count(*) from gc.gary.t1;
   COUNT
-----------
 134217728
(1 row)

We still cannot create a table in the database though.

GC2(GARY2)=> create table t1 as select * from gc.gary.t1 limit 10;
ERROR: CREATE TABLE: permission denied.

GC(ADMIN)=> \c gc2 admin password
You are now connected to database gc2 as user admin.
GC2(ADMIN)=> grant create table to gary2;
GRANT

You will note that there is no requirement to reconnect to the database to obtain these permissions.

GC2(GARY2)=> create table t1 as select * from gc.gary.t1 limit 10;
INSERT 0 10
GC(ADMIN)=> \c gc gary gary
You are now connected to database gc as user gary.
GC(GARY)=> create table t4 as select * from t1 limit 0;
INSERT 0 0
GC(GARY)=> \c gc admin password
You are now connected to database gc as user admin.
GC(ADMIN)=> grant insert on t4 to gary2;
GRANT
GC2(GARY2)=> insert into gc.gary.t4 select * from t1;
ERROR: Cross Database Access not supported for this type of command

Bearing in mind that you cannot have two tables with the same name in a single database, there is no reason to place the schema name of the table in the command. Therefore the following are identical.

GC2(GARY2)=> insert into t1 select * from gc.gary.t1 limit 10;
INSERT 0 10
GC2(GARY2)=> insert into t1 select * from gc..t1 limit 10;
INSERT 0 10
GC2(GARY2)=> insert into t1 select * from gc.t1 limit 10;
INSERT 0 10

But there is another option which can be set.

GC2(GARY2)=> show enable_schema_dbo_check;
NOTICE: ENABLE_SCHEMA_DBO_CHECK is 0
SHOW VARIABLE

This changes the behaviour of using the schema name when referencing a table: 0 raises no message, 1 produces a warning, whilst 2 denies access.

GC(ADMIN)=> \c gc2 gary2 gary2
You are now connected to database gc2 as user gary2.
GC2(GARY2)=> insert into t1 select * from gc.gary.t1 limit 10;

INSERT 0 10
GC2(GARY2)=> set enable_schema_dbo_check = 1;
SET VARIABLE
GC2(GARY2)=> insert into t1 select * from gc.gary.t1 limit 10;
NOTICE: Schema GARY does not exist
INSERT 0 10
GC2(GARY2)=> set enable_schema_dbo_check = 2;
SET VARIABLE
GC2(GARY2)=> insert into t1 select * from gc.gary.t1 limit 10;
ERROR: Schema GARY does not exist

The setting can be made permanent.

[nz@netezza data]$ pwd
/nz/data
[nz@netezza data]$ more postgresql.conf | grep schema
#
# Cross Database Access Settings
#
# enable_schema_dbo_check = 0

So it is originally commented out and not read when the database starts. Change it in the configuration file.

#
# Cross Database Access Settings
#
enable_schema_dbo_check = 2

Now connect and try the same.

GC2(GARY2)=> insert into t1 select * from gc.gary.t1 limit 10;
INSERT 0 10

Nothing. So the default is still in place. Restart the system to enable the change.

[nz@netezza data]$ nzstop
[nz@netezza data]$ nzstart

Now it takes effect.

[nz@netezza data]$ nzsql gc2 gary2 gary2
Welcome to nzsql, the Netezza SQL interactive terminal.
Type: \h for help with SQL commands
      \? for help on internal slash commands

      \g or terminate with semicolon to execute query
      \q to quit
GC2(GARY2)=> show enable_schema_dbo_check;
NOTICE: ENABLE_SCHEMA_DBO_CHECK is 2
SHOW VARIABLE
GC2(GARY2)=> insert into t1 select * from gc.t1 limit 10;
ERROR: Schema GC does not exist
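The three levels of enable_schema_dbo_check are easy to model: the middle part of a three-part name is checked (or not) against the object's owner, but never used for resolution. A sketch of that decision logic, with the owner lookup table invented purely for illustration:

```python
# Model of how enable_schema_dbo_check treats the schema (owner) part
# of a db.schema.table reference: 0 ignores it, 1 warns, 2 refuses.
# The OWNERS map is a made-up stand-in for the real catalog lookup.

OWNERS = {("GC", "T1"): "GARY"}   # hypothetical catalog: table -> owner

def resolve(name, check_level, notices):
    parts = name.upper().split(".")
    if len(parts) != 3:
        raise ValueError("expected db.schema.table")
    db, schema, table = parts
    owner = OWNERS.get((db, table))
    if schema and schema != owner:     # empty schema (gc..t1) is fine
        if check_level == 1:
            notices.append(f"NOTICE: Schema {schema} does not exist")
        elif check_level == 2:
            raise PermissionError(f"Schema {schema} does not exist")
    return (db, table)   # the schema part plays no role in resolution

notices = []
ok = resolve("gc.gary.t1", 2, notices)        # owner matches: fine
also_ok = resolve("gc.bogus.t1", 0, notices)  # level 0: silently ignored
resolve("gc.bogus.t1", 1, notices)            # level 1: warning only
try:
    resolve("gc.bogus.t1", 2, notices)        # level 2: denied
    denied = False
except PermissionError:
    denied = True
```

Note how `gc..t1` style references (empty schema part) pass all levels in this model, matching the transcripts, because there is no schema name to check.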

Copying statistics from one Netezza database to another.


A customer had a query which performed well on one database, but was not optimal on another: the plans for the query differed between the two databases. The client wanted to copy the statistics on the tables in the first database to the second, to see if the problem was down to statistics. How do you do it though?

Firstly, there is no tool which can import and export statistics, unlike dbms_stats in Oracle. However, there is a way around this in Netezza. Of course, the logical first step is to regenerate statistics in the second database, and check distribution to ensure that the tables are distributed in the same manner; normally a changed plan comes down to the distribution of one of the tables in the query.

Using the script nzdumpschema, Netezza will extract the table definitions and statistics of a database so that you can run them against another database. Part of what this script generates is a number of statements which insert data into the _t_class, _t_attribute and _t_statistic tables.

The script output first drops and re-creates the database, so DON'T RUN IT without editing it first. Remove the drop and create database statements if you want the database name to stay the same; otherwise, if your database is called PROD, the database created will be called PROD_SHADOW. So, either remove the statements or rename the database in the create statement to get the setup you need.

If your second database already has the tables and data in it, remove all the create table statements and execute the script against the database which has performance problems, to make the statistics in that database appear the same as in the production database. Running the script having modified only the database name will create a new, empty database which contains all the statistics of the original database. This will allow you to check whether execution is the same as in the original database.

Remember to run nzdumpschema on the second database first and make the same edits, in case you want to re-import the original statistics back into that database after testing.
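Stripping the dangerous statements out of the nzdumpschema output can itself be scripted. A cautious sketch, assuming the dump is plain SQL with statements starting at the left margin; the statement layout is an assumption, so always review the edited script by hand before running it:

```python
# Filter an nzdumpschema script before replaying it: drop the
# DROP/CREATE DATABASE statements and, if the target already has the
# tables, the CREATE TABLE statements too. The one-statement-starts-
# per-line layout is an assumption about the dump format.

def filter_dump(lines, keep_tables=True):
    kept = []
    skipping_table = False
    for line in lines:
        stmt = line.strip().upper()
        if stmt.startswith(("DROP DATABASE", "CREATE DATABASE")):
            continue
        if not keep_tables and stmt.startswith("CREATE TABLE"):
            skipping_table = True
        if skipping_table:
            if stmt.endswith(";"):   # statement may span several lines
                skipping_table = False
            continue
        kept.append(line)
    return kept

dump = [
    "DROP DATABASE PROD_SHADOW;",
    "CREATE DATABASE PROD_SHADOW;",
    "CREATE TABLE T1 (",
    "  ID BIGINT",
    ");",
    "INSERT INTO _T_STATISTIC VALUES (1, 2, 3);",
]
stats_only = filter_dump(dump, keep_tables=False)
```

With keep_tables=True the table definitions survive and only the database-level statements are removed, matching the empty-shadow-database approach described above.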
