Vous êtes sur la page 1sur 4

Talend

Tutorial Task Aid >


Creating and Using Metadata

This tutorial uses Talend Open Studio Data Integration version 6


Talend Open Studio allows you to create and run Java ETL programs or Jobs using pre-defined
components.
Each component can be configured either as a Built-in or as a Repository component.
For Built-in components, information such as how to read the file and what it contains,
Is defined within the component.
Applies only to this component.
Cannot be reused with any other component.
For Repository components, information
Is saved as metadata.
Can be efficiently and consistently reused.
Can be easily maintained because changes to metadata can be propagated to all Jobs
that use it.


1. Create a metadata definition for a delimited file
a. In the Project Repository, click Metadata, right-click File delimited, and click Create file
delimited.
b. In the Name field of the wizard, type movies and click Next.
c. To specify a sample file, click Browse next to the File field, select the file moviesSorted
from the local disk, and click Open.

The file is displayed in the File Viewer section of the wizard.


d. To define the Property Type settings, click Next.

In the wizard window that appears, you can define settings such as how the file should be
read, the number of rows, if any, that should be skipped when reading the file, and the
maximum number of rows to process.
e. To indicate that the first row of the file is column names and should be ignored, in the
Preview tab, select the Set heading row as column names checkbox.

Talend takes the complexity out of integration


Based on open source Scalable Future-proof Predictable cost
Visit www.talend.com Follow us on Twitter @Talend
Talend Tutorial Task Aid >

Note that when you do so, the Header checkbox is automatically checked with the value 1.
f. To refresh the file display to reflect the change made, click the Refresh Preview button
and then click Next.
g. In the Name field, type moviesSchema.
If the first line of the sample file includes column names, they will be displayed. If not, the
columns will appear as Column 0, Column 1, and so on and will have to be renamed manually.

When guessing the schema, Talend only reads the fifty first lines of the sample file and based
on the data in these rows, defines the column types and length. You should validate the
information displayed or correct it, if necessary.

h. Update the displayed schema to reflect the structure of the sample file. In this case,
change the length of the title and url fields to 100 and 250 respectively. Also, change the
type of the directorID field to integer. Click Finish.

Under Metadata in the Project Repository, the movies 0.1 entry is displayed with the file
properties. Under the entry movies 0.1, the schema of the metadata file, moviesSchema, is
displayed.

If you need to modify the property type or the schema, right-click on the component in the
Project Repository and select Edit File Delimited or Edit Schema.

2. Use the metadata to configure a component


a. Create a new Job and name it useMetadata and add a tFileInputDelimited component to
it.

Note: By default, the component is configured with Built-in parameters.

b. In the Property Type field of the Component view, select Repository.


c. To select the metadata, click [] next to the field that appears, click movies 0.1, and then
click OK.

Note that the parameters set of the metadata is displayed. Also note that all the fields are in
grey, indicating that they belong to the metadata and not to the component.

To change the schema, click [] next to text Edit schema and choose an option:

Change to built-in property to edit the schema for this component only.
Update repository connection to edit the metadata schema in the repository.

Talend takes the complexity out of integration


Based on open source Scalable Future-proof Predictable cost
Visit www.talend.com Follow us on Twitter @Talend
Talend Tutorial Task Aid >

d. To view the schema, click [] next to text Edit schema and select View schema.

3. Use the metadata to configure a second component


a. As an alternative method to use metadata: drag it straight from the Repository to the
Designer and choose the tFileInputDelimited component to use it.

Talend allows you to create metadata based on several parameters such as databases, SAP
connections, and several file types.

Note: To illustrate this, MySQL Workbench 6.3 CE along with a test dataset called talend_dq is
used. You can either try it with a similar configuration or with your own databases.

4. Create a database connection and define it as metadata


a. In the Project Repository, under Metadata, right-click Db Connections, and click Create
connection.
b. In the Name field of the Database Connection wizard, enter MySql and click Next.
c. In the DB type field, select MySQL.
d. Fill in the connection parameters.
e. To check the connection to the database, click Check.
f. To close the wizard and create the metadata, click Finish.
g. To automatically retrieve all the table schemas, right-click the MySql 0.1 metadata in the
Project Repository and then click Retrieve Schema.
h. Click Next.

The connected database is displayed.

i. To select all the tables and views, select the checkbox to the left of the database name
and click Next.

The database and all tables and details are displayed.

All table schemas have been imported as metadata and can be used.
The tables and the views appear under the mysql 0.1 connection in the Project Repository. To
view the field in a table, click the table.

5. Read a database table using the metadata


a. To read a table from the list, select the table and drop it on the Job Designer.

Talend takes the complexity out of integration


Based on open source Scalable Future-proof Predictable cost
Visit www.talend.com Follow us on Twitter @Talend
Talend Tutorial Task Aid >

b. In the Components window, click tMySqlInput and click OK.

A tMysqlInput component is created with the repository information. It used the MySql 0.1
connection, and for the Schema it used the repository information from the metadata table
tdq_values.

In addition, Talend generates the SQL query and sends it to the table tdq_values.

c. To display the table data, add the tLogRow component and link the tdq_values component
to the tLogRow_1 component.
d. To run the Job, in the Run view, click Run.

The data from the table tdq_values is displayed.

Talend takes the complexity out of integration


Based on open source Scalable Future-proof Predictable cost
Visit www.talend.com Follow us on Twitter @Talend

Vous aimerez peut-être aussi