This article is taken from the book Spring Batch in Action. The authors show how to configure
the Spring Batch flat file ItemReader and how to write your own ItemWriter to handle writing
products to the database.
Get 35% off any version of Spring Batch in Action with the checkout code fcc35.
Offer is only valid through www.manning.com.
Reading and writing the product catalog is at the core of our batch process, and reading and writing is Spring
Batch's sweet spot: for the import product process you only have to configure one Spring Batch component to read
the content of the flat file, implement a simple interface for the writing component, and create a configuration file
for Spring Batch to handle the batch execution flow. Table 1 lists the Spring Batch features introduced by the
import catalog batch process.
Table 1 Spring Batch features introduced by the import catalog batch process

Feature      Description
Validation   Validating job parameters to immediately stop a job if some of the inputs are invalid
Skipping     Skipping invalid records instead of failing the whole process

For Source Code, Sample Chapters, the Author Forum and other resources, go to
http://www.manning.com/templier/
Let’s start by using Spring Batch to implement the read-write use case.
Figure 1 In read-write scenarios, Spring Batch uses chunk processing. Spring Batch reads items one by one from an ItemReader,
collects the items in a chunk of a given size, and sends that chunk to an ItemWriter.
Spring Batch handles read-write scenarios by managing an ItemReader and an ItemWriter. Spring Batch
collects items one at a time from the item reader into a chunk, where the chunk size is configurable. Spring Batch
then sends the chunk to the item writer and goes back to using the item reader to create another chunk, and so on
until the input is exhausted. This is what we call chunk processing.
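The chunk loop described above can be sketched in plain Java. The interface and class names below are simplified stand-ins (the real Spring Batch implementation also handles transactions, restartability, and error handling around this loop):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ChunkLoopSketch {

    // Simplified stand-ins for the Spring Batch item interfaces
    interface ItemReader<T> { T read() throws Exception; }
    interface ItemWriter<T> { void write(List<? extends T> items) throws Exception; }

    // Reads items one by one from an in-memory list; returns null when the input is exhausted
    static class ListReader implements ItemReader<String> {
        private final Iterator<String> it =
                Arrays.asList("a", "b", "c", "d", "e").iterator();
        public String read() { return it.hasNext() ? it.next() : null; }
    }

    // Records every chunk it receives so the chunking can be inspected
    static class CollectingWriter implements ItemWriter<String> {
        final List<List<String>> chunks = new ArrayList<>();
        public void write(List<? extends String> items) {
            chunks.add(new ArrayList<>(items));
        }
    }

    // The chunk loop: fill a chunk item by item, hand it to the writer, repeat
    static <T> void run(ItemReader<T> reader, ItemWriter<T> writer, int chunkSize)
            throws Exception {
        List<T> chunk = new ArrayList<>();
        T item;
        while ((item = reader.read()) != null) {
            chunk.add(item);
            if (chunk.size() == chunkSize) {
                writer.write(chunk);
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            writer.write(chunk); // last, possibly partial, chunk
        }
    }

    public static void main(String[] args) throws Exception {
        CollectingWriter writer = new CollectingWriter();
        run(new ListReader(), writer, 2);
        System.out.println(writer.chunks); // [[a, b], [c, d], [e]]
    }
}
```

With five items and a chunk size of 2, the writer receives three chunks; the last one holds the single remaining item.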
CHUNK PROCESSING
Chunk processing is particularly well suited to large data operations because items are handled in small
chunks instead of being processed all at once. Practically speaking, a large file won't be loaded into memory;
instead, it will be streamed, which is more efficient in terms of memory consumption. Chunk processing allows
more flexibility in managing the data flow in a job. Spring Batch also handles transactions and errors around
read and write operations.
Speaking of flexibility, Spring Batch provides an optional processing step to chunk processing: items read can be
processed (transformed) before being sent to the ItemWriter. The ability to process an item is useful when you
don’t want to write an item as is. The component that handles this transformation is an implementation of the
ItemProcessor interface. Because item processing in Spring Batch is optional, the illustration of chunk
processing shown in figure 1 is still valid. Figure 2 illustrates chunk processing combined with item processing.
Figure 2 Chunk processing combined with item processing, where an item processor can transform input items before calling the
item writer
What can be done in an ItemProcessor? You can perform any transformation you need on each item before it is
sent to the ItemWriter. This is where we implement the logic to transform the data from the input format into
the format expected by the target system. Spring Batch also lets you validate and filter an input item: if the
ItemProcessor's process method returns null, processing for that item stops, the item never reaches the
writer, and so it will not be inserted into the database.
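The return-null-to-filter contract can be illustrated with a small self-contained sketch. The processor and driving loop below are hypothetical stand-ins, not Spring Batch classes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilteringProcessorSketch {

    // Simplified stand-in for the ItemProcessor interface
    interface ItemProcessor<T, O> { O process(T item) throws Exception; }

    // Hypothetical processor: trims and upper-cases a name, and filters out
    // blank entries by returning null
    static class NameCleaningProcessor implements ItemProcessor<String, String> {
        public String process(String item) {
            String cleaned = item.trim();
            return cleaned.isEmpty() ? null : cleaned.toUpperCase();
        }
    }

    public static void main(String[] args) throws Exception {
        ItemProcessor<String, String> processor = new NameCleaningProcessor();
        List<String> written = new ArrayList<>();
        for (String raw : Arrays.asList(" nokia ", "   ", "sony")) {
            String processed = processor.process(raw);
            if (processed != null) {    // Spring Batch applies the same rule:
                written.add(processed); // a null result means "skip this item"
            }
        }
        System.out.println(written); // [NOKIA, SONY]
    }
}
```

The blank entry is dropped: the processor returns null for it, so it never reaches the writing step.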
NOTE
Our read-write use case doesn’t have an item processing step.
Listing 1 shows the definition of the chunk processing interfaces ItemReader, ItemProcessor, and
ItemWriter.

package org.springframework.batch.item;                 #A

public interface ItemReader<T> {

  T read() throws Exception, UnexpectedInputException,
      ParseException;

}

package org.springframework.batch.item;                 #B

public interface ItemProcessor<T, O> {

  O process(T item) throws Exception;

}

package org.springframework.batch.item;                 #C

import java.util.List;

public interface ItemWriter<O> {

  void write(List<? extends O> items) throws Exception;

}

#A Reads item
#B Transforms item (optional)
#C Writes a chunk of items
Note that all interfaces listed above use Java generics, meaning that a chain composed of a reader, processor, and
writer will be type compatible, as shown in figure 3.
Figure 3 Because the item interfaces (reader, processor, and writer) are parameterized using generics, they will be compatible when
composed into a chunk processing chain.
In the next two sub-sections, we’ll see how to configure the Spring Batch flat file ItemReader and how to write
our own ItemWriter to handle writing products to the database.
PRODUCT_ID,NAME,DESCRIPTION,PRICE
PR....210,BlackBerry 8100 Pearl,A cell phone,124.60
PR....211,Sony Ericsson W810i,Yet another cell phone!,139.45
PR....212,Samsung MM-A900M Ace,A cell phone,97.80
PR....213,Toshiba M285-E 14,A cell phone,166.20
PR....214,Nokia 2610 Phone,A cell phone,145.50
You may recognize this format as comma-separated values (CSV). There is nothing out of the ordinary in this flat
file: for a given row, the format separates each column value from the next with a comma. Spring Batch will map
each row in the flat file to a Product domain object.
package com.manning.sbia.ch02.domain;
import java.math.BigDecimal;
(...)
NOTE
We use a BigDecimal for the product price because the Java float and double primitive types are not well suited
to monetary calculations. For example, it's impossible to represent 0.1 exactly in binary floating point.
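A quick demonstration of why this matters for prices:

```java
import java.math.BigDecimal;

public class MoneySketch {
    public static void main(String[] args) {
        // double arithmetic accumulates binary rounding error
        System.out.println(0.1 + 0.2); // 0.30000000000000004

        // BigDecimal, built from the String "0.1", stays exact
        System.out.println(new BigDecimal("0.1")
                .add(new BigDecimal("0.2"))); // 0.3
    }
}
```

Note that BigDecimal should be constructed from a String (as the FieldSet does for us): `new BigDecimal(0.1)` would bake the double's rounding error into the value.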
Let’s now use the FlatFileItemReader to create Product objects out of the flat file.
Figure 4 The FlatFileItemReader reads the flat file and delegates the mapping between a line and a domain object to a LineMapper.
The LineMapper implementation we use delegates the splitting of lines and the mapping between split lines and domain objects.
That’s a lot of delegation and it means you’ll have more to configure, but this is the price of reusability and
flexibility. You’ll be able to configure and use built-in Spring Batch components or provide your own
implementations to do something more specific.
The DefaultLineMapper is a typical example; it needs:
- A LineTokenizer to split a line into fields. You'll use a stock Spring Batch implementation for this.
- A FieldSetMapper to create a domain object from the split line. You'll write your own implementation for this.
We’ll soon see in listing 2 the whole Spring configuration (LineTokenizer is of particular interest) but, next, we
are going to focus on our FieldSetMapper implementation to create Product domain objects.
The FieldSetMapper interface is straightforward:

package org.springframework.batch.item.file.mapping;

public interface FieldSetMapper<T> {

  T mapFieldSet(FieldSet fieldSet) throws BindException;

}
The FieldSet parameter comes from the LineTokenizer. You can think of it as an equivalent to the JDBC
ResultSet: it retrieves field values and does some conversion work between String objects and richer objects
like BigDecimal. The following snippet shows our ProductFieldSetMapper implementation:
package com.manning.sbia.ch02.batch;

import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.validation.BindException;

import com.manning.sbia.ch02.domain.Product;

public class ProductFieldSetMapper implements FieldSetMapper<Product> {

  public Product mapFieldSet(FieldSet fieldSet) throws BindException {
    Product product = new Product();
    product.setId(fieldSet.readString("PRODUCT_ID"));
    product.setName(fieldSet.readString("NAME"));
    product.setDescription(fieldSet.readString("DESCRIPTION"));
    product.setPrice(fieldSet.readBigDecimal("PRICE"));
    return product;
  }

}
Our ProductFieldSetMapper implementation is not rocket science, and that's exactly the point: it focuses on
retrieving the data from the flat file and converting the values into Product domain objects. We leave it to
Spring Batch to deal with all of the I/O plumbing and read the flat file efficiently. You'll notice in the
mapFieldSet method the String literals PRODUCT_ID, NAME, DESCRIPTION, and PRICE. Where do these references
come from? They're part of the LineTokenizer configuration, so let's study the Spring configuration for our
FlatFileItemReader.
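The reader configuration the numbered callouts below refer to is not included in this excerpt. A sketch of what the wiring looks like follows; the bean id and the input file location are assumptions for illustration:

```xml
<bean id="reader"
      class="org.springframework.batch.item.file.FlatFileItemReader">
  <property name="resource" value="file:./input/products.txt" />    <!-- #1 -->
  <property name="linesToSkip" value="1" />                         <!-- #2 -->
  <property name="lineMapper">
    <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
      <property name="lineTokenizer">                               <!-- #3 -->
        <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
          <property name="names"                                    
                    value="PRODUCT_ID,NAME,DESCRIPTION,PRICE" />    <!-- #4 -->
        </bean>
      </property>
      <property name="fieldSetMapper">                              <!-- #5 -->
        <bean class="com.manning.sbia.ch02.batch.ProductFieldSetMapper" />
      </property>
    </bean>
  </property>
</bean>
```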
In the above example, the resource property (#1) defines the input file. Because the first line of the input file
contains headers, we ask Spring Batch to skip this line by setting the property linesToSkip (#2) to 1. At #3, we
use a DelimitedLineTokenizer to split each input line into fields; Spring Batch uses a comma as the default
separator. Then, at #4, we define the name of each field. These are the names we use in the
ProductFieldSetMapper class to retrieve values from the FieldSet. Finally, at #5, we inject an instance of
our ProductFieldSetMapper into the DefaultLineMapper.
That’s it; your flat file reader is ready! Don’t feel overwhelmed because flat file support in Spring Batch uses
many components—that’s what makes it powerful and flexible. Next up, in order to implement the database item
writer, you will need to do less configuration work but more Java coding. Let’s dig in.
package com.manning.sbia.ch02.batch;

import java.util.List;

import javax.sql.DataSource;

import org.springframework.batch.item.ItemWriter;
import org.springframework.jdbc.core.simple.SimpleJdbcTemplate;

import com.manning.sbia.ch02.domain.Product;

public class ProductJdbcItemWriter implements ItemWriter<Product> {

  private static final String UPDATE_PRODUCT =
    "update product set name=?, description=?, price=? where id=?";

  private static final String INSERT_PRODUCT =
    "insert into product (id, name, description, price) values (?,?,?,?)";

  private SimpleJdbcTemplate jdbcTemplate;

  public ProductJdbcItemWriter(DataSource dataSource) {           // #1
    this.jdbcTemplate = new SimpleJdbcTemplate(dataSource);
  }

  public void write(List<? extends Product> items) throws Exception {
    for (Product product : items) {
      int updated = jdbcTemplate.update(                          // #2
        UPDATE_PRODUCT,
        product.getName(), product.getDescription(),
        product.getPrice(), product.getId());
      if (updated == 0) {                                         // #3
        jdbcTemplate.update(
          INSERT_PRODUCT,
          product.getId(), product.getName(),
          product.getDescription(), product.getPrice());
      }
    }
  }

}
The ProductJdbcItemWriter uses Spring's SimpleJdbcTemplate to interact with the database. The
SimpleJdbcTemplate is created from the DataSource injected in the constructor (#1). In the write method, we
iterate over a chunk of products and first try to update an existing record (#2). If the database tells us the
update statement didn't update any record, we know the record doesn't exist and we can insert it (#3).
That’s it for the implementation! Notice how simple it was to implement this ItemWriter because Spring Batch
handles getting records from the ItemReader, creating chunks, opening transactions, and so on. Next, let’s
configure the database item writer.
<bean id="writer"
class="com.manning.sbia.ch02.batch.ProductJdbcItemWriter">
<constructor-arg ref="dataSource" />
</bean>
Now that we have created the two parts of our read-write step, we can assemble them in a Spring Batch job.
<job id="importProducts" #1
xmlns="http://www.springframework.org/schema/batch"> #1
<step id="readWriteProducts">
<tasklet>
<chunk reader="reader" writer="writer" #2
commit-interval="100" /> #3
</tasklet>
</step>
</job>
</beans>
#1 Starts job configuration
#2 Configures chunk processing with reader and writer
#3 Sets commit interval
The configuration file starts with the usual declaration of the namespaces and associated prefixes we want to use:
the Spring namespace and Spring Batch namespace with the batch prefix. The Spring namespace is declared as
the default namespace, so we don’t need to use a prefix to use its elements. Unfortunately, this is not convenient
for the overall configuration because we’ll have to use the batch prefix for the batch job elements. In order to
make the configuration more readable, we use a workaround in XML: when we start the job configuration XML
element at #1, we specify a default XML namespace as an attribute of the job element. The scope of this new
default namespace is the job element and its child elements.
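The namespace declarations described above might look like the following at the top of the configuration file; the schema versions are assumptions for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:batch="http://www.springframework.org/schema/batch"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
           http://www.springframework.org/schema/batch
           http://www.springframework.org/schema/batch/spring-batch-2.1.xsd">

  <!-- reader, writer, dataSource, and job definitions go here -->

</beans>
```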
Our chunk-processing step is configured with a chunk element (#2), in a step element, itself in a tasklet
element. In the chunk element (#2), we refer to the reader and writer beans with the reader and writer
attributes, respectively. The values of these two attributes are the IDs we previously defined in the reader and
writer configuration. Finally, at #3 the commit-interval attribute is set to a chunk size of 100.
We’re done with the copy portion of our batch process.
Summary
Spring Batch performs a lot of the work on our behalf: it reads the products from the flat file and imports them into
the database. We didn’t write any code for reading the data. For the write operation, we only created the logic to
insert and update products in the database. Putting these components together is straightforward thanks to
Spring’s lightweight container and the Spring Batch XML vocabulary.