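JOINS vs. FOR ALL ENTRIES - Which Performs Better?
Posted by Rob Burbank in ABAP Testing and Troubleshooting on Mar 19, 2007 1:46:48 PM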

In earlier blogs, I looked at various performance tuning techniques and tried to identify some that are more important
than others. In this blog, I want to look at different ways to construct a simple SELECT statement.
A very common question in the ABAP forum is "Which is better: a JOIN or FOR ALL ENTRIES?" I've
written a program that compares six different ways of constructing a SELECT statement: a simple, fully qualified
SELECT; a nested SELECT; a SELECT using FOR ALL ENTRIES; a SELECT using an INNER JOIN; a SELECT
using an OUTER JOIN; and a SELECT using a sub-query. All of these SELECTs are fully qualified in the sense that
they use all fields of the primary key. For comparison, I've also added a SELECT that doesn't fully use the primary key.
This task was made more difficult by the fact that it's not really easy to compare a JOIN with a sub-query. A JOIN
assumes that you want the data from more than one table. A sub-query assumes that you need data only from the
main table. So the SELECT statements that I have constructed are quite simple and, in some cases, not practical.
They are just for comparison purposes.
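To make the JOIN versus sub-query distinction concrete, here is a minimal sketch of what a sub-query form can look like. It is not the routine from the test program; the report name and the gs_, gt_, gv_ and ty_ names are assumptions made for illustration. T001 is used only to filter, and no T001 column reaches the result table.

```abap
REPORT zsketch_sub_query.

* Hypothetical sketch: the report name and the gs_/gt_/gv_/ty_ names
* are illustrative only and do not come from the original program.
DATA: gs_bkpf TYPE bkpf.

SELECT-OPTIONS: s_bukrs FOR gs_bkpf-bukrs,
                s_belnr FOR gs_bkpf-belnr.
PARAMETERS:     p_gjahr TYPE bkpf-gjahr.

TYPES: BEGIN OF ty_doc,
         bukrs TYPE bkpf-bukrs,
         belnr TYPE bkpf-belnr,
         gjahr TYPE bkpf-gjahr,
       END OF ty_doc.

DATA: gt_doc   TYPE TABLE OF ty_doc,
      gv_lines TYPE i.

START-OF-SELECTION.

  " Only BKPF columns are returned. T001 appears solely inside the
  " sub-query, where it restricts the company codes.
  SELECT bukrs belnr gjahr
    FROM bkpf
    INTO TABLE gt_doc
    WHERE belnr IN s_belnr
      AND gjahr EQ p_gjahr
      AND bukrs IN ( SELECT bukrs
                       FROM t001
                       WHERE bukrs IN s_bukrs ).

  DESCRIBE TABLE gt_doc LINES gv_lines.
  WRITE: / 'Documents found:', gv_lines.
```

A JOIN on the same two tables could also return T001 columns in the same result row, which is why the two forms are hard to compare like for like.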
I've used the GET RUN TIME statement for comparison rather than the EXPLAIN function of transaction ST05 because
it's difficult to compare multiple SELECTs with single SELECTs using this function. GET RUN TIME is not perfect
either, but if you do multiple comparisons, particularly in a system with little activity, the results should be OK. I've put
all of the SELECTs used in comparisons within loops. You can adjust the number of loop passes on the selection
screen.
In any event, here is the program:
REPORT ztest_selects LINE-SIZE 80 MESSAGE-ID 00.

DATA: t001 TYPE t001,
      bkpf TYPE bkpf.

SELECT-OPTIONS: s_bukrs FOR bkpf-bukrs MEMORY ID buk OBLIGATORY.
SELECT-OPTIONS: s_belnr FOR bkpf-belnr MEMORY ID bln OBLIGATORY.
PARAMETERS:     p_gjahr LIKE bkpf-gjahr MEMORY ID gjr OBLIGATORY.
SELECTION-SCREEN ULINE.
PARAMETERS: p_loop1 TYPE i OBLIGATORY DEFAULT 5,
            p_loop2 TYPE i OBLIGATORY DEFAULT 10.

TYPES: BEGIN OF t001_type,
         bukrs TYPE t001-bukrs,
       END OF t001_type,
       BEGIN OF bkpf_type,
         bukrs TYPE bkpf-bukrs,
         belnr TYPE bkpf-belnr,
         gjahr TYPE bkpf-gjahr,
       END OF bkpf_type.

DATA: t001_int TYPE TABLE OF t001_type,
      t001_wa  TYPE t001_type,
      bkpf_int TYPE TABLE OF bkpf_type,
      bkpf_wa  TYPE bkpf_type.

DATA: start TYPE i,
      end   TYPE i,
      dif   TYPE i.

START-OF-SELECTION.

  DO p_loop1 TIMES.
    PERFORM simple_select.
    PERFORM nested_select.
    PERFORM for_all_entries.
    PERFORM inner_join.
    PERFORM outer_join.
    PERFORM sub_query.
    PERFORM unqualified_select.
    SKIP 1.
  ENDDO.

*&---------------------------------------------------------------------*
*&      Form  simple_select
*&---------------------------------------------------------------------*
* First we get documents using a select statement that is
* fully qualified on the primary key. Because buffering may be an issue,
* the first select will be disregarded in this test. However, in real
* life, this would be the important time.
*----------------------------------------------------------------------*
FORM simple_select.

* Do an initial select of the documents we intend to get. Due to
* buffering, the first select may take much longer than the next one.
  SELECT bukrs belnr gjahr
    FROM bkpf
    INTO TABLE bkpf_int
    WHERE bukrs IN s_bukrs
      AND belnr IN s_belnr
      AND gjahr EQ p_gjahr.

  IF sy-subrc <> 0.
    MESSAGE ID '00' TYPE 'E' NUMBER '001'
            WITH 'No Data meets selection criteria'.
  ENDIF.

* Next we get the same document using the same fully qualified select
* statement. We will use this in comparisons.
  GET RUN TIME FIELD start.

  DO p_loop2 TIMES.
    SELECT bukrs belnr gjahr
      FROM bkpf
      INTO TABLE bkpf_int
      WHERE bukrs IN s_bukrs
        AND belnr IN s_belnr
        AND gjahr EQ p_gjahr.
  ENDDO.

  GET RUN TIME FIELD end.
  dif = end - start.

  WRITE: /001 'Time for first SELECT (fully qualified)', 055 ':',
              dif, 'microseconds'.

ENDFORM.                    " simple_select

*&---------------------------------------------------------------------*
*&      Form  nested_select
*&---------------------------------------------------------------------*
*       text
*----------------------------------------------------------------------*
FORM nested_select.

* Use the same fully qualified SELECT, but this time
The program has two SELECT-OPTIONS and one PARAMETER for selecting data: company code, document number
and fiscal year. I ran it four different ways: with a single company code and document number; with a single company
code and a range of document numbers; with a range of company codes and a single document number; and with
ranges of both company codes and document numbers.
I ran the program a number of times in a 4.7 environment with DB2 databases. I was a bit surprised at some of the
results:
For the simple case (single company code and document number), all of the methods worked almost equally
well. The single fully qualified SELECT worked best, while the OUTER JOIN was worst. But the worst case only
added about 25% to the execution time. The nested SELECT was really no worse than the others.
With a single company code and range of document numbers, the execution times increased, but the overall
results were quite similar to the simple case with the exception that the nested SELECT added about 75% to the
execution time.
With a range of company codes either with a single or range of document numbers, the results were different:
the execution times for both the OUTER JOIN and fully qualified SELECT were dramatically higher (500 to 1000
times) than the other methods. This (to me at least) was the really surprising result. The following statement:
  SELECT bukrs belnr gjahr
    FROM bkpf
    INTO TABLE bkpf_int
    WHERE bukrs IN s_bukrs
      AND belnr IN s_belnr
      AND gjahr EQ p_gjahr.
is far less efficient than:
  SELECT t001~bukrs bkpf~belnr bkpf~gjahr
    FROM bkpf INNER JOIN t001
      ON t001~bukrs EQ bkpf~bukrs
    INTO TABLE bkpf_int
    WHERE t001~bukrs IN s_bukrs
      AND bkpf~belnr IN s_belnr
      AND bkpf~gjahr EQ p_gjahr.
when a range of company codes is used. The increase in execution time for the OUTER JOIN is probably due to the
fact that I could not use T001~BUKRS in the WHERE clause because of that limitation on OUTER JOINs.
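The OUTER JOIN routine is not part of the listing above, so the following is only a guess at its general shape, written as a sketch under that assumption. The report name and the gs_, gt_, gv_ and ty_ names are hypothetical. It shows the restriction mentioned: in the Open SQL of that release, a field of the right-hand table of a LEFT OUTER JOIN cannot be used in the WHERE clause, so the company code range has to be applied to BKPF~BUKRS.

```abap
REPORT zsketch_outer_join.

* Hypothetical sketch: this is not the author's outer_join routine.
* The report name and the gs_/gt_/gv_/ty_ names are illustrative only.
DATA: gs_bkpf TYPE bkpf.

SELECT-OPTIONS: s_bukrs FOR gs_bkpf-bukrs,
                s_belnr FOR gs_bkpf-belnr.
PARAMETERS:     p_gjahr TYPE bkpf-gjahr.

TYPES: BEGIN OF ty_doc,
         bukrs TYPE bkpf-bukrs,
         belnr TYPE bkpf-belnr,
         gjahr TYPE bkpf-gjahr,
       END OF ty_doc.

DATA: gt_doc   TYPE TABLE OF ty_doc,
      gv_lines TYPE i.

START-OF-SELECTION.

  " T001 is the right-hand table of the outer join, so its fields are
  " kept out of the WHERE clause. The company code range is applied
  " to BKPF~BUKRS instead of T001~BUKRS.
  SELECT bkpf~bukrs bkpf~belnr bkpf~gjahr
    FROM bkpf LEFT OUTER JOIN t001
      ON t001~bukrs EQ bkpf~bukrs
    INTO TABLE gt_doc
    WHERE bkpf~bukrs IN s_bukrs
      AND bkpf~belnr IN s_belnr
      AND bkpf~gjahr EQ p_gjahr.

  DESCRIBE TABLE gt_doc LINES gv_lines.
  WRITE: / 'Documents found:', gv_lines.
```

With the range applied to BKPF, the WHERE clause is the same as in the plain fully qualified SELECT, which fits the observation that those two behaved alike when a range of company codes was entered.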
In the final analysis, there is no one-size-fits-all answer to the question "Which is better: a JOIN or FOR ALL
ENTRIES?" In many, if not most, cases, my money is on the JOIN, but the difference is not large enough to spend
much time jumping through hoops to pare off the last microsecond. In the end, if you are interested in the differences
for your particular case, then you must code different SELECTs to find which is best. But then you also have to bear in
mind that the same SELECT may behave differently based on the makeup of the WHERE clause.
There are other considerations that come into play as well:
INNER JOINs only look at the intersection of the results that meet the WHERE clause.
FOR ALL ENTRIES eliminates duplicates from the results.
I find JOINs to be more time consuming to code. (I can never find the ~ key.)
When using FOR ALL ENTRIES you generally end up with at least two internal tables. This may or may not be a
good thing.
The example I have shown uses the full primary key. Some preliminary testing I have done comparing JOINs
with FOR ALL ENTRIES on secondary indexes shows that FOR ALL ENTRIES can give better performance in that case.
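For reference, a FOR ALL ENTRIES version of the same selection might look like the sketch below. It is not the routine from the test program, and the report name and the gs_, gt_, gv_ and ty_ names are assumptions. It shows the two internal tables mentioned in the list above, plus the usual guard against an empty driver table, since FOR ALL ENTRIES with an empty table ignores the WHERE clause.

```abap
REPORT zsketch_for_all_entries.

* Hypothetical sketch: this is not the author's for_all_entries
* routine. The report name and the gs_/gt_/gv_/ty_ names are
* illustrative only.
DATA: gs_bkpf TYPE bkpf.

SELECT-OPTIONS: s_bukrs FOR gs_bkpf-bukrs,
                s_belnr FOR gs_bkpf-belnr.
PARAMETERS:     p_gjahr TYPE bkpf-gjahr.

TYPES: BEGIN OF ty_t001,
         bukrs TYPE t001-bukrs,
       END OF ty_t001,
       BEGIN OF ty_doc,
         bukrs TYPE bkpf-bukrs,
         belnr TYPE bkpf-belnr,
         gjahr TYPE bkpf-gjahr,
       END OF ty_doc.

DATA: gt_t001  TYPE TABLE OF ty_t001,
      gt_doc   TYPE TABLE OF ty_doc,
      gv_lines TYPE i.

START-OF-SELECTION.

  " First internal table: the driver list of company codes.
  SELECT bukrs
    FROM t001
    INTO TABLE gt_t001
    WHERE bukrs IN s_bukrs.

  " Guard: with an empty driver table the WHERE clause of a
  " FOR ALL ENTRIES select is ignored.
  IF NOT gt_t001[] IS INITIAL.
    " Second internal table: the documents. The full primary key is
    " selected, so the duplicate elimination done by FOR ALL ENTRIES
    " cannot drop any rows here.
    SELECT bukrs belnr gjahr
      FROM bkpf
      INTO TABLE gt_doc
      FOR ALL ENTRIES IN gt_t001
      WHERE bukrs EQ gt_t001-bukrs
        AND belnr IN s_belnr
        AND gjahr EQ p_gjahr.
  ENDIF.

  DESCRIBE TABLE gt_doc LINES gv_lines.
  WRITE: / 'Documents found:', gv_lines.
```

If fewer than all key fields were selected, rows identical in the selected columns would collapse into one, which is the duplicate elimination noted in the list.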
One final thing to note: in the above program, the one SELECT that consistently underperformed was the one that did
not use the index effectively. And that is the real point here. All of the techniques that I have shown here work
reasonably effectively. The most important thing to remember is to use an index.
19 Comments
Suresh Datti Mar 19, 2007 4:17 PM
I have followed your other blogs/responses that focus on performance issues too.
One small correction to the final analysis though...
"FOR ALL ENTRIES eliminates duplicates from the results."
Not always. If you include all the key fields in your SELECT clause or use a SELECT *, the duplicates
do get picked with the FOR ALL ENTRIES option.
~Suresh
Rob Burbank Mar 20, 2007 5:17 AM (in response to Suresh Datti)
But then they're not duplicates ;)
Rob
Jason Scott Mar 20, 2007 3:33 PM (in response to Rob Burbank)
I think much more realistic results to compare the differences can be gained by
selecting much larger sets of data. Try queries that search through tables containing
millions of records...
Rob Burbank Mar 21, 2007 11:42 AM (in response to Jason Scott)
Yes, I agree. I wanted to compare as many different methods as
possible - including a sub-query. I tried but wasn't able to come up with a
combination of tables that were both large and allowed me to do
everything I wanted. I'm sure I missed something, but there you are.
As I said in response to another question, I did some testing on joins
using secondary indices that showed somewhat different results. For
this testing, I did use larger tables (EKKO, EKPO and some others). In
another blog, "Performance - what will kill you and what will leave you with
only a flesh wound", I also used larger tables.
Thanks for your comments.
Rob
Peter Inotai Mar 20, 2007 4:42 AM
Hi Rob,
Thanks for this interesting weblog!
I believe the result also depends on the DB tuning. There are several FOR ALL ENTRIES relevant
profile parameters.
It's explained in the following OSS notes:
Note 48230 - Parameters for the SELECT ... FOR ALL ENTRIES statement
Note 652634 - FOR ALL ENTRIES performance with Microsoft SQL Server
Note 634263 - Selects with FOR ALL ENTRIES as of kernel 6.10
It's worth checking these parameters before deciding which way to choose.
Best regards,
Peter
Rob Burbank Mar 20, 2007 5:15 AM (in response to Peter Inotai)
Thanks for the reply - yes, it certainly does depend on database tuning. I guess I didn't
state it overtly, but this is just a tool to help with analysis. I'm not a DBA and don't know the details
of how the database is tuned; all I can do is write a program that shows up the differences.
(Or ask a DBA, but what's the fun in that?)
Rob
Lars Breddemann Mar 20, 2007 5:19 AM
How have the join conditions been supported by indexes on db-level in your tests?
DBs heavily rely on additional structures that enable the efficient handling of Join-Selects.
I really would like to see what the join performance comparison looks like if the database has the
right indexes in place.
KR Lars
Rob Burbank Mar 20, 2007 5:27 AM (in response to Lars Breddemann)
I'm not entirely sure I understand your question. But I'll try to answer. If I'm off base, let me
know.
I only looked in this blog at the primary index. As I did some testing while doing the
research, I did some tests using secondary indices and found that FOR ALL ENTRIES was
somewhat faster than a JOIN.
If time permits, I'll try to look at this in another blog.
Rob
Joe Reddy Mar 21, 2007 9:05 PM
Hi Rob,
I understand the profound performance difference it makes when we use Joins and For All Entries.
But again the Performance depends on various other System Factors also.
Cheers,
Joseph.
Rob Burbank Mar 23, 2007 6:10 AM (in response to Joe Reddy)
Yes, it does. But this was really just an attempt to show one way to answer the question for
yourself programmatically without knowing the system factors.
Rob
Kjetil Kilhavn Aug 8, 2007 4:56 AM
I see that your program first executes the selection, and then measures the execution time of
performing it again (in the loop).
I re-arranged your PERFORMs, turning the list upside down. Result: fully qualified select was fastest
every time when I selected data for two company codes (all document numbers)
I have possibly misunderstood how you gave your inputs, but it beats me how a join with T001 can
be faster than a direct select on BKPF, unless you specify a lot of invalid company codes in the
selection criteria.
Kjetil Kilhavn Aug 8, 2007 5:35 AM (in response to Kjetil Kilhavn)
Aha! I tried again, but instead of specifying the two company codes (0010 and 0040) as two
individual entries, I specified it as a range from 0010 to 0040.
Performance got considerably worse for ALL routines except the partially qualified one. For
the partially qualified routine the results were virtually unchanged, for the others the runtime
was increased to the same level as for the partially qualified routine.
So while the first run gave the following average results for the five iterations:
Partially qualified: 3,229 seconds
Sub-query: 0,446 seconds
Outer join: 0,582 seconds
Inner join: 0,444 seconds
FOR ALL ENTRIES: 0,466 seconds
Nested: 0,562 seconds
Fully qualified: 0,436 seconds
The second run gave the following average results:
The second run includes four company codes instead of two, but it surprised me that the
performance was now almost the same in all cases. So I tried specifying a range of 0010
to 0040 and excluding the two company codes 0020 and 0030.
Finally I tried specifying the four company codes 0010, 0020, 0030, and 0040 as four
individual entries. This should compare to the second case (same data at least).
Why is the fully qualified select four times as fast when specifying the company codes
individually rather than as a range?
What to make of it? I am not really sure... except to support the statement that there is no
step-by-step recipe you can follow to improve performance.
If I am to draw one (small) conclusion it is that fully qualified selects are much less robust
to different specifications of ranges than a join or sub-query or FOR ALL ENTRIES.
Rob Burbank Aug 8, 2007 11:59 AM (in response to Kjetil Kilhavn)
The direct select on BKPF was first without any company codes.
So I used a technique that I showed in an earlier BLOG:
Using an Index When You Don't Have all of the Fields
If you don't know one of the leftmost key fields, it turns out to be faster to use all possible
entries in the SELECT rather than just leave it out of the WHERE.
As for the order of doing the SELECTS, that's why I put them in a DO that can be executed
multiple times. That way each SELECT both comes before and after every other one.
Thanks for taking the time to comment.
Rob
Siegfried Boes Jan 31, 2008 1:02 AM
Hi Rob,
I have played a bit with your program just now, and I must say that I find the set-up a bit special. The
number of found records should be much larger and two cases should be handled.
Many joins work in a way that the conditions on table A give - let's say - 1000 records and the
conditions on table B 2000 records, but the inner join is fulfilled only by an intersection of the two sets,
containing 50 records. In these cases it is quite obvious that a join is much faster than a FOR ALL
ENTRIES.
FOR ALL ENTRIES makes sense if the first select gives 1000 records and every further select adds
information to the 1000 records. FOR ALL ENTRIES is perfect if the SELECTs are not close together
because there is processing in between. If the SELECTs come close together then a JOIN would
also be an option. Be aware that the join can put the information of all tables into one internal table
with the results. The FOR ALL ENTRIES cannot do that (not yet). There the internal tables must be
merged separately, and if no BINARY SEARCH is used then it is definitely slower.
Siegfried
Rob Burbank Jan 31, 2008 2:43 PM (in response to Siegfried Boes)
Siegfried - I mostly agree, but the real point here is that the performance gains in either
case are small. You will likely not be able to cut the execution time of a
SELECT in half by using one method over another.
I generally find FOR ALL ENTRIES to be easier to use and I mostly use that.
But if you look at the forums (and I know you do), you'd think that the most important
performance tuning technique is to use SELECT ... INTO CORRESPONDING FIELDS OF...
(or is it to avoid using that) and some other things that don't much matter.
But if you want to reduce execution times by 1/20 or 1/30, you have to look at other things.
You might also want to look at
http://blogs.ittoolbox.com/sap/db2/archives/for-all-entries-vs-db2-join-8912
Rob
Jay Dalwadi Jul 4, 2013 9:43 AM (in response to Rob Burbank)
Sir, can you tell me which is better for performance tuning? I am still confused about FOR
ALL ENTRIES versus INNER JOIN because Mr. Matthew Billingham told me that FOR ALL ENTRIES
is LESS efficient than an inner join.
Matthew Billingham Jul 3, 2013 5:34 PM (in response to Jay Dalwadi)
Note, Rob's post was made five years ago. Things have moved on since
then. See these:

http://scn.sap.com/thread/3370614
and
http://scn.sap.com/message/13134098#13134098
Peter Inotai Jul 4, 2013 11:46 AM (in response to Matthew Billingham)
And since it also depends on your DB and DB tuning, for
example after HANA conversion it might also behave differently.
There is some info about HANA-related tuning here
(Performance Check 1.2: Search SELECT .. FOR ALL ENTRIES-clauses to be transformed):
http://scn.sap.com/community/abap/hana/blog/2013/06/05/abap-on-hana--from-analysis-to-optimization
Manuel Collet-Beillon Jul 22, 2013 10:40 AM
Hi Rob,
Thanks for sharing. Very interesting.
Best regards, Manuel
