In earlier blogs, I looked at various performance tuning techniques and tried to identify some that are more important than others. In this blog, I want to look at different ways to construct a simple SELECT statement. A very common question in the ABAP forum is: which is better, a JOIN or FOR ALL ENTRIES?

I've written a program that compares six different ways of constructing a SELECT statement: a simple, fully qualified SELECT; a nested SELECT; a SELECT using FOR ALL ENTRIES; a SELECT using an INNER JOIN; a SELECT using an OUTER JOIN; and a SELECT using a sub-query. All of these SELECTs are fully qualified in the sense that they use all fields of the primary key. For comparison, I've also added a SELECT that doesn't fully use the primary key.

This task was made more difficult by the fact that it's not really easy to compare a JOIN with a sub-query. A JOIN assumes that you want the data from more than one table. A sub-query assumes that you need data only from the main table. So the SELECT statements that I have constructed are quite simple and, in some cases, not practical. They are just for comparison purposes.

I've used the GET RUN TIME statement for comparison rather than the EXPLAIN function of transaction ST05 because it's difficult to compare multiple SELECTs with single SELECTs using that function. GET RUN TIME is not perfect either, but if you do multiple comparisons, particularly in a system with little activity, the results should be OK. I've put all of the SELECTs used in comparisons within loops. You can adjust the number of loop passes on the selection screen.
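For readers who have not used FOR ALL ENTRIES, the following is a minimal sketch of that variant, timed with GET RUN TIME inside a loop as described above. It is not the test program from this blog: the report name, the ty_, lt_, lv_ and gv_ names and the p_loops parameter are invented for illustration, and it assumes only that the standard tables T001 and BKPF exist in the system.

```abap
REPORT zsketch_fae_timing.

* Hypothetical names throughout; only T001 and BKPF come from the blog.
TYPES: BEGIN OF ty_bukrs,
         bukrs TYPE t001-bukrs,
       END OF ty_bukrs,
       BEGIN OF ty_doc,
         bukrs TYPE bkpf-bukrs,
         belnr TYPE bkpf-belnr,
         gjahr TYPE bkpf-gjahr,
       END OF ty_doc.

DATA: lt_bukrs TYPE TABLE OF ty_bukrs,
      lt_docs  TYPE TABLE OF ty_doc,
      gv_belnr TYPE bkpf-belnr,
      lv_start TYPE i,
      lv_end   TYPE i,
      lv_dif   TYPE i.

SELECT-OPTIONS: s_belnr FOR gv_belnr OBLIGATORY.
PARAMETERS: p_gjahr TYPE bkpf-gjahr OBLIGATORY,
            p_loops TYPE i DEFAULT 10.

START-OF-SELECTION.
* Driver table: the company codes to look up in BKPF.
  SELECT bukrs FROM t001 INTO TABLE lt_bukrs.

* If the driver table is empty, FOR ALL ENTRIES ignores the whole
* WHERE clause and reads every row, so guard against that case.
  IF lt_bukrs IS INITIAL.
    WRITE / 'No company codes found'.
    RETURN.
  ENDIF.

* Time several passes of the same SELECT, in microseconds.
  GET RUN TIME FIELD lv_start.
  DO p_loops TIMES.
    SELECT bukrs belnr gjahr
      FROM bkpf
      INTO TABLE lt_docs
      FOR ALL ENTRIES IN lt_bukrs
      WHERE bukrs = lt_bukrs-bukrs
        AND belnr IN s_belnr
        AND gjahr = p_gjahr.
  ENDDO.
  GET RUN TIME FIELD lv_end.

  lv_dif = lv_end - lv_start.
  WRITE: / 'FOR ALL ENTRIES:', lv_dif, 'microseconds'.
```

The empty-table check is the one design choice worth noting: without it, an empty driver table silently turns the statement into an unrestricted read of BKPF, which would distort any timing comparison.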
In any event, here is the program:

REPORT ztest_selects LINE-SIZE 80 MESSAGE-ID 00.

DATA: t001 TYPE t001,
      bkpf TYPE bkpf.

SELECT-OPTIONS: s_bukrs FOR bkpf-bukrs MEMORY ID buk OBLIGATORY.
SELECT-OPTIONS: s_belnr FOR bkpf-belnr MEMORY ID bln OBLIGATORY.
PARAMETERS:     p_gjahr LIKE bkpf-gjahr MEMORY ID gjr OBLIGATORY.
SELECTION-SCREEN ULINE.
PARAMETERS: p_loop1 TYPE i OBLIGATORY DEFAULT 5,
            p_loop2 TYPE i OBLIGATORY DEFAULT 10.

TYPES: BEGIN OF t001_type,
         bukrs TYPE t001-bukrs,
       END OF t001_type,
       BEGIN OF bkpf_type,
         bukrs TYPE bkpf-bukrs,
         belnr TYPE bkpf-belnr,
         gjahr TYPE bkpf-gjahr,
       END OF bkpf_type.

DATA: t001_int TYPE TABLE OF t001_type,
      t001_wa  TYPE t001_type,
      bkpf_int TYPE TABLE OF bkpf_type,
      bkpf_wa  TYPE bkpf_type.

DATA: start TYPE i,
      end   TYPE i,
      dif   TYPE i.

START-OF-SELECTION.
  DO p_loop1 TIMES.
    PERFORM simple_select.
    PERFORM nested_select.
    PERFORM for_all_entries.
    PERFORM inner_join.
    PERFORM outer_join.
    PERFORM sub_query.
    PERFORM unqualified_select.
    SKIP 1.
  ENDDO.

*&---------------------------------------------------------------------*
*&      Form  simple_select
*&---------------------------------------------------------------------*
*       First we get documents using a select statement that is
*       fully qualified on the primary key. Because buffering may be
*       an issue, the first select will be disregarded in this test.
*       However, in real life, this would be the important time.
*----------------------------------------------------------------------*
FORM simple_select.

* Do an initial select of the documents we intend to get. Due to
* buffering, the first select may take much longer than the next one.
  SELECT bukrs belnr gjahr
    FROM bkpf
    INTO TABLE bkpf_int
    WHERE bukrs IN s_bukrs
      AND belnr IN s_belnr
      AND gjahr EQ p_gjahr.

  IF sy-subrc <> 0.
    MESSAGE ID '00' TYPE 'E' NUMBER '001'
            WITH 'No Data meets selection criteria'.
  ENDIF.

* Next we get the same documents using the same fully qualified select
* statement. We will use this in comparisons.
  GET RUN TIME FIELD start.
  DO p_loop2 TIMES.
    SELECT bukrs belnr gjahr
      FROM bkpf
      INTO TABLE bkpf_int
      WHERE bukrs IN s_bukrs
        AND belnr IN s_belnr
        AND gjahr EQ p_gjahr.
  ENDDO.
  GET RUN TIME FIELD end.

  dif = end - start.
  WRITE: /001 'Time for first SELECT (fully qualified)',
          055 ':', dif, 'microseconds'.

ENDFORM.                    " simple_select

*&---------------------------------------------------------------------*
*&      Form  nested_select
*&---------------------------------------------------------------------*
*       text
*----------------------------------------------------------------------*
FORM nested_select.

* Use the same fully qualified SELECT, but this time

JOINS vs. FOR ALL ENTRIES - Which Performs Better?
Posted by Rob Burbank in ABAP Testing and Troubleshooting on Mar 19, 2007 1:46:48 PM

The program has two SELECT-OPTIONS and one PARAMETER for selecting data: company code, document number and fiscal year. I ran it four different ways: with a single company code and document number; with a single company code and a range of document numbers; with a range of company codes and a single document number; and with ranges of both company codes and document numbers.

I ran the program a number of times in a 4.7 environment with DB2 databases. I was a bit surprised at some of the results:

For the simple case (single company code and document number), all of the methods worked almost equally well. The single fully qualified SELECT worked best, while the OUTER JOIN was worst. But the worst case only added about 25% to the execution time. The nested SELECT was really no worse than the others.

With a single company code and a range of document numbers, the execution times increased, but the overall results were quite similar to the simple case, with the exception that the nested SELECT added about 75% to the execution time.
With a range of company codes, either with a single document number or a range of document numbers, the results were different: the execution times for both the OUTER JOIN and the fully qualified SELECT were dramatically higher (500 to 1000 times) than for the other methods.

This (to me at least) was the really surprising result. The following statement:

SELECT bukrs belnr gjahr
  FROM bkpf
  INTO TABLE bkpf_int
  WHERE bukrs IN s_bukrs
    AND belnr IN s_belnr
    AND gjahr EQ p_gjahr.

is far less efficient than:

SELECT t001~bukrs bkpf~belnr bkpf~gjahr
  FROM bkpf INNER JOIN t001
    ON t001~bukrs EQ bkpf~bukrs
  INTO TABLE bkpf_int
  WHERE t001~bukrs IN s_bukrs
    AND bkpf~belnr IN s_belnr
    AND bkpf~gjahr EQ p_gjahr.

when a range of company codes is used. The increase in execution time for the OUTER JOIN is probably due to the fact that I could not use T001~BUKRS in the WHERE clause because of that limitation on OUTER JOINs.

In the final analysis, there is no one-size-fits-all answer to the question of which is better: a JOIN or FOR ALL ENTRIES. In many, if not most, cases my money is on the JOIN, but the difference is not large enough to spend much time jumping through hoops to pare off the last microsecond. In the end, if you are interested in the differences for your particular case, you must code different SELECTs to find which is best. But then you also have to bear in mind that the same SELECT may behave differently based on the makeup of the WHERE clause.

There are other considerations that come into play as well:

INNER JOINs only look at the intersection of the results that meet the WHERE clause.

FOR ALL ENTRIES eliminates duplicates from the results.

I find JOINs to be more time consuming to code. (I can never find the ~ key.)

When using FOR ALL ENTRIES you generally end up with at least two internal tables. This may or may not be a good thing.

The example I have shown uses the full primary key.
Some preliminary testing I have done comparing JOINs with FOR ALL ENTRIES on secondary indexes shows that FOR ALL ENTRIES can give better performance in that case.

One final thing to note: in the above program, the one SELECT that consistently underperformed was the one that did not use the index effectively. And that is the real point here. All of the techniques that I have shown here work reasonably effectively. The most important thing to remember is to use an index.

19 Comments

Suresh Datti Mar 19, 2007 4:17 PM

I have followed your other blogs/responses that focus on performance issues too. One small correction to the final analysis though: "FOR ALL ENTRIES eliminates duplicates from the results." Not always. If you include all the key fields in your SELECT clause or use a SELECT *, the duplicates do get picked up with the FOR ALL ENTRIES option.

~Suresh

Rob Burbank Mar 20, 2007 5:17 AM (in response to Suresh Datti)

But then they're not duplicates ;)

Rob

Jason Scott Mar 20, 2007 3:33 PM (in response to Rob Burbank)

I think much more realistic results for comparing the differences can be gained by selecting much larger sets of data. Try queries that search through tables containing millions of records...

Rob Burbank Mar 21, 2007 11:42 AM (in response to Jason Scott)

Yes, I agree. I wanted to compare as many different methods as possible, including a sub-query. I tried but wasn't able to come up with a combination of tables that were both large and allowed me to do everything I wanted. I'm sure I missed something, but there you are.

As I said in response to another question, I did some testing on joins using secondary indices that showed somewhat different results. For this testing, I did use larger tables (EKKO, EKPO and some others).
In another blog (Performance - what will kill you and what will leave you with only a flesh wound) I also used larger tables.

Thanks for your comments.

Rob

Peter Inotai Mar 20, 2007 4:42 AM

Hi Rob,

Thanks for this interesting weblog! I believe the result also depends on the DB tuning. There are several profile parameters relevant to FOR ALL ENTRIES. They are explained in the following OSS notes:

Note 48230 - Parameters for the SELECT ... FOR ALL ENTRIES statement
Note 652634 - FOR ALL ENTRIES performance with Microsoft SQL Server
Note 634263 - Selects with FOR ALL ENTRIES as of kernel 6.10

It's worth checking these parameters before deciding which way to choose.

Best regards,
Peter

Rob Burbank Mar 20, 2007 5:15 AM (in response to Peter Inotai)

Thanks for the reply - yes, it certainly does depend on database tuning. I guess I didn't state it overtly, but this is just a tool to help with analysis. I'm not a DBA and don't know the details of how the database is tuned; all I can do is write a program that shows up the differences. (Or ask a DBA, but what's the fun in that?)

Rob

Lars Breddemann Mar 20, 2007 5:19 AM

How have the join conditions been supported by indexes at DB level in your tests? Databases rely heavily on additional structures that enable the efficient handling of join selects. I really would like to see what the join performance comparison looks like if the database has the right indexes in place.

KR Lars

Rob Burbank Mar 20, 2007 5:27 AM (in response to Lars Breddemann)

I'm not entirely sure I understand your question. But I'll try to answer. If I'm off base, let me know.

In this blog I only looked at the primary index. While doing the research, I did some tests using secondary indices and found that FOR ALL ENTRIES was somewhat faster than a JOIN. If time permits, I'll try to look at this in another blog.
Rob

Joe Reddy Mar 21, 2007 9:05 PM

Hi Rob,

I understand the profound performance difference it makes when we use joins and FOR ALL ENTRIES. But again, the performance also depends on various other system factors.

Cheers,
Joseph.

Rob Burbank Mar 23, 2007 6:10 AM (in response to Joe Reddy)

Yes, it does. But this was really just an attempt to show one way to answer the question for yourself programmatically without knowing the system factors.

Rob

Kjetil Kilhavn Aug 8, 2007 4:56 AM

I see that your program first executes the selection, and then measures the execution time of performing it again (in the loop). I re-arranged your PERFORMs, turning the list upside down. Result: the fully qualified select was fastest every time when I selected data for two company codes (all document numbers).

I have possibly misunderstood how you gave your inputs, but it beats me how a join with T001 can be faster than a direct select on BKPF, unless you specify a lot of invalid company codes in the selection criteria.

Kjetil Kilhavn Aug 8, 2007 5:35 AM (in response to Kjetil Kilhavn)

Aha! I tried again, but instead of specifying the two company codes (0010 and 0040) as two individual entries, I specified them as a range from 0010 to 0040. Performance got considerably worse for ALL routines except the partially qualified one. For the partially qualified routine the results were virtually unchanged; for the others the runtime increased to the same level as for the partially qualified routine.
So while the first run gave the following average results for the five iterations:

Partially qualified: 3,229 seconds
Sub-query: 0,446 seconds
Outer join: 0,582 seconds
Inner join: 0,444 seconds
FOR ALL ENTRIES: 0,466 seconds
Nested: 0,562 seconds
Fully qualified: 0,436 seconds

the second run gave the following average results:

Partially qualified: 3,214 seconds
Sub-query: 3,252 seconds
Outer join: 4,159 seconds
Inner join: 3,238 seconds
FOR ALL ENTRIES: 2,721 seconds
Nested: 3,227 seconds
Fully qualified: 3,173 seconds

The second run includes four company codes instead of two, but it surprised me that the performance was now almost the same in all cases. So I tried specifying a range of 0010 to 0040 and excluding the two company codes 0020 and 0030:

Partially qualified: 3,225 seconds
Sub-query: 0,572 seconds
Outer join: 2,818 seconds
Inner join: 0,564 seconds
FOR ALL ENTRIES: 0,467 seconds
Nested: 0,562 seconds
Fully qualified: 2,617 seconds

Finally I tried specifying the four company codes 0010, 0020, 0030, and 0040 as four individual entries. This should compare to the second case (same data at least):

Partially qualified: 3,218 seconds
Sub-query: 2,539 seconds
Outer join: 3,180 seconds
Inner join: 3,273 seconds
FOR ALL ENTRIES: 2,712 seconds
Nested: 3,223 seconds
Fully qualified: 2,507 seconds

Why is the fully qualified select four times as fast when specifying the company codes individually rather than as a range?

What to make of it? I am not really sure, except that it supports the statement that there is no step-by-step recipe you can follow to improve performance. If I am to draw one (small) conclusion, it is that fully qualified selects are much less robust to different specifications of ranges than a join, a sub-query or FOR ALL ENTRIES.

Rob Burbank Aug 8, 2007 11:59 AM (in response to Kjetil Kilhavn)

The direct select on BKPF was first without any company codes.
So I used a technique that I showed in an earlier blog, Using an Index When You Don't Have all of the Fields: if you don't know one of the leftmost key fields, it turns out to be faster to use all possible entries in the SELECT rather than just leave it out of the WHERE.

As for the order of doing the SELECTs, that's why I put them in a DO that can be executed multiple times. That way each SELECT comes both before and after every other one.

Thanks for taking the time to comment.

Rob

Siegfried Boes Jan 31, 2008 1:02 AM

Hi Rob,

I have played a bit with your program just now, and I must say that I find the set-up a bit special. The number of records found should be much larger, and two cases should be handled.

Many joins work in such a way that the conditions on table A give, let's say, 1000 records and the conditions on table B give 2000 records, but the inner join is fulfilled only by an intersection of the two sets, containing 50 records. In these cases it is quite obvious that a join is much faster than FOR ALL ENTRIES.

FOR ALL ENTRIES makes sense if the first select gives 1000 records and every further select adds information to those 1000 records. FOR ALL ENTRIES is perfect if the SELECTs are not close together because there is processing in between. If the SELECTs come close together, then a JOIN would also be an option.

Be aware that the join can put the information from all tables into one internal table with the results. FOR ALL ENTRIES cannot do that (not yet). There the internal tables must be merged separately, and if no BINARY SEARCH is used then it is definitely slower.

Siegfried

Rob Burbank Jan 31, 2008 2:43 PM (in response to Siegfried Boes)

Siegfried - I mostly agree, but the real point here is that the performance gains in either case are small. You will likely not be able to cut the execution time of a SELECT in half by using one method over another. I generally find FOR ALL ENTRIES to be easier to use and I mostly use that.
But if you look at the forums (and I know you do), you'd think that the most important performance tuning technique is to use SELECT ... INTO CORRESPONDING FIELDS OF... (or is it to avoid using that?) and some other things that don't much matter. But if you want to reduce execution times by 1/20 or 1/30, you have to look at other things.

You might also want to look at http://blogs.ittoolbox.com/sap/db2/archives/for-all-entries-vs-db2-join-8912

Rob

Jay Dalwadi Jul 4, 2013 9:43 AM (in response to Rob Burbank)

Sir, can you tell me which is better for performance tuning? I am still confused about FOR ALL ENTRIES versus INNER JOIN, because Mr. Matthew Billingham told me that FOR ALL ENTRIES is LESS efficient than an inner join.

Matthew Billingham Jul 3, 2013 5:34 PM (in response to Jay Dalwadi)

Note, Rob's post was made five years ago. Things have moved on since then. See these:
http://scn.sap.com/thread/3370614 and http://scn.sap.com/message/13134098#13134098

Peter Inotai Jul 4, 2013 11:46 AM (in response to Matthew Billingham)

And since it also depends on your DB and DB tuning, it might also behave differently after, for example, a HANA conversion. There is some info about HANA-related tuning here (Performance Check 1.2: Search SELECT .. FOR ALL ENTRIES-clauses to be transformed): http://scn.sap.com/community/abap/hana/blog/2013/06/05/abap-on-hana--from-analysis-to-optimization

Manuel Collet-Beillon Jul 22, 2013 10:40 AM

Hi Rob,