sql - Query quirks with DB2, no-op coalesce halves the runtime -
i'm working on optimizing queries db2 database. of them running fine, few rather large returned datasets (up 30k rows or so) giving me trouble. there's nothing terribly complicated these queries. select perhaps 15 different columns dozen joins, inner joins , left joins. there few in clauses reduce dataset parameters , subqueries. it's nothing special.
i've been looking @ indexes, without great deal of luck. thing is, in attempting force use of new index wasn't being utilized, stumbled upon nifty trick in db2: no-op coalesce in predicate can potentially reduce cost , runtime.
basically, works this:
select ... join b on a.id1 = b.id1 , a.id2 = b.id2 --more inner , left joins-- b.type_code in (:parameter) , a.x in (--subquery--) , coalesce(b.type_code,b.type_code) = b.type_code
the coalesce nothing logical standpoint , b.type_code never null. however, tricks db2 optimizer estimating lower number of rows, explanation here:
http://www-01.ibm.com/support/docview.wss?uid=swg21259831
as table b, type_code column has 15 different possible values , distribution not uniform. there 2.7 million rows in there , 900k rows in a. such, id1/id2 columns in b may repeat same value pair 15 times or occur once. :parameter no more 4 values.
the odd thing coalesce statement cuts query time in half. in checking explain plan, noticed several table scans on became index scans, due lower estimate of returned rows making index scan feasible optimizer. i've been digging around , trying find possible reason this. runstats run on these tables , syscat.coldist table has pretty recent data in it, @ least within 1% of values if check counts manually. did notice doesn't have of values type_code column in it, missing ones occur , omitting rare values in clause has no real effect on query time. i've tried coalesce on other columns, never same effect.
the explanation can think of stats stale, bad, outdated, etc, darned if know @ point. can there no column group stats in place, i'm not knowledgeable enough if that's going make difference.
i don't want put coalesce statement production it's bit hacky , more of band-aid on deeper problem solution in , of itself. seems me db2 overestimating number of rows returned in join, darned if know why. hoping others db2 experience might able shed light on this.
i'm wondering why have in (:parameter)
, parameter
not single value?
do better performance separate or
clauses in
?
i'm targeting in
clause because apparently can give performance issues (see: performance of "in" condition query in db2 ).
Comments
Post a Comment