Re: [E-devel] fix for that silly -mfpmath=sse weirdness
David Sharp wrote:
> On 9/25/06, The Rasterman Carsten Haitzler <email@example.com> wrote:
>> On Mon, 25 Sep 2006 10:50:09 -0700 Blake Barnett <firstname.lastname@example.org> wrote:
>> > On Sep 25, 2006, at 10:44 AM, David Sharp wrote:
>> > > amd64 users, rejoice!
>> > >
>> > > i got tired of slowing down e with -mfpmath=387 all the time, so i
>> > > finally dug in to this bug after realizing it was probably a problem
>> > > with floating point cancellation (occurs when subtracting two numbers
>> > > that are very near equal, and causes an extreme loss of precision).
>> > >
>> > > the problem is in _edje_part_recalc() when it linearly interpolates
>> > > all the part parameters. all of the calculations are of this form:
>> > > p3.x = (p1.x * (1.0 - pos)) + (p2.x * (pos));
>> > > i believe there is some cancellation occurring in the (1.0-pos) part of
>> > > this, especially as pos approaches 1.0.
>> > >
>> > > replacing the above line with the following fixes the problem:
>> > > p3.x = p1.x + (p2.x - p1.x) * pos;
>> > > mathematically equivalent, but, alas, computers aren't as good at
>> > > math as we think.
>> > Awesome! My 3500+ CPU has felt fairly sluggish in E, my video card
>> > has 128MB of video memory (ATI 9700PRO, using ATI's drivers) and I've
>> > noticed that things just aren't as fluid as in Raster's video
>> > captures. Hopefully this'll help. Nice work.
>> it won't make any difference you can measure. the amount of fp math done is
>> < 0.1% of the work - by using sse math you lose a bit of precision and maybe
>> gain 50% speedup - on that < 0.01%. frankly - you will not be able to even
>> measure the speedup, let alone notice it. it's a fallacy to think it will
>> help. trust me.
> yah, it will only save about 37 instructions (b/c the new formulas use
> one less operation, and there are 37 of them), and the sse
> instructions will only be slightly faster than the FPU ones. that's a
> savings of about, say 25 ns each time _edje_part_recalc is run.
> btw, you gave blake my credit in the CVS log... :(
Bummer. I guess ATI's drivers just aren't as fast as Nvidia's. At least
we don't need a special case for amd64 when building packages now.
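
For anyone who wants to poke at the two interpolation forms quoted above, here is a minimal sketch in C. The function names (lerp_weighted, lerp_delta, lerp_compare) are made up for illustration and are not taken from the edje source; only the two formulas come from the thread. It does not claim to reproduce the -mfpmath=sse behaviour, it just puts both forms side by side so the results can be printed at full precision under different compiler flags.

```c
#include <stdio.h>

/* Form quoted in the thread as the original code:
 * weight both endpoints by (1.0 - pos) and pos. */
double lerp_weighted(double p1, double p2, double pos)
{
    return (p1 * (1.0 - pos)) + (p2 * pos);
}

/* Form quoted in the thread as the fix:
 * start at p1 and add a fraction of the difference. */
double lerp_delta(double p1, double p2, double pos)
{
    return p1 + (p2 - p1) * pos;
}

/* Hypothetical helper: print both results with 17 significant
 * digits, enough to show any difference between two doubles. */
void lerp_compare(double p1, double p2, double pos)
{
    printf("pos=%.17g weighted=%.17g delta=%.17g\n",
           pos, lerp_weighted(p1, p2, pos), lerp_delta(p1, p2, pos));
}
```

Calling lerp_compare(p1, p2, pos) for values of pos near 1.0, once built with -mfpmath=387 and once with -mfpmath=sse, is one way to see whether the two forms disagree on a given machine. One property that holds regardless of flags: when p1 == p2 the delta form returns p1 exactly, because (p2 - p1) is exactly zero, whereas the weighted form has to round two products and a sum.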