
Re: [E-devel] Memory pool management



Hi,

	I finally got some more time to take a look at the memory management of eet 
and evas.

On Thursday 23 March 2006 03:29, Carsten Haitzler wrote:
> On Wed, 22 Mar 2006 16:53:07 +0100 Cedric <cedric.bail@free.fr> babbled:
> > On Wednesday 22 March 2006 12:40, Carsten Haitzler wrote:

[snip benchmark]

I finally ran callgrind on the benchmark, and its callgraph does not look 
like the callgraph I get when running E17 
(http://chac.le-poulpe.net/~cedric/ememoa/benchmark/). I then ran the bench 
with the default theme as a parameter, and it looked more like E17. But 
eet_data_get_string was still the biggest allocator by far, so ememoa was 
quite useless for this bench in any case.

> i actually think the speed differences are going to be much of a muchness
> - depending on cpu arch (this is an amd64 cpu with everything in 64bit),
> how your libc is compiled, etc. etc. etc. - i am not sure we can show
> consistent wins. if its faster on one box, and slower on another - it
> likely is better to do "nothingg" until u win on at least 1 arch/cpu and
> don't loose on others.

I updated my library with some code optimisations and an option to compile 
it so that it accesses the memory bitmap 64 bits at a time. It didn't seem 
to have any noticeable effect on the amd64 system I used to run the test. I 
am not sure, but I think only Opteron has a 64-bit memory bus. The code is 
available at 
http://chac.le-poulpe.net/~cedric/ememoa/v0.0.10/ememoa-0.0.10.tar.bz2
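
To make "accessing the memory bitmap 64 bits at a time" concrete, here is a 
minimal sketch of the idea. It is not the actual ememoa code: the names 
slot_bitmap, bitmap_take and bitmap_release and the pool size are 
assumptions made up for this example. One 64-bit word records the state of 
64 slots, so a full word can be skipped with a single comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: one bit per slot, bit set = slot in use. */
#define POOL_WORDS 4                    /* 4 * 64 = 256 slots */

typedef struct {
    uint64_t used[POOL_WORDS];
} slot_bitmap;

/* Find a free slot, mark it used and return its index, or -1 if full. */
static int bitmap_take(slot_bitmap *bm)
{
    for (size_t w = 0; w < POOL_WORDS; w++) {
        uint64_t free_bits = ~bm->used[w];
        int bit = 0;

        if (free_bits == 0)
            continue;                   /* all 64 slots of this word are taken */
        while (!(free_bits & 1)) {      /* locate the lowest free bit */
            free_bits >>= 1;
            bit++;
        }
        bm->used[w] |= (uint64_t)1 << bit;
        return (int)(w * 64) + bit;
    }
    return -1;
}

/* Mark a slot as free again. */
static void bitmap_release(slot_bitmap *bm, int slot)
{
    assert(slot >= 0 && slot < POOL_WORDS * 64);
    bm->used[slot / 64] &= ~((uint64_t)1 << (slot % 64));
}
```

The inner loop could be replaced by a compiler builtin that counts trailing 
zeros, but the plain loop keeps the sketch portable.
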
	I also solved the problem with the compiler options needed by other 
programs using ememoa. The library now permanently tracks all mempools, 
which extends what the garbage collector and the statistics can do. As 
eet_init and eet_shutdown are not called the same number of times, I wasn't 
able to see the real usage of the memory pools before that change.
	With this global statistics capability, I discovered that enlightenment 
doesn't use the eet cache at all! I don't know if that is intended, but when 
I reactivate it, it segfaults due to a bug in eet_close 
(http://chac.le-poulpe.net/~cedric/ememoa/fix/eet-cacheburst.patch). Fixing 
this bug seems to make any allocation improvement in eet_open useless. 
After this patch, it's nearly impossible to see anything interesting in the 
callgraph, as the patch seems to change the number of calls to eet_open and 
the content of the cache. Ememoa statistics for a run of enlightenment: 
http://chac.le-poulpe.net/~cedric/ememoa/v0.0.10/ememoa_statitics_for_e.txt
	I also found a bug in edje_cc.c: it called neither eet_init nor 
eet_shutdown. I fixed it in edje_cc.c, but it is perhaps a better idea to 
call eet_init from edje_init 
(http://chac.le-poulpe.net/~cedric/ememoa/fix/edje-eet-init.patch).

	I also took some time to evaluate the difference between your evas 
allocator and ememoa. I tested 2 different ideas: 
	- One mempool per list: bad idea. There are a lot of small lists, so much 
more time is spent creating/destroying the mempools 
(http://chac.le-poulpe.net/~cedric/ememoa/v0.0.10/evas_list_ememoa_per_list.patch)
	- One mempool for everything: slower to allocate (around two times 
slower), but it seems to improve cache locality, and memory usage seems 
lower too 
(http://chac.le-poulpe.net/~cedric/ememoa/v0.0.10/evas_list_ememoa_one.patch)

	In fact, to have one pool per list I was forced to reimplement 
evas_list_sort so that it does not destroy the mempool. So I implemented an 
in-place mergesort that does no allocation 
(http://chac.le-poulpe.net/~cedric/ememoa/fix/evas_list_sort.patch). A test 
with results from my machine can be found at 
http://chac.le-poulpe.net/~cedric/ememoa/fix/test_sort.tar.gz . Basically 
it is faster by a factor of around 2.5. (The second solution, with one 
mempool, needs an additional fix to eet_lib, 
http://chac.le-poulpe.net/~cedric/ememoa/v0.0.10/evas_list_ememoa_one_sort.patch).
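
For readers who do not want to open the patch, here is a minimal sketch of 
what a mergesort without allocation can look like on a doubly linked list. 
It is not the code from evas_list_sort.patch: the node type and the 
function names are assumptions for this example, and it sorts plain ints 
instead of calling a user comparison function. Nodes are only relinked, so 
no list cell is ever allocated or freed.

```c
#include <stddef.h>

/* Hypothetical list cell, standing in for the real list node type. */
typedef struct node {
    struct node *next;
    struct node *prev;
    int value;
} node;

/* Merge two sorted NULL-terminated chains by relinking (stable).
 * Only the next pointers are maintained here. */
static node *merge(node *a, node *b)
{
    node head = { NULL, NULL, 0 };      /* dummy head on the stack */
    node *tail = &head;

    while (a && b) {
        if (b->value < a->value) { tail->next = b; b = b->next; }
        else                     { tail->next = a; a = a->next; }
        tail = tail->next;
    }
    tail->next = a ? a : b;
    return head.next;
}

/* Split the chain in the middle and sort both halves recursively. */
static node *sort_chain(node *list)
{
    node *slow, *fast, *second;

    if (!list || !list->next)
        return list;
    slow = list;
    fast = list->next;
    while (fast && fast->next) {        /* slow stops at the middle */
        slow = slow->next;
        fast = fast->next->next;
    }
    second = slow->next;
    slow->next = NULL;
    return merge(sort_chain(list), sort_chain(second));
}

/* Sort the list, then rebuild the prev links in a single pass. */
static node *list_sort(node *list)
{
    node *sorted = sort_chain(list);
    node *prev = NULL;

    for (node *n = sorted; n; n = n->next) {
        n->prev = prev;
        prev = n;
    }
    return sorted;
}
```

The recursion uses stack space proportional to log(n), but the heap and the 
mempool are never touched.
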

> in some ways i think some eet "rewrok" would speed things up more.

> 1. actually mmap() instead of fopen() and actually don't malloc data for
> things like the strings in the key table if we can return a DIRECT
> POINTER to it from the mmaped segment.
> 2. add a eet_read_direct() that reads the data and returns a pointer to
> the mmap()ed data segment (if it is not compressed - if it's compressed
> you cannot use this call). this will avoid an alloc and memcopy for
> several useful cases like image loads.

Well, with the eet cache working again, I don't really know how to 
interpret the callgraph or what would be interesting to do. What is your 
opinion on that? You can find the callgrind output I am referring to at 
http://chac.le-poulpe.net/~cedric/ememoa/v0.0.10/ .

In any case, I also identified 2 obvious functions that use malloc a lot, in 
evas_scale_sample.c and evas_scale_smooth_scaler_up.c, where it could 
easily be replaced with alloca 
(http://chac.le-poulpe.net/~cedric/ememoa/fix/evas-alloca-fix.patch). I 
know that some portability issues exist with it, but EFL already uses alloca.
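
As an illustration of the kind of change I mean (this is a sketch, not the 
content of evas-alloca-fix.patch; scale_row and its parameters are 
hypothetical), a per-call lookup table can live on the stack instead of 
going through malloc/free each time. The header that declares alloca 
differs between platforms; <alloca.h> is assumed here.

```c
#include <alloca.h>   /* assumption: glibc-style header for alloca */
#include <assert.h>
#include <stddef.h>

/* Nearest-neighbour scale of one pixel row using a stack lookup table. */
static void scale_row(const unsigned int *src, int src_w,
                      unsigned int *dst, int dst_w)
{
    int *lut;

    /* Keep the stack allocation small; a real version should fall back
     * to malloc for large widths instead of relying on an assert. */
    assert(src_w > 0 && dst_w > 0 && dst_w <= 4096);

    lut = alloca((size_t)dst_w * sizeof(int));
    for (int x = 0; x < dst_w; x++)
        lut[x] = (int)(((long long)x * src_w) / dst_w);
    for (int x = 0; x < dst_w; x++)
        dst[x] = src[lut[x]];
    /* no free(): the table disappears when the function returns */
}
```
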

Cedric