RMalloc

RMalloc is a malloc implementation, that is, a dynamic memory allocator.

One reason for writing this version of malloc was a problem I encountered with one of my other programs when allocating several gigabytes. Because of the algorithms used, the program repeatedly deallocated and reallocated most of this memory, which consisted of relatively small chunks. This behaviour confused all other malloc implementations I tested and led to unacceptable program performance. So I decided to write my own allocator, and this library performed MUCH better.

The first version was a simple segregated storage allocator without splitting and coalescing. Although it was much faster and scaled very well with respect to the number of processors, it suffered from severe fragmentation (a known weakness of this kind of allocator) when used with a special variant of one of my other projects (see libHmatrix). But it performs very well if the application only uses a limited number of different sizes when allocating blocks.
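As an illustration of the idea only (this is not RMalloc's code), a segregated storage allocator without splitting and coalescing can be sketched as follows. The power-of-two size classes and the names size_class, seg_alloc and seg_free are assumptions made for this example, and plain malloc stands in for memory obtained from the system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_CLASSES    8    /* hypothetical: classes of 16, 32, ..., 2048 bytes */
#define MIN_CLASS_SIZE 16

typedef struct free_node { struct free_node *next; } free_node;

static free_node *free_lists[NUM_CLASSES];   /* one free list per size class */

/* Index of the smallest size class that can hold `size` bytes. */
static int size_class(size_t size)
{
    size_t class_size = MIN_CLASS_SIZE;
    int idx = 0;
    while (class_size < size) {
        class_size <<= 1;
        idx++;
    }
    return idx;
}

static size_t class_bytes(int idx)
{
    return (size_t)MIN_CLASS_SIZE << idx;
}

static void *seg_alloc(size_t size)
{
    assert(size > 0 && size <= class_bytes(NUM_CLASSES - 1));
    int idx = size_class(size);
    free_node *node = free_lists[idx];
    if (node != NULL) {                 /* reuse a freed block of the same class */
        free_lists[idx] = node->next;
        return node;
    }
    return malloc(class_bytes(idx));    /* stand-in for fresh memory from the system */
}

/* Blocks are never split or merged: a freed block returns to its own class. */
static void seg_free(void *ptr, size_t size)
{
    free_node *node = ptr;
    int idx = size_class(size);
    node->next = free_lists[idx];
    free_lists[idx] = node;
}
```

In this sketch a freed block can only serve later requests of the same class, which is why such a scheme works well with few distinct sizes and wastes memory when the size mix keeps changing.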

To reduce this fragmentation I completely rewrote the allocator and added splitting and coalescing. It now works similarly to LKmalloc, using private heaps with ownership for each thread. Each memory chunk is thereby put back into the heap it was allocated from.
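Heap ownership can be pictured with a small sketch, assuming each chunk carries a header that records its heap. The types heap_t and chunk_t and the functions below are hypothetical and not taken from RMalloc; malloc stands in for the heap's real search, and splitting and coalescing are left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct heap heap_t;

/* Header stored in front of every chunk; records the heap it came from. */
typedef struct chunk {
    heap_t       *owner;
    size_t        size;    /* payload size in bytes */
    struct chunk *next;    /* link in the owner's free list */
} chunk_t;

struct heap {
    pthread_mutex_t lock;
    chunk_t        *free_list;
};

static void heap_init(heap_t *h)
{
    pthread_mutex_init(&h->lock, NULL);
    h->free_list = NULL;
}

/* Allocate from heap h and stamp the chunk with its owner. */
static void *heap_alloc(heap_t *h, size_t size)
{
    chunk_t *c = malloc(sizeof(chunk_t) + size);
    if (c == NULL)
        return NULL;
    c->owner = h;
    c->size  = size;
    c->next  = NULL;
    return c + 1;          /* payload starts right after the header */
}

/* Free from any thread: the chunk goes back to the heap that owns it. */
static void heap_free(void *ptr)
{
    assert(ptr != NULL);
    chunk_t *c = (chunk_t *)ptr - 1;
    heap_t  *h = c->owner;
    pthread_mutex_lock(&h->lock);
    c->next = h->free_list;
    h->free_list = c;
    pthread_mutex_unlock(&h->lock);
}
```

Because the owner is stored in the chunk itself, the freeing thread does not need to know which thread performed the allocation.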

The main differences between RMalloc and LKmalloc are the handling of small blocks, the allocation of memory from the operating system, and the mapping of heaps to threads.

For performance reasons, small chunks are only allocated from containers, each of which holds several equal-sized chunks. This improves the overall performance of the allocator, especially if a lot of small blocks are allocated. It also avoids splitting and coalescing these small blocks (which might reduce fragmentation).
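A container of equal-sized chunks might look roughly like the following sketch. The capacity, the container_t layout and the function names are assumptions for illustration; RMalloc's actual data structures may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNKS_PER_CONTAINER 64    /* hypothetical capacity */

typedef struct container {
    size_t         chunk_size;     /* every chunk in this container has this size */
    size_t         num_free;
    void          *free_list;      /* list threaded through the free chunks */
    unsigned char *storage;
} container_t;

static container_t *container_create(size_t chunk_size)
{
    /* Each free chunk stores a pointer, so it must be pointer-sized and aligned. */
    assert(chunk_size >= sizeof(void *) && chunk_size % sizeof(void *) == 0);
    container_t *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    c->storage = malloc(chunk_size * CHUNKS_PER_CONTAINER);
    if (c->storage == NULL) {
        free(c);
        return NULL;
    }
    c->chunk_size = chunk_size;
    c->num_free   = CHUNKS_PER_CONTAINER;
    c->free_list  = NULL;
    for (size_t i = 0; i < CHUNKS_PER_CONTAINER; i++) {
        void *chunk = c->storage + i * chunk_size;
        *(void **)chunk = c->free_list;    /* push chunk onto the free list */
        c->free_list = chunk;
    }
    return c;
}

/* Pop one chunk; no search and no splitting are needed. */
static void *container_alloc(container_t *c)
{
    void *chunk = c->free_list;
    if (chunk == NULL)
        return NULL;                       /* container is full */
    c->free_list = *(void **)chunk;
    c->num_free--;
    return chunk;
}

/* Push the chunk back; no coalescing is needed. */
static void container_free(container_t *c, void *chunk)
{
    *(void **)chunk = c->free_list;
    c->free_list = chunk;
    c->num_free++;
}
```

Allocation and deallocation are constant-time list operations here, which is where the performance gain for many small blocks would come from.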

Memory from the operating system is not allocated in equal-sized stripes (as in LKmalloc) but in dynamically growing chunks via mmap (or sbrk). This reduces the number of calls to these system functions. The size of each chunk is adjusted according to the amount of memory in use.
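The text does not give the exact growth rule, so the following is only a sketch under an assumed policy: each new chunk is at least a fixed minimum, at least the request, and at least half of the memory already in use. MIN_SYS_CHUNK, next_chunk_size and system_alloc are hypothetical names.

```c
#define _DEFAULT_SOURCE    /* exposes MAP_ANONYMOUS on glibc in strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define MIN_SYS_CHUNK ((size_t)1 << 20)    /* hypothetical 1 MiB minimum */

static size_t bytes_in_use;    /* total obtained from the system so far */

static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* Assumed growth rule: chunks get larger as more memory is in use. */
static size_t next_chunk_size(size_t request, size_t in_use)
{
    size_t size = MIN_SYS_CHUNK;
    if (in_use / 2 > size)
        size = in_use / 2;
    if (request > size)
        size = request;
    return round_up(size, (size_t)sysconf(_SC_PAGESIZE));
}

/* Get a chunk from the system; *granted receives its actual size. */
static void *system_alloc(size_t request, size_t *granted)
{
    assert(request > 0 && granted != NULL);
    size_t size = next_chunk_size(request, bytes_in_use);
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    bytes_in_use += size;
    *granted = size;
    return p;
}
```

With a rule of this kind, a program that already holds a lot of memory receives larger chunks per mmap call and therefore needs fewer calls overall.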

Heaps are associated with threads by using thread-private data (via Pthreads functions). The number of heaps is thereby automatically adjusted to the number of concurrent threads (LKmalloc needs some initialisation for this).
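The mapping of heaps to threads can be sketched with the standard POSIX calls pthread_key_create, pthread_getspecific and pthread_setspecific. The heap_t type and the current_heap function are hypothetical, and freeing the heap when its thread exits is a simplification; a real allocator would more likely keep the heap for reuse, since its chunks may still be in use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct heap { int id; } heap_t;    /* hypothetical; real heap state omitted */

static pthread_key_t   heap_key;
static pthread_once_t  heap_key_once   = PTHREAD_ONCE_INIT;
static pthread_mutex_t heap_count_lock = PTHREAD_MUTEX_INITIALIZER;
static int             heap_count;         /* number of heaps created so far */

/* Called automatically when a thread exits with a heap attached. */
static void release_heap(void *h)
{
    free(h);
}

static void make_heap_key(void)
{
    int rc = pthread_key_create(&heap_key, release_heap);
    assert(rc == 0);
    (void)rc;
}

/* Return the calling thread's heap, creating it on first use. */
static heap_t *current_heap(void)
{
    pthread_once(&heap_key_once, make_heap_key);
    heap_t *h = pthread_getspecific(heap_key);
    if (h == NULL) {
        h = malloc(sizeof *h);
        if (h == NULL)
            return NULL;
        pthread_mutex_lock(&heap_count_lock);
        h->id = heap_count++;
        pthread_mutex_unlock(&heap_count_lock);
        pthread_setspecific(heap_key, h);
    }
    return h;
}
```

Since a heap is created the first time a thread asks for one, no up-front configuration of the heap count is needed in this sketch.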