Puzzling behavior, malloc and free(), with libuv

Using the sample code to learn about libuv, I have come across a side effect I don't fully understand. The code uses malloc() to obtain memory to store data from a client on the network and then sends the same data back; it just echoes. It then uses free() to release the memory. This repeats over and over through a callback loop. The line of code getting the memory is:

uv_write_t *req = (uv_write_t *) malloc(sizeof(uv_write_t));

and the lines freeing the memory are:

free((char*) req->data);
free(req);

However, if you input a long string such as "Whats the word on the street?" to be echoed and then input shorter strings like "Hi", fragments of the older string reappear after the shorter string is echoed back. For instance, the output can look like this:

Whats the word on the street? hi hi howdy howdy he word on the street?

Since the memory is being freed, I am uncertain why the older fragment is showing back up. My thinking is that either there is something I don't understand about malloc() and free(), or there is a bug in how the library determines the size needed for the incoming data, so that after using a longer string I am getting garbage as part of a memory block that was too big. If that is the case, then the fact that it is a fragment of my earlier input is just happenstance. Is this the likely reason, or am I missing something? Is there any other info that I should include to clarify it?


ANSWERS:


Implementations of malloc() vary, but it's safe to assume that a call to malloc() can return a pointer to a previously free()-ed chunk of memory and that the memory returned will not have been zeroed out. In other words, it's perfectly normal for malloc() to give us a pointer to memory that still contains data left over from an earlier allocation.

That said, I suspect the root problem here is an unterminated string, probably an artifact of the way you are serializing the string. For example, if you are merely writing strlen(str) bytes from the client, you are not writing the terminating null character ('\0'). As a result, when the server receives the message it will have an unterminated string. If this is how you plan to pass the string, and you plan to treat it as a normal null-terminated string, the server will need to copy the data into a buffer large enough to accommodate the string plus the additional null character.
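As a minimal sketch of that last step, the helper below copies a known number of received bytes into a buffer one byte larger and terminates it. The name bytes_to_c_string is made up for illustration and is not part of libuv; in a libuv read callback the byte count would presumably come from the number of bytes actually read, not from the size of the buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a libuv function): copy n received bytes into a
 * fresh buffer of n + 1 bytes and append the terminating '\0'.
 * The caller owns the result and must free() it.
 * Returns NULL if the allocation fails. */
char *bytes_to_c_string(const char *data, size_t n)
{
    assert(data != NULL || n == 0);

    char *s = malloc(n + 1);   /* one extra byte for the terminator */
    if (s == NULL)
        return NULL;

    if (n > 0)
        memcpy(s, data, n);    /* copy exactly the bytes that arrived */
    s[n] = '\0';               /* make it a proper C string */
    return s;
}
```

With this, string functions such as strlen() or printf("%s") stop at the end of the received message instead of running on into whatever happens to follow it in memory.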

So why then are you seeing fragments of past messages? Probably dumb luck. If this is a really simple app, it's quite possible for malloc() to return the same chunk of memory that was used for a previous request, or one that overlaps it.
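The effect can be reproduced deterministically without relying on what malloc() happens to return. The sketch below uses one fixed buffer to stand in for a recycled chunk; all function names are made up for illustration. It also assumes the client sends "\r\n" line endings (as telnet typically does), which is a guess, although a 7-byte "howdy\r\n" would overwrite exactly "Whats t" and leave the "he word on the street?" fragment seen in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simulate one read into a reused buffer: only the first msg_len bytes are
 * overwritten; whatever was stored there before stays in place. */
void fake_read(char *buf, const char *msg, size_t msg_len)
{
    assert(buf != NULL && msg != NULL);
    memcpy(buf, msg, msg_len);
}

/* Echo the way a buggy handler might: treat the buffer as a C string and
 * copy everything up to the first '\0', ignoring how many bytes arrived. */
size_t echo_until_nul(const char *buf, char *out)
{
    size_t len = strlen(buf);
    memcpy(out, buf, len + 1);   /* includes the terminator */
    return len;
}

/* Echo only the bytes that were received in this read. */
size_t echo_nread(const char *buf, size_t nread, char *out)
{
    memcpy(out, buf, nread);
    out[nread] = '\0';
    return nread;
}
```

After a fake_read() of the long sentence followed by a fake_read() of "howdy\r\n", echo_until_nul() yields "howdy\r\nhe word on the street?", while echo_nread() with a count of 7 yields only "howdy\r\n".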

So then why am I getting such clean output? Shouldn't I see tons of garbled data, or a segfault from my string operations walking off into infinity? Again, dumb luck. Keep in mind that when the kernel first gives your application a page of memory, it will have zeroed that page out first (this is done for security reasons). So, even though you might not have terminated the string, the page of heap memory where your string resides might be sitting in a relatively pristine, zeroed-out state.


uv_write_t *req is not the data to be sent or received. It's just something like a handle to a write request.

Neither is req->data. That is a pointer to arbitrary private data for you. It might be used for example if you wanted to pass around some data related to the connection.

The actual payload data are sent through a write buffer (uv_buf_t) and received into a buffer that is allocated when a read request is served. That's why the read function wants an alloc parameter. Later that buffer is passed to the read callback.

The freeing of req->data assumes that 'data' pointed to some private data, typically a structure, that was malloc'd (by you).

As a rule of thumb, a socket is represented by a uv_xxx_t, while reading and writing use 'request' structures. When writing a server (a typical uv use case), one doesn't know how many connections there will be, hence everything is allocated dynamically.

To make your life easier, you might think in terms of pairs (open/close or start/done). When accepting a new connection, you start a cycle and allocate the client; when closing that connection, you free it. When writing, you allocate the request as well as the payload data buffer; when done writing, you free them. When reading, you allocate a read request, and the payload data buffer is allocated behind the scenes (through the alloc callback); when done reading (and having copied the payload data), you free them both.
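The start/done pairing for a write can be sketched in plain C as follows. This is only an illustration of the ownership pattern: fake_write_req is a stand-in type, not libuv's uv_write_t, and write_start, write_done, and the allocation counter are hypothetical names. Only the idea of hanging the payload off the request's data pointer and freeing both when the write is done comes from the code in the question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a write request; NOT libuv's uv_write_t. Only the `data`
 * member mirrors the field discussed in the text. */
typedef struct {
    void *data;   /* private pointer, here a copy of the payload */
} fake_write_req;

static int live_allocations = 0;   /* bookkeeping for illustration only */

/* "Start": allocate the request and a private copy of the payload. */
fake_write_req *write_start(const char *payload, size_t len)
{
    assert(payload != NULL);

    fake_write_req *req = malloc(sizeof *req);
    if (req == NULL)
        return NULL;

    req->data = malloc(len > 0 ? len : 1);
    if (req->data == NULL) {
        free(req);
        return NULL;
    }
    memcpy(req->data, payload, len);
    live_allocations += 2;
    return req;
}

/* "Done": free the payload first, then the request that points to it. */
void write_done(fake_write_req *req)
{
    assert(req != NULL);
    free(req->data);
    free(req);
    live_allocations -= 2;
}

/* Number of blocks handed out by write_start() and not yet released. */
int live_allocation_count(void)
{
    return live_allocations;
}
```

Every write_start() is matched by exactly one write_done(), so the counter returns to zero once all writes have completed. In real libuv code the "done" half would live in the write callback.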

There are ways to get the job done without all those malloc/free pairs (which aren't glorious performance-wise), but for a novice I would agree with the uv docs: you should definitely start with the malloc/free route. To give you an idea: I pre-allocate everything for some ten or hundred thousand connections, but that brings some administration and trickery with it, e.g. faking it in the alloc callback by merely assigning one of your pre-allocated buffers.

If asked to guess, I'd suggest that avoiding malloc/free is only worth the trouble once you have well over 5k - 10k connections open at any one time.


