Packet Update Queue

I was reading the TC source and noticed how packets to process are chosen.

[CODE]WorldPacket* packet = NULL;

while (m_Socket && !m_Socket->IsClosed() && _recvQueue.next(packet, updater))
{
    // handle packet here
}[/CODE]

ACE_Based::LockedQueue::next looks like this:

[CODE]template<class Checker>
bool next(T& result, Checker& check)
{
    ACE_Guard<LockType> g(this->_lock);

    if (_queue.empty())
        return false;

    result = _queue.front();
    if (!check.Process(result))
        return false;

    _queue.pop_front();
    return true;
}[/CODE]

This combination means that if the next opcode in the queue is not processable in the current call of WorldSession::Update (called either from the world update or from the map update), the remaining opcodes are left to be handled by the next call.
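
For context, the updater passed to next() is a small filter object; here is a rough sketch of how the world and map update paths could reject each other's opcodes (simplified, the real classes live in WorldSession.h and may differ in detail):

[CODE]// Sketch only - simplified filters in the spirit of the ones passed as 'updater';
// the real classes in WorldSession.h may differ in detail.
class PacketFilter
{
public:
    virtual ~PacketFilter() { }
    virtual bool Process(WorldPacket* /*packet*/) { return true; }
};

// Passed from Map::Update(): only lets thread-safe opcodes through.
class MapSessionFilter : public PacketFilter
{
public:
    bool Process(WorldPacket* packet)
    {
        return opcodeTable[packet->GetOpcode()].packetProcessing == PROCESS_THREADSAFE;
    }
};

// Passed from World::UpdateSessions(): rejects thread-safe opcodes, which wait for the map update.
class WorldSessionFilter : public PacketFilter
{
public:
    bool Process(WorldPacket* packet)
    {
        return opcodeTable[packet->GetOpcode()].packetProcessing != PROCESS_THREADSAFE;
    }
};[/CODE]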

So I started thinking: if you are 'unlucky' and send opcodes in, for example, this order:

[CODE]PROCESS_THREADUNSAFE
PROCESS_THREADSAFE
PROCESS_THREADUNSAFE
PROCESS_THREADSAFE
PROCESS_THREADUNSAFE
PROCESS_THREADSAFE
PROCESS_THREADUNSAFE
PROCESS_THREADSAFE[/CODE]

It would take 8 update cycles to process all those opcodes. At a 50ms world update interval that's 400ms without any server load. That's unacceptable, right?

I know the #1 argument for such a system is 'opcodes must be handled in proper order', so what I am suggesting is that instead of one queue

[CODE]ACE_Based::LockedQueue<WorldPacket*, ACE_Thread_Mutex> _recvQueue;[/CODE]

we make 3 queues:

[CODE]ACE_Based::LockedQueue<WorldPacket*, ACE_Thread_Mutex> m_queuedPackets;       // packets that must obey time of arrival
ACE_Based::LockedQueue<WorldPacket*, ACE_Thread_Mutex> m_threadSafeOpcodes;   // thread-safe opcodes that don't care about time of arrival
ACE_Based::LockedQueue<WorldPacket*, ACE_Thread_Mutex> m_threadUnsafeOpcodes; // thread-unsafe opcodes that don't care about time of arrival[/CODE]
With proper handling this could be a huge performance gain (read: less client lag).

And this doesn't mean they would be handled out of order: for example, if chat messages are handled without caring about time of arrival, they would still arrive in the same order relative to each other; that's just how queues work.
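
As a rough sketch of the intended routing (obeysArrivalOrder is a hypothetical per-opcode flag that would have to be added to the opcode table, and LockedQueue::add is assumed):

[CODE]// Sketch only - 'obeysArrivalOrder' is a hypothetical flag, not an existing field.
void WorldSession::QueuePacket(WorldPacket* packet)
{
    OpcodeHandler const& handler = opcodeTable[packet->GetOpcode()];

    if (handler.obeysArrivalOrder)                          // must keep global arrival order
        m_queuedPackets.add(packet);
    else if (handler.packetProcessing == PROCESS_THREADSAFE)
        m_threadSafeOpcodes.add(packet);                    // drained from Map::Update()
    else
        m_threadUnsafeOpcodes.add(packet);                  // drained from World::UpdateSessions()
}[/CODE]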

Stuff that doesn’t care about time of arrival:

basically all CMSG_*_QUERY

social handler

chat

questgiver stuff (except CMSG_QUESTGIVER_HELLO to prevent false range check fails)

mail

etc, etc…

This is just a general idea; I know that opcodes are not sorted only by 'thread safety', but I'm willing to hear what others think before I dig deeper.

In my eyes, it seems like you just found one of the biggest performance "boosts" in a very long time, because it seems completely legit to me. Why would opcodes that do not "care" about time of arrival be queued in the same queue as those that need timing? It's just a waste of time.

EDIT: How much coding would this really require? It can't be that much, I suppose? Just add 2 new queues and change the queue insertion in the places where it fits, right?

Over 6 hours:

Opcodes handled thread-safe (in map update) = 87678688 (90000000 for easier math)

Opcodes handled thread-unsafe (in world update) = 12232907 (12000000 for easier math)

Total handled opcodes: = 99911595 (100000000 for easier math)

6 hours = 6 * 3600 * 1000 = 21600000ms / 50ms cycle = 432000 update cycles

100000000 opcodes / 432000 updates = ~231 opcodes per world update

Average players: 500

231 opcodes per world update / 500 players = ~0.5 opcodes per player on each world update.

This value seems really low, actually low enough not to cause any 'unlucky queues', but this is at low server load.

On higher load (2k players) we can assume, say, 600000000 opcodes (4x more players = 6x more actions, as they have more stuff to do) and 288000 update cycles (the average diff increases to 75ms).

The number would be 600000000 / 288000 / 2000 = ~1 opcode per player on each world update.

In real applications the average diff can climb even higher (~120ms or so).
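
For reference, the estimates above can be reproduced with a few lines (the opcode counts, player counts and diffs are the assumptions stated above, not new measurements):

[CODE]#include <cstdio>

int main()
{
    const double sixHoursMs = 6.0 * 3600 * 1000;            // 21600000 ms

    double lowLoadCycles  = sixHoursMs / 50;                // 50 ms diff -> 432000 updates
    double highLoadCycles = sixHoursMs / 75;                // 75 ms diff -> 288000 updates

    std::printf("low load:  %.2f opcodes/player/update\n",
                100000000.0 / lowLoadCycles / 500);         // ~0.46
    std::printf("high load: %.2f opcodes/player/update\n",
                600000000.0 / highLoadCycles / 2000);       // ~1.04
    return 0;
}[/CODE]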

While this now seems less important, I will still write the patch, as it's quite interesting. And that's why we are here, right? We are amazed and consumed by the programming process itself.

Looking forward to the patch.

Instead of adding more lockqueues, I went for this approach: http://pastebin.com/YA1jzTWZ

  • Only locking packets that are not thread-safe or not able to be processed instantly.

Just a quick thought explained in a patch instead of words.
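
Without reproducing the pastebin, the general shape of that idea (handle 'instant' packets straight in WorldSocket and only push the rest through the locked queue) might look roughly like this; HandleImmediately is a hypothetical helper, not an existing method:

[CODE]// Rough sketch of the idea, not the actual pastebin contents.
int WorldSocket::ProcessIncoming(WorldPacket* packet)
{
    PacketProcessing processing = opcodeTable[packet->GetOpcode()].packetProcessing;

    if (processing == PROCESS_INPLACE || processing == PROCESS_THREADSAFE)
    {
        m_Session->HandleImmediately(packet);   // hypothetical: invoke the handler right here, no locked queue
        delete packet;
    }
    else
        m_Session->QueuePacket(packet);         // everything else still goes through _recvQueue

    return 0;
}[/CODE]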

@Kaelima, PROCESS_THREADSAFE packets are safe to be processed in the map or in a single-thread context only, but not in WorldSocket context (looking at your WorldSocket::ProcessIncoming changes).

Had this on my todo list since March 2011… any update?

Just started working on v0.01a.

http://paste2.org/p/1892917 http://paste2.org/p/1892918

@Kerhong: what's the use of defining opcode flags in an enum and then using hard-coded values in your code? Other than that it looks promising. It would be nice if someone could test the initial implementation.

I have a small server, 200 players simultaneously… I can test this and every change, no matter how small, on my server, if you want.

Maybe instead of adding a new field to all opcodes, use the existing one and extend its functionality:

[CODE]enum PacketProcessing
{
    PROCESS_INPLACE      = 0x00,    // process packet whenever we receive it - mostly for non-handled or non-implemented packets
    PROCESS_THREADUNSAFE = 0x01,    // packet is not thread-safe - process it in World::UpdateSessions()
    PROCESS_THREADSAFE   = 0x02,    // packet is thread-safe - process it in Map::Update()
    PROCESS_ASYNC        = 0x04,
    PROCESS_SYNC         = 0x08,

    PROCESS_INPLACE_ASYNC      = PROCESS_INPLACE      | PROCESS_ASYNC,
    PROCESS_THREADUNSAFE_ASYNC = PROCESS_THREADUNSAFE | PROCESS_ASYNC,
    PROCESS_THREADSAFE_ASYNC   = PROCESS_THREADSAFE   | PROCESS_ASYNC,
    PROCESS_INPLACE_SYNC       = PROCESS_INPLACE      | PROCESS_SYNC,
    PROCESS_THREADUNSAFE_SYNC  = PROCESS_THREADUNSAFE | PROCESS_SYNC,
    PROCESS_THREADSAFE_SYNC    = PROCESS_THREADSAFE   | PROCESS_SYNC,
};[/CODE]
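
If the flags were used that way, dispatch code would test them with bitwise AND rather than exact comparison, roughly like this (m_asyncQueue is a hypothetical extra queue):

[CODE]// Sketch: testing combined flags at dispatch time (m_asyncQueue is hypothetical).
void WorldSession::QueuePacket(WorldPacket* packet)
{
    PacketProcessing processing = opcodeTable[packet->GetOpcode()].packetProcessing;

    if (processing & PROCESS_ASYNC)
        m_asyncQueue.add(packet);           // arrival order relative to other opcodes doesn't matter
    else if (processing & PROCESS_THREADSAFE)
        m_threadSafeOpcodes.add(packet);    // handled in Map::Update()
    else
        m_queuedPackets.add(packet);        // handled in World::UpdateSessions(), in order
}[/CODE]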

What is the point of combining the bitmasks in the enum instead of in the code when processing the packets? It seems like that defeats the purpose of using bitmasks to begin with.

Probably just coding style. If you look around MSDN, Microsoft does this a lot with their constants, which can ultimately be unwrapped into an enum.

For example, their registry key access options:

[CODE]#define KEY_WRITE ((STANDARD_RIGHTS_WRITE | KEY_SET_VALUE | KEY_CREATE_SUB_KEY) & (~SYNCHRONIZE))[/CODE]

Interesting, any news?

Kerhong, do you still have the patches? I am interested in starting to test this.

During further math and tests I figured out that splitting packets into more queues is almost worthless, as the amount of incoming packets from clients is quite low (also, my server hasn't been above a 60ms diff for half a year already, so packets queuing up isn't really a problem).

The proper approach would be to have another thread that runs independently of the main world update loop and handles CMSG_*_QUERY, and another that handles all chat except commands ((msg[0] == '.' || msg[0] == '!') && msg[0] != msg[1]) to further reduce work for the main game loop, but I haven't started planning or testing this approach yet.
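
The command check mentioned above, written out as a small helper (the name is hypothetical):

[CODE]// Hypothetical helper: true if a chat message is a command ('.' or '!' prefix,
// but doubled prefixes like ".." or "!!" count as plain text), so it stays in
// the main loop; everything else could be drained by a separate chat thread.
bool IsChatCommand(std::string const& msg)
{
    if (msg.empty())
        return false;
    return (msg[0] == '.' || msg[0] == '!') && (msg.length() < 2 || msg[0] != msg[1]);
}[/CODE]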

This also isn't the biggest performance sink in TC; I'm fooling around with some memory and CPU profiling to figure out which parts require the most attention. I will eventually post the results of this search.

You know what else would lessen the amount of processing done in the main game loop? Separate servers for things like chat, the auction house, maps, etc.

I agree with Paradox. It would be good to start off by separating out a chat server, an auction house one, and maybe a DB one to handle the *_QUERY opcodes, much like Repositories in the HERO engine. I think it would be good if there were something like a packet dispatcher, so it can route the incoming packets. I'm not sure if the client opens any connection other than the one on port 8085 (the default one). If not, the role of the packet dispatcher becomes unnecessary.
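
A very rough sketch of what such a dispatcher could look like; the backend handles and their Enqueue method are hypothetical, nothing like this exists in TC:

[CODE]// Hypothetical sketch only - none of these backends exist.
void PacketDispatcher::Route(WorldSession* session, WorldPacket* packet)
{
    switch (packet->GetOpcode())
    {
        case CMSG_MESSAGECHAT:
            m_chatServer->Enqueue(session, packet);     // dedicated chat process
            break;
        case CMSG_AUCTION_LIST_ITEMS:
            m_auctionServer->Enqueue(session, packet);  // dedicated auction house process
            break;
        default:
            session->QueuePacket(packet);               // everything else stays in worldserver
            break;
    }
}[/CODE]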

Code it.