Unjoinable servers break game

5 posts Page 1 of 1
Duion
Posts: 1370
Joined: Sun Feb 08, 2015 1:51 am
 
by Duion » Wed Feb 20, 2019 11:36 am
I have a bug where servers sometimes become unjoinable, and after you have tried to join such a server you cannot join any server anymore. The game is basically frozen after that: you can only press menu buttons, and only restarting the game fixes it. The unjoinable server stays that way, however, and if you try to join it again the problem comes back.

What I think is happening is that sometimes, after you or someone else joined a server, the slot is not properly cleared, so the server thinks you are still on it and of course does not let you join again. The game itself also thinks a server connection is still running and does not let you join another game. The server is then listed with +1 human player that never gets cleared, so it stays broken. When someone tries to join it, another +1 human player is added, but the game breaks for the person joining, so he has to quit the game and try again, which may repeat the cycle.

What makes it so hard to test is that you need a server on the internet, a client to join the server, and a lot of patience to wait until something breaks and the bug occurs. I cannot open multiple instances of the game myself to test it, because it seems to be locked by IP, meaning you need real, different people with real, different IPs to test it.

My current workaround is to restart the servers every X hours so the bug never occurs, since the longer a server runs and the more players are on it, the sooner it occurs. I have now realized that bots can trigger the bug as well: since I put more bots on the servers by default, the bug happens much sooner. The bots simulate a real player connection, so they appear on the server list as players, etc.

So what I think is happening is that at some point server connections are no longer cleared on the dedicated servers. Then either the server becomes unjoinable after you have tried to join it, or the master server removes it from the list completely because it reports an invalid number of players, and it becomes unjoinable as well.

Well, this all sounds very complicated, but it is a potentially game-breaking bug and I don't really know where to start, so maybe someone here has an idea. Here is a short console.log of what happens when I try to join such a bugged server and then try to join again:

==>trace(1);
Console trace enabled.
Leaving ConsoleEntry::eval() - return
Entering ToggleConsole(1)
Entering [CanvasCursorPackage]GuiCanvas::popDialog(Canvas, ConsoleDlg)
Entering ConsoleDlg::onSleep()
Leaving ConsoleDlg::onSleep() - return
Entering [CanvasCursorPackage]GuiCanvas::checkCursor(Canvas)
Entering showCursor()
Leaving showCursor() - return
Leaving [CanvasCursorPackage]GuiCanvas::checkCursor() - return
Leaving [CanvasCursorPackage]GuiCanvas::popDialog() - return
Leaving ToggleConsole() - return
Entering ToggleConsole(0)
Leaving ToggleConsole() - return
Entering JoinServerDlg::join(JoinServerDlg)
Server query canceled.
Adding a pending connection
Sending Connect challenge Request
Leaving JoinServerDlg::join() - return
Got Connect challenge Response
Sending Connect Request
Connection established 25566
Entering GameConnection::onConnectionAccepted(25566)
Leaving GameConnection::onConnectionAccepted() - return
Entering GameConnection::setLagIcon(25566, 1)
Leaving GameConnection::setLagIcon() - return
Mapping string: ServerMessage to index: 0
Mapping string: MsgClientJoin to index: 1
Mapping string: %1 has joined the server. to index: 2
Entering clientCmdServerMessage(23 MsgClientJoin, 70 has joined the server., , 1.00527e+06, , 0, , , , 0)
Entering defaultMessageCallback(23 MsgClientJoin, 70 has joined the server., , 1.00527e+06, , 0, , , , 0, , )
Entering onServerMessage( has joined the server.)
Entering playMessageSound( has joined the server.)
Leaving playMessageSound() - return -1
Entering ChatHud::addLine(ChatHud, has joined the server.)
Leaving ChatHud::addLine() - return
Leaving onServerMessage() - return
Leaving defaultMessageCallback() - return
Entering handleClientJoin(23 MsgClientJoin, 70 has joined the server., , 1.00527e+06, , 0, , , , 0, , )
Entering PlayerListGui::updatePlayerInfo(PlayerListGui, 25568)
Leaving PlayerListGui::updatePlayerInfo() - return
Leaving handleClientJoin() - return
Leaving clientCmdServerMessage() - return
Entering GameConnection::setLagIcon(25566, 0)
Leaving GameConnection::setLagIcon() - return
Entering ToggleConsole(1)
Entering [CanvasCursorPackage]GuiCanvas::pushDialog(Canvas, ConsoleDlg, 99)
Entering ConsoleDlg::onWake()
Leaving ConsoleDlg::onWake() - return
Entering [CanvasCursorPackage]GuiCanvas::checkCursor(Canvas)
Entering showCursor()
Leaving showCursor() - return
Leaving [CanvasCursorPackage]GuiCanvas::checkCursor() - return
Leaving [CanvasCursorPackage]GuiCanvas::pushDialog() - return
Entering updateConsoleErrorWindow()
Leaving updateConsoleErrorWindow() - return
Leaving ToggleConsole() - return
Entering ToggleConsole(0)
Leaving ToggleConsole() - return
Entering ToggleConsole(1)
Entering [CanvasCursorPackage]GuiCanvas::popDialog(Canvas, ConsoleDlg)
Entering ConsoleDlg::onSleep()
Leaving ConsoleDlg::onSleep() - return
Entering [CanvasCursorPackage]GuiCanvas::checkCursor(Canvas)
Entering showCursor()
Leaving showCursor() - return
Leaving [CanvasCursorPackage]GuiCanvas::checkCursor() - return
Leaving [CanvasCursorPackage]GuiCanvas::popDialog() - return
Leaving ToggleConsole() - return
Entering ToggleConsole(0)
Leaving ToggleConsole() - return
Entering JoinServerDlg::join(JoinServerDlg)
scripts/gui/joinServerDlg.cs (151): Cannot re-declare object [ServerConnection].
scripts/gui/joinServerDlg.cs (152): Unable to find object: '0' attempting to call function 'setConnectArgs'
scripts/gui/joinServerDlg.cs (153): Unable to find object: '0' attempting to call function 'setJoinPassword'
scripts/gui/joinServerDlg.cs (154): Unable to find object: '0' attempting to call function 'connect'
Leaving JoinServerDlg::join() - return
Entering toggleJoinServerDlg()
Entering [CanvasCursorPackage]GuiCanvas::popDialog(Canvas, JoinServerDlg)
Entering [CanvasCursorPackage]GuiCanvas::checkCursor(Canvas)
Entering showCursor()
Leaving showCursor() - return
Leaving [CanvasCursorPackage]GuiCanvas::checkCursor() - return
Leaving [CanvasCursorPackage]GuiCanvas::popDialog() - return
Entering JoinServerDlg::query(JoinServerDlg)
Entering onServerQueryStatus(start, Querying master server, 0)
Leaving onServerQueryStatus() - return
No master servers found in this region, trying IP:88.198.65.149:28002.
Requesting the server list from master server IP:88.198.65.149:28002 (2 tries left)...
Leaving JoinServerDlg::query() - return
Leaving toggleJoinServerDlg() - return
Received server list packet 1 of 1 from the master server (4 servers).
Pinging Server IP:88.198.65.149:28000 (3)...
Pinging Server IP:88.198.65.149:28003 (3)...
Pinging Server IP:88.198.65.149:28004 (3)...
Pinging Server IP:88.198.65.149:28005 (3)...
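
For reference, the second join attempt in the log fails at joinServerDlg.cs line 151 with "Cannot re-declare object [ServerConnection]", and lines 152 to 154 then call methods on object '0'. That pattern is consistent with the old ServerConnection object still existing when the join script tries to create a new one. Below is a minimal Python model of that failure and of one possible guard (delete any leftover object before creating a new one). It is only a sketch: `NameRegistry`, `join_unguarded` and `join_guarded` are hypothetical names, not engine or game code.

```python
class NameRegistry:
    """Toy stand-in for a registry of uniquely named objects (hypothetical)."""

    def __init__(self):
        self._objects = {}

    def create(self, name):
        # Mirrors the log: creating a name that already exists fails.
        if name in self._objects:
            print(f"Cannot re-declare object [{name}].")
            return None
        obj = {"name": name}
        self._objects[name] = obj
        return obj

    def delete(self, name):
        self._objects.pop(name, None)

    def exists(self, name):
        return name in self._objects


def join_unguarded(registry, address):
    """Create the connection object without checking for a leftover one."""
    conn = registry.create("ServerConnection")
    if conn is None:
        # In the log, every later call fails with "Unable to find object: '0'".
        return False
    conn["address"] = address
    return True


def join_guarded(registry, address):
    """Drop any stale connection object from an earlier attempt, then join."""
    if registry.exists("ServerConnection"):
        registry.delete("ServerConnection")
    return join_unguarded(registry, address)


registry = NameRegistry()
first = join_unguarded(registry, "IP:88.198.65.149:28000")   # object is created
second = join_unguarded(registry, "IP:88.198.65.149:28003")  # stale object blocks this
third = join_guarded(registry, "IP:88.198.65.149:28003")     # guard clears it first
print(first, second, third)
```

The guard only addresses the client-side lockup. It does not explain why the first connection never finished or why the server keeps the slot.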
Bloodknight
Posts: 226
Joined: Tue Feb 03, 2015 8:58 pm
by Bloodknight » Wed Feb 20, 2019 4:25 pm
Pretty sure this was brought up a long while back. IIRC the bug was somehow with the player count of a server not being correctly decremented in the master server. Not sure if it was determined why, or why it even mattered, since I thought it was only a display number and that in the end the server should accept or reject the connection based on whether it is full.

So while the client should handle this gracefully (are we sure part of this issue is not the 60-second timeout when things go slightly squiffy?), it's the root cause of the problem that needs to be addressed.

Code:

Entering clientCmdServerMessage(23 MsgClientJoin, 70 has joined the server., , 1.00527e+06, , 0, , , , 0)
Entering defaultMessageCallback(23 MsgClientJoin, 70 has joined the server., , 1.00527e+06, , 0, , , , 0, , )

This looks weird to me; I never like it when numbers get displayed 'incorrectly'.
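
One possible explanation for the 1.00527e+06 (an assumption, nothing in this thread confirms it): the value is an ordinary integer a little over one million that was turned into a string with a C-style `%g` format, which keeps six significant digits and switches to exponent notation from one million upward. The Python sketch below reproduces that formatting. The value 1005273 is made up for illustration; the real value behind the log line is not known.

```python
def format_like_percent_g(value):
    """Format a number the way C's printf("%g") does (6 significant digits)."""
    return "%g" % value


# 70 and 25566 are IDs that appear in the log; the other two are illustrative.
samples = [70, 25566, 999999, 1005273]
formatted = {n: format_like_percent_g(n) for n in samples}

for number, text in formatted.items():
    print(number, "->", text)

# Parsing the printed text back shows that the last digit was lost.
round_trip = int(float(formatted[1005273]))
print("round trip:", round_trip)
```

If that is what happens here, the number is only displayed oddly in the trace, but any code that converts such a value to a string and back would lose the low digits.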
Duion
Posts: 1370
Joined: Sun Feb 08, 2015 1:51 am
 
by Duion » Wed Feb 20, 2019 6:58 pm
Yes, I discussed that player count issue before, but that "only" caused servers to disappear from the server list that the master server generated. Now there are also servers that are still listed by the master server but are not joinable, and that break the game once you have tried to join them: the ServerConnection object is not deleted, so the game still thinks a game is running and does not let you join another, forcing you to exit the game to reset it.

Regarding that message: yes, it looks weird, but I don't really understand what is going on there, only that it is a message function with lots of arguments and that one of them is a weird number that looks like a floating point imprecision. The numbers are IDs that are assigned to certain things, for network optimization or so; one number is the client ID, in this case "70". After that come a lot of arguments that are mostly empty, and some have 0 assigned to them. I don't really know where to look to debug that or to find out what is being sent there.

The bug is also hard to replicate, since it needs a server that runs for a long time and real human players from time to time. Bots seem to shorten the time until the bug happens, the more bots the faster, but ultimately real human players are needed to cause it.

If someone wants to test the bug for himself, I could offer to host a server and let it go bad, meaning not restart it for a long time until it breaks. The problem is that at some point it may also disappear from the list. At first I could prevent the bug by restarting the servers once or twice a day; later I went down to every 8 hours, and now I have had to go down to every 4 hours to prevent it in most cases, but there is still a slight chance it will happen.
Bloodknight
Posts: 226
Joined: Tue Feb 03, 2015 8:58 pm
by Bloodknight » Wed Feb 20, 2019 8:50 pm
This is one of the reasons why I'm planning on having a bunch of analytical data in my game. I should be able to match all the numbers up, ideally with something to graph them out, and then see what is causing the issue. I mean, there's a plethora of reasons why running analytics is fun, but finding problems is a good side effect.
Duion
Posts: 1370
Joined: Sun Feb 08, 2015 1:51 am
 
by Duion » Wed Feb 20, 2019 9:20 pm
Who cares about the details, I need a solution that works.

I think I need some kind of function that checks whether the connection is valid, on the server and on the client, and that resets the connection and clears the slot on the server if the connection process gets interrupted, bugged, timed out or whatever.
I know it is not just the master server, since you can also join servers directly by IP and this fails as well.
The join and exit processes are pretty bulletproof if you test them short term, but if you let the server run for a day and come back after a few people have played on it a few times, the server is somehow broken.
I already reset the servers after the last human has left, but sometimes something does not seem to get cleared in that process.
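
The kind of validity check described above could look roughly like the following Python sketch. It is a language-neutral model under stated assumptions, not engine code: `SlotTable` and its methods are hypothetical names, and the 60-second default is only borrowed from the timeout mentioned earlier in the thread. The idea is that the server remembers when it last heard from each client, frees any slot that has been silent for too long, and derives the player count from the live table instead of a counter that is incremented and decremented separately.

```python
class SlotTable:
    """Hypothetical server-side table of occupied player slots."""

    def __init__(self, max_players, timeout_s=60.0):
        self.max_players = max_players
        self.timeout_s = timeout_s
        self._last_seen = {}  # client id -> time of last packet, in seconds

    @property
    def player_count(self):
        # Derived from the live table, so it cannot drift away from it.
        return len(self._last_seen)

    def try_join(self, client_id, now):
        self.sweep(now)
        if client_id in self._last_seen:
            # Same client reconnecting: reuse its slot instead of rejecting it.
            self._last_seen[client_id] = now
            return True
        if self.player_count >= self.max_players:
            return False
        self._last_seen[client_id] = now
        return True

    def heard_from(self, client_id, now):
        if client_id in self._last_seen:
            self._last_seen[client_id] = now

    def leave(self, client_id):
        self._last_seen.pop(client_id, None)

    def sweep(self, now):
        """Free every slot whose client has been silent longer than the timeout."""
        stale = [cid for cid, seen in self._last_seen.items()
                 if now - seen > self.timeout_s]
        for cid in stale:
            del self._last_seen[cid]
        return stale


table = SlotTable(max_players=2)
table.try_join("a", now=0.0)
table.try_join("b", now=0.0)               # server is now full
table.heard_from("a", now=50.0)            # "a" still sends packets, "b" vanished
rejected = table.try_join("c", now=55.0)   # "b" has not timed out yet
accepted = table.try_join("c", now=61.0)   # sweep frees the slot held by "b"
print(rejected, accepted, table.player_count)
```

Sweeping on every join attempt keeps the model small; a real server would more likely run the sweep on a timer as well, so the reported player count is corrected even when nobody tries to join.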