This one and its duplicate are mine.
http://ministry.kronosaur.com/record.hexm?id=96353

1.9b4
Taipan Godship

God-jumped to Domitus to check out the new Sapiens and a few seconds later the game crashes.
Occurs with every reload of this sav.

EDIT: The two crashes 15 minutes later are also mine. Sav available.
2nd EDIT: Lot more crashes occurring. All show similar output. The most I have played is about 20 minutes before crashing.

zip
1.9b4 crash.zip
log
Debug.log
log
TestingDebug.log
log
HereticDebug.log
relanat 21 Jul 2021:

Added the debug log. Many crashes and restarting a game nearly always crashes again within seconds. This is a game breaker for me.

ferdinand 31 Jul 2021:

Experiencing the same crashes with similar debug.log output.

relanat 27 Aug 2021:

Just tried 1.9b4 on a Windows 7 machine. It ran for half an hour with no crashes. Might be a Wine problem.
FWIW the crashes occur when running Ubuntu 18.04.4 LTS and Wine 5.0.2.

george moromisato 28 Aug 2021:

What's strange to me is that the stack trace in Debug.log looks like a logic error. I.e., we're not calling the OS inside OnObjDestroyed (I don't think) so I don't know how it could be a Wine problem.

One possibility is that there is a memory corruption or a buffer overwrite bug in the code and it manifests differently on Windows (the corruption is in a different place, so it doesn't crash).

relanat 28 Aug 2021:

Did a bit more testing.
Downloaded a fresh version of 1.9b4 on the Windows machine. Loaded the crashing sav file attached above. The game did not crash; I left it running on autopilot until the ship ran out of fuel many minutes later.
On the Linux machine:
Tried the same sav above and it crashed within seconds.
The sav also crashed when played in the fresh download of 1.9b4.
Got a crash in EP (the EP version was copied from an existing game).
Also got a crash in the Director's Cut adventure (ditto).
Got a crash in a fresh download of VOTG. It took longer but I suspect this is because there isn't quite as much NPC destruction in VOTG (yet) as there is in SOTP based adventures.

On thing I have noticed is the crashes always seem to involve NPC ships and/or stations, but not ones targeted or destroyed by the player. The crash log always seems to point to stations or ships elsewhere in the system.
Maybe non-player code somewhere?
Is there some sort of distance measurement involved in NPC object destruction code? Proximity to the playership?
Or do non-player objects get any data recorded when they destroy another ship?

The crashes also do not occur in the intro screen. Many ships are destroyed there in a short time.

relanat 15 Sep 2021:

Added this code to the Taipan mod and did some testing. 'TestingDebug.log' attached. I can't see anything that helps except the crashes are occurring before OnGlobalObjDestroyed runs.
Of note is this event appears to be running twice, there are two sets of output for every object destroyed. EDIT: This only happens with the Taipan.

<OnGlobalObjDestroyed>
	(block Nil
		(if (neq aDestroyReason 'enteredStargate)
			(block Nil
				(sysSetPOV (objGetPos aObjDestroyed))
				(printTo 'log (cat "Name: " (objGetName aObjDestroyed)))
				(printTo 'log (cat "Reason: " aDestroyReason))
				(printTo 'log (cat "gSource: " (objGetName gSource)))
				(printTo 'log (cat "aDestroyer: " (objGetName aDestroyer)))
			)
		)
	)
</OnGlobalObjDestroyed>
relanat 15 Sep 2021:

I'm starting to think that ships with 'guard' orders are triggering the crash.
Nothing conclusive and I don't understand the debug output but in the second section 'objName' is always a ship class that guards 'm_Objs[0]'.
Enuming the ship orders on all friendly ships in a system to be 'gateOnThreat' greatly increases the time before a crash occurs.
I did this in EtaCeti and the crash output showed Centurion heavy gunship and Rogue Fleet Cache, a 'guard' type and the enemy Centurion orders could still have been 'guard'.

Watching the POV changes backs this up. Although the crash object never gets shown, if multiple enemies are attacking a station the game can crash and show that station and the attacking ship type in the output. The assumption is that several enemy ships are destroyed before a 'guard' ordered ship destroys one.

But I don't think every ship destroyed by a 'guard' ship triggers the crash, not sure. Will test more.

george moromisato 16 Sep 2021:

I'm trying to get as much info as I can from the Debug.log. I still haven't found the bug, but I'm writing all this here so I remember.

Let's take the last crash in the log:

2021-07-21 11:34:50	Crash in CSpaceObject::OnObjDestroyed
2021-07-21 11:34:50	Crash in CSystem::RemoveObject
2021-07-21 11:34:50	Crash in CSpaceObject::Destroy
2021-07-21 11:34:50	Crash in CShip::OnDamage
2021-07-21 11:34:50	pObj = 
2021-07-21 11:34:50	obj destroyed
obj class: CShip
obj name: Hornet-class battlepod
obj pointer: 2cc44bc8
m_pDocked: none
m_pExitGate: none
IOrderModule
Order: 3
m_Objs[0]: 16a4f7d8 Centurion-class heavy gunship (CShip)
m_Objs[1]: none
CAttackOrder
m_State: 0
m_iCountdown: -1
m_pNavPath: none

2021-07-21 11:34:50	Crash in CSpaceObject::Damage
2021-07-21 11:34:50	Crash in CMissile::OnUpdate
2021-07-21 11:34:50	Crash in CSystem::Update
2021-07-21 11:34:50	Crash in CUniverse::UpdateTick
2021-07-21 11:34:50	CException: Win32 exception.
2021-07-21 11:34:50	Crash in CGameSession::OnAnimate
2021-07-21 11:34:50	Unable to continue due to program error.

program state: OnAnimate
program state: in OnObjDestroyed
save file: RELANAT-8044.sav
system: TauCeti
system type: 00810123
system name: Tau Ceti
total objs: 1818
destroyed: 0
deleted: 0
bad: 1
stars: 1
obj class: CShip
obj name: Centurion-class heavy gunship
obj pointer: 16a4f7d8
m_pDocked: none
m_pExitGate: none
IOrderModule
Order: 1
m_Objs[0]: 16a4ca20 Enterprise Armor Dealer (CStation)
m_pNavPath: none
game state: in game


Please contact [email protected] with a copy of Debug.log and your save file. We are sorry for the inconvenience.

The "Crash in ..." lines show a stack trace, from the lowest level up to the highest. That is, the first "Crash in" line in the log shows the lowest level on the stack. Subsequent lines show higher levels. In this case, we crashed inside CSpaceObject::OnObjDestroyed. The next line shows that OnObjDestroyed was called by CSystem::RemoveObject.

A little further up the stack we see CShip::OnDamage which is followed by some lines describing an object: a Hornet-class battlepod.

This indicates that a Hornet was damaged (we called CShip::OnDamage) and inside that code we decided that the Hornet was destroyed. Then we called CSpaceObject::Destroy on the hornet and eventually called CSystem::RemoveObject.

Inside of CSystem::RemoveObject, we call all the objects in the system to tell them that an object was destroyed. We need to do this in case they are holding pointers to a now dead object. For each object in the system, we call CSpaceObject::OnObjDestroyed. Thus we were inside RemoveObject, looping over all objects in the system, and calling each one at OnObjDestroyed to tell them that the Hornet was destroyed.

One of the objects that we called, crashed handling that notification. But which one?

At the end of the log we see the description of a Centurion-class. This is the object that actually caused the crash. With the info for the Hornet and the Centurion, we can reconstruct some of what happened:

  1. A Hornet was damaged and ultimately destroyed. By what? Notice that the Hornet has a pointer in m_Objs[0] which happens to point at the Centurion. This means the Hornet was targeting the Centurion. Maybe the Centurion was attacking the Hornet?
  2. Next we called all the objects in the system to tell them that the Hornet was destroyed. When the Centurion got called we crashed.
  3. The Centurion has Order = 1, which is the guard order. Looking through Debug.log, all the Orders are either 1 (guard) or 8 (patrol). I think @relanat is correct that this is not a coincidence. In 1.9 Beta 4 I did a lot of work on the guard and patrol orders, so the bug is probably there. [It's probably going to be an embarrassing bug too!]
  4. We also see that the Centurion has m_Objs[0] pointing to an Armor Dealer; that's probably the station it's guarding.

Unfortunately, this is not enough to figure out the crash. But looking at the code, I think it might be related to the deter module, which is used by both the guard and patrol orders. I'm going to take a look and report back.

relanat 24 Sep 2021:

Did some testing in Heretic to reduce the number of objects being destroyed. It made it easier to follow the action.
First crash: http://ministry.kronosaur.com/record.hexm?id=97165
And the three after it.
Full Debug.log attached as HereticDebug.log

The crashes happened during the sentinel attack missions. When the crashes occurred the sentinel was flying in a straight line and wasn't firing much, only when a ship was directly in front of it or about to be in front of it.

First one was after the Taikon outpost had been destroyed.
The Sentinel was flying away and being attacked by an Eir cruiser which had been docked at the outpost. The outpost also had a Centurion docked but I didn't see where it was when the game crashed. I believe the Britannia mentioned in the output was a reinforcement but I do not know its 'origin' station. There are no Brits present when the player enters the system.

Second one was after the sentinel had destroyed the Ares outpost. It was flying away and being attacked by the Sandstorms which had been docked at the outpost. It flew past the Ringer station and when the game crashed the Ringer station was firing on the sentinel.

Third crash:
Sentinel had destroyed the Nares-Strong outpost.
Was flying away from the outpost being attacked by Centurions which had been docked at the outpost.
If flew past the Taikon outpost which caused the Centurion and Eir docked there to start attacking it.
At the time of the crash the Taipan was closest to the sentinel but had not fired on it. It is possible that 'm_pTarget' is just the closest ship to the sentinel at the time of the crash and has no bearing on the crash.
Additionally, at the time of the crash only one Centurion was in the area. I don't know if it was the one from the Taikon outpost or one of the Nares-Strong ships. I assume the others had retreated to recharge their shields.
The game crashed when a missile salvo from the Eir hit the sentinel. The Centurion was nearby but not actively attacking at that moment, it was circling for another attack.

The fourth crash is a repeat of the second. Ares outpost destroyed, sentinel flies past the Ringer outpost and is destroyed by Ringer fire. In this crash there was only one Sandstorm left. So the only relevant objects were the Taipan playership (not firing), the Sandstorm from the Ares outpost and the Ringer station.
It may be possible that 'guard' ships at the Ringer station (they never undocked) had some effect but this doesn't concur with the Eir destroying the sentinel in the third crash. It has no ships assigned to it.

A common thread seem to be there is a 'guard' ship nearby or in the process of attacking when the sentinel is destroyed by another object (in the third crash the Eir and in the second crash possibly the Ringer outpost and in the fourth the Ringer outpost). This 'guard' ship can be other than 'm_pTarget'.

Wild guess time: the 'guard' ship is crashing the game because the sentinel was destroyed by another object.
Or (less likely IMO) maybe simultaneous weapons fire onto the sentinel from the 'guard' ship and the other object is glitching the code.

Also many thanks for the info in your post above. It it a great explanation.

george moromisato 15 Oct 2021:

I stepped through all the destroy code and couldn't find any problems.

But I added debug code everywhere so hopefully the next release will give us more clues. I'll continue adding debug code until we can resolve this.

relanat 16 Oct 2021:

From 1.9RC1.
Wolfen in Eridani. I just let it autopilot until it crashed.

2021-10-16 19:32:51	--------------------------------------------------------------------------------
2021-10-16 19:32:51	Start logging session
2021-10-16 19:32:51	Direct3D
2021-10-16 19:32:51	Screen: 1920 x 1080
2021-10-16 19:32:51	Scale: 1.00
2021-10-16 19:32:51	Background blt: Enabled
2021-10-16 19:32:51	Use Direct X: Enabled
2021-10-16 19:32:52	Use GPU compositing: Enabled
2021-10-16 19:32:52	LogPixelsY: 96
2021-10-16 19:32:52	DeviceType: 0x1
2021-10-16 19:32:52	Caps: 0x00000000
2021-10-16 19:32:52	Caps2: 0x60020000
2021-10-16 19:32:52	Caps3: 0x00000320
2021-10-16 19:32:52	DevCaps: 0x0019aff0
2021-10-16 19:32:52	DevCaps2: 0x00000051
2021-10-16 19:32:52	[DX] Buffer Format: 22
2021-10-16 19:32:52	Transcendence 1.9 RC 1
2021-10-16 19:32:52	Graphics quality maximum: 41 ms; 8 cores.
2021-10-16 19:32:52	[33] MCI ERROR ProcessPlay MCIWndOpen [TranscendenceIntro.mp3]: Invalid MCI device ID. Use the ID returned when opening the MCI device.
2021-10-16 19:32:55	Loading news.
2021-10-16 19:32:55	gethostbyname failed: 11002.
2021-10-16 19:32:55	Unable to resolve address for multiverse.kronosaur.com.
2021-10-16 19:32:55	gethostbyname failed: 11002.
2021-10-16 19:32:55	Unable to verify connection to the Internet.
2021-10-16 19:32:55	Error from Multiverse: You are not connected to the Internet.
2021-10-16 19:32:55	All resources downloaded.
2021-10-16 19:33:00	Initializing adventure: Transcendence.tdb
2021-10-16 19:33:06	Created new game
2021-10-16 19:33:06	Starting new game.
2021-10-16 19:33:13	Started game version: 1.9 RC 1 (1.9.300.0)
2021-10-16 19:33:13	Using API version: 53
2021-10-16 19:35:12	Crash in IOrderModule::ObjDestroyed
2021-10-16 19:35:12	Crash in CBaseShipAI::OnObjDestroyed
2021-10-16 19:35:12	Crash in CShip::ObjectDestroyedHook
2021-10-16 19:35:12	Crash in CSpaceObject::OnObjDestroyed
2021-10-16 19:35:12	Crash in CSystem::RemoveObject
2021-10-16 19:35:12	Crash in CSpaceObject::Destroy
2021-10-16 19:35:12	Crash in CShip::OnDamage
2021-10-16 19:35:12	pObj = 
2021-10-16 19:35:12	obj destroyed
obj class: CShip
obj name: Borer-class gunship
obj pointer: 1c7e4c08
m_pDocked: none
m_pExitGate: none
IOrderModule
Order: 2
m_Objs[0]: ccaaaa0 Greenwich Industries (CStation)
m_Objs[1]: none
m_pNavPath: invalid objects: -1, -1

2021-10-16 19:35:12	Crash in CSpaceObject::Damage
2021-10-16 19:35:12	Crash in CMissile::OnUpdate
2021-10-16 19:35:12	Crash in CSystem::Update
2021-10-16 19:35:12	Crash in CUniverse::UpdateTick
2021-10-16 19:35:12	CException: Win32 exception.
2021-10-16 19:35:12	Crash in CGameSession::OnAnimate
2021-10-16 19:35:12	Unable to continue due to program error.

program state: OnAnimate
program state: in OnObjDestroyed
save file: a-7228.sav
system: SE
system type: 00100001
system name: Eridani
total objs: 974
destroyed: 0
deleted: 0
bad: 1
stars: 1
obj class: CShip
obj name: Centauri raider
obj pointer: b9fbed8
m_pDocked: none
m_pExitGate: none
IOrderModule
Order: 1
m_Objs[0]: cc3e3f0 Centauri warlord camp (CStation)
m_pNavPath: none
game state: in game


Please contact [email protected] with a copy of Debug.log and your save file. We are sorry for the inconvenience.

2021-10-16 19:35:12	gethostbyname failed: 11002.
2021-10-16 19:35:12	Unable to resolve address for multiverse.kronosaur.com.
2021-10-16 19:35:12	gethostbyname failed: 11002.
2021-10-16 19:35:12	Unable to verify connection to the Internet.
2021-10-16 19:35:18	End logging session
george moromisato 16 Oct 2021:

Thank you! I believe I figured out the bug. I've uploaded a new build with the fix.

Please download the latest: https://downloads.kronosaur.com/TranscendenceNext.zip. Version is 1.9 RC 1a.

Let me know if you still experience crashes.

relanat 20 Oct 2021:

Looking good. I ran the Wolfen in Eridani for 8 minutes real-time and no crash! :D
Will test for longer over the next couple of days.
Thanks.

relanat 18 Nov 2021:

All good. Can resolve. I played the Taipan in SOTP for over an hour and no crashes.

george moromisato 18 Nov 2021:

Great to hear! Thank you so much for testing this. I couldn't reproduce it, and would never have found it if not for you.

The bug was relatively subtle. Here's the fix: https://github.com/kronosaur/TranscendenceDev/commit/74de8b483e8f4c997db52ba1f3ad52dd1595a7f8#diff-f3dd97fd27620c7f9a1936b934f3b2d0560428551e1ea459c0f0a1452fc80f87

The AI is programmed as a set of "order modules". For example, the CNavigateOrder module knows how to maneuver to dock with a station, while the CAttackOrder module knows how to attack. Each order module is a C++ object instance that we allocate when the order starts executing and we delete (free) when the order is done.

The deter order module is used to chase an enemy ship when a defender is guarding a station. When an enemy ship is destroyed, we tell the order module so that it can stop the order.

The deter order module had this bit of (pseudo)code:

OnObjDestroyed (obj)
   IF obj == ship we're chasing THEN
      delete current order module.

The problem was that an order module usually deals with multiple objects. For example, the deter order module needs to track both the enemy ship that it's chasing and the station that it is guarding. It needs to do something if either of them get destroyed.

So actually the above code is being called in a loop:

FOR EACH obj that we care about DO
   OnObjDestroyed(obj)
   ...

The bug happened when we deleted the order module inside OnObjDestroyed and then came back to the loop. The loop continued and tried to call a destroyed order module! On Windows, this didn't crash because the old order module was still in memory—for whatever reason (probably performance), Windows doesn't clear the old memory, so the code seemed to work. On Linux, though, the memory had already been freed, so trying to access it caused a crash.

When I wrote the loop code, I had already foreseen this situation, and I allowed the call to OnObjDestroyed to return a flag if it wants to destroy the module. The actual code looks something like this:

FOR EACH obj that we care about DO
   OnObjDestroyed(obj, &destroyModuleFlag)
   IF destroyModuleFlag is set THEN
      delete current order module
      break out of loop

But unfortunately, when I wrote the CDeterOrderModule::OnObjDestroyed code, I forgot all about that flag and just destroyed the module inside.

So the fix was simple: instead of destroying the module, just set the flag. But it took a long time for me to even figure out what was going on.

Thanks again for all your help!