The signal (as controlled by the FET) is reduced when the control (rectified audio) voltage goes up. That inversion is the first 180 degree phase shift. Your example circuit happens to have two RC time-constant delays (C4 and C2).
C4 appears to set a longer time constant than C2, so it dominates and suppresses the oscillation that could occur if the two time constants were close to each other. (Because of the diodes, it also looks as though it will respond quickly to audio peaks and decay slowly.)
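The "one dominant time constant" argument can be put into numbers with a phase-margin calculation. This is a sketch that assumes the control path behaves like a linear loop with DC gain `k` followed by two first-order RC lags with time constants `tau1` and `tau2`. The values used below are hypothetical, not taken from the schematic, and the diodes' nonlinearity is ignored. The 180 degree inversion is treated as the negative feedback itself, so the phase margin is how far the two RC lags are from adding the second 180 degrees at the frequency where the loop gain falls to 1.

```python
import cmath
import math


def loop_gain(w, k, tau1, tau2):
    """Open-loop gain at angular frequency w (rad/s) of a hypothetical AGC loop:
    DC gain k followed by two first-order RC lags (time constants in seconds)."""
    s = 1j * w
    return k / ((1 + s * tau1) * (1 + s * tau2))


def phase_margin(k, tau1, tau2):
    """Degrees of extra lag the loop could tolerate at the unity-gain frequency
    before the total phase shift reaches 360 degrees."""
    if k <= 1:
        return math.inf  # loop gain never reaches 1, so there is no crossover
    # |loop gain| falls monotonically with frequency, so bracket and bisect.
    lo, hi = 0.0, 1.0
    while abs(loop_gain(hi, k, tau1, tau2)) > 1:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if abs(loop_gain(mid, k, tau1, tau2)) > 1:
            lo = mid
        else:
            hi = mid
    wc = (lo + hi) / 2
    lag = -math.degrees(cmath.phase(loop_gain(wc, k, tau1, tau2)))
    return 180.0 - lag


# Hypothetical values: equal time constants vs. one dominant (10x longer) one.
pm_close = phase_margin(20, 0.01, 0.01)
pm_dominant = phase_margin(20, 0.1, 0.01)
print(f"equal time constants:  {pm_close:.1f} deg")
print(f"dominant time constant: {pm_dominant:.1f} deg")
```

With these made-up numbers the equal-time-constant case leaves only about 26 degrees of margin, while making one time constant ten times longer raises it to roughly 43 degrees. Lowering `k` also increases the margin, which is the same effect described below for the looser control loop.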
Unlike temperature, where a one-degree error could start a thermostat war among the members of your family or your office coworkers, people's perception of sound level is logarithmic, as was pointed out in another post. So the gain of the control loop can be relaxed, which means fewer components, such as additional amplifiers or comparators, are needed in the control part of the loop.
This looser control of the volume level is actually an advantage: it lets the listener sense the difference between a strong and a weak signal without as much danger of having their eardrums blown out by the strong one.
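A rough way to see how much "strong vs. weak" difference survives a relaxed loop is to work in decibels. This sketch assumes an idealized AGC that is linear in the log domain (gain control element with a dB-linear response), where a loop gain `G` shrinks any change in input level by a factor of `1 + G`. The function name and the loop-gain values are hypothetical, chosen only for illustration.

```python
def output_swing_db(input_swing_db, loop_gain):
    """Residual change in output level (dB) for a given change in input level
    (dB) in a hypothetical AGC linearized in the log domain."""
    return input_swing_db / (1.0 + loop_gain)


# A 40 dB difference between a weak and a strong signal, for a few loop gains.
for g in (3, 10, 100):
    print(f"loop gain {g:>3}: {output_swing_db(40, g):.1f} dB left at the speaker")
```

Under this model a modest loop gain of 3 still leaves a 10 dB difference, which the ear hears clearly, while a very tight loop flattens strong and weak signals to nearly the same loudness.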
The looser control (the lower loop gain) also makes the circuit less of a hair trigger: it is less likely to start oscillating right at the point where the total phase shift reaches 360 degrees, should the phase shift ever approach that point.
If the delays in the control part of the loop (C4, C2) did push the additional phase shift past 180 degrees, the AGC could oscillate. The result would be a throbbing in the audio intensity as the AGC raises and lowers the level, always slightly behind the actual changes in audio level.
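The oscillation condition can be made concrete with a textbook idealization: assume the control path is `n` identical first-order RC lags (a hypothetical simplification; the real C4 and C2 sections differ and the diodes make them nonlinear). Each section must then contribute 180/n degrees of lag, and the loop oscillates once the loop gain at that frequency reaches 1. The sketch below returns that critical DC loop gain.

```python
import math


def critical_loop_gain(n_sections):
    """DC loop gain at which a hypothetical loop of n identical first-order RC
    lags starts to oscillate: the RC lags add 180 degrees on top of the
    180-degree inversion, and |loop gain| equals 1 at that frequency."""
    if n_sections <= 2:
        # One or two first-order lags approach 180 degrees only asymptotically,
        # so in this linear model there is no finite critical gain.
        return math.inf
    # Each section contributes 180/n degrees where w*tau = tan(pi/n);
    # its magnitude there is 1/sqrt(1 + tan^2) = cos(pi/n).
    per_section = math.cos(math.pi / n_sections)
    return 1.0 / per_section ** n_sections


for n in (2, 3, 4):
    print(f"{n} RC sections: oscillates above loop gain {critical_loop_gain(n):.2f}")
```

In this idealized model two RC sections alone never quite reach the extra 180 degrees, so any further delay in the loop is what would push it over; once three or more lags are present, keeping the loop gain low is what keeps the AGC from throbbing.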
Also, with SSB there is no constant carrier to serve as a reference for signal strength. The AGC has to work around this by looking at the audio level or at the peak RF level in the IF chain. It then needs extra measures, such as fast attack and slow decay in the level adjustment, to cope with signals that arrive in bursts as the transmitting operator talks and pauses.
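The fast-attack, slow-decay idea is easy to show in software. This is a minimal sketch, not the circuit in question: it assumes sampled audio, uses `abs()` as a crude rectifier standing in for the diodes, and applies a one-pole smoother whose coefficient switches between a short attack time and a long decay time. The function names, the target level, and the gain limit are all hypothetical.

```python
import math


def attack_decay_envelope(samples, sample_rate, attack_s, decay_s):
    """Peak-following envelope: rises with time constant attack_s when the
    rectified input exceeds the envelope, falls with decay_s otherwise."""
    a_att = math.exp(-1.0 / (attack_s * sample_rate))
    a_dec = math.exp(-1.0 / (decay_s * sample_rate))
    env = 0.0
    out = []
    for x in samples:
        level = abs(x)  # crude full-wave rectifier
        coeff = a_att if level > env else a_dec
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out


def agc_gain(envelope, target=0.5, max_gain=100.0):
    """Gain that brings the envelope to the target level, capped at max_gain
    so silence between words is not boosted without limit."""
    if envelope <= target / max_gain:
        return max_gain
    return target / envelope


# A 0.1 s "word" followed by a 0.1 s pause, sampled at 8 kHz.
sr = 8000
burst = [1.0] * 800 + [0.0] * 800
env = attack_decay_envelope(burst, sr, attack_s=0.001, decay_s=0.5)
print(f"end of word:  env={env[799]:.3f}, gain={agc_gain(env[799]):.2f}")
print(f"end of pause: env={env[-1]:.3f}, gain={agc_gain(env[-1]):.2f}")
```

Because the decay time is much longer than the pause, the envelope only sags to about 82% of its peak during the gap, so the gain rises only slightly and the background noise between words is not pumped up before the speaker resumes.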