The vast majority of the fancy autonomous flying we've seen from quadrotors has relied on some kind of external localization for position information. Usually it's a motion capture system, sometimes it's GPS, but either way, there's a little bit of cheating involved.
Each little quadrotor is equipped with a Qualcomm Snapdragon Flight development board, which includes a quad-core onboard computer, a downward-facing VGA camera with a 160° field of view, a VGA stereo camera pair, and a 4K video camera. For these flights, though, the drones are using only one or two cores of processing power (running ROS), a simple onboard IMU, and that downward-facing VGA camera.
Each quadrotor's job is to use visual-inertial odometry (VIO) to estimate how far and in what direction it's moved from its starting position, which gives a good approximation of its relative location. To do this, it identifies and tracks visual features in its camera's field of view: if the drone's camera sees an object, and that object moves right to left across the frame, the drone can infer (with some help from its IMU) that it's moving left to right. Either that, or there's an earthquake going on. Dead-reckoning approaches like these do result in some amount of drift, where small errors in position estimation build up over time, but UPenn has managed to keep things under control, with overall positional errors of just over half a meter even after the drones have flown over 100 meters.
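To make that drift behavior concrete, here's a minimal Python sketch of a dead-reckoning loop like the one described above, assuming a pinhole model for the downward-looking camera. The focal length, optical-flow values, and tracking bias are made-up illustrative numbers, not UPenn's actual VIO pipeline; the point is just to show how a small, consistent per-frame tracking error compounds over a long flight.

```python
import math

# Toy dead-reckoning sketch for a downward-facing camera, assuming a
# pinhole model. FOCAL_PX, the flow values, and the bias below are
# hypothetical numbers for illustration, not UPenn's parameters.
FOCAL_PX = 300.0  # camera focal length in pixels (assumed)

def integrate_step(pos, yaw, altitude, flow_px):
    """One dead-reckoning update from the average tracked-feature motion.

    flow_px is the (dx, dy) pixel displacement of features between frames.
    Features drifting right-to-left imply the drone moved left-to-right,
    so the flow is negated to recover the camera's own motion.
    """
    m_per_px = altitude / FOCAL_PX  # ground-plane meters per pixel
    dx_body = -flow_px[0] * m_per_px
    dy_body = -flow_px[1] * m_per_px
    # Rotate the body-frame displacement into the world frame using the
    # IMU's yaw estimate, then accumulate.
    x = pos[0] + dx_body * math.cos(yaw) - dy_body * math.sin(yaw)
    y = pos[1] + dx_body * math.sin(yaw) + dy_body * math.cos(yaw)
    return (x, y)

# Simulate ~100 m of straight flight at 2 m altitude. A tiny systematic
# tracking bias stands in for the per-frame errors that make dead
# reckoning drift.
pos, yaw, altitude = (0.0, 0.0), 0.0, 2.0
true_flow = (-6.0, 0.0)   # px/frame of feature motion for forward flight
bias = (0.03, 0.02)       # hypothetical per-frame tracking error
meters_per_frame = 6.0 * altitude / FOCAL_PX
frames = int(100.0 / meters_per_frame)
for _ in range(frames):
    noisy = (true_flow[0] + bias[0], true_flow[1] + bias[1])
    pos = integrate_step(pos, yaw, altitude, noisy)
print(f"dead-reckoned position after ~100 m: ({pos[0]:.1f}, {pos[1]:.1f}) m")
```

With these made-up numbers, the sketch ends up roughly 0.6 meters off after a simulated 100-meter leg: the same order of drift the UPenn team reports, and a reminder that in pure dead reckoning the error never gets corrected, only accumulated.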
Source: IEEE Spectrum