The communicating Dwarfs
(Note: The work described on this page was carried out at the
University of Reading in the Department of
Cybernetics)
Four of the communicating Dwarf robots
The Robots
A group of five autonomous mobile robots was constructed, each equipped
with an infrared communication and localisation system as well as ultrasonic
sonar for detecting obstacles. The obstacle detection system consists of three
sets of ultrasonic sonar transducers: one set looking forward, one set looking
to the front-left and the other set looking to the front-right. Provision for a
fourth set of transducers is included, but these were never required. Ultrasonic
sonar systems operate by transmitting a short pulse of ultrasonic sound, and
timing how long it takes to bounce off an object back to the receiver. The time
taken for the signal to return gives a measure of the range to that object,
since the speed of sound is approximately constant in air. This sonar system
returns only the range to the nearest obstacle from each set of transducers. To
allow the robots to detect close obstacles, echoes have to be detected whilst
the ultrasonic pulse is being transmitted, thus requiring a high threshold. For
the detection of objects further away, a much lower threshold is required to
allow for the large signal loss (as given by the 1/range^4 law).
The detection of both near and distant obstacles is accomplished by using a
varying threshold system. This threshold is initially large, but decreases with
time to a pre-set minimum. A time-out system is used to determine if no objects
are within range. Each set of sonar transducers is scanned ten times per second,
and has a range of 30mm to 1m with a resolution of about 3mm.
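By way of illustration, the ranging and varying-threshold scheme can be sketched as below; the constants and names here are illustrative assumptions, not values from the robots' firmware.

    /* A minimal sketch of time-of-flight ranging with a decaying
       detection threshold, as described above. */
    #include <stdio.h>

    #define SOUND_MM_PER_US 0.343   /* speed of sound in air at ~20 C */

    /* Range follows from half the round-trip time of the echo. */
    double range_mm(double echo_time_us)
    {
        return echo_time_us * SOUND_MM_PER_US / 2.0;
    }

    /* The threshold starts high, to reject the outgoing pulse, and
       decays towards a pre-set minimum so that weak echoes from distant
       objects (signal falling off as roughly 1/range^4) still register. */
    double detect_threshold(double t_us, double initial, double minimum,
                            double decay_per_us)
    {
        double th = initial - decay_per_us * t_us;
        return th > minimum ? th : minimum;
    }

    int main(void)
    {
        /* An echo arriving 580 microseconds after transmission is from
           an object roughly 100mm away. */
        printf("range = %.0f mm\n", range_mm(580.0));
        return 0;
    }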
An active infrared system is used for inter-robot communications. This is
frequency division multiplexed, with each robot having its own channel. The
carriers of these channels range from 220kHz to 400kHz, and the transmission
medium is 950nm infrared light. In order to obtain more transmission power, and
hence range, from inexpensive Light Emitting Diodes (LEDs) at these frequencies,
a ring of twelve 60-degree LEDs is used. Each robot also has four photodiodes
arranged 90 degrees apart (each with a half power angle of 120 degrees). This
combination of LEDs and photodiodes allows transmissions to be received
regardless of the angles of rotation of the robots. Information is transmitted
by frequency modulating the carriers, with the decoding being carried out using
an off-the-shelf radio frequency (RF) integrated circuit. Data is transmitted
and received at 1200 baud (120 bytes per second) using the RS232 standard. The
range of this system is over 6 metres for localisation, and over 12 metres for
data communication, with each of the LEDs being driven at 50mA. The transmitter,
therefore, requires a maximum current of about 200mA, at 5V whilst the receiver
requires less than 20mA, again at 5V. The communications system can scan ten
channels through each of its four photo-sensors at a rate of ten times each
second.
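The quoted figures imply the following back-of-envelope arithmetic; the even channel spacing assumed here is a plausible reading rather than the actual channel plan, which is not stated above.

    /* Back-of-envelope figures for the infrared link, assuming the ten
       carriers are evenly spaced across the stated 220-400kHz range. */
    #include <stdio.h>

    int main(void)
    {
        const double f_low = 220.0, f_high = 400.0;  /* kHz */
        const int channels = 10;

        /* Even spacing gives 20kHz between adjacent carriers. */
        double spacing = (f_high - f_low) / (channels - 1);
        for (int ch = 0; ch < channels; ch++)
            printf("channel %d: %.0f kHz\n", ch, f_low + ch * spacing);

        /* RS232 framing at 1200 baud: 1 start + 8 data + 1 stop bit
           makes 10 bits per byte, hence the quoted 120 bytes/second. */
        printf("throughput = %d bytes/s\n", 1200 / 10);
        return 0;
    }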
Physically the robots are small, having a width of 140mm, a length of 130mm and
a height of 140mm. Each robot weighs less than 600 grams. Movement is provided
by two small d.c. motors with in-line gearboxes each driving one of the rear
wheels. The front of each robot is supported by a single castoring wheel. At
present, motor control is provided by open-loop pulse width modulated
controllers providing several different speeds (up to just over 1m/s)
and directional control. Each robot is equipped with a single 8MHz Z80 CPU. The
actions of the robot are controlled by the Z80 which is also used for sensor
reading under interrupt. In fact, for both the flocking and group
learning experiments no more than 50% of this processor power is required.
Rechargeable 4.8V, 1.8Ah NiMH batteries provide over five hours of life per
charge.
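As a rough sketch of the open-loop drive scheme, assuming an 8-bit duty-cycle value per motor and five discrete speed levels (both assumptions made for illustration):

    #include <stdio.h>
    #include <stdint.h>

    typedef struct { uint8_t left; uint8_t right; } motor_cmd;

    /* Map speed levels (0 = stop .. 4 = full) onto PWM duty cycles;
       turning comes from driving the two rear wheels at different
       levels. */
    motor_cmd set_speeds(int left_level, int right_level)
    {
        static const uint8_t duty[5] = { 0, 64, 128, 192, 255 };
        motor_cmd c = { duty[left_level], duty[right_level] };
        return c;
    }

    int main(void)
    {
        motor_cmd c = set_speeds(4, 2);  /* faster left wheel: turn right */
        printf("left %u/255, right %u/255\n", c.left, c.right);
        return 0;
    }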
Flocking
Flocking is a behaviour that is commonly found in nature. Animals sometimes
flock for shelter from wind or low temperatures, but mainly for the protection
that a flock offers from predators. When a predator attempts to attack a flock
all the animals within the flock will flee for their lives, thus confusing the
predator who now has to try to target one of many animals. Therefore the chance
of any individual being caught and killed by a predator decreases with
increasing flock size. In order to maintain a flock animals have to remain in
close proximity to each other whilst changing direction and speed. The members
of the flock also have to avoid collisions with each other and obstacles within
their environment. Flocking behaviour can be found in birds, in land animals
such as cows and sheep (where the flock is known as a herd), and in fish (where
it is known
as a school). Flocking presents an interesting problem in the field of robotics
since it requires two opposing goals: the robots have to get close to each other
but also have to avoid collisions. With many robots in close proximity, their
sensors will receive more noise than when the robots are not so close together,
for example false sonar echoes can be received from other robots, thus giving
range errors. For these reasons flocking and the required sensor fusion formed a
non-trivial task with which to test the robots.
Real-time flocking was achieved reactively using a subsumption-like
architecture, but with a simple dynamic form of leadership. The four levels of
control in this architecture are:
- obstacle avoidance
- wandering
- attraction towards other robots
- leadership
Avoiding objects and wandering are achieved using
information from the ultrasonic sonar, with the position and distance to other
robots being determined from the infrared communication and localisation system.
The localisation system is used to form an attractive force which brings the
robots together, whilst the sonar is used to act as a repulsive force to prevent
collisions. To help prevent head-on collisions, only the rearward LEDs are
switched on; thus the robots are only attracted towards the rear of each other.
When a very close object is detected the avoidance behaviour forces the robots
to slowly reverse, thus helping to avoid deadlocks and maintain the minimum
separation distance between robots. The communication system is used by each
robot to inform the other robots whether it is a leader or a follower.
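One way the attractive and repulsive influences might be combined is sketched below; the vector form and the gains are assumptions made for illustration, not the robots' actual control law.

    #include <stdio.h>

    typedef struct { double x, y; } vec2;

    /* Attraction towards a robot seen by the infrared localisation
       system grows with its range; repulsion away from a sonar obstacle
       grows as the obstacle gets close. */
    vec2 steering(vec2 to_robot, double robot_mm, vec2 to_obstacle,
                  double obstacle_mm)
    {
        const double k_attract = 0.001, k_repel = 2000.0;
        vec2 f;
        f.x = k_attract * to_robot.x * robot_mm
            - k_repel * to_obstacle.x / obstacle_mm;
        f.y = k_attract * to_robot.y * robot_mm
            - k_repel * to_obstacle.y / obstacle_mm;
        return f;
    }

    int main(void)
    {
        vec2 robot = { 1.0, 0.0 }, obstacle = { 0.0, 1.0 }; /* unit dirs */
        vec2 f = steering(robot, 500.0, obstacle, 100.0);
        printf("steer towards (%.2f, %.2f)\n", f.x, f.y);
        return 0;
    }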
The selection of the leader has to be dynamic, because like Reynolds's boids
(and real flocks of animals) the flock should be able to split up to go around
obstacles, and rejoin once past the obstacle. If a leader is pre-defined this is
not possible. Also, since our robots operate in a finite bounded environment
there would be problems when the flock meets a boundary of the environment. In
this case the pre-defined leader would have to fight its way through the other
robots. Thirdly, if the pre-defined leader should stop working (i.e. die or be
killed) then the whole of the flock would also fail. Clearly such a pre-defined
leader should not be utilised.
Under a system where any robot can become a leader and can relinquish leadership
when required, one or more leaders can coexist. In this system the flock can
split up into two smaller flocks to go around both sides of an obstacle and then
rejoin once past the obstacle. If the leader should get trapped between other
robots, then by definition it is now in the flock and therefore simply gives up
leadership. One of the robots on the outside of the flock will take over the
leadership and the rest will follow it. To ensure that this new leader does not
simply turn around and rejoin the main body of the flock there is a short period
of time for which it is not allowed to relinquish leadership to any robots that
are followers. However, even during this period a new leader will still
relinquish leadership to another leader in front of it.
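The leadership rules described above can be summarised as a small state machine; the names and the hold-off constant below are illustrative assumptions.

    #include <stdbool.h>

    typedef enum { FOLLOWER, LEADER } role_t;

    #define HOLDOFF_TICKS 30   /* "a few seconds" at ten updates a second */

    role_t update_role(role_t role, int ticks_as_leader,
                       bool follower_ahead, bool leader_ahead)
    {
        if (role == FOLLOWER)   /* no robots visible ahead: take the lead */
            return (!follower_ahead && !leader_ahead) ? LEADER : FOLLOWER;

        if (leader_ahead)       /* always defer to a leader in front */
            return FOLLOWER;

        /* Only relinquish to followers after the hold-off period, so a
           new leader cannot immediately turn round and rejoin the flock. */
        if (follower_ahead && ticks_as_leader > HOLDOFF_TICKS)
            return FOLLOWER;

        return LEADER;
    }

    int main(void)
    {
        /* A follower with nothing visible ahead promotes itself. */
        return update_role(FOLLOWER, 0, false, false) == LEADER ? 0 : 1;
    }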
Initially, if a leader could be seen, the following robots would follow it and
ignore any other robots in the flock; however, the other robots would still be
treated as obstacles for collision avoidance. This strategy produced very poor
flocking behaviour. In an open space the robots would follow each other in a
straight line instead of flocking, and would still tend to clump for short
periods of time. By assigning a higher priority to following the leader than to
following the other robots, the clumping problem is eliminated and with five or
more robots true flocking patterns will emerge in open space. So that this
priority system remains flexible to varying numbers of robots, the weighting
given to following the leader is equal to the number of other robots visible in
front of any given robot.
From the point of view of any single robot, when no other robots are visible in
front, it can become a leader. As leaders, robots wander around by moving
forward in a straight line until an object is encountered, upon which they turn
away from it. As stated above, to inhibit a leader from rejoining the flock
immediately after leaving it, leaders do not start looking for other robots
until after a few seconds of taking leadership. After this delay, when one or
more other robots are visible ahead, leadership is relinquished. Robots that are
followers head towards the greatest density of visible robots, with a higher
priority being assigned to following the leader. This attracts robots towards
each other, thus forming flocks. If, as a follower, any robot sees a flock in
the distance, it will catch up with it by increasing its speed. Whilst any robot
is in a flock it tries to match the speed and direction of its nearest
neighbours.
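A minimal sketch of the leader-weighting rule is given below; combining headings as a weighted average of bearings is an assumption made for illustration, not stated above.

    #include <stdio.h>

    /* The leader counts for as much as all the other robots visible
       ahead, keeping the priority scheme flexible as flock size varies. */
    double desired_bearing(const double others[], int n_visible,
                           double leader_bearing, int leader_visible)
    {
        double sum = 0.0;
        int weight = 0;
        for (int i = 0; i < n_visible; i++) {
            sum += others[i];
            weight++;
        }
        if (leader_visible) {
            /* Weight the leader by the number of other visible robots. */
            sum += n_visible * leader_bearing;
            weight += n_visible;
        }
        return weight ? sum / weight : 0.0;  /* nothing visible: lead */
    }

    int main(void)
    {
        double others[] = { -20.0, 10.0 };   /* bearings in degrees */
        printf("bearing = %.1f deg\n", desired_bearing(others, 2, 30.0, 1));
        return 0;
    }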
A flock forming sequence at one frame every two seconds
A typical flocking sequence at one frame every two seconds
Click here for an mpeg
movie of the robots flocking in real time (2MB)
Shared experience learning
The ability to learn is an essential component of intelligent behaviour in human
society. Individual humans do not need to learn everything by discovering it
from scratch for themselves. Instead, they learn from their peers and teachers
by exchanging the knowledge and information that they have acquired. Learning
from peers does not occur only in humans. It has also been found in some
invertebrates, in many birds, in aquatic mammals and, of course, in primates.
When reinforcement learning is used to solve problems that are composed of many
different stages, learning is like a search process, in which the agent searches
the world for states that maximise reward and thus minimise punishment. The time
this search takes depends strongly upon the size and structure of the state
space and upon a priori knowledge encoded in the learning agent's initial
parameters. When a priori knowledge is not available, the search through the
state space is unbiased and can be excessively long. In nature, co-operative
mechanisms help to reduce the size of the search area, and hence the search
time, by providing the learner(s) with auxiliary sources of experience.
Furthermore, such mechanisms can allow agents to successfully learn problems
that would otherwise be too difficult. For these reasons we developed and
tested a shared experience learning scheme, and showed that sharing experiences
between learning robots does indeed lead to faster and more robust learning.
Our learning algorithm uses a reinforcement technique based on sets of fuzzy
automata. It is similar to one-step Q-learning, except that the learning
function operates directly on the "best" actions as opposed to the expected
reward. The task of each robot is to learn to associate the "best" motor action
with its current situation so that it moves around whilst avoiding obstacles. If
the motor speeds are limited to full speed forward, full speed backwards and
stop, then with two motors there are nine different possible output actions.
With three channels of sonar each giving an 8-bit range value there are 2^24
possible input states. Since we wanted to run many trials, preferably before
retirement, the input space was mapped down to five input states, each
representing a different circumstance the robot can find itself in, defined by
the position of the nearest detected obstacle.
It should be stated that this learning strategy does
not attempt to build a map of the environment; instead each robot learns how to
react to obstacles in the five situations above. Each of these
situations has a separate fuzzy automaton associated with it. Each automaton
comprises a set of motor actions and a set of probabilities for taking each of
those actions. When presented with a
particular situation the action that a robot takes is defined by the set of
probabilities associated with that situation's automaton. At the start of the
learning algorithm these probabilities are all the same, thus for any particular
situation initially every action has an equal chance of being selected.
Each learning cycle starts with each robot examining its sonar
system to detect where the nearest obstacle is. This information is used to
establish which situation the robot is in and hence which automaton to use. A
weighted roulette wheel technique is then used to select an action for this
situation at random, biased by its current probabilities. The
action with the highest probability is the most likely to be chosen. This method
is similar to the techniques used in genetic algorithms. Initially since the
probabilities are equal the robots move around randomly. Over time "better"
actions will have their probabilities of being selected increased and
inappropriate actions will have their probabilities of being selected decreased.
Given enough time, the robot should select the "optimal" actions for each
situation.
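Weighted roulette wheel selection over the nine motor actions might look as follows; this is the standard technique, sketched here rather than taken from the robots' code.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N_ACTIONS 9

    /* Spin the wheel once; the action whose probability band the spin
       lands in is chosen, so high-probability actions are chosen most. */
    int select_action(const double p[N_ACTIONS])
    {
        double spin = (double)rand() / ((double)RAND_MAX + 1.0);
        double band = 0.0;
        for (int a = 0; a < N_ACTIONS; a++) {
            band += p[a];
            if (spin < band)
                return a;
        }
        return N_ACTIONS - 1;   /* guard against rounding error */
    }

    int main(void)
    {
        double p[N_ACTIONS];
        for (int a = 0; a < N_ACTIONS; a++)
            p[a] = 1.0 / N_ACTIONS;   /* initially all actions are equal */
        srand((unsigned)time(NULL));
        printf("chosen action: %d\n", select_action(p));
        return 0;
    }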
The chosen action is executed for a short period of time. At the end of this
execution time, the sonar system is read again, so that the chosen action can be
evaluated. In order to evaluate the chosen action, a definition of which actions
are good and which actions are bad is required. These were chosen to be general,
so that they would not give the robot any task-specific knowledge. Since the
overall aim of the robots is to move forward whilst moving away from close
objects, three simple rules based on these two aims are used to judge each
action.
If the action was successful, its probability of being
selected is increased. If the action was unsuccessful, its probability of being
selected is decreased. In both cases, the other probabilities of the chosen
automaton are adjusted in order to keep the total of the probabilities constant.
The whole learning cycle is now repeated, but this time the sonar data that was
used to evaluate the robot's last action is used to decide which situation it is
now in.
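One standard update with exactly this behaviour (the chosen action's probability rises or falls while the others are rescaled so the total stays constant) is the linear reward-penalty scheme sketched below; the robots' exact rule and learning rate are not given above, so this is an assumption.

    #include <stdio.h>

    #define N_ACTIONS 9

    void update(double p[N_ACTIONS], int chosen, int success, double alpha)
    {
        for (int a = 0; a < N_ACTIONS; a++) {
            if (success)   /* shift probability mass towards the action */
                p[a] = (a == chosen) ? p[a] + alpha * (1.0 - p[a])
                                     : p[a] * (1.0 - alpha);
            else           /* shift mass away, shared among the others */
                p[a] = (a == chosen) ? p[a] * (1.0 - alpha)
                                     : p[a] * (1.0 - alpha)
                                       + alpha / (N_ACTIONS - 1);
        }
    }

    int main(void)
    {
        double p[N_ACTIONS], total = 0.0;
        for (int a = 0; a < N_ACTIONS; a++) p[a] = 1.0 / N_ACTIONS;
        update(p, 3, 1, 0.1);                 /* reward action 3 */
        for (int a = 0; a < N_ACTIONS; a++) total += p[a];
        printf("p[3] = %.3f, total = %.3f\n", p[3], total); /* total = 1 */
        return 0;
    }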
For full two-way and four-way learning, each robot has to transmit the position
of the object to which it reacted (i.e. cases 1 to 5), the action that it chose
(i.e. 0 to 8) and how good or bad that action was (i.e. the Alpha factor). To
allow for the low rate of data transmission and the scanning of three other
robots, six sets of this data are compressed into one packet along with a
start-of-packet (synchronisation) code, an incremental packet ID and a two-byte
checksum. The ID code allows missed packets to be detected: if the ID of the
current packet is not one greater than that of the last packet, then one or
more packets have been missed. On average this loss of packets is less than 2%
for two-way learning and 5% for four-way learning. The main adaptive loop of
the program is sketched below.
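The following sketch pulls the packet format and the loop together; the field widths, layout and helper routines are assumptions for illustration only, not the actual protocol or firmware.

    #include <stdint.h>

    typedef struct {
        uint8_t situation;   /* position of nearest obstacle, cases 1 to 5 */
        uint8_t action;      /* chosen motor action, 0 to 8 */
        int8_t  alpha;       /* how good or bad the action was */
    } experience;

    typedef struct {
        uint16_t   sync;        /* start-of-packet (synchronisation) code */
        uint8_t    id;          /* incremental ID; a gap means lost packets */
        experience samples[6];  /* six learning steps per packet */
        uint16_t   checksum;    /* two-byte checksum */
    } packet;

    /* Hypothetical hardware and learning routines, named for readability. */
    extern int  read_situation(void);
    extern int  select_action(int situation);
    extern void execute_action(int action);
    extern int  evaluate(int before, int after);
    extern void update_probabilities(int situation, int action, int alpha);
    extern void share(experience e);
    extern void absorb_received_experience(void);

    void adaptive_loop(void)
    {
        int situation = read_situation();           /* where is the obstacle? */
        for (;;) {
            int action = select_action(situation);  /* roulette wheel spin    */
            execute_action(action);                 /* run for a short time   */
            int after = read_situation();           /* re-read the sonar      */
            int alpha = evaluate(situation, after); /* reward or punishment   */
            update_probabilities(situation, action, alpha);
            share((experience){ (uint8_t)situation, (uint8_t)action,
                                (int8_t)alpha });   /* broadcast to peers     */
            absorb_received_experience();           /* learn from the others  */
            situation = after;   /* the same reading decides the new state */
        }
    }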
Comparisons of the average learning rates for single-robot, dual-robot and
four-way shared experience learning (the units of time are approximately seconds)
For further information on the robots' hardware, on flocking both with and
without leadership, and on shared experience learning and its results, see my
publications.