Security Applications: Flying Inspectors

Few research fields are as complex as artificial intelligence. And one of the toughest problems facing researchers in this area is vision. To help move things forward, Siemens is developing video systems that can not only learn but also independently interpret the visible world.

A strange aircraft buzzes through the air at the Siemens Corporate Research laboratories in Princeton, New Jersey. Essentially a square wire frame, it is driven by a compact motor unit topped by four helicopter-style rotors. The vehicle is called a quadcopter. Using lasers, it scans windows, walls, and machines; optical sensors and video cameras register every architectural detail. It maneuvers through the air on preplanned paths, ready to sense and avoid any obstacles that appear. The data it collects is processed to create precise 3D models of the environment.

Also known as “Fly & Inspect,” the quadcopter project is the product of a collaborative development effort between computer scientist Yakup Genc at Siemens Corporate Technology in Princeton and robotics researcher Nicholas Roy of the Massachusetts Institute of Technology in Cambridge, Massachusetts. The project is designed to yield a system capable of autonomously acquiring data and building digital models of complex environments such as baggage handling facilities, processing plants, and factory halls. Such 3D digital models could then be used to assess service needs or simulate major renovations. Genc and Roy expect Fly & Inspect technology to make this process efficient and robust. The quadcopter could also inspect hard-to-reach places such as wind farms and power masts for signs of wear or damage, as it can be trained to recognize features such as cracks. “At this point, the device still needs a human operator with a remote control unit,” says Genc. “But we expect that it will soon function autonomously using its optical sensors.”

Learning to Process Image Information in Real Time

The development of systems that can process image information from the environment is one of the major challenges facing the field of machine learning. In February 2011 an IBM supercomputer named “Watson” beat the best human contestants on the quiz show “Jeopardy!” But even Watson was essentially a sophisticated system for evaluating information from huge text databases. In the real world, computers are still awkward. Whereas a small child can tell a tree from an outdoor antenna without any problem, a computer finds the same task extremely difficult. But thanks to work now being performed by research groups at universities and companies, elements of machine vision are approaching commercial application.

At its research and development facilities in Princeton, New Jersey; Graz, Austria; and Munich, Germany, Siemens is developing systems that search satellite images for complex patterns such as industrial sites, buildings, roads, and infrastructure. Other systems analyze X-ray images of containers and packages for suspicious objects, read road signs, monitor crowds and queues, and — as in the case of the quadcopter — map and inspect places that are hard to access. What all these applications have in common is the ability to learn, in much the same way as a small child develops the ability to distinguish objects. In a process known as “supervised learning,” computer scientists feed hundreds of thousands of object images to programs. Algorithms, in turn, distill the characteristics that classes of objects have in common. For example, people on streets usually walk upright, have arms and legs, and have roughly oval-shaped heads. A table, on the other hand, has a horizontal surface to place things on and legs underneath to support it. Programs create digital representations of such classes of objects. This, in turn, makes it possible to conduct a semantic search of specialized image data or allows a driver assistance system, for example, to detect traffic signs automatically.
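
To make the idea concrete, the skeleton of such a supervised learner can be sketched in a few lines of Python. Everything below (the random placeholder images, the naive feature extractor, the two-class setup) is an illustrative assumption, not Siemens' actual training software:

```python
# Minimal supervised-learning sketch (hypothetical, not Siemens' pipeline):
# learn to separate two object classes from labeled example images.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def extract_features(images):
    # Flatten each image into a feature vector; real systems would use
    # engineered descriptors such as gradient histograms instead.
    return images.reshape(len(images), -1)

# Placeholder data: 200 tiny 16x16 "images" per class, drawn so that the
# two classes have different pixel statistics.
people = rng.normal(loc=0.7, scale=0.2, size=(200, 16, 16))
tables = rng.normal(loc=0.3, scale=0.2, size=(200, 16, 16))
X = extract_features(np.concatenate([people, tables]))
y = np.array([1] * 200 + [0] * 200)  # 1 = person, 0 = table

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
classifier = LinearSVC().fit(X_train, y_train)  # distills class characteristics
print(f"held-out accuracy: {classifier.score(X_test, y_test):.2f}")
```

A production system would replace the random arrays with hundreds of thousands of labeled photographs and the flattening step with far richer descriptors, but the train-then-classify structure is the same.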

Recognizing Differences

Often, however, researchers would like vision systems to perform more complex tasks, such as counting people in a subway station. Suppose, for instance, that a vision system detects a head but no torso because a person is partially hidden. The system still has to be able to conclude that it is seeing a person. It does so by modeling how one person or object can occlude and conceal someone behind it, and then reasoning about the physical implications of such occlusions.
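
One plausible way to render that reasoning in code, purely as an assumption about how such a check might look, is to test whether the region where a torso should be is covered by another detection:

```python
# Illustrative occlusion check (an assumption, not a published Siemens method):
# a detected head still counts as a person if the region where its torso
# should be is covered by another object's bounding box.
def overlap_fraction(a, b):
    """Fraction of box a covered by box b; boxes are (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    return (ix * iy) / area_a if area_a else 0.0

def is_plausible_person(head_box, other_boxes, min_cover=0.5):
    x1, y1, x2, y2 = head_box
    h = y2 - y1
    torso_box = (x1, y2, x2, y2 + 3 * h)  # torso expected roughly below the head
    # The person is plausible if the torso region is largely hidden by
    # some other detected object.
    return any(overlap_fraction(torso_box, b) >= min_cover for b in other_boxes)

# A head near the top of the frame, torso hidden behind a kiosk:
print(is_plausible_person((40, 10, 60, 30), [(0, 25, 120, 100)]))  # True
```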

“In the future it should be possible for computers to recognize more complex patterns from archived video data, especially in forensic systems,” says Vinay Shet, a computer scientist in Princeton. An example of such a forensic search for complex patterns would be tracking a person across multiple cameras installed at a large facility such as an airport. Shet compares this search for a visual pattern to looking for a “visual grammar.” “Like sentences in language, image and video data have a structure that can be formalized and interpreted as visual grammar,” he says. This works by ascribing characteristics to the visual data; the combination of these characteristics can then be evaluated to assess whether the same person appears in the images from different cameras. The same visual grammar technology can be used for security screening of cargo and luggage — a project the Siemens Infrastructure & Logistics Division is interested in. Visual pattern recognition can help, for instance, to recognize the characteristic arrangement of a bomb, including a detonator cord, explosives, and a phone trigger device. At present, this task is still performed by human screeners at airports around the world.
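
As a toy illustration of that cross-camera matching, each detection can be summarized as a vector of attribute scores and compared between cameras. The attribute names and the similarity threshold here are invented for the example:

```python
# Toy "visual grammar" matcher (attributes and threshold are invented):
# each detection is described by soft attribute scores, and two detections
# from different cameras are declared the same person if the scores agree.
import numpy as np

ATTRIBUTES = ["red_jacket", "carrying_bag", "tall", "dark_hair"]

def same_person(scores_a, scores_b, threshold=0.9):
    """Cosine similarity between attribute vectors from two cameras."""
    a, b = np.asarray(scores_a), np.asarray(scores_b)
    similarity = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return similarity >= threshold

camera_1 = [0.9, 0.8, 0.2, 0.7]    # detection at the check-in hall
camera_2 = [0.85, 0.75, 0.3, 0.6]  # detection at the gate, minutes later
print(same_person(camera_1, camera_2))  # True: the attributes line up
```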

In a step-by-step process, researchers are teaching artificial systems how to see.

At present, the automatic detection algorithms that drive visual searches do not work perfectly. One innovative approach to achieve both the accuracy of humans and the speed of machines is being explored by a team at New York’s Columbia University under the direction of Paul Sajda, an electro-encephalogram (EEG) expert. Funding is being provided by the United States Department of Defense, and machine vision scientists at Siemens Corporate Technology in Princeton are also participating. The idea is to quickly scan very large satellite images in order to detect objects of significance, such as industrial sites, buildings, roads, helicopter landing pads, etc.

The researchers have combined machine vision with electronically augmented human vision in a system that significantly speeds up the total image analysis process. First, machine vision software developed by Siemens masks out regions that are unlikely to contain targets — homogeneous areas without any distinctive features, such as deserts, dense forests, or steppes. Second, the remaining, potentially interesting parts of the image are divided into small square images, or “chips,” and presented to an image analyst wearing a multi-electrode EEG sensor connected to a signal-analysis computer. The chips are shown in very rapid succession (five to ten per second) — more quickly than the analyst can consciously analyze and respond to them. But the EEG system can learn to detect a brain signal generated when a chip contains a target of interest. Third, the analyst is shown the regions containing the EEG-flagged chips and makes the final, conscious target detection decision. “This combined approach has increased the speed of analysis fourfold,” says Claus Bahlmann, a Siemens researcher in Princeton.
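
A minimal sketch of this three-stage pipeline might look as follows; the variance-based masking rule, the chip size, and the randomly simulated EEG scores are all stand-ins for the real Siemens and Columbia components:

```python
# Sketch of the three-stage triage pipeline (all components are stand-ins):
# 1) machine vision masks featureless regions, 2) remaining regions are cut
# into chips shown rapidly to an analyst, 3) EEG-flagged chips get review.
import numpy as np

rng = np.random.default_rng(1)
CHIP = 32  # chip edge length in pixels (illustrative)

def interesting_chips(image, var_threshold=0.01):
    """Stages 1-2: keep only chips whose pixel variance suggests structure."""
    chips = []
    for y in range(0, image.shape[0] - CHIP + 1, CHIP):
        for x in range(0, image.shape[1] - CHIP + 1, CHIP):
            chip = image[y:y + CHIP, x:x + CHIP]
            if chip.var() > var_threshold:  # homogeneous desert/forest is skipped
                chips.append((y, x, chip))
    return chips

def eeg_flagged(chips):
    """Stage 3 stand-in: pretend a classifier on EEG signals scores each chip.

    In the real system the score comes from the analyst's brain response
    during rapid serial presentation (five to ten chips per second)."""
    return [(y, x) for (y, x, _) in chips if rng.random() > 0.8]

satellite = rng.random((256, 256))  # placeholder satellite image
candidates = interesting_chips(satellite)
for y, x in eeg_flagged(candidates):
    print(f"chip at ({y}, {x}) queued for conscious review")
```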

Driverless Forklifts

Intelligent image analysis is also essential for movement in industrial environments. A case in point is the “Autonomous Navigation System” developed by Siemens in Munich and Stuttgart, Germany, for commercial vehicles such as forklifts. The vehicle learns its route by being led along it by a worker. It takes its cues from the upper regions of a space, which rarely change. This allows it to orient itself and to reliably drive the same route over and over again.
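
The underlying teach-and-repeat idea can be illustrated with a short sketch; the waypoint format and the tolerance are assumptions, and a real vehicle would localize against stable ceiling-level features rather than trusting raw poses:

```python
# Teach-and-repeat sketch (waypoint format and 0.3 m tolerance are assumed):
# during teaching, poses observed while a worker leads the vehicle are
# stored; during replay, the vehicle steers toward each waypoint in order.
import math

def replay(route, get_pose, drive_to, tolerance=0.3):
    """Visit the taught waypoints in order, driving until each is reached."""
    for waypoint in route:
        while math.dist(get_pose(), waypoint) > tolerance:
            drive_to(waypoint)

# Teaching run: the worker leads the forklift and its (x, y) poses are recorded.
route = [(0.0, 0.0), (2.0, 0.0), (2.0, 3.0), (5.0, 3.0)]

# Replay with a trivial simulated vehicle that jumps to commanded poses.
position = [0.0, 0.0]

def get_pose():
    return tuple(position)

def drive_to(waypoint):
    print(f"driving to {waypoint}")
    position[0], position[1] = waypoint

replay(route, get_pose, drive_to)
```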

“The system is also capable of object recognition to a certain degree. It recognizes objects that are important for its tasks in warehouses, such as pallets and crates,” says Gisbert Lawitzky, a robotics expert at Siemens in Munich. Autonomous navigation vehicles are already being used at Daimler — especially for transporting pallets to the loading ramps and bringing them back. “Depending on the task, these vehicles will learn to recognize other objects in the future, and they will also recognize the area in which they are located,” says Lawitzky. The potential range of applications for such systems includes security robots, robotic guides in museums, and robotic helpers in department stores.

Research scientist Maneesh Singh in Princeton is also working with a mobile robot. He took a commercially available robot, which essentially looks like a pressure cooker on wheels, and equipped it with Microsoft’s “Kinect” camera system, which can recognize and interpret a user’s arm and hand motions. The camera, originally developed for the Xbox 360 game console, is equipped with a 3D sensor. This sensor enables the robot not only to detect and avoid obstacles but also to produce a real-time model of its surroundings, allowing it, after a while, to determine its own location within that model.
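
A stripped-down version of the mapping step might integrate depth readings into an occupancy grid the robot can later localize against. The grid dimensions, cell size, and sensor model below are placeholders, not the Kinect pipeline itself:

```python
# Stripped-down occupancy mapping sketch (grid size and sensor model are
# placeholders): each simulated depth reading marks the cells along the ray
# as free space and the cell at the hit point as occupied.
import math
import numpy as np

GRID = 20    # 20 x 20 cells
CELL = 0.25  # cell edge in meters (assumed)
occupancy = np.zeros((GRID, GRID))  # >0 occupied evidence, <0 free evidence

def integrate_reading(robot_xy, angle, depth):
    """Ray-march from the robot to the depth return, updating the grid."""
    steps = int(depth / CELL)
    for i in range(steps + 1):
        d = i * CELL
        cx = int((robot_xy[0] + d * math.cos(angle)) / CELL)
        cy = int((robot_xy[1] + d * math.sin(angle)) / CELL)
        if 0 <= cx < GRID and 0 <= cy < GRID:
            occupancy[cy, cx] += 1.0 if i == steps else -0.2  # hit vs. free space

# Simulated sweep: a wall two meters ahead of a robot at the grid's left edge.
for angle in np.linspace(-0.3, 0.3, 15):
    integrate_reading((0.5, 2.5), angle, depth=2.0)

print("occupied cells:", int((occupancy > 0.5).sum()))
```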

“As with humans, this mobile device will be able to look at a floor plan at a building’s entrance, understand it, and use it to autonomously navigate to any part of the building. At the same time, it will build a visual memory of the areas it has traveled through,” says Singh.

But Singh has even more ambitious plans for his robot. In the near future, he wants it to use machine learning not only to recognize humans and their activities but also to communicate with them and learn from them through natural interaction. “Sometime soon,” he says, “we will be able to teach robots in much the same way that we teach our children — for example, by pointing at objects and speaking to them.”

Like Genc’s Fly & Inspect technology, Singh’s learning robot is still at the exploratory stage. Within Siemens’ Research and Development Department, ideas like these are usually tested for a certain time before a decision is made — in conjunction with the associated business units — regarding their suitability for market launch. This gives Siemens engineers the freedom to keep trying out new things. One of these attempts is “Outlier,” which pushes the idea of learning a step further.

Most adaptive image recognition algorithms are trained before a system is deployed. “Outlier,” on the other hand, is an intelligent surveillance system that learns on the job. Up to now, it has done so only within the laboratory. As it captures video data, Outlier develops statistical models of what can be considered normal within its field of view. If, however, an unusual event presents itself — such as a vehicle skidding across a street — it will detect this as an anomaly and report the incident to a supervisor. It can then learn from feedback to determine whether the incident was relevant or not and will alter its reporting accordingly in the future.
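
The core of such an on-the-job learner can be caricatured as a running statistical model of “normal” plus a feedback-tuned alarm threshold. The per-frame motion feature and the update rules here are assumptions made for the sake of the example:

```python
# Caricature of Outlier-style on-the-job learning (the feature and update
# rules are assumptions): model the running mean/variance of a scene
# statistic, flag frames that deviate strongly, and let supervisor
# feedback retune the alarm threshold.
class NormalityModel:
    def __init__(self, threshold=3.0, rate=0.01):
        self.mean, self.var = 0.0, 1.0
        self.threshold = threshold  # alarm when the z-score exceeds this
        self.rate = rate            # how fast "normal" adapts

    def observe(self, value):
        """Update the running model; report whether the frame is anomalous."""
        z = abs(value - self.mean) / (self.var ** 0.5 + 1e-9)
        # Exponential running estimates of what the scene normally looks like.
        self.mean += self.rate * (value - self.mean)
        self.var += self.rate * ((value - self.mean) ** 2 - self.var)
        return z > self.threshold

    def feedback(self, was_relevant):
        """Supervisor feedback: tighten after hits, loosen after false alarms."""
        self.threshold *= 0.95 if was_relevant else 1.05

model = NormalityModel()
for frame_motion in [0.1, 0.12, 0.09, 0.11, 5.0]:  # last frame: a skidding car
    if model.observe(frame_motion):
        print(f"anomaly reported (motion={frame_motion})")
        model.feedback(was_relevant=True)           # supervisor confirms
```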

“Outlier is a paradigm shift,” says Josef Birchbauer, a researcher at Siemens in Graz. Its unique feature is that it can constantly adapt to new conditions — and that, as Birchbauer stresses, is essential “in a complex world where it is almost impossible to predict every development” — whether it takes place at an airport or in the heart of Times Square. “Most likely, this approach will not stand on its own though,” cautions Birchbauer. “In the future, video security systems will probably be trained before actual use with the help of thousands of example images, but will then learn in real time during operation.”