Blog 7 November 2021 4 min read

Visual SLAM for Robotic Digital Inspection

Industrial Digital Engineering
Jibin Johnson

A Brief Overview of SLAM Technology

Visual Simultaneous Localization and Mapping (VSLAM) is becoming an essential technology in embedded vision, with many possible applications. Although it has immense commercial potential in a variety of settings, VSLAM is still in its infancy from a technical perspective. Its future role in digital inspection is a hot topic: robotic digital inspection is one of several industrial applications that VSLAM could revolutionize.

A good starting point is to understand how VSLAM technology has evolved over the past decade and where it is heading now. VSLAM has been studied in the robotics community for decades. Geometric, model-based methods have become increasingly sophisticated and accurate, and Visual SLAM has made significant progress as a result. These methods, however, struggle under challenging conditions. In recent years, the development community has launched several initiatives to augment VSLAM with data-driven approaches such as deep learning, aiming to solve typical visual inspection problems with better and more accurate results.

Robotic Digital Inspection and VSLAM

Robotic inspection can complete inspection tasks effectively and efficiently while reducing workers’ labor intensity. For robots to perform inspection tasks, accurate and stable positioning and navigation are essential. SLAM is a technique that maps the 3D structure of an unknown environment while tracking the motion of a sensor moving through it. It was originally proposed in robotics to give robots autonomous control. Visual SLAM refers to the process of determining the position and orientation of a sensor with respect to its surroundings while simultaneously mapping the environment around the sensor.
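
The two halves of that definition, tracking the sensor pose and placing observed points on a map, can be illustrated with a minimal 2D Python sketch. It is not a SLAM implementation: the function names `move` and `landmark_to_world`, the (x, y, heading) pose layout, and the distance/bearing observation model are assumptions chosen for illustration, and a real system would also have to deal with noise in both steps.

```python
import math

def move(pose, forward, turn):
    """Localization step: apply a motion (turn, then drive forward) to a 2D pose.

    pose is (x, y, heading) with heading in radians.
    """
    x, y, heading = pose
    heading = heading + turn
    return (x + forward * math.cos(heading),
            y + forward * math.sin(heading),
            heading)

def landmark_to_world(pose, distance, bearing):
    """Mapping step: place a landmark seen at (distance, bearing) in the
    sensor frame onto the world map."""
    x, y, heading = pose
    angle = heading + bearing
    return (x + distance * math.cos(angle),
            y + distance * math.sin(angle))

pose = (0.0, 0.0, 0.0)
pose = move(pose, forward=2.0, turn=0.0)          # drive 2 m along x
pose = move(pose, forward=1.0, turn=math.pi / 2)  # turn left, drive 1 m
landmark = landmark_to_world(pose, distance=1.0, bearing=0.0)
```

In this idealized run the robot ends near (2, 1) facing along the y axis, and the landmark seen one meter straight ahead lands near (2, 2) on the map. The difficulty of SLAM comes from the fact that both the motion and the observation are uncertain in practice.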

The Evolution of VSLAM

Early work on SLAM relied on odometry and ultrasonic sensing as inputs. In the 1990s, research on SLAM in mobile robotics and on structure from motion (SFM) in computer vision developed independently and, for a time, the two fields almost turned away from each other. The landmark achievement of bringing vision into SLAM came with A. Davison’s MonoSLAM in the early 2000s.

Later, Parallel Tracking and Mapping (PTAM) became the new standard for localized tracking and mapping. PTAM achieved quite satisfactory results in small, local spaces, but the problem of large-scale mapping remained unresolved. Solving it involves two parts: efficient map representation and refinement, and loop closing, that is, recognizing a previously visited place and using it to correct accumulated drift. Efficient map representation and refinement is a necessary condition for good loop closing.
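
To make the role of loop closing concrete, here is a deliberately simplified one-dimensional Python sketch. It assumes a robot that returns to its starting point, so the true net displacement is zero, and it assumes every odometry step is trusted equally. Under those assumptions the least-squares correction spreads the accumulated drift evenly over the steps. The function names `close_loop` and `integrate` and all numbers are hypothetical; real systems optimize a full pose graph in 3D.

```python
def close_loop(odometry_steps, true_net_displacement=0.0):
    """Correct drifting odometry using a single loop-closure constraint.

    With equal confidence in every step, the least-squares solution that
    satisfies the constraint subtracts the same share of the drift from
    each step.
    """
    drift = sum(odometry_steps) - true_net_displacement
    correction = drift / len(odometry_steps)
    return [step - correction for step in odometry_steps]

def integrate(steps, start=0.0):
    """Accumulate relative steps into absolute positions."""
    positions = [start]
    for step in steps:
        positions.append(positions[-1] + step)
    return positions

# Out-and-back trip: the robot should end where it started, but the
# measured steps sum to roughly 0.4 because of drift.
measured = [1.1, 1.1, -0.9, -0.9]
corrected = close_loop(measured)
raw_path = integrate(measured)
closed_path = integrate(corrected)
```

Here `raw_path` ends about 0.4 units away from the start, while `closed_path` returns to it. The same idea, applied to thousands of poses and map points at once, is what keeps a large-scale map consistent.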


Then came ORB-SLAM, a more traditional feature-based system that resembles PTAM in some ways yet attains impressive performance in practice. As robotics and computer vision grew into prominent research fields, Visual SLAM attracted many researchers who devoted their efforts to improving it and shared their work with the community.
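
For context, ORB features are binary descriptors, and binary descriptors are typically compared by Hamming distance (the number of differing bits). The sketch below shows brute-force matching with that metric in plain Python. The 8-bit descriptors, the `max_distance` threshold, and the function names are illustrative assumptions; real ORB descriptors are 256 bits long, and this is not ORB-SLAM's actual matching code.

```python
def hamming(a, b):
    """Number of bit positions in which two integer-encoded descriptors differ."""
    return bin(a ^ b).count("1")

def match_descriptors(query, train, max_distance=2):
    """Return (query_index, train_index, distance) for each query descriptor
    whose nearest train descriptor is within max_distance bits."""
    matches = []
    for qi, q in enumerate(query):
        ti, dist = min(((i, hamming(q, t)) for i, t in enumerate(train)),
                       key=lambda pair: pair[1])
        if dist <= max_distance:
            matches.append((qi, ti, dist))
    return matches

# Toy descriptors from two hypothetical camera frames.
frame_a = [0b10110010, 0b01001101, 0b11110000]
frame_b = [0b01001111, 0b10110011, 0b00001111]
matches = match_descriptors(frame_a, frame_b)
```

With these toy values, the first two descriptors of `frame_a` each find a partner in `frame_b` that differs by a single bit, while the third has no neighbor within the threshold and is left unmatched. Matched features like these are what a feature-based system uses to track the camera from frame to frame.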

There are several types of SLAM technology, and some of them do not involve a camera at all. Visual SLAM is a specific type of SLAM system that uses 3D vision to perform localization and mapping when neither the environment nor the sensor’s location is known. Visual SLAM systems are also used in a wide variety of field robots. For example, rovers and landers exploring Mars use VSLAM systems to navigate autonomously, and agricultural robots and drones use the same technology to travel around crop fields independently.

VSLAM opens up many more promising applications for intelligent environmental perception. Image recognition technologies and AI can often show their full strength only if they know where an object is relative to a vehicle or robot. With VSLAM, applications such as trained parking or AR-based navigation can be made available to the mass market. VSLAM relies on information from a camera installed on the robot, much as human eyes provide spatial perception of the environment. With this information, software constructs an up-to-date 3D map of the environment, within which the robot determines its exact position and reacts to changes in real time.

Recent Developments in VSLAM

Visual SLAM has recently received a lot of attention because of the abundant texture information that the external environment offers to robots. Visual sensing still faces numerous challenges, however: image degradation caused by sensor noise, lighting conditions, or rapid movement calls for more advanced methods. As computer vision technology has developed, the market for VSLAM has also advanced, and the technology has been applied in indoor navigation, Virtual Reality/Augmented Reality, and other fields. Recently, several researchers have investigated ways to extract more image information and to fuse data from other sensors, aiming for more robust SLAM methods that deliver accurate localization and mapping in complex environments.

Recent Trends in Visual SLAM:

● Visual SLAM based on deep learning.
● Multi-sensor fusion SLAM to achieve robust results in complex environments.
● Extraction and matching of multiple visual features.
● Direct SLAM methods for precise results and minimizing error.
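
As a hedged illustration of the multi-sensor fusion trend listed above, the sketch below combines two independent position estimates by inverse-variance weighting, the one-dimensional special case of a Kalman-style update. The readings and variances are made up for the example, and `fuse` is a hypothetical helper rather than part of any SLAM library.

```python
def fuse(estimate_a, variance_a, estimate_b, variance_b):
    """Fuse two independent estimates of the same quantity.

    Each estimate is weighted by the inverse of its variance, so the more
    certain sensor dominates. Returns (fused_estimate, fused_variance).
    """
    weight_a = 1.0 / variance_a
    weight_b = 1.0 / variance_b
    fused = (weight_a * estimate_a + weight_b * estimate_b) / (weight_a + weight_b)
    return fused, 1.0 / (weight_a + weight_b)

# Hypothetical readings: the camera says 10.0 m (variance 4.0),
# a second sensor says 12.0 m (variance 1.0).
position, variance = fuse(10.0, 4.0, 12.0, 1.0)
```

The fused position lies closer to the more certain sensor, and the fused variance is smaller than either input variance, which is why adding sensors tends to make localization more robust.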

Industry Application Scenario

With increasing demand for energy, oil and gas companies need to improve their efficiency, productivity, and safety. Corrosion or cracks in their production, storage, or transportation facilities could cause disasters for both society and the natural environment. Since many oil and gas assets are in extreme environments, there is high demand for robots that can perform inspection tasks more cost-effectively, accurately, and safely. With the implementation of VSLAM for digital inspection in the oil and gas industry, the autonomous localization and navigation performance of Autonomous Underwater Vehicles (AUVs) and Unmanned Aerial Vehicles (UAVs) has improved tremendously.

Challenges to Practical Applications

In the digital inspection domain, robots should be as close to the facilities as possible to perform inspection tasks accurately. This, in turn, causes the scenery around the robot to change very rapidly. As a result, perceiving the dynamic environment and reacting in real time is a significant problem that must be solved before digital inspection can be applied in the real world.

Sensing the location of a camera (sensor) and the environment around it without knowing either beforehand is extremely difficult. Visual SLAM systems are proving highly effective at tackling this challenge and have emerged as one of the most sophisticated technologies in embedded vision. Advances in computer vision, digital image processing, and artificial intelligence further increase the effectiveness of vision-based SLAM.


The robustness of positioning and mapping in complex environments remains a challenge from a practical standpoint, because variations in environment texture and lighting affect the estimation of both. Combining depth cameras with other sensors is being explored to make Visual SLAM positioning and mapping robust and accurate enough for practical applications under challenging conditions. However, current Visual SLAM algorithms still need significant improvement before they can be applied in commercial scenarios. As Visual SLAM technology advances and is combined with other computer vision technologies and sensor data, VSLAM is expected to play a critical role in digital inspection applications in the near future.
