All a Kaggle: Dstl team-up identifies novel ways to evaluate defence data

The UK’s Defence Science and Technology Laboratory (Dstl) recently partnered with Kaggle, the world’s largest data science competition community, to identify novel methods to evaluate large, complex data sets. The competition attracted over 5,000 submissions and will be used to help intelligence analysts evaluate information more quickly, accurately and effectively. By Claire Apthorp.


Imagery captured by satellites orbiting high above the earth’s surface has revolutionised the way we understand our planet, its man-made and naturally occurring features, and how it is impacted by weather, climate patterns and natural disasters.

On a scientific scale, the proliferation of this imagery has offered radically improved ways of locating and mobilising resources during emergencies, and has enabled agriculture, vegetation and natural resources to be monitored more effectively than ever before.

For national security and defence, the ability to analyse the movements of our allies and enemies, understand the battlefield in detail, and keep track of assets as they move around the globe has proved a dramatic game-changer, separating the haves from the have-nots when it comes to satellite imagery capabilities.

Taking up the challenge

The capture and dissemination of satellite imagery is just the first step in the intelligence-gathering mission. All the satellite imagery in the world is next to useless in its raw state, without advanced technology that can process and extract actionable information from the large, complex datasets. In large part, this process – which allows roads, buildings, vegetation, or potential airfields, hospitals and enemy ammunition arsenals to be identified from within the reams of available imagery – has fallen to human eyes or imperfect semi-automated methods.

This is the challenge taken up by Dstl and Kaggle with their Satellite Imagery Feature Detection competition. Posing the question, “Can you train an eye in the sky?”, the competition looked to draw on one of the world’s fastest growing resources: crowdsourcing.

The competition sought novel solutions to alleviate the burden on image analysts by challenging teams of ‘Kagglers’ to accurately classify features in overhead imagery. The goal was not only to help Dstl make smart decisions more quickly regarding the UK’s defence and security issues, but also to encourage innovation in computer vision methodologies applied to satellite imagery.

“Dstl has a finite number of data scientists and there is huge demand for their expertise,” said a Dstl spokesperson. “Crowdsourcing enables us to engage with data scientists from across the world to help us develop solutions to some of the issues that we are facing.”

Sourcing the crowd

Evidence from the private sector demonstrated the benefits of employing a crowdsourcing approach to satisfy requirements with quality solutions. And while this was Dstl’s first foray into using crowdsourcing, Kaggle brought niche capabilities to the table that allowed the agency to dip into the concept.

“Kaggle provided access to the largest and most varied community of data scientists in the world,” the spokesperson said about the company’s selection. “Additionally, Kaggle as a platform and the Kaggle participants had a track record of expertise in the strand of machine learning that was of most interest to the project - convolutional neural networks.”

Participants were tasked with detecting and classifying the types of objects found within 1km by 1km satellite images captured by the WorldView-3 commercial Earth observation satellite in both 3-band and 16-band formats. The 3-band images were traditional RGB natural colour images, while the 16-band images contained spectral information by capturing wider wavelength channels. The multi-band imagery was taken from the multispectral (400-1040nm) and short-wave infrared (1195-2365nm) ranges.

Ten different classes of objects were located within the imagery: buildings, miscellaneous man-made structures, roads, tracks (poor/dirt/footpath/trail), trees, crops, waterways, standing water, and both large and small vehicles. Participants had to develop an algorithm or software that automatically detected and identified as many of these objects as possible.
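
To make the task concrete, the short Python sketch below shows one way the output of such an algorithm could be organised: one binary mask per object class, marking the pixels where that class appears. The class names come from the list above; the integer IDs, the grid encoding and the function names are assumptions made for illustration only, and do not represent the competition's data format or any entrant's method.

```python
# The ten object classes named in the article, in the order listed there.
# The integer IDs (list positions) are an assumption made for this sketch.
CLASSES = [
    "buildings",
    "miscellaneous man-made structures",
    "roads",
    "tracks",
    "trees",
    "crops",
    "waterways",
    "standing water",
    "large vehicles",
    "small vehicles",
]


def masks_from_labels(label_grid):
    """Split a grid of per-pixel class IDs into one binary mask per class.

    label_grid is a list of rows; each cell holds an index into CLASSES,
    or None where no listed object is present (hypothetical encoding).
    """
    masks = {}
    for class_id, name in enumerate(CLASSES):
        masks[name] = [
            [1 if cell == class_id else 0 for cell in row]
            for row in label_grid
        ]
    return masks


def pixel_counts(masks):
    """Count how many pixels each class covers."""
    return {name: sum(map(sum, mask)) for name, mask in masks.items()}


# A toy 3x4 "image": a road (2) running past a building (0) and trees (4).
toy_labels = [
    [0, 0, 2, 4],
    [0, 0, 2, 4],
    [None, None, 2, None],
]
toy_masks = masks_from_labels(toy_labels)
toy_counts = pixel_counts(toy_masks)
```

In a real entry the label grid would be predicted by a trained model from the 3-band or 16-band pixels rather than written by hand; the sketch only illustrates what "detecting and identifying" ten classes means at pixel level.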

Innovation on tap

The competition attracted over 5,000 submitted solutions, which equates to over £2.5m in research or a sevenfold return on the cost of running the competition. Following analysis of the results, Dstl is positive that the quality of the responses shows that crowdsourcing data has the potential to become an invaluable tool in tackling complex challenges relating to the UK’s defence and security.

“We are very pleased with the results,” the spokesperson said. “The winning solutions have identified new and innovative ways of achieving the objectives that we set - this could have taken us several years to achieve if we hadn’t gone down this route.”

Dstl said that the winning algorithms and software will be used in a number of defence and security applications where the ability to rapidly and accurately analyse data sets is important, although the spokesperson was unable to comment further on this.

Data science

“Our experience of working with Kaggle has proved the value of crowdsourcing, and we are keeping our crowdsourcing options open for the future, being through Kaggle or otherwise,” the Dstl spokesperson said. “We have recently launched the Data Science Challenge, and the first challenges – detecting and classifying vehicles from aerial imagery and the classification of documents by themes – are now open to entrants.”

This challenge looks to utilise the value of data science – the discipline of extracting useful insight from large amounts of collected data – to bolster the UK’s defence and security, while also promoting its economic security. In the defence and security sector, data science plays a role in allowing effective decisions to be made using the staggering amounts of data generated during any humanitarian crisis or international conflict. Creating new ways to understand which information is relevant, how accurate it is and whether it can be relied on is central to assessing situations, engendering intervention, and acting for the best possible outcome. The series of competitions aims to bring together the best minds in the sector to solve these real-world challenges.

These forays into crowdsourcing as a research tool are proving a promising new avenue for agencies such as Dstl to satisfy requirements with quality solutions. And as this toe dipped into the data science community shows, the sky is the limit when it comes to pushing this resource to its full potential.