How far should one’s front door be from the road to minimise the risks of burglary? Aviva, an insurance company, say this “goldilocks distance” (not too much, not too little) is six metres. Any less may invite opportunistic vandalism; any more and determined ne’er-do-wells could get up to mischief unseen. It is one example of how big data are increasingly being used to predict events in the real world, but there are others. Three recent developments show that targeted analysis of data and developments in mobile communications and social media offer crime-busting opportunities too.
In a paper released at the International Conference on Multimodal Interaction (ICMI) in Istanbul on November 15th, a team from the University of Trento in Italy and Telefonica Digital, a Spanish telecommunications firm, have used mobile phone data to predict future crime locations. The research originated from the Datathon for Social Good, a public competition kicked off in 2013 by Telefonica Digital, The Open Data Institute and MIT. With London as the test-case city, three sets of data were made available to participants, each collected over three separate weeks between December 2013 and January 2014.
The first data set consisted of anonymised and aggregated mobile phone data from O2 users, broken down by gender, age groups (from ‘up to 20’ to ‘over 60’) and whether the person was a resident, worker or visitor to that cell tower location. But this information was not used to track individual phones. Instead it highlighted the “social churn” of population movement over time, says Alex Pentland of MIT, who sponsored the work. The second batch of data contained 68 different metrics about the population, such as statistics on demographics, business survival, jobs density and teenage pregnancies. Finally, all reported crimes by location were recorded, categorised as one of 11 possible types (anti-social behaviour, burglary, violent crime etc.).
The three sets of data were grouped geographically and mapped to mobile phone tower coverage. The high population density in central London meant these areas were as small as 200 square metres in some places. In total there were nearly 125,000 such areas, full of juicy data for a clever algorithm to munch through.
The machine learning algorithm in question was let loose on a training data set consisting of 80% of the total information available. It used various combinations of the mobile phone and population data to best account for the reported crimes. These combinations (numbering many thousands) were reduced to the few dozen reckoned to be the most important for accurately predicting future crime locations. When a second test was conducted against the remaining 20% of the data, the program achieved an impressive 70% accuracy rate.
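The workflow described above (train on 80% of the data, whittle the candidate inputs down to the most useful few, then score predictions on the held-out 20%) can be sketched in a few lines of Python. This is an illustration only, not the researchers' method: the data are synthetic, the feature ranking and the nearest-centroid rule are simple stand-ins, and every function name is hypothetical.

```python
import random

def make_synthetic_areas(n, n_features, seed=1):
    """Invented data: each area has a feature list and a 0/1 'crime hotspot' label.
    Only features 0 and 1 carry signal; the rest are noise (an assumption)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(0, 1) for _ in range(n_features)]
        x[0] += 2 * y
        x[1] += 2 * y
        rows.append((x, y))
    return rows

def split_80_20(rows, seed=0):
    """Shuffle, then split into 80% training and 20% held-out test rows."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]

def class_means(rows, n_features):
    """Mean of each feature, computed separately for label 0 and label 1."""
    means = {}
    for label in (0, 1):
        group = [x for x, y in rows if y == label]
        means[label] = [sum(x[j] for x in group) / len(group)
                        for j in range(n_features)]
    return means

def top_features(rows, n_features, keep):
    """Keep the features whose class means differ most (a crude relevance score)."""
    m = class_means(rows, n_features)
    gaps = [abs(m[1][j] - m[0][j]) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: gaps[j], reverse=True)[:keep]

def predict(x, means, selected):
    """Nearest-centroid rule using only the selected features."""
    dist = {label: sum((x[j] - means[label][j]) ** 2 for j in selected)
            for label in (0, 1)}
    return 0 if dist[0] <= dist[1] else 1

def accuracy(rows, means, selected):
    """Share of rows whose predicted label matches the recorded one."""
    hits = sum(predict(x, means, selected) == y for x, y in rows)
    return hits / len(rows)

areas = make_synthetic_areas(1000, 20)
train, test = split_80_20(areas)
selected = top_features(train, 20, keep=2)   # chosen on training data only
means = class_means(train, 20)
test_accuracy = accuracy(test, means, selected)
print(f"kept features {selected}, held-out accuracy {test_accuracy:.2f}")
```

The point of the design is that both the feature selection and the fitted means come from the training rows alone, so the score on the remaining 20% is an honest estimate of how the model fares on areas it has not seen.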
But is such a system practical? The idea of mobile phone service providers allowing law enforcement agencies unfettered access to their data is unlikely to get a good hearing in the court of public opinion. And anyway, mapping the movement of people through mobile phones is unnecessary, according to Commander Simon Letchford, the head of London’s real-world predictive policing system. Crowd-rich environments, attractive to those with criminal intent, are generally easy to spot, he says.
London’s system, on trial until December 2014, is based around three pieces of information: the time, location and type of crime. These are enriched with other metrics where available, such as the type of property burgled or whether transport links bisect housing estates (possibly confining criminal activity to one side or the other). This makes it possible for the machine learning algorithm working through the data to identify those 250 square metre boxes around London of most interest to the bobbies on the streets.
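The simplest ingredient of such a system, sorting crime reports into map boxes and ranking the boxes, might look something like the sketch below. It is not the Metropolitan Police's algorithm: it ignores the time of each crime and the extra metrics, simply counts reports per box, and uses made-up coordinates. The function names and the `cell_size_m` parameter are hypothetical.

```python
from collections import Counter

def cell_of(x_m, y_m, cell_size_m):
    """Map a position in metres to the (column, row) index of its grid box."""
    return (int(x_m // cell_size_m), int(y_m // cell_size_m))

def hottest_cells(crimes, cell_size_m, top=3):
    """crimes: iterable of (x_m, y_m, crime_type).
    Return the `top` boxes ranked by number of recorded crimes."""
    counts = Counter(cell_of(x, y, cell_size_m) for x, y, _ in crimes)
    return counts.most_common(top)

# Invented sample reports; positions are metres from an arbitrary origin.
crimes = [
    (10, 20, "burglary"),
    (30, 40, "burglary"),
    (260, 10, "violent crime"),
    (15, 5, "anti-social behaviour"),
    (510, 510, "burglary"),
]

# The box side length here is an assumption chosen for illustration.
print(hottest_cells(crimes, cell_size_m=250, top=1))
```

A real system would weight recent crimes more heavily and fold in the other metrics; the counting step merely shows how individual reports become box-level scores.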
Commander Letchford believes such methods will nab the “optimal foragers” among the criminal fraternity: those who habitually operate in the same area or commit many crimes over a short time period. He is also able to put into practice the theory of the Koper Curve, devised by Christopher Koper of the Centre for Evidence-Based Crime Policy at George Mason University in America. The theory suggests police officers should spend no more than 15 minutes every couple of hours in an area to be most effective. Any longer and the shock factor on local scallywags diminishes rapidly. But clever tech can’t feel collars. Identifying boxes likely to be at risk of high crime is only part of effective policing, explains Commander Letchford: “you still need to be a good cop in that box”.
London’s police are not unique in using big data this way; many American forces are fans too. They agree that criminals, once successful, often literally return to the scene of a crime. Homeowners do not usually upgrade security systems in the wake of a burglary, and similarly constructed buildings can be annoyingly easy to break into once you know how. As a spokesperson for PredPol, a California-based company responsible for supplying many of the American systems, warns, “in crime, lightning does strike twice”.
As good as data-crunching super sleuths can be, they are always one step behind reality. But in a rapidly developing and chaotic situation such as the recent shootings in Ottawa, it is far better to work in the here and now. Even allowing for the fact that eyewitness accounts often contain inaccuracies, social media has shown it can respond faster than official security channels. Which is where the Real-time Detection, Tracking, Monitoring and Interpretation of Events in Social Media (clunkily reduced to ReDites) comes in. A collaboration in Britain between the Engineering and Physical Sciences Research Council (EPSRC), the Defence Science and Technology Laboratory and academia, ReDites analyses 1% of the daily Twitter feed (about four million tweets a day) for pre-selected keywords from specified locations.
Tackling the sheer volume of tweets was worth the effort to tap into a resource tuned to report events immediately. The system is built to “pluck out the nuggets and ignore the false positives,” according to Miles Osborne of the University of Edinburgh. It does so by favouring repetition of related words in separate tweets over precise information contained in fewer postings. ReDites automatically weeds out the chaff using a purpose-built Violence Detection Model. Words such as ‘gun’, ‘police’ or ‘explosion’ flag up tweets of interest, which Alex Hulkes of the EPSRC calls “adding the clever to the quick”. The system then gives a gist of the breaking news, grouped by location, to the user.
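A toy version of that idea, flagging keywords and trusting them only when they recur across separate tweets from the same place, is sketched below. It is not ReDites or its Violence Detection Model: the tweets and place names are invented, the keyword list is just the three words quoted above, and the `flag_events` function and its `min_tweets` threshold are hypothetical.

```python
import re
from collections import defaultdict

KEYWORDS = {"gun", "police", "explosion"}  # the examples quoted in the article

def flag_events(tweets, min_tweets=2):
    """tweets: iterable of (location, text).
    Return {location: {keyword: count}} for keywords that appear in at
    least `min_tweets` separate tweets from that location."""
    counts = defaultdict(lambda: defaultdict(int))
    for location, text in tweets:
        # A set, so a word repeated inside one tweet is counted only once.
        words = set(re.findall(r"[a-z']+", text.lower()))
        for keyword in words & KEYWORDS:
            counts[location][keyword] += 1
    flagged = {}
    for location, per_keyword in counts.items():
        repeated = {k: n for k, n in per_keyword.items() if n >= min_tweets}
        if repeated:
            flagged[location] = repeated
    return flagged

# Invented sample tweets from two hypothetical locations.
sample = [
    ("Area A", "Police everywhere near the station"),
    ("Area A", "Heard a gun go off, police running"),
    ("Area A", "someone has a gun downtown"),
    ("Area B", "Loud explosion? probably fireworks"),
    ("Area B", "Lovely evening by the river"),
]

print(flag_events(sample))
```

Requiring the same word in several independent tweets is what lets a lone mention of fireworks drop out while a cluster of reports from one place survives.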
Unlike predictive policing systems, ReDites is a concept demonstrator and has yet to be used operationally by a police force. But big data are already leaving clues as to what the future of law enforcement may look like.