A recent update to a publicly downloadable database maintained by the National Institute of Standards and Technology (NIST) will make it easier to sift through computers, cellphones and other electronic equipment seized in police raids, potentially helping law enforcement catch sexual predators and other criminals.
The database, called the National Software Reference Library (NSRL), plays a frequent role in criminal investigations involving electronic files, which can be evidence of wrongdoing. In the first major update to the NSRL in two decades, NIST has increased the number and types of records in the database to reflect the widening variety of software files that law enforcement might encounter on a device. The agency has also changed the format of the records to make the NSRL more searchable.
“There are hardly any major crimes that don’t have connections to digital technology, because criminals use cellphones,” said Doug White, a NIST computer scientist who helps maintain the NSRL. “Only some of the data on a phone or other device might be relevant to an investigation, though. The update should make it easier for police to separate the wheat from the chaff.”
Both criminal and civil investigations frequently involve digital evidence in the form of software and files from seized computers or cellphones. Investigators need a way to filter out the large quantities of data that are irrelevant to the investigation so they can focus attention on finding relevant evidence.
“Let’s say you’ve got a computer that might contain incriminating photos or financial records, but it also has a few video games,” White said. “Games often come with a lot of graphics files. You want to run your investigation as quickly and efficiently as possible, so what you need is a way to get rid of all the video game images. Then you can run your more computationally expensive analysis on the files that remain.”
The update comes at a time when investigators must contend with a rapidly expanding universe of software, most of which produces numerous files that are stored in memory. Each of these files can be identified by a sort of electronic fingerprint called a hash, which is the key to the sifting process. The sophistication of the sifting process can vary depending on the type of investigation being performed. The NSRL’s reference dataset doubled in size from half a billion hash records in August 2019 to more than a billion in March 2022, and White says he expects its rapid growth to continue.
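The sifting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SHA-256 is chosen as the hash function for the example (the article does not say which algorithm investigators use), and the function names `sha256_of_file` and `split_known_unknown` are hypothetical, not part of any NIST tool.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files never sit fully in RAM.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def split_known_unknown(paths, known_hashes):
    """Partition files into known ones (hash found in the set) and the rest.

    known_hashes is assumed to be a set of hex digests taken from a
    reference list of known software files.
    """
    known, unknown = [], []
    for path in paths:
        target = known if sha256_of_file(path) in known_hashes else unknown
        target.append(path)
    return known, unknown
```

In this sketch, files that land in the known list (a video game's graphics, for instance) could be set aside, leaving only the unknown list for the slower, more expensive analysis.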
This growth makes the NSRL a vitally important tool for digital forensics labs, which specialize in this sort of file review. Such work has become a crucial part of investigations: There are about 11,000 digital forensics labs in the United States (compared with about 400 crime labs). While digital evidence plays a role in many types of crime, it is particularly useful for catching child predators, who often have sexual abuse imagery stored in a phone or computer’s memory.
The NSRL is growing both in number of entries and in variety of file types, and White anticipates adding entries from Internet of Things (IoT) devices such as smart speakers in the near future. Even so, the recent update to the database should help investigators handle the burden. The previous 2.0 version, which dates back 20 years, offered its hashes as basic text files that could be imported into a spreadsheet. Searching the list was possible but cumbersome compared with modern search engine functions. The update, NSRL version 3.0, uses the SQLite database format, which makes it easier for users to create custom filters to sort through files and find what they need for a particular investigation.
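To give a sense of what a custom filter over a SQLite file might look like, here is a minimal Python sketch using the standard `sqlite3` module. The table name `known_files`, its columns, and the sample rows are all invented for illustration and should not be read as the actual NSRL 3.0 schema; the two helper functions are likewise hypothetical.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only; the real
# NSRL 3.0 tables and column names may differ.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE known_files (
        sha256    TEXT NOT NULL,
        file_name TEXT NOT NULL,
        product   TEXT NOT NULL
    );
    CREATE INDEX idx_known_sha256 ON known_files (sha256);
    """
)
# Made-up sample rows; the short hash strings are placeholders.
conn.executemany(
    "INSERT INTO known_files VALUES (?, ?, ?)",
    [
        ("aa11", "texture01.png", "Example Game"),
        ("bb22", "engine.dll", "Example Game"),
        ("cc33", "editor.exe", "Example Office Suite"),
    ],
)


def known_hashes_for_product(connection, product):
    """Custom filter: return the set of hashes recorded for one product."""
    rows = connection.execute(
        "SELECT sha256 FROM known_files WHERE product = ?", (product,)
    )
    return {row[0] for row in rows}


def filter_unknown(connection, seized_hashes):
    """Return the seized hashes that have no match in the reference table."""
    unknown = []
    for file_hash in seized_hashes:
        hit = connection.execute(
            "SELECT 1 FROM known_files WHERE sha256 = ? LIMIT 1", (file_hash,)
        ).fetchone()
        if hit is None:
            unknown.append(file_hash)
    return unknown
```

With a flat text file, each of these lookups would mean scanning or re-importing the whole list; with an indexed database table, the same question becomes a single parameterized query.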
Another advantage is that the NSRL managers will be able to distribute future changes to the dataset as comparatively small updates rather than sending out the entire dataset anew, saving time and effort for users. White also said the NSRL would continue to be available in its old format for the benefit of users who may need time to adjust to the changes.
“We will continue to publish the dataset in both the 2.0 and 3.0 formats through December 2022,” White said. “After that, there is a relatively easy query that users can run to generate the 2.0 dataset if it proves necessary.”
The dataset and more information on the update are available via the NIST website.