Metadata Matters

Published: Tuesday, March 24, 2009

The NC Health Info website (www.nchealthinfo.org) is a goldmine of health information for North Carolina consumers, a one-stop shop to search for healthcare services and reputable information on diseases, treatments, prevention and more.

Like most websites that contain large amounts of information, NC Health Info uses metadata, or cataloging information that describes an entry on the site, to make the entries discoverable through a search of the site. Because metadata matters so much on large, database-driven sites, keeping it accurate and up to date is crucial. It also promises to be much easier in the future, thanks to a collaboration between RENCI, the UNC School of Information and Library Science/Metadata Research Center (SILS/MRC), and NC Health Info, a Web resource developed by the Health Sciences Library at UNC-Chapel Hill.

Last year Jane Greenberg, Francis Carroll McColl Term Professor in SILS and director of the SILS/MRC, and Christie Silbajoris, director of NC Health Info, received funding from the National Network of Libraries of Medicine to research effective ways to automatically maintain the metadata that describes resources on the NC Health Info website. They partnered with RENCI to develop the software that would turn what has always been a time-consuming manual task into a semi-automated system.

“NC Health Info aggregates information from more than 7,000 websites,” said Nassib Nassar, the RENCI senior research software developer who worked on the project. “Until now, if there was a change on one of those websites that affected the quality or accuracy of the metadata, it required a staff member to manually search all those sites, find all the relevant new information and then update the metadata.”

Nassar worked with the research team to develop a prototype software system that tracks changes on thousands of websites and highlights them. A cataloguer can then easily evaluate the new information and determine how to change metadata related to a specific site. For example, if a health center included on the NC Health Info site adds a new doctor specializing in sports medicine, the software detects the change on the center’s website and highlights the new information. A cataloguer can then easily update the metadata to reflect this new expertise provided by the health center. As a result, a consumer searching for a sports medicine doctor would be able to find this new expert through a search of the NC Health Info website soon after the doctor joins the medical staff.
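The article does not describe how the prototype is implemented, so the following is only a minimal sketch of the general idea: compare a stored snapshot of each page with its current text and surface the new lines for a cataloguer to review. The function names (fingerprint, added_lines, review_queue) and the dictionaries of page text keyed by URL are hypothetical, chosen for illustration; a real system would also need to fetch pages and strip HTML.

```python
import difflib
import hashlib


def fingerprint(text: str) -> str:
    """Return a stable hash of a page's text, for cheap change detection."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def added_lines(old_text: str, new_text: str) -> list[str]:
    """Return the lines that appear in new_text but not in old_text."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff prefixes inserted lines with "+ "; drop the two-character prefix.
    return [line[2:] for line in diff if line.startswith("+ ")]


def review_queue(snapshots: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Map each changed URL to the new lines a cataloguer should review.

    Both arguments are hypothetical inputs: page text keyed by URL,
    as stored at the last check (snapshots) and as fetched now (current).
    """
    queue = {}
    for url, new_text in current.items():
        old_text = snapshots.get(url, "")
        # Only diff pages whose content hash has changed.
        if fingerprint(old_text) != fingerprint(new_text):
            queue[url] = added_lines(old_text, new_text)
    return queue
```

In this sketch, a health center page that gains a line about a sports medicine doctor would appear in the review queue with just that line, while unchanged pages are skipped. The cataloguer still decides how the metadata should change, which matches the semi-automated approach described above.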

“It is prohibitively expensive to manually generate and maintain metadata,” said Greenberg. “This collaboration has allowed us to develop and test an approach that takes advantage of automatic techniques and that should translate into more cost-effective metadata maintenance in NC Health Info.”

The automated system is still experimental, but the researchers believe it could become a model for other health information websites in the MedlinePlus Go Local system, which comprises 32 health information sites, including NC Health Info.

“The larger significance of this project is that it can help a whole network of Go Local sites decrease the staff time needed to keep information accurate and thereby allow them to pay more attention to growing their databases and bringing more health information to North Carolinians and citizens nationwide,” Silbajoris said. “RENCI’s expertise in software development relevant to managing large databases was a great asset for us. We hope to continue to collaborate with RENCI as we turn our prototype automated cataloging system into a production system.”

About RENCI
The Renaissance Computing Institute (RENCI), a multi-institutional organization, brings together multidisciplinary experts and advanced technological capabilities to address pressing research issues and to find solutions to complex problems that affect the quality of life in North Carolina, our nation and the world. RENCI expertise and resources span the fields of high-performance computing, visualization, networking and data technologies. Founded in 2004 as a major collaborative venture of Duke University, North Carolina State University, the University of North Carolina at Chapel Hill and the state of North Carolina, RENCI is a statewide virtual organization.

