National Library Board (NLB) used data mining and text analytics techniques, as well as big data technologies, to connect its structured and unstructured content so that the most relevant information can be pushed to users automatically on its websites and portals.
“With so much data available, how can busy users find the right bits of information? For example, a typical search on an Internet search engine returns thousands of results. A user has to sieve through article after article until he finds what he needs,” explains Lim Chee Kiam (pictured), Senior Solution Architect at NLB. “Instead of having users repeat the tedious search and sieve process, we should push the most relevant information packages to them. To do this, we must connect our content.”
Connecting Structured Data
The group of 25 public libraries and one National Library houses over a million physical titles, which generate over 30 million loans a year. Using data mining techniques on past loan transactions and bibliographic records of books, the library successfully connected its titles and has offered a title recommendation service on its websites and portals since 2009.
“Besides showing the book that you’ve searched for, a section on the side shows you a list of books other patrons who have borrowed this book also borrowed. Collaborative filtering mines the reading patterns within hundreds of millions of loan records in the last three years to make recommendations,” he elaborated.
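The article does not describe NLB's implementation, but the collaborative filtering Lim mentions can be illustrated with a minimal sketch: count how often pairs of titles are borrowed by the same patron, then recommend the titles that co-occur most often. The loan records, title identifiers and `recommend` helper below are hypothetical and purely for illustration.

```python
from collections import defaultdict

# Hypothetical loan records: (patron_id, title_id) pairs.
loans = [
    ("p1", "book_a"), ("p1", "book_b"),
    ("p2", "book_a"), ("p2", "book_b"), ("p2", "book_c"),
    ("p3", "book_b"), ("p3", "book_c"),
]

# Group titles by patron, then count how often two titles were
# borrowed by the same patron (item-item co-occurrence).
titles_by_patron = defaultdict(set)
for patron, title in loans:
    titles_by_patron[patron].add(title)

co_counts = defaultdict(lambda: defaultdict(int))
for titles in titles_by_patron.values():
    for a in titles:
        for b in titles:
            if a != b:
                co_counts[a][b] += 1

def recommend(title, top_n=5):
    """Return titles most often borrowed together with `title`."""
    ranked = sorted(co_counts[title].items(), key=lambda kv: -kv[1])
    return [t for t, _ in ranked[:top_n]]

print(recommend("book_a"))  # ['book_b', 'book_c']
```

A production system would work from tens of millions of loan transactions and typically normalise the counts (for example with a similarity measure) so that very popular titles do not dominate every recommendation list.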
The system also relies on content-based filtering using bibliographic records to generate another list of recommended books, under ‘Similar titles you may also enjoy’.
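Content-based filtering compares what the books are about rather than who borrowed them. A common way to do this, sketched below with assumed field contents and the scikit-learn library (not necessarily what NLB uses), is to turn each bibliographic record into a TF-IDF vector and rank other titles by cosine similarity.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical bibliographic text; in practice this would be drawn
# from catalogue fields such as title, subject headings and summary.
records = {
    "book_a": "Singapore history colonial era trade port",
    "book_b": "History of trade and ports in Southeast Asia",
    "book_c": "Cooking recipes hawker food Singapore",
}

ids = list(records.keys())
# Each record becomes a TF-IDF vector over its words.
matrix = TfidfVectorizer().fit_transform(records[i] for i in ids)
sims = cosine_similarity(matrix)

def similar_titles(title_id, top_n=2):
    """Rank other titles by similarity of their bibliographic text."""
    idx = ids.index(title_id)
    ranked = sorted(
        ((ids[j], sims[idx, j]) for j in range(len(ids)) if j != idx),
        key=lambda kv: -kv[1],
    )
    return ranked[:top_n]

print(similar_titles("book_a"))
```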
Because fiction titles are borrowed more frequently, the system can generate recommendations for 89 per cent of fiction titles, compared with only 53 per cent of non-fiction titles.
NLB is currently working on title recommendations on new arrivals. Once rolled out, when a patron is looking at a particular title, he or she will be able to see if there are related new arrivals of interest.
Connecting Unstructured Data
Unstructured data makes up a huge and growing portion of the content that NLB holds. It has successfully used text analytics on Infopedia (a microsite with fewer than 2,000 articles), the Singapore Memory portal, and 58,000 newspaper articles on NewspaperSG.
“The results were very promising. Interestingly, when we organise the recommended articles in a chronological order, we can discover the progression of an event and see how the story unfolds,” said Lim.
Lim’s team is now working on applying text analytics to the above collections and 6 million newspaper articles. “This gave us our first real scalability issue. The processing ran for more than a week before we ran out of disk storage,” he recounted.
Processing the older issues of the newspapers surfaced another challenge. Newspaper microfilms were digitised using Optical Character Recognition software, but errors were common. These errors introduced ‘noise’ into the data set and also significantly increased the complexity of the computation, leading to lengthy processing and the need for huge amounts of intermediate disk storage.
“To address this issue, we tuned the parameters for the text analytics algorithm to ignore infrequent word tokens. We also set up a full Apache Hadoop cluster with 13 virtual servers on three virtual machine hosts so that we have a reliable, scalable and distributed computing platform,” continued Lim.
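The idea of ignoring infrequent word tokens rests on a simple observation: misrecognised OCR fragments tend to appear only once or twice across a corpus, while genuine vocabulary recurs. A minimal sketch of such a filter, with made-up OCR text and an assumed document-frequency threshold, might look like this:

```python
from collections import Counter

# Hypothetical OCR output; garbled tokens like 'th3' or 'co1lection'
# typically occur in only a handful of articles.
ocr_articles = [
    "the library opened in 1960 and expanded its collection",
    "th3 librarv opened ln 1960 and expanded its co1lection",
    "the collection expanded again when the library moved",
]

# Count how many articles each token appears in (document frequency).
doc_freq = Counter()
for article in ocr_articles:
    doc_freq.update(set(article.split()))

MIN_DF = 2  # assumed threshold: keep tokens seen in at least 2 articles

def clean(article):
    """Drop rare tokens, which are mostly OCR noise, before analysis."""
    return [tok for tok in article.split() if doc_freq[tok] >= MIN_DF]

for article in ocr_articles:
    print(clean(article))
```

Pruning rare tokens shrinks the vocabulary dramatically, which in turn reduces the size of the intermediate data that a distributed platform such as Hadoop has to shuffle and store between processing stages.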
While the team has successfully reduced the time needed to process the data, they are still working towards processing the 6 million articles.
Looking ahead, Lim hopes to enrich the content with semantic information so that content becomes connected semantically instead of just textually. He also wants to enrich the content with language translations to explore the possibilities of connecting content in different languages, particularly useful given that Singapore has four official languages.