UPSI Digital Repository (UDRep)

Type: article
Subject: P Philology. Linguistics
ISSN: 0094-243X
Main Author: Tan, Kian Lam
Additional Authors: Lim, Chen Kim
Title: Language model: extension to solve inconsistency, incompleteness, and short query in cultural heritage collection
Year of Publication: 2017

Abstract:
With the explosive growth of online information such as email messages, news articles, and scientific literature, many institutions and museums are converting their cultural collections from physical to digital format. However, this conversion has introduced inconsistency and incompleteness into the collections. In addition, the use of inaccurate keywords leads to the short-query problem. Inconsistency and incompleteness are mostly caused by aggregation faults when the documents themselves are annotated, while the short-query problem is caused by naive users who lack prior knowledge and experience in the cultural heritage domain. In this paper, we present an approach that addresses inconsistency, incompleteness, and short queries by incorporating a Term Similarity Matrix into the Language Model. We evaluate the approach on the Cultural Heritage in CLEF (CHiC) collection, which consists of short queries and documents. The results show that the proposed approach is effective and improves retrieval accuracy.
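The abstract does not spell out how the Term Similarity Matrix enters the Language Model. One common way to combine the two, sketched here under stated assumptions, is a translation-style query-likelihood model in the spirit of Berger and Lafferty (1999): the document's term distribution is spread over related query terms through normalised similarities t(w|u), then smoothed with the collection model, giving p(w|d) = (1 - lam) * sum_u t(w|u) * p_ml(u|d) + lam * p(w|C). The cosine co-occurrence similarity, the Jelinek-Mercer weight `lam`, the function names, and the toy documents are all illustrative assumptions, not the authors' formulation or the CHiC data.

```python
import math
from collections import Counter


def build_translation_model(docs):
    """Turn a term-term similarity matrix into translation probabilities t(w|u).

    ASSUMPTION: similarity is the cosine between binary term-document
    incidence vectors; the paper's Term Similarity Matrix may differ.
    """
    vocab = sorted({t for d in docs for t in d})
    postings = {t: {i for i, d in enumerate(docs) if t in d} for t in vocab}
    sim = {
        (w, u): len(postings[w] & postings[u])
        / math.sqrt(len(postings[w]) * len(postings[u]))
        for w in vocab
        for u in vocab
    }
    # Normalise each column so sum_w t(w|u) = 1; sim(u, u) = 1 keeps it > 0.
    col_sum = {u: sum(sim[(w, u)] for w in vocab) for u in vocab}
    return {(w, u): sim[(w, u)] / col_sum[u] for (w, u) in sim}


def collection_model(docs):
    """Maximum-likelihood collection language model p(w|C)."""
    counts = Counter(t for d in docs for t in d)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def score(query, doc, trans, p_coll, lam=0.5):
    """log p(q|d) with a translation-based document model, Jelinek-Mercer smoothed."""
    tf = Counter(doc)
    n = len(doc)
    s = 0.0
    for w in query:
        if w not in p_coll:
            # Term unseen in the whole collection: identical for every document, skip it.
            continue
        # sum_u t(w|u) * p_ml(u|d): related document terms "vote" for query term w.
        p_t = sum(trans[(w, u)] * c / n for u, c in tf.items())
        s += math.log((1 - lam) * p_t + lam * p_coll[w])
    return s


def rank(query, docs, lam=0.5):
    """Return document indices ordered from highest to lowest score."""
    trans = build_translation_model(docs)
    p_coll = collection_model(docs)
    scores = [score(query, d, trans, p_coll, lam) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy collection (not CHiC data).
    docs = [
        ["batik", "textile", "malay"],
        ["batik", "cloth", "pattern"],
        ["kris", "dagger", "weapon"],
    ]
    print(rank(["cloth"], docs))  # [1, 0, 2]
```

In this toy collection, document 0 never contains "cloth" yet outranks document 2, because "batik" co-occurs with "cloth" in document 1; under plain query likelihood with the same smoothing, documents 0 and 2 would tie. This illustrates the kind of vocabulary mismatch that incomplete annotations and short queries produce.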

This material may be protected under the Copyright Act, which governs the making of photocopies or reproductions of copyrighted materials.
You may use the digitized material for private study, scholarship, or research.
