On the Performance of Latent Semantic Indexing-based Information Retrieval

Kumar, Aswani; Srinivas, S.
September 2009
Journal of Computing & Information Technology;Sep2009, Vol. 17 Issue 3, p259
Academic Journal
Conventional vector-based Information Retrieval (IR) models, namely the Vector Space Model (VSM) and the Generalized Vector Space Model (GVSM), represent documents and queries as vectors in a multidimensional space. This high-dimensional data places great demands on computing resources. To overcome this problem, Latent Semantic Indexing (LSI), a variant of VSM, projects the documents into a lower-dimensional space. The IR literature states that the LSI model is 30% more effective than classical VSM models. However, statistical significance tests are required to evaluate the reliability of such comparisons. The focus of this paper is to address this issue. We discuss the tradeoffs of VSM, GVSM, and LSI and evaluate the difference in performance on four test document collections. We then analyze the statistical significance of these performance differences.
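
To make the contrast between VSM and LSI concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the toy term-document matrix, the function names (`vsm_scores`, `lsi_scores`), and the choice of k = 2 are all illustrative assumptions. It ranks documents by cosine similarity in the full term space (VSM) and in the space spanned by the top-k left singular vectors (LSI). Projecting the query with U_k^T is one common convention; others also scale by the inverse singular values.

```python
import numpy as np

# Hypothetical toy term-document matrix (rows = terms, columns = documents).
# Terms: car, automobile, engine, flower, petal. Values are raw counts.
A = np.array([
    [1, 0, 1, 0],   # car
    [0, 1, 1, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 0, 1],   # flower
    [0, 0, 0, 1],   # petal
], dtype=float)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return float(a @ b / (na * nb))


def vsm_scores(A, q):
    """Classical VSM: compare the query with each document column directly."""
    return np.array([cosine(A[:, j], q) for j in range(A.shape[1])])


def lsi_scores(A, q, k):
    """LSI sketch: project documents and query onto the top-k left singular vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk = U[:, :k]
    docs_k = (Uk.T @ A).T   # one k-dimensional row per document
    q_k = Uk.T @ q          # query projected into the same k-dimensional space
    return np.array([cosine(d, q_k) for d in docs_k])


# Query containing only the term "car".
query = np.array([1, 0, 0, 0, 0], dtype=float)
print("VSM:", vsm_scores(A, query))
print("LSI:", lsi_scores(A, query, k=2))
```

In this toy matrix, the second document contains "automobile" and "engine" but not "car", so VSM gives it a score of zero for the query, while the rank-2 LSI projection places it alongside the other vehicle-related documents. Whether such a difference carries over to real collections is the question the paper's significance tests are meant to answer.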


Related Articles

  • Lower Diagonal Bilinear Moving Average Vector Models. Usoro, Anthony E.; Omekara, C. O. // Advances in Applied Mathematical Analysis;2008, Vol. 3 Issue 1, p49 

    This research work seeks to identify a case from the general Bilinear Moving Average Vector (BMAV) models. This is a case in which the parameters of the matrices of the non-linear part are restricted to the lower diagonal coefficients. The models, 'Lower Diagonal Bilinear Moving Average (LDBMAV)...

  • On the regularization of vector integer quadratic programming problems. Emelichev, V.; Gurevskii, E. // Cybernetics & Systems Analysis;Mar2009, Vol. 45 Issue 2, p274 

    For a vector integer quadratic programming problem, a regularizing operator is proposed that acts on a vector criterion and transforms a possibly unstable initial problem into a series of perturbed stable problems with the same Pareto set. The technique of ε-regularization is developed that...

  • The Third Order Helicity of Magnetic Fields via Link Maps. Komendarczyk, R. // Communications in Mathematical Physics;Dec2009, Vol. 292 Issue 2, p431 

    We introduce an alternative approach to the third order helicity of a volume preserving vector field B, which leads us to a lower bound for the $L^2$-energy of B. The proposed approach exploits correspondence between the Milnor $\bar{\mu}_{123}$-invariant for 3-component links and the homotopy...

  • Universal Pattern Set for Arithmetic Circuits. Kumar, Ashok; Choudhary, Rahul Raj; Bhardwaj, Pooja; Dhaka, M. S.; Choudhary, Rajkumar // International Journal of Computer Applications;Feb2012, Vol. 40, p47 

    The exponential increase in test cost is one of the new challenges posed by technology scaling. This paper aims to address the issue of testing cost, which adds to the chip cost. Here we propose a new pattern set for testing arithmetic circuits which contains a minimum...

  • Molecular Field Calculation of Magnetization on NdRh2Ge2 Single Crystal. Himori, A.; Hattori, K.; Shigeoka, T. // Research Letters in Physics;2008, p1 

    Calculation of magnetization of the ternary single crystal compound NdRh2Ge2 has been carried out by using the wave-like molecular field model to explain the complex magnetic behavior. The field-induced magnetic structures having the propagation vectors, Q2 = (0, 0, 39/40), Q3 = (0, 0, 35/40),...

  • Searching for Text in Vector Space. Dovidio, Nicholas A.; Chartier, Timothy P. // UMAP Journal;Winter2008, Vol. 29 Issue 4, p417 

    The article focuses on vector space models. It states that the model offers a choice for searching databases for information about a specific topic. It mentions that the vector space model has its own shortcomings but can still produce useful data. It also discusses the computational...

  • Support Vector Machines For Understanding Lane Color and Sidewalks. Hoon Lee; Soonyoung Park; Kyoungho Choi // Proceedings of World Academy of Science: Engineering & Technolog;Feb2009 Supplement, Vol. 50, p1053 

    Understanding road features such as lanes, the color of lanes, and sidewalks in a live video captured from a moving vehicle is essential to build video-based navigation systems. In this paper, we present a novel idea to understand the road features using support vector machines. Various feature...

  • Application of serum tumor markers and support vector machine in the diagnosis of oral squamous cell carcinoma. ZHONG Lai-ping; ZHOU Xiao-jian; WEI Kui-jie; YANG Xiao; MA Chun-yue; ZHANG Chen-ping; ZHANG Zhi-yuan // Shanghai Journal of Stomatology;Oct2008, Vol. 17 Issue 5, p457 

    PURPOSE: To investigate the clinical application value of serum tumor markers detection combined with support vector machine (SVM) model in the diagnosis of oral squamous cell carcinoma. METHODS: Serum levels of neuron-specific enolase (NSE), cancer antigen 242 (CA242), cancer antigen 19-9...

  • Text Clustering Based on Domain Ontology and Latent Semantic Analysis. Li Yaxiong; Pan Deng // Applied Mechanics & Materials;2014, Issue 556-562, p3536 

    One key step in text mining is the categorization of texts, i.e., putting texts of the same or similar content into one group so as to distinguish texts of different content. However, traditional word-frequency-based statistical approaches, such as the VSM, failed to reflect the complicated...

