
Assessment outcome coherence using LSA scoring.


This paper adapts latent semantic analysis (LSA) as an automated, statistically reliable metric for comparing desired outcomes or objectives across educational programs. After a general introduction to LSA, demonstrating the ability of LSA to compare text contexts, an LSA metric will be sketched (assessment outcomes/objectives scoring tool) for measuring the relatedness of program outcomes and objectives across a university. This metric quickly identifies programs with outcome/objective sets that are very divergent from normative objective/outcome sets.


Cognitive scientists and computational psycholinguists have developed Latent Semantic Analysis (LSA) as a statistical model for comparing the semantic similarity of units of text to each other. LSA provides an automatic method for comparing units of textual information to each other in order to determine their semantic relatedness. It was originally designed to improve information retrieval methods by performing retrieval based on derived "semantic" content of words in a query (e.g. "Googling") as opposed to performing direct word matching. Recent applications have gone beyond information retrieval to include automatic essay grading, plagiarism detection, automatic summarization of texts, tutoring of writing, intelligence monitoring, and analysis of sequential key message retention (Foltz, 1996; Landauer, 2002; Martin, 2004). For an online bibliography of various applications, see Lemaire and Dessus (2004). This exploratory study adapts LSA for a metric to quickly review program assessment efforts in an educational environment.

The metric is applied to a web-based program assessment tool used by all academic and support programs in a mid-Atlantic university for reporting mission, outcomes/objectives, measures, findings and actions. The tool allows real-time review, consolidated reporting, tracking of actions, and other analysis of all program assessment efforts. The university is complex, enrolling 26,000 students in more than 170 programs in the arts, sciences and humanities and 112 support units on multiple campuses, including an international location. Because of the variety of programs and the desire of faculty and administrators to write outcomes/objectives in the way that best fits their individual program, the web-based program assessment tool allows a great deal of flexibility for individual programs to construct their own narratives. At the same time, however, the institution must keep on top of the process.

Previous Development of LSA

The theoretical assumption of LSA is that there is some underlying or "latent" structure in the pattern of word usage across documents. The meaning of a word shifts as the word is used in different contexts. Similarly, different words sometimes have the same meaning, depending on context. The idea of latent semantic analysis is that the aggregate of all the word contexts in which a particular word does, and does not, appear provides a set of mutual constraints that reflects the similarity of meaning of words to each other (Foltz, 1996). LSA represents the contextual-usage meaning of words in a text with statistical computations (Landauer and Dumais, 1997). LSA is more completely described in Landauer, Foltz and Laham (1998) and the seminal article of Deerwester, Dumais, Furnas, Landauer and Harshman (1990).

LSA was developed at Bellcore (now Telcordia) in the late 1980s. United States Patent 4,839,853 was granted on June 13, 1989 to inventors Scott C. Deerwester, Susan T. Dumais, George W. Furnas, Richard A. Harshman, Thomas K. Landauer, Karen E. Lochbaum and Lynn A. Streeter. A cross-language information retrieval patent followed (US Patent 5,301,109, April 5, 1994). The specific application of LSA for information retrieval is now called latent semantic indexing (LSI). What is the difference between LSA and LSI? Simply, LSI refers to using the approach for indexing or information retrieval; LSA refers to all other applications. The Telcordia LSI reference site provides excellent background materials. Additionally, a very comprehensible discussion of LSI and the information retrieval process is available from the National Institute of Technology in Liberal Education. In short, this measure is based on a powerful statistical analysis of direct and indirect relations among words and passages in a large text corpus and can capture the extent to which two text units are discussing semantically related information.

Research Questions

Can LSA be adapted to help with some aspects of program assessment construction and implementation? The initial exploration asks: (1) Can LSA provide a metric to indicate relatedness of outcomes/objectives sets across an institution? (2) Do LSA metrics provide a range to discriminate semantic relatedness of outcomes/objectives sets? (3) Can the LSA cosine ranges be evaluative, indicating strong, medium, weak or minimal levels?


In this study, we use latent semantic analysis to determine the relatedness of sample sets of outcomes/objectives for programs across the university. The university uses web-based program assessment software that allows program flexibility in the construction of outcomes/objectives, with related measurements, findings and actions, while providing an overall picture of all units for internal and external reporting. Forty-two program outcomes/objectives sets are used in this exploratory study, including an equal number of bachelor's and master's degree programs and a variety of programs in the sciences, humanities and social sciences. The method is described in Foltz (1996) and more completely summarized in readings available online.

As previously mentioned, LSA is an automatic statistical technique for inferring relations of expected contextual usage of words in passages of text. This application of LSA takes as its input only raw text parsed into words, defined as unique character strings, and separated into groupings (sets of outcomes/objectives). It uses no humanly constructed dictionaries, semantic networks, grammars, syntactic parsers, etc. Inference of semantic relatedness is from contextual usage. The first step is to select the programs for comparison. The second step, using LSA software, is to generate a matrix of occurrences of each word in each document. Text is represented as a matrix in which each row stands for a unique word and each column stands for a text passage. Each cell contains the frequency with which the word of its row appears in the passage denoted by its column.
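The word-by-passage matrix described above can be sketched in a few lines of Python. This is a minimal illustration and not the LSA software used in the study: the function name term_document_matrix, the tokenization rule (lowercasing and stripping surrounding punctuation) and the two sample passages are all assumptions made for the example.

```python
from collections import Counter

def term_document_matrix(passages):
    """Return (vocabulary, matrix); rows are unique words, columns are passages."""
    # Assumed tokenization: split on whitespace, lowercase, strip punctuation.
    tokenized = [
        [word.strip(".,;:()\"'").lower() for word in text.split()]
        for text in passages
    ]
    tokenized = [[word for word in words if word] for words in tokenized]
    vocabulary = sorted({word for words in tokenized for word in words})
    counts = [Counter(words) for words in tokenized]
    # Each cell holds how often the row's word appears in the column's passage.
    matrix = [[count[word] for count in counts] for word in vocabulary]
    return vocabulary, matrix

# Hypothetical passages standing in for two outcomes/objectives sets.
passages = [
    "Students will write proofs",
    "Students will read proofs and write",
]
vocabulary, matrix = term_document_matrix(passages)
for word, row in zip(vocabulary, matrix):
    print(word, row)
```

In a real application each column would hold the full text of one program's outcomes/objectives set, so the matrix would have one column per program.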

Next, the cell entries undergo a preliminary transformation, with each cell frequency weighted by a function that expresses both the word's importance in the particular passage and the degree to which the word type carries information in the discourse domain in general. LSA then applies singular value decomposition (SVD) to the matrix. This is similar to factor analysis. In SVD, the rectangular matrix is decomposed into the product of three other matrices. One component matrix describes the original row entities as vectors of derived orthogonal factor values, another describes the original column entities in the same way, and the third is a diagonal matrix containing scaling values. SVD decomposes the word-by-document matrix into a set of k orthogonal factors (generally 100 to 300) from which the original matrix can be approximated by a least-squares best fit.

The result of the SVD analysis is a k-dimensional vector space containing a vector for each term and each document. The location of term vectors reflects the correlations in their use across documents. The location of document vectors reflects correlations in the terms used in the documents. In this space, the cosine between vectors corresponds to estimated semantic similarity. The theoretical cosine range is +1.00 to -1.00, although in practice even two completely different English-language texts will, at most, fall into the teens. Our concern is generally at the other end of the spectrum. A higher cosine (closer to 1.00) means higher shared meanings. A visual representation of the cosine and vector relationship is available online. In short, determining the cosine of the vectors of two pieces of textual information allows us to determine the semantic similarity between them.
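The weighting, SVD and cosine steps can be sketched with NumPy as follows. Several things here are assumptions: the source does not name its weighting function, so the sketch uses log-entropy weighting, a common choice in the LSA literature; the function names and the small count matrix are hypothetical; and k is set to 2 only because the toy matrix has three passages, whereas the text above cites 100 to 300 factors for real corpora.

```python
import numpy as np

def lsa_document_vectors(counts, k):
    """Weight a word-by-passage count matrix, apply SVD, keep k factors."""
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]  # assumed to be at least 2
    # Local weight (assumed): log(1 + frequency) of the word in the passage.
    local = np.log1p(counts)
    # Global weight (assumed): 1 minus the normalized entropy of the word
    # across passages, so evenly spread words carry little information.
    totals = counts.sum(axis=1, keepdims=True)
    p = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    safe_p = np.where(p > 0, p, 1.0)  # log(1) = 0, so empty cells add nothing
    entropy = -(p * np.log(safe_p)).sum(axis=1) / np.log(n_docs)
    weighted = local * (1.0 - entropy)[:, None]
    # SVD: weighted = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
    # One row per passage: its coordinates on the k largest factors.
    return Vt[:k, :].T * s[:k]

def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical counts; rows: proofs, mathematical, history, research, writing.
counts = [
    [2, 1, 0],
    [3, 2, 0],
    [0, 0, 3],
    [1, 0, 1],
    [1, 1, 2],
]
doc_vectors = lsa_document_vectors(counts, k=2)
sim_ab = cosine(doc_vectors[0], doc_vectors[1])  # two passages sharing vocabulary
sim_ac = cosine(doc_vectors[0], doc_vectors[2])  # passages with little overlap
print(round(sim_ab, 2), round(sim_ac, 2))
```

The first two columns share most of their vocabulary and so end up close together in the reduced space, while the third column sits nearly at a right angle to them.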


We select outcomes/objectives listings from the academic programs in the university assessment database to conduct several LSA explorations into outcomes/objectives relatedness.

Research question 1 asks: Can LSA provide a metric to indicate relatedness of outcomes/objectives sets? Yes. The exploration indicates the LSA metric can show relatedness of outcomes/objectives sets across a variety of university academic programs. The cosine of each program with every other program in the sample averaged .48. The average cosines range from .58 to .15. The cosine represents the angle between the vectors and corresponds to estimated semantic similarity. A higher cosine (closer to 1.00) means higher shared meanings. In this exploration, LSA identifies relatedness of the outcomes sets to other programs.

On the upper end of the spectrum, Computer Science has an average cosine of .58 with all other programs and a range of .78 to a low of .17. The cosine is above .60 with 21 of the 41 other programs and above .50 with 32 of the programs. Only 3 programs have a cosine below .40. Also on the upper end of the spectrum, Urban Studies has an average cosine of .56 with all other programs and a range of .77 to a low of .15. A cosine above .60 occurs with 18 of the 41 other programs. In a similar pattern, only 2 programs have a cosine below .40. On the lowest end of the spectrum, African American Studies (BA) and Biology (BA) each have an average cosine of .15, indicating little relatedness to other program outcomes/objectives. African American Studies ranged from .07 to .22; Biology ranged from .07 to .21.

Overall, 25 (59.5%) of the programs have average cosine relatedness of .50 or above; 13 (30.5%) are in the range of .40 to .49; 2 are in the range of .30 to .39; and 2 are in the range of .10 to .19 (together about 10%).
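The average cosine reported for each program can be computed from a table of pairwise cosines, as in the sketch below. The program names, the similarity values and the .40 review threshold are hypothetical illustrations, not data from the study, and the function names are assumptions.

```python
def average_cosines(names, similarity):
    """Mean cosine of each program with every other program (self excluded)."""
    averages = {}
    for i, name in enumerate(names):
        others = [similarity[i][j] for j in range(len(names)) if j != i]
        averages[name] = sum(others) / len(others)
    return averages

def flag_outliers(averages, threshold):
    """Programs whose average cosine falls below the chosen threshold."""
    return sorted(name for name, value in averages.items() if value < threshold)

# Hypothetical symmetric table of pairwise cosines for three programs.
names = ["Program A", "Program B", "Program C"]
similarity = [
    [1.00, 0.80, 0.20],
    [0.80, 1.00, 0.10],
    [0.20, 0.10, 1.00],
]
averages = average_cosines(names, similarity)
outliers = flag_outliers(averages, threshold=0.40)
print(averages, outliers)
```

With 42 programs the same functions would average each program's 41 pairwise cosines, which is how a low-relatedness set could be singled out for review.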

Research question 2 asks: Do LSA metrics provide a wide enough range to discriminate semantic relatedness of outcomes/objectives sets? Yes. As previously mentioned, the LSA metric can provide a range to discriminate semantic relatedness of outcomes/objectives sets of individual academic programs. The ranges not only show comparisons among individual programs, but can also indicate varying levels among categories of programs. For example, when comparing the average cosines of master's and bachelor's programs, the master's level is .51 and the bachelor's is .45. Of the master's programs, 67% have relatedness measures to each other above .50, compared with 33% of the bachelor's programs. A second example compares relatedness among science, humanities and social science programs. The average cosine of social sciences is .53 (with 73% of the cosines above .50), of humanities .49 (with 67% above .50) and of sciences .46 (44% above .50).

Research question 3 asks: Can the LSA cosine ranges be evaluative, indicating strong, medium, weak or minimal levels? Yes. The exploration indicates the LSA cosine ranges can be evaluative, indicating strong, medium, weak or minimal levels. Beyond using average cosines, the evaluative ranges of semantic relatedness can be shown by looking at the cosines comparing the master's and bachelor's programs in the same discipline. Those are: History (.96); Math (.92); Accounting (.85); Criminal Justice (.83); Economics (.78); Art History (.77); Computer Science (.77); Information Science (.73); Interior Design (.68); English (.37); Biology (.16). Looking at the narrative Math outcomes/objectives, it is clear that the semantic relatedness of the two sets to each other is great.


1. Students will be able to think creatively through conjecturing, problem solving and/or computer simulations both independently and in directed research

2. Students will be able to analyze and write mathematical arguments and proofs with professional competence and refine symbolic calculation skills beyond the basic undergraduate level

3. Students will be able to read and interpret mathematical literature, including technical articles within a particular mathematical sub-field and to write or orally present mathematics with professional competence

4. Students will be able to use technology, including specialized computational and graphics software, to test validity of certain conjectures, for solving problems and for mathematical experimentation and research

5. Graduates will be able to pursue goals or careers in education or industry that are consistent with and benefit from one's mathematical education at the Master's level


1. Students will be able to improve their creative thinking through problem solving and/or computer simulations

2. Students will develop basic mathematical skills such as symbolic calculations and the ability to write proofs that can be effective in the analysis and solution of problems

3. Students will be able to read and interpret mathematical literature and to write/present mathematics commensurate with their level of technical sophistication


4. Students will learn to use technology effectively in solving problems and for mathematical experimentation where appropriate

5. Students will be able to pursue goals or career in secondary education or in industry commensurate with the preparation that comes with a BS degree in mathematics, or to proceed to graduate study for increased career opportunities in mathematical and scientific fields in academia, government or in the industry.

Similarly, the History outcomes/objectives, written in a very different style from Mathematics, also demonstrate high semantic relatedness to each other (.96).

HIST (MA)

1. Ability to comprehend Historical Arguments and Interpretations

2. Ability to Comprehend Scholarly Popular Works

3. Development of Research and Writing Skills

4. Enhanced Historical Consciousness

5. Familiarity with Research Methodologies


1. Ability to comprehend Historical Arguments and Interpretations

2. Ability to Comprehend, Summarize, Analyze, Evaluate Scholarly Popular Works

3. Methodology, Development of Research Skills, Interpretation of Evidence

4. Development of Writing Skills and Historical Consciousness

On the other hand, Biology has low relatedness. A review of the narratives indicates the problems previously identified by the LSA metric.

BIOL (MS)

1. A combination of coursework and thesis research will ensure that students are thoroughly trained within their chosen sub-discipline of Biology.

2. Upon completion of the graduate degree, students will be competitive for employment in laboratories, consulting firms, and local/state/federal agencies. Further, students will be competitive for admission to professional programs and advanced degree programs with graduates from our peer institutions and other Virginia universities.

3. Our MS program students will complete their MS degree in a timely manner and with a high degree of academic success.

4. Students will publish their theses in peer-reviewed scientific journals.

5. Students will present their thesis research at regional and national scientific conferences.


1. Students will have a broad foundation in Biology and the other Life Sciences.

2. Students will be able to enter professional and graduate programs.

3. Students will be able to think critically and analytically from their exposure to research and scholarship.

4. Students' learning will be facilitated through the provision of flexible scheduling.

5. Students will be able to use the urban environment as a resource rich laboratory.

6. Students will have ready access to information on scheduling and curriculum planning.

If we look at the LSA literature, Martin (2004) found threshold levels of the mid-.40s and below for a known "unrelated" news release compared with news articles. At the higher end, the range of LSA cosines ran from strong impact (.86), to medium impact (.72) and to weak coherence (.63). Blackmon, Kitajima and Polson (2003) set a .60 cosine minimum level indicating relatedness of navigational tags on web sites. The exploration initially demonstrates that although two outcomes/objectives sets are written very differently, LSA can provide a reliable metric of the relatedness of the sets.
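One way to turn a cosine into an evaluative label is a simple banding function, sketched below. The cut points are assumptions chosen only to echo the values cited from the literature above (.86, .72 and the .60 minimum); the study itself does not fix cut points, so these would need to be validated before any real use. The cosines in the example are master's/bachelor's pair values reported earlier in this paper.

```python
# Hypothetical cut points, loosely echoing Martin (2004) and
# Blackmon, Kitajima and Polson (2003); not defined by the study.
DEFAULT_CUTOFFS = ((0.86, "strong"), (0.72, "medium"), (0.60, "weak"))

def evaluative_level(cosine, cutoffs=DEFAULT_CUTOFFS):
    """Return the label of the first band whose minimum the cosine meets."""
    for minimum, label in cutoffs:
        if cosine >= minimum:
            return label
    return "minimal"

# Master's/bachelor's pair cosines reported in the results above.
reported = {
    "History": 0.96,
    "Math": 0.92,
    "Accounting": 0.85,
    "Interior Design": 0.68,
    "English": 0.37,
    "Biology": 0.16,
}
levels = {program: evaluative_level(value) for program, value in reported.items()}
print(levels)
```

Under these assumed bands, History and Math would be labeled strong while English and Biology would be labeled minimal; different cut points would shift the middle cases.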


The LSA metric can take a large number of items, such as all program outcomes/objectives, and quickly identify those which may be outliers for focus and review. Of the programs tested, 59.5% have outcome/objective sets with strong semantic similarity. Another 30.5% are somewhat similar. Overall, the LSA metric identifies 10% as very low in semantic relatedness. When reviewing narratives for the programs with stronger and weaker semantic relatedness, outcomes/objectives sets in need of revision are reflected in the cosine numbers. It is clear from the exploratory study that LSA has the ability to differentiate levels of relatedness based on the context of words, regardless of the words in character strings. Because of the limitations of an exploratory study, however, additional research will be needed to verify and clarify LSA usage, especially the gradation of cosine levels.


Blackmon, Marilyn Hughes, Kitajima, Muneo and Polson, Peter G. (2003). "Repairing Usability Problems Identified by the Cognitive Walkthrough for the Web." CHI letters 5:1 p. 498. English/PAPERS(E)/CHI2002WSBKPL.html

Deerwester, Scott, Dumais, Susan T., Furnas, George W., Landauer, Thomas K., and Harshman, Richard (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41 (6), 391-407.

Foltz, Peter W. (1996). Latent semantic analysis for text-based research. Behavior Research Methods, Instruments and Computers, 28(2), 197-202.

Foltz, Peter W., Kintsch, Walter and Landauer, Thomas K. (1998). The measurement of textual coherence with Latent Semantic Analysis. Discourse Processes, 25, 285-307.

IES Math Education and Technology. Visited July 1, 2004.

Landauer, Thomas K., and Dumais, Susan T. (1997) A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211-240.

Landauer, Thomas K. (2002). Applications of Latent Semantic Analysis. Paper presented at 24th Annual Meeting of Cognitive Science Society, August 9, 2002.

Landauer, Thomas K., Foltz, Peter W., and Laham, Darrell (1998). Introduction to latent semantic analysis. Discourse Processes, 25, 259-284.

Lemaire, Benoit and Dessus, Philippe (2004). Readings in Latent Semantic Analysis for Cognitive Science and Education. Visited August 16, 2004.

Martin, Jr., Ernest F. (2004). News Release Flow-Through: News Release/News Article LSA Metric. Paper presented at Association for Education in Journalism and Mass Communication, Public Relations Division, Toronto.

National Institute of Technology in Liberal Education. Visited July 1, 2004.

SALSA Lab, University of Colorado at Boulder.

Turner, Althea A. and Edith Greene (1987). The propositional analysis system (Tech. Rep. No. 87-02). Boulder: University of Colorado, Institute of Cognitive Science.

Martin, Ph.D. is Associate Professor of Public Relations in the School of Mass Communications.
COPYRIGHT 2004 Rapid Intellect Group, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2004, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation:latent semantic analysis
Author:Martin, Ernest F., Jr.
Publication:Academic Exchange Quarterly
Date:Sep 22, 2004