Computing tools to glean data efficiently

In our highly digitized world, the average person generates enough information to fill several CDs a year. Now consider how much data a large organization, such as a governmental agency, produces on a daily basis. The number of emails, presentations, spreadsheets and other products multiplies at an exponential rate.
All of that information is stored, but if you ever had to go find a particular bit of data, how would you begin to sift through the meaningless zeroes and ones to get to the proverbial needle in a haystack?
The problem is not a new one, but it is becoming more critical as the amount of information being produced, collected and stored far exceeds the capacity to process and analyze it. Developments in data mining software to help analysts sort through the avalanche of information cannot keep pace with innovations in data storage devices that can accommodate thousands of gigabytes. In the intelligence sector and the defense weapons testing community in particular, the lack of analytical tools to search through and understand the sea of information is being felt pointedly. Both communities sop up voluminous quantities of data daily and have similar challenges in searching for the diamonds in the rough.
Military testers, analysts and engineers encounter such haystacks daily.
"The human is definitely the choke point In military strategy, a choke point (or chokepoint) is a geographical feature (such as a valley or defile) which forces an army to go into a narrower formation (greatly decreasing combat power) in order to pass through it. ," says Dr. James A. Wall, director of the computing and information technology division in the Texas Center for Applied Technology--the research arm of Texas A&M University's Texas Engineering Experiment Station The Texas Engineering Experiment Station (TEES) is the engineering research agency of the State of Texas and a member of The Texas A&M University System. .
"We can collect and store information at rates we never have. But if we can't take advantage of it, it's a limiting factor A factor or condition that, either temporarily or permanently, impedes mission accomplishment. Illustrative examples are transportation network deficiencies, lack of in-place facilities, malpositioned forces or materiel, extreme climatic conditions, distance, transit or overflight rights, ," he says.
Before building new weapons technologies, the Defense Department virtually constructs and tests concepts in simulations. In a recent test event for the Future Combat Systems, the Army's Operational Test Command at Fort Hood collected more than 23 terabytes of network data.
A terabyte is 1,000 gigabytes. Or put another way, a terabyte of the letter "A" typed consecutively in 12-point Courier font would form a chain long enough to circumnavigate the Earth's equator 63 times, says Wall.
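Wall's figure can be sanity-checked with simple arithmetic, assuming one byte per character and the fixed pitch of 12-point Courier (10 characters per inch); the equatorial circumference used below is the standard figure, not from the article:

```python
TB = 10**12                   # one terabyte, taken as 10^12 bytes, one character each
CHARS_PER_INCH = 10           # 12-point Courier is fixed-pitch: 10 characters per inch
EQUATOR_KM = 40_075           # Earth's equatorial circumference in kilometers
INCHES_PER_KM = 100_000 / 2.54

chain_inches = TB / CHARS_PER_INCH              # length of the chain of "A"s
equator_inches = EQUATOR_KM * INCHES_PER_KM     # length of one trip around the equator
laps = chain_inches / equator_inches

print(round(laps))  # about 63 trips around the equator, matching Wall's figure
```

The result comes out to roughly 63, consistent with the claim in the article.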
When FCS--the Army's digitally connected fleet of combat systems--goes into full testing, it could generate up to 100 terabytes of data each month.
"That's a lot of data," says Wall. Before the Army can begin to construct the system, it must analyze the test information to look for design flaws and other problems. Culling culling
removal of inferior animals from a group of breeding stock. The removal is premature, i.e. before completion of its life span, disposal of an animal from a herd or other group. through so much data with available software could take years--a luxury the service does not have.
To help solve the problem, a team in Wall's division is working with the Army to build a framework for collecting data, organizing it and tying it to new data visualization methods to glean more information.
"By improving the capability to navigate and interactively manipulate and explore data, we can enable the analyst to engage in a 'discourse' with the data," write J.J. Thomas and K.A. Cook in an article on visual analytics, a science of analytical reasoning supported by visual, and often interactive, interfaces.
Supporters of visual analytics argue that by improving the state of data visualization tools in these areas, scientists can increase the likelihood that important pieces of information buried within massive databases will be recognized in time.
Wall's team is in its initial year of a potentially three-year project with the test command to deal with the data proliferation and mining issue. It will provide the testing community with software tools and an architecture that will allow for the insertion of visual analytic software as needed in the future.
"This software will not only allow data collectors and analysts to identify and integrate new visualization methods customized for individual test requirements, but will also provide an environment in which users can collaborate by sharing visual products from a given dataset," says the team's report.
Initially, the software will provide the command with interactive visualization of the large amounts of network traffic data generated during the testing of FCS components.
"This technology has great applicability to any high data volume environment," the report says.
In the future, quantum computing may solve many of the large data processing issues, says Wall. While quantum technology has only seen limited applications, a few rudimentary small-scale prototypes have been built to prove some of the claims, he adds.
Please email your comments to GJean@ndia.org