Return-Path: <sentto-279987-5137-1028726718-fc=all.net@returns.groups.yahoo.com>
Delivered-To: fc@all.net
Received: from 204.181.12.215 [204.181.12.215] by localhost with POP3 (fetchmail-5.7.4) for fc@localhost (single-drop); Wed, 07 Aug 2002 06:28:07 -0700 (PDT)
Received: (qmail 25774 invoked by uid 510); 7 Aug 2002 13:24:07 -0000
Received: from n24.grp.scd.yahoo.com (66.218.66.80) by all.net with SMTP; 7 Aug 2002 13:24:07 -0000
X-eGroups-Return: sentto-279987-5137-1028726718-fc=all.net@returns.groups.yahoo.com
Received: from [66.218.66.94] by n24.grp.scd.yahoo.com with NNFMP; 07 Aug 2002 13:25:19 -0000
X-Sender: fc@red.all.net
X-Apparently-To: iwar@onelist.com
Received: (EGP: mail-8_0_7_4); 7 Aug 2002 13:25:18 -0000
Received: (qmail 57513 invoked from network); 7 Aug 2002 13:25:16 -0000
Received: from unknown (66.218.66.216) by m1.grp.scd.yahoo.com with QMQP; 7 Aug 2002 13:25:16 -0000
Received: from unknown (HELO red.all.net) (12.232.72.152) by mta1.grp.scd.yahoo.com with SMTP; 7 Aug 2002 13:25:15 -0000
Received: (from fc@localhost) by red.all.net (8.11.2/8.11.2) id g77DPSj28508 for iwar@onelist.com; Wed, 7 Aug 2002 06:25:28 -0700
Message-Id: <200208071325.g77DPSj28508@red.all.net>
To: iwar@onelist.com (Information Warfare Mailing List)
Organization: I'm not allowed to say
X-Mailer: don't even ask
X-Mailer: ELM [version 2.5 PL3]
From: Fred Cohen <fc@all.net>
X-Yahoo-Profile: fcallnet
Mailing-List: list iwar@yahoogroups.com; contact iwar-owner@yahoogroups.com
Delivered-To: mailing list iwar@yahoogroups.com
Precedence: bulk
List-Unsubscribe: <mailto:iwar-unsubscribe@yahoogroups.com>
Date: Wed, 7 Aug 2002 06:25:28 -0700 (PDT)
Subject: [iwar] [fc:NSF,.Intelligence.Community.Work.on.Data-Mining.Research]
Reply-To: iwar@yahoogroups.com
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, hits=3.2 required=5.0 tests=RISK_FREE,FREE_MONEY,DIFFERENT_REPLY_TO version=2.20
X-Spam-Level: ***

NSF, Intelligence Community Work on
Data-Mining Research

By Jay Wrolstad
NewsFactor Network
August 02, 2002
http://sci.newsfactor.com/perl/story/18872.html

The NSF is working with the CIA's technology branch to develop data-mining techniques in order to analyze communications and hopefully prevent terrorist activity. The work will involve detection of specific keywords and topics across a variety of media.

Prompted by homeland security issues brought to the fore by the September 11th terrorist attacks, the U.S. intelligence community and the National Science Foundation (NSF) <http://www.nsf.gov/> are researching innovative data-mining techniques designed primarily to aid law enforcement agencies at various levels.

Some US$8 million from the Intelligence Technology Innovation Center (ITIC), which is under the Central Intelligence Agency's administrative umbrella but is funded separately, will be spent to develop data-mining techniques that can extract underlying patterns -- and create predictive abilities -- from massive sets of data, such as television broadcasts and Web pages.

Real-Time Pattern Recognition

Gary Strong, program officer for NSF's Directorate for Computer & Information Sciences and Engineering (CISE), told NewsFactor that the research will involve experts in computer science and will focus on two areas: data streams and data sharing.

"With audio and video streaming there is little hope of saving information because the databases are constantly in flux and you have to make real-time decisions on what to save," said Strong.

Consequently, researchers will work on "mining" underlying patterns and trends while pinpointing changes in those patterns. This work will involve both topic and word "spotting," or detecting specific words or word clusters.

Data-Sharing Policies

Because the intelligence community and law enforcement agencies have traditionally lacked the capacity or legal authority to share data, this research will evaluate new policies for sharing that incorporate "probable cause" conditions, said Strong.

Efforts to use government-owned databases in a coordinated way currently present problems because of incompatibility among the databases, not to mention privacy restrictions. Developing data-mining techniques within these constraints is a challenge regardless of national security implications, he added.

"We now have an opportunity to develop a way to allow searches of protected information, such as medical records, while protecting privacy of the data," Strong noted.

Cooperative Agreement

Besides national security, other applications for the research range from natural disaster response to bioinformatics, which involves searching through large numbers of documents to manage biological functions.

Cooperation between the ITIC and the CIA is made possible through the interagency Knowledge Discovery and Dissemination (KDD) program. Through KDD, the NSF identifies projects and programs in which research might be related to national security and then consults the research community to focus its efforts, where appropriate, in that direction.

An NSF-sponsored workshop held in December identified some 40 potential data-mining projects of interest to the intelligence community. Of those, 15 were chosen to receive funding over the next three years as part of the cooperative venture.

Projects Outlined

In one chosen project, SRI International will investigate ways to enable machines to recognize individuals by the way they talk, a sophisticated capability that goes far beyond existing voice-recognition technology. Strong said this research includes "talk printing," or identifying the specific ways in which individuals talk, including pauses or speech inflections.

In another project, researchers at Columbia University are working on a system to track patterns in data types -- such as broadcast news programs, online chat rooms, e-mail and voice mail -- and then automatically generate a summary of information about a specific event.

"They will take large numbers of messages and produce short summaries that take into consideration both time factors and changing news reports to determine the most accurate information," Strong said.

Meanwhile, scientists at IBM's T.J. Watson Research Center hope to create a topic-spotting method that can search for a specific area of interest in all languages.

------------------
http://all.net/

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/

This archive was generated by hypermail 2.1.2 : 2002-10-01 06:44:32 PDT