This page is updated regularly. Please send your suggestions to: demchenko@terena.nl
Search Engines News
http://searchenginewatch.com/news.html
Current Search Engine Report
http://searchenginewatch.com/sereport/current.html
Search Engine Size
http://www.searchenginewatch.com/reports/sizes.html
News at Web Site Search Tools
http://www.searchtools.com/info/news.html
Results from our Site Search Tools Survey!
http://www.searchtools.com/surveys/survey-results-01.html
First results from our search tools survey are in, and they're interesting!
Most web administrators who haven't installed a site search say it's because
they don't have time or the applications are too complex. Those who have
cite improved navigation as their number one reason, by far. More surprising
results come from sites aimed towards information professionals (many don't
have search), and sites with three or more languages (they have search).
Websearch.miningco.com weekly
http://websearch.miningco.com/library/weekly/topicmenu.htm?pid=2825&cob=home
New largest Search Engine Alltheweb.com launched by FAST Search & Transfer
August 2, 1999 FAST (Fast Search & Transfer) has launched a new
site called Alltheweb ("FAST Search: All the Web, All the Time") http://www.alltheweb.com/.
The announced size of their index is more than 200 million pages, which
is estimated to be 25% of the whole web. See more
information.
NREN Search and Index Services
German Web Index
http://www.fireball.de/
Metagenerator - http://www.fireball.de/metagenerator.html
Metadata scheme - http://www.fireball.de/meta_daten.html
Fireball was developed by FLP/KIT - http://flp.cs.tu-berlin.de/
KIT - http://flp.cs.tu-berlin.de/kit/kit.html
Swiss search service
http://www.search.ch/
Allows metadata search - http://www.search.ch/help.html.en
Nordic Web index
http://nwi.ub2.lu.se/?lang=en
Special-purpose Search Engines
US Government Search Engine Launched
A new search engine that focuses on information from US government
sources was opened in May. Called Gov.Search, the service is jointly produced
by search engine Northern Light and the U.S. Commerce Department's National
Technical Information Service through a five-year agreement.
The service is unusual for the web in that searching is not free. Those
wishing to use it must pay for access: US $15 for a day pass, $30 for a
monthly pass or $250 for a year. Special pricing is also
available to companies and organizations that require multiple accounts.
Northern Light has now indexed about 4 million web pages located on
more than 20,000 US government servers, which also include military and
some educational sites. In addition to this information, it has also indexed
about 2 million specialty records from the NTIS.
http://searchenginewatch.com/sereport/99/06-govsearch.html
Gov.Search
http://www.usgovsearch.com
Google US Government Search
http://www.google.com/unclesam
Google has its own US government search service. Test queries show
it to be much smaller than Northern Light's index, yielding only 10 to
50 percent of Northern Light's counts. But the relevancy of some of the
matches was impressive. Definitely worth a visit.
Cora Search Engine
http://www.cora.justresearch.com/about.html
Cora is a special-purpose search engine covering computer science research
papers.
Northern Light Adds Research Options
Northern Light now also operates a "research" version of its service,
where the default is to search within its Special Collection index. This
index has information from over 5,400 publications, much of which is not
available on the web. Searching is free, and documents can then be
purchased for between $1 and $4.
Titles can be downloaded from http://www.northernlight.com/docs/specoll_help_download.html
http://searchenginewatch.com/sereport/99/06-northernlight.html
Northern Light Research Version
http://www.nlresearch.com/
(http://www.northernlight.com/research.html)
Northern Light Special Editions
http://special.northernlight.com/
Research Service at HotBot
http://r.hotbot.com/r/hb_also_rsrch/http://www.elibrary.com/s/hotbot/
"Invisible Web" Revealed
Lycos and IntelliSeek have teamed up to produce an index of search
databases to help users find information that is invisible to search engines.
The "Invisible Web Catalog" provides links to more than 7,000 specialty
search resources. Users can browse the listings or search the Lycos index.
http://searchenginewatch.com/sereport/99/07-invisible.html
Lycos Invisible Web Catalog
http://dir.lycos.com/Reference/Searchable_Databases/
IntelliSeek
http://www.intelliseek.com/
Direct Search
http://gwis2.circ.gwu.edu/~gprice/direct.htm
Catalog of specialty databases. Search inside a particular database.
WebData
http://www.webdata.com/
Guide to searchable databases. Browse or search through listings.
Northern Light Adds Clustering
Clustering prevents results from a single site from dominating the list.
In addition to its page index, NL provides a list of Custom Search Folders™
generated from search data clustered by group of servers or
by type of pages.
http://www.northernlight.com/docs/search_help_folders.html
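The idea of keeping one site from dominating a result list can be illustrated with a small sketch. This is not Northern Light's method, which is not described here. It is only a minimal example that assumes results arrive as a ranked list of URLs and that "site" means the URL host. The function name `cluster_by_host` and the `per_host` limit are hypothetical.

```python
from collections import OrderedDict
from urllib.parse import urlsplit


def cluster_by_host(urls, per_host=1):
    """Group ranked result URLs by host, keeping rank order.

    Returns (top, folders): `top` holds at most `per_host` results from
    each host; `folders` maps each host to its remaining results.
    This is an illustrative assumption, not any engine's real algorithm.
    """
    shown = {}                 # host -> number of results already in `top`
    top = []
    folders = OrderedDict()    # host -> overflow results, in rank order
    for url in urls:
        host = urlsplit(url).hostname or ""
        if shown.get(host, 0) < per_host:
            top.append(url)
            shown[host] = shown.get(host, 0) + 1
        else:
            folders.setdefault(host, []).append(url)
    return top, folders


ranked = [
    "http://a.example/page1",
    "http://a.example/page2",
    "http://b.example/index",
    "http://a.example/page3",
]
top, folders = cluster_by_host(ranked)
```

With the sample list, only the first page from each host stays in the main list, and the other pages from the same host are collected in a per-host folder.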
Navigate the web smarter and more easily with Alexa
http://www.alexa.com/
Netscape's keywords service
http://home.netscape.com/escapes/keywords/
Report on the 1999 Search Engines Meeting
by Avi Rappoport, Search Tools Consulting
http://www.searchtools.com/info/meetings/searchenginesmtg/index.html
The main questions discussed:
Natural Language Processing & Information Retrieval (NLPIR) group
of ITL NIST (http://www.itl.nist.gov/iaui/894.02/)
Valuable information. Publications http://www.itl.nist.gov/iaui/894.02/works.html
Information on DARPA TIPSTER Text Program http://www.itl.nist.gov/iaui/894.02/related_projects/tipster/
http://www.itl.nist.gov/iaui/894.02/related_projects/tipster_summac/final_rpt.html
IBM Patents Network -
http://www.patents.ibm.com/
Lycos holds patent 5,748,954
(http://www.patents.ibm.com/details?pn=US05748954__&s_clms=1#clms),
which covers roughly any kind of web spider that heuristically downloads
"better" documents before "worse" documents, and explicitly includes a
reference to looking at how often a document is linked as a goodness heuristic.
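To make the "better before worse" idea concrete, here is a minimal sketch of a crawl queue that always hands out the URL with the most known incoming links. It is not the patented method and is not taken from any real spider. The class name `Frontier` and its methods are hypothetical, and the link count is the only goodness heuristic assumed.

```python
import heapq
from collections import defaultdict


class Frontier:
    """Crawl queue that yields the most-linked-to URL first (a sketch)."""

    def __init__(self):
        self.inlinks = defaultdict(int)  # url -> links seen so far
        self.heap = []                   # entries: (-inlinks, order, url)
        self.done = set()                # URLs already handed out
        self.order = 0                   # tie-breaker: earlier links first

    def add_link(self, url):
        """Record one more link pointing at `url` and (re)queue it."""
        if url in self.done:
            return
        self.inlinks[url] += 1
        self.order += 1
        heapq.heappush(self.heap, (-self.inlinks[url], self.order, url))

    def pop(self):
        """Return the best not-yet-fetched URL, or None if none remain."""
        while self.heap:
            neg_count, _, url = heapq.heappop(self.heap)
            # Skip stale entries pushed when the count was still lower.
            if url in self.done or -neg_count != self.inlinks[url]:
                continue
            self.done.add(url)
            return url
        return None


frontier = Frontier()
for link in ["http://x.example/", "http://y.example/",
             "http://y.example/", "http://z.example/"]:
    frontier.add_link(link)
crawl_order = [frontier.pop() for _ in range(4)]
```

Here `http://y.example/` is linked twice and is therefore fetched first. The remaining URLs follow in the order they were discovered, and the final `pop()` returns `None` because the queue is empty.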
TUSTEP (TUebingen System of Text Processing Programs)
Multilingual Textdata Processing and Fuzzy Searching
http://www.uni-tuebingen.de/zdv/tustep/tdv_eng.html
Web Site Search Tools
http://www.searchtools.com/
Web Site Search Tools - Related Topics
Search Tools Product Listings
http://www.searchtools.com/tools/tools.html
Free Indexing and Searching Software
Harvest-NG
Harvest, an open-source project, has been re-implemented in Perl and
can summarize documents in SOIF (Summary Object Interchange Format). This
version saves the data in a database file and does not include a Broker
or search engine, but it is entirely extensible.
http://www.tardis.ed.ac.uk/harvest/ng/
http://www.tardis.ed.ac.uk/harvest/ng/develop.shtml
The Combine System for distributed indexing
http://www.lub.lu.se/combine/
http://www.ub.lu.se/~tsao/combine/
Zebra Information Server
Powerful free-text indexing and retrieval system, combined with a Z39.50
server. The Zebra server is freely available for noncommercial applications.
http://www.indexdata.dk/zebra/
Framework for Advanced Search (ASF)
http://asf.gils.net/framework.html
ASF Freeware
http://asf.gils.net/freeware/index.html
OCLC Z39.50 freely reusable code (C and Java)
http://www.oclc.org/z39.50/#api
Perlfect Search 3.01
http://perlfect.com/freescripts/search/
PLWeb Turbo has released a new version, 3.0 for Windows NT with improved
performance, customization, web-crawling capability, and a browser-based
interface.
PLWeb and all PLS products are now freeware from AOL.
http://www.pls.com/plweb.htm
http://www.searchtools.com/tools/plweb.html
AltaVista (Windows NT and Unix search tool) has just introduced a free
version of AltaVista Search Intranet, Entry Level, which will index up
to 3,000 pages.
http://k2.altavista-software.com/intranet/3000_version/3000_overview.htm
Ultraseek on Linux
The Ultraseek search engine and the Content Classification Engine now
run on Linux (Red Hat Linux 5.1 on a PC, kernel 2.0.34 or better, or glibc
2.0.7-19 or better). Commercial.
http://software.infoseek.com/products/ultraseek/ultratop.htm
Download free trial version
http://software.infoseek.com/download/download.htm
http://www.searchtools.com/tools/ultraseek.html
Ultraseek Content Classification Engine Product Information
Commercial.
http://software.infoseek.com/products/cce/ccetop.htm
http://software.infoseek.com/products/cce/ccekey.htm
Super Site Searcher - a Perl CGI that works with other modules to create a
searchable site directory. Commercial.
http://www.hassan.com/site_searcher/
http://www.searchtools.com/tools/supersitesearcher.html
Extense - a powerful search engine developed in France which uses the
declension of French words (masculine/feminine and singular/plural).
Commercial.
http://www.searchtools.com/tools/extense.html
Inxight LinguistX code library - provides language identification, stemming
and tokenization, among other features.
http://www.searchtools.com/tools/inxight.html
http://www.inxight.com/
A collection of components for many languages that provide word and
phrase analysis, stemming, tokenization, part-of-speech analysis, noun
phrase extraction, language identification, summarization, etc.
Platform: Windows 95 and NT, Solaris Sparc (will port to other Unix
systems). Commercial.
Verity products
http://www.verity.com/products/index.html
Knowledge Retrieval products
http://www.verity.com/products/knowret1.html
Search Engines links
http://searchenginewatch.com/links/
Contains such sections:
Search Tips and Tricks Advanced Searching
http://websearch.tqn.com/msub21.htm?pid=2825&cob=home
http://websearch.miningco.com/msub21.htm?pid=2825&cob=home
Information Retrieval systems
http://www.mri.mq.edu.au/%7Eeinat/web_ir/software.html
Top search words and terms
http://www.searchenginewatch.com/facts/searches.html
Ask Jeeves Peek Through The Keyhole http://www.askjeeves.com/docs/peek/
Weekly Search Engine Keyword Statistics For Web and Internet Marketing
http://www.mall-net.com/se_report/
Dogpile Top 200 Search Words
http://www.eyescream.com/dogpiletop200.htm
Top words from the meta-search engine Dogpile from January to July
1997. Unfortunately, the actual keyword phrases are not shown.
Search Spy
http://www.searchspy.com/
This is a database of search terms available for desktop use. You enter
a term, and the program scans to find matches. You can sort results by
count or by keyword. Data is gathered from various live search displays.
Life on the Internet, Finding Things
http://www.screen.com/start/guide/searchengines.html
useit.com: Jakob Nielsen's Website
http://www.useit.com/
He formulated a new approach to search engines (SE) - LSD: Logo, Search, Directory.
IBM's CLEVER Searching
http://www.almaden.ibm.com/cs/k53/clever.html
Web Archeology Project at Digital Research
http://www.research.digital.com/SRC/personal/Krishna_Bharat/WebArcheology/
Contains sections:
The MetaWeb Project
The aim of the Metadata Tools and Services project - known as MetaWeb
- is to develop indexing services, tools, and metadata element sets in
order to promote the use and exploitation of metadata on the Internet.
http://www.dstc.edu.au/Research/Projects/metaweb/
DFN Indexing and Searching projects - http://www.dfn.de/links/suchen.html
MetaGer (subject meta search), MESA (email address meta search), Level3
(search service for the DFN-Expo project), Search.de and Entry.de
X.500 Directory E-mail Addresses Search (AMBIX-D) - http://ambix.uni-tuebingen.de:8889
Research Papers related to Google!
http://google.stanford.edu/google_papers.html
Research Papers related to IBM CLEVER Searching Project
http://www.almaden.ibm.com/cs/k53/clever.html
TREC Publications
TREC (the Text REtrieval Conference), sponsored by NIST, provides a set
of realistic test collections, uniform scoring, unbiased evaluators and
a chance to see the changes and improvements of search engines over time.
Results are published in the annual conference materials at http://trec.nist.gov/pubs.html
Retrieval Performance in FERRET: A Conceptual Information Retrieval
System
Michael L. Mauldin
Appeared at The 14th International Conference on Research and Development
in Information Retrieval, Chicago, October 1991, ACM SIGIR.
http://www.fuzine.com/mlm/sigir91.html
Enhancing the World Wide Web
Social Software for the Evolution of Knowledge
http://www.islandone.org/Foresight/WebEnhance/index.html
Learning Webs by J. Bollen, & F. Heylighen,
http://pespmc1.vub.ac.be/LEARNWEB.html
Hebbian learning can be implemented on the web by changing the strength
of links depending on how often they are used. The paper explores the "brain"
metaphor for making the web more intelligent. The basic idea is that web
links are similar to associations in the brain, as supported by synapses
connecting neurons. The strength of the links, like the connection strength
of synapses, can change depending on the frequency of use of the link.
This allows the network to "learn" automatically from the way it is used.
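The usage-based strengthening described above can be sketched in a few lines. This is not the authors' algorithm. It is a minimal illustration that assumes each followed link simply gains a fixed reward and that links are then ranked by accumulated strength. The class name `LearningWeb` and the `rate` parameter are hypothetical.

```python
from collections import defaultdict


class LearningWeb:
    """Link strengths that grow with use, in the spirit of a Hebbian rule."""

    def __init__(self, rate=1.0):
        self.rate = rate                    # hypothetical reward per use
        self.strength = defaultdict(float)  # (source, target) -> strength

    def follow(self, source, target):
        """A user followed the link source -> target: reinforce it."""
        self.strength[(source, target)] += self.rate

    def ranked_links(self, source):
        """Targets linked from `source`, strongest first."""
        links = [(target, value)
                 for (src, target), value in self.strength.items()
                 if src == source]
        return [target for target, _ in sorted(links, key=lambda p: -p[1])]


web = LearningWeb()
for target in ["b", "c", "b"]:
    web.follow("a", target)
ranking = web.ranked_links("a")
```

After three visits from page "a", the link to "b" has been used twice and is ranked ahead of the link to "c". A fuller model would probably also let unused links decay, which this sketch leaves out.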
Identification, location and versioning of web-resources. URI Discussion
paper. Version 1.0. 12 March 1999
Titia van der Werf-Davelaar
http://www.konbib.nl/donor/rapporten/URI.html
This document is a discussion document for use in developing a consensus
on practical approaches to be pursued for better information management
techniques and methods on the Web.
This work is done in the context of the following projects: DONOR,
DESIRE, NEDLIB.
Report on the WWW8 conference by Nicky Ferguson
http://www.ilrt.bris.ac.uk/~ecnf/www8.html
Semantic Web vision paper
Alexander Chislenko. - Version 0.28 - 29 June, 1997
http://www.lucifer.com/~sasha/articles/SemanticWeb.html
Lycos GENERAL TERMS AND CONDITIONS -
http://www.lycos.com/lycosinc/legal.html