This page is updated regularly; please send your suggestions to demchenko@terena.nl.
W3C Web Accessibility Initiative (WAI)
Web Content Accessibility Guidelines
http://www.w3.org/TR/WAI-WEBCONTENT
Web Architecture: Describing and Exchanging Data
W3C Note 7 June 1999
http://www.w3.org/1999/04/WebData
Building a space where automated agents can contribute is just the beginning
of building the Semantic Web. The RDF Schema design and XML Schema design
began independently; this note proposes a common model in which they fit
together as interlocking pieces of Semantic Web technology.
Composite Capability/Preference Profiles (CC/PP): A user side framework
for content negotiation
W3C Note 27 July 1999
http://www.w3.org/TR/NOTE-CCPP/
In this note we describe a method for using RDF, the Resource Description
Framework of the W3C, to create a general yet extensible framework for describing
user preferences and device capabilities. This information can be provided
by the user to servers and content providers. The servers can use this
information describing the user's preferences to customize the service
or content provided. The ability of RDF to reference profile information
via URLs helps minimize the number of network transactions required
to adapt content to a device, while the framework fits well into the current
and future protocols being developed at the W3C and the WAP Forum.
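The negotiation idea can be sketched in a few lines of Python (the profile attributes and variant names below are invented for illustration; real CC/PP profiles are expressed in RDF and referenced by URL):

```python
# Simplified sketch of the CC/PP idea: the client describes its
# capabilities, and the server uses that description to pick a
# suitable content variant. Attribute names here are hypothetical.

def choose_variant(profile, variants):
    """Return the first variant whose requirements fit the profile."""
    for variant in variants:
        if (variant["width"] <= profile["screen_width"]
                and variant["format"] in profile["accept_formats"]):
            return variant["name"]
    return "text-only"

phone_profile = {"screen_width": 150, "accept_formats": {"wml", "text"}}
variants = [
    {"name": "full-html", "width": 640, "format": "html"},
    {"name": "wap-deck", "width": 120, "format": "wml"},
]

print(choose_variant(phone_profile, variants))  # wap-deck
```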
International Layout
W3C Working Draft 26-July-1999
http://www.w3.org/TR/WD-i18n-format/
This specification extends CSS to support East Asian and bidirectional
text formatting.
Platform for Privacy Preferences (P3P) Specification
W3C Working Draft 7 April 1999
http://www.w3.org/TR/WD-P3P/
This document describes the Platform for Privacy Preferences (P3P).
P3P enables Web sites to express their privacy practices and enables users
to exercise preferences over those practices.
POIX: Point Of Interest eXchange Language Specification
W3C Note - 24 June 1999
http://www.w3.org/TR/poix/
POIX, proposed here, defines a general-purpose specification language
for describing location information as an application of XML (Extensible
Markup Language). POIX is a common baseline for exchanging location data
via e-mail and embedding location data in HTML and XML documents. This
specification can be used by mobile device developers, location-related
service providers, and server software developers.
Annotation of Web Content for Transcoding
W3C Note 10 July 1999
http://www.w3.org/TR/annot/
This proposal presents annotations that can be attached to HTML/XML
documents to guide their adaptation to the characteristics of diverse information
appliances. It also provides a vocabulary for transcoding, and syntax of
the language for annotating Web content. Used in conjunction with device
capability information, style sheets, and other mechanisms, these annotations
enable a high quality user experience for users who are accessing Web content
from information appliances.
XML Schema Part 1: Structures
W3C Working Draft 6-May-1999
http://www.w3.org/TR/xmlschema-1/
XML Schema: Structures is part one of a two part draft of the specification
for the XML Schema definition language. This document proposes facilities
for describing the structure and constraining the contents of XML 1.0 documents.
The schema language, which is itself represented in XML 1.0, provides a
superset of the capabilities found in XML 1.0 document type definitions
(DTDs).
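Because the schema language is itself represented in XML 1.0, a schema document can be handled by any generic XML parser. A minimal illustrative fragment in Python (the namespace URI and element declarations only approximate the draft syntax):

```python
# A schema document is itself XML, so ElementTree can read it like
# any other document. The namespace URI is illustrative.
import xml.etree.ElementTree as ET

schema_doc = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="title" type="xsd:string"/>
  <xsd:element name="pages" type="xsd:integer"/>
</xsd:schema>"""

root = ET.fromstring(schema_doc)
# ElementTree reports tags in Clark notation: {namespace}localname.
names = [el.get("name") for el in root]
print(root.tag)   # {http://www.w3.org/2001/XMLSchema}schema
print(names)      # ['title', 'pages']
```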
XML Schema Part 2: Datatypes
World Wide Web Consortium Working Draft 06-May-1999
http://www.w3.org/TR/xmlschema-2/
This document specifies a language for defining datatypes to be used
in XML Schemas and, possibly, elsewhere.
XHTML™ 1.0: The Extensible HyperText Markup Language
A Reformulation of HTML 4.0 in XML 1.0
W3C Working Draft 5th May 1999
http://www.w3.org/TR/xhtml1/
This specification defines XHTML 1.0, a reformulation of HTML 4.0 as
an XML 1.0 application, and three DTDs corresponding to the ones defined by
HTML 4.0. The semantics of the elements and their attributes are defined
in the W3C Recommendation for HTML 4.0. These semantics provide the foundation
for future extensibility of XHTML. Compatibility with existing HTML user
agents is possible by following a small set of guidelines.
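One practical consequence of the reformulation, sketched in Python: an XHTML document is well-formed XML, so a generic XML parser can process it (the page below is a made-up minimal example):

```python
# An XHTML page parses with a plain XML parser, unlike tag-soup HTML.
import xml.etree.ElementTree as ET

xhtml = ('<html xmlns="http://www.w3.org/1999/xhtml">'
         '<head><title>Test</title></head>'
         '<body><p>Hello</p><br /></body></html>')

root = ET.fromstring(xhtml)  # succeeds: the document is well-formed XML
print(root.tag)  # {http://www.w3.org/1999/xhtml}html
```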
Document Object Model (DOM) Level 2 Specification
Version 1.0
W3C Working Draft 19 July, 1999
This specification defines the Document Object Model Level 2, a platform-
and language-neutral interface that allows programs and scripts to dynamically
access and update the content, structure and style of documents. The Document
Object Model Level 2 builds on the Document Object Model Level 1
(http://www.w3.org/TR/REC-DOM-Level-1).
This release of the Document Object Model Level 2 has all of the interfaces
that the final version is expected to have. It contains interfaces for
creating a document, importing a node from one document to another, supporting
XML namespaces, associating stylesheets with a document, the Cascading
Style Sheets object model, the Range object model, filters and iterators,
and the Events object model. The DOM WG wants to get feedback on these,
and especially on the two options presented for XML namespaces, so that
final decisions can be made for the DOM Level 2 specification.
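Python's xml.dom.minidom implements much of the DOM interface and can illustrate two of the Level 2 additions mentioned above, creating a document and importing a node from one document into another (the document contents are invented):

```python
# importNode copies a node from one DOM document into another,
# one of the interfaces listed in the DOM Level 2 draft.
from xml.dom.minidom import parseString

src = parseString("<catalog><item>RDF Primer</item></catalog>")
dst = parseString("<reading-list/>")

item = src.documentElement.firstChild   # the <item> element
copy = dst.importNode(item, True)       # deep import into dst
dst.documentElement.appendChild(copy)

print(dst.documentElement.toxml())
# <reading-list><item>RDF Primer</item></reading-list>
```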
IBM online XML education courses
http://www2.software.ibm.com/developer/education.nsf/xml-onlinecourse-bytitle
IETF Work: Common Indexing Protocol
RFC 2651: The Architecture of the Common Indexing Protocol (CIP)
J. Allen, M. Mealling
ftp://ftp.isi.edu/in-notes/rfc2651.txt
This document describes the CIP framework, including its architecture
and the protocol specifics of exchanging indices.
RFC 2652: MIME Object Definitions for the Common Indexing Protocol (CIP)
J. Allen, M. Mealling
ftp://ftp.isi.edu/in-notes/rfc2652.txt
This document describes the definitions of those objects as well as
the methods and requirements needed to define a new index type.
RFC 2653: CIP Transport Protocols
J. Allen, P. Leach, R. Hedberg
ftp://ftp.isi.edu/in-notes/rfc2653.txt
This document specifies three protocols for transporting CIP requests,
responses and index objects, utilizing TCP, mail, and HTTP.
RFC 2654: A Tagged Index Object for use in the Common Indexing Protocol
R. Hedberg, B. Greenblatt, R. Moats, M. Wahl
ftp://ftp.isi.edu/in-notes/rfc2654.txt
This document defines a mechanism by which information servers can
exchange indices of information from their databases by making use of the
Common Indexing Protocol (CIP). This document defines the structure of
the index information being exchanged, as well as the appropriate meanings
for the headers that are defined in the Common Indexing Protocol.
RFC 2655: CIP Index Object Format for SOIF Objects
T. Hardie, M. Bowman, D. Hardy, M. Schwartz, D. Wessels
ftp://ftp.isi.edu/in-notes/rfc2655.txt
This document describes SOIF, the Summary Object Interchange Format,
as an index object type in the context of the CIP framework.
RFC 2656: Registration Procedures for SOIF Template Types
T. Hardie
ftp://ftp.isi.edu/in-notes/rfc2656.txt
The registration procedure described in this document is specific to
SOIF template types.
RFC 2657: LDAPv2 Client vs. the Index Mesh
R. Hedberg
ftp://ftp.isi.edu/in-notes/rfc2657.txt
LDAPv2 clients as implemented according to RFC 1777 have no notion
of referrals. The integration between such a client and an Index Mesh, as
defined by the Common Indexing Protocol, depends heavily on referrals and
therefore needs to be handled in a special way. This document defines one
possible way of doing this.
Uniform Object Locator - UOL
J. Boynton
http://www.ietf.org/internet-drafts/draft-boynton-uol-00.txt
A Uniform Object Locator (UOL) provides a hierarchical "human-readable"
format for describing the location of any single attribute within any data
object. A UOL emulates the internal structure of a data object by dividing
a partial URL into two re-usable components: an object constructor and
an object name.
The UOL format is particularly suited to retrieval and storage of
parameter values through multiple object layers. Its basic construction
allows it to be combined with a URL without modification. Possible uses
include distributed object management, XML, and e-business development.
Context and Goals for Common Name Resolution
Larry Masinter, Michael Mealling, Nicolas Popp, Karen Sollins
http://www.ietf.org/internet-drafts/draft-popp-cnrp-goals-00.txt
This document establishes the context and goals for a Common Name Resolution
Protocol.
Internationalized Uniform Resource Identifiers (IURI),
Larry Masinter, Martin Duerst
http://www.ietf.org/internet-drafts/draft-masinter-url-i18n-04.txt
Tags for the Identification of Languages
H. Alvestrand
http://www.ietf.org/internet-drafts/draft-alvestrand-lang-tags-v2-00.txt
This document describes a language tag for use in cases where it is
desired to indicate the language used in an information object. It also
defines a Content-Language: header for use when one wishes to indicate
the language of a document.
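The tag syntax (a primary tag plus optional subtags separated by hyphens, e.g. "en", "en-US") is simple enough to sketch; the parser below is a toy illustration, not the full grammar of the draft:

```python
# Toy parser for a Content-Language header value: split on commas,
# then split each tag into its hyphen-separated subtags.

def parse_content_language(header_value):
    """'en-US, fr' -> [('en', 'us'), ('fr',)] (tags are case-insensitive)."""
    tags = [t.strip().lower() for t in header_value.split(",") if t.strip()]
    return [tuple(tag.split("-")) for tag in tags]

print(parse_content_language("en-US, fr"))
# [('en', 'us'), ('fr',)]
```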
RFC 2611: URN Namespace Definition Mechanisms
L. Daigle, D. van Gulik, R. Iannella, P. Faltstrom
ftp://ftp.isi.edu/in-notes/rfc2611.txt
i18n and Multilingual support in Internet mail. Standards Overview.
Yuri Demchenko
http://www.terena.nl/libr/tech/mldoc-review.html
Search Engine Standards Project
http://www.searchenginewatch.com/standards/
Domain Restriction Proposal
http://www.searchenginewatch.com/standards/proposals.html
Standard for Robot Exclusion
http://info.webcrawler.com/mak/projects/robots/norobots.html
Robots META Tag
http://www.searchtools.com/info/robots/robots-meta.html
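A sketch of how an indexer might honour the robots META tag, using Python's standard HTML parser (the sample page is invented):

```python
# Collect the comma-separated directives from <meta name="robots" ...>.
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # HTMLParser lowercases tag and attribute names
        if tag == "meta" and a.get("name", "").lower() == "robots":
            for d in a.get("content", "").split(","):
                self.directives.add(d.strip().lower())

page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
parser = RobotsMetaParser()
parser.feed(page)
print(sorted(parser.directives))  # ['nofollow', 'noindex']
```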
RFC-2413 Dublin Core Metadata for Resource Discovery
http://www.ietf.org/rfc/rfc2413.txt
Encoding Dublin Core Metadata in HTML
Internet Draft
http://www.ietf.org/internet-drafts/draft-kunze-dchtml-01.txt
Guidance on expressing the Dublin Core within the Resource Description
Framework (RDF)
http://www.ukoln.ac.uk/metadata/resources/dc/datamodel/WD-dc-rdf/
Resource Description Framework - RDF
http://www.ukoln.ac.uk/metadata/resources/rdf/
W3C Resource Description Framework (RDF) Model and Syntax - recommendation
http://www.w3.org/TR/REC-rdf-syntax/
W3C Resource Description Framework (RDF) Schemas - proposed recommendation
http://www.w3.org/TR/PR-rdf-schema/
Resource Description Framework (RDF)
http://www.w3.org/RDF/
Metadata and Resource Description
http://www.w3.org/Metadata/
Dublin Core
http://purl.org/metadata/dublin_core/
Dublin Core Metadata Element Set: Reference Description
http://purl.oclc.org/DC/about/element_set.htm
User Guide Working Draft 1998-07-31
http://purl.oclc.org/DC/documents/working_drafts/wd-guide-current.htm
1999-07-02: Dublin Core Elements, Version 1.1 moves to Proposed Recommendation
The Dublin Core Directorate is pleased to announce that a set of revised
element definitions (Dublin Core Elements, Version 1.1) has been completed
and is available for public review and comment as a Proposed Recommendation
of the Dublin Core Metadata Initiative.
http://purl.org/dc/documents/proposed_recommendations/pr-dces-19990702.htm
CEN/ISSS Workshop on MMI (Metadata for Multimedia Information)
http://www.cenorm.be/isss/Workshop/MMI/Default.htm
CEN/ISSS Metadata Framework, edited by Stewart Granger
http://dialspace.dial.pipex.com/town/way/gkh12/frame/main.html
CEN/ISSS' The European XML/EDI Pilot Project
http://www.cenorm.be/isss/workshop/ec/xmledi/isss-xml.html
The Role of the XML/EDI Guidelines
http://www.cenorm.be/isss/workshop/ec/xmledi/xmlbook.htm
Guidelines for using XML for Electronic Data Interchange, Version 0.05,
25th January 1998
http://www.xmledi.net/guide.htm
The Global Repository Initiative
http://www.xmledi.com/repository/
White Paper on XML Repositories for XML/EDI
http://www.xmledi.com/repository/xml-repWP.htm
Dublin Core/MARC/GILS Crosswalk
Network Development and MARC Standards Office
http://www.loc.gov/marc/dccross.html
Character Set and Language Negotiation (2) in Z39.50
http://lcweb.loc.gov/z3950/agency/defns/charsets.html
Registry of Z39.50 Object Identifiers
http://lcweb.loc.gov/z3950/agency/defns/oids.html
Metadata.Net - Metadata Tools and Services
http://metadata.net/
Meta Data Coalition
http://www.mdcinfo.com/
An Introduction to the Meta Data Coalition's Initiatives
http://www.MDCinfo.com/papers/intro.html
Open Information Model
MDC OIM Version 1.0 review draft, April 1999
http://www.mdcinfo.com/OIM/OIM10.html
OIM proposed models
Knowledge Description Model
http://www.mdcinfo.com/OIM/models/KDM.html
Meta Data Interchange Specification MDIS Version 1.1
http://www.mdcinfo.com/MDIS/MDIS11.html
Metadata/RDF Resources and Publications
Metadata Resources at UKOLN
http://www.ukoln.ac.uk/metadata/resources/
Prototype Metadata Registry for DESIRE project
http://homes.ukoln.ac.uk/~lisrmh/reginfo-v1.htm
RDF Tools - Briefing document
http://www.ukoln.ac.uk/web-focus/events/seminars/what-is-rdf-may1998/rdf-briefing.html
DC News, 1999-08-18
CIMI Announces the release of the Guide to Best Practice: Dublin Core.
The document is one important result of the Dublin Core Testbed, an on-going
effort to explore the usability, simplicity, and technical feasibility
of Dublin Core for museum information. The Guide addresses Dublin Core
1.0 as documented in RFC 2413.
http://www.cimi.org/documents/meta_bestprac_final_ann.html
New Metadata Handbook from European Schoolnet
1st December 1998
http://www.en.eun.org/eng/metadatabook-en.html
Describes a metadata element set that has been extended with a range
of additional local (sub)elements from other metadata initiatives, including
IMS (http://www.imsproject.org/ - Instructional Management System) and the
ARIADNE set (http://ariadne.unil.ch/ - Alliance of Remote Instructional
Authoring and Distribution Network for Europe).
The EUN metadata harmonisation is happening in close co-operation with
EUC (European Universal Classroom), which has been studying DBS/GER
(http://dbs.schule.de/indexe.html - Deutscher Bildungs-Server / German
Educational Resources), GEM (http://gem.syr.edu - The Gateway to Educational
Materials) and EdNA (http://www.edna.edu.au/ - Education Network Australia).
In the following you will find a guideline for creating and publishing
metadata, a presentation of the syntax, and a thorough description of each
of the EUN elements.
Dave Beckett's Resource Description Framework (RDF) Resources
http://www.cs.ukc.ac.uk/people/staff/djb1/research/metadata/rdf.shtml
Automatic RDF Metadata Generation for Resource Discovery
Charlotte Jenkins, Mike Jackson, Peter Burden, Jon Wallis
http://www.scit.wlv.ac.uk/~ex1253/rdf_paper/
Classifier/metadata generator demo
http://www.scit.wlv.ac.uk/~ex1253/metadata.html
Mapping Entry Vocabulary to Unfamiliar Metadata Vocabularies
Michael Buckland, with Aitao Chen, Hui-Min Chen, Youngin Kim, Byron
Lam, Ray Larson, Barbara Norgard, and Jacek Purat
http://www.dlib.org/dlib/january99/buckland/01buckland.html
Building an XML-based Metasearch Engine on the Server
http://xml.com/pub/1999/07/metasearch/metasearch2.html
GoXML Search Engine
http://www.goxml.com/
GoXML.com v1.0 (beta) is an XML context-based search processor. Online
documentation (http://www.goxml.com/about/supported.xsp) and a demonstration
(http://www.goxml.com/help_srch.xsp) are available. The GoXML project was
launched to create a new breed of search vehicle that can index, store and
allow accurate searching of XML data. The primary focus is to give XML
developers a tool for locating XML documents on the Internet.
Search Engines News
http://searchenginewatch.com/news.html
Current Search Engine Report
http://searchenginewatch.com/sereport/current.html
Search Engine Size
http://www.searchenginewatch.com/reports/sizes.html
News at Web Site Search Tools
http://www.searchtools.com/info/news.html
Results from our Site Search Tools Survey!
http://www.searchtools.com/surveys/survey-results-01.html
First results from our search tools survey are in, and they're interesting!
Most web administrators who haven't installed a site search say it's because
they don't have time or the applications are too complex. Those who have
cite improved navigation as their number one reason, by far. More surprising
results come from sites aimed towards information professionals (many don't
have search), and sites with three or more languages (they have search).
Websearch.miningco.com weekly
http://websearch.miningco.com/library/weekly/topicmenu.htm?pid=2825&cob=home
New largest search engine Alltheweb.com launched by FAST Search & Transfer
August 2, 1999. FAST (Fast Search & Transfer) has launched a new
site called Alltheweb ("FAST Search: All the Web, All the Time") http://www.alltheweb.com/.
The announced size of their index is more than 200 million pages, which
is estimated at 25% of the whole web.
FAST Search server has the following benefits:
For moderate data volumes (up to approx. 1 million pages) it supports
approximate pattern matching according to a patented metric.
Implementations: the biggest search engine, with approx. 200 million
indexed pages, and Lycos FTP search (http://ftpsearch.lycos.com/).
FAST has a special agreement with Dell; the Alltheweb search engine is powered by Dell PowerEdge servers.
How to test: try the phrase "to be or not to be" with and without
quotation marks in Alltheweb, AltaVista and Google. You will see a big
difference.
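The difference can be sketched as follows: an unquoted query typically matches any document containing all the words, while a quoted phrase requires them in sequence (a toy two-document illustration, not any engine's actual algorithm):

```python
# Two toy "indexed" documents: the unquoted query matches both,
# the exact phrase matches only one.

docs = {
    "hamlet": "to be or not to be that is the question",
    "grocery": "be sure not to forget to buy bread or milk",
}

def keyword_match(doc, query):
    """All query words present, in any order."""
    return set(query.split()) <= set(doc.split())

def phrase_match(doc, query):
    """Query words present as a contiguous phrase."""
    return query in doc

query = "to be or not to be"
print([d for d in docs if keyword_match(docs[d], query)])  # both documents
print([d for d in docs if phrase_match(docs[d], query)])   # only hamlet
```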
Another high-end technology provided by FAST is FAST Image Transfer, which offers better compression than the JPEG format at the same quality. It is specially oriented towards web applications, with embedded thumbnail functionality and progressive multi-resolution image display. Plug-ins are available for Adobe Photoshop, MS IE and Netscape Navigator. The file extension is .fst.
FAST Aims For Largest Index
http://searchenginewatch.com/sereport/99/05-fast.html
All The Web http://www.alltheweb.com/
FAST http://www.fast.no/
http://www-new.fast.no/company.html
http://www.fastweb.no/
FAST FTP Search - http://www.fastftp.lycos.com/
FAST Search Server
http://www.fast.no/product/fsserver.html
FAST SW Search
http://www.fast.no/product/fastsearch.html
NREN Search and Index Services
German Web Index
http://www.fireball.de/
Metagenerator - http://www.fireball.de/metagenerator.html
Metadata scheme - http://www.fireball.de/meta_daten.html
Fireball was developed by FLP/KIT - http://flp.cs.tu-berlin.de/
KIT - http://flp.cs.tu-berlin.de/kit/kit.html
Swiss search service
http://www.search.ch/
Allows metadata search - http://www.search.ch/help.html.en
Nordic Web index
http://nwi.ub2.lu.se/?lang=en
Special purposes Search Engines
US Government Search Engine Launched
A new search engine that focuses on information from US government
sources was opened in May. Called Gov.Search, the service is jointly produced
by search engine Northern Light and the U.S. Commerce Department's National
Technical Information Service through a five-year agreement.
The service is unusual for the web in that searching is not free. Those
wishing to use it must pay for access: US $15 for a day pass, $30 for a
monthly pass, or $250 for a year. Special pricing is also available to
companies and organizations that require multiple accounts.
Northern Light has now indexed about 4 million web pages located on
more than 20,000 US government servers, which also include military and
some educational sites. In addition to this information, it has also indexed
about 2 million specialty records from the NTIS.
http://searchenginewatch.com/sereport/99/06-govsearch.html
Gov.Search
http://www.usgovsearch.com
Google US Government Search
http://www.google.com/unclesam
Google has its own US government search service. Test queries show
it to be much smaller than Northern Light's index, yielding only 10 to
50 percent of Northern Light's counts. But the relevancy of some of the
matches was impressive. Definitely worth a visit.
Cora Search Engine
http://www.cora.justresearch.com/about.html
Cora is a special-purpose search engine covering computer science research
papers.
Northern Light Adds Research Options
Northern Light now also operates a "research" version of its service,
where the default is to search within its Special Collection index. This
index has information from over 5,400 publications, much of which is not
available on the web. Searching is free; documents can then be purchased
for between $1 and $4.
Titles can be downloaded from http://www.northernlight.com/docs/specoll_help_download.html
http://searchenginewatch.com/sereport/99/06-northernlight.html
Northern Light Research Version
http://www.nlresearch.com/
(http://www.northernlight.com/research.html
)
Northern Light Special Editions
http://special.northernlight.com/
Research Service at HotBot
http://r.hotbot.com/r/hb_also_rsrch/http://www.elibrary.com/s/hotbot/
"Invisible Web" Revealed
Lycos and IntelliSeek have teamed up to produce an index of search
databases to help users find information that is invisible to search engines.
The "Invisible Web Catalog" provides links to more than 7,000 specialty
search resources. Users can browse the listings or search the Lycos index.
http://searchenginewatch.com/sereport/99/07-invisible.html
Lycos Invisible Web Catalog
http://dir.lycos.com/Reference/Searchable_Databases/
IntelliSeek
http://www.intelliseek.com/
Direct Search
http://gwis2.circ.gwu.edu/~gprice/direct.htm
A catalog of specialty databases; search inside a particular database.
WebData
http://www.webdata.com/
Guide to searchable databases. Browse or search through listings.
Northern Light Adds Clustering
This prevents results from being dominated by a single site.
In addition to its page index, NL provides a list of Custom Search Folders™,
generated by clustering search results by server group or page type.
http://www.northernlight.com/docs/search_help_folders.html
Navigate web smarter and easier with Alexa
http://www.alexa.com/
Netscape's keywords service
http://home.netscape.com/escapes/keywords/
Report on the 1999 Search Engines Meeting
by Avi Rappoport, Search Tools Consulting
http://www.searchtools.com/info/meetings/searchenginesmtg/index.html
Portalization and Other Search Trends (by Danny Sullivan of SearchEngineWatch).
Main trends highlighted: search engines turning into portals; the
increasing share of common searches like "travel" or "microsoft";
clustering and directories; etc.
Quantifiable Results: Testing at TREC
Valuable testing was done at TREC (the Text REtrieval Conference),
sponsored by NIST. TREC provides a set of realistic test collections, uniform
scoring, unbiased evaluators and a chance to see the changes and improvements
of search engines over time.
The TREC test collection consists of about 2 GB of combined newspaper
articles and government reports.
Testing includes a few tracks: Adhoc, Cross-Language, Filtering, High
Precision, Interactive, Query, Spoken Document Retrieval (SDR).
Results are published in the materials of the annual conferences at http://trec.nist.gov/pubs.html
Summarization
Summarization attempts to reduce document text to its most relevant
content based on the task and user requirements.
Results indicated that many documents can be summarized successfully;
better results are obtained with variable-length summaries. The Information
Retrieval methods applied to this task work well for query-focused
summarization, because the topic focuses the summarization effort.
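A minimal sketch of the query-focused approach: score each sentence by its overlap with the query terms and keep the best ones (the scoring scheme and sample text are invented for illustration):

```python
# Extractive, query-focused summarization: rank sentences by how many
# query terms they contain and return the top ones.

def summarize(text, query, max_sentences=1):
    qterms = set(query.lower().split())
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    scored = sorted(
        sentences,
        key=lambda s: len(qterms & set(s.lower().split())),
        reverse=True,
    )
    return scored[:max_sentences]

text = ("TREC is sponsored by NIST. "
        "Summarization reduces a document to its most relevant content. "
        "The weather was pleasant")
print(summarize(text, "document summarization"))
# ['Summarization reduces a document to its most relevant content']
```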
Valuable information on this issue can be found at the Natural Language
Processing & Information Retrieval (NLPIR) group of NIST's ITL
(http://www.itl.nist.gov/iaui/894.02/).
In May 1998, the U.S. government completed the TIPSTER Text Summarization
Evaluation (SUMMAC), which was the first large-scale, developer-independent
evaluation of automatic text summarization systems. Results are available
to TREC subscribers; the final report can be downloaded from
http://www.itl.nist.gov/iaui/894.02/related_projects/tipster_summac/final_rpt.html
Results Clustering and Topic Categorization
Clustering the retrieved documents into useful groups is a fruitful
approach to improving results presentation.
Some search engines perform automatic clustering and categorization
on result sets, so the results are divided into groups by topic. The
NorthernLight search engine, for example, clusters its results into Custom
Folders that have partly predefined categories.
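The simplest form of such clustering, grouping hits by web site so that no single server dominates the list, can be sketched as follows (the hit URLs are arbitrary examples; NL's actual folder generation is more sophisticated):

```python
# Group a flat result list into per-site "folders" keyed by host name.
from urllib.parse import urlparse
from collections import OrderedDict

hits = [
    "http://www.w3.org/TR/xmlschema-1/",
    "http://www.w3.org/TR/xhtml1/",
    "http://searchenginewatch.com/sereport/current.html",
]

folders = OrderedDict()
for url in hits:
    folders.setdefault(urlparse(url).netloc, []).append(url)

for host, urls in folders.items():
    print(host, len(urls))
# www.w3.org 2
# searchenginewatch.com 1
```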
The academic case made by James Callen of the University of Massachusetts
showed that full-text search with modern relevance ranking is the best
approach for information retrieval.
The consensus of the panel, and of the meeting, was that automation can
help humans, and that automated categorization works best when humans can
provide a reality check on the systems.
Cross-Language Information Retrieval (CLIR)
CLIR means querying in one language for documents in many languages.
It is becoming more important due to the internationalisation of the web.
Approaches include machine-readable dictionaries, parallel and comparable
corpora, a generalized vector space model, latent semantic indexing,
similarity thesauri and interlinguas.
A presentation by TextWise (http://www.textwise.com/)
described their Conceptual Interlingua approach, which uses a concept space
where terms from multiple languages are mapped into a language-independent
schema. This technique is used for both indexing and querying, and does
not require pairwise translation.
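A toy version of the interlingua idea: terms from several languages map to language-independent concept identifiers, and both documents and queries are indexed as concept sets (the dictionary and concept IDs below are invented):

```python
# Terms from English, French and German map into shared concept IDs,
# so a query in one language can match a document in another without
# pairwise translation.

CONCEPTS = {
    "dog": "C-CANINE", "chien": "C-CANINE", "hund": "C-CANINE",
    "house": "C-DWELLING", "maison": "C-DWELLING",
}

def to_concepts(text):
    """Index text as the set of concept IDs its known terms map to."""
    return {CONCEPTS[w] for w in text.lower().split() if w in CONCEPTS}

doc_fr = "le chien dans la maison"
query_en = "dog house"
print(to_concepts(doc_fr) == to_concepts(query_en))  # True
```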
Improvements to Relevance Ranking of Results
Two presentations were given: by Byron Dom from IBM's CLEVER project
(http://www.almaden.ibm.com/cs/k53/clever.html) and by Gary Cullis, the
chairman of Direct Hit (http://www.directhit.com/).
Directories and Question-Answering
This section dealt with the current move of search engines to provide
directories and subject gateways alongside ordinary or advanced searches.
Presentations were given by LookSmart (http://www.looksmart.com/)
and AskJeeves (http://www.askjeeves.com/).
Knowledge Management
Both Daniel Hoogterp of Retrieval Technologies and Rick Kenny of PCDocs
/ Fulcrum described how search fits into corporate knowledge management.
Text Mining
Data mining means evaluating large amounts of stored data and looking
for useful patterns, such as the relation between products and the age of customers.
Text mining uses techniques from information retrieval and other fields
to analyze internal structure, parse the content, provide results, clustering,
summarization, and so on. With automatic event identification, conditional
responses, reuse of analysis, and graphic presentation of results, the
user can skim the best of the information easily.
Filtering and Routing and Intelligent Agents
Filtering and Routing allow individuals to set up criteria for incoming
data (news feeds, email, press releases, etc.), and only be notified or
sent those items that match their interests. Such tasks are performed by
Intelligent Agents that travel a network or the Internet to locate data
or track web site changes, evaluating the items using relevance judgments
like those of search engines.
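Filtering can be sketched as a standing query applied to each incoming item, with the item delivered only if its score passes a threshold (the interest profile and feed below are invented):

```python
# A standing interest profile filters an incoming stream of items:
# only items sharing enough terms with the profile are delivered.

def matches_interest(item, interest_terms, threshold=2):
    words = set(item.lower().split())
    return len(words & interest_terms) >= threshold

profile = {"xml", "metadata", "rdf"}
feed = [
    "New RDF metadata registry announced",
    "Football results for Saturday",
]

delivered = [item for item in feed if matches_interest(item, profile)]
print(delivered)  # ['New RDF metadata registry announced']
```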
Searching Multimedia
The main discussion was about spoken documents and video retrieval.
Search Realities faced by end users and professional searchers
Carol Tenopir gave a presentation on the history of user-centered research
on searching, and on current work in testing user experiences.
Visualization
There are some attempts to visualise search results based on document
similarity. It was suggested that the success of this approach depends
very strongly on the needs and experience of the searcher.
Natural Language Processing & Information Retrieval (NLPIR) group
of ITL NIST (http://www.itl.nist.gov/iaui/894.02/)
Valuable information. Publications http://www.itl.nist.gov/iaui/894.02/works.html
Information on DARPA TIPSTER Text Program http://www.itl.nist.gov/iaui/894.02/related_projects/tipster/
http://www.itl.nist.gov/iaui/894.02/related_projects/tipster_summac/final_rpt.html
IBM Patents Network -
http://www.patents.ibm.com/
Lycos holds patent 5,748,954
(http://www.patents.ibm.com/details?pn=US05748954__&s_clms=1#clms),
which covers roughly any kind of web spider that heuristically downloads
"better" documents before "worse" documents, and explicitly includes a
reference to looking at how often a document is linked as a goodness heuristic.
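The idea behind the claim can be sketched as a priority-queue crawler that fetches pages with the most known inbound links first (the URLs and link counts are invented; this is not the patented implementation):

```python
# Prioritized crawling: treat the inbound-link count as a "goodness"
# heuristic and fetch the best pages first.
import heapq

inlinks = {"http://a.example/": 12, "http://b.example/": 3, "http://c.example/": 7}

# heapq is a min-heap, so push negated counts to pop the best page first.
frontier = [(-count, url) for url, count in inlinks.items()]
heapq.heapify(frontier)

crawl_order = [heapq.heappop(frontier)[1] for _ in range(len(inlinks))]
print(crawl_order)
# ['http://a.example/', 'http://c.example/', 'http://b.example/']
```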
TUSTEP (TUebingen System of Text Processing Programs)
Multilingual Text Data Processing and Fuzzy Searching
http://www.uni-tuebingen.de/zdv/tustep/tdv_eng.html
Web Site Search Tools
http://www.searchtools.com/
Web Site Search Tools - Related Topics
Search Tools Product Listings
http://www.searchtools.com/tools/tools.html
Free Indexing and Searching Software
Harvest-NG
Harvest, an open-source project, has been re-implemented in Perl and
can summarize documents in SOIF (Summary Object Interchange Format). This
version saves the data in a database file and does not include a Broker
or search engine, but it is entirely extensible.
http://www.tardis.ed.ac.uk/harvest/ng/
http://www.tardis.ed.ac.uk/harvest/ng/develop.shtml
The Combine System for distributed indexing
http://www.lub.lu.se/combine/
http://www.ub.lu.se/~tsao/combine/
Zebra Information Server
A powerful free-text indexing and retrieval system, combined with a Z39.50
server. The Zebra server is freely available for noncommercial applications.
http://www.indexdata.dk/zebra/
Framework for Advanced Search (ASF)
http://asf.gils.net/framework.html
ASF Freeware
http://asf.gils.net/freeware/index.html
OCLC Z39.50 freely reusable code (C and Java)
http://www.oclc.org/z39.50/#api
Perlfect Search 3.01
http://perlfect.com/freescripts/search/
PLWeb Turbo 3.0, a new version for Windows NT, has been released with
improved performance, customization, web-crawling capability, and a
browser-based interface.
PLWeb and all PLS products are now freeware from AOL.
http://www.pls.com/plweb.htm
http://www.searchtools.com/tools/plweb.html
AltaVista (Windows NT and Unix search tool) has just introduced a free
version of AltaVista Search Intranet, Entry Level, which will index up
to 3,000 pages.
http://k2.altavista-software.com/intranet/3000_version/3000_overview.htm
Ultraseek on Linux
The Ultraseek search engine and the Content Classification Engine now
run on Red Hat Linux 5.1 on a PC, kernel 2.0.34 or better, or glibc
2.0.7-19 or better. Commercial.
http://software.infoseek.com/products/ultraseek/ultratop.htm
Download free trial version
http://software.infoseek.com/download/download.htm
http://www.searchtools.com/tools/ultraseek.html
Ultraseek Content Classification Engine Product Information
Commercial.
http://software.infoseek.com/products/cce/ccetop.htm
http://software.infoseek.com/products/cce/ccekey.htm
Super Site Searcher, a Perl CGI, works with other modules to create a
searchable site directory. Commercial.
http://www.hassan.com/site_searcher/
http://www.searchtools.com/tools/supersitesearcher.html
Extense - a powerful search engine developed in France which uses the
inflection of French words (masculine/feminine and singular/plural).
Commercial.
http://www.searchtools.com/tools/extense.html
Inxight LinguistX code library - provides language identification, stemming
and tokenization, among other features.
http://www.searchtools.com/tools/inxight.html
http://www.inxight.com/
A collection of components for many languages that provide word and
phrase analysis, stemming, tokenization, part-of-speech analysis, noun
phrase extraction, language identification, summarization, etc.
Platform: Windows 95 and NT, Solaris Sparc (will port to other Unix
systems). Commercial.
Verity products
http://www.verity.com/products/index.html
Knowledge Retrieval products
http://www.verity.com/products/knowret1.html
Search Engines links
http://searchenginewatch.com/links/
Contains sections such as:
Search Tips and Tricks Advanced Searching
http://websearch.tqn.com/msub21.htm?pid=2825&cob=home
http://websearch.miningco.com/msub21.htm?pid=2825&cob=home
Information Retrieval systems
http://www.mri.mq.edu.au/%7Eeinat/web_ir/software.html
Top search words and terms
http://www.searchenginewatch.com/facts/searches.html
Ask Jeeves Peek Through The Keyhole http://www.askjeeves.com/docs/peek/
Weekly Search Engine Keyword Statistics For Web and Internet Marketing
http://www.mall-net.com/se_report/
Dogpile Top 200 Search Words
http://www.eyescream.com/dogpiletop200.htm
Top words from the meta-search engine Dogpile from January to July
1997. Unfortunately, the actual keyword phrases are not shown.
Search Spy
http://www.searchspy.com/
This is a database of search terms available for desktop use. You enter
a term, and the program scans to find matches. You can sort results by
count or by keyword. Data is gathered from various live search displays.
Life on the Internet, Finding Things
http://www.screen.com/start/guide/searchengines.html
useit.com: Jakob Nielsen's Website
http://www.useit.com/
He formulated a new approach to search engine home page design - LSD: Logo, Search, Directory.
IBM's CLEVER Searching
http://www.almaden.ibm.com/cs/k53/clever.html
Web Archeology Project at Digital Research
http://www.research.digital.com/SRC/personal/Krishna_Bharat/WebArcheology/
Contains sections:
The MetaWeb Project
The aim of the Metadata Tools and Services project - known as MetaWeb
- is to develop indexing services, tools, and metadata element sets in
order to promote the use and exploitation of metadata on the Internet.
http://www.dstc.edu.au/Research/Projects/metaweb/
DFN Indexing and Searching projects - http://www.dfn.de/links/suchen.html
MetaGer (subject meta search), MESA (email address meta search), Level3
(search service for the DFN-Expo project), Search.de and Entry.de.
X.500 Directory E-mail Addresses Search (AMBIX-D) - http://ambix.uni-tuebingen.de:8889
Research Papers related to Google!
http://google.stanford.edu/google_papers.html
Research Papers related to IBM CLEVER
Searching Project
http://www.almaden.ibm.com/cs/k53/clever.html
TREC Publications
TREC (the Text REtrieval Conference), sponsored by NIST, provides a set
of realistic test collections, uniform scoring, unbiased evaluators and
a chance to see the changes and improvements of search engines over time.
Results appear in the proceedings of the annual conferences at http://trec.nist.gov/pubs.html
Retrieval Performance in FERRET: A Conceptual Information Retrieval
System
Michael L. Mauldin
Appeared at The 14th International Conference on Research and Development
in Information Retrieval, Chicago, October 1991, ACM SIGIR.
http://www.fuzine.com/mlm/sigir91.html
Enhancing the World Wide Web
Social Software for the Evolution of Knowledge
http://www.islandone.org/Foresight/WebEnhance/index.html
Learning Webs, by J. Bollen & F. Heylighen
http://pespmc1.vub.ac.be/LEARNWEB.html
Hebbian learning can be implemented on the web by changing the strength
of links depending on how often they are used. The paper explores the "brain"
metaphor for making the web more intelligent. The basic idea is that web
links are similar to associations in the brain, as supported by synapses
connecting neurons. The strength of the links, like the connection strength
of synapses, can change depending on the frequency of use of the link.
This allows the network to "learn" automatically from the way it is used.
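The update rule described above can be sketched in a few lines (a hypothetical illustration, not the authors' code; the learning rate is an assumed value). Each page holds weighted outgoing links; every traversal strengthens the used link, and renormalization makes rarely used links fade in relative strength:

```python
# Hypothetical sketch of Hebbian link learning on the web: each use of
# a link strengthens it relative to its siblings on the same page.
# The learning rate ETA is an assumed value.

ETA = 0.1

def reinforce(links, clicked):
    """Hebbian update: bump the used link, then renormalize to sum 1."""
    links[clicked] += ETA
    total = sum(links.values())
    for target in links:
        links[target] /= total  # unused links decay in relative strength

links = {"pageA": 0.5, "pageB": 0.5}
for _ in range(5):
    reinforce(links, "pageA")
print(max(links, key=links.get))  # prints "pageA"
```

After a few traversals the frequently used link dominates, which is exactly the "synapse strengthening" behaviour the paper proposes for web links.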
Identification, location and versioning of web-resources. URI Discussion
paper. Version 1.0. 12 March 1999
Titia van der Werf-Davelaar
http://www.konbib.nl/donor/rapporten/URI.html
This document is a discussion document for use in developing a consensus
on practical approaches to be pursued for better information management
techniques and methods on the Web.
This work is done in the context of the following projects: DONOR,
DESIRE, NEDLIB.
Report on the WWW8 conference by Nicky Ferguson
http://www.ilrt.bris.ac.uk/~ecnf/www8.html
Semantic Web vision paper
Alexander Chislenko. - Version 0.28 - 29 June, 1997
http://www.lucifer.com/~sasha/articles/SemanticWeb.html
Lycos GENERAL TERMS AND CONDITIONS -
http://www.lycos.com/lycosinc/legal.html
DESIRE 2 - Development of a European Service for Information on Research
and Education
http://www.desire.org/
ROADS Project
ROADS is a set of software tools to enable the set up and maintenance
of Web based subject gateways. Subject gateways are services which provide
searchable and browsable catalogues of Internet based resources. Subject
gateways will typically focus on a related set of academic subject areas.
http://www.ilrt.bris.ac.uk/roads/
ROADS Software Downloads (Perl code for WHOIS++, Centroids/CIP etc.)
http://www.roads.lut.ac.uk/
The ROADS project exit strategy - Ensuring the future of ROADS for
its users
http://www.ilrt.bris.ac.uk/roads/news/latest/futures/
IMesh at Desire.org
International Collaboration on Internet Subject Gateways
http://www.desire.org/html/subjectgateways/community/imesh/
Project Isaac - A Distributed Architecture for Resource Discovery Using
Metadata
http://scout.cs.wisc.edu/research/index.html
Joint Information System Committee
Established to stimulate and enable the cost-effective exploitation
of information systems and to provide a high-quality national network infrastructure
for the UK higher education and research councils communities.
http://www.jisc.ac.uk/
Publications related to JISC
http://www.jisc.ac.uk/pub/index.html
OCLC - Co-operative Online Resource Catalog (CORC)
http://www.oclc.org/oclc/research/projects/corc/index.htm
CoBRA+ - Computerised Bibliographic Record Actions
http://www.bl.uk/information/cobra.html
CoBRA+ working group on multilingual subject access
http://www.bl.uk/information/finrap3.html
EEVL (Engineering Gateway) Evaluation Reports
http://www.eevl.ac.uk/evaluation/
The Gateway to Educational Materials
The Gateway currently contains 6661 education resources and includes
resources from more than 40 collections, including the AskERIC Virtual
Library, Math Forum, Microsoft Encarta, North Carolina Department of Public
Instruction, and U.S. Department of Education.
http://www.thegateway.org
Networked Digital Library of Theses and Dissertations
http://www.ndltd.org/
German Digital library project Global Info
http://www.global-info.org/index.html.en
D-lib Magazine
D-Lib Magazine is a monthly magazine about digital libraries for researchers,
developers, and the intellectually curious. New issues are published on
the 15th of each month.
http://www.dlib.org/
Modeling Users' Successive Searches in Digital Environments: A National
Science Foundation/British Library Funded Study
Amanda Spink, Tom Wilson, David Ellis, Nigel Ford
http://www.dlib.org/dlib/april98/04spink.html
Legal Issues on the Internet: Hyperlinking and Framing
Maureen A. O'Rourke
http://www.dlib.org/dlib/april98/04orourke.html
Cross-Searching Subject Gateways: The Query Routing and Forward Knowledge
Approach
John Kirriemuir, Dan Brickley, Susan Welsh, Jon Knight, Martin Hamilton
http://www.dlib.org/dlib/january98/01kirriemuir.html
Using Automated Classification for Summarizing and Selecting Heterogeneous
Information Sources
R. Dolin, D. Agrawal, A. El Abbadi, J. Pearlman
http://www.dlib.org/dlib/january98/dolin/01dolin.html
Networked Digital Library of Theses and Dissertations: An International
Effort Unlocking University Resources
Edward A. Fox, John L. Eaton, Gail McMillan, Neill A. Kipp, Paul Mather,
Tim McGonigle, William Schweiker, and Brian DeVane
http://www.dlib.org/dlib/september97/theses/09fox.html
The Internet Knowledge Manager, Dynamic Digital Libraries, and Agents
You Can Understand
Adrian Walker, IBM Research Division
http://www.dlib.org/dlib/march98/walker/03walker.html
An Introduction to the Resource Description Framework
Eric Miller, OCLC
http://www.dlib.org/dlib/may98/miller/05miller.html
A Distributed Architecture for Resource Discovery Using Metadata
Michael Roszkowski and Christopher Lukas, Scout project, University
of Wisconsin-Madison
http://www.dlib.org/dlib/june98/scout/06roszkowski.html
Multilingual Federated Searching Across Heterogeneous Collections
James Powell and Edward A. Fox
http://www.dlib.org/dlib/september98/powell/09powell.html
The Joint NSF/JISC International Digital Libraries Initiative
Norman Wiseman, Joint Information Systems Committee; Chris Rusbridge,
Electronic Libraries Programme; and Stephen M. Griffin, National Science
Foundation
http://www.dlib.org/dlib/june99/06wiseman.html
D-Lib Ready Reference: Subject Area Gateways
http://www.dlib.org/reference.html#subject
A Common Model to Support Interoperable Metadata. Progress report on
reconciling metadata requirements from the Dublin Core and INDECS/DOI Communities
David Bearman, Eric Miller, Godfrey Rust, Jennifer Trant, Stuart Weibel
http://www.dlib.org/dlib/january99/bearman/01bearman.html
A Multilingual Electronic Text Collection of Folk Tales for Casual Users
Using Off-the-Shelf Browsers
Myriam Dartois, Akira Maeda, Tetsuo Sakaguchi, Takehisa Fujita, Shigeo
Sugimoto, Koichi Tabata
D-Lib Magazine, October 1997
http://www.dlib.org/dlib/october97/sugimoto/10sugimoto.html
Multi-Media, Multi-Cultural, and Multi-Lingual Digital Libraries, Or
How Do We Exchange Data In 400 Languages?
Christine L. Borgman
University of California, Los Angeles
D-Lib Magazine, June 1997
http://www.dlib.org/dlib/june97/06borgman.html
IKEM Toolkit
http://bikit.rug.ac.be:80/ikem/
IKEM Toolkit is a hybrid knowledge-based platform for thesaurus-oriented
electronic document management. The project was sponsored by IWT. IKEM
Toolkit contains various tools to manage your hybrid documents in an intelligent
and user-oriented way.
Willpower Information. Information Management Consultants
www.willpower.demon.co.uk
Thesauri and vocabulary control: Principles and practice
http://www.willpower.demon.co.uk/thesprin.htm
Software for building and editing thesauri
http://www.willpower.demon.co.uk/thessoft.htm
CMU Text Learning Group
http://www.cs.cmu.edu/afs/cs/project/theo-4/text-learning/www/index.html
Goal is to develop new machine learning algorithms for text and hypertext
data. Applications of these algorithms include information filtering systems
for the Internet, and software agents that make decisions based on text
information.
CMU World Wide Knowledge Base (WebKB) project
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
Goal is to develop a probabilistic, symbolic knowledge base that mirrors
the content of the world wide web. If successful, this will make text information
on the web available in computer-understandable form, enabling much more
sophisticated information retrieval and problem solving.
Bow: A Toolkit for Statistical Language Modeling, Text Retrieval, Classification
and Clustering
Bow (or libbow) is a library of C code useful for writing statistical
text analysis, language modeling and information retrieval programs. The
current distribution includes the library, as well as front-ends for document
classification (rainbow), document retrieval (arrow) and document clustering
(crossbow).
The library and its front-ends were designed and written by Andrew
McCallum.
http://www.cs.cmu.edu/~mccallum/bow/rainbow/
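The kind of statistical text classification that rainbow performs can be illustrated with a tiny multinomial naive Bayes classifier. This is a from-scratch sketch of the general technique, not Bow's C API, and the toy training data is invented:

```python
# Tiny multinomial naive Bayes with Laplace smoothing, illustrating the
# statistical text classification done by tools like rainbow (this is
# a generic sketch, not Bow's actual API or model).
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (label, text). Returns word counts, label counts, vocab."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for label, text in docs:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def classify(text, word_counts, label_counts, vocab):
    """Pick the label maximizing log P(label) + sum of log P(word|label)."""
    best, best_lp = None, -math.inf
    total_docs = sum(label_counts.values())
    for label in label_counts:
        lp = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            lp += math.log((word_counts[label][w] + 1) / denom)  # Laplace smoothing
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = train([("sport", "goal match team"), ("tech", "code compiler kernel")])
print(classify("kernel code", *model))  # prints "tech"
```

Rainbow builds the same kind of per-class word statistics from indexed training documents, then scores new documents against each class.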
Homepage of Andrew McCallum
http://www.cs.cmu.edu/~mccallum/
Contains a lot of information on learning and classification algorithms for text.
Reinforcement Learning with Selective Perception and Hidden State.
PhD Thesis, by Andrew Kachites McCallum
http://www.cs.rochester.edu/u/mccallum/phd-thesis/
The method uses memory-based learning and a robust statistical test on
reward in order to learn a structured policy representation that makes
perceptual and memory distinctions only where needed for the task at hand.
It can also be understood as a method of value function approximation.
The model learned is an order-n partially observable Markov decision process.
It handles noisy observations, actions and rewards.
WWW -- Wealth, Weariness or Waste: Controlled vocabulary and thesauri
in support of online information access
David Batty
http://www.dlib.org/dlib/november98/11contents.html
XANADU(R) ZIGZAG(TM) Hyperstructure Kit
http://www.xanadu.net/zigzag/
TRANSPUBLISHING: A SIMPLE CONCEPT
http://www.sfc.keio.ac.jp/~ted/TPUB/TPUBsum
OSMIC. THEORY: MODELS OF TIME, VERSIONS AND BACKTRACK
http://www.sfc.keio.ac.jp/~ted/OSMIC/osmicTime.html
Ted Nelson Home page
http://www.sfc.keio.ac.jp/~ted/
ROG-O-MATIC: A Belligerent Expert System
MICHAEL L. MAULDIN, GUY JACOBSON, ANDREW APPEL and LEONARD HAMEY
http://www.fuzine.com/mlm/rgm84.html
Alta Vista sold to CMGI (http://www.cmgi.com/), an Internet venture holding company. In the deal, Compaq and CMGI established a strategic partnership.
AltaVista's free Internet access offers integrated search, news, quotes,
and much more.
AltaVista FreeAccess http://microav.com/
Inktomi Launches European Search Center
Inktomi has opened an index of European web sites that will serve its
partners who are based in Europe. The 50 million page index is based in
the United Kingdom and mostly populated by content from European web servers.
Inktomi partners such as UKMax (http://ukmax.com)
and Dagens Nyheter (http://dn.se/) are expected
to begin using the index soon.
Test results: UKMax performed badly; dn.se had not implemented search yet.
http://searchenginewatch.com/sereport/99/06-inktomi.html
Infoseek adds new search features
Infoseek has introduced search term highlighting in its results, a
related searches prompter, and increased its index size to about 70 million
web pages.
Infoseek has also added "Similar Searches", which display popular
queries related to your original search. For example, if you are looking
for "gardening" you will also be offered "water gardening", "flower gardening",
etc.
http://www.infoseek.com/
http://searchenginewatch.com/sereport/99/06-infoseek.html
Infoseek is to be completely acquired by Disney and merged into a new
company called Go.com.
http://www.internetnews.com/bus-news/article/0,1087,3_159481,00.html
Dell has started its own portal, Dellnet.com, which actually resides at
dellnet.snap.com. It also includes DellAuction.com (http://www.dellauction.com/)
and Gigabuys.com (http://gigabuys.us.dell.com/store/index.asp).
FAST ASA Announces Signing of MOU and Conditional Share Placing/Option
Agreement with Dell Computer Corporation
http://www-new.fast.no/company/press/dell02081999.html
NBC's Snap.com and GlobalBrain.net unveil sophisticated new technology
and services to harness the brain power of Internet users
NBC and CNET's Snap.com Internet portal and GlobalBrain.net today unveiled
an exclusive multi-year technology licensing and development agreement.
Snap.com will integrate GlobalBrain's revolutionary new Internet popularity
ranking technology that improves the relevancy of search results by learning
user preferences and prioritizing search results accordingly.
http://www.globalbrain.net/html/release.html
http://searchenginewatch.com/sereport/9811-globalbrain.html
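Popularity-based re-ranking of this sort can be sketched as blending a base relevance score with click-through counts gathered from earlier users. GlobalBrain's actual algorithm is proprietary; the linear blend and the weight below are assumptions for illustration only:

```python
# Hypothetical sketch of popularity re-ranking: blend each result's base
# relevance score with its share of observed user clicks. GlobalBrain's
# real algorithm is proprietary; alpha and the formula are assumed.

def rerank(results, clicks, alpha=0.7):
    """results: {url: base relevance score}; clicks: {url: click count}."""
    total_clicks = sum(clicks.values()) or 1
    def score(url):
        popularity = clicks.get(url, 0) / total_clicks
        return alpha * results[url] + (1 - alpha) * popularity
    return sorted(results, key=score, reverse=True)

results = {"a.html": 0.9, "b.html": 0.8}   # engine's own relevance scores
clicks = {"b.html": 95, "a.html": 5}       # what users actually chose
print(rerank(results, clicks))  # prints ['b.html', 'a.html']
```

A heavily clicked result can overtake one the engine scored higher, which is the "learning user preferences" behaviour the announcement describes.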
Snap Picture Finder
http://home.snap.com/search/picture/form/0,584,-0,00.html
Snap is now featuring an image search capability, powered by Ditto.com.
Previously known as ArribaVista, Ditto.com also offers image searching
directly via its web site. The company is embarking on a new strategy of
powering image search for other sites.
http://www.ditto.com/
Netscape Search Service
Netscape has launched a revamped Netscape Search service that uses
information from the Open Directory and technology from Google.
http://search.netscape.com/
Direct Hit Debuts at MSN Search, Lycos
Both MSN Search and Lycos are now featuring Direct Hit results, and
the company itself has just received $26 million in financing from a variety
of venture firms.
http://www.directhit.com/
LookSmart Live Looks-Up Answers
Looking for an answer? Look no further than LookSmart, which is providing
custom research to frustrated searchers through its new LookSmart Live
program. The request gets passed on to one of 80 editors involved in the
project, and within 24 hours, you get an email back with your answer.
http://www.looksmart.com/
LookSmart Live
http://www.looksmart.com/live/
America Online, Excite@Home, Yahoo! and
others are working on adapting their portal services to handheld computers.
IBM, Novell, Oracle, DCL, Lotus Development and ISOCOR, as members of
the Directory Interoperability Forum, rally the industry to advance the
market for open directory applications.
Go Beta Tests User-Assisted Directory
http://www.go.com/
Go Guides Beta
http://beta.guides.go.com/