WordNet
WordNet is a lexical database for the English language, created and maintained at the Cognitive Science Laboratory of Princeton University. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets.
Dico provides a wordnet module for reading WordNet lexical database files. The module relies on libWN, the support library distributed with the WordNet database.
One point deserves attention if you plan to use the WordNet library. Normally, libWN is compiled as a static library with position-dependent code, which makes it difficult (or, on 64-bit architectures, impossible) to use from dynamically loaded libraries such as dicod modules. You will therefore first need to rebuild WordNet so that it contains position-independent code. To do so, change to the WordNet source directory and reconfigure it as follows:
./configure CFLAGS=-fPIC [other_options]
where other_options stands for any other options you might wish to pass to configure.
If you are going to run this command in a source directory that has been previously configured, it is advisable to run ‘make distclean’ beforehand.
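The rebuild steps described above can be collected into a small shell function. This is only a minimal sketch, not part of WordNet or Dico. The function name rebuild_wordnet_pic and its argument order are hypothetical, the presence of a Makefile is used as a rough indication that the tree was configured before, and the final make step assumes the usual configure-and-make build procedure.

```shell
# Hypothetical helper: rebuild WordNet with position-independent code.
# Usage: rebuild_wordnet_pic SRCDIR [other configure options]
rebuild_wordnet_pic() {
    if [ $# -lt 1 ]; then
        echo "usage: rebuild_wordnet_pic SRCDIR [configure options]" >&2
        return 2
    fi
    wn_src=$1
    shift
    (
        # Work in a subshell so the caller's directory is not changed.
        cd "$wn_src" || exit 1
        # Heuristic (an assumption): an existing Makefile means the tree
        # was configured earlier, so clean it before reconfiguring.
        if [ -f Makefile ]; then
            make distclean || exit 1
        fi
        # CFLAGS=-fPIC requests position-independent code.
        ./configure CFLAGS=-fPIC "$@" || exit 1
        # Assumption: the package then builds with a plain 'make'.
        make
    )
}
```

It could then be invoked as, for instance, rebuild_wordnet_pic /path/to/WordNet-source, with any further arguments passed through to configure unchanged.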
Debian-based systems provide a package ‘wordnet-dev’, which contains a properly built shared library. However, this library is named ‘libwordnet.so’, instead of the expected ‘libWN.so’. On such systems you will have to use the --with-libWN option to configure, in order to inform it about the change:
./configure --with-libWN=wordnet
The argument to this option is the new basename of the libWN library, without the file suffix. The ‘lib’ prefix may optionally be included.
The wordnet module is compiled automatically if the configure script was able to find the library and its header file wn.h. If it was not, use the --with-wordnet configure option to specify the location where these files can be found. For example, if WordNet was installed using the default procedure, then the following option will do the job:
./configure --with-wordnet=/usr/local/WordNet-3.0
This command tells Dico to look for WordNet library files in /usr/local/WordNet-3.0/lib and for include files in /usr/local/WordNet-3.0/include.
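Before passing a prefix to --with-wordnet, it may help to check that it has the layout described above. The following is a minimal sketch under that assumption. The function name wordnet_configure_option is hypothetical, and it only tests for include/wn.h and a lib directory, not for a usable library.

```shell
# Hypothetical helper: print the configure option for a WordNet prefix,
# or fail if the files Dico expects are not there.
wordnet_configure_option() {
    prefix=$1
    # Dico looks for wn.h in PREFIX/include and for libraries in PREFIX/lib.
    if [ -f "$prefix/include/wn.h" ] && [ -d "$prefix/lib" ]; then
        printf '%s\n' "--with-wordnet=$prefix"
    else
        echo "wn.h or lib directory not found under $prefix" >&2
        return 1
    fi
}
```

One possible use is ./configure $(wordnet_configure_option /usr/local/WordNet-3.0), which runs configure without the option if the check fails.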
A compiled module is loaded using the following statement:
load-module wordnet { command "wordnet [parameters]"; }
Optional parameters are:
Base directory for WordNet files (the wnhome parameter). This is the directory where WordNet was installed. For the wordnet module to work, it must contain the dict subdirectory with the WordNet dictionary files. If you installed WordNet to /usr/local/WordNet-3.0, so that running ls on that directory shows you:
$ ls /usr/local/WordNet-3.0/
bin/ dict/ doc/ include/ lib/ man/
then you would use
load-module wordnet { command "wordnet wnhome=/usr/local/WordNet-3.0"; }
Directory in which the WordNet database has been installed.
Normally, these values are set at compile time and you won’t need to override them. The use of these parameters may, however, be necessary if the database was moved or installed in a non-standard location.
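Since the module needs a dict subdirectory under the base directory, a quick check before editing the dicod configuration can save a failed start. This is a sketch only. The function name wordnet_load_statement is hypothetical, and it prints a load-module statement of the form shown above rather than modifying any configuration file.

```shell
# Hypothetical helper: verify a WordNet base directory and print the
# matching load-module statement for the dicod configuration.
wordnet_load_statement() {
    wnhome=$1
    # The wordnet module requires WNHOME/dict to exist.
    if [ -d "$wnhome/dict" ]; then
        printf 'load-module wordnet { command "wordnet wnhome=%s"; }\n' "$wnhome"
    else
        echo "$wnhome has no dict subdirectory" >&2
        return 1
    fi
}
```

Running wordnet_load_statement /usr/local/WordNet-3.0 prints the statement only if that directory contains dict, and otherwise reports the problem on standard error.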
One or more WordNet database instances can be defined. All of them share the same underlying database. The reason for defining several instances is that each may have different output options. For example, you may configure one database to return word definitions and another one to act as a thesaurus.
Dico version 2.11.90 defines the following database parameters:
Select part of speech to be displayed by this database. By default, all parts of speech are displayed. Valid values are:
Display all parts of speech. This is the default.
Display only nouns.
Display only verbs.
Display only adjectives.
Display only adverbs.
Display only satellites.
When specified, this parameter (merge-defs, as used in the example database definition at the end of this section) instructs the WordNet database to merge all definitions with the same part of speech into a single definition, which will be returned in the usual dictionary fashion, e.g.:
sail
n. 1. a large piece of fabric (usually canvas fabric) by means of which wind is used to propel a sailing vessel
      Synonyms: {canvas}, {canvass}, {sheet}
   2. an ocean trip taken for pleasure
      Synonyms: {cruise}
   3. any structure that resembles a sail
v. 1. traverse or travel on (a body of water); "We sailed the Atlantic"; "He sailed the Pacific all alone"
   2. move with sweeping, effortless, gliding motions
By default, each definition is returned as a separate entry.
As an example, the following is the database definition the author uses on his server:
database {
    name "WordNet";
    handler "wordnet merge-defs";
    languages-from "en";
    languages-to "en";
    description "WordNet dictionary, version 3.0";
}
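Because several instances can share one WordNet database and differ only in their handler options, their definitions can be generated from a common template. The sketch below is an illustration only. The function name emit_wordnet_database and the second database name used in the usage note are hypothetical, and the statements it prints are limited to those appearing in the definition above.

```shell
# Hypothetical helper: print a dicod database definition for WordNet.
# Usage: emit_wordnet_database NAME HANDLER DESCRIPTION
emit_wordnet_database() {
    db_name=$1
    db_handler=$2
    db_descr=$3
    # Only the name, handler options and description vary per instance.
    cat <<EOF
database {
    name "$db_name";
    handler "$db_handler";
    languages-from "en";
    languages-to "en";
    description "$db_descr";
}
EOF
}
```

For example, emit_wordnet_database WordNet "wordnet merge-defs" "WordNet dictionary, version 3.0" prints a definition with merged output, while emit_wordnet_database WordNet-entries "wordnet" "WordNet, one entry per definition" would describe a second instance that keeps the default behavior.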
See http://wordnet.princeton.edu/wordnet/ for detailed information, including download links.