Search::Estraier - pure perl module for Hyper Estraier node API

Search::Estraier perl module with example scripts.
Please use the latest Hyper Estraier when using the latest Search::Estraier.
Search-Estraier-0.08.tar.gz 38 Kb
The latest source is always available from the Subversion repository.

NAME

Search::Estraier - pure perl module to use Hyper Estraier search engine

SYNOPSIS

Simple indexer

	use Search::Estraier;

	my $node = new Search::Estraier::Node(
		url => 'http://localhost:1978/node/test',
		user => 'admin',
		passwd => 'admin',
		create => 1,
		label => 'Label for node',
		croak_on_error => 1,
	);

	my $doc = new Search::Estraier::Document;

	$doc->add_attr('@uri', "http://estraier.gov/example.txt");
	$doc->add_attr('@title', "Over the Rainbow");

	$doc->add_text("Somewhere over the rainbow.  Way up high.");
	$doc->add_text("There's a land that I heard of once in a lullaby.");

	die "error: ", $node->status,"\n" unless (eval { $node->put_doc($doc) });

Simple searcher

	use Search::Estraier;

	my $node = new Search::Estraier::Node(
		url => 'http://localhost:1978/node/test',
		user => 'admin',
		passwd => 'admin',
		croak_on_error => 1,
	);

	my $cond = new Search::Estraier::Condition;

	$cond->set_phrase("rainbow AND lullaby");

	my $nres = $node->search($cond, 0);

	if (defined($nres)) {
		print "Got ", $nres->hits, " results\n";

		for my $i ( 0 ... $nres->doc_num - 1 ) {
			my $rdoc = $nres->get_doc($i);
			print "URI: ", $rdoc->attr('@uri'),"\n";
			print "Title: ", $rdoc->attr('@title'),"\n";
			print $rdoc->snippet,"\n";
		}
	} else {
		die "error: ", $node->status,"\n";
	}

DESCRIPTION

This module is an implementation of the node API of Hyper Estraier. Since it is a perl-only module that depends only on standard perl modules, it will run on all platforms on which perl runs. It doesn't require compilation or Hyper Estraier development files on the target machine.

It is implemented as multiple packages which closely resemble the Ruby implementation. It also includes methods to manage nodes.

There are a few examples in the scripts directory of this distribution.

Inheritable common methods

These methods should really move somewhere else.

_s

Remove multiple whitespaces from a string, as well as whitespace at its beginning or end

 my $text = $self->_s(" this  is a text  ");
 # $text is now 'this is a text'
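
The normalisation described above can be sketched as a stand-alone function. This is only an illustration under assumptions: `squeeze_ws` is a hypothetical name, and the code is not taken from the module's own `_s` implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-alone helper; NOT the module's actual _s code.
sub squeeze_ws {
    my ($text) = @_;
    return unless defined $text;
    $text =~ s/\s+/ /g;         # collapse each run of whitespace to one space
    $text =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
    return $text;
}

print squeeze_ws(" this  is a text  "), "\n";
```

Collapsing first and trimming second keeps both steps to a single simple substitution each.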

Search::Estraier::Document

This class implements a Document, which is a single item in Hyper Estraier.

It is a collection of:

attributes

'key' => 'value' pairs which can later be used for filtering of results

You can add common filters to attrindex in estmaster's _conf file for better performance. See attrindex in Hyper Estraier P2P Guide.

vectors

also 'key' => 'value' pairs

display text

Text which will be used to create searchable corpus of your index and included in snippet output.

hidden text

Text which will be searchable, but will not be included in snippet.
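
The kinds of content listed above can be combined as in the following minimal sketch. It is not part of the module: `fill_document` and the layout of its `$spec` hash (`attrs`, `texts`, `hidden`) are hypothetical names chosen for illustration, and the helper relies only on the add_attr, add_text and add_hidden_text methods documented below.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Search::Estraier): populate a document
# object from a plain hash. $doc is expected to provide add_attr, add_text
# and add_hidden_text, e.g. a Search::Estraier::Document.
sub fill_document {
    my ($doc, $spec) = @_;

    # attributes: 'key' => 'value' pairs usable for filtering
    my $attrs = $spec->{attrs} || {};
    for my $name (sort keys %$attrs) {
        $doc->add_attr($name, $attrs->{$name});
    }

    # display text: searchable and included in snippets
    $doc->add_text($_) for @{ $spec->{texts} || [] };

    # hidden text: searchable, but not included in snippets
    $doc->add_hidden_text($_) for @{ $spec->{hidden} || [] };

    return $doc;
}
```

With the real module, `$doc` would be created with `new Search::Estraier::Document` and the filled document handed to `$node->put_doc`.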

new

Create a new document, empty or from a draft.

  my $doc = new Search::Estraier::Document;
  my $doc2 = new Search::Estraier::Document( $draft );

add_attr

Add an attribute.

  $doc->add_attr( name => 'value' );

Delete attribute using

  $doc->add_attr( name => undef );

add_text

Add a sentence of text.

  $doc->add_text('this is example text to display');

add_hidden_text

Add a hidden sentence.

  $doc->add_hidden_text('this is example text just for search');

add_vectors

Add vectors

  $doc->add_vectors(
  	'vector_name' => 42,
  	'another' => 12345,
  );

set_score

Set the substitute score

  $doc->set_score(12345);

score

Get the substitute score

id

Get the ID number of the document. If the object has never been registered, -1 is returned.

  print $doc->id;

attr_names

Returns array with attribute names from document object.

  my @attrs = $doc->attr_names;

attr

Returns value of an attribute.

  my $value = $doc->attr( 'attribute' );

texts

Returns array with text sentences.

  my @texts = $doc->texts;

cat_texts

Return whole text as single scalar.

 my $text = $doc->cat_texts;

dump_draft

Dump draft data from document object.

  print $doc->dump_draft;

delete

Empty document object

  $doc->delete;

This function is an addition to the original Ruby API; since it was included in the C wrappers, it's here as a convenience. Document objects which go out of scope will be destroyed automatically.

Search::Estraier::Condition

new

  my $cond = new Search::Estraier::Condition;

set_phrase

  $cond->set_phrase('search phrase');

add_attr

  $cond->add_attr('@URI STRINC /~dpavlin/');

set_order

  $cond->set_order('@mdate NUMD');

set_max

  $cond->set_max(42);

set_options

  $cond->set_options( 'SURE' );

  $cond->set_options( qw/AGITO NOIDF SIMPLE/ );

Possible options are:

SURE

check every N-gram

USUAL

check every second N-gram

FAST

check every third N-gram

AGITO

check every fourth N-gram

NOIDF

don't perform TF-IDF tuning

SIMPLE

use simplified query phrase

Skipping N-grams will speed up search, but reduce accuracy. Every call to set_options resets previously set options.

This option changed in version 0.04 of this module. It's backwards compatible.

phrase

Return search phrase.

  print $cond->phrase;

order

Return search result order.

  print $cond->order;

attrs

Return search result attrs.

  my @cond_attrs = $cond->attrs;

max

Return maximum number of results.

  print $cond->max;

-1 is returned for an uninitialized value; 0 means unlimited.

options

Return options for this condition.

  print $cond->options;

Options are returned in numerical form.
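
To illustrate what "numerical form" means, the sketch below folds option names into a single bitmask and starts from zero on every call, which mirrors the reset behaviour of set_options described above. It is a stand-alone illustration, not the module's code: `options_to_mask` is a hypothetical name, and the bit positions in `%OPTION_BIT` are assumptions, so check the module source for the real values.

```perl
use strict;
use warnings;

# ASSUMED bit positions, for illustration only.
my %OPTION_BIT = (
    SURE   => 1 << 0,
    USUAL  => 1 << 1,
    FAST   => 1 << 2,
    AGITO  => 1 << 3,
    NOIDF  => 1 << 4,
    SIMPLE => 1 << 10,
);

# Hypothetical helper: combine option names into one number.
sub options_to_mask {
    my @names = @_;
    my $mask  = 0;    # start from zero, so earlier options never leak in
    for my $name (@names) {
        die "unknown option '$name'\n" unless exists $OPTION_BIT{$name};
        $mask |= $OPTION_BIT{$name};
    }
    return $mask;
}

printf "%d\n", options_to_mask(qw/AGITO NOIDF SIMPLE/);
```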

set_skip

Set number of skipped documents from beginning of results

  $cond->set_skip(42);

Similar to offset in RDBMS.
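
Because set_skip behaves like an offset and set_max like a limit, paging through results reduces to simple arithmetic. The sketch below assumes 1-based page numbers; `apply_page` is a hypothetical helper, not a method of this module, and it only calls the set_max and set_skip methods documented here.

```perl
use strict;
use warnings;

# Hypothetical helper: configure a condition object for a 1-based page.
# $cond must provide set_max and set_skip (as Search::Estraier::Condition does).
sub apply_page {
    my ($cond, $page, $per_page) = @_;
    die "page must be >= 1\n"     if $page < 1;
    die "per_page must be >= 1\n" if $per_page < 1;

    $cond->set_max($per_page);                     # limit
    $cond->set_skip( ($page - 1) * $per_page );    # offset
    return $cond;
}
```

For example, page 3 with 10 results per page skips the first 20 documents and returns at most 10.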

skip

Return skip for this condition.

  print $cond->skip;

set_distinct

  $cond->set_distinct('@author');

distinct

Return distinct attribute

  print $cond->distinct;

set_mask

Filter out some links when searching.

The argument is an array of link numbers, starting with 0 (the current node).

  $cond->set_mask(qw/0 1 4/);

Search::Estraier::ResultDocument

new

  my $rdoc = new Search::Estraier::ResultDocument(
  	uri => 'http://localhost/document/uri/42',
  	attrs => {
  		foo => 1,
  		bar => 2,
  	},
  	snippet => 'this is a text of snippet',
  	keywords => "this\tare\tkeywords",
  );

uri

Return URI of result document

  print $rdoc->uri;

attr_names

Returns array with attribute names from result document object.

  my @attrs = $rdoc->attr_names;

attr

Returns value of an attribute.

  my $value = $rdoc->attr( 'attribute' );

snippet

Return snippet from result document

  print $rdoc->snippet;

keywords

Return keywords from result document

  print $rdoc->keywords;

Search::Estraier::NodeResult

new

  my $res = new Search::Estraier::NodeResult(
  	docs => \@array_of_rdocs,
  	hints => \%hash_with_hints,
  );

doc_num

Return number of documents

  print $res->doc_num;

This will return real number of documents (limited by max). If you want to get total number of hits, see hits.

get_doc

Return single document

  my $doc = $res->get_doc( 42 );

Returns undef if document doesn't exist.

hint

Return specific hint from results.

  print $res->hint( 'VERSION' );

Possible hints are: VERSION, NODE, HIT, HINT#n, DOCNUM, WORDNUM, TIME, LINK#n, VIEW.

hints

More perlish version of hint. This one returns hash.

  my %hints = $res->hints;

hits

Syntactic sugar for the total number of hits for this query

  print $res->hits;

It's the same as

  print $res->hint('HIT');

but shorter.
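
The difference between doc_num (documents actually returned) and hits (all matching documents) matters when displaying a page of results. A minimal sketch follows; `result_summary` is a hypothetical helper, and `$skip` is assumed to be the value previously passed to set_skip on the condition.

```perl
use strict;
use warnings;

# Hypothetical helper: describe which slice of the total hits a result holds.
# $nres must provide doc_num and hits (as Search::Estraier::NodeResult does);
# $skip is the offset used for the search (0 if none).
sub result_summary {
    my ($nres, $skip) = @_;
    $skip ||= 0;

    my $shown = $nres->doc_num;    # documents returned in this result
    my $total = $nres->hits;       # all documents matching the query

    return "No results" unless $shown;
    return sprintf "Showing %d-%d of %d hits",
        $skip + 1, $skip + $shown, $total;
}
```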

Search::Estraier::Node

new

  my $node = new Search::Estraier::Node;

or optionally with url as parameter

  my $node = new Search::Estraier::Node( 'http://localhost:1978/node/test' );

or in more verbose form

  my $node = new Search::Estraier::Node(
  	url => 'http://localhost:1978/node/test',
  	user => 'admin',
  	passwd => 'admin',
  	create => 1,
  	label => 'optional node label',
  	debug => 1,
  	croak_on_error => 1,
  );

with following arguments:

url

URL to node

user

specify username for node server authentication

passwd

password for authentication

create

create node if it doesn't exist

label

optional label for new node if create is used

debug

dumps a lot of debugging output

croak_on_error

Very helpful during development. It will croak on all errors instead of silently returning -1 (which is the convention of the Hyper Estraier API in other languages).
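
Code that must work both with and without croak_on_error can wrap calls in eval and fall back to the status code. The following is a minimal sketch under that assumption; `try_put_doc` is a hypothetical wrapper, not part of the module, and it uses only the put_doc and status methods documented below.

```perl
use strict;
use warnings;

# Hypothetical wrapper: returns (1, undef) on success and (0, $message) on
# failure, whether or not the node was created with croak_on_error.
sub try_put_doc {
    my ($node, $doc) = @_;

    my $ok = eval { $node->put_doc($doc) };
    return (1, undef) if $ok;

    # $@ is non-empty if put_doc croaked; otherwise report the status code
    my $message = $@ ? "$@" : 'request failed, status ' . $node->status;
    $message =~ s/\s+\z//;    # drop trailing newline, if any
    return (0, $message);
}
```

A caller can then write `my ($ok, $err) = try_put_doc($node, $doc);` and handle both failure styles in one place.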

set_url

Specify URL to node server

  $node->set_url('http://localhost:1978');

set_proxy

Specify proxy server to connect to node server

  $node->set_proxy('proxy.example.com', 8080);

set_timeout

Specify timeout of connection in seconds

  $node->set_timeout( 15 );

set_auth

Specify name and password for authentication to node server.

  $node->set_auth('clint','eastwood');

status

Return status code of last request.

  print $node->status;

-1 means connection failure.

put_doc

Add a document

  $node->put_doc( $document_draft ) or die "can't add document";

Return true on success or false on failure.

out_doc

Remove a document

  $node->out_doc( $document_id ) or die "can't remove document";

Return true on success or false on failure.

out_doc_by_uri

Remove a registered document using its URI

  $node->out_doc_by_uri( 'file:///document/uri/42' ) or die "can't remove document";

Return true on success or false on failure.

edit_doc

Edit attributes of a document

  $node->edit_doc( $document_draft ) or die "can't edit document";

Return true on success or false on failure.

get_doc

Retrieve a document by its ID

  my $doc = $node->get_doc( $document_id ) or die "can't get document";

Return the document on success or false on failure.

get_doc_by_uri

Retrieve a document by its URI

  my $doc = $node->get_doc_by_uri( 'file:///document/uri/42' ) or die "can't get document";

Return the document on success or false on failure.

get_doc_attr

Retrieve the value of an attribute of a document specified by ID

  my $val = $node->get_doc_attr( $document_id, 'attribute_name' ) or
  	die "can't get document attribute";

get_doc_attr_by_uri

Retrieve the value of an attribute of a document specified by URI

  my $val = $node->get_doc_attr_by_uri( 'file:///document/uri/42', 'attribute_name' ) or
  	die "can't get document attribute";

etch_doc

Extract document keywords

  my $keywords = $node->etch_doc( $document_id ) or die "can't etch document";

etch_doc_by_uri

Extract keywords of a document specified by URI

  my $keywords = $node->etch_doc_by_uri( 'file:///document/uri/42' ) or die "can't etch document";

Return the keywords on success or false on failure.

uri_to_id

Get ID of document specified by URI

  my $id = $node->uri_to_id( 'file:///document/uri/42' );

This method won't croak, even if using croak_on_error.
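
Since uri_to_id does not croak, it is convenient for existence checks before deciding how to handle a document. The sketch below is an illustration only: `doc_exists` is a hypothetical helper, and it assumes that an unknown URI yields either undef or a negative number, which this documentation does not spell out.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the node knows a document with this URI.
# ASSUMPTION: uri_to_id returns undef or a negative number for unknown URIs.
sub doc_exists {
    my ($node, $uri) = @_;
    my $id = $node->uri_to_id($uri);

    # accept only a plain non-negative integer as a valid ID
    return ( defined $id && $id =~ /\A\d+\z/ ) ? 1 : 0;
}
```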

_fetch_doc

Private function used to implement get_doc, get_doc_by_uri, etch_doc and etch_doc_by_uri.

 # this will decode received draft into Search::Estraier::Document object
 my $doc = $node->_fetch_doc( id => 42 );
 my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42' );

 my $doc = $node->_fetch_doc( id => 42, etch => 1 );
 my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42', etch => 1 );

 my $doc = $node->_fetch_doc( id => 42, attr => '@mdate' );
 my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42', attr => '@mdate' );

 my $id = $node->_fetch_doc(
 	uri => 'file:///document/uri/42',
	path => '/uri_to_id',
	chomp_resbody => 1
 );

name

  my $node_name = $node->name;

label

  my $node_label = $node->label;

doc_num

  my $documents_in_node = $node->doc_num;

word_num

  my $words_in_node = $node->word_num;

size

  my $node_size = $node->size;

search

Search documents which match condition

  my $nres = $node->search( $cond, $depth );

$cond is a Search::Estraier::Condition object, while $depth specifies the depth for meta search.

Returns a Search::Estraier::NodeResult object.

cond_to_query

Return URI encoded string generated from Search::Estraier::Condition

  my $args = $node->cond_to_query( $cond, $depth );

shuttle_url

This is the method which uses LWP::UserAgent to communicate with the Hyper Estraier node master.

  my $rv = $node->shuttle_url( $url, $content_type, $req_body, \$resbody );

$resheads and $resbody booleans control if response headers and/or response body will be saved within object.

set_snippet_width

Set width of snippets in results

  $node->set_snippet_width( $wwidth, $hwidth, $awidth );

$wwidth specifies the whole width of the snippet. It's 480 by default. If it's 0, the snippet is not sent with results. If it is negative, the whole document text is sent instead of a snippet.

$hwidth specifies the width of strings taken from the beginning of the text. The default value is 96. A negative or zero value keeps the previous value.

$awidth specifies the width of strings around each highlighted word. It's 96 by default. If a negative or zero value is provided, the previous value is kept unchanged.
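
The special values of $wwidth can be captured as named presets. This is a sketch, not part of the module: `apply_snippet_preset` and the preset names are hypothetical, while the numbers are the defaults and special cases described above.

```perl
use strict;
use warnings;

# Hypothetical presets: [ $wwidth, $hwidth, $awidth ]
my %SNIPPET_PRESET = (
    default => [ 480, 96, 96 ],    # documented default widths
    none    => [ 0,   96, 96 ],    # wwidth 0: no snippet is sent
    full    => [ -1,  96, 96 ],    # negative wwidth: whole document text
);

# Hypothetical helper: apply a named preset to a node object that
# provides set_snippet_width (as Search::Estraier::Node does).
sub apply_snippet_preset {
    my ($node, $preset) = @_;
    my $widths = $SNIPPET_PRESET{$preset}
        or die "unknown snippet preset '$preset'\n";
    $node->set_snippet_width(@$widths);
    return @$widths;
}
```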

set_user

Manage users of node

  $node->set_user( 'name', $mode );

$mode can be one of:

0

delete account

1

set administrative rights for user

2

set user account as guest

Return true on success, otherwise false.
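
The numeric modes are easier to read behind names. A minimal sketch, assuming only the three modes listed above; `set_user_role` and the role names are hypothetical and not provided by the module.

```perl
use strict;
use warnings;

# Names for the documented $mode values of set_user.
my %USER_MODE = (
    delete => 0,    # delete account
    admin  => 1,    # administrative rights
    guest  => 2,    # guest account
);

# Hypothetical helper: call set_user with a symbolic role name.
sub set_user_role {
    my ($node, $name, $role) = @_;
    die "unknown role '$role'\n" unless exists $USER_MODE{$role};
    return $node->set_user( $name, $USER_MODE{$role} );
}
```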

admins

 my @admins = @{ $node->admins };

Return array of users with admin rights on node

guests

 my @guests = @{ $node->guests };

Return array of users with guest rights on node

cacheusage

Return cache usage for a node

  my $cache = $node->cacheusage;

master

Set actions on Hyper Estraier node master (estmaster process)

  $node->master(
  	action => 'sync'
  );

All available actions are documented in http://hyperestraier.sourceforge.net/nguide-en.html#protocol
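
A typical use of the sync action is to issue it once after a batch of documents has been added, rather than after every document. The sketch below shows that pattern; `index_and_sync` is a hypothetical helper that uses only put_doc and master as documented above.

```perl
use strict;
use warnings;

# Hypothetical helper: add several documents, then sync once at the end.
# Returns the number of documents that were added successfully.
sub index_and_sync {
    my ($node, @docs) = @_;

    my $added = 0;
    for my $doc (@docs) {
        $added++ if $node->put_doc($doc);
    }

    # one sync for the whole batch, and only if something changed
    $node->master( action => 'sync' ) if $added;

    return $added;
}
```

If the node was created with croak_on_error, a failing put_doc will croak instead of returning false, so the loop would stop at the first error.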

PRIVATE METHODS

You could call those directly, but you don't have to. I hope.

_set_info

Set information for node

  $node->_set_info;

_clear_info

Clear information for node

  $node->_clear_info;

On the next call to name, label, doc_num, word_num or size, node info will be fetched again from Hyper Estraier.
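
The interplay of _set_info and _clear_info is a lazy cache with explicit invalidation. The following stand-alone sketch illustrates that pattern only: the `LazyInfo` class and its `fetch` callback are hypothetical and do not reproduce the module's internals.

```perl
use strict;
use warnings;

package LazyInfo;    # hypothetical class, for illustration only

sub new {
    my ($class, $fetch) = @_;
    # $fetch is a code ref returning a hash ref of node information
    return bless { fetch => $fetch, info => undef }, $class;
}

# fetch the information and remember it
sub _set_info {
    my $self = shift;
    $self->{info} = $self->{fetch}->();
    return $self->{info};
}

# forget cached information; the next accessor call fetches it again
sub _clear_info {
    my $self = shift;
    $self->{info} = undef;
    return;
}

# accessor: fetches only when nothing is cached
sub doc_num {
    my $self = shift;
    $self->_set_info unless $self->{info};
    return $self->{info}{doc_num};
}

package main;
```

Operations that change node statistics would call `_clear_info`, so the accessors pay for a request only when the cached values may be stale.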

EXPORT

Nothing.

SEE ALSO

http://hyperestraier.sourceforge.net/

Hyper Estraier Ruby interface on which this module is based.

Hyper Estraier now also has a pure-perl binding included in its distribution. It's a faster way to access databases directly if you are not running the estmaster P2P server.

AUTHOR

Dobrica Pavlinusic, <dpavlin@rot13.org>

Robert Klep <robert@klep.name> contributed refactored search code

COPYRIGHT AND LICENSE

ChangeLog

2008-01-20 16:51:47 dpavlin r199

/trunk/lib/Search/Estraier.pm: version bump [0.09]

2008-01-20 16:50:59 dpavlin r198

/trunk/t/5_Node.t: fix RT #32457: Victim of Test-Simple 0.74

2007-01-05 22:19:01 dpavlin r197

/trunk/scripts/est-spider: don't exit from sub with next

2006-11-26 12:06:08 dpavlin r196

/trunk/scripts/est-spider: added --skip-images option

2006-11-14 16:39:08 dpavlin r195

/trunk/scripts/dbi-indexer.pl: added new --dbi and --quiet command-line options, saner defaults

2006-11-11 23:34:55 dpavlin r194

/trunk/lib/Search/Estraier.pm, /trunk/MANIFEST, /trunk/lib, /trunk/Makefile.PL, /trunk/Estraier.pm, /trunk/lib/Search: reorg directory structure to lib/Search/Estraier.pm

2006-11-05 16:28:59 dpavlin r193

/cpan/0.08: CPAN release 0.08

2006-11-05 16:28:31 dpavlin r192

/trunk/t/5_Node.t: fix warning

2006-11-05 16:26:57 dpavlin r191

/trunk/Estraier.pm: bump version to 0.08

2006-11-05 16:25:56 dpavlin r190

/trunk/t/5_Node.t: actually wrap $cond->set_distinct in ok() :-)

2006-11-05 16:23:03 dpavlin r189

/trunk/t/5_Node.t: test set_distinct

2006-11-05 16:08:08 dpavlin r188

/trunk/t/2_Condition.t: added tests for set_distinct and distinct

2006-11-05 16:01:36 dpavlin r187

/trunk/t/5_Node.t: added tests for set_score and score

2006-11-05 15:53:13 dpavlin r186

/trunk/Estraier.pm: removed debugging output

2006-11-05 15:53:01 dpavlin r185

/trunk/t/1_Document.t: add tests for set_score and score

2006-11-04 13:10:29 dpavlin r184

/trunk/Estraier.pm: set_distinct and set_score patch from Mikio Hirabayashi <mikio@users.sourceforge.net>

2006-08-31 14:43:06 dpavlin r183

/trunk/scripts/dbi-indexer.pl: separate authorisation for estraier (estuser,estpasswd) and database (dbuser,dbpasswd)

2006-08-26 22:35:15 dpavlin r182

/trunk/scripts/est-spider: --force will now skip checking of mtime (as it should)

2006-08-26 22:33:34 dpavlin r181

/trunk/scripts/est-spider: remove script from html when convertin it to text

2006-08-26 22:30:13 dpavlin r180

/trunk/scripts/est-spider: fix deletection of external binaries

2006-08-25 11:59:04 dpavlin r179

/trunk/scripts/est-spider: parse windows help file index (hhc) if available for page titles

2006-08-15 16:38:06 dpavlin r178

/trunk/scripts/est-spider: sync master at end of indexing

2006-08-06 19:29:28 dpavlin r177

/cpan/0.07: release 0.07 to CPAN

2006-08-06 18:43:58 dpavlin r176

/trunk/Estraier.pm: release 0.07

2006-08-06 18:38:51 dpavlin r175

/trunk/MANIFEST: fix manifest

2006-08-06 18:15:56 dpavlin r174

/trunk/Estraier.pm: fixed docs

2006-08-06 18:15:11 dpavlin r173

/trunk/Estraier.pm, /trunk/t/2_Condition.t, /trunk/t/5_Node.t: added $cond->set_mask

2006-08-06 17:20:09 dpavlin r172

/trunk/t/9_pod-coverage.t: test pod coverage

2006-08-06 17:19:51 dpavlin r171

/trunk/t/9_pod.t, /trunk/t/99_pod.t: rename pod test

2006-08-06 17:15:01 dpavlin r170

/trunk/t/5_Node.t: test $nres->hint()

2006-08-06 16:42:39 dpavlin r169

/trunk/t/5_Node.t: test error handling of $node->get_doc

2006-08-06 16:42:06 dpavlin r168

/trunk/Estraier.pm: test error handling of $node->get_doc

2006-08-06 16:29:34 dpavlin r167

/trunk/t/5_Node.t: test $cond->skip

2006-08-06 12:48:02 dpavlin r166

/trunk/t/1_Document.t, /trunk/Estraier.pm: add_vectors added [0.07_3] and fixed vector handling which was broken

2006-08-06 12:19:37 dpavlin r165

/trunk/scripts/cpanest: create index if it doesn't exist

2006-08-06 12:19:19 dpavlin r164

/trunk/Estraier.pm: documentation improvements

2006-08-02 21:51:31 dpavlin r163

/trunk/Makefile.PL: added cover target to run Devel::Cover

2006-06-27 22:50:25 dpavlin r162

/trunk/Makefile.PL: fine-tune cpan target

2006-06-27 22:38:20 dpavlin r161

/cpan/0.07_2: CPAN pre-release 0.07_2

2006-06-24 15:34:42 dpavlin r160

/trunk/Estraier.pm, /trunk/t/5_Node.t: added cacheusage, version bumped to 0.07_2

2006-05-25 19:18:14 dpavlin r159

/trunk/scripts/dbi-indexer.pl: added command-line options and debug levels with increasing verbosity

2006-05-22 14:48:14 dpavlin r158

/trunk/Makefile.PL: new target to make cpan distribution

2006-05-22 14:43:56 dpavlin r157

/cpan/0.07_1: CPAN release 0.07_1

2006-05-22 14:42:10 dpavlin r156

/trunk/t/5_Node.t, /trunk/scripts/cpanest, /trunk/Estraier.pm: pre-release of 0.07_1

2006-05-18 14:31:42 dpavlin r155

/trunk/Estraier.pm, /trunk/t/5_Node.t: bugfix: set_skip now really work

2006-05-16 16:05:23 dpavlin r154

/trunk/Estraier.pm: send correct Content-type for set_user

2006-05-16 16:01:09 dpavlin r153

/trunk/t/5_Node.t: added usage of EST_USER and EST_PASSWD enviroment variables for credentials if they exists (otherwise, it will fallback to admin:admin)

2006-05-16 12:11:39 dpavlin r152

/trunk/t/5_Node.t: skip tests if estmaster isn't running, optional way to test against remove Hyper Estraier server using:

ESTMASTER_URI=http://estraier.example.com:1978 make test

2006-05-16 11:39:53 dpavlin r151

/trunk/Estraier.pm, /trunk/t/5_Node.t: added _clear_info which is called in cases where comands modify stats about node (which will force re-read of those data from Hyper Estraier on next request), explanded test suite

2006-05-15 22:26:08 dpavlin r150

/trunk/Estraier.pm, /trunk/t/5_Node.t: call _set_info to refresh data about node after calling out_doc*

2006-05-15 22:11:22 dpavlin r149

/trunk/Estraier.pm: refresh _set_info after sync

2006-05-15 22:06:14 dpavlin r148

/trunk/t/5_Node.t: extended node tests

2006-05-10 21:41:35 dpavlin r147

/trunk/scripts/dbi-indexer.pl: added pk_col to config hash

2006-05-10 21:33:32 dpavlin r146

/trunk/scripts/dbi-indexer.pl: display rows/s rate

2006-05-10 21:09:05 dpavlin r145

/trunk/scripts/dbi-indexer.pl: added db_encoding

2006-05-10 20:31:02 dpavlin r144

/trunk/scripts/dbi-indexer.pl: create node if needed, moved config into hash

2006-05-10 16:54:23 dpavlin r143

/cpan/0.06: CPAN release 0.06

2006-05-10 14:57:50 dpavlin r142

/trunk/MANIFEST, /trunk/Estraier.pm: getting ready for 0.06

2006-05-10 14:52:28 dpavlin r141

/trunk/scripts/estcp-mt.pl, /trunk/scripts/estcp.pl: estcp scripts cleanup for creating nodes (they now copy source label too)

2006-05-10 14:08:34 dpavlin r140

/trunk/Estraier.pm, /trunk/t/5_Node.t: fix interaction of create and croak_on_error, added tests for it

2006-05-10 13:45:08 dpavlin r139

/trunk/Estraier.pm, /trunk/t/5_Node.t: added create and label to new Search::Estraier::Node, so that nodes will be automatically created if needed.

2006-05-10 13:34:17 dpavlin r138

/trunk/t/5_Node.t: better explanation of 46 nodes limit (with just 1024 file descriptors) in Hyper Estraier.

2006-05-09 14:05:57 dpavlin r137

/trunk/t/5_Node.t: test whole new ->master API except for shutdown, backup and logrtt (those operations are specific to Hyper Estraier installation and sysadmins might not appreciate tests which mess system services)

2006-05-09 14:03:36 dpavlin r136

/trunk/Estraier.pm: fix parametar handling for ->master, chomp response body so that it doesn't end with lf (so you can include it in messages)

2006-05-09 12:42:39 dpavlin r135

/trunk/Estraier.pm, /trunk/t/5_Node.t: fixes and tweaks for master

2006-05-09 12:21:26 dpavlin r134

/trunk/t/5_Node.t, /trunk/Estraier.pm: added Search::Estraier::Node->master to controll estmaster and beginning of tests for it

2006-05-08 21:34:00 dpavlin r133

/trunk/scripts/estcp.pl: copy with admin priviledges

2006-05-08 21:33:37 dpavlin r132

/trunk/Estraier.pm: document and actually implement (but, used in examples, uf, uf) shortcut to specify user and passwd directly to Search::Estraier::Node instead of calling set_auth afterwards.

2006-05-08 21:05:32 dpavlin r131

/trunk/scripts/dbi-indexer.pl: added example script to index DBI table

2006-05-08 20:47:48 dpavlin r130

/trunk/t/5_Node.t: fix number of tests to skip

2006-05-08 12:01:00 dpavlin r129

/trunk/t/5_Node.t: test search without results

2006-05-08 12:00:43 dpavlin r128

/trunk/Estraier.pm: removed old implementation of search in favor of refactored code contributed by Robert Klep

2006-05-06 22:09:01 dpavlin r127

/trunk/scripts/bench_search.pl: script to benchmark old and new implementation of search

2006-05-06 21:38:14 dpavlin r126

/trunk/Estraier.pm: Better implementation of search by Robert Klep <robert@klep.name>

2006-05-03 14:25:40 dpavlin r125

/cpan/0.05: CPAN release 0.05

2006-05-03 14:24:56 dpavlin r124

/trunk/MANIFEST: remove foo (how did it get there?)

2006-05-03 14:23:55 dpavlin r123

/trunk/MANIFEST, /trunk/Estraier.pm: prepare for relese 0.05

2006-05-02 10:19:47 dpavlin r122

/trunk/Estraier.pm: fixed warning

2006-04-17 10:38:17 dpavlin r121

/trunk/scripts/est-spider: skip directories without --all

2006-04-17 10:34:14 dpavlin r120

/trunk/scripts/est-spider: dump_draft is now triggered by --debug not --verbose

2006-04-17 10:31:11 dpavlin r119

/trunk/scripts/est-spider: added croak_on_error, fixed filename and filetype handling (so indexing now actually store content again), added --all option to index file paths of all files

2006-04-16 23:22:54 dpavlin r118

/trunk/scripts/est-spider: dump statistics at end

2006-03-12 19:43:21 dpavlin r117

/trunk/t/5_Node.t: fix skip count

2006-03-12 18:43:24 dpavlin r116

/trunk/Estraier.pm: implemeted Search::Estraier::Condition set_skip and skip (which requires HyperEstraier 1.1.4)

2006-03-12 18:42:34 dpavlin r115

/trunk/t/5_Node.t: added tests for get_doc

2006-03-12 15:26:32 dpavlin r114

/cpan/0.04: CPAN release 0.04

2006-03-12 15:25:06 dpavlin r113

/trunk/MANIFEST: updated

2006-03-12 15:20:06 dpavlin r112

/trunk/Estraier.pm: version 0.04 ready for CPAN

2006-02-21 15:41:57 dpavlin r111

/trunk/Estraier.pm: store all values from _set_info in $self->{inform}

2006-02-21 15:40:54 dpavlin r110

/trunk/t/5_Node.t: support 0 sa word_num (if HyperEstraier didn't have time to sync to disk)

2006-02-20 21:21:04 dpavlin r109

/trunk/t/5_Node.t: fix test if no test2 exists

2006-02-19 17:13:57 dpavlin r108

/trunk/Estraier.pm: fix typo

2006-02-19 17:01:49 dpavlin r107

/trunk/t/5_Node.t, /trunk/Estraier.pm: added node methods admins, guests and links, set_link now refresh info

2006-02-19 14:26:21 dpavlin r106

/trunk/t/5_Node.t: another fix for last few tests

2006-02-19 13:50:50 dpavlin r105

/trunk/t/5_Node.t: fix tests without node test1

2006-02-19 13:50:36 dpavlin r104

/trunk/Makefile.PL: make ChangeLog from svk

2006-01-28 20:44:15 dpavlin r103

/trunk/Estraier.pm, /trunk/t/5_Node.t: uri_to_id doesn't croak, even when used with croak_on_error

2006-01-28 19:46:20 dpavlin r102

/trunk/scripts/example_searcher.pl, /trunk/Estraier.pm, /trunk/scripts/example_indexer.pl: more documentation update

2006-01-28 19:43:23 dpavlin r101

/trunk/scripts/example_indexer.pl, /trunk/scripts/example_searcher.pl: updated example scripts

2006-01-28 19:41:59 dpavlin r100

/trunk/Estraier.pm, /trunk/t/5_Node.t: added $res->hits to get number of hits from estmaster hints

2006-01-28 19:19:25 dpavlin r99

/trunk/t/1_Document.t: removed debugging output

2006-01-28 19:18:13 dpavlin r98

/trunk/Estraier.pm, /trunk/t/2_Condition.t: improved $cond->set_options to support one or more arguments and reset options on each call, e.g. $cond->set_options('SURE') or $cond->set_options(qw/SURE NOIDF/)

2006-01-28 18:19:47 dpavlin r97

/trunk/Estraier.pm, /trunk/t/1_Document.t: another fix for empty values

2006-01-28 17:58:22 dpavlin r96

/trunk/Estraier.pm: fix for 0 values

2006-01-28 17:55:48 dpavlin r95

/trunk/t/1_Document.t: test handling of attributes with value 0

2006-01-28 17:38:00 dpavlin r94

/trunk/t/5_Node.t: cleanup test

2006-01-28 16:43:45 dpavlin r93

/trunk/Estraier.pm: Hyper Estraier 1.0.6 doesn't like attributes with no value (undef in perl), so we skip them in dump_draft

2006-01-26 15:29:20 dpavlin r92

/trunk/t/5_Node.t: fix number of tests skipped if test node is missing

2006-01-26 01:53:58 dpavlin r91

/trunk/t/5_Node.t, /trunk/Estraier.pm: added hints to return all hints from server

2006-01-26 01:53:29 dpavlin r90

/trunk/scripts/est-spider: created separate filter_to_pages sub, added text/postscript support via pstotext

2006-01-25 23:38:57 dpavlin r89

/trunk/scripts/est-spider: removed dependency on (optional in the first place) native HyperEstraier module

2006-01-21 18:25:09 dpavlin r88

/trunk/scripts/estcp.pl: fix URL extraction, make it less chatty (without -d flag)

2006-01-21 17:37:07 dpavlin r87

/trunk/scripts/estcp-mt.pl: fixed node URL extraction, put -1 marker on queue at end so that threads will finish after all documents are processed

2006-01-19 14:33:33 dpavlin r86

/trunk/scripts/estcp-mt.pl: multi-threaded version of estcp

2006-01-17 15:00:50 dpavlin r85

/trunk/scripts/estcp.pl: create desintaion node if it doesn't exist

2006-01-17 11:43:38 dpavlin r84

/trunk/scripts/estcp.pl: don't parse draft into document and back

2006-01-17 00:41:18 dpavlin r83

/trunk/scripts/estcp.pl: fixed to stop coping and prevent cumulation of results

2006-01-17 00:17:50 dpavlin r82

/trunk/scripts/estcp.pl: much better output of progress (requires Time::HiRes)

2006-01-17 00:03:45 dpavlin r81

/trunk/Estraier.pm: allow null (undef in perl) values. Hyper Estraier seems to store them, so we should also support them.

2006-01-16 23:08:07 dpavlin r80

/trunk/scripts/estcp.pl: copy Hyper Estraier index from one node to another

2006-01-16 21:47:21 dpavlin r79

/trunk/t/5_Node.t: test fix

2006-01-16 21:42:09 dpavlin r78

/trunk/Estraier.pm, /trunk/t/5_Node.t: added croak_on_error

2006-01-16 21:34:14 dpavlin r77

/trunk/scripts/est-spider, /trunk/Estraier.pm: fix warning if called without $node->set_auth (anonymous access)

2006-01-16 21:19:44 dpavlin r76

/trunk/Estraier.pm: return $node->{status} and $node->{status_message} if request wasn't succesful

2006-01-16 21:18:50 dpavlin r75

/cpan/0.03: CPAN release 0.03

2006-01-09 15:28:24 dpavlin r74

/trunk/Estraier.pm: 0.03 final

2006-01-09 15:26:50 dpavlin r73

/trunk/t/5_Node.t: fix to work without Hyper Estraier server installed

2006-01-09 15:22:43 dpavlin r72

/trunk/t/5_Node.t: tests now check if nodes 'test1' and 'test2' exists, and it they doesn't skip tests which connect to HyperEstraier

2006-01-09 15:22:05 dpavlin r71

/trunk/Estraier.pm: fix set_link content-type

2006-01-08 16:50:34 dpavlin r70

/cpan, /cpan/0.03_1: 0.03_1

2006-01-08 16:49:53 dpavlin r69

/trunk/MANIFEST, /trunk/Estraier.pm: 0.03_1 on the way to CPAN

2006-01-08 00:13:09 dpavlin r68

/trunk/scripts/example_searcher.pl, /trunk/Estraier.pm, /trunk/scripts/example_indexer.pl: two simple examples included under synopsis in documentation

2006-01-07 23:50:51 dpavlin r67

/trunk/MANIFEST: add META.yml

2006-01-07 23:48:59 dpavlin r66

/trunk/MANIFEST: added example scripts

2006-01-07 23:48:16 dpavlin r65

/trunk/Estraier.pm: add optional node parametar to new Search::Estraier::Node

2006-01-07 23:46:10 dpavlin r64

/trunk/scripts/cpanest, /trunk/scripts/est-spider, /trunk/scripts: added example scripts to crawl filesystem and index cpan

2006-01-07 16:19:31 dpavlin r63

/trunk/Estraier.pm: fix warning

2006-01-07 02:40:57 dpavlin r62

/trunk/Estraier.pm: requre just uri for ResultDocument, all other parametars are optional

2006-01-07 01:21:28 dpavlin r61

/trunk/Estraier.pm: transfer depth to cond_to_query

2006-01-07 00:00:15 dpavlin r60

/trunk/t/1_Document.t, /trunk/Estraier.pm: added few checks to better handle empty documents, array return is not enforced any more.

2006-01-06 23:29:58 dpavlin r59

/trunk/Makefile.PL, /trunk/Estraier.pm: replaced my broken socket code with LWP::UserAgent (as should I really done from beginning)

2006-01-06 21:05:05 dpavlin r58

/trunk/Estraier.pm: fix _set_info size (multiple nls)

2006-01-06 20:58:26 dpavlin r57

/trunk/Estraier.pm, /trunk/t/5_Node.t: added set_link (not working?) and moved debug to option

2006-01-06 20:45:48 dpavlin r56

/trunk/Estraier.pm, /trunk/t/5_Node.t: added set_user

2006-01-06 20:39:58 dpavlin r55

/trunk/t/5_Node.t, /trunk/Estraier.pm: move set_info to private _set_info, added set_snippet_width

2006-01-06 18:35:53 dpavlin r54

/trunk/Makefile.PL: don't leave Makefile.old after make clean (we really need to remove Makefile before clean target does mv Makefile Makefile.old)

2006-01-06 14:39:45 dpavlin r53

/trunk/t/4_NodeResult.t, /trunk/t/5_Node.t, /trunk/Estraier.pm: search work (Content-type and attributes fix), NodeResult->doc_num now return proper number of hits (and not index of last one which isi doc_num - 1)

2006-01-06 14:10:29 dpavlin r52

/trunk/Estraier.pm: search which works

2006-01-06 13:19:50 dpavlin r51

/trunk/t/5_Node.t, /trunk/Estraier.pm: cond_to_query needed for search (which is under construction)

2006-01-06 12:48:14 dpavlin r50

/trunk/t/5_Node.t, /trunk/Estraier.pm: added uri_escape where needed, fix edit_doc test

2006-01-06 12:40:23 dpavlin r49

/trunk/t/5_Node.t, /trunk/Makefile.PL, /trunk/Estraier.pm: added get_doc_attr and get_doc_attr_by_uri by (again) extending _fetch_doc, fixed etch_doc (typo in name)

2006-01-06 02:07:10 dpavlin r48

/trunk/Estraier.pm, /trunk/t/5_Node.t: added name, label, doc_num, word_num and size properties for which I had to implement set_info.

2006-01-06 01:51:28 dpavlin r47

/trunk/Estraier.pm: more checks, but still no hope for edit_doc

2006-01-06 01:40:04 dpavlin r46

/trunk/t/5_Node.t: small tweaks and corrections to tests

2006-01-06 01:36:09 dpavlin r45

/trunk/t/5_Node.t, /trunk/Estraier.pm: uri_to_id and important fix for _fetch_doc

2006-01-06 01:12:10 dpavlin r44

/trunk/t/5_Node.t, /trunk/Estraier.pm: added etch_doc and etch_doc_by_uri by extending _fetch_doc

2006-01-06 00:04:28 dpavlin r43

/trunk/Estraier.pm, /trunk/t/5_Node.t: better error messages, added get_doc and get_doc_by_uri

2006-01-05 23:38:32 dpavlin r42

/trunk/Estraier.pm, /trunk/t/5_Node.t: edit_doc, add massive amount of vertical whitespace to make source more readable

2006-01-05 23:32:31 dpavlin r41

/trunk/Estraier.pm, /trunk/t/5_Node.t: out_doc, out_doc_by_uri

2006-01-05 23:00:22 dpavlin r40

/trunk/t/5_Node.t, /trunk/Estraier.pm: a lot of tuning and fixes, and put_doc which works!

2006-01-05 22:36:10 dpavlin r39

/trunk/Estraier.pm, /trunk/t/5_Node.t: added optional parameter to Node to turn on debugging (which isn't documented and probably won't be because it spits output using warn)

2006-01-05 22:27:03 dpavlin r38

/trunk/Estraier.pm, /trunk/t/5_Node.t: more fun with http

2006-01-05 22:16:21 dpavlin r37

/trunk/Estraier.pm, /trunk/t/5_Node.t: much better error messages

2006-01-05 21:51:54 dpavlin r36

/trunk/Makefile.PL, /trunk/Estraier.pm: base64 encode basic auth data. Oh, why didn't I just use LWP?

2006-01-05 21:51:29 dpavlin r35

/trunk/t/5_Node.t: shuttle_url test which fails (and it shouldn't)

2006-01-05 21:09:53 dpavlin r34

/trunk/MANIFEST: fix tests

2006-01-05 17:54:18 dpavlin r33

/trunk/Makefile.PL, /trunk/Estraier.pm: implementation of shuttle_url (using IO::Socket::INET instead of LWP for speed)

2006-01-05 15:38:34 dpavlin r32

/trunk/Estraier.pm, /trunk/t/5_Node.t: status

2006-01-05 15:36:25 dpavlin r31

/trunk/t/5_Node.t, /trunk/Estraier.pm: set_auth

2006-01-05 15:33:48 dpavlin r30

/trunk/Estraier.pm, /trunk/t/5_Node.t: set_timeout

2006-01-05 15:30:35 dpavlin r29

/trunk/t/5_Node.t, /trunk/Estraier.pm: set_url, set_proxy

2006-01-05 15:21:41 dpavlin r28

/trunk/t/5_Node.t: test

2006-01-05 15:21:29 dpavlin r27

/trunk/Estraier.pm: begin work on Search::Estraier::Node

2006-01-05 15:05:58 dpavlin r26

/trunk/t/1_Document.t: cleanup

2006-01-05 15:01:56 dpavlin r25

/trunk/Estraier.pm, /trunk/t/4_NodeResult.t: implemented Search::Estraier::NodeResult

2006-01-05 14:33:05 dpavlin r24

/trunk/Estraier.pm: cleanup

2006-01-05 14:30:42 dpavlin r23

/trunk/t/3_ResultDocument.t, /trunk/Estraier.pm: finished ResultDocument

2006-01-05 13:55:55 dpavlin r22

/trunk/t/1_document.t, /trunk/t/1_Document.t, /trunk/t/2_condition.t, /trunk/t/2_Condition.t: change case of test files

2006-01-05 13:55:17 dpavlin r21

/trunk/t/3_ResultDocument.t: test

2006-01-05 13:55:06 dpavlin r20

/trunk/Estraier.pm: begin work on Search::HyperEstraier::ResultDocument

2006-01-04 23:10:48 dpavlin r19

/trunk/t/2_condition.t, /trunk/Estraier.pm: finished Condition adding orders, attrs, max and options

2006-01-04 22:48:29 dpavlin r18

/trunk/Estraier.pm, /trunk/t/2_condition.t: phrase

2006-01-04 22:46:16 dpavlin r17

/trunk/t/2_condition.t: missing test for set_options (in last commit)

2006-01-04 22:43:24 dpavlin r16

/trunk/Estraier.pm, /trunk/t/2_condition.t: Search::Estraier::Condition, new, set_phrase, set_order, set_max

2006-01-04 22:24:57 dpavlin r15

/trunk/MANIFEST, /trunk/Estraier.pm, /trunk/t/2_condition.t: begin work on Search::Estraier::Condition, _s moved to Search::Estraier which other modules inherit

2006-01-04 21:51:01 dpavlin r14

/trunk/t/1_document.t, /trunk/Estraier.pm: new Document now accepts draft.

2006-01-04 19:37:38 dpavlin r13

/trunk/Estraier.pm: added implementation of dump_draft

2006-01-04 19:28:30 dpavlin r12

/trunk/t/1_document.t, /trunk/Estraier.pm: added cat_texts

2006-01-04 15:50:08 dpavlin r11

/trunk/t/1_document.t, /trunk/Estraier.pm: fix texts

2006-01-04 15:48:00 dpavlin r10

/trunk/t/1_document.t, /trunk/Estraier.pm: demonstrate bug with texts

2006-01-04 15:28:39 dpavlin r9

/trunk/t/1_document.t, /trunk/Makefile.PL, /trunk/Estraier.pm: added texts, fixed add_attr to delete attributes, tests now pass

2006-01-04 15:04:58 dpavlin r8

/trunk/Estraier.pm, /trunk/t/1_document.t: added $doc->attr('name'), fixed $doc->add_attr('name','value');

2006-01-04 14:57:27 dpavlin r7

/trunk/Estraier.pm: added attr_names

2006-01-04 14:48:11 dpavlin r6

/trunk/Estraier.pm, /trunk/t/1_document.t: added id, documentation, rename of vars in test

2006-01-04 14:38:35 dpavlin r5

/trunk/Estraier.pm: add_text, add_hidden_text

2006-01-04 13:33:07 dpavlin r4

/trunk/Estraier.pm, /trunk/t/1_document.t: added $doc->delete and internal _s

2006-01-04 13:13:06 dpavlin r3

/trunk/MANIFEST, /trunk/Makefile.PL: fix for path modifications

2006-01-04 13:11:43 dpavlin r2

/trunk/t, /trunk/t/1_document.t, /trunk/MANIFEST, /trunk/Makefile.PL, /trunk/t/99_pod.t, /trunk/Estraier.pm: begin work on pure perl implementation of HyperEstraier module

2006-01-04 13:11:32 dpavlin r1

/trunk: Directory for svk import.