Your Help on Nutch!

Off-Topic discussions about science, technology, and non Debian specific topics.
tesfayeguta
Posts: 1
Joined: 2010-05-31 06:00

#1 Post by tesfayeguta »

I am happy to join the forum and hope to have lots of issues to discuss.

I am working on a local search engine for my master's thesis. I have configured the Nutch search engine and am able to play with it.
I have learned that Nutch automatically indexes the documents it has crawled, but I need to do some
preprocessing on the downloaded pages before they are indexed. Can someone tell me how to go about this? If this is the wrong place
to post the question, could you tell me which group to join?

Thank you in advance!
- Tesfaye
Last edited by tesfayeguta on 2010-06-02 13:11, edited 2 times in total.

bugsbunny
Posts: 5354
Joined: 2008-07-06 17:04
Been thanked: 1 time

Re: Your Help on Nutch!

#2 Post by bugsbunny »

1) What does this have to do with Debian?
2) Download the source, see what it's doing when and where. Modify the code to suit your needs.
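
To make "preprocessing before indexing" a bit more concrete, here is a rough, stand-alone sketch of what such a step might look like. The class `PagePreprocessor` and its `preprocess` method are made up for illustration and are not part of Nutch; where to call something like this inside the Nutch code depends on the version you downloaded, so check the source for the point where page text is handed to the indexer.

```java
import java.util.Locale;

// Hypothetical helper, NOT part of Nutch: a stand-alone clean-up step that
// could be applied to page text before it is handed to an indexer.
public class PagePreprocessor {

    // Replaces tag-like markup with spaces, collapses runs of whitespace,
    // trims the ends, and lower-cases the result.
    public static String preprocess(String rawText) {
        if (rawText == null) {
            return "";
        }
        String noTags = rawText.replaceAll("<[^>]*>", " ");
        String collapsed = noTags.replaceAll("\\s+", " ").trim();
        return collapsed.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Prints the cleaned-up text for a small sample fragment.
        System.out.println(preprocess("<p>Hello   <b>Nutch</b> World</p>"));
    }
}
```

The regex-based tag stripping is only a placeholder for whatever preprocessing the thesis actually needs (stemming, stop-word removal, language-specific normalization, and so on).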