November 02, 2010
If you are looking for project ideas for your final-year course requirements, here are a couple.
These projects require math, linguistics, and computer science skills, so they are quite challenging.
- Build a stemmer for Indian languages. Read about stemming on Wikipedia. Put simply, stemming reduces a given word to its root form. For example, if the word is "books", its stemmed form would be "book". This is very useful in search: anyone searching for "book" will also get results matching "books". Presently, there is a Hindi stemmer available in Lucene. It would be great to have stemmers for the other major Indian languages.
- Part-of-speech tagger for Indian languages. This probably needs ground-up work, starting with building a tagged corpus. Microsoft has a team working on this. If this is your area of interest, you may end up on that team.
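
To make the stemming idea concrete, here is a minimal sketch of a suffix-stripping stemmer and how it helps search. It is only a toy: the suffix list, the `min_stem_len` cutoff, and the helper names `stem`, `build_index`, and `search` are all assumptions made up for illustration. It is not how the Lucene Hindi stemmer works, and a real stemmer for an Indian language would need rules based on that language's morphology.

```python
# Toy suffix-stripping stemmer. The suffix list below is an illustrative
# assumption, not a complete rule set for English or any Indian language.
SUFFIXES = ("ing", "es", "s")  # tried in this order


def stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map each stemmed token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(stem(token), set()).add(doc_id)
    return index


def search(index, query):
    """Stem the query the same way as the documents, then look it up."""
    return index.get(stem(query), set())


docs = {
    1: "two books on Hindi grammar",
    2: "a book about search",
}
index = build_index(docs)
print(stem("books"))          # book
print(search(index, "book"))  # both documents match
```

Because both the documents and the query go through the same `stem` function, a search for "book" also finds the document that only mentions "books".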