tech blog

Issues found while investigating the open source full-text search server Fess, and things to consider when introducing it

In this post, I would like to introduce Fess, an open source full-text search server.

The other day, a customer asked us whether their internal Windows file server and portal site could be made searchable using Fess.

There were two reasons the customer wanted to use Fess.

  1. The customer already has some knowledge of it
  2. They cannot spend much on construction, so they want to build it cheaply, using existing tools as much as possible

So, I decided to investigate Fess first.

Sunel Development Department
The second half of this post summarizes the issues found during the investigation and the things to consider when introducing Fess.

What is Fess in the first place?

Fess is a free full-text search engine product. Its distinguishing feature is that it comes as a set: a search function for end users plus administrative functions such as settings.

I will omit a detailed explanation of "full-text search". If you are interested, the Elastic site explains it clearly, so please take a look.

Fess includes Elasticsearch internally, and it is also possible to specify an external Elasticsearch.

If you run it internally, it consumes a lot of memory, so on a low-spec machine the server seems likely to stop responding for a while. This problem can be solved by using the Elasticsearch SaaS service provided by Elastic itself.

However, using this service incurs a service fee.

So I think it is best to decide which one to use based on the operating environment (specs).

Also, Fess runs on any OS where Java runs. For details, please refer to the Fess official site.

In addition, a company called N2 System is commercializing it, and there is a demo site where you can try searching with Fess.

Elasticsearch

An open source full-text search engine developed by Elastic. It can quickly extract the documents that contain the target words from a large number of documents.

What can you do with Fess?

  • Search against various data sources
  • Various search conditions
  • Management function
  • Supports various search target files
  • Dictionary registration function
  • APIs
  • Open Source

Search against various data sources

You can search based on data collected by crawling various data sources.

Specifically, first, you can crawl the website and search within the site.

Next, you can search for files on file servers and local directories, and search for text within files.

You can also search against data sources such as MySQL, so I think it has many uses.

Various search conditions

Fess's search conditions include not only common ones such as partial match, exact match, and excluded terms, but also a variety of more detailed conditions.

Specifically, there are the following search methods.

Search method | Description
AND search | Finds documents that contain all of the search terms
OR search | Finds documents that contain any of the search terms
NOT search | Finds documents that do not contain a given word
Search by label (category search) | Label information for categorizing documents is added to the search targets, and the search is narrowed down by specifying a label at search time
Search by field | Crawl results are saved in fields such as title and body, and the search specifies those fields
Sort search | Sorts the search results by a specified field, such as date and time
Wildcard search | Searches using single-character or multi-character wildcards within the search terms
Range search | When data that allows a range, such as numbers, is stored in a field, searches that field by range
Boost search (weighted search) | Gives priority to a specific search term among the search terms, searching according to the importance of each term
Fuzzy search | Finds words that do not exactly match the search term
Location search | Latitude and longitude are added to each document when the index is generated, and the search uses that location information
Hidden search criteria | Use the ex_q parameter to pass specific search conditions without displaying their string on the screen. The conditions are retained even when the screen changes through paging
Role search | Available to users who log in through Fess's user management function. After logging in as a user managed by Fess, you can use role search and change your password
Search for special characters | Special characters such as the following can be used as search characters by escaping them:
+ - && || ! ( ) { } ^ " ~ * ? : \ /
Detailed search | Searches with more complex conditions from the advanced search screen
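
As a small illustration of the "Search for special characters" row, the following is a minimal Python sketch that escapes a search term before it is sent to Fess. The function name escape_query_term is hypothetical, and escaping each character with a backslash (including every "&" and "|" individually) is an assumption based on the usual Lucene-style convention, so please confirm the exact rules in the Fess search guide.

```python
# Characters listed above as needing escaping. "&&" and "||" are handled
# by escaping each "&" and "|" individually (an assumption of this sketch).
SPECIAL_CHARS = set('+-&|!(){}^"~*?:\\/')


def escape_query_term(term: str) -> str:
    """Prefix every special character in `term` with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)


if __name__ == "__main__":
    print(escape_query_term("C++ (draft)"))
    print(escape_query_term("report:2024/Q1"))
```

Ordinary words pass through unchanged, so the helper can be applied to every user-entered term before the query string is assembled.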

Management function

Fess has management functions that let you configure general settings such as crawl targets and schedules.

For the specific settings available, check the administrator guide on the Fess official site.

Supports various search target files

A wide range of file types is supported, including PDF and Microsoft Office Word.

File types that Fess can search

  • text (txt)
  • XML (xml, xhtml, mm, etc.)
  • HTML (html, htm)
  • MS Office (doc, xls, ppt, docx, xlsx, pptx, etc.)
  • PDF (pdf etc.)
  • Source code (js, c, h, java, etc.)
  • Compressed files (gz, tar, zip, etc.)
  • rich text (rtf)
  • ePub
  • Audio/Image/Video (extract metadata)
  • mbox
  • ai file (PDF compatible)

Dictionary registration function

Fess has a dictionary registration function, so if you want to map proper nouns to each other, for example "sanel" and "sunl", you can do so by registering them in a dictionary individually.

APIs

Because there is an API, you can easily call the search engine from external systems such as a web system.

However, it is GET only, so there is no API for administrators to configure crawls or perform updates.

For the API specifications, please refer to the API guide on the Fess official site.
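
To give a rough idea of what calling the search API from an external system looks like, here is a minimal Python sketch that only builds the GET URL. The base URL, the endpoint path /api/v1/documents, and the parameter names q, start, and num are assumptions for illustration; they depend on the Fess version and deployment, so please check them against the API guide.

```python
from urllib.parse import urlencode

# Assumptions for illustration: the base URL and endpoint path depend on
# your Fess version and deployment; check the API guide before use.
DEFAULT_BASE_URL = "http://localhost:8080"
DEFAULT_SEARCH_PATH = "/api/v1/documents"


def build_search_url(query: str, start: int = 0, num: int = 10,
                     base_url: str = DEFAULT_BASE_URL,
                     path: str = DEFAULT_SEARCH_PATH) -> str:
    """Build a GET URL for a search request (no network access here)."""
    params = urlencode({"q": query, "start": start, "num": num})
    return f"{base_url.rstrip('/')}{path}?{params}"


if __name__ == "__main__":
    print(build_search_url("Fess AND search"))
```

The resulting URL can then be fetched with any HTTP client, for example urllib.request.urlopen, and the response parsed as JSON. Keeping the path as a parameter makes it easy to switch if your version exposes the search API at a different location.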

Open Source

It is open source, so you can modify it freely. The source can be downloaded from GitHub.

However, customization requires a certain level of knowledge about Fess.

(Reference) Recommended specifications for using Fess

There are no officially announced requirements, but Fess seems to need fairly high specs.

Below, for reference, is the information published by a company that offers Fess introduction services.


  • CPU 2 cores (4 cores or more recommended)
  • Memory 8GB (16GB or more recommended)
  • Hard disk 200GB (500GB or more recommended depending on data capacity)

Source: https://www.designet.co.jp/ossinfo/fess/support/

If you want to verify it quickly on a local machine, Docker is the fastest way.

Two issues found in this Fess survey

As a trial, I ran Fess locally on Docker, and the following two points appear to be issues.

Issue 1 | PowerPoint files do not appear in search results

The PowerPoint file used for testing is recognized and indexed by the crawl, but searching for text inside the file does not return it in the search results.

Since PowerPoint is a supported format, there are probably specific conditions under which a file is not retrieved, and this requires further investigation and verification.

Issue 2 | Settings tuning

For example, when a large number of files are searched, the server may go down unless the settings are tuned appropriately for the server specs.

In fact, when I searched a local directory containing several hundred files, it timed out partway through and hung.

Things to consider when introducing Fess

When introducing Fess, I think it is best to check and consider at least the following three points.

The first is to understand the total volume of the target data.

Without knowing this, you cannot decide how much disk capacity to prepare. It also changes how much settings tuning is necessary.

Next, clarify the types of data and files to be searched, and check whether they are all covered by what Fess supports.

The last is the environment of the target data.

For example, the settings change depending on the target: whether it is a web system or storage, and whether authentication such as AD (Active Directory) is involved.
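
The first two checks (total data volume and file types) can be roughed out with a small script before any Fess setup. The following is a minimal Python sketch under stated assumptions: the function name survey_directory is hypothetical, and the extension set is copied from the file-type list earlier in this post, which is incomplete (several entries end in "etc."), so an extension reported as unlisted only means it should be checked by hand, not that Fess cannot handle it.

```python
import os
from collections import Counter

# Extensions taken from the supported-file list earlier in this post.
# The list is illustrative and incomplete (several groups say "etc.").
SUPPORTED_EXTENSIONS = {
    "txt", "xml", "xhtml", "mm", "html", "htm",
    "doc", "xls", "ppt", "docx", "xlsx", "pptx",
    "pdf", "js", "c", "h", "java", "gz", "tar", "zip",
    "rtf", "epub", "mbox", "ai",
}


def survey_directory(root: str):
    """Return (total_bytes, counts per extension, counts of unlisted extensions)."""
    total_bytes = 0
    by_ext = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total_bytes += os.path.getsize(path)
            except OSError:
                continue  # skip unreadable or vanished files
            ext = os.path.splitext(name)[1].lstrip(".").lower()
            by_ext[ext or "(none)"] += 1
    unlisted = Counter({e: n for e, n in by_ext.items()
                        if e not in SUPPORTED_EXTENSIONS})
    return total_bytes, by_ext, unlisted


if __name__ == "__main__":
    total, by_ext, unlisted = survey_directory(".")
    print(f"total: {total / 1024 ** 2:.1f} MiB in {sum(by_ext.values())} files")
    print("not in the listed types:", dict(unlisted))
```

Running this against the file server share gives a starting figure for disk sizing and a list of file types to verify against the Fess documentation.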

[Summary] Investigation of open source full-text search server Fess

Having investigated Fess this time, my impression is that it is quite convenient if you want to easily add full-text search for internal file servers and web servers in an on-premises environment.

You do not have to build a GUI yourself.

However, I feel it is not very suitable if you need to customize the screens in detail.

Of course, it is open source and therefore customizable, but in the end I do not think that is a good idea, because it brings maintenance costs such as how to handle updates.

We will continue to introduce various IT tools and services, so please look forward to it.

"MieL" was launched with the aim of making the "connections" among regions, businesses, and people in Mie Prefecture visible in a tangible form. The site offers a variety of content useful for business and daily life, including information on gourmet food and stores in the prefecture, San-El's activities, and digital technology.
*Operated by Sun-L Corporation, Matsusaka City, Mie Prefecture.


© 2024 MieL