"Ecclesiastes" really shouldn't be matched by "Ecclesiastical"

1 ArlieS
Aug 16, 3:50 pm

Searching among "your books" makes some attempt to find similar words, particularly cases like "book" and "books", "swift" and "swiftly". That's not always what I want - I'd love to have an obvious button to request only _exact_ matches, and another to request substring matches (so "swi" would match "swift" and "swiftly"). This is especially true given my habit of searching only for "Titles/authors".

But today's silly matching is too good not to share. "Ecclesiastes" is a book of the Bible. "Ecclesiastical" is an adjective meaning something like "pertaining to churches" (as organizations, not buildings). Even with the presumed common root, I found this quite surprising. Meanwhile, "eccles" - which I'd have typed in other software search fields if I wanted both - doesn't match anything.

2 Taliesien
Edited: Aug 16, 4:17 pm

>1 ArlieS: "Even with the presumed common root, I found this quite surprising. Meanwhile, "eccles" - which I'd have typed in other software search fields if I wanted both - doesn't match anything."

It should not be surprising if you've read the search help page for Your books -> https://wiki.librarything.com/index.php/%22Your_books%22_Search (see the Stemming section)

If you want to search for eccles you would use a wildcard e.g. eccles* and can either search all fields by default or narrow the search to specific fields.
If you want to search for Ecclesiastes you would search for Eccles*es and not get the stemmed result Ecclesiastical.
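
As a rough illustration of why the two wildcard patterns above behave differently, here is a minimal Python sketch. It assumes the wildcard works like a shell-style glob, where `*` stands for any run of characters, and that matching ignores case; LibraryThing's actual wildcard handling may differ, and `wildcard_match` is a made-up helper name.

```python
from fnmatch import fnmatchcase

def wildcard_match(pattern, word):
    """Case-insensitive glob match; '*' stands for any run of characters.

    Assumption: this only imitates a glob-style wildcard, it is not
    LibraryThing's search code.
    """
    return fnmatchcase(word.lower(), pattern.lower())

titles = ["Ecclesiastes", "Ecclesiastical"]
for pattern in ("eccles*", "Eccles*es"):
    hits = [t for t in titles if wildcard_match(pattern, t)]
    print(pattern, "->", hits)
```

Under those assumptions, `eccles*` picks up both words, while `Eccles*es` only accepts words that end in "es", which rules out "Ecclesiastical".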

3 MarthaJeanne
Aug 16, 5:27 pm

The computer doesn't care about meaning. It cares about rules concerning strings of characters.

4 r.orrison
Aug 16, 6:05 pm

>3 MarthaJeanne: ChatGPT in a nutshell.

5 ArlieS
Aug 16, 8:42 pm

>3 MarthaJeanne: >4 r.orrison: Pretty strange rules, at least given some of their results.

>3 MarthaJeanne: Slightly more seriously - the explanation of "stemming" on the help page doesn't say enough. It led me to expect "cat" to match "cats," "expect" to match "expected", "expects", and "expecting", and most likely "man" not to match "men."

Also, I'm curious about the term "stemming". It's fairly common for searches to match in spite of slight grammatical transformations, but the term "stemming" was so new and unintuitive that I'd forgotten it when I tried to write the description at the start of this thread. It's presumably a term of art from a field I don't follow - but what field?

And while I'm curious, should I expect "cat" to match "cating" or perhaps "catting" (or vice versa)? I.e. has the stemming software got any clue how to distinguish nouns from verbs from adjectives from adverbs?

6 ljbryant
Aug 16, 9:40 pm

>5 ArlieS: Stemming is a linguistics term. See here: https://en.wikipedia.org/wiki/Stemming.

7 Felagund
Aug 17, 12:45 am

> And while I'm curious, should I expect "cat" to match "cating" or perhaps "catting" (or vice versa)? I.e. has the stemming software got any clue how to distinguish nouns from verbs from adjectives from adverbs?

There can be more or less sophisticated stemming algorithms in software. The ones used most frequently only take the spelling of a word into account, not its grammatical type or meaning. They basically just strip a few characters at the end of a word, which works well enough in many cases (the algorithm is based on the more regular word derivation rules of natural language) - but obviously there will be exceptions that fail spectacularly.
The online help doesn't say which exact algorithm is used in LT's search, but I think we can safely assume that it is a rather simple one. More sophisticated methods could make search slower, and ideally LT would need the same level of sophistication in all translated websites (https://fr.librarything.com , https://www.librarything.de/ , ...) - supporting that wouldn't be easy.
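
To make the "strip a few characters at the end" idea concrete, here is a deliberately crude Python sketch. The suffix list, the minimum stem length and the name `toy_stem` are all invented for illustration; this is not the algorithm LT uses (which the help page does not name), and real stemmers such as Porter's have many more rules.

```python
# Toy suffix list, checked in this order. Purely illustrative.
SUFFIXES = ("ical", "ing", "ly", "ed", "es", "s")

def toy_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)]
    return word

for w in ("Ecclesiastes", "Ecclesiastical", "eccles", "cats", "cating", "catting", "men"):
    print(w, "->", toy_stem(w))
```

Even this toy reduces "Ecclesiastes" and "Ecclesiastical" to the same stem, "ecclesiast", so a search that compares stems treats them as one word. It also turns "eccles" into "eccl", which equals neither stem, and it has no notion of nouns versus verbs: "cating" collapses to "cat" simply because of its spelling, while "catting" and "men" do not.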

8 MarthaJeanne
Edited: Aug 17, 1:33 am

>7 Felagund: The current algorithm does not distinguish between languages, which can sometimes bring up strange results. (I'm not sure how it could considering that some of us have very multilingual libraries.)

9 ArlieS
Aug 17, 2:06 am

>7 Felagund: That Wikipedia article was very interesting, and explains some of the weirder (to me) results I've seen from various search interfaces, not primarily on LibraryThing.

I think I much prefer my tech tools to behave predictably, and to rely on the user to add sophistication when needed, rather than oscillating between amazing insight and amazing misunderstanding.

But I suspect I'm in somewhat of a minority with that preference. Most people seem far more impressed than I am by "works sometimes" technology.

10 thorold
Aug 17, 4:14 am

>9 ArlieS: I find auto-stemming a nuisance too, since I used to do searches for a living in the good old days when everything was exact and predictable, but I can still appreciate that LT needs to be easy to use for those who are used to everything working like Google. It’s right to put the burden of extra complexity on the more demanding user, who is (or should be) competent to find workarounds. If they can’t work out how to find “Eccles”, let them eat cake…

11 bnielsen
Aug 17, 5:50 am

>10 thorold: I mostly have the best of two worlds. I export my LT library and turn the file into a database so I can use whatever exact or inexact version of search tool. Some of the time that means that my "local" search has to give me a search string for LT consisting of Book_Id's with OR between, but that's easy to live with.
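
For anyone curious what such a local workflow could look like, here is a small Python sketch using an in-memory SQLite database. The table layout, the sample rows and the helper names are hypothetical stand-ins, not the real LT export format, and the exact search syntax LT accepts for ids joined with OR is not shown in this thread.

```python
import sqlite3

# Hypothetical rows standing in for an exported library: (book_id, title).
rows = [
    (101, "Ecclesiastes"),
    (102, "Ecclesiastical History of the English People"),
    (103, "A Swiftly Tilting Planet"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO books VALUES (?, ?)", rows)

def substring_ids(fragment):
    """Return ids of books whose title contains the fragment (no stemming)."""
    cur = conn.execute(
        "SELECT book_id FROM books "
        "WHERE instr(lower(title), lower(?)) > 0 ORDER BY book_id",
        (fragment,),
    )
    return [row[0] for row in cur]

def or_query(ids):
    """Join book ids with OR, as a search string to paste elsewhere."""
    return " OR ".join(str(i) for i in ids)

print(or_query(substring_ids("eccles")))
```

Here the plain substring "eccles" finds both titles locally, and the ids are then joined into a single OR string.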

I think LT uses ElasticSearch. https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
Please correct me if I'm wrong :-)

12 anglemark
Aug 17, 6:36 am

To add to what has been said above, search algorithms that are generous and do not require exact (or logical) matches are vital for dyslexics.

13 SandraArdnas
Aug 17, 6:47 am

Putting the term in quotation marks looks for the exact match, so should avoid stemming

14 MarthaJeanne
Aug 17, 6:55 am

>13 SandraArdnas: That is a very useful tip for the times when the results are too long to be helpful.

15 ArlieS
Edited: Aug 17, 12:33 pm

>12 anglemark: I wouldn't expect stemming to help dyslexics as much as algorithms that look for homonyms. (Or more correctly, anything that might sound alike; it doesn't have to be a real word.) Those have been around for a while; the first one I encountered was part of the front end for a phone directory - IIRC it was the employee phone directory for HP, back in the 1990s when HP was huge.

There's also room for algorithms that do fuzzy matching, as with the venerable agrep search program for Unix (https://en.wikipedia.org/wiki/Agrep). You tell it how many missing or non-matching errors to ignore, per potential match, and it broadens its search accordingly. This might be even better for dyslexics.
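
A minimal Python sketch of the "allow up to k errors" idea follows. It uses a plain dynamic-programming edit distance over whole words, which is an assumption made for brevity; agrep itself uses a different algorithm and matches within lines of text. The function names are invented for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

def fuzzy_find(query, words, max_errors=1):
    """Return the words within max_errors edits of the query, ignoring case."""
    q = query.lower()
    return [w for w in words if edit_distance(q, w.lower()) <= max_errors]

words = ["Ecclesiastes", "Ecclesiastical", "Exodus"]
print(fuzzy_find("Eclesiastes", words, max_errors=1))
```

With one error allowed, the misspelling "Eclesiastes" still finds "Ecclesiastes" but not "Ecclesiastical", and raising `max_errors` widens the net in a predictable way.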

16 ArlieS
Aug 17, 12:39 pm

>10 thorold: <vent> I wish Google would at least document how their search engine works. My experience is that it's simply got less and less useful over time. There *may* be a way to get it to give me what I want, particularly in cases where what I want is not the most popular vaguely relevant thing. But Google changes their behaviour routinely and without notice, as well as not documenting anything about it. (Or maybe they do document it, but not in a way that this user can find.) </vent>

17 Keeline
Aug 17, 12:43 pm

I don't think LT wants to spend its time developing book-specific stemming rules. Most likely they are using what is available on the search system they are using. I agree that ElasticSearch is likely.

I usually use quotes with my searches and if this limits the stemming, perhaps that is why I have not had as many issues with it. The wildcard suggestion is an interesting technique to get around the stemming and have the word ending you desire. I will keep that in mind. Every system has its own quirks and shares some basic concepts with others.

At least the search is much better than what I am finding with Amazon and Facebook where you can enter four or five words in a quoted phrase and they will treat it as if you have not used quotes at all. Having one of the words (or what they consider to be a synonym) will be enough to display a result. It is vexing at times.

On the Mac I find that I have to use quotes sometimes and not others. In the Finder, if I want to search a phrase inside of files, I need to use quotes. But if I open something like a PDF in Preview or any Microsoft product, then I don't want the quotes to treat something as a phrase. It is tedious since the search field is populated (sometimes) with my phrase with quotes from the Finder search. I have to remove the quotes to make it work.

With LT and LoC, I often found I had to remove apostrophes in the titles. Something like "Anne's House of Dreams" would not work. But if I removed the apostrophe, giving "Annes House of Dreams", it would work. But who thinks to do that?

Search is hard. Not everyone is as good as Google on this. Many try. Most fall short.

James

18 paradoxosalpha
Aug 17, 3:14 pm

>16 ArlieS:

I assume the opacity of Google's search standards is by design. They have certainly changed over time, with results becoming steadily less useful to me over the last decade.

19 timspalding
Aug 17, 3:43 pm

>4 r.orrison:

No ChatGPT involved. ChatGPT could probably distinguish them. This is just "stemming" as others have noticed. It's a hard problem. If we didn't stem, people would complain and the site would be less useful. And we can't be in the business of rewriting a crazy-common search program for a niche need.

20 r.orrison
Aug 17, 4:12 pm

>19 timspalding: I wasn't claiming it was, and I don't think anyone took it that way. That was just an aside; >3 MarthaJeanne: is as near-perfect a short simple and understandable description of the problems of ChatGPT as "AI" as I've seen. Even though that wasn't the intention.

21 paradoxosalpha
Aug 17, 4:33 pm

>20 r.orrison: as near-perfect a short simple and understandable description of the problems of ChatGPT as "AI"

Just substitute "words" for "letters," and you've got your chat generator. And give it a nice "DO U?" jacket like Melania Trump.