search command - specify branch

(1) By sean (jungleboogie) on 2022-01-26 17:16:36

Hi,

This SQLite post talks about fossil search, which I didn't know was a command!

Couple questions...

This looks like it searches all check-in timeline commits, regardless of their branch. Is that right?
When the results are shown, it doesn't say which branch it was found on...

$ fossil search diff | more
=== 2022-01-23 ===
20:11:50 [1cb182ac18] Diff algorithm is slightly faster and does a better job of dealing with indentation changes in code. See [forum:/forumpost/7631656a2823338a|forum thread 7631656a2823338a].
19:57:44 [8cd73dda3d] Add a heuristic to the diff generator that helps it do a better job of identifying differences in C code that result from a change in indentation level.
11:29:31 [9aaefcfd0a] Additional alignment debugging information output for "fossil diff --debug".
00:31:52 [fbdbc09b40] Approximately a 5x performance increase for diff with the -w (ignore whitespace) option.

Is there an existing command to search the timeline and show the branch each commit happened on? If not, it might be helpful for the search command to allow filtering by branch and to show which branch the search string appeared on in the results.

The fossil timeline search in the web UI gives you the user and the branch in most cases.
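Until the search command can filter by branch, one possible workaround is to query the repository directly, since a Fossil repository is an SQLite database. The sketch below assumes the `event`, `blob`, `tag`, and `tagxref` tables of the Fossil repository schema (check-ins stored as `event.type = 'ci'`, the branch name stored as the value of the `branch` tag, times stored as Julian day numbers); please verify these against your Fossil version. It uses a plain `LIKE` substring match, which is a simplification and not the ranking that `fossil search` itself uses. The helper name `search_checkins` is hypothetical.

```python
import sqlite3
from typing import Optional

# Assumed subset of the Fossil repository schema: check-ins live in `event`
# (type 'ci'), full hashes in `blob.uuid`, and the branch name is the value
# of the "branch" tag recorded in `tagxref`.
QUERY = """
SELECT datetime(e.mtime)               AS when_utc,
       substr(b.uuid, 1, 10)           AS short_hash,
       tx.value                        AS branch,
       coalesce(e.ecomment, e.comment) AS comment
FROM event AS e
JOIN blob AS b ON b.rid = e.objid
LEFT JOIN tagxref AS tx
       ON tx.rid = e.objid
      AND tx.tagtype > 0
      AND tx.tagid = (SELECT tagid FROM tag WHERE tagname = 'branch')
WHERE e.type = 'ci'
  AND coalesce(e.ecomment, e.comment) LIKE '%' || :term || '%'
  AND (:branch IS NULL OR tx.value = :branch)
ORDER BY e.mtime DESC
"""


def search_checkins(conn: sqlite3.Connection, term: str,
                    branch: Optional[str] = None) -> list[tuple]:
    """Hypothetical helper: return (time, short hash, branch, comment) rows
    whose check-in comment contains `term`, newest first, optionally
    restricted to a single branch."""
    return conn.execute(QUERY, {"term": term, "branch": branch}).fetchall()
```

For example, `search_checkins(sqlite3.connect("file:repo.fossil?mode=ro", uri=True), "diff", "trunk")` opens the repository read-only and returns only trunk check-ins mentioning "diff", each with its branch name alongside.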

(2) By Martin Vahi (martin_vahi) on 2022-01-28 07:34:25 in reply to 1

Just a few quick, hastily written comments, not about Fossil, but about the text search task in general.

Namely, I've been interested in that topic for years in the context of freedom-of-speech enforcement software (think of Tor, ZeroNet, IPFS, I2P, Beaker, etc.) and freedom of obtaining information for learning (whatever the learning topic, including political issues, math, etc.). In that context I've come up with the following observations and, as bad as it sounds, guesses:

x) There's a nice free book available titled "Search Engines: Information Retrieval in Practice".

x) Meaning depends on context. If a search engine receives just one word, "porn", as its search query, then which list of search results counts as relevant depends on who wants the results and why. For example, a psychologist conducting research is probably not looking for porn videos, but for scientific articles about the relationship between sexuality, porn, and possibly other psychology-related subjects. The phrase "climate change" should return references to more thorough, more detailed explanations if a climate scientist is using the search engine, and very superficial explanations if a 7-year-old child is trying to find out what the "climate change fuss" is all about.

x) The amount of information it takes to describe the context of a search can be quite large. It would make sense to have something like classic web browser bookmarks, but for search contexts. In the case of Fossil repositories, the branch name/ID is one piece of information that describes the context of the search.

x) An index of some piece of information can have a larger data volume than the information itself. For example, a haiku, which is a very short poem, might have an index that lists the sources that use or reference the poem, and that list of sources can easily be bigger than the poem itself.

x) The amount of HDD space and compute power that people own at home is probably far larger than whatever the 2022 versions of Google and Amazon have in their data centers. That means that, in theory, distributed P2P search engines can outperform Google and Bing. There have been multiple attempts to create such P2P search engines, both closed source and open source, but unlike P2P social networks, which connect with each other over common protocols, the different P2P search engines do not exchange data, so one P2P search engine cannot benefit from the index/data of another.

x) In order to create an index of some text, the text has to be locally available. It sounds trivial, but search engines actually need to download a web page in order to index it. Combining that fact with the efforts to keep multiple copies of "forbidden material" (whatever that happens to be at a given place and time, e.g. "democracy materials" in 2022 China) stored so that the "forbidden material" cannot be totally destroyed, it makes sense for P2P search engine nodes to index only locally stored data collections and also to serve that data (for example, PDF files) as P2P storage nodes.
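The arrangement argued for here, where a node indexes only what it stores locally and can therefore serve every document its index points to, can be illustrated with a minimal in-memory inverted index. `LocalIndexNode` and its methods are hypothetical names; a real P2P node would also need persistent storage, ranking, and a network protocol, none of which this sketch attempts.

```python
import re
from collections import defaultdict


class LocalIndexNode:
    """Hypothetical P2P node: it indexes only documents it holds locally,
    so every search hit can also be served from its own storage."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}                  # doc_id -> local copy of the text
        self._index: defaultdict[str, set[str]] = defaultdict(set)  # word -> doc_ids

    def store(self, doc_id: str, text: str) -> None:
        # Indexing needs the text to be present locally, so storing and
        # indexing happen in the same step.
        self._docs[doc_id] = text
        for word in re.findall(r"\w+", text.lower()):
            self._index[word].add(doc_id)

    def search(self, query: str) -> list[str]:
        # Return ids of local documents that contain every query word.
        words = re.findall(r"\w+", query.lower())
        if not words:
            return []
        hits = set.intersection(*(self._index.get(w, set()) for w in words))
        return sorted(hits)

    def serve(self, doc_id: str) -> str:
        # Any id returned by search() refers to a locally stored document.
        return self._docs[doc_id]
```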

x) The problems that emerge with technical documentation collections are very much the same as with "forbidden material", even if the document collection is totally "legal" (according to the supermafia/state at that given place and time), especially given the trend that owners of P2P nodes mostly store material that they themselves find interesting. HDD/SSD space actually costs quite a lot, and there isn't much of it in modern laptops. On top of that, P2P systems have a nasty HDD/SSD usage pattern that slows down the computer system/laptop as a whole, even though network traffic, CPU time, and RAM consumption are not an issue. Basically, any P2P storage system that serves others should run on a dedicated machine, not on the main computer that the end user directly uses. It's OK for that computer to be a Raspberry Pi, as long as it has its own dedicated storage device (a USB HDD, a memory stick, etc.). An "average Joe" does not invest in such a system, even though it could be built relatively cheaply from old, discarded computing equipment that is otherwise not even powered on.

So much for my hasty thoughts right now, but the main takeaway is the idea of web-browser-bookmark-style context bookmarks, with the difference that a set of non-conflicting context bookmarks could/should be usable as a new context bookmark. A person would then first describe the context of the search, or select it from a context-bookmarks menu. In the case of Fossil, the context might be: branch name/ID, committer ID, etc.
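The context-bookmark idea, where non-conflicting bookmarks combine into a new one that then restricts a search, can be sketched as a small data structure. `SearchContext`, `combine`, and `search` are hypothetical names; the two fields follow the examples given above (branch and committer), and matching is a plain case-insensitive substring test rather than any real search engine's ranking.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class SearchContext:
    """Hypothetical context bookmark: each set field narrows the search,
    and None means 'any value'."""
    branch: Optional[str] = None
    user: Optional[str] = None

    def combine(self, other: "SearchContext") -> "SearchContext":
        # Two bookmarks merge only if no field is set to different values
        # in both; otherwise they conflict.
        merged = {}
        for f in fields(self):
            a, b = getattr(self, f.name), getattr(other, f.name)
            if a is not None and b is not None and a != b:
                raise ValueError(f"conflicting {f.name}: {a!r} vs {b!r}")
            merged[f.name] = a if a is not None else b
        return SearchContext(**merged)

    def matches(self, entry: dict) -> bool:
        # An entry matches when every field set in the context agrees with it.
        return all(getattr(self, f.name) is None
                   or entry.get(f.name) == getattr(self, f.name)
                   for f in fields(self))


def search(entries: list[dict], term: str, ctx: SearchContext) -> list[dict]:
    """Substring search over check-in entries, restricted by a context."""
    return [e for e in entries
            if term.lower() in e["comment"].lower() and ctx.matches(e)]
```

For example, `SearchContext(branch="trunk").combine(SearchContext(user="alice"))` yields a bookmark for "trunk check-ins by alice", while combining two bookmarks with different branches raises an error instead of silently picking one.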

Thank you for reading my hasty comment.