40 Followers
27 Following
56 Posts
I'm a computer science student at the University of Massachusetts and the founder of the Hoxly Corporation.
Website: https://www.kylepiira.com

So my university has shut down the campus for the remainder of the semester due to Coronavirus concerns and asked all students to attend classes remotely (mainly using Zoom for live-streaming lectures). I went looking for an open source, cross-platform video conferencing solution with a fast onboarding process and found Jitsi to fit the bill.

It’s free, it’s FOSS, and there are no accounts required to create a chat session on their website. You just need to enter a name for your room, and they give you a link to share for people to join.

The only officially supported web browser is Google Chrome, which kinda sucks. But it seems to work okay in Firefox, except I couldn’t get it to detect any of my microphones (your mileage may vary). Instead, I’m using it in Falkon and it works flawlessly.

Unfortunately, it also doesn’t appear that video chats are end-to-end encrypted which means whoever runs the server can see the raw footage (but you can self-host).

Overall it’s good enough and it looks like the public service is hosted by 8×8, which is a public VoIP company, so I’m not overly concerned about eavesdropping (due to the lack of end-to-end encryption). I’ll keep an eye out for better options but for now I’m sticking with Jitsi.

https://www.kylepiira.com/2020/03/15/jitsi-open-source-video-chat/


Today, I tried out KDE Neon on my PinePhone “Brave Heart” and recorded the following video.

Here is a summary of some of the default apps:

  • Buho – the default note taking app. Notes can be tagged by color, keyword, and organized into “books”. It can also save URLs.
  • Discover – the same KDE software center available on the desktop.
  • Index – the file manager which draws inspiration from Dolphin.
  • KDE Connect – sync your Plasma Mobile phone with your Plasma Desktop.
  • Koko – the photo gallery and viewer. Has some issues with thumbnails.
  • Konsole – the same KDE terminal emulator available on the desktop.
  • Okular – the PDF reader for Plasma Mobile. It’s a different application from Okular for Plasma Desktop.
  • Phone Book – stores your contacts’ phone numbers, emails, etc.
  • Settings – settings app for Plasma Mobile which is currently missing some categories (ex: battery).
  • Wave – the default music player, which doesn’t have any sound right now.
  • Phone – the dialer app for calling numbers and contacts.
  • Angelfish – the default web browser which has support for tabs, history, bookmarks, etc.
  • Calindori – the default calendar app but I couldn’t figure out how to add events.

https://www.kylepiira.com/2020/03/03/plasma-mobile-on-the-pinephone/


@PINE64 PinePhone in the mail today.
Elisa (KDE music player) is now on the Windows Store! https://www.microsoft.com/en-us/p/elisa/9pb5md7zh8tl
Get Elisa - Microsoft Store

A modern and beautiful music player made with love by KDE.

I was recently wondering which of the popular web search engines provided the best results and decided to try to design an objective benchmark for evaluating them. My hypothesis was that Google would score the best, followed by StartPage (a Google aggregator), and then Bing and its aggregators.

Usually when evaluating search engine performance there are two methods I’ve seen used:

  • Have humans search for things and rate the results
  • Create a dataset of mappings between queries and “ideal” result URLs

The problem with having humans rate search results is that it is expensive and the results are hard to replicate. Creating a dataset of “correct” webpages to return for each query solves the repeatability problem but is also expensive upfront and depends on the subjective biases of whoever creates the dataset.

Instead of using either of those methods I decided to evaluate the search engines on the specific task of answering factual questions from humans asked in natural language. Each engine is scored by how many of its top 10 results contain the correct answer.

Although this approach is not very effective at evaluating the quality of a single query, I believe that in aggregate, over thousands of queries, it should provide a reasonable estimate of how well each engine can answer users’ questions.

To source the factoid questions, I use the Stanford Question Answering Dataset (SQuAD) which is a popular natural language dataset containing 100k factual questions and answers from Wikipedia collected by Mechanical Turk workers.

Here are some sample questions from the dataset:

Q: How did the black death make it to the Mediterranean and Europe?

A: merchant ships

Q: What is the largest city of Poland?

A: Warsaw

Q: In 1755 what fort did British capture?

A: Fort Beauséjour

Some of the questions in the dataset are also rather ambiguous such as the one below:

Q: What order did British make of French?

A: expulsion of the Acadian

This is because the dataset is designed to train question answering models that have access to the context containing the answer. In the case of SQuAD, each Q/A pair comes with the paragraph from Wikipedia that contains the answer.

However, I don’t believe this is a huge problem since most likely all search engines will perform poorly on those types of questions and no individual one will be put at a disadvantage.
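SQuAD ships as a single JSON file with Q/A pairs nested under paragraphs, which are nested under articles. As a reference point, a minimal loader that flattens it into (question, answer) pairs could look like this (a sketch against the SQuAD v1.1 schema; the function name is mine):

```python
import json

def load_squad_questions(path):
    """Flatten a SQuAD v1.1 JSON file into (question, answer) pairs,
    taking the first annotated answer for each question."""
    with open(path) as f:
        squad = json.load(f)
    pairs = []
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                if qa["answers"]:
                    pairs.append((qa["question"], qa["answers"][0]["text"]))
    return pairs
```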

Collecting data

To get the results from each search engine, I wrote a Python script that connects to Firefox via Selenium and performs searches just like regular users via the browser.

The first 10 results are extracted using CSS rules specific to each search engine and then those links are downloaded using the requests library. To check if a particular result is a “match” or not we simply perform an exact match search of the page source code for the correct answer (both normalized to lowercase).

Again this is not a perfect way of determining whether any single page really answers a query, but in aggregate it should provide a good estimate.
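Concretely, the check reduces to a case-insensitive substring test over each downloaded page, and the per-query score is the number of top-10 pages that match. A minimal sketch (function names are mine):

```python
def page_matches(page_source: str, answer: str) -> bool:
    """Exact substring match of the answer in the page source,
    both normalized to lowercase."""
    return answer.lower() in page_source.lower()

def score_query(pages: list[str], answer: str) -> int:
    """Score for one query: how many of the top 10 result pages
    contain the correct answer."""
    return sum(page_matches(page, answer) for page in pages[:10])
```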

Some search engines are harder to scrape due to rate limiting. The most aggressive rate limiters were: Qwant, Yandex, and Gigablast. They often blocked me after just two queries (on a new IP) and thus there are fewer results available for those engines. Also, Cliqz, Lycos, Yahoo!, and YaCy were all added mid experiment, so they have fewer results too.

I scraped results for about 2 weeks and collected about 3k queries for most engines. Below is a graph of the number of queries that were scraped from each search engine.

Crunching the numbers

Now that the data is collected there are lots of ways to analyze it. For each query we have the number of matching documents, and for the latter half of the queries we also saved the list of result links.

The first thing I decided to do was see which search engine had the highest average number of matching documents.
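That per-engine score is just the mean of the per-query match counts; sketched out (names are mine):

```python
from collections import defaultdict

def average_matches(records):
    """records: iterable of (engine, match_count) pairs, one per query.
    Returns a mapping of engine -> mean number of matching documents."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for engine, matches in records:
        totals[engine] += matches
        counts[engine] += 1
    return {engine: totals[engine] / counts[engine] for engine in totals}
```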

Much to my surprise, Google actually came in second to Ecosia. I was rather shocked by this, since Ecosia’s selling point is that they plant trees with their ad revenue, not that they beat Google at search quality.

Also surprising is the number of Bing aggregators (Ecosia, DuckDuckGo, Yahoo!) that all came in ahead of Bing itself. One reason may be that those engines each apply their own ranking on top of the results returned by Bing and some claim to also search other sources.

Below is a chart with the exact scores of each search engine.

Search Engine | Score             | Count
Ecosia        | 2.82087177855552  | 3143
Google        | 2.65397815912636  | 3205
DuckDuckGo    | 2.58377701221422  | 3193
StartPage     | 2.55723270440252  | 3180
Yahoo!        | 2.51220442410374  | 2622
Bing          | 2.4809375         | 3200
Qwant         | 2.32365747460087  | 689
Yandex        | 1.92651933701657  | 1810
Gigablast     | 1.51381215469613  | 905
Cliqz         | 1.39724137931034  | 2900
Lycos         | 1.20962678758284  | 2867
YaCy          | 0.898050365556458 | 2462

To further understand why the Bing aggregators performed so well, I wanted to check how much of their own ranking was being used. I computed the average Levenshtein distance between each pair of search engines, i.e. the minimum number of single-result edits (insertions, deletions, or substitutions) required to change one results page into the other.
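Since the symbols here are whole result URLs rather than characters, the distance is computed over the two engines’ ordered lists of top results. A sketch of the standard single-row DP (function name is mine):

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (here, lists of
    result URLs), using a single rolling DP row."""
    n = len(b)
    dp = list(range(n + 1))            # distances from the empty prefix of a
    for i in range(1, len(a) + 1):
        prev = dp[0]                   # dp[i-1][0]
        dp[0] = i
        for j in range(1, n + 1):
            cur = dp[j]                # dp[i-1][j]
            dp[j] = min(dp[j] + 1,                      # delete a[i-1]
                        dp[j - 1] + 1,                  # insert b[j-1]
                        prev + (a[i - 1] != b[j - 1]))  # substitute
            prev = cur
    return dp[n]
```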

Edit distance matrix of different search results

Of the three, Ecosia was the most different from pure Bing with an average edit distance of 8. DuckDuckGo was the second most different with edit distance of 7, followed by Yahoo! with a distance of 5.

Interestingly the edit distances of Ecosia, DuckDuckGo, and Yahoo! seem to correlate well with their overall rankings where Ecosia came in 1st, DuckDuckGo 3rd, and Yahoo! 5th. This would indicate that whatever modifications these engines have made to the default Bing ranking do indeed improve search result quality.

Closing thoughts

This was a pretty fun little experiment to do, and I am happy to see some different results from what I expected. I am making all the collected data and scripts available for anyone who wants to do their own analysis.

This study does not account for features besides search result quality, such as instant answers, bangs, privacy, etc., and thus it doesn’t really show which search engine is “best”, just which one provides the best results for factoid questions.

I plan to continue using DuckDuckGo as my primary search engine despite it coming in 3rd place. The results of the top 6 search engines are all pretty close, so I would expect the experience across them to be similar.

https://www.kylepiira.com/2020/02/07/which-search-engine-has-the-best-results/

The Stanford Question Answering Dataset

Stanford Question Answering Dataset (SQuAD) is a new reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage. With 100,000+ question-answer pairs on 500+ articles, SQuAD is significantly larger than previous reading comprehension datasets.

Help us build the next generation of AppCenter—and get sweet rewards for backing! We’re planning a week-long in-person development sprint. Learn what we’ll be up to, and why. #AppCenterForEveryone https://www.indiegogo.com/projects/appcenter-for-everyone/
AppCenter for Everyone

Taking the indie, open source app store to the next level | Check out 'AppCenter for Everyone' on Indiegogo.

A recent article on a big tech news site included this phrase:

"[...] Linux phones like the PinePhone, [...]are full of closed-source firmware from non-open components"

We'd like to set the record straight: the #PinePhone has two blobs, neither of which runs on the main SoC. One is loaded onto the WiFi/BT module; the other is enclosed within the cell modem. In the modern world of tech, both blobs are unavoidable.

For an overview from someone with deep knowledge of both the PinePhone and Librem 5: https://tuxphones.com/yet-another-librem-5-and-pinephone-linux-smartphone-comparison/

Yet Another Librem 5 and PinePhone comparison

Let's start off with mentioning that both these new phones are great steps forward for Linux. While they will probably not beat Android and iOS in popularity, they will at least give Linux power users a device that can be called a Linux phone instead of the usual "technically it's

TuxPhones - Linux phones, tablets and portable devices
Got news today that my @PINE64 PinePhone has shipped.
#postmarketOS now boots on Purism Librem 5 Phone! #Linux

Elisa is a new KDE music player that's being worked on and I really like the look of it.

#kde #linux #foss

https://community.kde.org/Elisa

Elisa - KDE Community Wiki