This post explains how to scrape a PDF: https://norvilis.com/not-just-html-scraping-data-from-messy-pdfs/ Another thing that I didn't know, hahaha
#ruby #pdf #scrapping
Not Just HTML: Scraping Data from Messy PDFs

The “Final Boss” of Data Extraction

You’ve mastered Nokogiri. You can navigate a nested <div> structure with your eyes closed. But then, your client or your boss sends you a link to a folder containing 500 PDF invoices, reports, or government filings.

DevBlog by Zil Norvilis
Alberta’s premier consulting on scrapping clock changes, prefers more light at night
Danielle Smith says the Alberta government will consult on doing away with twice-a-year clock changes and says if a switch is made, she'd prefer going to permanent daylight time.
#Canada #AlbertaLegislature #DaylightSavingTime
https://globalnews.ca/news/11718249/alberta-time-change-consultations/

Current status: In "Energy Saving Mode" until further notice. 💤
If you’re the CEO of napping, this Kyoky tee was made for you. No alarms allowed.

Shop the "Too Tired" Grey Cat Tee. Link in bio. 🐾
.
https://www.redbubble.com/shop/ap/178646309
.
#Love, #Cat, #Heart, #Animal, #Animals, #Red, #Cute, #Cats, #Orange, #Adorable, #CuteCats, #Flying, #Angel, #Fanart, #Charm, #Scrapping, #Scrap

Kelly Evans: Goodbye, Google – CNBC

Kelly Evans, Co-Host of CNBC’s Power Lunch. David A. Grogan | CNBC

The Exchange

Kelly Evans: Goodbye, Google

Published Wed, Jan 7 2026, 9:51 AM EST

Kelly Evans @KellyCNBC

For the first time yesterday, when I went to Google (which I do less and less of anymore), it asked me if I wanted to switch over to AI mode. I figured sure, since I’ve been using its AI summaries anyhow, and it put me into what looked like a full-blown version of Gemini or ChatGPT. No ads. No blue links. 

What this tells me is that the era of Google as we knew it is officially over. And I, for one, certainly do not mourn that. As great as the product was when it was first introduced, that is how bad it had become towards the end. Good luck finding any really useful links buried beneath all of their ads. Smaller businesses were justifiably furious at having to pay to compete up top against the deep-pocketed big guys for traffic they were rightly owed.

The search engine, in other words, had become nowhere near as effective as it used to be. And while regulators drool at the opportunity to jump in and set rules and prosecute offenders and collect big fines, it’s far better for society that monopolies are disrupted because they become less useful and leave an opening for better products to break through. 

Enter ChatGPT. 

Now, the irony here is twofold. One, after a few early missteps, Google has answered ChatGPT with its own excellent chatbot, Gemini, that we are using more and more of in our house. (If you want a chuckle, ask it to give you this job/personality test.) Shares of parent company Alphabet soared 65% last year, for the best performance of all the “Magnificent 7.” 

So while it’s goodbye to Google as a search box, the company itself has pivoted nicely, and regulators can at least breathe somewhat easy that they have a formidable rival now, with ChatGPT’s 900 million active weekly users. 

Secondly, the real question is what happens to the entire internet ecosystem that once relied upon Google search traffic. I’m thinking of recipe bloggers, websites like Vice and Buzzfeed that once had hundreds of employees, and so forth. I’m not sure how much of the former internet economy realizes it’s never coming back, and they will have to dream up entirely new business models. 

I also expect the lawyers are salivating. The class action suits against chatbots that scraped the internet of content that people once made a living off of must surely be coming apace. News providers can survive and even thrive in a world where chatbots have to pay to maintain access to up-to-the-minute information for users, but someone who gave the internet their maple-glazed salmon recipe? Forget about it. 

Continue/Read Original Article Here: Kelly Evans: Goodbye, Google

#AI #artificialIntelligence #BusinessModels #ChatGPT #CNBC #Gemini #GoogleGoodbye #KellyEvans #Scrapping #SearchEngines #SearchInternet #SEO #TooManyAds #WebSites

The New York Times sues Perplexity for producing ‘verbatim’ copies of its work – The Verge

Credit: NYT Times, gettyimages-2249036304

The New York Times sues Perplexity for producing ‘verbatim’ copies of its work

The NYT alleges Perplexity ‘unlawfully crawls, scrapes, copies, and distributes’ work from its website.

by Emma Roth, Dec 5, 2025, 7:42 AM PS

Emma Roth is a news writer who covers the streaming wars, consumer tech, crypto, social media, and much more. Previously, she was a writer and editor at MUO.

The New York Times has escalated its legal battle against the AI startup Perplexity, as it’s now suing the AI “answer engine” for allegedly producing and profiting from responses that are “verbatim or substantially similar copies” of the publication’s work.

The lawsuit, filed in a New York federal court on Friday, claims Perplexity “unlawfully crawls, scrapes, copies, and distributes” content from the NYT. It comes after the outlet’s repeated demands for Perplexity to stop using content from its website, as the NYT sent cease-and-desist notices to the AI startup last year and most recently in July, according to the lawsuit. The Chicago Tribune also filed a copyright lawsuit against Perplexity on Thursday.

The New York Times sued OpenAI for copyright infringement in December 2023, and later inked a deal with Amazon, bringing its content to products like Alexa.

Perplexity became the subject of several lawsuits after reporting from Forbes and Wired revealed that the startup had been skirting websites’ paywalls to provide AI-generated summaries — and in some cases, copies — of their work. The NYT makes similar accusations in its lawsuit, stating that Perplexity’s crawlers “have intentionally ignored or evaded technical content protection measures,” such as the robots.txt file, which indicates the parts of a website crawlers can access.

Perplexity attempted to smooth things over by launching a program to share ad revenue with publishers last year, which it later expanded to include its Comet web browser in August.

“By copying The Times’s copyrighted content and creating substitutive output derived from its works, obviating the need for users to visit The Times’s website or purchase its newspaper, Perplexity is misappropriating substantial subscription, advertising, licensing, and affiliate revenue opportunities that belong rightfully and exclusively to The Times,” the lawsuit states.

Continue/Read Original Article Here: The New York Times sues Perplexity for producing ‘verbatim’ copies of its work | The Verge

#AI #artificialIntelligence #Copyright #Crawlers #Distribution #Lawsuit #NYTWork #OpenAI #Perplexity #RobotsTxt #Scrapping #Sues #TheNewYorkTimes #TheVerge #VerbatimCopies