From b1cd4d982fd589bb2fb1ab34effd6ba2700a0d2c Mon Sep 17 00:00:00 2001
From: Aidan Embling
Date: Sun, 9 Feb 2025 20:13:30 +0300
Subject: [PATCH] Add 'Simon Willison's Weblog'

---
 Simon-Willison%27s-Weblog.md | 42 ++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)
 create mode 100644 Simon-Willison%27s-Weblog.md

diff --git a/Simon-Willison%27s-Weblog.md b/Simon-Willison%27s-Weblog.md
new file mode 100644
index 0000000..53c4ddd
--- /dev/null
+++ b/Simon-Willison%27s-Weblog.md
@@ -0,0 +1,42 @@
That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a whole lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the capacity to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.
## Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:
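Something like this works, since Ollama can pull GGUF models straight from Hugging Face. (The Q8_0 quantization tag here is my guess, chosen because it lines up with the roughly 8.5GB download described below.)

```
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```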
This fetches around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
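That log is easy to dig back into later; something like this should print the most recent prompt and response (the `-n` option for limiting the number of entries is available in recent LLM releases):

```
llm logs -n 1
```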
I installed the plugin and ran it like this:
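A sketch of those two steps, reusing the same Hugging Face model ID (and assumed Q8_0 tag) that Ollama fetched above:

```
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```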
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're usually found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:

The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:

- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such a fascinating insight into how these new models work.
This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
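The command follows the same Unsloth GGUF pattern as the 8B model; the Q3_K_M quantization tag is my guess, picked because it matches the 34GB size:

```
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```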
## Can it draw a pelican?

I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:
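Run through LLM, the prompt looks something like this (shown here against the 8B model ID from earlier, purely as an illustration):

```
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'Generate an SVG of a pelican riding a bicycle'
```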
It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was much more interesting than the end result.
## Other ways to try DeepSeek-R1

If you want to try the model out without installing anything at all you can do so using chat.deepseek.com. You'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then pick the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
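An entry along these lines should work. The field names follow LLM's extra-openai-models.yaml format; the api_base URL is DeepSeek's documented endpoint, and api_key_name matches the key set in the next step:

```
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```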
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
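In full, that's:

```
llm keys set deepseek
# paste in your API key when prompted, then:
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```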
This won't show you the reasoning tokens, unfortunately. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.
\ No newline at end of file