Looks like it now has Docling Content Extraction Support for RAG. Has anyone used Docling much?
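For anyone who hasn't tried it, Docling's basic flow is pretty simple. Here's a minimal sketch based on its published quickstart (the filename is just a placeholder, and the exact API may have shifted since):

```python
# Minimal sketch of Docling usage, based on its quickstart.
# Assumes the `docling` package is installed (pip install docling).
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
# Convert a document (local path or URL) into Docling's document model.
# "report.pdf" is a hypothetical filename for illustration.
result = converter.convert("report.pdf")
# Export to Markdown, a common intermediate format for RAG chunking
print(result.document.export_to_markdown())
```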
Oh, and I typically get 16-20 tok/s running a 32b model on Ollama via Open WebUI. Also, I've run into issues with 4-bit quantization of the K/V cache on some models, so just FYI.
It really depends on how you quantize the model and the K/V cache as well. This is a useful calculator: https://smcleod.net/vram-estimator/ I can comfortably fit most 32b models quantized to 4-bit (usually Q4_K_M or IQ4_XS) on my 3090's 24 GB of VRAM with a reasonable context size. If you're going to need a much larger context window to feed in large documents etc., then you'd need to go smaller on the model size (14b, 27b, etc.), get a multi-GPU setup, or go with something with unified memory and a lot of RAM (like the Mac Minis others are mentioning).
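If you want a rough back-of-the-envelope version of what that calculator is doing, the sketch below estimates weights plus K/V cache. The architecture numbers (layers, KV heads, head dim) are assumptions loosely modeled on a typical 32b model like Qwen2.5-32B, not exact figures for any specific one:

```python
# Rough VRAM estimate for a quantized model plus its K/V cache.
# All architecture numbers below are assumptions for a typical 32b
# model (roughly Qwen2.5-32B-like); swap in real values for yours.

def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Model weights: parameter count * average bits per weight."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context: int, bytes_per_elem: float) -> float:
    """K/V cache: 2 (K and V) * layers * kv_heads * head_dim * tokens."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 1e9

# ~32b model at Q4_K_M (averages roughly 4.85 bits per weight)
w = weights_gb(32, 4.85)                      # ~19.4 GB
# assumed: 64 layers, 8 KV heads (GQA), head dim 128, 8k context
kv_f16 = kv_cache_gb(64, 8, 128, 8192, 2.0)   # fp16 cache, ~2.1 GB
kv_q4 = kv_cache_gb(64, 8, 128, 8192, 0.5)    # 4-bit cache, ~0.5 GB

print(f"weights ~{w:.1f} GB, kv fp16 ~{kv_f16:.1f} GB, kv q4 ~{kv_q4:.1f} GB")
# On a 24 GB 3090 that leaves only modest headroom, which is why
# bigger contexts push you to a smaller model or more memory.
```

For reference, Ollama exposes the K/V cache precision via the OLLAMA_KV_CACHE_TYPE environment variable (f16, q8_0, or q4_0), which requires flash attention to be enabled.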
FrankLaskey@lemmy.ml to
Technology@lemmy.ml • Nvidia teams up with DeepSeek for R1 optimizations on Blackwell, boosting revenue by 25x (English)
31 · 10 months ago
Hopefully these improvements will also become available to other Nvidia GPU architectures like Ada and Ampere in the future.
FrankLaskey@lemmy.ml to
Technology@beehaw.org • Apple Maps May Soon Feature Ads, But Not Everyone's Onboard - gHacks Tech News (English)
1 · 10 months ago
Is it possible to use StreetComplete on iOS?

Interesting project. Is it actually possible to track workouts with your phone or smartwatch without needing proprietary third-party apps like Strava or Garmin Connect, though?