Hello,

I am exploring the development of an offline educational mobile app for students in areas where mobile internet access is limited or unreliable.

The app would allow students (Grade 6 through university) to download the course material for a single school year.

Each pack would include a small LLM model (or adapter) that runs fully offline on mid-range Android smartphones.

Once downloaded, the app should work 100% offline (no cloud access required), with good performance and minimal latency.

I want the LLM to be able to answer questions based on the course material and help students solve exercises, with minimal hallucinations.

My question:

Is this technically feasible on the mid-range smartphones typical in these countries (3–8 GB RAM, ~128–256 GB storage)?

Which model architecture strategy (quantization, LoRA adapters, small fine-tuned model, etc.) would you recommend for this use case?

Thanks.

Replies: 2 comments

Hi there,
I am not a mobile developer, but I think that's exactly what small LLMs like Gemma 3 270M are developed for (https://github.com/rasbt/LLMs-from-scratch/tree/main/ch05/12_gemma3)
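
Something like this minimal sketch (using llama-cpp-python for prototyping; the model file name is just a placeholder for whichever small instruction-tuned model you export to GGUF) is roughly what the fully offline question-answering loop could look like:

```python
# Minimal sketch: load a small quantized GGUF model fully offline and answer
# a question. The file name below is a placeholder, not a real release artifact.
from llama_cpp import Llama

llm = Llama(
    model_path="models/gemma-3-270m-it-Q4_K_M.gguf",  # hypothetical local file shipped with the app
    n_ctx=2048,      # keep the context window small to save RAM on phones
    n_threads=4,     # mid-range phones typically have a few usable big cores
    verbose=False,
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Answer using only the provided course notes."},
        {"role": "user", "content": "What is photosynthesis?"},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```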

Before any fine-tuning, I would probably also consider RAG (without and then with quantization) here, because the answers are then strictly grounded in the documents you provide. After RAG, I'd continue pretraining and/or fine-tune and see whether that makes it better or worse.
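
A rough sketch of the retrieval half of that RAG idea, assuming a small embedding model (sentence-transformers' all-MiniLM-L6-v2 here, just for prototyping; on-device you would swap in an embedding model that also runs offline). The chunk texts are placeholders:

```python
# Embed course chunks once at pack-build time, then at question time retrieve
# the closest chunks and paste them into the prompt so answers stay grounded
# in the course material instead of the model's guesses.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly embedding model

chunks = [
    "Photosynthesis converts light energy into chemical energy in plants.",
    "The Pythagorean theorem relates the sides of a right triangle.",
    # ... one entry per paragraph/section of the course pack
]
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

def retrieve(question: str, k: int = 3) -> list[str]:
    """Return the k course chunks most similar to the question."""
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q                  # cosine similarity (vectors are normalized)
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

question = "How do plants make their food?"
context = "\n".join(retrieve(question))
prompt = f"Use only this course material to answer:\n{context}\n\nQuestion: {question}"
```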

Yes, it's definitely possible to run an offline learning app on mid-range Android phones (3–8 GB RAM). The trick is to use a small, efficient model. Models in the 1–3B range, quantized to 4-bit (like Llama 3.2 3B, Ministral 3B, or Gemma 2B), can run fine; 7B models are heavier and may feel slow.
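
As a quick feasibility check, some back-of-envelope arithmetic for the weight memory of 4-bit quantized models (the KV cache and runtime overhead add a few hundred MB on top):

```python
# Approximate RAM needed just for the weights of a ~4-bit quantized model.
def weight_memory_gb(n_params_billion: float, bits_per_weight: float = 4.5) -> float:
    # 4-bit schemes like Q4_K_M average a bit over 4 bits/weight because of scale factors
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

for size in (1, 3, 7):
    print(f"{size}B model @ ~4-bit: ~{weight_memory_gb(size):.1f} GB of weights")
# 1B -> ~0.6 GB, 3B -> ~1.7 GB, 7B -> ~3.9 GB, which matches the advice above:
# 1-3B fits comfortably on a 3-8 GB phone, while 7B gets tight.
```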

The smart setup is:

- Ship one base model once, then add tiny LoRA adapters per course pack (just a few MB each).
- Use a local retrieval system (RAG) so the model always refers to the actual textbook instead of guessing.
- Run it all with llama.cpp, which is optimized for phones (rough sketch of this setup below).
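
A rough sketch of that base-model-plus-adapter setup, assuming llama-cpp-python's optional lora_path argument and placeholder file names; on Android you'd likely drive llama.cpp through JNI bindings rather than Python, but the structure is the same:

```python
# The heavy base GGUF ships once with the app; each course pack only adds a
# small LoRA adapter file plus its retrieval index.
from llama_cpp import Llama

llm = Llama(
    model_path="models/base-3b-Q4_K_M.gguf",        # shared base model, downloaded once
    lora_path="packs/grade8_science/adapter.gguf",  # tiny per-course adapter (placeholder path)
    n_ctx=2048,
    n_threads=4,
    verbose=False,
)

answer = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a Grade 8 science tutor. Use the retrieved textbook excerpts."},
        {"role": "user", "content": "Explain Newton's third law with an example."},
    ],
    max_tokens=300,
)
print(answer["choices"][0]["message"]["content"])
```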

This discussion was converted from issue #824 on September 14, 2025 17:11.
