Knowledgebase

Start here for common hosting, domain, and account questions. Open a ticket when a human needs to check your setup.

Support knowledgebase

Practical answers for current EZOS.Hosting services

These articles focus on the services we actively sell and support now: managed local AI, private RAG, secure business hosting, BYO server management, and onboarding. Retired product categories are intentionally no longer shown.

Managed Local AI

Private AI deployments, RTX 4000 Ada readiness, realistic model sizing, and local inference boundaries.

What does Managed Local AI include?

Managed Local AI is a scoped service for private AI workloads that should run close to your data instead of depending on a public chatbot account. Typical work includes GPU and runtime readiness, model selection, private access, monitoring, update planning, and practical usage guidance.

What can the RTX 4000 Ada with 20 GB VRAM run locally?

The current server class is suitable for focused local AI workloads, especially quantized chat models, embeddings, reranking, classification, extraction, and RAG pipelines. Workloads that expect frontier-model quality, high concurrency, long contexts, or heavy image and video generation need careful sizing and may require a different architecture.

Why is there a GPU readiness check before activation?

A local AI service depends on more than the GPU card. We verify that the operating system can see the GPU, the runtime stack is compatible, the inference service starts cleanly, the selected model fits, and a real prompt or retrieval task returns a usable answer.
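As an illustration of the first step only (confirming that the operating system can see the GPU), here is a minimal Python sketch. It assumes the NVIDIA driver's nvidia-smi tool is installed; the helper names and the 19,000 MiB threshold for a 20 GB card are hypothetical choices, not our actual tooling. Runtime compatibility, model fit, and the real-prompt test are separate checks.

```python
import shutil
import subprocess

# Columns requested from nvidia-smi; with "nounits", memory values are plain MiB numbers.
QUERY_FIELDS = "name,memory.total,memory.used"


def parse_gpu_rows(csv_text: str) -> list[dict]:
    """Parse 'nvidia-smi --query-gpu=... --format=csv,noheader,nounits' output, one GPU per line."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # rsplit keeps any commas inside the GPU name attached to the name field
        name, total_mib, used_mib = (part.strip() for part in line.rsplit(",", 2))
        gpus.append({"name": name, "total_mib": int(total_mib), "used_mib": int(used_mib)})
    return gpus


def query_gpus() -> list[dict]:
    """Return the GPUs the driver reports, or [] if the tool is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver userspace tools not installed or not on PATH
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []  # for example: kernel module not loaded, or driver/library version mismatch
    return parse_gpu_rows(result.stdout)


def gpu_visible(gpus: list[dict], min_total_mib: int = 19_000) -> bool:
    """True if at least one visible GPU reports at least min_total_mib of total VRAM."""
    return any(gpu["total_mib"] >= min_total_mib for gpu in gpus)
```

If gpu_visible(query_gpus()) returns False, the problem sits in the driver layer, before any model or inference service is involved.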

What happens if the local inference runtime is not ready yet?

If the server cannot currently load and answer with the target model, the project is not sold as a live inference service. The first paid scope becomes diagnosis, repair, model selection, and benchmark evidence. Production activation only starts after a smoke test proves that the selected workflow runs on the actual server.
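A smoke test in this sense is one real prompt with a known expected answer, checked for completion, content, and time. The Python sketch below assumes a local Ollama service on its default port 11434 and uses its non-streaming /api/generate endpoint; the model name, prompt, expected phrase, and time limit are hypothetical placeholders.

```python
import json
import time
import urllib.request

# Assumption: Ollama runs locally on its default port.
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def run_prompt(model: str, prompt: str, timeout_s: float = 120.0) -> tuple[dict, float]:
    """Send one non-streaming prompt; return the parsed JSON reply and wall-clock seconds.

    Raises urllib.error.URLError (or HTTPError) if the service is down or rejects the
    request, which also counts as a failed smoke test.
    """
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_GENERATE_URL, data=body, headers={"Content-Type": "application/json"}
    )
    started = time.monotonic()
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        payload = json.load(response)
    return payload, time.monotonic() - started


def smoke_test_passed(payload: dict, elapsed_s: float, must_contain: str, max_seconds: float) -> bool:
    """Pass only if generation finished, the answer contains the expected fact, and it was fast enough."""
    answer = payload.get("response", "")
    return (
        payload.get("done") is True
        and must_contain.lower() in answer.lower()
        and elapsed_s <= max_seconds
    )
```

On the server, run_prompt("example-model", "What is our refund window?") followed by smoke_test_passed(payload, elapsed, "30 days", 10.0) yields a single pass or fail result that can be recorded as benchmark evidence.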

Can my data stay on the server?

Yes, that is the point of the local approach when the workload fits. Documents, prompts, indexes, and generated answers can stay on infrastructure we operate or manage for you. We still define access rules, retention, backups, and any optional external API use during onboarding.

How does onboarding for a local AI project work?

We start with the business task, data sensitivity, expected users, target response time, and budget. Then we check the server baseline, run a representative benchmark, and launch only when the measured result supports the production scope.

Which model sizes should I choose for a private assistant?

Start smaller than you think. For many business workflows, retrieval quality, clean source documents, and reliable prompts matter more than choosing the largest model. We compare model size, quantization, context length, concurrency, latency, and cost before recommending a recurring plan.
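These tradeoffs can be estimated with simple arithmetic before any benchmark: weight memory is roughly parameters times bits per weight divided by 8, and the key/value cache grows with context length and the number of concurrent requests. This Python sketch uses decimal GB. The model shape (32 layers, 8 KV heads, head dimension 128), the fp16 cache, and the 2 GB runtime headroom are assumptions for illustration, and real quantized model files are somewhat larger than the pure formula because they also store scaling data.

```python
GB = 1e9  # decimal gigabytes, as VRAM is usually advertised


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight memory only: parameters x bits per weight / 8 bytes."""
    return params_billion * 1e9 * bits_per_weight / 8 / GB


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int, context_tokens: int,
                concurrent_requests: int = 1, bytes_per_value: int = 2) -> float:
    """Key/value cache: 2 tensors x layers x KV heads x head dim x tokens x bytes, per request."""
    per_request = 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value
    return per_request * concurrent_requests / GB


def fits(vram_gb: float, *components_gb: float, headroom_gb: float = 2.0) -> bool:
    """True if the components plus an assumed runtime headroom fit into the card's VRAM."""
    return sum(components_gb) + headroom_gb <= vram_gb


# Hypothetical model shape: 32 layers, 8 KV heads, head dimension 128, fp16 cache.
one_request = kv_cache_gb(32, 8, 128, context_tokens=8192)
four_requests = kv_cache_gb(32, 8, 128, context_tokens=8192, concurrent_requests=4)
print(fits(20, weights_gb(8, 4), one_request))     # True: 4.0 + ~1.07 + 2.0 GB
print(fits(20, weights_gb(32, 4), four_requests))  # False: 16.0 + ~4.29 + 2.0 GB
```

The same shape is reused for the 32B case only for simplicity; larger models usually have more layers, which makes the cache even bigger. Measured numbers from a benchmark replace these estimates.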

Can you run chat, embeddings, and RAG on one server?

Often yes, but not without limits. A single server can host a compact chat model, embedding jobs, a vector index, and a private RAG interface when usage is controlled. We separate background indexing from live chat, set upload and context limits, and monitor VRAM, RAM, disk, and response times.

What information do you need before quoting a local AI project?

Useful inputs are the business process, example questions, ideal answers, document types, approximate volume, update frequency, expected users, privacy requirements, and any existing server or authentication constraints. This lets us quote a realistic baseline review or benchmark instead of guessing.

Team RAG and Visual Knowledge Systems

Document ingestion, retrieval quality audits, permissions, visual evidence search, and practical team workflows.

What is Team RAG?

Team RAG is a private assistant that answers questions from approved company documents. The system retrieves relevant source context and asks the model to answer from that evidence, which is useful for support drafts, internal process questions, policy lookup, and sales preparation.
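The retrieve-then-answer loop can be sketched in a few lines of Python. This is a minimal illustration, not our production pipeline: the tiny hand-written vectors stand in for real embedding-model output, a brute-force cosine ranking stands in for a vector index, and the Chunk type, source labels, and prompt wording are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str              # citation label, e.g. file name and section
    text: str
    embedding: list[float]   # vector from an embedding model (toy values in the example)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding: list[float], chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    """Rank every chunk by similarity to the query and keep the top k (brute force)."""
    ranked = sorted(chunks, key=lambda c: cosine(query_embedding, c.embedding), reverse=True)
    return ranked[:k]


def build_prompt(question: str, evidence: list[Chunk]) -> str:
    """Instruct the model to answer only from the retrieved evidence and to cite it."""
    context = "\n\n".join(f"[{c.source}]\n{c.text}" for c in evidence)
    return (
        "Answer using only the sources below and cite each source in brackets. "
        "If the sources do not contain the answer, say that you do not know.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


chunks = [
    Chunk("refunds.md#policy", "Refunds are possible within 30 days of purchase.", [0.9, 0.1, 0.0]),
    Chunk("shipping.md#times", "Orders ship within two business days.", [0.1, 0.9, 0.2]),
]
query_embedding = [0.8, 0.2, 0.0]  # would come from the same embedding model as the chunks
print(build_prompt("How long is the refund window?", retrieve(query_embedding, chunks, k=1)))
```

The bracketed source labels are what make an answer checkable: a reviewer can open the cited section instead of trusting the model.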

Which documents can be indexed?

PDF, DOCX, Markdown, plain text, HTML exports, CSV notes, ticket exports, and curated website pages can usually be processed. Database-backed systems are normally exported as selected records instead of giving the assistant unrestricted database access.

Can Team RAG use screenshots, scans, diagrams, and product images?

Yes, when the workflow is designed for visual evidence. A visual RAG workflow can extract useful text, generate descriptions, and store image-aware evidence next to normal documents. Qwen3-VL-Embedding and Qwen3-VL-Reranker are candidates we can benchmark, but production use still depends on the actual data, latency target, and server readiness.

How do you keep answers accurate?

We use conservative prompts, source references, real test questions, and failure review. RAG is not a magic truth engine: if the source material is wrong, outdated, or missing, the system must be corrected at the source or retrieval layer.

How are permissions handled?

We design around least privilege. Smaller teams may use one curated assistant. Larger teams may need separate indexes, groups, or identity-provider integration so users only retrieve sources they are allowed to see.
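In retrieval, least privilege usually means filtering by permission before ranking, so a restricted document can never reach the prompt no matter how relevant it scores. A minimal Python sketch; the group names and documents are hypothetical, and in a real deployment the user's groups would come from the identity provider and the filter would typically run inside the vector store query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    source: str
    allowed_groups: frozenset[str]  # groups permitted to read this source
    score: float                    # relevance score from the retriever (precomputed here)


def retrieve_for_user(docs: list[Doc], user_groups: set[str], k: int = 5) -> list[Doc]:
    """Filter by permission first, then rank, so restricted sources never reach the prompt."""
    visible = [d for d in docs if d.allowed_groups & user_groups]
    return sorted(visible, key=lambda d: d.score, reverse=True)[:k]


docs = [
    Doc("hr/salary-bands.pdf", frozenset({"hr"}), 0.95),
    Doc("support/refund-faq.md", frozenset({"support", "hr"}), 0.80),
]
print([d.source for d in retrieve_for_user(docs, {"support"})])  # the HR-only file is excluded
```

Filtering first also keeps the top-k slots for documents the user may actually see, instead of filling them with hits that would be removed afterwards.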

What makes a good first Team RAG dataset?

The best first dataset is a focused collection that answers a real recurring question. Avoid unreviewed file dumps, conflicting versions, documents users are not allowed to see, and huge archives before retrieval quality has been measured.

What is the RAG Retrieval Quality Audit?

The RAG Retrieval Quality Audit is a paid first step for teams that want private knowledge search without committing to a full rollout before the evidence is clear. We test a representative document set, fixed user questions, retrieval results, citation quality, and reranker gains.

On the current EZOS server profile we plan around an NVIDIA RTX 4000 Ada GPU with 20 GB VRAM. For most first audits, practical candidates are Qwen3-Embedding 0.6B or 4B plus Qwen3-Reranker 0.6B or 4B. Larger 8B candidates are considered only when the corpus, latency target, and memory budget justify the tradeoff.

The audit report covers top-k misses, weak source coverage, storage notes, permission boundaries, and privacy considerations, and ends with a go or stop recommendation for Team RAG. Production local inference is only promised after the GPU driver, the Ollama service, the target models, and the workflow smoke tests all pass on the live server.
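One common way to express top-k misses is hit rate at k: the share of fixed test questions for which at least one expected source appears among the top k results. Running the same questions before and after reranking shows the reranker gain. The Python sketch below uses hypothetical names and data and shows only one of the measurements an audit can use.

```python
def hit_rate_at_k(
    retrieved: dict[str, list[str]],
    expected: dict[str, set[str]],
    k: int,
) -> tuple[float, list[str]]:
    """Return (hit rate at k, questions with no expected source in their top k results)."""
    misses = []
    for question, wanted_sources in expected.items():
        top_k = set(retrieved.get(question, [])[:k])
        if not wanted_sources & top_k:
            misses.append(question)
    rate = 1.0 - len(misses) / len(expected) if expected else 0.0
    return rate, misses


# Hypothetical audit data: expected sources per question, and ranked results per stage.
expected = {"q1": {"refunds.md"}, "q2": {"sla.pdf"}, "q3": {"vpn.md"}}
embedding_only = {"q1": ["refunds.md", "faq.md"], "q2": ["faq.md", "terms.md", "sla.pdf"], "q3": ["faq.md"]}
reranked = {"q1": ["refunds.md", "faq.md"], "q2": ["sla.pdf", "faq.md", "terms.md"], "q3": ["faq.md"]}

before, missed_before = hit_rate_at_k(embedding_only, expected, k=2)
after, missed_after = hit_rate_at_k(reranked, expected, k=2)
print(f"hit@2 before rerank: {before:.2f} {missed_before}")
print(f"hit@2 after rerank:  {after:.2f} {missed_after}")
```

With this toy data, hit@2 rises from 0.33 to 0.67 after reranking, while q3 stays a miss because its expected source is never retrieved at all. That pattern points to weak source coverage rather than a ranking problem.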

Business Secure Hosting

Hardening, monitoring, backups, SSL, DNS, and production readiness for business websites.

What is included?

The baseline is security-conscious setup, SSL, backup planning, update coordination, uptime monitoring, and a clear support path. Exact scope depends on the software stack and the business impact of downtime.

How do backups and restores work?

We define what is backed up, how often, where it is stored, retention, and how restore testing works. Backups may include web files, database dumps, configuration, uploaded assets, DNS notes, and selected service configuration.
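Retention is easiest to review when it is written down as a rule. The Python sketch below implements one common rotation scheme: keep the newest daily backups plus the newest backup from each of the last few calendar weeks. The counts are hypothetical defaults, real values are set per project, and restore testing remains a separate step.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates: list[date], daily: int = 7, weekly: int = 4) -> set[date]:
    """Keep the newest `daily` backups plus the newest backup from each of the newest `weekly` ISO weeks."""
    newest_first = sorted(set(backup_dates), reverse=True)
    keep = set(newest_first[:daily])
    weeks_seen: set[tuple[int, int]] = set()
    for backup in newest_first:
        iso_week = tuple(backup.isocalendar())[:2]  # (ISO year, ISO week number)
        if iso_week in weeks_seen:
            continue  # a newer backup from this week is already kept
        if len(weeks_seen) == weekly:
            break     # enough weekly backups retained
        weeks_seen.add(iso_week)
        keep.add(backup)
    return keep


# Hypothetical example: one backup per day for 30 days, ending Sunday 31 March 2024.
history = [date(2024, 3, 31) - timedelta(days=i) for i in range(30)]
kept = sorted(backups_to_keep(history))
print(len(kept), kept[0], kept[-1])
```

With these defaults, 10 of the 30 backups survive: the last seven days plus 24, 17, and 10 March.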

Do you monitor uptime, logs, and certificates?

When monitoring is part of the plan, we can watch HTTP status, certificate expiry, selected error logs, disk pressure, and service-specific signals. For AI and custom applications, we prefer a health check that proves useful work, not just HTTP 200.
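Certificate expiry is one of the simplest signals to automate. This Python sketch uses only the standard library: it reads the notAfter field from a verified TLS connection and converts it with ssl.cert_time_to_seconds. The 14-day alert threshold and the host name in the usage line are hypothetical. For AI and custom applications this runs alongside a health check that performs real work, not in place of it.

```python
import socket
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Open a verified TLS connection and return the server certificate's notAfter string."""
    context = ssl.create_default_context()  # verifies the chain and the host name
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_left(not_after: str, now: Optional[float] = None) -> float:
    """Days between now (Unix time) and a notAfter value such as 'Jan 31 00:00:00 2030 GMT'."""
    current = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - current) / SECONDS_PER_DAY


def expiring_soon(not_after: str, threshold_days: float = 14, now: Optional[float] = None) -> bool:
    """True if the certificate expires within threshold_days (or has already expired)."""
    return days_left(not_after, now) < threshold_days
```

A scheduled call such as expiring_soon(fetch_not_after("www.example.com")) can then raise an alert well before visitors see a browser warning.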

Can you host email or large mailing lists?

Email and high-volume sending require a readiness review. We check consent, sender reputation, DNS policy, bounce handling, complaint handling, queue limits, and abuse controls before production mail is moved.

How do security reviews fit into hosting and AI projects?

Security is part of the operating model. We look at access, updates, backups, exposed services, logs, secrets handling, and practical incident recovery. The depth of review depends on the plan and risk level, but findings should stay actionable.

BYO Server Management

Managed help for existing Linux, Proxmox, web, database, and AI runtime servers.

Can EZOS manage my existing server?

Yes, after a baseline review. We check access, backups, update state, exposed services, disk layout, monitoring, and business criticality before making production changes.

What access do you need?

Usually SSH with sudo or a defined privileged path, provider or control-panel access when DNS or rescue mode may be needed, application admin access if the app itself must change, and a current backup or permission to create one.

Can you repair GPU, Ollama, or AI runtime issues?

Yes, if the server and driver stack are supportable. GPU AI issues often involve kernel modules, driver packages, userspace libraries, runtime services, model storage, and reverse proxy configuration. We verify with a real model or health check before calling a repair complete.

How are migrations from older hosting setups handled?

We start with an inventory of domains, databases, files, mail, cron jobs, DNS, certificates, and external dependencies. Then we decide what should be kept, retired, or rebuilt, test the target environment, and keep rollback options until the new setup is verified.

What is outside the managed scope?

We do not promise responsibility for unknown custom code, unlicensed software, unsupported operating systems, unverified third-party plugins, workloads that violate rules or law, or performance the hardware cannot deliver.

Billing, Onboarding and Support

How to choose an offer, start a project, open useful tickets, and adjust scope.

How do I choose the right offer?

Start from the business problem. Choose Managed Local AI or Team RAG for private AI work, Business Secure Hosting for safer production hosting, and BYO Server Management when you already have infrastructure that needs operational care.

How fast can a project start?

Simple reviews can usually start after order confirmation and access handover. Production migrations, security-sensitive systems, GPU runtime work, and AI deployments need a baseline check before implementation work is scheduled.

What should I include in a support ticket?

Include the affected domain or service, expected result, actual result, exact error text, when it started, recent changes, business impact, urgency, and the safe access handover method. Exact URLs and timestamps are usually more useful than screenshots alone.

What should I expect after placing an order?

The next useful step is usually a short clarification or access checklist. Use the ticket portal for project details, attach exact errors when relevant, state deadlines and business impact, and do not send passwords in plain text.

Can I change scope or cancel later?

Scope can change when the actual state of the system differs from what was expected. If the work turns out smaller than expected, we reduce the scope. If the server is riskier than expected, we explain the added work before continuing. Billing terms depend on the product and on work already delivered.
