I would like to provision compute (servers), GPUs (say 2× A100 80GB or H100), storage, and networking (maybe 100GbE) to run the OpenLLaMA 7B model (https://huggingface.co/openlm-research/open_llama_7b).
How do I go about sizing this? AWS/GCP cluster sizing is okay too.
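As a starting point for sizing, here is a back-of-the-envelope memory calculation (a sketch, assuming fp16 weights for inference and mixed-precision AdamW training at roughly 16 bytes per parameter; activations, KV cache, and framework overhead are ignored, so real usage is higher):

```python
# Back-of-envelope memory sizing for a 7B-parameter model.
# Assumptions (not vendor figures): fp16 weights for inference (2 bytes/param);
# mixed-precision training with AdamW needs ~16 bytes/param for fp32 master
# weights, gradients, and two optimizer states. Activations are excluded.
PARAMS = 7e9

inference_gb = PARAMS * 2 / 1e9    # ~14 GB just for the weights
training_gb = PARAMS * 16 / 1e9    # ~112 GB of training state

print(f"inference weights: ~{inference_gb:.0f} GB")
print(f"training state:    ~{training_gb:.0f} GB")
```

So a single A100 80GB comfortably holds the fp16 weights for inference, while full fine-tuning of all 7B parameters already exceeds one 80GB card and pushes you toward 2+ GPUs with sharding (e.g., FSDP/ZeRO) or parameter-efficient methods like LoRA.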
I also need to run some containerized neural-net code (which is preferred, containers or bare metal?).
- What CPU, RAM, and GPU should each node have?
- How many containers per node?
- How many such nodes?
- What about HA, DP and DR in this case?
- How much time should I expect the model to take to train?
- What would a typical response time for inference be?
- Network considerations?
- How will this model scale?
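On the inference-latency question above, a rough ceiling can be estimated: autoregressive decoding is typically memory-bandwidth bound, so tokens/sec is at most HBM bandwidth divided by the bytes streamed per token (a sketch; the ~2 TB/s figure is the A100 80GB spec, and real throughput is lower):

```python
# Rough decode-throughput ceiling for a 7B fp16 model on one A100 80GB.
# Assumption: each generated token streams all model weights from HBM once.
HBM_BANDWIDTH = 2.0e12         # bytes/s (A100 80GB spec, ~2 TB/s)
WEIGHT_BYTES = 7e9 * 2         # fp16 weights, ~14 GB

tokens_per_sec = HBM_BANDWIDTH / WEIGHT_BYTES   # ~143 tok/s ceiling
ms_per_token = 1000 / tokens_per_sec            # ~7 ms/token

print(f"~{tokens_per_sec:.0f} tok/s, ~{ms_per_token:.1f} ms/token")
```

So a 200-token reply has a theoretical floor of roughly 1.4 s on a single GPU; batching raises aggregate throughput at the cost of per-request latency.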
– techele Jun 13 '23 at 21:07