---
license: mit
---
|
### environment |
|
- optimum-neuron 0.0.25.dev0 (deprecated)
- optimum-neuron 0.0.25
- neuron 2.20.0
- transformers-neuronx 0.12.313
- transformers 4.43.2
|
|
|
|
|
### export |
|
```shell
optimum-cli export neuron \
  --model NousResearch/Meta-Llama-3.1-8B-Instruct \
  --batch_size 1 \
  --sequence_length 4096 \
  --num_cores 2 \
  --auto_cast_type fp16 \
  ./models-hf/NousResearch/Meta-Llama-3.1-8B-Instruct
```
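As a quick sanity check after the export, you can inspect the exported model's `config.json`: optimum-neuron records its compilation arguments there, typically under a `neuron` key. The key name is an assumption here, so this is only a sketch; open the file directly if the layout differs.

```python
import json
from pathlib import Path
from typing import Optional


def read_neuron_section(model_dir: str) -> Optional[dict]:
    """Return the neuron-specific export settings from config.json, if present.

    Assumption: optimum-neuron stores its compilation arguments (batch size,
    sequence length, number of cores, cast type) under a "neuron" key in the
    exported config.json. Returns None when that key is absent.
    """
    config_path = Path(model_dir) / "config.json"
    with config_path.open() as f:
        config = json.load(f)
    return config.get("neuron")


# Example (path from the export step above):
# print(read_neuron_section("./models-hf/NousResearch/Meta-Llama-3.1-8B-Instruct"))
```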
|
|
|
### run |
|
```shell
docker run -it --name llama-31 --rm \
  -p 8080:80 \
  -v /home/ec2-user/models-hf/:/models \
  -e HF_MODEL_ID=/models/NousResearch/Meta-Llama-3.1-8B-Instruct \
  -e MAX_INPUT_TOKENS=256 \
  -e MAX_TOTAL_TOKENS=4096 \
  -e MAX_BATCH_SIZE=1 \
  -e LOG_LEVEL="info,text_generation_router=debug,text_generation_launcher=debug" \
  --device=/dev/neuron0 \
  neuronx-tgi:latest \
  --model-id /models/NousResearch/Meta-Llama-3.1-8B-Instruct \
  --max-batch-size 1 \
  --max-input-tokens 256 \
  --max-total-tokens 4096
```
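Loading the compiled model onto the Neuron cores can take a few minutes, during which requests fail. A small polling sketch against TGI's `/health` endpoint (which answers 200 once the model is ready) avoids sending requests too early; the URL and timing values below are illustrative:

```python
import time
import urllib.error
import urllib.request


def wait_ready(base_url: str, timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Poll the server's /health endpoint until it responds, or give up.

    Returns True on the first successful response, False once `timeout`
    seconds have elapsed without one. Connection errors (server still
    starting) are swallowed and retried after `interval` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url + "/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass
        time.sleep(interval)
    return False


# Example, matching the port mapping above:
# wait_ready("http://127.0.0.1:8080")
```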
|
|
|
### test |
|
```shell
curl 127.0.0.1:8080/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
  -H 'Content-Type: application/json'
```
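The same request can be made from Python with only the standard library. This sketch mirrors the curl call above: the payload shape (`{"inputs": ..., "parameters": {...}}`) is TGI's `/generate` API, and the URL matches the port mapping from the run step.

```python
import json
import urllib.request


def build_payload(prompt: str, max_new_tokens: int) -> bytes:
    """Build the JSON body for TGI's /generate endpoint."""
    return json.dumps(
        {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    ).encode("utf-8")


def generate(prompt: str, max_new_tokens: int = 20,
             url: str = "http://127.0.0.1:8080/generate") -> dict:
    """POST a generation request and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=build_payload(prompt, max_new_tokens),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Example (requires the container from the run step to be up):
# print(generate("What is Deep Learning?")["generated_text"])
```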