A solid way is to build a synthetic first person therapy style dataset where the therapist name is a placeholder token, so you can swap in the assistant name later. The dataset should include many examples of non sycophantic behavior like gentle disagreement, reality checking, asking clarifying questions, and setting boundaries, plus safe escalation patterns.
For papers and textbooks, the safest approach is not to copy copyrighted text into training rows. Instead, keep conversations original and reference research concepts, or use RAG with open access sources for grounding when needed.
Do you want this mainly for internal alignment experiments, or to publish a gated dataset on Hugging Face?