When AI Sounds Dharmic But Doesn't Think Dharmic

Here's something I didn't foresee when we started training small language models on Sanskrit stories.

The models pick up the right words quickly. A few thousand training steps in, and you start seeing ahimsa, karma, dharma, satya in the generated text. The vocabulary density is impressive. Skim through a generated story and it sounds like a dharmic narrative.

Then you read it carefully.

A character performs an act of kindness and is rewarded by a king. Not through the natural unfolding of consequence, not through the hidden logic of interconnectedness, but through a king handing out a prize. A story invokes the guru-shishya relationship but treats it like a transaction: student obeys, teacher rewards. The vocabulary is dharmic. The worldview underneath is mundane and superficial.

We started calling this the style-substance gap.

How We Noticed

In SanskritKatha we evaluate stories on multiple dimensions. Two turned out to be more revealing than we expected:

Dharmic vocabulary density — surface level. Does the model use the right terminology? Models absorb this easily because the training data is saturated with these terms. Pattern matching, essentially.

Dharmic reasoning coherence — deeper. Does the model understand that compassion has consequences not through external reward but through the nature of things? That satya isn't just "don't lie" but alignment with Rita, the cosmic order? That beings are interconnected not as metaphor but as a structural fact in the story-world?

The gap between these two scores is where it gets interesting.
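A minimal sketch of how such a gap could be computed, assuming both dimensions are scored on the same 5-point scale and the gap is simply vocabulary density minus reasoning coherence. The StoryScore type, its field names, and the numbers in the usage lines are hypothetical illustrations, not the SanskritKatha evaluation code or its results.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class StoryScore:
    """Hypothetical per-story record; field names are assumptions."""
    tier: int                    # 1 = simple stories, 2 = complex stories
    vocabulary_density: float    # assumed to be on a 1-5 scale
    reasoning_coherence: float   # assumed to be on the same 1-5 scale


def style_substance_gap(score: StoryScore) -> float:
    """Positive when a story sounds more dharmic than it reasons."""
    return score.vocabulary_density - score.reasoning_coherence


def mean_gap_by_tier(scores: list[StoryScore]) -> dict[int, float]:
    """Average the per-story gap within each tier."""
    gaps_by_tier: dict[int, list[float]] = defaultdict(list)
    for score in scores:
        gaps_by_tier[score.tier].append(style_substance_gap(score))
    return {tier: mean(gaps) for tier, gaps in sorted(gaps_by_tier.items())}


# Made-up scores, for illustration only.
example_scores = [
    StoryScore(tier=1, vocabulary_density=4.0, reasoning_coherence=3.5),
    StoryScore(tier=1, vocabulary_density=4.0, reasoning_coherence=4.0),
    StoryScore(tier=2, vocabulary_density=4.5, reasoning_coherence=3.0),
]
print(mean_gap_by_tier(example_scores))
```

Reporting the gap per tier rather than as one overall number matters here, because an average across tiers would hide exactly the difference the tiers are meant to expose.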

Where It Shows Up

At Tier 1 (simple children's stories, ages 4-5), the gap is small. 0.2 to 0.5 points on a 5-point scale. Simple stories have simple causality. "Be kind to the animal, the animal helps you later" — a model can learn that pattern, and at this level it's close enough to dharmic reasoning.

At Tier 2 (complex stories, ages 14-15), the gap widens to 1.0 to 1.2 points, at least double the Tier 1 gap. These stories demand moral complexity: a character facing a genuine dilemma where dharmic principles pull in different directions. The model reaches for familiar vocabulary but can't construct the moral logic. It produces stories that sound profound and resolve shallowly.

Here's the part that surprised us: the 10M-parameter model, despite having higher vocabulary density than the 3M-parameter model, shows worse reasoning coherence at Tier 2. More capacity didn't help. The larger model just learned more words without learning what they mean.

What We Think Is Happening

The reasoning patterns are rarer in the data than the vocabulary patterns. The model encounters "karma" a thousand times. It encounters a correctly structured karmic narrative arc, one where action flows to consequence through natural order rather than external judgement, far fewer times. The signal is there, but it's weak.

This suggests a data problem, not a capacity problem. 50,000 stories might be enough for vocabulary absorption but not enough for the deeper patterns. We're exploring a few directions:

More stories with explicitly complex moral reasoning. Scale the right signal.
Preference optimisation — teach the model to prefer stories where dharmic principles emerge naturally over stories that just deploy the right words.
Curriculum learning — train on simple stories first, complex ones later. How children actually learn, come to think of it.
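
For the preference optimisation direction, one way the training pairs could be assembled is sketched below. It assumes each generated story already carries the two scores described earlier, and it pairs stories for the same prompt that have similar vocabulary density but clearly different reasoning coherence, so the preference signal isolates substance rather than style. The ScoredStory type, the thresholds, and the prompt/chosen/rejected output keys are all assumptions for illustration, not our actual pipeline.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class ScoredStory:
    """Hypothetical scored generation; field names are assumptions."""
    prompt_id: str
    text: str
    vocabulary_density: float    # assumed 1-5 scale
    reasoning_coherence: float   # assumed 1-5 scale


def build_preference_pairs(
    stories: list[ScoredStory],
    max_vocab_diff: float = 0.5,        # assumed threshold: "similar style"
    min_reasoning_margin: float = 1.0,  # assumed threshold: "clearly better substance"
) -> list[dict[str, str]]:
    """Pair stories that sound alike but differ in reasoning quality."""
    by_prompt: dict[str, list[ScoredStory]] = {}
    for story in stories:
        by_prompt.setdefault(story.prompt_id, []).append(story)

    pairs = []
    for prompt_id, group in by_prompt.items():
        for a, b in combinations(group, 2):
            # Skip pairs whose vocabulary density differs too much:
            # the model could then learn to prefer word count again.
            if abs(a.vocabulary_density - b.vocabulary_density) > max_vocab_diff:
                continue
            margin = a.reasoning_coherence - b.reasoning_coherence
            if abs(margin) < min_reasoning_margin:
                continue
            chosen, rejected = (a, b) if margin > 0 else (b, a)
            pairs.append(
                {"prompt_id": prompt_id, "chosen": chosen.text, "rejected": rejected.text}
            )
    return pairs
```

Holding vocabulary density roughly constant within a pair is the design choice that matters: without it, the preferred story would often also be the one with more dharmic terms, and the optimisation would reward the surface feature all over again.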

Whether any of these close the gap, we don't know yet.

The Bigger Question

This framework isn't really about Sanskrit, or even about dharmic texts specifically. It's a question you could ask of any culturally-grounded AI:

Does a model trained on Confucian texts reason in Confucian terms, or does it just use the vocabulary? Does a model fine-tuned on Stoic philosophy distinguish between acceptance and passivity?

Style is shallow. Substance is deep. The gap between them might be where the actual challenge of cultural AI lives. Or it might be a temporary limitation of small models that dissolves at scale. Both are interesting answers.

We're running 50,000 stories through blind human evaluation right now. Sanskrit scholars scoring stories they can't trace back to any model. 6 dimensions. 3 reviewers per story. The ground truth will tell us what the training metrics can't: whether the models have learned to narrate dharmically, or just learned to sound like they have.
