Structuring Text Data for Reliable AI Pulls
Summary
In CommunityOne Builders Help, this discussion covers how to structure long .txt data that will be pulled multiple times, in order to improve consistency and reduce AI hallucinations. It recommends limiting double newlines so that embeddings keep related content together, notes newline issues with Markdown imports, and asks about using agents for processing.
For long data structures that will be pulled multiple times, how should I be structuring the data? This is an example of my current structure.
I want the layout and data to remain consistent; however, I've noticed that the AI sometimes hallucinates.
Hmm, I'm not sure about this one. I know that in the previous version, when we were using ChatGPT, paragraph formatting worked well because that's how we chunk the content. Any thoughts on this?
Is this a common occurrence? I see that you have a lot of lines of text separated by double newlines, like this:
```
### Alt-Fire

**Damage:** impact: 58.2, slash: 29.1, puncture: 9.7

Crit Chance: 21.00% | Crit Multiplier: 2.3x | Status Chance: 33.00% | Fire Rate: 1.5

### Alt-Fire Explosion

**Damage:** blast: 789

Crit Chance: 21.00% | Crit Multiplier: 2.3x | Status Chance: 33.00% | Fire Rate: 1.5
```
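A minimal sketch of why that layout can cause mix-ups, assuming the chunker simply splits on blank lines (the real CommunityOne pipeline may trim, merge, or size-limit chunks differently; `chunk_on_blank_lines` is a hypothetical helper, not a platform API):

```python
def chunk_on_blank_lines(text: str) -> list[str]:
    # Hypothetical chunker: split wherever a blank line (double newline) appears,
    # strip surrounding whitespace, and drop empty pieces.
    return [part.strip() for part in text.split("\n\n") if part.strip()]

lines = [
    "### Alt-Fire",
    "**Damage:** impact: 58.2, slash: 29.1, puncture: 9.7",
    "Crit Chance: 21.00% | Crit Multiplier: 2.3x | Status Chance: 33.00% | Fire Rate: 1.5",
    "### Alt-Fire Explosion",
    "**Damage:** blast: 789",
    "Crit Chance: 21.00% | Crit Multiplier: 2.3x | Status Chance: 33.00% | Fire Rate: 1.5",
]
# Layout from the post: every line separated from the next by a blank line.
spread_out = "\n\n".join(lines)
chunks = chunk_on_blank_lines(spread_out)

print(len(chunks))             # 6: each line ends up as its own chunk
print(chunks[2] == chunks[5])  # True: the two stat chunks are identical and name no attack
```

If a retrieved chunk is only the stat line, nothing in it says whether it belongs to Alt-Fire or Alt-Fire Explosion, which is one plausible source of the inconsistent answers described above.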
We do use double newlines to split text into chunks for our vector embeddings. So keeping related content separated by a single newline at most might help, like this:
```
### Alt-Fire
**Damage:** impact: 58.2, slash: 29.1, puncture: 9.7
Crit Chance: 21.00% | Crit Multiplier: 2.3x | Status Chance: 33.00% | Fire Rate: 1.5
### Alt-Fire Explosion
**Damage:** blast: 789
Crit Chance: 21.00% | Crit Multiplier: 2.3x | Status Chance: 33.00% | Fire Rate: 1.5
```
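Under the same assumption (a hypothetical chunker that splits only on blank lines), a quick sketch of what the compact layout does: with single newlines only, each heading stays attached to its damage and stat lines. The second variant, with one blank line between the two attack sections, is an assumption beyond what the thread suggests; it would let each section be retrieved on its own while still carrying its heading.

```python
def chunk_on_blank_lines(text: str) -> list[str]:
    # Hypothetical chunker: split on blank lines, drop empty pieces.
    return [part.strip() for part in text.split("\n\n") if part.strip()]

alt_fire = (
    "### Alt-Fire\n"
    "**Damage:** impact: 58.2, slash: 29.1, puncture: 9.7\n"
    "Crit Chance: 21.00% | Crit Multiplier: 2.3x | Status Chance: 33.00% | Fire Rate: 1.5"
)
explosion = (
    "### Alt-Fire Explosion\n"
    "**Damage:** blast: 789\n"
    "Crit Chance: 21.00% | Crit Multiplier: 2.3x | Status Chance: 33.00% | Fire Rate: 1.5"
)

# Suggested layout: single newlines only, so both sections form one chunk.
one_chunk = chunk_on_blank_lines(alt_fire + "\n" + explosion)
# Assumed variant: one blank line between sections, so each section is its own chunk.
per_section = chunk_on_blank_lines(alt_fire + "\n\n" + explosion)

print(len(one_chunk), len(per_section))  # 1 2
```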
I'll try that. Thank you!
It was separated because I was using a Markdown conversion from an import; I just forgot to account for the newlines.
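If the Markdown conversion keeps inserting blank lines, one option is a small cleanup pass before upload. This is a minimal sketch, assuming `### ` headings mark section boundaries in the converted data; `tighten_markdown` is a hypothetical helper, not part of any import tool.

```python
import re

def tighten_markdown(text: str) -> str:
    """Hypothetical post-processing step for Markdown produced by an import."""
    # Collapse every run of blank (or whitespace-only) lines into one newline,
    # so lines that belong together are no longer split into separate chunks.
    text = re.sub(r"\n(?:[ \t]*\n)+", "\n", text.strip())
    # Put back exactly one blank line before each "### " heading, so each
    # section still becomes its own chunk (assumes "### " marks a section).
    return re.sub(r"\n(?=### )", "\n\n", text)

raw = (
    "### Alt-Fire\n\n"
    "**Damage:** impact: 58.2, slash: 29.1, puncture: 9.7\n\n"
    "Crit Chance: 21.00% | Crit Multiplier: 2.3x | Status Chance: 33.00% | Fire Rate: 1.5\n\n"
    "### Alt-Fire Explosion\n\n"
    "**Damage:** blast: 789\n\n"
    "Crit Chance: 21.00% | Crit Multiplier: 2.3x | Status Chance: 33.00% | Fire Rate: 1.5\n"
)
tidy = tighten_markdown(raw)
print(tidy)
```

The result keeps each heading, damage line, and stat line together, with a single blank line left only between the two sections.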
Btw, on a separate note, are you familiar with agents?
I haven't used any specific ones.