Shiyi Chen
This study investigates how gender bias in software engineer hiring surfaces in large language models with different levels of alignment post-training, by conducting NLP analysis of the models' job descriptions and of their gender classifications of gender-blind résumés with DeepSeek-R1 (DeepThink) and DeepSeek-V3. Results show that V3 consistently labeled most of the summaries as male. The study also found that model behavior is sensitive to the prompt: R1 rejected all classifications when asked to infer from general semantics, but classified most applicants as male when prompted to conduct explicitly linguistic analysis. This demonstrates that alignment training can constrain explicit gender judgments but does not eliminate the underlying gendered associations encoded in the model's representation space. By showing that implicit gendered representations of occupations also exist in LLMs and are sensitive both to alignment post-training and to prompting, we see that gender bias in framing is primarily embedded in non-verbal conceptions but ultimately reinforced by language. It operates as a latent structure in technical self-representation, and LLM evaluations reproduce this structure in spite of alignment.
First, neither model, when it does infer, ever classifies an applicant as female. A word-frequency analysis of the ten descriptions generated by DeepSeek-R1 (Figure 1) yields the ten most indicative words: experience, scalable, proven, robust, proficient, passionate, results-driven, user-centric, building, record. Some of these words (robust, results-driven) appear among the male-encoded words in existing research (Gaucher et al., 2011), while none explicitly indicates gender. Yet despite this surface gender-neutrality, neither DeepSeek-R1 nor DeepSeek-V3 classified any description as coming from a female applicant. This suggests that the latent embedding structure aligns "software engineer" summaries almost exclusively with male representations, regardless of prompt or alignment intensity. Such disproportionality can itself entail bias against women: the model does not merely overfit engineer profiles to male; it also implies that "female engineer" lacks representational compatibility with the distributional signals in the résumé summaries.
Figure 1 Most Frequent Words in Descriptions d1–d10
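The word-frequency step described above can be sketched roughly as follows. The `descriptions` list and the stop-word set are illustrative stand-ins, not the actual d1–d10 texts analyzed in the study.

```python
from collections import Counter
import re

# Illustrative stand-ins for the ten model-generated descriptions d1-d10.
descriptions = [
    "Results-driven engineer with a proven record of building scalable systems.",
    "Passionate about robust, user-centric design; proficient in cloud tooling.",
]

# Minimal stop-word list for the sketch; a real analysis would use a fuller set.
STOPWORDS = {"a", "about", "with", "of", "in", "the", "and"}

def top_words(texts, k=10):
    """Return the k most frequent non-stop-word tokens across texts."""
    counts = Counter()
    for text in texts:
        # Keep hyphenated compounds such as "results-driven" as one token.
        tokens = re.findall(r"[a-z][a-z-]*", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]

print(top_words(descriptions))
```

On the real d1–d10 texts, the same tally produces the ten indicative words listed above.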
Second, models with alignment post-training are more risk- and classification-averse under ambiguous instructions. Figure 2 shows that R1 refuses more often than V3 when asked to infer from overall semantics (prompt p2-1): R1 refuses all 4 classifications under p2-1 and 2 out of 6 under p2-2, whereas V3 refuses only 1 out of 4 under p2-1 and none under p2-2. This vertical comparison between DeepSeek-R1 and DeepSeek-V3 shows how stronger reasoning-aligned training makes the model less aggressive in classification, especially when not explicitly prompted to classify. This parallels how societal constraints such as the promotion of gender equity reduce explicit gender bias. Yet the next findings show that such constraints operate only at the surface level.
Figure 2 Vertical Comparison: Effect of Alignment on Bias Recognition
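The refusal-rate comparison in Figure 2 amounts to a small tally. The per-item labels below are hypothetical reconstructions chosen only to match the reported counts (the V3 denominator under p2-2 is assumed to be 6).

```python
# Hypothetical outcome labels mirroring Figure 2's reported counts:
# R1 refuses 4/4 under p2-1 and 2/6 under p2-2; V3 refuses 1/4, then 0/6.
results = {
    ("R1", "p2-1"): ["refuse"] * 4,
    ("R1", "p2-2"): ["male", "male", "refuse", "male", "refuse", "male"],
    ("V3", "p2-1"): ["male", "male", "male", "refuse"],
    ("V3", "p2-2"): ["male"] * 6,  # assumed denominator of 6
}

def refusal_rate(labels):
    """Fraction of items the model declined to classify."""
    return labels.count("refuse") / len(labels)

for (model, prompt), labels in results.items():
    print(f"{model} {prompt}: refusal rate {refusal_rate(labels):.2f}")
```

The tally makes the vertical comparison concrete: the aligned model's refusal rate drops sharply once the prompt explicitly licenses classification.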
The third finding is key to understanding that alignment post-training affects only how the model handles indeterminacy, not the underlying associative architecture itself. Instructing the model to infer "per the description" (p2-1) versus "per the rhetoric, wording, and content" (p2-2) produced distinct degrees of classification aversion in DeepSeek-R1. Figure 3 shows that DeepSeek-R1, though it refused all classifications under p2-1, assigned most of the descriptions to Male under p2-2. The average rating R1 gives to the Undetermined category is 0.25 lower than that of the Male category under p2-2. The same rise in classification aggressiveness appears in DeepSeek-V3, whose percentage of male classifications increased from 75% under p2-1 to 100% under p2-2 (Table 1). This prompt-contingency in both R1 and V3 shows that alignment training increases the neutrality of gender perception only at the surface level and does not mitigate gender bias in the underlying representation space.
Figure 3 Prompt-Contingency Observed in DeepSeek-R1
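The rating gap reported above is a simple difference of category means. The (category, rating) pairs below are hypothetical values chosen only to illustrate the computation, not the study's actual data.

```python
from statistics import mean

# Hypothetical (category, rating) pairs for R1 under p2-2; the values are
# illustrative, picked so the Male-minus-Undetermined gap comes out to 0.25.
p2_2_ratings = [
    ("Male", 0.85), ("Male", 0.80), ("Undetermined", 0.55),
    ("Male", 0.75), ("Undetermined", 0.55), ("Male", 0.80),
]

def mean_rating(pairs, category):
    """Average rating over all items assigned to the given category."""
    values = [rating for cat, rating in pairs if cat == category]
    return mean(values) if values else None

gap = mean_rating(p2_2_ratings, "Male") - mean_rating(p2_2_ratings, "Undetermined")
print(f"Male minus Undetermined average rating: {gap:.2f}")
```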
Last, an inspection of the model's reasoning under prompt p2-2 reveals how gender bias is reproduced through linguistic consolidation. In a case where DeepSeek-R1 does not refuse to classify gender (classifying the description as Male), the model justifies its rationale as follows: "The summary uses neutral professional terminology without any stereotypically gendered markers. The focus on quantifiable results and technical competencies aligns with gender-neutral professional writing conventions. I should emphasize that this inference is based solely on rhetorical patterns." This response shows the model interpreting the description as semantically gender-neutral yet rhetorically masculine: semantically neutral language can still carry bias when its rhetoric reflects gender stereotypes.
Building on these findings, we conclude that while gender bias, inherently tied to stereotypes, is primarily formed through non-verbal symbols, it is ultimately reinforced and sustained through language. We also caution future optimization work on large language models about a limitation of alignment training: it affects model behavior only at the surface level rather than rectifying the representation space.


