ENH: Unique key detection function

### Feature Type

- [X] Adding new functionality to pandas

- [ ] Changing existing functionality in pandas

- [ ] Removing existing functionality in pandas


### Problem Description

Hello,
This is a feature I haven't seen in any data preparation/ETL tool. The core idea is to detect the unique key of a DataFrame. More often than not, you have to deal with a dataset without knowing what makes a row unique. This can lead to misinterpreting the data, Cartesian products on joins, and other funny stuff.
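
To illustrate the join problem, here is a minimal sketch. The two frames (`orders`, `shipments`) and their values are made up for this example; the only point is that joining on a column that is not unique on either side multiplies rows.

```python
import pandas as pd

# Hypothetical data: order_id is NOT unique in either frame.
orders = pd.DataFrame({"order_id": [1, 1, 2], "amount": [100, 12, 75]})
shipments = pd.DataFrame({"order_id": [1, 1, 2], "carrier": ["X", "Y", "X"]})

# order_id 1 matches 2 x 2 rows, order_id 2 matches 1 x 1 row.
joined = orders.merge(shipments, on="order_id")
print(len(orders), len(shipments), len(joined))  # 3 3 5

# merge() can already validate key uniqueness, but only for a key you name yourself.
try:
    orders.merge(shipments, on="order_id", validate="one_to_one")
except pd.errors.MergeError as exc:
    print("not a one-to-one join:", exc)
```

`validate=` checks a key that is already known; it does not help discover which columns form one, which is what this request is about.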

### Feature Description

How do I imagine it?

Input parameters: one DataFrame, plus the ability to specify a maximum number of fields per combination (empty or 0 = no maximum).
Algorithm: for every combination of fields, compare the distinct count of that combination with the number of rows.

Result: a DataFrame with one row per field combination that is unique. If there is no result: "no field combination is unique. Check for duplicates or a need for aggregation upstream".
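
The description above could be sketched roughly as follows. This is not an existing pandas function: the name `find_unique_keys`, the `max_fields` parameter, and the output column names are all assumptions made for illustration. It is a brute-force approach, so the number of combinations grows exponentially with the number of columns.

```python
from itertools import combinations

import pandas as pd


def find_unique_keys(df, max_fields=0):
    """Hypothetical helper: list column combinations that identify each row.

    max_fields=0 (or None) means no limit on the combination size.
    """
    columns = list(df.columns)
    limit = min(max_fields, len(columns)) if max_fields else len(columns)
    n_rows = len(df)

    rows = []
    for size in range(1, limit + 1):
        for combo in combinations(columns, size):
            # Distinct count of this combination versus the number of rows.
            n_distinct = len(df.drop_duplicates(subset=list(combo)))
            if n_distinct == n_rows:
                rows.append(
                    {
                        "number_of_fields": size,
                        "field_combination": ",".join(map(str, combo)),
                    }
                )
    return pd.DataFrame(rows, columns=["number_of_fields", "field_combination"])


result = find_unique_keys(pd.DataFrame({"a": [1, 1, 2], "b": [1, 2, 1]}))
if result.empty:
    print("no field combination is unique. "
          "Check for duplicates or a need for aggregation upstream")
else:
    print(result)
```

Returning an empty DataFrame and leaving the message to the caller is one possible design; raising or warning would work as well.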


Example:

order_id | line_id | amount | customer | site
-- | -- | -- | -- | --
1 | 1 | 100 | A | U_250
1 | 2 | 12 | A | U_250
1 | 3 | 45 | A | U_250
2 | 1 | 75 | A | U_250
2 | 2 | 12 | A | U_250
3 | 1 | 15 | B | U_250
4 | 1 | 45 | B | U_251

The user first selects every field except `amount` (they know that an amount makes no sense as part of a key).

The algorithm tests the following candidate keys:
- each field on its own
- each combination of two fields
- each combination of three fields
- each combination of four fields

It compares the distinct count of each one against the number of rows (7), and returns something like this:

choice | number of fields | field combination
-- | -- | --
very good | 2 | order_id, line_id
average | 3 | order_id, line_id, customer
average | 3 | order_id, line_id, site
bad | 4 | order_id, line_id, site, customer
… | … | …
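
For reference, the example can be checked with plain pandas today. This is only a sketch: it uses `DataFrame.duplicated` to test each combination, and the variable names (`candidates`, `unique_keys`) are mine. It does not try to reproduce the "choice" rating.

```python
from itertools import combinations

import pandas as pd

# The example table from this issue.
df = pd.DataFrame(
    {
        "order_id": [1, 1, 1, 2, 2, 3, 4],
        "line_id": [1, 2, 3, 1, 2, 1, 1],
        "amount": [100, 12, 45, 75, 12, 15, 45],
        "customer": ["A", "A", "A", "A", "A", "B", "B"],
        "site": ["U_250"] * 6 + ["U_251"],
    }
)

# The user excludes amount up front.
candidates = [c for c in df.columns if c != "amount"]

# A combination is a key when no row repeats on those columns.
unique_keys = [
    combo
    for size in range(1, len(candidates) + 1)
    for combo in combinations(candidates, size)
    if not df.duplicated(subset=list(combo)).any()
]

for combo in unique_keys:
    print(len(combo), ",".join(combo))
# 2 order_id,line_id
# 3 order_id,line_id,customer
# 3 order_id,line_id,site
# 4 order_id,line_id,customer,site
```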


Best regards,

Simon

### Alternative Solutions

N/A

### Additional Context

_No response_
