Growing as a Data Engineer 🌱


Python Basics 10

์จ๋ฐ 2023. 3. 3. 11:04

โœ๐Ÿป ๋ฐฐ์šด์ 

 

ํ†ต๊ณ„ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ ธ์™€ ๋‹ค๋ฅธ ํŒŒ์ƒ ๋ณ€์ˆ˜๋ฅผ ๋งŒ๋“ค์–ด ์œ ์˜๋ฏธํ•œ ๋ฐ์ดํ„ฐ๋ฅผ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒƒ์„ ์•Œ๊ฒŒ ๋˜์—ˆ๋‹ค.

 

 

 

 


 

 

 

Introduction

This post works with data for five convenience store brands. The data I was given was incomplete, so I plan to write a follow-up post on how to collect each chain's data.

 

 

 

 

 

 

 

Checking the data for the five convenience store brands

Let's open the pkl file that holds store information for five convenience store chains nationwide and explore the data.

 

import pandas as pd
import numpy as np

df = pd.read_pickle("./5store.pkl")

 

First, let's count the missing values in the data.

 

df.isnull().sum()

 

isnull() flags the null values in the DataFrame, and sum() then totals them per column.
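As a small illustration of that null count, here is a sketch on a toy DataFrame. The values are made up and only stand in for the real store table.

```python
import pandas as pd

# Toy data standing in for the store table (values are made up for illustration).
toy = pd.DataFrame({
    "brand": ["A", "B", None, "A"],
    "address": ["서울특별시 중구", None, None, "경기도 수원시"],
})

# isnull() marks each missing cell as True; sum() then adds up the Trues per column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)
```

The result is a Series indexed by column name, so a single column's count can be read with `missing_per_column["address"]`.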

 

 

We can also get the total number of stores per convenience store brand.

 

df['brand'].value_counts()

 

value_counts() counts how many rows carry each brand value.

 

 

ํŽธ์˜์ ๋“ค์„ ์ง€์—ญ๋ณ„๋กœ ๊ตฌ๋ถ„ํ•˜๊ธฐ ์œ„ํ•ด์„œ, ์ฃผ์†Œ column ์—์„œ ์ง€์—ญ์— ํ•ด๋‹นํ•˜๋Š” ๋ฌธ์ž์—ด๋งŒ ๋ฝ‘์•„ ํ†ต์ผํ•ด์ฃผ๋Š” ์ž‘์—…์ด ํ•„์š”ํ•˜๋‹ค.

 

# ๋จผ์ €, ๊ณ ์œ ํ•œ ๊ฐ’๋งŒ ์ฐพ์•„๋ณธ๋‹ค. (๋™์‹œ์—, ์ด์ƒํ•œ ๊ฐ’ ์žˆ๋Š”์ง€ ํ™•์ธ ๊ฐ€๋Šฅ)
df['address'].apply(lambda x : x.split()[0]).unique()

# ์‹œ/๋„ ๊ฐ’ ๊ณ ์œ  ์ด ๊ฐœ์ˆ˜ ์ฐพ๊ธฐ
df['address'].apply(lambda x : x.split()[0]).value_counts()

 

์ฃผ์†Œ๊ฐ€ ๋‹ด๊ธด ๋ฌธ์ž์—ด์„ ๊ณต๋ฐฑ ๊ธฐ์ค€ ์ž๋ฅด๊ณ  ๋จผ์ € ๋‚˜์˜ค๋Š” ๊ฐ’์ด ์‹œ/๋„ ๊ฐ€ ๋‚˜์™€์•ผํ•œ๋‹ค.
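The same first-token extraction can also be written with the vectorized string accessor instead of apply. This is only a sketch on made-up addresses, not the post's data.

```python
import pandas as pd

# Made-up addresses; note the different spellings and the stray spaces.
addresses = pd.Series(["경기도 수원시 팔달구", "경기 성남시 분당구", " 서울특별시 중구 "])

# str.split() with no argument splits on any run of whitespace,
# and .str[0] takes the first piece of each split result.
first_token = addresses.str.split().str[0]
print(first_token.unique())
```

Here '경기도' and '경기' show up as separate values, which is exactly the kind of inconsistency the next steps clean up.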

 

์ด์ƒ์น˜๋‚˜ ์ค‘์˜์  ํ‘œํ˜„๋“ค์„ ํ•˜๋‚˜๋กœ ํ†ต์ผํ•ด์ฃผ์ž.

 

 

๋จผ์ €, '๊ฒฝ๊ธฐ'๋กœ ์‹œ์ž‘ํ•˜๋Š” ๋ชจ๋“  ์ฃผ์†Œ๋ฅผ ์ฐพ์•„๋ณด์ž.

 

 # '๊ฒฝ๊ธฐ'๊ฐ€ 0๋ฒˆ์งธ index ์— ์žˆ๋Š” ๊ฐ’์—์„œ ์ฃผ์†Œ๊ฐ€ ๊ณ ์œ ํ•œ ๊ฐ’ ์ฐพ๊ธฐ
df[df['address'].str.find("๊ฒฝ๊ธฐ") == 0]['address'].apply(lambda x : x.split()[0]).unique()

 

'๊ฒฝ๊ธฐ' ๋กœ ์‹œ์ž‘ํ•˜๋Š” address ์˜ ๊ณ ์œ ๊ฐ’๋งŒ ๋‚˜์˜ค๊ฒŒ ๋œ๋‹ค.
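For reference, `str.find(...) == 0` and `str.startswith(...)` select the same rows here. A minimal sketch on made-up addresses:

```python
import pandas as pd

# Made-up addresses; the second one contains '경기' but does not start with it.
addresses = pd.Series(["경기도 수원시", "서울특별시 경기대로", "경기 성남시"])

# find() returns the position of the first match (-1 if absent), so == 0 means "starts with".
by_find = addresses.str.find("경기") == 0

# startswith() expresses the same test directly.
by_startswith = addresses.str.startswith("경기")

print(by_find.tolist(), by_startswith.tolist())
```

Either form works; startswith just states the intent more directly.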

 

Now that we know the odd values, let's replace them.

Before replacing, strip the whitespace from both ends of each address.

 

df['address'] = df['address'].apply(lambda x: x.strip())

 

The strip() method removed the whitespace on both sides of each address.

df.loc[df['address'].str.find("경기") == 0, '시'] = '경기도'

# Check the '시' column
df['시'].value_counts()

This finds the addresses that start with '경기', creates the '시' column, and stores '경기도' in it for all of those rows.

 

 

df.loc[(df['address'].str.find("부산") == 0), '시'] = '부산광역시'
df.loc[(df['address'].str.find("대전") == 0), '시'] = '대전광역시'
df.loc[(df['address'].str.find("광주") == 0), '시'] = '광주광역시'
df.loc[(df['address'].str.find("대구") == 0), '시'] = '대구광역시'
df.loc[(df['address'].str.find("울산") == 0), '시'] = '울산광역시'
df.loc[(df['address'].str.find("인천") == 0), '시'] = '인천광역시'
df.loc[(df['address'].str.find("강원") == 0), '시'] = '강원도'
df.loc[(df['address'].str.find("세종") == 0), '시'] = '세종특별자치시'
df.loc[(df['address'].str.find("제주") == 0), '시'] = '제주특별자치도'
df.loc[(df['address'].str.find("경북") == 0) |
        (df['address'].str.find("경상북도") == 0), '시'] = '경상북도'
df.loc[(df['address'].str.find("전북") == 0) |
        (df['address'].str.find("전라북도") == 0), '시'] = '전라북도'
df.loc[(df['address'].str.find("전남") == 0) |
        (df['address'].str.find("전라남도") == 0), '시'] = '전라남도'
df.loc[(df['address'].str.find("충북") == 0) |
        (df['address'].str.find("충청북도") == 0), '시'] = '충청북도'
df.loc[(df['address'].str.find("충남") == 0) |
        (df['address'].str.find("충청남도") == 0), '시'] = '충청남도'
df.loc[(df['address'].str.find("경남") == 0) |
        (df['address'].str.find("경상남도") == 0), '시'] = '경상남도'

The other addresses were consolidated by 시/도 in the same way.
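The repeated assignments could also be driven by a lookup table. The sketch below is one possible way to do that, not the method used in the post: `SIDO_PREFIXES` and `label_sido` are hypothetical names, only a few prefixes are listed, and the '서울' entry is an assumption because the post does not show how the Seoul rows were labelled.

```python
import pandas as pd

# Prefix -> canonical 시/도 name. Only a few entries are shown here.
# The "서울" entry is an assumption; the post does not show that step.
SIDO_PREFIXES = {
    "경기": "경기도",
    "부산": "부산광역시",
    "경북": "경상북도",
    "경상북도": "경상북도",
    "서울": "서울특별시",
}

def label_sido(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of frame with a '시' column filled from the address prefix."""
    out = frame.copy()
    out["시"] = None  # rows that match no prefix stay null
    for prefix, name in SIDO_PREFIXES.items():
        out.loc[out["address"].str.startswith(prefix), "시"] = name
    return out

# Made-up addresses for illustration.
demo = pd.DataFrame({"address": ["경기 성남시", "경상북도 포항시", "서울시 서대문구", "창원시 성산구"]})
labelled = label_sido(demo)
print(labelled)
```

Rows whose address matches no prefix (the '창원시' row here) keep a null '시', so they can still be found with isnull() and fixed by hand.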

 

df = df[~df[['brand', 'shopName']].duplicated()].copy()

 

Then the rows that repeat an earlier brand and branch name are removed, and the result is saved back to df.
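As a quick check of what that line does, here is a sketch on made-up rows comparing the duplicated() mask with `drop_duplicates(subset=...)`, which should keep the same rows.

```python
import pandas as pd

# Made-up rows: the second row repeats the (brand, shopName) pair of the first.
stores = pd.DataFrame({
    "brand": ["A", "A", "B", "A"],
    "shopName": ["역삼점", "역삼점", "역삼점", "강남점"],
})

# Mask approach: keep rows that are NOT repeats of an earlier (brand, shopName) pair.
by_mask = stores[~stores[["brand", "shopName"]].duplicated()].copy()

# drop_duplicates with subset= keeps the first occurrence of each pair as well.
by_method = stores.drop_duplicates(subset=["brand", "shopName"])

print(by_mask.equals(by_method))
```

Note that the surviving rows keep their original index labels, which is why the index has gaps afterwards.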

 

 

Now, let's find the rows whose '시' column is null.

 

df[df['์‹œ'].isnull()]

 

Because these addresses are stored in an unusual format, we can see that NaN ended up in the '시' column.

These values need to be handled separately later (after the index is fixed).

Because of the steps above, the index of df is out of order and needs to be reset.

# Reset the df index
# drop=True : the old index is not added as a new column.
# inplace=True : the change is applied directly to df.
df.reset_index(drop=True, inplace=True)

The reset_index() method renumbered the index of df.
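A tiny sketch of what the drop option changes, using a made-up frame with gaps in its index:

```python
import pandas as pd

# Made-up frame whose index has gaps, as happens after rows are filtered out.
frame = pd.DataFrame({"shopName": ["a", "b", "c"]}, index=[0, 2, 5])

# drop=True discards the old labels and renumbers from 0.
renumbered = frame.reset_index(drop=True)

# Without drop=True the old labels are kept as a new 'index' column.
kept = frame.reset_index()

print(renumbered.index.tolist(), kept.columns.tolist())
```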

 

df.loc[24549, "address"] = "서울시 서대문구 충정로7 구세군빌딩1층"
df.loc[df['address'].str.find("창원") == 0, "시"] = '경상남도'
df.loc[30712, '시'] = '경기도'
df.loc[33001, '시'] = '부산광역시'
df.loc[33723, '시'] = '충청남도'
df.loc[33985, '시'] = '충청남도'

df[df['시'].isnull()] # no null values remain

Finally, the values that were set aside earlier have been handled.

Now that the 시/도 labels are complete, we can compute each 시/도's share of convenience stores nationwide.

 

df['시'].value_counts(normalize=True) # share of convenience stores per 시/도

This gives the share for each of the 17 시/도. Each value is a proportion between 0 and 1, and the shares sum to 1.
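To make the normalize option concrete, here is a sketch on a made-up '시' column with four rows:

```python
import pandas as pd

# Made-up labels: two of the four rows are 경기도.
sido = pd.Series(["경기도", "경기도", "서울특별시", "부산광역시"], name="시")

counts = sido.value_counts()                # absolute counts per label
shares = sido.value_counts(normalize=True)  # counts divided by the total row count

print(counts)
print(shares)
```

With normalize=True each count is divided by the number of non-null rows, so multiplying by 100 gives percentages.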

 

 

 

 

Fetching the Statistics Korea population data (population by age)

To compute the convenience store share relative to population, let's fetch the population data.

 

 

https://jumin.mois.go.kr/ageStatMonth.do

 

The data can be fetched from this page. Select a period and press the search button, and you can inspect the payload values of the request.

Let's fetch the data as of February 2023.

 

import requests

stat_url = "https://jumin.mois.go.kr/ageStatMonth.do"

pay = {"tableChart": "T",
        "sltOrgType": "1",
        "sltOrgLvl1": "A",
        "sltOrgLvl2": "A",
        "sltUndefType": "",
        "nowYear": "2023",
        "searchYearMonth": "year",
        "searchYearStart": "2023",
        "searchMonthStart": "02",
        "searchYearEnd": "2023",
        "searchMonthEnd": "02",
        "sum": "sum",
        "gender": "gender",
        "sltOrderType": "1",
        "sltOrderValue": "ASC",
        "sltArgTypes": "10",
        "sltArgTypeA": "0",
        "sltArgTypeB": "100",}
        
r = requests.post(stat_url, data=pay)

korea = pd.read_html(r.text)[2]

 

The page is accessed with a POST request, and since the data we want sits inside an HTML table, read_html can be used to load the population data.

 

 

 

 

Computing the convenience store share relative to population

Let's align the structure of the population data and the store data, then combine the two DataFrames.

# Population data (administrative region and population for the period)
# .copy() avoids a copy warning when the index is set in place below
인구 = korea.iloc[1:, [0, 1]].copy()

# Total number of convenience stores per 시/도
편의점 = pd.DataFrame(df['시'].value_counts())


# Set the 행정기관 column as the index
인구.set_index("행정기관", inplace=True)

# Check that the index was set correctly
인구.head()

 

Now that the structures of the population data and the store data match, we can build the population-relative data.

# Merge the 편의점 df onto the 인구 df (left join on the index).
인구대비 = pd.merge(인구, 편의점, left_index=True, right_index=True, how='left')

# Set the column names of 인구대비.
인구대비.columns = ['인구수', '편의점수']

# Add the 점유율 column. As written, this is population divided by store count, i.e. residents per store.
인구대비['점유율'] = 인구대비['인구수'] / 인구대비['편의점수']

The 인구 df and the 편의점 df were combined into one df with merge(), and the population-relative 점유율 column was added.
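The merge and the ratio can be tried out on a tiny example. The numbers below are made up and are not real statistics, and the column names `residents_per_store` and `stores_per_10k` are hypothetical; they just show the two directions in which the ratio can be read.

```python
import pandas as pd

# Made-up numbers purely for illustration; the index labels stand in for 시/도 names.
population = pd.DataFrame({"인구수": [1000, 500, 200]}, index=["가", "나", "다"])
stores = pd.DataFrame({"편의점수": [10, 10]}, index=["가", "나"])

# how='left' keeps every population row; regions with no store row get NaN.
merged = pd.merge(population, stores, left_index=True, right_index=True, how="left")

# Population divided by store count: residents per store (lower = denser coverage).
merged["residents_per_store"] = merged["인구수"] / merged["편의점수"]

# Store count divided by population: stores per 10,000 residents (higher = denser coverage).
merged["stores_per_10k"] = merged["편의점수"] / merged["인구수"] * 10_000

print(merged)
```

Which direction to use is a matter of taste, but it decides whether the region with the most stores per resident ends up at the top or at the bottom of a descending sort.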

 

Now, let's see which 시/도 ranks highest on the population-relative 점유율.

인구대비.sort_values(by=['점유율'], ascending=False)

As of the end of February 2023, Sejong (세종특별자치시) came out as having many convenience stores relative to its population. Keep in mind that 점유율 above is population divided by store count, so a lower value means more stores per resident.

 

 

 

Finding the most common convenience store brand in each district (구) of Seoul

This time, let's find the brand with the most stores in each 구 of 서울특별시.

First, we need to select the rows for 서울특별시 and create the '구' data.

# Keep only the stores in 서울특별시. Without copy(), the assignment below raises a warning.
seoul = df.query("시 == '서울특별시'").copy()

# Create the '구' column for Seoul
seoul['구'] = seoul['address'].apply(lambda x : x.split()[1])

# Sanity check
seoul['구'].unique()

 

With the 구 column added, we can now find the most common brand in each 구.

# Store count per 구 and brand: method 1
seoul.groupby(['구', 'brand'])[['구']].count()

# Store count per 구 and brand: method 2
seoul_2 = seoul.groupby(['구', 'brand'], as_index=False)[['shopName']].count()

# 구, brand, and store count, sorted with the largest store count first
seoul_2.sort_values(by=['shopName'], ascending=False)

# Brand with the most stores in each 구: first row of each group
seoul_2.sort_values(by=['shopName'], ascending=False).groupby(['구']).first()

# Same result with nth: nth is zero-based, so the first row is nth(0)
seoul_2.sort_values(by=['shopName'], ascending=False).groupby(['구']).nth(0)

# Brand with the fewest stores in each 구: last row of each group
seoul_2.sort_values(by=['shopName'], ascending=False).groupby(['구']).nth(-1)

Depending on the sort order and the method used, different results can be obtained.
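To see the sort-then-first pattern end to end, here is a sketch on made-up rows. The idxmax variant is an alternative I am adding for comparison; it is not from the lecture.

```python
import pandas as pd

# Made-up rows for illustration only: brand A leads in 강남구, brand B in 중구.
seoul_demo = pd.DataFrame({
    "구": ["강남구"] * 3 + ["중구"] * 3,
    "brand": ["A", "A", "B", "B", "B", "A"],
    "shopName": ["s1", "s2", "s3", "s4", "s5", "s6"],
})

# Store count per (구, brand) pair.
counts = seoul_demo.groupby(["구", "brand"], as_index=False)[["shopName"]].count()

# Sort so the largest count comes first, then take the first row of each 구.
top = counts.sort_values("shopName", ascending=False).groupby("구").first()

# Alternative: look up the row holding the maximum count in each 구 with idxmax.
top_idx = counts.loc[counts.groupby("구")["shopName"].idxmax()].set_index("구")

print(top)
```

Both variants return one row per 구. When two brands tie within a 구, which one is returned depends on row order, so ties are worth checking separately.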
