Hi, I'm using PDI 18.104.22.168-12 as the ETL tool for my data warehouse. One dataset contains addresses that were entered manually, so I have an unstructured string field holding every possible format.
Examples:
12 , rue ibn rochd, avenue moulay smail, Casablanca
BOULEVARD MASSIRA, OULFA
barnoussi, bloc 12 imm 5 app 6.
12 rue ibn koutaiba, casa.
(Note: all addresses are in Casablanca, Morocco.)
Is there a way to extract the street and the district from an unstructured address, perhaps using NLP?
I was thinking of building a lookup table (from an open-source dataset) with all the street and district names of Casablanca, including their short forms. Then, if any word in an address matches a street or district name (or one of its short forms), that name would be written to a new column in the target table.
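The lookup-table idea can be sketched in Python (standard library only), assuming the street and district names have already been loaded into dictionaries. The `STREETS` and `DISTRICTS` entries, the `normalize`/`find_all` helpers and the `cutoff` value below are all hypothetical illustrations, not part of PDI or any dataset. Two adjustments to the plan seem worth making. First, street names span several words ("rue ibn rochd"), so the sketch compares runs of up to N consecutive words rather than single words. Second, as a fallback, it uses `difflib.get_close_matches` to tolerate spelling variants such as "barnoussi". This is plain string similarity, not NLP proper.

```python
import difflib
import re
import unicodedata

# Hypothetical sample lookup tables. In practice these would be loaded from the
# open-source street/district dataset. Keys are normalized aliases (full names
# and short forms); values are the canonical names to write to the target table.
STREETS = {
    "rue ibn rochd": "Rue Ibn Rochd",
    "avenue moulay smail": "Avenue Moulay Smail",
    "av moulay smail": "Avenue Moulay Smail",
    "boulevard massira": "Boulevard Massira",
    "bd massira": "Boulevard Massira",
    "rue ibn koutaiba": "Rue Ibn Koutaiba",
}
DISTRICTS = {
    "oulfa": "Oulfa",
    "hay oulfa": "Oulfa",
    "sidi bernoussi": "Sidi Bernoussi",
    "bernoussi": "Sidi Bernoussi",
}


def normalize(text: str) -> str:
    """Lowercase, strip accents, and turn punctuation into single spaces."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def find_all(address: str, table: dict, cutoff: float = 0.85) -> list:
    """Scan the address left to right, trying the longest word sequence first.

    Each phrase is looked up exactly; if that fails, difflib proposes the
    closest alias whose similarity ratio is at least `cutoff` (typo tolerance).
    Matched words are consumed so overlapping aliases are not reported twice.
    """
    tokens = normalize(address).split()
    aliases = list(table)
    max_n = max(len(alias.split()) for alias in aliases)
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            hit = table.get(phrase)
            if hit is None:
                close = difflib.get_close_matches(phrase, aliases, n=1, cutoff=cutoff)
                hit = table[close[0]] if close else None
            if hit is not None:
                if hit not in found:
                    found.append(hit)
                i += n  # consume the matched words
                break
        else:
            i += 1  # no alias starts at this word; move on
    return found


if __name__ == "__main__":
    samples = [
        "12 , rue ibn rochd, avenue moulay smail, Casablanca",
        "BOULEVARD MASSIRA, OULFA",
        "barnoussi, bloc 12 imm 5 app 6.",
        "12 rue ibn koutaiba, casa.",
    ]
    for raw in samples:
        print(raw, "->", find_all(raw, STREETS), find_all(raw, DISTRICTS))
```

The `cutoff` trades recall for false positives. Lowering it catches more misspellings but also starts matching unrelated words, so it would need tuning against a sample of the real data. Returning a list rather than a single value keeps addresses that mention two streets (like the first example) from silently losing one.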