Govt. of India Census, 2001 District-Wise

Datasets Description

One billion hearts, a single CSV
# Context

The Census of India is a rich database that can tell the stories of over a billion Indians. It matters not only for research but also commercially, for organizations that want to understand India's complex yet tightly knit heterogeneity. However, nowhere on the web is there a single database that combines the district-wise information for all the variables (most include no more than 4-5 out of over 50 variables!). Extracting and using data from the Census of India 2001 is quite laborious, since all of the data is published as scattered, district-wise PDFs. Individual PDFs can be extracted from

# Content

This database has been extracted from the Census of 2001 and includes data for 590 districts, with around 80 variables each. In case of confusion about the meaning of a variable, refer to the following PDF and you will be able to make sense of it:

All the extraction work can be found @

The final CSV can be found at finalCSV/all.csv

The subtle hack that automated most of the extraction was that the URLs of all the PDFs were identical except for four digits (the respective state and district codes).

A few abbreviations used for states:

- AN: Andaman and Nicobar
- CG: Chhattisgarh
- D_D: Daman and Diu
- D_N_H: Dadra and Nagar Haveli
- JK: Jammu and Kashmir
- MP: Madhya Pradesh
- TN: Tamil Nadu
- UP: Uttar Pradesh
- WB: West Bengal

A few variables, for clarification:

- `Growth..1991...2001`: population growth from 1991 to 2001
- `X0..4 years`: people in the age group 0 to 4 years
- `SC1`: Scheduled Caste with the highest population

# Acknowledgements

# Inspiration

This is a massive dataset that can be used to explain the interplay between education, caste, development, gender and much more. It can explain a lot about India and propel data-driven research. Happy Number Crunching!
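
For readers curious about the URL trick mentioned under Content, here is a minimal sketch of how such a list of PDF links could be generated. It is not the original extraction code. The `URL_TEMPLATE` below is a hypothetical placeholder (the real URL pattern is not given in this description), and the layout of the four digits as a two-digit state code followed by a two-digit district code is an assumption, as are the function name `district_pdf_urls` and its parameters.

```python
from itertools import product

# Hypothetical placeholder: the real census URL is not given in this description.
# Assumption: the four varying digits are a 2-digit state code followed by a
# 2-digit district code, both zero-padded.
URL_TEMPLATE = "https://example.org/census2001/{state:02d}{district:02d}.pdf"


def district_pdf_urls(state_codes, max_district):
    """Yield (state, district, url) for every candidate state/district pair."""
    for state, district in product(state_codes, range(1, max_district + 1)):
        yield state, district, URL_TEMPLATE.format(state=state, district=district)


# Example: two illustrative state codes with up to three districts each.
urls = list(district_pdf_urls(state_codes=[1, 2], max_district=3))
for state, district, url in urls:
    print(state, district, url)
```

In practice not every code pair corresponds to a real district, so a downloader built on this would need to skip URLs that do not resolve.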