1 Historical Conflict (30 points)

In this first part, you will clean and prepare the Historical Conflict Event Dataset (HCED), available for download here.

Download and cite the data source (2 points)

  • Download the data (HCED Data v2.csv) and save it in your project folder.
  • Find the accompanying paper, published in 2022 in the Journal of Conflict Resolution.
  • Add references for both the paper and the data to your bibliography and cite them in your document.
  • Briefly describe the dataset: what does it cover, what information is included, and what are the original sources?

Clean the HCED dataset (10 points)

Inspect the raw dataset

  • Load the dataset and summarize its structure, including the number of rows and columns.

  • Use the janitor package to clean variable names.

  • Identify missing or inconsistent values and describe any issues you discover.

  • Clean the latitude and longitude variables (2 points): Remove any special characters or blank spaces, then convert the cleaned variables to numeric format.

  • Standardize the theatre variable (3 points): Remove extra spaces, correct capitalization, and fix inconsistent labels (e.g., merge “Sea and air” with “Air and sea”).

  • Fix the year variable: Extract the starting year from the year column (which may contain intervals, e.g., “1940-1945”) into a year_start variable. Make sure year_start is numeric and codes BCE dates as negative years. HINT: use str_extract() with the appropriate regex syntax (ask an LLM for support).

  • Save the cleaned data for the next chapter.
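
The cleaning steps above can be sketched on toy data as follows. This is only a minimal illustration, not a reference solution: the rows in raw are invented, the column names are assumed to match the output of the janitor cleaning step, and the regex assumes that BCE years appear in the raw data with a leading minus sign. Check all of these against the actual file before reusing any of it.

```r
library(stringr)

# Invented rows standing in for the raw data (NOT taken from the HCED file).
raw <- data.frame(
  latitude  = c(" 48.85\u00b0", "41.90 ", "-33.87"),
  longitude = c("2.35", " 12.49*", "151.21\u00b0"),
  theatre   = c("  sea and  air", "Air and sea", "LAND "),
  year      = c("1940-1945", "-216", "-431--404"),
  stringsAsFactors = FALSE
)

# Keep only digits, the decimal point and the minus sign, then convert.
# Assumption: coordinates are decimal degrees with stray symbols attached.
clean_coord <- function(x) as.numeric(str_remove_all(x, "[^0-9.\\-]"))

clean <- raw
clean$latitude  <- clean_coord(raw$latitude)
clean$longitude <- clean_coord(raw$longitude)

# Squish whitespace, unify capitalization, then merge equivalent labels.
theatre <- str_to_sentence(str_squish(raw$theatre))
theatre[theatre == "Sea and air"] <- "Air and sea"
clean$theatre <- theatre

# Starting year: an optional leading minus followed by digits, anchored at
# the start of the string so the end of an interval is ignored.
clean$year_start <- as.numeric(str_extract(str_trim(raw$year), "^-?\\d+"))
```

If the real file marks BCE dates differently (for example with a text suffix), the pattern "^-?\\d+" will not be enough and the sign has to be derived in a separate step.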

Summarize the HCED Dataset (10 points)

  • Group battles by theatre: Count the number of unique battles in each theatre, calculate their share of total battles, and present the results in a tidy table.
  • Temporal summary: Provide the range of years covered by the dataset. Calculate and report the number of battles recorded before and after year 0 (that is, BCE versus CE dates).
  • Battle characteristics by theatre: Plot the distribution of the Lehmann-Zhukov scale across the different theatres of war. Highlight at least one meaningful insight from the plot (e.g., which theatre had the largest or smallest battles on average).
  • Battles and massacres over time: Create a line chart showing two time series by decade: the total number of battles and the number of battles followed by a massacre. Briefly comment on the observed trends. What does the change in the massacre variable over time suggest? Relate this to potential changes in the quality or coverage of historical data.
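
The grouped summaries above could be sketched with dplyr along the following lines. The data frame battles is invented for illustration, and the column names battle and massacre (here a logical) are assumptions; the real variables may be named and coded differently.

```r
library(dplyr)

# Invented rows (NOT taken from the HCED file); column names are hypothetical.
battles <- data.frame(
  battle     = c("A", "B", "C", "D", "E"),
  theatre    = c("Land", "Land", "Sea", "Land", "Air and sea"),
  year_start = c(-431, -216, 1571, 1943, 1944),
  massacre   = c(FALSE, TRUE, FALSE, TRUE, FALSE)
)

# Unique battles per theatre and their share of all battles.
by_theatre <- battles %>%
  distinct(battle, theatre) %>%
  count(theatre, name = "n_battles") %>%
  mutate(share = n_battles / sum(n_battles)) %>%
  arrange(desc(n_battles))

# Range of years, and counts of battles with negative vs. non-negative years.
year_range <- range(battles$year_start)
era_counts <- battles %>%
  mutate(era = if_else(year_start < 0, "BCE", "CE")) %>%
  count(era)

# Battles and massacres per decade; floor() keeps negative years in the
# decade that starts at or below them (e.g. -431 falls in -440).
by_decade <- battles %>%
  mutate(decade = floor(year_start / 10) * 10) %>%
  group_by(decade) %>%
  summarise(battles = n(), with_massacre = sum(massacre), .groups = "drop")
```

For the line chart, by_decade would typically be reshaped to long format (one row per decade and series) before plotting, so that both series can share one colour legend.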

Additional visualization (5 points)

  • Pick one (or more) of the other variables included in the HCED and produce an insightful visualization.

Inspect battles (3 points)

  • Pick one battle and validate the recorded information against the original sources or online references. Briefly describe what you find and comment on any uncertainties regarding the time, the location, and other details.