Data Science in Banking & Finance

The banking and finance sector offers one of the most powerful applications of Responsible Data Science. Through partnerships with organizations like First National Bank (FNB), PNC, and BNY Mellon, RDS@Pitt explores how data can drive transparency, risk management, and customer trust in highly regulated environments. Students engage with real-world financial datasets and scenarios that emphasize ethical modeling, data governance, and accountability—preparing them to lead responsibly in data-driven financial institutions.

Context and Datasets

  • News.Bank.001 Transaction Data for Banking Operations

    The banking industry faces a growing threat from AI-enhanced fraud schemes that bypass traditional detection systems. TrustNet Bank, a mid-sized financial institution, serves both personal and small business customers ...

  • News.Bank.002 Fraud Detection Dataset

    The rise of sophisticated deepfake technology has increased risks of identity fraud in the financial services industry. A leading financial institution is working to strengthen its fraud detection capabilities to safeguard customer assets and maintain trust.

  • News.Bank.003 Assets and Liabilities of Banks US

    The banking industry faces rapid transformation driven by economic shifts, regulatory changes, and technological innovation.

  • FNB.Ban.001 Secure Customer Support

    The banking industry is undergoing a significant transformation driven by large investments in artificial intelligence and advanced technologies aimed at improving customer experience and operational efficiency.

  • News.Bank.004 Equifax

    A mysterious “coding issue” in Equifax’s AI-driven credit scoring system led to consumers receiving inaccurate credit scores in early 2022, illustrating risks within the financial services and data infrastructure industry.

  • Banking.PGH.002 Mortgage Foreclosures in Allegheny County

    This data table records mortgage foreclosure filings in Allegheny County: the filing date, the parcel identifier, the amount that the lender is suing the property owner for, and the name of the lender. Parcel identifiers allow linking to many other records, including assessed property value, sales history, and the nature of the property (e.g., home, apartment building, or business location). The data provides a history of foreclosure attempts from January 2009 to the present day. (Narrative to avoid: “Determine risk of mortgage based on various aggregate property attributes.” That way lies discriminatory redlining!)

  • Banking.PGH.001 Bank Locations and Deposits over Time

    Every year, the FDIC publishes an inventory of all U.S. banks and branches. At the institution level (aggregating over all branches), the data includes whether the bank is FDIC insured, the total assets of the bank, the amount of deposits in checking (demand) accounts, and the amount of deposits in time accounts (e.g., CDs) and savings accounts. At the branch level, the data includes the address and geocoordinates of the branch, the date that the branch was established, and the total deposits at that branch. (This data could support historical analyses, going back to 1994, of the number, locations, and holdings of bank branch offices. Because routine bank operations such as withdrawing cash or depositing checks can increasingly be done without visiting a branch, the demand for branch offices has decreased, and the total number of branches in Allegheny County has fallen by 25% over the last 30 years. One option here is to use Census data to predict population shifts over the next 10 years and then, taking the role of one of the banks with the most branches, decide how to redistribute branch offices to best serve Allegheny County’s population.)
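
As a starting point for the historical analysis of bank branches described above, the sketch below counts branches and sums deposits per year for one county, then computes the percent change in branch count between the first and last years. It is a minimal illustration only: the field names (`year`, `county`, `branch_id`, `deposits`) and every sample value are invented for this example and are not the FDIC's actual column names or figures. It also assumes one row per branch per year.

```python
from collections import defaultdict

# Hypothetical branch-level records. Field names and values are invented
# for illustration; they are NOT real FDIC columns or real deposit figures.
SAMPLE_BRANCHES = [
    {"year": 1994, "county": "Allegheny", "branch_id": "A1", "deposits": 100.0},
    {"year": 1994, "county": "Allegheny", "branch_id": "A2", "deposits": 150.0},
    {"year": 1994, "county": "Allegheny", "branch_id": "A3", "deposits": 200.0},
    {"year": 1994, "county": "Allegheny", "branch_id": "A4", "deposits": 50.0},
    {"year": 1994, "county": "Butler", "branch_id": "B1", "deposits": 75.0},
    {"year": 2024, "county": "Allegheny", "branch_id": "A1", "deposits": 300.0},
    {"year": 2024, "county": "Allegheny", "branch_id": "A2", "deposits": 250.0},
    {"year": 2024, "county": "Allegheny", "branch_id": "A5", "deposits": 150.0},
]


def summarize_by_year(records, county):
    """Return {year: (branch_count, total_deposits)} for a single county."""
    branch_ids = defaultdict(set)   # year -> distinct branch identifiers
    deposits = defaultdict(float)   # year -> summed branch deposits
    for row in records:
        if row["county"] != county:
            continue
        branch_ids[row["year"]].add(row["branch_id"])
        deposits[row["year"]] += row["deposits"]
    return {
        year: (len(branch_ids[year]), deposits[year])
        for year in sorted(branch_ids)
    }


def percent_change_in_branches(summary):
    """Percent change in branch count from the earliest to the latest year."""
    years = sorted(summary)
    first_count = summary[years[0]][0]
    last_count = summary[years[-1]][0]
    return 100.0 * (last_count - first_count) / first_count


if __name__ == "__main__":
    summary = summarize_by_year(SAMPLE_BRANCHES, "Allegheny")
    for year, (count, total) in summary.items():
        print(year, count, total)
    print(percent_change_in_branches(summary))
```

With real data, the same per-year summary could be joined to Census population estimates by geography before deciding how to redistribute branches. That step is left out here because it depends on how the actual files are structured.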