![]() Thus, in the above example, the “Role ID” attribute in the Person table is a foreign key, pointing to the primary key of the Role table. Every relation (table) in the database must have exactly one primary key.Ī foreign key is an attribute that “points” to the primary key of another table. It is common to have a single column (like we do here as the “ID” column) serve as the primary key, but that is not required the primary key can consist of multiple columns so long as they are unique in every row. A primary key is a unique identifier for each row in the table. In the above examples, “ID” attribute serves as what is called a primary key. This brings us to the concepts of keys and constraints. 4, that doesn’t have a corresponding entry in the Role table. Of course, we haven’t really gained that much benfit here, since it’s still possible for the “Role ID” attribute to contain some number, e.g. A better alternative is to create a separate “Role” table, that lists the allowable roles for the course: What is the data type of this attribute? A string? If so, we could enter anything we want here, which would not be ideal for maintaining a large collection of data, say if each student has to remember to type their role exactly as “Student” (with capitalization intact, of course). ![]() For example, considering the case above, the “Role” attribute in our Person table is not particularly well defined. Where relational data becomes interesting is when we have multiple tables and explicit relationships between them. Of course, a single table isn’t particular interesting as a rich data source we could have done (and in fact do) store the exact same information in a CSV file. Columns in the relation are called attributes, and specify some feature contained in each of the tuples. This uniqueness requirement can be trivially satisfied by having a unique ID identifier in each tuple, and we’ll discuss this more below. In our context, rows are called tuples or records, and they represent a single instance of this relation as mentioned above, the entirety of the row must also be unique (i.e., there cannot be two rows that have identical entries in all columns). This kind of data representation is so familiar to us that it hardly necessitates any explanation, but a few points of terminology in relational database speak are useful. To make this concerete, let’s consider a “Person” table for people involved with this class. The basic unit in any relational data is the notion of a “relation”, but since no one uses that term anymore, just note that this actually is interchangeable with the common definition we have of “table” (though a relation has the additional constraints the rows in the table cannot be duplicates). Creating and managing such a database, let alone actually coding one, are not topics we’ll consider here. ![]() While obviously databases are a topic that can’t be done any kind of justice in one lecture, these notes will focus on some of the basic ideas of relational databases, and ideally will give you some hints about how to efficiently get data out of a relational database. And although schemaless/NoSQL/non-relational databases are popular for some applications, in a large number of cases you’re going to be dealing with data in a standard relational database (and this format honestly makes sense for the vast majority of use cases that require a database). Chances are, as a data scientist, if you’re going to be processing data stored in any more structured form than simple flat files, it’s going to be in some kind of database. These notes provide a very brief introduction to the relational data and databases. ⭳ Download Jupyter Notebook Relational Data
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |