<br/><br/>
</p>
</div>
<div style="margin:20px; padding: 10px; background: #ffffff; font-family: Trebuchet MS, sans-serif; font-size: 95%;-webkit-border-radius: 15px;-webkit-box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96); -moz-box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96);box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96);">
<font size =3 face=Georgia >
<b><span style="color:#FF91A4">Data Preparation and Exploration</span></b>
</font>
<p>[[File:JFI draftdrawing.png|400px|center]] ''Data sources'' — We have two key data sets from the sponsor: a Microsoft Access file containing the geological properties and locations of boreholes, and a set of engineering drawings covering 5 stations of Downtown Line 3, including tables that reference boreholes and the corresponding engineering design features. <br/><br/>
[[File:JFI pivot.png|400px|center]] ''Pivot table'' — The first step was to pivot the borehole data so that stratum records are grouped by borehole instead of by stratum. Boreholes vary in depth and in number of strata, presumably because each was dug and logged according to the surveyor's judgment of what the project required and what the site allowed. <br/><br/>
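The pivot can be sketched in plain Python. This is a minimal illustration, not the project's actual pipeline: the record layout (`borehole`, `stratum`, `baseline`, `soil`) and the sample values are hypothetical stand-ins for the sponsor's Access tables.

```python
from collections import defaultdict

# Hypothetical long-format records: one row per stratum (field names are assumptions).
strata = [
    {"borehole": "BH1", "stratum": 1, "baseline": 102.5, "soil": "Fill"},
    {"borehole": "BH1", "stratum": 2, "baseline": 95.0, "soil": "Clay"},
    {"borehole": "BH2", "stratum": 1, "baseline": 101.0, "soil": "Fill"},
]


def pivot_by_borehole(records):
    """Group stratum rows by borehole into one wide row per borehole.

    Stratum i contributes two columns, baseline_i and soil_i, so a borehole
    with n strata fills 2n columns.
    """
    wide = defaultdict(dict)
    for r in records:
        i = r["stratum"]
        wide[r["borehole"]][f"baseline_{i}"] = r["baseline"]
        wide[r["borehole"]][f"soil_{i}"] = r["soil"]
    return dict(wide)


table = pivot_by_borehole(strata)
print(table["BH1"])
```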
[[File:JFI strata_distr.png|300px|center]] What is immediately observable is that a few outlier boreholes (n=8) have unusually many strata, one of them as many as 64. Because the pivoted table needs a column for every stratum position up to the deepest borehole, these outliers drastically increase the number of features, all the more so because each stratum contributes both its baseline and its soil type. As a consequence, the feature space of the pivoted table is considerably sparser than that of the original ungrouped table. Any future predictive model must account for this sparsity, both in the choice of model and in how it is implemented.<br/><br/>
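To see why a single deep borehole inflates the pivoted table, the share of empty cells can be computed directly. A minimal sketch, assuming two columns per stratum (baseline and soil type); the strata counts in the example are hypothetical, not the project's data.

```python
def pivot_sparsity(strata_counts, cols_per_stratum=2):
    """Fraction of empty cells in the pivoted table.

    strata_counts holds the number of strata in each borehole (one entry per row).
    The table needs cols_per_stratum columns for every stratum position up to the
    deepest borehole, so every shorter borehole leaves cells empty.
    """
    n_cols = cols_per_stratum * max(strata_counts)
    filled = cols_per_stratum * sum(strata_counts)
    return 1 - filled / (len(strata_counts) * n_cols)


# Hypothetical counts: nine boreholes with 8 strata each, with and without a 64-stratum outlier.
with_outlier = pivot_sparsity([8] * 9 + [64])
without_outlier = pivot_sparsity([8] * 9)
print(with_outlier, without_outlier)
```

With these hypothetical counts, adding one 64-stratum borehole turns a fully populated table into one where roughly four in five cells are empty.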
''CRS conversion'' — The other required transformation was converting the borehole coordinates from the SVY21 projected coordinate system to WGS84 latitude and longitude, both for easier visualization and to standardize the coordinate reference system.
<br/><br/>
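For reference, the SVY21-to-WGS84 step is an inverse Transverse Mercator projection on the WGS84 ellipsoid, so it can also be computed offline as a cross-check. The sketch below uses the series formulas from Snyder's ''Map Projections: A Working Manual'' (USGS Professional Paper 1395) with the EPSG:3414 parameters as we understand them (origin 1°22'N, 103°50'E, scale factor 1, false easting 28001.642 m, false northing 38744.572 m). These constants are assumptions to verify against the EPSG registry, and an established library such as pyproj is the safer choice in production.

```python
import math

# Assumed SVY21 / Singapore TM (EPSG:3414) parameters on the WGS84 ellipsoid; verify before use.
A = 6378137.0                          # semi-major axis (m)
F = 1 / 298.257223563                  # flattening
LAT0 = math.radians(1 + 22 / 60)       # latitude of origin, 1°22' N
LON0 = math.radians(103 + 50 / 60)     # central meridian, 103°50' E
K0 = 1.0                               # scale factor
FALSE_N = 38744.572                    # false northing (m)
FALSE_E = 28001.642                    # false easting (m)

E2 = F * (2 - F)                       # first eccentricity squared
EP2 = E2 / (1 - E2)                    # second eccentricity squared


def meridian_arc(phi):
    """Meridional distance from the equator to latitude phi (Snyder eq. 3-21)."""
    e4, e6 = E2 ** 2, E2 ** 3
    return A * (
        (1 - E2 / 4 - 3 * e4 / 64 - 5 * e6 / 256) * phi
        - (3 * E2 / 8 + 3 * e4 / 32 + 45 * e6 / 1024) * math.sin(2 * phi)
        + (15 * e4 / 256 + 45 * e6 / 1024) * math.sin(4 * phi)
        - (35 * e6 / 3072) * math.sin(6 * phi)
    )


M0 = meridian_arc(LAT0)


def svy21_to_wgs84(northing, easting):
    """Inverse Transverse Mercator: SVY21 (N, E) in metres -> (lat, lon) in degrees."""
    x = easting - FALSE_E
    y = northing - FALSE_N
    m = M0 + y / K0
    mu = m / (A * (1 - E2 / 4 - 3 * E2 ** 2 / 64 - 5 * E2 ** 3 / 256))
    e1 = (1 - math.sqrt(1 - E2)) / (1 + math.sqrt(1 - E2))
    # Footpoint latitude (Snyder eq. 3-26).
    phi1 = (mu
            + (3 * e1 / 2 - 27 * e1 ** 3 / 32) * math.sin(2 * mu)
            + (21 * e1 ** 2 / 16 - 55 * e1 ** 4 / 32) * math.sin(4 * mu)
            + (151 * e1 ** 3 / 96) * math.sin(6 * mu)
            + (1097 * e1 ** 4 / 512) * math.sin(8 * mu))
    sin1, cos1, tan1 = math.sin(phi1), math.cos(phi1), math.tan(phi1)
    c1 = EP2 * cos1 ** 2
    t1 = tan1 ** 2
    n1 = A / math.sqrt(1 - E2 * sin1 ** 2)
    r1 = A * (1 - E2) / (1 - E2 * sin1 ** 2) ** 1.5
    d = x / (n1 * K0)
    # Latitude and longitude series (Snyder eqs. 8-17 and 8-18).
    lat = phi1 - (n1 * tan1 / r1) * (
        d ** 2 / 2
        - (5 + 3 * t1 + 10 * c1 - 4 * c1 ** 2 - 9 * EP2) * d ** 4 / 24
        + (61 + 90 * t1 + 298 * c1 + 45 * t1 ** 2 - 252 * EP2 - 3 * c1 ** 2) * d ** 6 / 720
    )
    lon = LON0 + (
        d
        - (1 + 2 * t1 + c1) * d ** 3 / 6
        + (5 - 2 * c1 + 28 * t1 - 3 * c1 ** 2 + 8 * EP2 + 24 * t1 ** 2) * d ** 5 / 120
    ) / cos1
    return math.degrees(lat), math.degrees(lon)


# Sanity check: the false origin should map back to the projection origin.
print(svy21_to_wgs84(FALSE_N, FALSE_E))
```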
[[File:JFI boreholemap.png|300px|center]] This transformation was performed with SLA's OneMap coordinate converter web API, and the converted points were overlaid onto a base map of Singapore for reference. The first noticeable feature is that some boreholes are placed in unusual locations, far from the main cluster along the Downtown Line tunnels and stations. Having confirmed that the coordinate systems used to project the data set are correct, we must ask whether the other boreholes we wish to use for analysis have been correctly labeled, and how this can be verified.
<br/><br/>
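One simple way to screen for misplaced boreholes is to measure each point's distance from the median location of all boreholes and flag those beyond a threshold. This is a heuristic of our own, not a method from the project; the 5 km threshold and the coordinates below are hypothetical, and flagged boreholes would still need to be checked by hand.

```python
import math
from statistics import median


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a spherical Earth (R = 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371000 * math.asin(math.sqrt(h))


def flag_far_boreholes(coords, max_km):
    """Return sorted IDs of boreholes farther than max_km from the median location.

    The median is used instead of the mean so that a few stray points do not
    drag the reference centre towards themselves.
    """
    lat_c = median(lat for lat, _ in coords.values())
    lon_c = median(lon for _, lon in coords.values())
    return sorted(
        bh for bh, (lat, lon) in coords.items()
        if haversine_m(lat, lon, lat_c, lon_c) > max_km * 1000
    )


# Hypothetical WGS84 coordinates: three boreholes close together and one far away.
coords = {
    "BH1": (1.3300, 103.9000),
    "BH2": (1.3310, 103.9010),
    "BH3": (1.3290, 103.8990),
    "BH4": (1.4400, 103.7000),
}
print(flag_far_boreholes(coords, max_km=5))
```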
[[File:JFI station_boreholes.png|400px|center]] Secondly, comparing the full borehole dataset with the boreholes referenced in the design drawings shows that only a subset (marked in blue in the example figure) is referenced when the engineers perform their design, at least as far as the design documents show. This makes it all the more important to clarify with the design engineers what role borehole data plays: how boreholes are selected, and how they influence the design.
<br/><br/>
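The comparison between the full dataset and the drawing references is a set operation, and the same step can surface borehole IDs that appear in the drawings but not in the database, which bears directly on the labeling concern. A minimal sketch; the function name and the IDs are hypothetical.

```python
def compare_references(dataset_ids, drawing_ids):
    """Compare boreholes in the database with those cited in design drawings."""
    dataset, drawings = set(dataset_ids), set(drawing_ids)
    return {
        "referenced": sorted(dataset & drawings),
        "unreferenced": sorted(dataset - drawings),
        # IDs cited in drawings but absent from the database hint at labeling problems.
        "missing_from_dataset": sorted(drawings - dataset),
        "share_referenced": len(dataset & drawings) / len(dataset),
    }


# Hypothetical IDs.
report = compare_references(["BH1", "BH2", "BH3", "BH4"], ["BH2", "BH4", "BH9"])
print(report)
```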
</p>
</div>
<div style="margin:20px; padding: 10px; background: #ffffff; font-family: Trebuchet MS, sans-serif; font-size: 95%;-webkit-border-radius: 15px;-webkit-box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96); -moz-box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96);box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96);">
<font size =3 face=Georgia >
<b><span style="color:#FF91A4">Data Engineering & Subsequent Steps</span></b>
</font>
<p>[[File:JFI borehole_depth.png|400px|center]] ''Goals'' — Since the original objective of predicting design from borehole data has proven infeasible, we must now shift to understanding the borehole data and the design data independently. This lays a foundation for prediction in the future, once supplementary data sets are available and the engineering design process and the relationships between features are better understood.<br/><br/>
[[File:JFI borehole_eg.png|400px|center]] ''Boreholes'' — It is tempting to analyze the distributions of depth, thickness and soil type across all strata of all boreholes, but such a representation is unlikely to shed meaningful light on the geological conditions that matter for engineering underground infrastructure. The boreholes have a geospatial relationship to one another, and a population-level analysis would ignore this crucial aspect. Instead, we will visualize the boreholes relative to one another and look for a way to identify which boreholes are significant for engineering design, using the boreholes selected in the design drawings out of the general population as a reference.<br/><br/>
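As a starting point for analyzing boreholes relative to one another, each borehole's nearest neighbours can be listed. A minimal brute-force sketch, assuming coordinates in a projected system in metres (such as SVY21, before conversion to latitude and longitude) so that Euclidean distance is meaningful; the function name and positions are hypothetical.

```python
import math


def nearest_neighbours(points, k):
    """For each borehole, list its k nearest boreholes as (id, distance) pairs.

    points maps borehole ID -> (easting, northing) in metres.
    Brute force, O(n^2) in the number of boreholes.
    """
    result = {}
    for bh, (x, y) in points.items():
        dists = sorted(
            (math.hypot(x - ox, y - oy), other)
            for other, (ox, oy) in points.items()
            if other != bh
        )
        result[bh] = [(other, d) for d, other in dists[:k]]
    return result


# Hypothetical borehole positions in metres.
pts = {"BH1": (0.0, 0.0), "BH2": (30.0, 40.0), "BH3": (100.0, 0.0), "BH4": (0.0, 20.0)}
nn = nearest_neighbours(pts, k=2)
print(nn["BH1"])
```

Neighbour lists like these could, for instance, be used to compare a borehole's strata with those of its neighbours, or to check whether the boreholes cited in the drawings cluster around particular station elements. For larger datasets, a spatial index such as a k-d tree would replace the quadratic loop.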
[[File:JFI borehole_loc.png|400px|center]] ''Design Documents'' — The design documents specify various structural parameters of the walls, such as their thickness, material and load capacity. The drawings, however, contain information that the tables do not capture: a) the orientation and distance of walls relative to their associated boreholes, and b) the geometric network of walls, i.e. which walls are adjacent and how they are oriented to one another. If the drawings could be converted into a data matrix describing wall dimensions and orientation, it would become much easier to analyze them alongside the other data.
<br/><br/>
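If wall endpoints could be digitized from the drawings, the features listed above (dimensions, orientation, distance to the associated borehole, adjacency) follow from basic geometry. A minimal sketch under that assumption: the input format, the wall and borehole IDs, and the shared-endpoint rule for adjacency are all hypothetical choices, and extracting coordinates from the drawings, the hard part, is not shown.

```python
import math


def wall_features(walls, boreholes, tol=1e-6):
    """Turn wall segments into rows of geometric features.

    walls: wall ID -> ((x1, y1), (x2, y2), associated borehole ID)   (hypothetical format)
    boreholes: borehole ID -> (x, y); all coordinates in metres.
    """
    rows = {}
    for wid, (p, q, bh) in walls.items():
        dx, dy = q[0] - p[0], q[1] - p[1]
        length = math.hypot(dx, dy)
        if length == 0:
            raise ValueError(f"wall {wid} has zero length")
        # Orientation as an undirected angle in [0, 180) degrees from the x-axis.
        angle = math.degrees(math.atan2(dy, dx)) % 180
        # Shortest distance from the borehole to the wall segment (projection clamped to the segment).
        bx, by = boreholes[bh]
        t = max(0.0, min(1.0, ((bx - p[0]) * dx + (by - p[1]) * dy) / length ** 2))
        dist = math.hypot(bx - (p[0] + t * dx), by - (p[1] + t * dy))
        rows[wid] = {"length": length, "angle": angle, "dist_to_bh": dist, "adjacent": []}
    # Assumed rule: two walls are adjacent if they share an endpoint (within tol).
    ends = {wid: (p, q) for wid, (p, q, _) in walls.items()}
    for a in walls:
        for b in walls:
            if a < b and any(math.dist(u, v) <= tol for u in ends[a] for v in ends[b]):
                rows[a]["adjacent"].append(b)
                rows[b]["adjacent"].append(a)
    return rows


# Hypothetical walls and boreholes.
walls = {
    "W1": ((0.0, 0.0), (10.0, 0.0), "BH1"),
    "W2": ((10.0, 0.0), (10.0, 20.0), "BH1"),
    "W3": ((50.0, 50.0), (60.0, 50.0), "BH2"),
}
boreholes = {"BH1": (5.0, -3.0), "BH2": (0.0, 0.0)}
features = wall_features(walls, boreholes)
print(features["W1"])
```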
</p>
</div>
<div style="margin:20px; padding: 10px; background: #ffffff; font-family: Trebuchet MS, sans-serif; font-size: 95%;-webkit-border-radius: 15px;-webkit-box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96); -moz-box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96);box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96);">
<font size =3 face=Georgia >
<b><span style="color:#FF91A4">Insights and Next Milestones</span></b>
</font>
<p> ''New Deliverables'' — Instead of trying to build a predictive model, from here on we will build up an understanding of the borehole and station data independently, engineer features that help engineers see the factors at play more clearly, and look for a clear and concise way to visualize the data, especially the three-dimensional geological borehole data.<br/><br/>
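One way to prepare the three-dimensional borehole data for visualization is to turn each borehole into a vertical "stick" of stratum intervals at its plan position. A minimal sketch, assuming each stratum's baseline is the elevation of its bottom and that a ground level is known for each borehole; both assumptions, the function name and the sample values are hypothetical and would need checking against the sponsor's data dictionary.

```python
def borehole_intervals(x, y, ground_level, strata):
    """Convert one borehole into vertical intervals for a 3D stick plot.

    strata: list of (baseline, soil) ordered from the surface down.
    Assumption: baseline is the elevation of the bottom of each stratum.
    Returns (x, y, z_top, z_bottom, soil) tuples; each stratum starts where the
    one above it ended.
    """
    intervals = []
    top = ground_level
    for baseline, soil in strata:
        if baseline > top:
            raise ValueError(f"baseline {baseline} lies above stratum top {top}")
        intervals.append((x, y, top, baseline, soil))
        top = baseline
    return intervals


# Hypothetical borehole: ground at 102.0 m with three strata.
sticks = borehole_intervals(28000.0, 38700.0, 102.0,
                            [(99.5, "Fill"), (90.0, "Clay"), (70.0, "Sandstone")])
for s in sticks:
    print(s)
```

Each tuple can then be drawn as a coloured vertical segment, one colour per soil type, in any 3D plotting tool, so that neighbouring boreholes can be compared side by side.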
''Laying the foundation for prediction'' — The feature engineering and visualization of both the independent variable (geology) and the dependent variables from past design (station design drawings and tables) should be carried out in an open-ended framework that permits the inclusion of additional data, e.g. engineers' logbooks, or data on other factors that determine the location, depth and orientation of station boxes and tunnels. In this way, the work can serve as the first phase toward a future in which procedurally generated infrastructure design may be possible, and engineers can spend their time more productively designing the rules by which design happens, rather than calculating each parameter individually.
<br/><br/>
</p>
</div>